UDF bulk implementation with some arguments as scalars

Hi, When defining a UDF, e.g. CREATE FUNCTION myfunc(a int, b int) RETURNS int we can have: - a scalar implementation command myfunc(a:int, b:int) :int - a bulk implementation command myfunc(a:bat[:int], b:bat[:int]) :bat[:int] - a mixed implementation (e.g. a is a column, b is scalar): command myfunc(a:bat[:int], b:int) :bat[:int] I'm trying to understand what the restrictions are, if any, for the third case. Is it expected to be allowed in (a combination of) the following cases? - Interleaved bulk/scalar arguments command myfunc(a:bat[:int], b:int, c:bat[:int]) :bat[:int] - Table-returning function command myfunc(a:bat[:int], b:int) (:bat[:int], :bat[:int]) - Table-returning function, used with a sub-select in input SELECT * FROM myfunc( (SELECT a, 10 FROM mytable) ); In particular I can't get the last one to work. Even though the sub-select has a constant as the second column, this constant seems to be first replicated into a full column and then given to myfunc() as a column. When the constant parameters are a few, that becomes a waste of time/space. Plus the annoyance of selecting the first value of a constant column for the actual internal implementation, which of course expects a scalar for those arguments. Am I missing something? Is there a way to make that work as I would expect? Thanks, Roberto

Hi Roberto, I would argue that replicating the constant is the correct behavior. With SELECT a, 10 FROM mytable you are actually requesting two columns, it just so happens that the second column has a constant value. In general a SQL SELECT statement returns a SQL relation. It *has* to replicate the constant, otherwise the result would not make sense. Would it work if you called your UDF like as follows? SELECT * FROM myfunc((SELECT a FROM mytable), 10); Best regards, Panos. On 11/4/19 12:08 PM, Roberto Cornacchia wrote:
Hi,
When defining a UDF, e.g. CREATE FUNCTION myfunc(a int, b int) RETURNS int we can have:
- a scalar implementation command myfunc(a:int, b:int) :int
- a bulk implementation command myfunc(a:bat[:int], b:bat[:int]) :bat[:int]
- a mixed implementation (e.g. a is a column, b is scalar): command myfunc(a:bat[:int], b:int) :bat[:int]
I'm trying to understand what the restrictions are, if any, for the third case. Is it expected to be allowed in (a combination of) the following cases?
- Interleaved bulk/scalar arguments command myfunc(a:bat[:int], b:int, c:bat[:int]) :bat[:int]
- Table-returning function command myfunc(a:bat[:int], b:int) (:bat[:int], :bat[:int])
- Table-returning function, used with a sub-select in input SELECT * FROM myfunc( (SELECT a, 10 FROM mytable) );
In particular I can't get the last one to work. Even though the sub-select has a constant as the second column, this constant seems to be first replicated into a full column and then given to myfunc() as a column.
When the constant parameters are a few, that becomes a waste of time/space. Plus the annoyance of selecting the first value of a constant column for the actual internal implementation, which of course expects a scalar for those arguments.
Am I missing something? Is there a way to make that work as I would expect?
Thanks, Roberto
_______________________________________________ users-list mailing list users-list@monetdb.org https://www.monetdb.org/mailman/listinfo/users-list

Hi both, I agree that the logical view should be one of a column with many duplications of the constant. However, MonetDB used to never create this column physically, but instead create a view. (Not sure how it works today!) I suppose that Roberto wonders about the physical level query plan, i.e., why generate a column with the value replicated N times, instead of handling over to the UDF a view representing the same solution. Greetings, Arjen On Mon, 4 Nov 2019 at 13:13, Panagiotis Koutsourakis < panagiotis.koutsourakis@monetdbsolutions.com> wrote:
Hi Roberto,
I would argue that replicating the constant is the correct behavior. With
SELECT a, 10 FROM mytable
you are actually requesting two columns, it just so happens that the second column has a constant value. In general a SQL SELECT statement returns a SQL relation. It *has* to replicate the constant, otherwise the result would not make sense.
Would it work if you called your UDF like as follows?
SELECT * FROM myfunc((SELECT a FROM mytable), 10);
Best regards, Panos.
On 11/4/19 12:08 PM, Roberto Cornacchia wrote:
Hi,
When defining a UDF, e.g. CREATE FUNCTION myfunc(a int, b int) RETURNS int we can have:
- a scalar implementation command myfunc(a:int, b:int) :int
- a bulk implementation command myfunc(a:bat[:int], b:bat[:int]) :bat[:int]
- a mixed implementation (e.g. a is a column, b is scalar): command myfunc(a:bat[:int], b:int) :bat[:int]
I'm trying to understand what the restrictions are, if any, for the third case. Is it expected to be allowed in (a combination of) the following cases?
- Interleaved bulk/scalar arguments command myfunc(a:bat[:int], b:int, c:bat[:int]) :bat[:int]
- Table-returning function command myfunc(a:bat[:int], b:int) (:bat[:int], :bat[:int])
- Table-returning function, used with a sub-select in input SELECT * FROM myfunc( (SELECT a, 10 FROM mytable) );
In particular I can't get the last one to work. Even though the sub-select has a constant as the second column, this constant seems to be first replicated into a full column and then given to myfunc() as a column.
When the constant parameters are a few, that becomes a waste of time/space. Plus the annoyance of selecting the first value of a constant column for the actual internal implementation, which of course expects a scalar for those arguments.
Am I missing something? Is there a way to make that work as I would expect?
Thanks, Roberto
_______________________________________________ users-list mailing list users-list@monetdb.org https://www.monetdb.org/mailman/listinfo/users-list
_______________________________________________ users-list mailing list users-list@monetdb.org https://www.monetdb.org/mailman/listinfo/users-list
-- ==================================================================== ICIS, office M1.02.07 Radboud University Mercator 1 Faculty of Science Toernooiveld 212 arjen@cs.ru.nl NL-6525 EC Nijmegen, The Netherlands +31-(0)24-365 2354 ===================== http://www.informagus.nl/ ====================

Hi Panos, Thanks for your input. As Arjen pointed out, I completely understand why MonetDB would replicate a constant into a column, at least from a logical point of view or as a default physical point of view. However, this is not the only route. Take for example function substring(start,index) It has two implementations: command batstr.substring( s:bat[:str], start:bat[:int], index:bat[:int]) :bat[:str] address STRbatsubstring; command batstr.substring( s:bat[:str], start:int, index:int):bat[:str] address STRbatsubstringcst; When we call it from SQL with select substring(a, 0,5) from myTable then it can directly use the second implementation above, instead replicating the constant parameters to fit the first implementation. Best, Roberto On Mon, 4 Nov 2019 at 13:15, Panagiotis Koutsourakis < panagiotis.koutsourakis@monetdbsolutions.com> wrote:
Hi Roberto,
I would argue that replicating the constant is the correct behavior. With
SELECT a, 10 FROM mytable
you are actually requesting two columns, it just so happens that the second column has a constant value. In general a SQL SELECT statement returns a SQL relation. It *has* to replicate the constant, otherwise the result would not make sense.
Would it work if you called your UDF like as follows?
SELECT * FROM myfunc((SELECT a FROM mytable), 10);
Best regards, Panos.
On 11/4/19 12:08 PM, Roberto Cornacchia wrote:
Hi,
When defining a UDF, e.g. CREATE FUNCTION myfunc(a int, b int) RETURNS int we can have:
- a scalar implementation command myfunc(a:int, b:int) :int
- a bulk implementation command myfunc(a:bat[:int], b:bat[:int]) :bat[:int]
- a mixed implementation (e.g. a is a column, b is scalar): command myfunc(a:bat[:int], b:int) :bat[:int]
I'm trying to understand what the restrictions are, if any, for the third case. Is it expected to be allowed in (a combination of) the following cases?
- Interleaved bulk/scalar arguments command myfunc(a:bat[:int], b:int, c:bat[:int]) :bat[:int]
- Table-returning function command myfunc(a:bat[:int], b:int) (:bat[:int], :bat[:int])
- Table-returning function, used with a sub-select in input SELECT * FROM myfunc( (SELECT a, 10 FROM mytable) );
In particular I can't get the last one to work. Even though the sub-select has a constant as the second column, this constant seems to be first replicated into a full column and then given to myfunc() as a column.
When the constant parameters are a few, that becomes a waste of time/space. Plus the annoyance of selecting the first value of a constant column for the actual internal implementation, which of course expects a scalar for those arguments.
Am I missing something? Is there a way to make that work as I would expect?
Thanks, Roberto
_______________________________________________ users-list mailing list users-list@monetdb.org https://www.monetdb.org/mailman/listinfo/users-list
_______________________________________________ users-list mailing list users-list@monetdb.org https://www.monetdb.org/mailman/listinfo/users-list

The MAL signatures are ordered and the first one matching is used. Such ordering is not enforced at the SQL function layer. In general it would require a type resolution that either works on matching or you have to encode the order of resolution. On 04/11/2019 15:14, Roberto Cornacchia wrote:
Hi Panos,
Thanks for your input.
As Arjen pointed out, I completely understand why MonetDB would replicate a constant into a column, at least from a logical point of view or as a default physical point of view.
However, this is not the only route. Take for example function substring(start,index) It has two implementations:
command batstr.substring( s:bat[:str], start:bat[:int], index:bat[:int]) :bat[:str] address STRbatsubstring;
command batstr.substring( s:bat[:str], start:int, index:int):bat[:str] address STRbatsubstringcst;
When we call it from SQL with select substring(a, 0,5) from myTable
then it can directly use the second implementation above, instead replicating the constant parameters to fit the first implementation.
Best, Roberto
On Mon, 4 Nov 2019 at 13:15, Panagiotis Koutsourakis <panagiotis.koutsourakis@monetdbsolutions.com <mailto:panagiotis.koutsourakis@monetdbsolutions.com>> wrote:
Hi Roberto,
I would argue that replicating the constant is the correct behavior. With
SELECT a, 10 FROM mytable
you are actually requesting two columns, it just so happens that the second column has a constant value. In general a SQL SELECT statement returns a SQL relation. It *has* to replicate the constant, otherwise the result would not make sense.
Would it work if you called your UDF like as follows?
SELECT * FROM myfunc((SELECT a FROM mytable), 10);
Best regards, Panos.
On 11/4/19 12:08 PM, Roberto Cornacchia wrote: > Hi, > > When defining a UDF, e.g. > CREATE FUNCTION myfunc(a int, b int) RETURNS int > we can have: > > - a scalar implementation > command myfunc(a:int, b:int) :int > > - a bulk implementation > command myfunc(a:bat[:int], b:bat[:int]) :bat[:int] > > - a mixed implementation (e.g. a is a column, b is scalar): > command myfunc(a:bat[:int], b:int) :bat[:int] > > I'm trying to understand what the restrictions are, if any, for the third > case. > Is it expected to be allowed in (a combination of) the following cases? > > - Interleaved bulk/scalar arguments > command myfunc(a:bat[:int], b:int, c:bat[:int]) :bat[:int] > > - Table-returning function > command myfunc(a:bat[:int], b:int) (:bat[:int], :bat[:int]) > > - Table-returning function, used with a sub-select in input > SELECT * FROM myfunc( (SELECT a, 10 FROM mytable) ); > > In particular I can't get the last one to work. Even though the sub-select > has a constant as the second column, this constant seems to be first > replicated into a full column and then given to myfunc() as a column. > > When the constant parameters are a few, that becomes a waste of time/space. > Plus the annoyance of selecting the first value of a constant column for > the actual internal implementation, which of course expects a scalar for > those arguments. > > Am I missing something? Is there a way to make that work as I would expect? > > Thanks, Roberto > > > _______________________________________________ > users-list mailing list > users-list@monetdb.org <mailto:users-list@monetdb.org> > https://www.monetdb.org/mailman/listinfo/users-list >
_______________________________________________ users-list mailing list users-list@monetdb.org <mailto:users-list@monetdb.org> https://www.monetdb.org/mailman/listinfo/users-list
_______________________________________________ users-list mailing list users-list@monetdb.org https://www.monetdb.org/mailman/listinfo/users-list

Thanks Martin, I'm still a bit confused. I assume that by "order" you mean the order with which the signatures were created. If the first matching is taken, I don't understand the substring() example that I showed. The first declared is the fully bulk version. Shouldn't that one always match (by replicating the constant columns)? I would have expected the "optimized" version with constants to be declared first, and the fully bulk version second. Unfortunately I can't get my case working as I would expect. To be more specific: CREATE OR REPLACE FUNCTION tokenize(id integer, s string, d string, m tinyint, g tinyint, stemmer string, cs boolean, prob double) RETURNS TABLE (id integer, token string, prob double) EXTERNAL NAME spinque."UTF8tokenize"; # The *bold* ones are scalars command batspinque.UTF8tokenize_c(id:bat[:int], s:bat[:str], *delims:str*, *min_tok_len:bte*, *grams:bte*, *stemmer:str*, *cs:bit*, prob:bat[:dbl]) (:bat[:int], :bat[:str], :bat[:dbl]) address SPQbat_utf8_tokenize; sql>create table t (id int,value string,prob double); operation successful explain select * from tokenize( (select id, value, *'SP'*, *2*, *1*, *'snowball-english'*, *false*, prob from t) ); Program contains errors.:(NONE).multiplex (NONE).multiplex, I think, indicates that: - A fully bulk version was not found - A multiplex with a fully scalar version is attempted, but no fully scalar signature was found (BTW, a more informative error message could help here) I don't understand why the mixed bulk/scalar MAL signature above wouldn't match the query. Again, from a logical point of view, a fully bulk signature is expected, but we saw already that in other cases (e.g. the substring() example) this shortcut is allowed. Why not here? Roberto On Mon, 4 Nov 2019 at 22:29, Martin Kersten <martin@monetdb.org> wrote:
The MAL signatures are ordered and the first one matching is used.
Such ordering is not enforced at the SQL function layer. In general it would require a type resolution that either works on matching or you have to encode the order of resolution.
Hi Panos,
Thanks for your input.
As Arjen pointed out, I completely understand why MonetDB would replicate a constant into a column, at least from a logical point of view or as a default physical point of view.
However, this is not the only route. Take for example function substring(start,index) It has two implementations:
command batstr.substring( s:bat[:str], start:bat[:int], index:bat[:int]) :bat[:str] address STRbatsubstring;
command batstr.substring( s:bat[:str], start:int, index:int):bat[:str] address STRbatsubstringcst;
When we call it from SQL with select substring(a, 0,5) from myTable
then it can directly use the second implementation above, instead replicating the constant parameters to fit the first implementation.
Best, Roberto
On Mon, 4 Nov 2019 at 13:15, Panagiotis Koutsourakis <
On 04/11/2019 15:14, Roberto Cornacchia wrote: panagiotis.koutsourakis@monetdbsolutions.com <mailto: panagiotis.koutsourakis@monetdbsolutions.com>> wrote:
Hi Roberto,
I would argue that replicating the constant is the correct behavior.
With
SELECT a, 10 FROM mytable
you are actually requesting two columns, it just so happens that the
second column has a constant value. In general a SQL SELECT statement returns a SQL relation. It *has* to replicate the constant, otherwise the result would not make sense.
Would it work if you called your UDF like as follows?
SELECT * FROM myfunc((SELECT a FROM mytable), 10);
Best regards, Panos.
On 11/4/19 12:08 PM, Roberto Cornacchia wrote: > Hi, > > When defining a UDF, e.g. > CREATE FUNCTION myfunc(a int, b int) RETURNS int > we can have: > > - a scalar implementation > command myfunc(a:int, b:int) :int > > - a bulk implementation > command myfunc(a:bat[:int], b:bat[:int]) :bat[:int] > > - a mixed implementation (e.g. a is a column, b is scalar): > command myfunc(a:bat[:int], b:int) :bat[:int] > > I'm trying to understand what the restrictions are, if any, for
the third
> case. > Is it expected to be allowed in (a combination of) the following
cases?
> > - Interleaved bulk/scalar arguments > command myfunc(a:bat[:int], b:int, c:bat[:int]) :bat[:int] > > - Table-returning function > command myfunc(a:bat[:int], b:int) (:bat[:int], :bat[:int]) > > - Table-returning function, used with a sub-select in input > SELECT * FROM myfunc( (SELECT a, 10 FROM mytable) ); > > In particular I can't get the last one to work. Even though the
sub-select
> has a constant as the second column, this constant seems to be
first
> replicated into a full column and then given to myfunc() as a
column.
> > When the constant parameters are a few, that becomes a waste of
time/space.
> Plus the annoyance of selecting the first value of a constant
column for
> the actual internal implementation, which of course expects a
scalar for
> those arguments. > > Am I missing something? Is there a way to make that work as I
would expect?
> > Thanks, Roberto > > > _______________________________________________ > users-list mailing list > users-list@monetdb.org <mailto:users-list@monetdb.org> > https://www.monetdb.org/mailman/listinfo/users-list >
_______________________________________________ users-list mailing list users-list@monetdb.org <mailto:users-list@monetdb.org> https://www.monetdb.org/mailman/listinfo/users-list
_______________________________________________ users-list mailing list users-list@monetdb.org https://www.monetdb.org/mailman/listinfo/users-list
_______________________________________________ users-list mailing list users-list@monetdb.org https://www.monetdb.org/mailman/listinfo/users-list
participants (4)
-
Arjen P. de Vries
-
Martin Kersten
-
Panagiotis Koutsourakis
-
Roberto Cornacchia