Function with table as argument that return a table

Hi, I want to create a Python function that accept a query as argument, do some processing and return another table, In addition to the table columns I want to have some additional scalar parameters and i am getting the following error: function requires a single sub query. CREATE function f1(col1 string, col2 int, col3 double, scalar1 int) RETURNS TABLE(col1 string, col2 double, col3 double, col4 double) LANGUAGE PYTHON Is this the normal behavior of Table function, as in aggregate function I can combine columns with scalar parameters, any workaround for this issue. Thank you.

Hello Imad, Yes, that’s the expected behaviour. For such table-returning python functions, you can either pass all parameters as scalars, or as database columns. There are several alternatives: 1. expand the scalar parameter as a datacolumn, e.g.: SELECT * FROM f1( (SELECT col1, col2, col3, 42 FROM mytable) ); 2. Store the value for scalar1 in SQL, and retrieve it in your Python UDF body using a loopback query [1], e.g.: DELCARE scalar1 INT; SET scalar1 = 42; -- Note the changed function signature: CREATE function f1(col1 string, col2 int, col3 double) RETURNS TABLE(col1 string, col2 double, col3 double, col4 double) LANGUAGE PYTHON { ... scalar1 = (_conn.execute(“SELECT scalar1;”).values())[0][0] ... }; 3. Use loopback queries in the Python UDF body to retrieve the columns, e.g.: CREATE function f1(scalar1 int) RETURNS TABLE(col1 string, col2 double, col3 double, col4 double) LANGUAGE PYTHON { … col1 = list(_conn.execute("SELECT col1 FROM mytable;”)[‘col1’]) col2 = list(_conn.execute("SELECT col2 FROM mytable;”)[‘col2']) col3 = list(_conn.execute("SELECT col3 FROM mytable;”)[‘col3’]) ... }; However, I’ve just noticed that different ways of accessing column data (i.e. option 1 vs option 3) have very different speed. In my own little program, option 3 is ~40 times slower than option 1, which is rather unexpected. You’re welcome to try different options, and please let us know if you notice any significant differences in execution times. Cheers, Jennie [1] https://www.monetdb.org/blog/voter-classification-using-monetdbpython
On 28 Aug 2017, at 11:52, imad hajj chahine <imad.hajj.chahine@gmail.com> wrote:
Hi,
I want to create a Python function that accept a query as argument, do some processing and return another table, In addition to the table columns I want to have some additional scalar parameters and i am getting the following error: function requires a single sub query.
CREATE function f1(col1 string, col2 int, col3 double, scalar1 int) RETURNS TABLE(col1 string, col2 double, col3 double, col4 double) LANGUAGE PYTHON
Is this the normal behavior of Table function, as in aggregate function I can combine columns with scalar parameters, any workaround for this issue.
Thank you. _______________________________________________ users-list mailing list users-list@monetdb.org https://www.monetdb.org/mailman/listinfo/users-list

Thank you Jennie, The way I am handling this now is by creating different version of the same function hardcoding the target parameter, this was working fine until recently the parameters count start to increase. I think option1 is best suited for my case, my only concern is memory wise as I will have arrays of several millions in length filled with scalar values to just take the first value of each array, does monetdb optimizing engine handle very well these types of arrays? Regards. On Mon, Sep 4, 2017 at 7:11 PM, Ying Zhang <Y.Zhang@cwi.nl> wrote:
Hello Imad,
Yes, that’s the expected behaviour. For such table-returning python functions, you can either pass all parameters as scalars, or as database columns. There are several alternatives:
1. expand the scalar parameter as a datacolumn, e.g.:
SELECT * FROM f1( (SELECT col1, col2, col3, 42 FROM mytable) );
2. Store the value for scalar1 in SQL, and retrieve it in your Python UDF body using a loopback query [1], e.g.:
DELCARE scalar1 INT; SET scalar1 = 42;
-- Note the changed function signature: CREATE function f1(col1 string, col2 int, col3 double) RETURNS TABLE(col1 string, col2 double, col3 double, col4 double) LANGUAGE PYTHON { ... scalar1 = (_conn.execute(“SELECT scalar1;”).values())[0][0] ... };
3. Use loopback queries in the Python UDF body to retrieve the columns, e.g.:
CREATE function f1(scalar1 int) RETURNS TABLE(col1 string, col2 double, col3 double, col4 double) LANGUAGE PYTHON { … col1 = list(_conn.execute("SELECT col1 FROM mytable;”)[‘col1’]) col2 = list(_conn.execute("SELECT col2 FROM mytable;”)[‘col2']) col3 = list(_conn.execute("SELECT col3 FROM mytable;”)[‘col3’]) ... };
However, I’ve just noticed that different ways of accessing column data (i.e. option 1 vs option 3) have very different speed. In my own little program, option 3 is ~40 times slower than option 1, which is rather unexpected. You’re welcome to try different options, and please let us know if you notice any significant differences in execution times.
Cheers,
Jennie
[1] https://www.monetdb.org/blog/voter-classification-using-monetdbpython
On 28 Aug 2017, at 11:52, imad hajj chahine <imad.hajj.chahine@gmail.com> wrote:
Hi,
I want to create a Python function that accept a query as argument, do some processing and return another table, In addition to the table columns I want to have some additional scalar parameters and i am getting the following error: function requires a single sub query.
CREATE function f1(col1 string, col2 int, col3 double, scalar1 int) RETURNS TABLE(col1 string, col2 double, col3 double, col4 double) LANGUAGE PYTHON
Is this the normal behavior of Table function, as in aggregate function I can combine columns with scalar parameters, any workaround for this issue.
Thank you. _______________________________________________ users-list mailing list users-list@monetdb.org https://www.monetdb.org/mailman/listinfo/users-list
_______________________________________________ users-list mailing list users-list@monetdb.org https://www.monetdb.org/mailman/listinfo/users-list

Hello Imad, If you’re worried about performance, I’d recommend option 2. In my use case, I only have one database column (with 17 million integers) but 3 scalar parameters. I first used option 1 as a quick hacky solution, but I’m switching over to option 2 to avoid potential performance problem with even more data. Best, Jennie
On 4 Sep 2017, at 18:41, imad hajj chahine <imad.hajj.chahine@gmail.com> wrote:
Thank you Jennie,
The way I am handling this now is by creating different version of the same function hardcoding the target parameter, this was working fine until recently the parameters count start to increase. I think option1 is best suited for my case, my only concern is memory wise as I will have arrays of several millions in length filled with scalar values to just take the first value of each array, does monetdb optimizing engine handle very well these types of arrays?
Regards.
On Mon, Sep 4, 2017 at 7:11 PM, Ying Zhang <Y.Zhang@cwi.nl> wrote: Hello Imad,
Yes, that’s the expected behaviour. For such table-returning python functions, you can either pass all parameters as scalars, or as database columns. There are several alternatives:
1. expand the scalar parameter as a datacolumn, e.g.:
SELECT * FROM f1( (SELECT col1, col2, col3, 42 FROM mytable) );
2. Store the value for scalar1 in SQL, and retrieve it in your Python UDF body using a loopback query [1], e.g.:
DELCARE scalar1 INT; SET scalar1 = 42;
-- Note the changed function signature: CREATE function f1(col1 string, col2 int, col3 double) RETURNS TABLE(col1 string, col2 double, col3 double, col4 double) LANGUAGE PYTHON { ... scalar1 = (_conn.execute(“SELECT scalar1;”).values())[0][0] ... };
3. Use loopback queries in the Python UDF body to retrieve the columns, e.g.:
CREATE function f1(scalar1 int) RETURNS TABLE(col1 string, col2 double, col3 double, col4 double) LANGUAGE PYTHON { … col1 = list(_conn.execute("SELECT col1 FROM mytable;”)[‘col1’]) col2 = list(_conn.execute("SELECT col2 FROM mytable;”)[‘col2']) col3 = list(_conn.execute("SELECT col3 FROM mytable;”)[‘col3’]) ... };
However, I’ve just noticed that different ways of accessing column data (i.e. option 1 vs option 3) have very different speed. In my own little program, option 3 is ~40 times slower than option 1, which is rather unexpected. You’re welcome to try different options, and please let us know if you notice any significant differences in execution times.
Cheers,
Jennie
[1] https://www.monetdb.org/blog/voter-classification-using-monetdbpython
On 28 Aug 2017, at 11:52, imad hajj chahine <imad.hajj.chahine@gmail.com> wrote:
Hi,
I want to create a Python function that accept a query as argument, do some processing and return another table, In addition to the table columns I want to have some additional scalar parameters and i am getting the following error: function requires a single sub query.
CREATE function f1(col1 string, col2 int, col3 double, scalar1 int) RETURNS TABLE(col1 string, col2 double, col3 double, col4 double) LANGUAGE PYTHON
Is this the normal behavior of Table function, as in aggregate function I can combine columns with scalar parameters, any workaround for this issue.
Thank you. _______________________________________________ users-list mailing list users-list@monetdb.org https://www.monetdb.org/mailman/listinfo/users-list
_______________________________________________ users-list mailing list users-list@monetdb.org https://www.monetdb.org/mailman/listinfo/users-list
_______________________________________________ users-list mailing list users-list@monetdb.org https://www.monetdb.org/mailman/listinfo/users-list
participants (2)
-
imad hajj chahine
-
Ying Zhang