[MonetDB-users] ERROR GDKload failed....CMDthetajoin; operation failed

Hello again kind people. This is an sql self join on TableA which contains a serial primary key column called 'seq'. The select is limited to less than 400 rows of output by 'a1.seq<=400'. sql->select a1.*, sum(a2.num)/17 from TableA a1, TableA a2 where a1.seq<=400 and a1.seq>=17 and a1.seq<a2.seq+17 group by a1.seq, a1.num order by a1.seq asc; This select statement results in 384 rows, as expected, with a newly calculated column for sum(a2.num)/17 which yields a running average of 'num' between any given value for 'seq' and 'seq+17'. This next sql statement is exactly the same as above except the limitation of 384 rows of output imposed by 'a1.seq<=400' has been removed. Note that TableA contained 40 million plus rows. sql->select a1.*, sum(a2.num)/17 from TableA a1, TableA a2 where and a1.seq>=17 and a1.seq<a2.seq+17 group by a1.seq, a1.num order by a1.seq asc; Error: ERROR: GDKload failed: name=14/1457, ext=buns.priv ERROR: CMDthetajoin: operation failed. Perhaps there is a better way to accomplish a running aggregate on 40 million rows? Thanks.........George

I assume that the first query was executed successfully in a separate transaction. The second query builds a huge intermediate result incrementally. To understand such queries you might try a toy example first (e.g. a table of 10 elements and look at the intermediate results) Consider a1.seq = 18 then there are 40M-18 combinations for which a1.seq <a2.seq+17, this is stored in the intermediate... and then add a1.seq= 19 .... roughly leading to a table of 40M^2 You probably mean a2.seq <= a1.seq <a2.seq+17, which should be translated by the SQL front-end in a bandjoin(). I am almost sure it doesnot. The SQL standard contains WINDOWS functions to handle this case. They are not supported yet. A hack would be to generate 18 statements, which accumulate the result along the lines: insert into accum(first,last,total) select seq,seq,num from tableA; update accum set last=last +1 set total = total + (select num from tableA where seq= last+1); ...17 times... ...sort/group... regards, Martin George H wrote:
Hello again kind people.
This is an sql self join on TableA which contains a serial primary key column called 'seq'. The select is limited to less than 400 rows of output by 'a1.seq<=400'.
sql->select a1.*, sum(a2.num)/17 from TableA a1, TableA a2 where a1.seq<=400 and a1.seq>=17 and a1.seq<a2.seq+17 group by a1.seq, a1.num order by a1.seq asc;
This select statement results in 384 rows, as expected, with a newly calculated column for sum(a2.num)/17 which yields a running average of 'num' between any given value for 'seq' and 'seq+17'.
This next sql statement is exactly the same as above except the limitation of 384 rows of output imposed by 'a1.seq<=400' has been removed. Note that TableA contained 40 million plus rows.
sql->select a1.*, sum(a2.num)/17 from TableA a1, TableA a2 where and a1.seq>=17 and a1.seq<a2.seq+17 group by a1.seq, a1.num order by a1.seq asc;
Error: ERROR: GDKload failed: name=14/1457, ext=buns.priv ERROR: CMDthetajoin: operation failed.
Perhaps there is a better way to accomplish a running aggregate on 40 million rows?
Thanks.........George
------------------------------------------------------------------------- Take Surveys. Earn Cash. Influence the Future of IT Join SourceForge.net's Techsay panel and you'll get the chance to share your opinions on IT & business topics through brief surveys -- and earn cash http://www.techsay.com/default.php?page=join.php&p=sourceforge&CID=DEVDEV _______________________________________________ MonetDB-users mailing list MonetDB-users@lists.sourceforge.net https://lists.sourceforge.net/lists/listinfo/monetdb-users

On Tue, Oct 10, 2006 at 11:27:31AM -0700, George H wrote:
Hello again kind people.
This is an sql self join on TableA which contains a serial primary key column called 'seq'. The select is limited to less than 400 rows of output by 'a1.seq<=400'.
sql->select a1.*, sum(a2.num)/17 from TableA a1, TableA a2 where a1.seq<=400 and a1.seq>=17 and a1.seq<a2.seq+17 group by a1.seq, a1.num order by a1.seq asc;
This select statement results in 384 rows, as expected, with a newly calculated column for sum(a2.num)/17 which yields a running average of 'num' between any given value for 'seq' and 'seq+17'.
This next sql statement is exactly the same as above except the limitation of 384 rows of output imposed by 'a1.seq<=400' has been removed. Note that TableA contained 40 million plus rows.
sql->select a1.*, sum(a2.num)/17 from TableA a1, TableA a2 where and a1.seq>=17 and a1.seq<a2.seq+17 group by a1.seq, a1.num order by a1.seq asc; I guess you want some limitation on the lower and of the band join here.
ie select a1.*, sum(a2.num)/17 from TableA a1, TableA a2 where a1.seq >= 16 -- your window is 17, ie we need to skip the first 16 and a1.seq >= a2.seq and a1.seq < a2.seq+17 etc Not sure this will give a solution to the data explosion as our current mapping will have problems with turn both into bat theta joins, which will be close to the crosstable of 40M*40M rows. This will be an other item on my list to fix... Niels
Error: ERROR: GDKload failed: name=14/1457, ext=buns.priv ERROR: CMDthetajoin: operation failed.
Perhaps there is a better way to accomplish a running aggregate on 40 million rows?
Thanks.........George
------------------------------------------------------------------------- Take Surveys. Earn Cash. Influence the Future of IT Join SourceForge.net's Techsay panel and you'll get the chance to share your opinions on IT & business topics through brief surveys -- and earn cash http://www.techsay.com/default.php?page=join.php&p=sourceforge&CID=DEVDEV _______________________________________________ MonetDB-users mailing list MonetDB-users@lists.sourceforge.net https://lists.sourceforge.net/lists/listinfo/monetdb-users
-- Niels Nes, Centre for Mathematics and Computer Science (CWI) Kruislaan 413, 1098 SJ Amsterdam, The Netherlands room C0.02, phone ++31 20 592-4098, fax ++31 20 592-4312 url: http://www.cwi.nl/~niels e-mail: Niels.Nes@cwi.nl

Sorry Niels I did not correctly report the statement I used. I used the limitation on the lower end of the band just as you wrote it with 'a1.seq>=a2.seq': select a1.*, sum(a2.num)/17 from TableA a1, TableA a2 where a1.seq >= 16 -- your window is 17, ie we need to skip the first 16 and a1.seq >= a2.seq and a1.seq < a2.seq+17 etc result is same error, same data explosion. Thanks for your comment though Best...George Niels Nes wrote:
On Tue, Oct 10, 2006 at 11:27:31AM -0700, George H wrote:
Hello again kind people.
This is an sql self join on TableA which contains a serial primary key column called 'seq'. The select is limited to less than 400 rows of output by 'a1.seq<=400'.
sql->select a1.*, sum(a2.num)/17 from TableA a1, TableA a2 where a1.seq<=400 and a1.seq>=17 and a1.seq<a2.seq+17 group by a1.seq, a1.num order by a1.seq asc;
This select statement results in 384 rows, as expected, with a newly calculated column for sum(a2.num)/17 which yields a running average of 'num' between any given value for 'seq' and 'seq+17'.
This next sql statement is exactly the same as above except the limitation of 384 rows of output imposed by 'a1.seq<=400' has been removed. Note that TableA contained 40 million plus rows.
sql->select a1.*, sum(a2.num)/17 from TableA a1, TableA a2 where and a1.seq>=17 and a1.seq<a2.seq+17 group by a1.seq, a1.num order by a1.seq asc;
I guess you want some limitation on the lower and of the band join here.
ie
select a1.*, sum(a2.num)/17 from TableA a1, TableA a2 where a1.seq >= 16 -- your window is 17, ie we need to skip the first 16 and a1.seq >= a2.seq and a1.seq < a2.seq+17 etc
Not sure this will give a solution to the data explosion as our current mapping will have problems with turn both into bat theta joins, which will be close to the crosstable of 40M*40M rows.
This will be an other item on my list to fix...
Niels
Error: ERROR: GDKload failed: name=14/1457, ext=buns.priv ERROR: CMDthetajoin: operation failed.
Perhaps there is a better way to accomplish a running aggregate on 40 million rows?
Thanks.........George
------------------------------------------------------------------------- Take Surveys. Earn Cash. Influence the Future of IT Join SourceForge.net's Techsay panel and you'll get the chance to share your opinions on IT & business topics through brief surveys -- and earn cash http://www.techsay.com/default.php?page=join.php&p=sourceforge&CID=DEVDEV _______________________________________________ MonetDB-users mailing list MonetDB-users@lists.sourceforge.net https://lists.sourceforge.net/lists/listinfo/monetdb-users

Sorry: The statement that generated the error is inccorectly reported in the initial post. The statement was actually: sql->select a1.*, sum(a2.num)/17 from TableA a1, TableA a2 where and a1.seq>=17 and a1.seq>=a2.seq and a1.seq<a2.seq+17 group by a1.seq, a1.num order by a1.seq asc; Error: ERROR: GDKload failed: name=14/1457, ext=buns.priv ERROR: CMDthetajoin: operation failed. George George H wrote:
Hello again kind people.
This is an sql self join on TableA which contains a serial primary key column called 'seq'. The select is limited to less than 400 rows of output by 'a1.seq<=400'.
sql->select a1.*, sum(a2.num)/17 from TableA a1, TableA a2 where a1.seq<=400 and a1.seq>=17 and a1.seq<a2.seq+17 group by a1.seq, a1.num order by a1.seq asc;
This select statement results in 384 rows, as expected, with a newly calculated column for sum(a2.num)/17 which yields a running average of 'num' between any given value for 'seq' and 'seq+17'.
This next sql statement is exactly the same as above except the limitation of 384 rows of output imposed by 'a1.seq<=400' has been removed. Note that TableA contained 40 million plus rows.
sql->select a1.*, sum(a2.num)/17 from TableA a1, TableA a2 where and a1.seq>=17 and a1.seq<a2.seq+17 group by a1.seq, a1.num order by a1.seq asc;
Error: ERROR: GDKload failed: name=14/1457, ext=buns.priv ERROR: CMDthetajoin: operation failed.
Perhaps there is a better way to accomplish a running aggregate on 40 million rows?
Thanks.........George
participants (3)
-
George H
-
Martin Kersten
-
Niels Nes