
I've been tinkering a little more: if I use the minimal_pipe, no_mitosis_pipe or sequential_pipe optimizer, the error no longer occurs. I've specified the set of optimizers to use, and it seems that the problematic optimizer step is the *mitosis* step. Where can I learn more about this optimizer? Can someone shed some light?
Thanks!
-- Miguel
On 08/13/2013 03:59 PM, Miguel Ping wrote:
I'm trying to come up with a small sample, but it is hard.
- I currently have a small dataset (~600 rows) which gives me the error.
- BUT if I export that data onto a *new* table, the error doesn't happen.
This makes it hard to provide a proper sample.
Running EXPLAIN on the same query for these two tables, the plan is different: the plan for the table that triggers the error seems to use the new subgroup feature for aggregates, which was released in a recent MonetDB version (11.15.x?).
* Does MonetDB keep some sort of internal table statistics that feed the MAL planner?
* Can I force the query to use the 'old' aggregate function?
* Can you guys point me to the part where the group BAT is calculated?
My guess is that the group ids are not being calculated correctly, hence the difference between g->U->count and b->U->count. I'm guessing the culprit is around here:
...
| (X_16,r1_34,X_140) := group.subgroupdone(X_15);
| X_18 := algebra.leftfetchjoin(r1_34,X_15);
...
| X_27:bat[:oid,:str] := batudf.subhllagg(X_26,X_16,r1_34,true); #r1_r4 should be the bat* gid
I have tried the same dataset with MonetDB 11.13.7 (which had the 'old' aggregate definition) and the query works as expected. I don't want to do a downgrade since I don't even know if the data files are compatible.
Thanks, Miguel
On 08/13/2013 11:35 AM, Martin Kersten wrote:
If it is hitting before your code, then please provide the smallest (SQL) test case to reproduce it locally.
Thanks, Martin
On 8/13/13 11:27 AM, Miguel Ping wrote:
The query calls a custom aggregate function, but the error occurs before hitting my code; the code path just starts to prepare things (it's just the boilerplate to run a custom aggregate function), and it hits the error while calling BATgroupaggrinit. In fact, BATgroupaggrinit is the very first thing that the BAThllaggr function calls. Also, according to Sjoerd, it's a bug:
"If this happens when running a SQL query, it's a bug. I don't think NULLs have anything to do with it. NULL values are stored in-line. You might want to look at b->U->count, g->U->count, b->H->seq, g->H->seq, b->H->dense, g->H->dense when the misalignment happens (either in the debugger or by using printf--but realize that count and seq are not int, so %d is not going to work). Also things like the MAL plan (prepend SQL query with EXPLAIN) and the stack trace might be useful."
Thanks.
On 08/13/2013 08:51 AM, Stefan Manegold wrote:
Dear Miguel,
I am not aware of any "hllaggr()" function in the MonetDB release, so I assume the error occurs in your code. Not knowing your code at all, I'm afraid we cannot be of much help.
Best, Stefan
On Mon, Aug 12, 2013 at 06:59:33PM +0100, Miguel Ping wrote:
Some more info:
SELECT count(*) FROM wa_sapo_pt_audience.kpi_2013_07 WHERE ts>=1373410800000 AND ts<=1374706799000; ==> 764314
SELECT count(distinct(ts_day)) FROM wa_sapo_pt_audience.kpi_2013_07 WHERE ts>=1373410800000 AND ts<=1374706799000; ==> 15
it seems that b.count is the row count, while g.count is something like the distinct count, with 2 more values?
On 08/12/2013 05:50 PM, Miguel Ping wrote:
Hi, I'm resurrecting this since I've been out of town and only today I got a chance to investigate further. I've recompiled with -O0 to prevent optimizations from "hiding" the values, and in the debugger I got this:
b->U->count   BUN            764314
g->U->count   BUN            17
b->H->seq     oid            0
g->H->seq     oid            0
b->H->dense   unsigned int   1
g->H->dense   unsigned int   1
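The alignment rule that Sjoerd describes further down in this thread (dense heads, same length, same hseqbase, except when both BATs are empty) can be applied to the values above. What follows is only a minimal sketch in C, not the actual check in the GDK sources: the bat_view struct and the bats_aligned function are hypothetical names made up for illustration, and the real BAT structure has many more fields.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for the few BAT fields inspected in the debugger.
 * This is NOT the real MonetDB BAT struct. */
typedef struct {
    size_t count;     /* corresponds to b->U->count / g->U->count */
    size_t hseqbase;  /* corresponds to b->H->seq   / g->H->seq   */
    bool   hdense;    /* corresponds to b->H->dense / g->H->dense */
} bat_view;

/* Returns true when b and g are aligned in the sense described in the
 * thread: both heads dense, same length, same head sequence base.
 * Two empty BATs are treated as aligned regardless of their seqbase. */
static bool bats_aligned(const bat_view *b, const bat_view *g)
{
    if (b->count == 0 && g->count == 0)
        return true;
    return b->hdense && g->hdense &&
           b->count == g->count &&
           b->hseqbase == g->hseqbase;
}
```

With b = {764314, 0, true} and g = {17, 0, true}, as in the debugger output, this check fails on the count comparison alone: the sequence bases and the dense flags already match.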
The call stack is as follows:
BAThllaggr() at udf.c
AGGRsubhllaggcand() at udf.c
AGGRsubhllagg() at udf.c
malCommandCall() at mal_interpreter.c
runMALsequence() at mal_interpreter.c
DFLOWworker() at mal_dataflow.c
start_thread() at pthread_create.c
clone() at clone.S
0x0
-------- Original Message --------
Subject: Re: b and g must be aligned
Date: Fri, 26 Jul 2013 11:34:53 +0100
From: Sjoerd Mullender <sjoerd@acm.org>
Reply-To: Communication channel for MonetDB users <users-list@monetdb.org>
To: Communication channel for MonetDB users <users-list@monetdb.org>
On 2013-07-26 12:23, Miguel Ping wrote:
> On 07/25/2013 01:51 PM, Sjoerd Mullender wrote:
>> On 2013-07-25 14:04, Miguel Ping wrote:
>>> Hi all,
>>>
>>> We're hitting this error "b and g must be aligned". I tracked the src to a commit about some alignment code thing in gdk_calc:
>>> http://www.mail-archive.com/checkin-list@monetdb.org/msg09731.html
>>> (Fix alignment conversion in compatibility code for grouped aggregates.)
>>>
>>> Can you guys please explain what's the reason behind this error? I can't understand by just looking to the src of gdk_calc.c
>>>
>>> Thanks!
>>
>> When using grouped aggregates, the grouping bat must be aligned with the value bat. The value bat is b and contains the values you want to aggregate. The group bat is g and contains for each value in b the group (an oid) it belongs to. Equal group ids means the same group. These bats must be aligned, because we need to know for each value in b to which group it belongs. Aligned means: same length, and same head column values. The head columns must be dense (a sequence of numbers starting at some value, and each next value exactly one larger than the previous). Dense sequences are usually not stored explicitly in MonetDB. We only store the first value in the hseqbase field. So the hseqbase fields of b and g must be equal. The one exception to this is when the bats are both empty. This last exception is the change to gdk_calc.c in that changeset.
>>
>> -- Sjoerd Mullender
>> _______________________________________________
>> users-list mailing list
>> users-list@monetdb.org
>> http://mail.monetdb.org/mailman/listinfo/users-list
>
> Thanks for your explanation. I still don't understand how can there be a misalignment. I would expect MonetDB to feed my hll aggregate functions with the correct values. Can it be that I may have some NULL values and the validation is failing because of that?
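The relationship between b and g described in the quoted explanation can be illustrated with a small grouped sum. This is a sketch under simplifying assumptions, not MonetDB code: plain C arrays stand in for the two BATs, array positions play the role of the dense head, and grouped_sum is a hypothetical function name. It shows why the group array needs exactly one entry per value.

```c
#include <stddef.h>

/* Grouped sum over plain arrays (illustration only, not the GDK API).
 * vals[i] is the i-th value to aggregate (the role of b) and gids[i]
 * is the group that value belongs to (the role of g).
 * Returns 0 on success, -1 if the two arrays differ in length or a
 * group id is out of range. */
static int grouped_sum(const long *vals, size_t nvals,
                       const size_t *gids, size_t ngids,
                       long *sums, size_t ngroups)
{
    size_t i;

    /* Without one group id per value there is no way to tell which
     * group a value belongs to, so refuse to aggregate. */
    if (nvals != ngids)
        return -1;

    for (i = 0; i < ngroups; i++)
        sums[i] = 0;

    for (i = 0; i < nvals; i++) {
        if (gids[i] >= ngroups)
            return -1;
        sums[gids[i]] += vals[i];
    }
    return 0;
}
```

For example, values {10, 20, 30, 40} with group ids {0, 1, 0, 1} give the sums 40 and 60, while passing only three group ids for four values is rejected.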
If this happens when running a SQL query, it's a bug. I don't think NULLs have anything to do with it. NULL values are stored in-line. You might want to look at b->U->count, g->U->count, b->H->seq, g->H->seq, b->H->dense, g->H->dense when the misalignment happens (either in the debugger or by using printf--but realize that count and seq are not int, so %d is not going to work). Also things like the MAL plan (prepend SQL query with EXPLAIN) and the stack trace might be useful.
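The remark about %d can be sketched as follows. This assumes, purely for illustration, that the count and seq values are size_t-wide; demo_bun, demo_oid and format_bat_info are hypothetical names, and the real BUN and oid typedefs (and whatever format macros your MonetDB version provides) should be looked up in gdk.h.

```c
#include <stddef.h>
#include <stdio.h>

/* Hypothetical typedefs for illustration only. The widths of the real
 * BUN and oid types depend on the MonetDB build. */
typedef size_t demo_bun;
typedef size_t demo_oid;

/* Formats the fields mentioned in the thread into buf. %zu matches
 * size_t; using %d here would be undefined behaviour on platforms
 * where size_t is wider than int. Returns what snprintf returns. */
static int format_bat_info(char *buf, size_t buflen, const char *name,
                           demo_bun count, demo_oid seq,
                           unsigned int dense)
{
    return snprintf(buf, buflen, "%s: count=%zu seq=%zu dense=%u",
                    name, count, seq, dense);
}
```

Calling format_bat_info(buf, sizeof buf, "b", 764314, 0, 1) fills buf with "b: count=764314 seq=0 dense=1".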
-- Sjoerd Mullender