
Thank you Stefan, So as I understand that the algebra.(left)fetchjoin() is the most optimized way in monetdb. I was hoping in my case to gain some precious seconds by trying to instruct monetdb to use other algorithm, I have a small table around 50k rows (that should fit easily in memory) that need to be joined with a big table of 12M rows, the join results is around 150M rows. The join key is a char(21) column which is the first column of a primary key defined on both tables (I think this why its reflected in the algebra.(left)fetchjoin()). On the join I am doing an aggregate on a combination of columns from both tables. The execution time is around 80s, returning 250 rows, similar queries on different columns needs to be executed around 50 times. I am trying to avoid to materialize the join results in a table and then execute the aggregation on the materialized table, the reason is that it will consume a 35GB of disk space. Is it hard to achieve a better performance by relying only on the join? Another performance issue i am facing is with range join on date columns, does monetdb has a specific method/index for this kind of join? Regards. On Mon, Nov 2, 2015 at 7:48 PM, Stefan Manegold <Stefan.Manegold@cwi.nl> wrote:
Hi,
whenever algebra.(left)fetchjoin() is used, you surely do not want to use hash-join or merge-join, instead. fetch-join is MonetDB lingo for the binary match on OIDs that can exploit positional lookups because (at least) one of the arguments is a dense (ascending) (non-materialized) sequence of OIDs (aka., "Virtual OIDs" or "VOID"). This implicitly performs a (very efficient) merge whenever also the second argument is sorted, and otherwise implicitly a very efficient hash using identity as a perfect hash function (no collisions, no false positives). This operation does not come from / represent a table-join on SQL level, but rather the projection (aka. column reconstruction) required in (late-materialization) columnar query execution.
Having said that, in "real" join cases, i.e., when algebra.join() is used on value (rather than OID) columns, the actual join algorithm is chosen on-the-fly in side the algebra.join() MAL operation, and thus does not show in the MAL plan. You can see (some of) the internal algorithm decisions by starting mserver5 with the --algorithm command line switch (see the mserver5 man page for details).
There is no way to "overrule" the runtime algorithm decisions made inside MonetDB's kernel (GDK).
Best, Stefan
----- On Nov 2, 2015, at 5:37 PM, imad hajj chahine imad.hajj.chahine@gmail.com wrote:
Hi,
How can we control the algorithm used for the join operation? In the trace i am always getting algebra.leftfetchjoin, what could possibly be done to use merge join or hash join that will be reflected in the trace by algebra.join.
Thanks.
_______________________________________________ users-list mailing list users-list@monetdb.org https://www.monetdb.org/mailman/listinfo/users-list
-- | Stefan.Manegold@CWI.nl | DB Architectures (DA) | | www.CWI.nl/~manegold/ | Science Park 123 (L321) | | +31 (0)20 592-4212 | 1098 XG Amsterdam (NL) | _______________________________________________ users-list mailing list users-list@monetdb.org https://www.monetdb.org/mailman/listinfo/users-list