Re: Join optimization

2 Nov 2015

      Thank you Stefan,

So as I understand that the algebra.(left)fetchjoin() is the most optimized
way in monetdb.

I was hoping in my case to gain some precious seconds by trying to instruct
monetdb to use other algorithm,
I have a small table around 50k rows (that should fit easily in memory)
that need to be joined with a big table of 12M rows, the join results
is around 150M rows.
The join key is a char(21) column which is the first column of a primary
key defined on both tables (I think this why its reflected in the
algebra.(left)fetchjoin()).
On the join I am doing an aggregate on a combination of columns from both
tables.

The execution time is around 80s, returning 250 rows, similar queries on
different columns needs to be executed around 50 times.
I am trying to avoid to materialize the join results in a table and then
execute the aggregation on the materialized table, the reason is that it
will consume a 35GB of disk space.

Is it hard to achieve a better performance by relying only on the join?

Another performance issue i am facing is with range join on date columns,
does monetdb has a specific method/index for this kind of join?

Regards.

On Mon, Nov 2, 2015 at 7:48 PM, Stefan Manegold <Stefan.Manegold@cwi.nl>
wrote:
...
Hi,
whenever algebra.(left)fetchjoin() is used, you surely do not want to use
hash-join or merge-join, instead.
fetch-join is MonetDB lingo for the binary match on OIDs that can exploit
positional lookups because (at least)
one of the arguments is a dense (ascending) (non-materialized) sequence of
OIDs (aka., "Virtual OIDs" or "VOID").
This implicitly performs a (very efficient) merge whenever also the second
argument is sorted,
and otherwise implicitly a very efficient hash using identity as a perfect
hash function (no collisions, no false positives).
This operation does not come from / represent a table-join on SQL level,
but rather the projection (aka.
column reconstruction) required in (late-materialization) columnar query
execution.
Having said that, in "real" join cases, i.e., when algebra.join() is used
on value (rather than OID) columns,
the actual join algorithm is chosen on-the-fly in side the algebra.join()
MAL operation, and thus does not
show in the MAL plan.
You can see (some of) the internal algorithm decisions by starting
mserver5 with the --algorithm command line switch
(see the mserver5 man page for details).
There is no way to "overrule" the runtime algorithm decisions made inside
MonetDB's kernel (GDK).
Best,
Stefan
----- On Nov 2, 2015, at 5:37 PM, imad hajj chahine
imad.hajj.chahine@gmail.com wrote:
...
Hi,
How can we control the algorithm used for the join operation? In the
trace i am
always getting algebra.leftfetchjoin, what could possibly be done to use
merge
join or hash join that will be reflected in the trace by algebra.join.
Thanks.
_______________________________________________
users-list mailing list
users-list@monetdb.org
https://www.monetdb.org/mailman/listinfo/users-list
--
| Stefan.Manegold@CWI.nl | DB Architectures   (DA) |
| www.CWI.nl/~manegold/  | Science Park 123 (L321) |
| +31 (0)20 592-4212     | 1098 XG Amsterdam  (NL) |
_______________________________________________
users-list mailing list
users-list@monetdb.org
https://www.monetdb.org/mailman/listinfo/users-list