
Hi Arjen,
Thanks for this detailed report. For completeness, please include the MonetDB/SQL version information and the platform you are working on (Win/Linux, HW).
At first sight, I don't see any suspicious code. This suggests that some property in the kernel might not be set correctly, or that a rewrite in the SQL front-end went astray. To find out, we have to rerun the test, possibly with debugging enabled.
If you stumble upon such unexpected cases, 'trace select ...' is often helpful, because it gives the execution time for each instruction and the size of the intermediates. To assess whether a property is not set correctly, --algorithms and --xproperties may provide good clues (for the experts ;-))
Regards,
Martin
Arjen van der Meijden wrote:
Hi list,
I was just running some basic queries on MonetDB5/SQL to see whether it is suitable for my application. I'm doing lots of aggregates on some logfile abstractions. Basically they all boil down to 'how many unique visitors and total pageviews were there in period X-Y in section Z'.
I have this table:
pageviews (
    timestamp timestamp not null,
    clientip varchar(15) not null,
    sectionid smallint not null,
    itemid integer not null,
    channelid smallint default 0
)
Currently it only contains data for last September, with about 2M records/day and 5.6M in total.
There are no additional indexes in this case.
When doing a query like this, MonetDB is very fast. Once the data is in the memory cache, it returns (according to trace) in about half a second.
select count(*) from pageviews where timestamp between '2007-09-21 00:00:00' and '2007-09-22 00:00:00';
The result is 1916813
This one is also pretty fast, taking about 1.7 seconds:
select count(distinct clientip) from pageviews where timestamp between '2007-09-21 00:00:00' and '2007-09-22 00:00:00';
The result is 165700
And the third, which is also pretty fast:
select channelid, count(*) from pageviews where timestamp between '2007-09-21 00:00:00' and '2007-09-22 00:00:00' group by channelid;
Here's the distribution; it's returned in about 0.6 seconds:
[ 0, 538187 ]
[ 1, 1108478 ]
[ 4, 42867 ]
[ 3, 145565 ]
[ 2, 81716 ]
But when I combine those last two queries, the result isn't returned in a reasonable amount of time. I waited for more than half an hour and it still hadn't returned.
select channelid, count(distinct clientip), count(*) from pageviews where timestamp between '2007-09-21 00:00:00' and '2007-09-22 00:00:00' group by channelid;
I'm not very good at reading your explain output yet, so I've attached the resulting explain for that query.
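For reference, a plan like the attached one is obtained by prefixing the statement with explain, and the 'trace select ...' form suggested in the reply is used the same way. A minimal sketch, assuming the MonetDB5/SQL client accepts both keywords as plain statement prefixes (output format not shown, since it depends on the version):

```sql
-- Print the plan generated for the slow query.
explain select channelid, count(distinct clientip), count(*)
from pageviews
where timestamp between '2007-09-21 00:00:00' and '2007-09-22 00:00:00'
group by channelid;

-- Per the reply: reports the execution time of each instruction
-- and the size of the intermediates.
trace select channelid, count(distinct clientip), count(*)
from pageviews
where timestamp between '2007-09-21 00:00:00' and '2007-09-22 00:00:00'
group by channelid;
```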
Is there a way to speed up this type of query? It seems a bit odd that it's taking more than half an hour (PostgreSQL does it in about 20 seconds) while the other queries return much faster (PostgreSQL does them in about 14 seconds).
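One rewrite that might be worth trying, given here only as an untested sketch and not as a known MonetDB fix: replace count(distinct ...) with two levels of grouping. The inner query produces one row per (channelid, clientip) pair, so counting those rows per channel gives the unique visitors, and summing the per-pair counts gives the total pageviews. The names per_client, hits, unique_visitors and total_pageviews are made up for illustration, and it is an assumption that MonetDB5/SQL accepts a subquery in the from clause and plans it any better.

```sql
-- Sketch only: equivalent in standard SQL because clientip is "not null",
-- so every inner row corresponds to exactly one distinct client.
select channelid,
       count(*)  as unique_visitors,   -- one inner row per distinct clientip
       sum(hits) as total_pageviews    -- adds the per-client counts back up
from (
    select channelid, clientip, count(*) as hits
    from pageviews
    where timestamp between '2007-09-21 00:00:00' and '2007-09-22 00:00:00'
    group by channelid, clientip
) as per_client
group by channelid;
```

Whether this avoids the slow path depends on what the plan for the original query is actually doing, so comparing the two with trace would be the way to check.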
Best regards,
Arjen
_______________________________________________ MonetDB-users mailing list MonetDB-users@lists.sourceforge.net https://lists.sourceforge.net/lists/listinfo/monetdb-users