
On Thu, Jul 04, 2013 at 02:41:38PM +0200, Franck Routier wrote:
Hi all,
I am trying to copy a table from a PostgreSQL database into MonetDB, but it fails, with MonetDB filling the disks. I am using the latest version of MonetDB (Feb2013-SP3) on Ubuntu Linux, with 32 GB RAM and SSD disks.
To copy into MonetDB, I have a Java program that runs mclient as:
mclient --language=sql --database=database --host=host --user=user /tmp/command.ctl
with command.ctl :
COPY nnn RECORDS INTO table FROM STDIN USING DELIMITERS '|', '\n', '\' NULL AS ''
Then the PostgreSQL result set is read via JDBC, formatted, and written to STDIN. So far so good: it works with my (small) dimension tables, and with a 295 million row fact table that has 109 columns. This fact table takes 86 GB on disk in PostgreSQL (plus indices).
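For reference, a minimal sketch of the kind of loader described above (all connection details and table names are placeholders; for simplicity the COPY statement itself is written to mclient's stdin, followed directly by the data rows, rather than being kept in a separate .ctl file):

import java.io.BufferedWriter;
import java.io.OutputStreamWriter;
import java.nio.charset.StandardCharsets;
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.ResultSet;
import java.sql.Statement;

public class CopyLoader {
    public static void main(String[] args) throws Exception {
        Process mclient = new ProcessBuilder(
                "mclient", "--language=sql", "--database=database",
                "--host=host", "--user=user").start();

        try (BufferedWriter out = new BufferedWriter(new OutputStreamWriter(
                     mclient.getOutputStream(), StandardCharsets.UTF_8));
             Connection pg = DriverManager.getConnection(
                     "jdbc:postgresql://pghost/pgdb", "pguser", "pgpass")) {
            // stream the result set instead of buffering all rows in memory:
            // the PostgreSQL JDBC driver only uses a cursor when autocommit
            // is off and a fetch size is set
            pg.setAutoCommit(false);
            try (Statement st = pg.createStatement()) {
                st.setFetchSize(10000);
                try (ResultSet rs = st.executeQuery("SELECT * FROM fact")) {
                    int cols = rs.getMetaData().getColumnCount();
                    out.write("COPY 184000000 RECORDS INTO fact FROM STDIN"
                            + " USING DELIMITERS '|', '\\n' NULL AS '';\n");
                    while (rs.next()) {
                        StringBuilder row = new StringBuilder();
                        for (int i = 1; i <= cols; i++) {
                            if (i > 1) row.append('|');
                            // getString() returns null for SQL NULL,
                            // which maps to an empty field (NULL AS '');
                            // real data may need escaping of '|' and '\'
                            String v = rs.getString(i);
                            if (v != null) row.append(v);
                        }
                        out.write(row.append('\n').toString());
                    }
                }
            }
        }
        mclient.waitFor();
    }
}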
I have another form of the fact table, with fewer rows (184 million) but more columns (505). While the first form of the table has a measureId column and only one value column, the other form has one value column per measure. (It is a "sum(case when measureId=X then value else null ... group by" version of the first table.)
Loading this "columnar" table fails: MonetDB fills the disks (about 500 GB were free before the load started) and quits (I am not sure what exactly happens on the server, but I get a broken-pipe error in Java, and the MonetDB database is lost).
Please check what the monetdbd / merovingian log file says.
The table size on disk in PostgreSQL is 37 GB. With VectorWise the table loads and takes only 7 GB on disk, but the loading process (very similar to what we do with MonetDB) takes a very long time (25 hours), much longer than for the 109-column table.
I am trying to make sense of this and understand why this table gives MonetDB (and to some extent VectorWise as well) such a (very) hard time. Does anyone have an explanation?
Can you share the schemas (column types) of both tables? For each type used, the number of columns of that type would be sufficient. If there are no string columns at all, that would be enough information for us to estimate the expected storage size. E.g., if all columns were 4-byte integers, your first fact table would take 295 M * 109 * 4 B ~= 128 GB, while your second table would take 184 M * 505 * 4 B ~= 372 GB (both without constraints & indexes).

Other than duplicate elimination (dictionary encoding) for string columns, MonetDB does not yet use compressed storage. We do have good experiences with compressed file systems, though.

Do you have constraints and/or keys defined on your tables prior to bulk loading? If so, try bulk loading without constraints/keys defined, and add them afterwards via ALTER TABLE statements.

Best, Stefan
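In code, that suggestion could look like this (a minimal sketch; the table, column, and connection names are hypothetical, and the ALTER TABLE statements are issued through MonetDB's JDBC driver rather than mclient):

import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.Statement;

public class AddKeysAfterLoad {
    public static void main(String[] args) throws Exception {
        Class.forName("nl.cwi.monetdb.jdbc.MonetDriver");
        try (Connection mdb = DriverManager.getConnection(
                     "jdbc:monetdb://host:50000/database", "user", "pass");
             Statement st = mdb.createStatement()) {
            // run these only after COPY INTO has finished, so the bulk load
            // itself does not pay for constraint checking
            st.execute("ALTER TABLE fact ADD CONSTRAINT fact_pk"
                    + " PRIMARY KEY (day_id, store_id)");
            st.execute("ALTER TABLE fact ADD CONSTRAINT fact_day_fk"
                    + " FOREIGN KEY (day_id) REFERENCES day_dim (day_id)");
        }
    }
}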
thanks, Franck
--
| Stefan.Manegold@CWI.nl | DB Architectures (DA)   |
| www.CWI.nl/~manegold   | Science Park 123 (L321) |
| +31 (0)20 592-4212     | 1098 XG Amsterdam (NL)  |