
On Thu, Jul 04, 2013 at 02:41:38PM +0200, Franck Routier wrote:
Hi all,
I am trying to copy a table from a PostgreSQL database into MonetDB, but it fails, with MonetDB filling the disks. I am using the latest version of MonetDB (Feb2013-SP3) on Ubuntu Linux, with 32 GB RAM and SSD disks.
To copy into MonetDB, I have a Java program that runs mclient as:
mclient --language=sql --database=database --host=host --user=user /tmp/command.ctl
with command.ctl :
COPY nnn RECORDS INTO table FROM STDIN USING DELIMITERS '|', '\n', '\' NULL AS ''
Then the PostgreSQL result set is read via JDBC, formatted, and written to STDIN. So far so good: it works with my (small) dimension tables, and with a 295 million row fact table that has 109 columns. This fact table takes 86 GB on disk in PostgreSQL (plus indices).
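For reference, a minimal sketch of the kind of loader described above (all connection details and table names are placeholders; for simplicity the COPY statement itself is written to mclient's stdin, followed directly by the data rows, rather than being kept in a separate .ctl file):

import java.io.BufferedWriter;
import java.io.OutputStreamWriter;
import java.nio.charset.StandardCharsets;
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.ResultSet;
import java.sql.Statement;

public class CopyLoader {
    public static void main(String[] args) throws Exception {
        Process mclient = new ProcessBuilder(
                "mclient", "--language=sql", "--database=database",
                "--host=host", "--user=user").start();

        try (BufferedWriter out = new BufferedWriter(new OutputStreamWriter(
                     mclient.getOutputStream(), StandardCharsets.UTF_8));
             Connection pg = DriverManager.getConnection(
                     "jdbc:postgresql://pghost/pgdb", "pguser", "pgpass")) {
            // stream the result set instead of buffering all rows in memory:
            // the PostgreSQL JDBC driver only uses a cursor when autocommit
            // is off and a fetch size is set
            pg.setAutoCommit(false);
            try (Statement st = pg.createStatement()) {
                st.setFetchSize(10000);
                try (ResultSet rs = st.executeQuery("SELECT * FROM fact")) {
                    int cols = rs.getMetaData().getColumnCount();
                    out.write("COPY 184000000 RECORDS INTO fact FROM STDIN"
                            + " USING DELIMITERS '|', '\\n' NULL AS '';\n");
                    while (rs.next()) {
                        StringBuilder row = new StringBuilder();
                        for (int i = 1; i <= cols; i++) {
                            if (i > 1) row.append('|');
                            // getString() returns null for SQL NULL,
                            // which maps to an empty field (NULL AS '');
                            // real data may need escaping of '|' and '\'
                            String v = rs.getString(i);
                            if (v != null) row.append(v);
                        }
                        out.write(row.append('\n').toString());
                    }
                }
            }
        }
        mclient.waitFor();
    }
}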
I have another form of the fact table, with fewer rows (184 million) but more columns (505). While the first form of the table has a measureId column and only one value column, the other form has one value column per measure. (It is a "sum(case when measureId=X then value else null ... group by" version of the first table.)
Loading this "columnar" table fails: MonetDB fills the disks (about 500 GB were free before the load started) and quits (I am not sure what exactly happens on the server, but I get a broken-pipe error in Java, and the MonetDB database is lost).
Please check what the monetdbd / merovingian log file says.
The table size on disk in PostgreSQL is 37 GB. With VectorWise the table loads and takes only 7 GB on disk, but the loading process (very similar to what we do with MonetDB) takes a very long time (25 hours), much longer than for the 109-column table.
I am trying to make sense of this and understand why this table gives MonetDB (and to some extent VectorWise as well) such a (very) hard time. Does anyone have an explanation?
Can you share the schemas (column types) of both tables? For each type used, the number of columns of that type would be sufficient. If there are no string columns at all, that would be enough information for us to estimate the expected storage size. E.g., if all columns were 4-byte integers, your first fact table would take 295 M * 109 * 4 B ~= 128 GB, while your second table would take 184 M * 505 * 4 B ~= 372 GB (both without constraints & indexes).

Other than duplicate elimination (dictionary encoding) for string columns, MonetDB does not yet use compressed storage. We do have good experiences with compressed file systems, though.

Do you have constraints and/or keys defined on your tables prior to bulk loading? If so, try bulk loading without constraints/keys defined, and add them afterwards via ALTER TABLE statements.

Best, Stefan
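In code, that suggestion could look like this (a minimal sketch; the table, column, and connection names are hypothetical, and the ALTER TABLE statements are issued through MonetDB's JDBC driver rather than mclient):

import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.Statement;

public class AddKeysAfterLoad {
    public static void main(String[] args) throws Exception {
        Class.forName("nl.cwi.monetdb.jdbc.MonetDriver");
        try (Connection mdb = DriverManager.getConnection(
                     "jdbc:monetdb://host:50000/database", "user", "pass");
             Statement st = mdb.createStatement()) {
            // run these only after COPY INTO has finished, so the bulk load
            // itself does not pay for constraint checking
            st.execute("ALTER TABLE fact ADD CONSTRAINT fact_pk"
                    + " PRIMARY KEY (day_id, store_id)");
            st.execute("ALTER TABLE fact ADD CONSTRAINT fact_day_fk"
                    + " FOREIGN KEY (day_id) REFERENCES day_dim (day_id)");
        }
    }
}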
thanks, Franck
--
| Stefan.Manegold@CWI.nl | DB Architectures (DA)   |
| www.CWI.nl/~manegold   | Science Park 123 (L321) |
| +31 (0)20 592-4212     | 1098 XG Amsterdam (NL)  |