
Le 04/07/2013 15:12, Stefan Manegold a écrit :
Please check what the monetdbd / merovingian log file says. Hum... /var/log/monetdb is empty. And I can't locate any monet* log file.
It's called merovingian.log and is located in your dbfarm.
The table size on disk for Postgresql is 37 GB. Using VectorWise, the table loads, and take only 7 GB on disks, but the loading process (very similar to what we do with MonetDB) takes a very long time( 25 hours), much longer than what happens with the 109 columns tables.
I am trying to make sense of that and understand why this table is giving a (very) hard time to monetdb (and somehow VectorWise as well). Does anyone have an explanation ? Can you share the schemas (column types) of both tables? Yes, sure.
The first table is called dwhinv and has :
89 columns of type varchar(32). 2 timestamps (not null) 2 decimal (at least one of them is set on each line) 19 integers
The second table is called dwhcol and has :
83 columns of type varchar(32) 2 timestamps 420 columns of type decimal
integers are 4 byte. timestamps are IMHO also 4 byte I'm not sure what we use for decimals without digit specification; I assume we then use the largest, i.e., 8 byte. strings highly depend on the actual data (due to (best effort) duplicate elimination and dictionary encoding).
For each type used the number of column with that type might be sufficient. If there are no string columns at all, that would be sufficient info for us to estimate the expected storage size.
Other than duplicate elimination (dictionary encoding) for string columns, MonetDB does not yet use compressed storage. We do have good experiences with compressed file systems, though. What about null values ? I was under the impression that null values where not stored in monetdb (just non existant entries in the BATs) ?
NULL values are stored. For fixed width columns (e.g., all numbers), they take as much storage as any value of that type. For variable length types (e.g., strings, (both char & varchar)), they take 1 to 8 byte, depending on the number of distinct values in that column. Stefan
Because each decimal column has at most 1 million non null values, and many have a few thousands... Doesn't it matter ?
Do you have constraints and/or keys defined on your tables prior to bulk loading?
No, there are no foreign keys, nor constraints of any sort.
Franck
_______________________________________________ users-list mailing list users-list@monetdb.org http://mail.monetdb.org/mailman/listinfo/users-list
-- | Stefan.Manegold@CWI.nl | DB Architectures (DA) | | www.CWI.nl/~manegold/ | Science Park 123 (L321) | | +31 (0)20 592-4212 | 1098 XG Amsterdam (NL) |