Here's some additional observations on this puzzling issue:
The slow table is "cmdb", the normal table is "cmdb2". Both have the same
number of rows according to "SELECT count(*)" (73553310). However,
sys.statistics shows something a bit different:
sql>SELECT x.name, count(*) AS columns_with, z.count FROM sys.tables AS x,
sys.columns AS y, sys.statistics AS z WHERE x.id = y.table_id AND y.id =
z.column_id GROUP BY x.name, z.count;
+-------+--------------+----------+
| name | columns_with | count |
+=======+==============+==========+
| cmdb | 8 | 73553310 |
| cmdb | 145 | 73553311 |
| cmdb2 | 153 | 73553310 |
+-------+--------------+----------+
3 tuples
sql>
For the slow table, "cmdb", there are 145/153 columns that have an extra
"row". This row count mismatch causes it to skip parallelization
optimizations in "monetdb5/optimizer/opt_mitosis.c", line 212.
A few related questions:
1) In the columnar model, is is valid for conceptual tuples to exist which
don't have values for each column like this? (This seems strange as now
there's a difference between an explicit NULL and "not existing"). How
might this situation have happened? (Only transactional full-row inserts
were applied to this DB). Can this be repaired?
2) Is the conditional in "monetdb5/optimizer/opt_mitosis.c", line 212
sufficiently robust? I may be misunderstanding the full context of this
line, but it seems like we could still benefit from
splitting/parallelization when r < rowcnt, no?
Thanks,
-Jeremy Norris