
Hello,

Could you tell me the limit on the number of columns in a relation, and the main reason for that limit? So far I have been able to create relations with up to 15,000 columns; when the number of columns is larger, mserver5 stops responding.

Thank you and best regards,
Oksana Dolmatova

Hi Oksana,

As far as I know, there is no explicit limit on the number of columns you can put in a relation (did you mean "table"?). The reason that mserver5 stopped responding is probably that it is busy with I/O.

A long time ago, I once tried to create 1M tables, each with 3 columns. After about a day, MonetDB had created ~400K tables, and I could hear my HDD being constantly busy. MonetDB was still working, but that was already sufficient evidence that the idea wasn't going to work.

So the real question is: what do you want to achieve?

Regards,
Jennie

I'll add two observations to what Ying Zhang said.

1. Internally in MonetDB, the index of a column (or a BAT) in MonetDB's BBP (BAT Buffer Pool) is a signed 32-bit integer (on most platforms), so there isn't a theoretical problem with having more than 15,000 columns; but MonetDB was not designed with that many columns in mind. The "long dimension" is expected to be the length of the columns.

2. When you have as many as 15,000 columns, you're not simply recording more attributes of the same items. For example, a person has a first name, last name, year of birth, and so on - but after several dozen of these you're probably adding columns such as "data relevant to this person at point X", "data relevant to this person at point Y", and so on. In your case, I'm guessing you have a lot of points and hence a lot of columns with data for your different items. What you should consider doing instead is adding an extra "coordinate" column. In the example I gave, I would drop most of the columns in favor of a second table named "person_data", whose columns would be "person_id" (or some other key columns), "sample_point" and "value". Instead of M columns and N records you would then have N x M records with 3 columns, as sketched below.
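To make that concrete, here is a minimal sketch of such a long-format table (the table and column names, and the DOUBLE value type, are only illustrative, not anything MonetDB prescribes):

    CREATE TABLE person_data (
        person_id    INTEGER NOT NULL,  -- key of the item the measurement belongs to
        sample_point INTEGER NOT NULL,  -- the "coordinate" that used to be a separate column
        "value"      DOUBLE,            -- the measured value itself
        PRIMARY KEY (person_id, sample_point)
    );

Queries for one particular point then become a filter on sample_point, e.g. SELECT person_id, "value" FROM person_data WHERE sample_point = 42; instead of picking one of 15,000 columns.

Eyal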

Hello,

Thank you for all the replies and information! The version is 11.23, and I'm using a cloud platform (instance type 4cpu-16ram-hpc). The number of rows was just 1K for the 15K attributes in a table. I understand that MonetDB was not intended to create and process tables with that many columns. But I'm interested in integrating matrix operations, so I would like to know what the main bottleneck is when processing such matrix-shaped data in the system.

Best regards,
Oksana

If you want to model an NxM matrix of elements of type MYTYPE in a DBMS (not just MonetDB), you would typically have a table with 3 columns:

    CREATE TABLE my_matrix (
        "row"        INTEGER NOT NULL,
        "col"        INTEGER NOT NULL,
        "cell_value" MYTYPE  NOT NULL,
        PRIMARY KEY ("row", "col")
    );

with N x M rows in total.

Anyway, I'm not the expert on arrays in our group, so I'll let others chime in.
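As a side note on the matrix-operations angle: some operations can be expressed directly on this sparse layout. A minimal sketch of a matrix product C = A x B, assuming two hypothetical tables matrix_a and matrix_b that both use the my_matrix layout above (and treating missing cells as zero):

    SELECT a."row", b."col",
           SUM(a.cell_value * b.cell_value) AS cell_value
    FROM   matrix_a a
    JOIN   matrix_b b ON a."col" = b."row"
    GROUP BY a."row", b."col";

Eyal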

There are a couple of limits that you need to consider.

There is a *hard* limit on the number of BATs in MonetDB. This limit is 16384000 on a 64-bit machine, and it can only be increased at compile time. Each column of each table in MonetDB is mapped to a BAT. In addition, there are a few more BATs per table. During each transaction that modifies a table, you get another BAT per column (and some extra). Each query also produces intermediate results, which can potentially mean a lot of BATs, very much depending on the query.

Each BAT that is being accessed during a query (a column of a table, or an intermediate) has to be available in (virtual) memory. If the BATs are small, that may be allocated memory, but if they are a bit larger, that means a memory-mapped file. There is a limit to the number of memory-mapped files a process can have. This is set by the operating system (i.e. not by MonetDB); I've seen limits at around 65000.

This also means that if your memory is limited, data is transferred between disk and memory as MonetDB accesses it. The disk can either be a file in the file system (for memory-mapped files) or swap space (for allocated data). If this starts happening, processing becomes excruciatingly slow. This can potentially be solved by buying more memory.

I agree with Eyal that limiting the number of columns in the way he describes is generally a good idea.
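If it helps to gauge how close a schema already comes to these limits, the system catalogue gives a rough lower bound on the number of persistent column BATs. This is only a back-of-the-envelope check (per-table overhead, transactions and query intermediates add more BATs on top of it), and the catalogue column names below are as in recent MonetDB versions:

    -- roughly one persistent BAT per column of each user table
    SELECT COUNT(*) AS column_bats
    FROM   sys.columns c
    JOIN   sys.tables  t ON c.table_id = t.id
    WHERE  NOT t."system";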
-- Sjoerd Mullender

Hi Eyal,

Sorry for the late reply. Yes, there is no problem with a large number of columns in the sparse representation of arrays (i.e., dimension, dimension, value). However, in order to perform relational operations, this structure first has to be pivoted. When matrices are represented as tables, both algebras can be applied to the same object, so I'm specifically looking for the main bottleneck of processing hundreds of thousands of columns in MonetDB.
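To illustrate the pivoting step I mean (just a sketch against the my_matrix layout above, assuming the column indices are known in advance - here only 1 and 2):

    -- turn the sparse (row, col, value) form back into a wide relation;
    -- one CASE expression is needed per materialized column
    SELECT "row",
           MAX(CASE WHEN "col" = 1 THEN cell_value END) AS c1,
           MAX(CASE WHEN "col" = 2 THEN cell_value END) AS c2
    FROM   my_matrix
    GROUP BY "row";

With tens of thousands of columns, this SELECT list itself becomes enormous, which is exactly why I am asking about wide tables.

Best regards,
Oksana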

Hi Oksana,

The main challenge (and hence bottleneck) for most, if not all, DBMSs --- i.e., not only, but also, MonetDB --- is that many columns/attributes with few tuples/rows (rather than few columns/attributes with many tuples/rows) stresses the catalogue management much more than the pure data management. This includes related "things" such as query parsing (incl. syntax and semantics checks), query translation, query optimization, and (physical) plan interpretation/execution. Most, if not all, DBMSs --- and also MonetDB --- are more optimized for handling data than for handling "unexpectedly" huge catalogues.

While Sjoerd mentioned the hard limits, performance bottlenecks due to "inefficient" catalogue management are much more vague and hard to summarize, let alone predict or quantify, in general.

Did you by any chance try any other DBMSs with your "non-standard" workload? If so, how do those experiences and limitations compare to MonetDB?

Best, Stefan
--
| Stefan.Manegold@CWI.nl | DB Architectures (DA)   |
| www.CWI.nl/~manegold/  | Science Park 123 (L321) |
| +31 (0)20 592-4212     | 1098 XG Amsterdam (NL)  |
participants (6)
- Eyal Rozenberg
- Eyal Rozenberg
- Oxana Dolmatova
- Sjoerd Mullender
- Stefan Manegold
- Ying Zhang