Fwd: BULK UPLOAD very poor performance

Hi!

After a big battle with the DB, I found two issues.

1) Running COPY ... RECORDS FROM STDIN versus FROM a FILE gives totally different performance: the first takes 24 minutes, while the second takes only 90 secs.

2) Since I had generated all those files with msqldump (which emits the CREATE TABLE statement and then the COPY ... RECORDS data), I tried to run COPY XXXX OFFSET ... RECORDS (in order to skip the CREATE TABLE statement), but the query, because of the word "STDIN" in the file, was not skipping all the rows I asked for.

I hope I've been clear; if not, please let me know!

Ariel

On Tue, Mar 24, 2020 at 11:33 PM Martin Kersten <martin.kersten@cwi.nl> wrote:
Hi
I am not sure what issue you are now dealing with, but since you are a company that seems to use (or to be considering) MonetDB in your commercial offerings, I gladly refer you to MonetDBSolutions (*), where we can give you the commercial support needed to diagnose your issues.
regards Martin
(*) https://www.monetdbsolutions.com/
Thanks Martin for your prompt response. But I did an msqldump from another database, and now the upload is being done through the BULK LOAD:
COPY 14173375 RECORDS INTO "sb_traf"."nrm_20180818" FROM stdin USING DELIMITERS '\t','\n','"';
Any other idea ? :'(
On Tue, Mar 24, 2020 at 7:37 PM Martin Kersten <martin.kersten@cwi.nl> wrote:
Hi,
Given the numbers, I suspect you are loading them as SQL INSERT statements under autocommit mode. Please read the documentation on bulk loading.
Regards, Martin

Sent from my iPad
On 24 Mar 2020, at 23:12, Ariel Abadi <aabadi@starbi.com> wrote:
Hi! Maybe somebody can help me, please!
I'm trying to upload into MonetDB a file with 14MM records. It is taking more than 24 minutes.
It is a fresh new database.
Enclosed you will find the server configuration.
MonetDB version Nov2019-SP3
[root@mn tmp]$ lscpu
Architecture:          x86_64
CPU op-mode(s):        32-bit, 64-bit
Byte Order:            Little Endian
CPU(s):                40
On-line CPU(s) list:   0-39
Thread(s) per core:    2
Core(s) per socket:    10
Socket(s):             2
NUMA node(s):          2
Vendor ID:             GenuineIntel
CPU family:            6
Model:                 85
Model name:            Intel(R) Xeon(R) Silver 4114 CPU @ 2.20GHz
Stepping:              4
CPU MHz:               800.479
BogoMIPS:              4400.00
Virtualization:        VT-x
L1d cache:             32K
L1i cache:             32K
L2 cache:              1024K
L3 cache:              14080K
NUMA node0 CPU(s):     0,2,4,6,8,10,12,14,16,18,20,22,24,26,28,30,32,34,36,38
NUMA node1 CPU(s):     1,3,5,7,9,11,13,15,17,19,21,23,25,27,29,31,33,35,37,39
Flags:                 fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm pbe syscall nx pdpe1gb rdtscp lm constant_tsc art arch_perfmon pebs bts rep_good nopl xtopology nonstop_tsc cpuid aperfmperf pni pclmulqdq dtes64 monitor ds_cpl vmx smx est tm2 ssse3 sdbg fma cx16 xtpr pdcm pcid dca sse4_1 sse4_2 x2apic movbe popcnt tsc_deadline_timer aes xsave avx f16c rdrand lahf_lm abm 3dnowprefetch cpuid_fault epb cat_l3 cdp_l3 invpcid_single pti intel_ppin ssbd mba ibrs ibpb stibp tpr_shadow vnmi flexpriority ept vpid ept_ad fsgsbase tsc_adjust bmi1 hle avx2 smep bmi2 erms invpcid rtm cqm mpx rdt_a avx512f avx512dq rdseed adx smap clflushopt clwb intel_pt avx512cd avx512bw avx512vl xsaveopt xsavec xgetbv1 xsaves cqm_llc cqm_occup_llc cqm_mbm_total cqm_mbm_local dtherm ida arat pln pts pku ospke md_clear flush_l1d
[root@mn tmp]$ cat /proc/meminfo
MemTotal:          83886080 kB
MemFree:           10535812 kB
MemAvailable:      80159744 kB
Buffers:                  0 kB
Cached:            69623932 kB
SwapCached:               0 kB
Active:             6625436 kB
Inactive:          63737872 kB
Active(anon):        324456 kB
Inactive(anon):      417184 kB
Active(file):       6300980 kB
Inactive(file):    63320688 kB
Unevictable:              0 kB
Mlocked:               5320 kB
SwapTotal:          8388604 kB
SwapFree:           8381988 kB
Dirty:                  220 kB
Writeback:                0 kB
AnonPages:         13626052 kB
Mapped:              186368 kB
Shmem:                 1800 kB
KReclaimable:       4134960 kB
Slab:                     0 kB
SReclaimable:             0 kB
SUnreclaim:               0 kB
KernelStack:          16720 kB
PageTables:           56500 kB
NFS_Unstable:             0 kB
Bounce:                   0 kB
WritebackTmp:             0 kB
CommitLimit:       74123472 kB
Committed_AS:      18337812 kB
VmallocTotal:   34359738367 kB
VmallocUsed:         415836 kB
VmallocChunk:             0 kB
Percpu:               39552 kB
HardwareCorrupted:        0 kB
AnonHugePages:            0 kB
ShmemHugePages:           0 kB
ShmemPmdMapped:           0 kB
CmaTotal:                 0 kB
CmaFree:                  0 kB
HugePages_Total:          0
HugePages_Free:           0
HugePages_Rsvd:           0
HugePages_Surp:           0
Hugepagesize:          2048 kB
Hugetlb:                  0 kB
DirectMap4k:         816396 kB
DirectMap2M:       82513920 kB
DirectMap1G:       52428800 kB
_______________________________________________
users-list mailing list
users-list@monetdb.org
https://www.monetdb.org/mailman/listinfo/users-list

Hi Ariel,

The first issue is not totally surprising to me. My best guess is that reading from a file uses buffered input while reading from stdin does not; see for example [1] and [2]. I am not sure there is a lot we can do on the MonetDB side about this.

About the second issue: could you please post the exact COPY INTO command and the first few lines of the file you are trying to load, so that we can figure out the exact issue?

Best regards,
Panos.

[1] https://stackoverflow.com/questions/9371238/why-is-reading-lines-from-stdin-...
[2] https://discourse.julialang.org/t/stdin-stdout-10x-slower-than-file-open/172...

On 3/26/20 11:51 AM, Ariel Abadi wrote:
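The buffering effect described above can be demonstrated outside MonetDB with a small sketch (Python, purely illustrative; the 1 MB file size and 100-byte chunk size are made-up numbers). It counts how often the raw file layer is hit when the same data is consumed in small chunks, first unbuffered and then through `io.BufferedReader`:

```python
import io
import os
import tempfile

# Create a 1 MB sample file (a stand-in for a dump file; contents arbitrary).
fd, path = tempfile.mkstemp()
os.write(fd, b"x" * 1_000_000)
os.close(fd)

# Unbuffered: every 100-byte read() on the raw FileIO is a separate
# system call, so consuming the file costs ~10,000 calls.
unbuffered_calls = 0
with io.FileIO(path, "rb") as raw:
    while raw.read(100):
        unbuffered_calls += 1

# Buffered: BufferedReader pulls up to 1 MB at a time from the raw layer,
# so the same sequence of 100-byte reads reaches the OS only a few times.
class CountingRaw(io.FileIO):
    calls = 0
    def readinto(self, b):
        CountingRaw.calls += 1
        return super().readinto(b)

with io.BufferedReader(CountingRaw(path, "rb"), buffer_size=1 << 20) as buf:
    while buf.read(100):
        pass

print(unbuffered_calls, CountingRaw.calls)  # typically 10000 vs. a handful
os.remove(path)
```

The same asymmetry applies when a client streams a dump over stdin versus letting the server read the file directly with large buffered reads.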

Hi Panos!

Enclosed you can find the (truncated) file that I used and the statement I ran in mclient:

COPY OFFSET 223 INTO "sb_traf"."nrm_20000818" FROM '/tmp/monet_list.txt' USING DELIMITERS '\t','\n','"';

Thanks,
Ariel

On Mon, Mar 30, 2020 at 8:07 AM Panagiotis Koutsourakis <panagiotis.koutsourakis@monetdbsolutions.com> wrote:

Hi Ariel,

The COPY INTO command recognizes a series of records of the form:

<data><field delimiter><data><field delimiter>...<data><record delimiter>

For example, the command

COPY INTO tbl FROM '/path/to/file' USING DELIMITERS '\t','\n','"';

will recognize lines of the form:

1	"MIA"	7	"MIA|115331571990001804"	2000-08-01

but nothing else. Specifically, the first 222 lines in the monet_list.txt file will be rejected by COPY INTO. You need to delete those lines before you try to load the data with COPY INTO.

Take a look at the documentation of COPY INTO on the MonetDB web page [1].

Hope this helps,
Panos.

[1] https://www.monetdb.org/Documentation/Cookbooks/SQLrecipes/CSV_bulk_loads

On 3/30/20 7:42 PM, Ariel Abadi wrote:
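One way to follow this advice is to copy everything after the preamble into a fresh data file before loading. A minimal sketch (the 222-line skip count and paths come from this thread; `strip_preamble` is a hypothetical helper name, equivalent to `tail -n +223 src > dst` in the shell):

```python
def strip_preamble(src, dst, skip=222):
    """Copy src to dst, dropping the first `skip` lines
    (e.g. the CREATE TABLE statement and the COPY ... FROM stdin
    header that msqldump puts before the data records)."""
    with open(src, "r", encoding="utf-8") as fin, \
         open(dst, "w", encoding="utf-8") as fout:
        for lineno, line in enumerate(fin, start=1):
            if lineno > skip:
                fout.write(line)

# Example usage (paths from the thread):
#   strip_preamble("/tmp/monet_list.txt", "/tmp/monet_data.txt")
# then, in mclient:
#   COPY INTO "sb_traf"."nrm_20000818" FROM '/tmp/monet_data.txt'
#   USING DELIMITERS '\t','\n','"';
```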

Hi Panos,

Great assistance here! One thing: I can understand why Ariel tried to use OFFSET for this purpose; it might be nice to add to the documentation that the skipped lines must also follow the format of the remainder of the file!

Greetings,
Arjen

On Tue, 31 Mar 2020 at 10:51, Panagiotis Koutsourakis <panagiotis.koutsourakis@monetdbsolutions.com> wrote:
-- ==================================================================== ICIS, office M1.00.05 Radboud University Mercator 1 Faculty of Science Toernooiveld 212 arjen@cs.ru.nl NL-6525 EC Nijmegen, The Netherlands +31-(0)24-365 2354 ===================== http://www.informagus.nl/ ====================

Hi Arjen, please see my more recent email about the subject. Best regards, Panos. On 3/31/20 11:13 AM, Arjen P. de Vries wrote:
Hi Panos,
Great assistance here!
One thing: I can understand why Ariel tried to use OFFSET for that purpose; maybe it would be nice to add to the documentation that the skipped lines must also follow the format of the remainder!
Greetings,
Arjen
On Tue, 31 Mar 2020 at 10:51, Panagiotis Koutsourakis <panagiotis.koutsourakis@monetdbsolutions.com> wrote:
Hi Ariel,
The COPY INTO command recognizes a series of records of the form:
<data><field delimiter><data><field delimiter>...<data><record delimiter>
For example the command
COPY INTO tbl FROM '/path/to/file' USING DELIMITERS '\t','\n','"';
will recognize lines of the form:
1 "MIA" 7 "MIA|115331571990001804" 2000-08-01
but nothing else. Specifically, the first 222 lines in the monet_list.txt file will be rejected by COPY INTO. You need to delete those lines before you try to load the data with COPY INTO.
Take a look at the documentation of COPY INTO in the MonetDB web page [1].
Hope this helps, Panos.
[1] https://www.monetdb.org/Documentation/Cookbooks/SQLrecipes/CSV_bulk_loads
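Panos's format description can be turned into a rough pre-check before loading. This is an illustrative sketch, not part of MonetDB: it assumes the tab/newline/double-quote delimiters used in this thread and the five-field layout of the sample line, and it simply counts fields per line, so a quoted field that itself contains a tab would fool it.

```python
def looks_like_record(line: str, nfields: int = 5, fdelim: str = "\t") -> bool:
    """Rough check that a line matches <data><tab>...<data><newline> with the
    expected number of fields. A quoted field containing a tab defeats this
    simple split, so treat it as a sanity check only."""
    return len(line.rstrip("\n").split(fdelim)) == nfields

sample = '1\t"MIA"\t7\t"MIA|115331571990001804"\t2000-08-01\n'
assert looks_like_record(sample)                     # a data line passes
assert not looks_like_record("CREATE TABLE ...\n")   # dump preamble does not
```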
Hi Panos!
Enclosed you can find the file (truncated) that I used and the statement I ran on mclient.
COPY OFFSET 223 INTO "sb_traf"."nrm_20000818" FROM '/tmp/monet_list.txt' USING DELIMITERS '\t','\n','"';
Thks Ariel
On Mon, Mar 30, 2020 at 8:07 AM Panagiotis Koutsourakis <panagiotis.koutsourakis@monetdbsolutions.com> wrote:
Hi Ariel,
The first issue is not totally surprising to me. My best guess is that reading from a file uses buffered input while reading from stdin does not. See for example [1] and [2]. I am not sure there is much we can do about this on the MonetDB side.
About the second issue: could you please post the exact COPY INTO command and the first few lines of the file you are trying to load, so that we can figure out the exact issue?
Best regards, Panos.
[1]
https://stackoverflow.com/questions/9371238/why-is-reading-lines-from-stdin-...
[2]
https://discourse.julialang.org/t/stdin-stdout-10x-slower-than-file-open/172...
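The buffered-versus-unbuffered difference Panos guesses at can be reproduced in miniature, independently of MonetDB. This hypothetical demo reads the same file once through Python's buffered I/O layer and once with tiny raw `os.read()` calls; both see identical bytes, but the raw version issues one system call per 64-byte chunk, which is the kind of overhead that makes a stdin pipe slow.

```python
import os
import tempfile

# Write a small file of fake tab-separated records (8 bytes per record).
path = tempfile.mkstemp()[1]
with open(path, "wb") as f:
    f.write(b"1\tMIA\t7\n" * 100_000)

# Buffered: Python's I/O layer fetches large blocks behind the scenes.
with open(path, "rb") as f:          # buffered by default
    buffered_bytes = len(f.read())

# Unbuffered: one syscall per 64-byte chunk, about 12,500 of them here.
fd = os.open(path, os.O_RDONLY)
unbuffered_bytes = 0
while chunk := os.read(fd, 64):
    unbuffered_bytes += len(chunk)
os.close(fd)
os.remove(path)

assert buffered_bytes == unbuffered_bytes == 800_000  # same data either way
```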
On 3/26/20 11:51 AM, Ariel Abadi wrote:
Hi!
After a big battle against the DB, I found 2 issues.
1) Running COPY ... RECORDS ... FROM *STDIN* and from a *FILE* is totally different in terms of performance. While the first one lasts 24 minutes, the second one lasts only *90 secs.*
2) As I had generated all those files through msqldump (which emits the "CREATE TABLE" statement and then the COPY ... RECORDS data), I tried to run COPY XXXX OFFSET ... RECORDS (in order to skip the CREATE TABLE statement)... and the query, because of the word "STDIN" in the file, was not offsetting all the rows I was asking for.
I hope I've been clear... if not, please let me know!
Ariel
On Tue, Mar 24, 2020 at 11:33 PM Martin Kersten <martin.kersten@cwi.nl> wrote:
Hi
I am not sure what issue you are now dealing with, but since you are a company that seems to use (or be considering) MonetDB in your commercial offerings, I gladly refer you to MonetDBSolutions(*), where we can give you the commercial support needed to diagnose your issues.
regards Martin
(*) https://www.monetdbsolutions.com/

Hi!
Firstly, I would like to thank you for the prompt answers!!!
I will wait for your answer, Panos, to see whether it is a bug or not.
In case this is NOT a bug, and you confirm what Arjen said regarding the format, I think it would be worth adding such a comment to the documentation (https://www.monetdb.org/Documentation/Cookbooks/SQLrecipes/LoadingBulkData), which does not mention anything about the format of the "titles".
Also, it would be great to be able to use the OFFSET clause regardless of the format of the remainder.
Again, thank you very much to both of you!!!
Ariel
PS. The way I solved it in this case was to cut the first 222 rows off each file and import with the bulk load, which worked perfectly.
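The workaround from the PS (drop the dump's preamble, then bulk load the rest) can be scripted. A sketch only: the 1-based cut-over line number 223 comes from this particular dump and would differ per file, and the paths are hypothetical.

```python
def strip_preamble(src: str, dst: str, data_starts_at: int = 223) -> None:
    """Copy src to dst, dropping every line before data_starts_at (1-based),
    i.e. the CREATE TABLE / COPY preamble that msqldump put at the top."""
    with open(src, "rb") as fin, open(dst, "wb") as fout:
        for lineno, line in enumerate(fin, start=1):
            if lineno >= data_starts_at:
                fout.write(line)
```

The cleaned file then loads with a plain COPY INTO ... FROM '/tmp/monet_data.txt' USING DELIMITERS '\t','\n','"'; and no OFFSET clause at all (path hypothetical).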

In principle, the OFFSET 223 should make reading of the data start at line 223. There is one problem, however, which makes this not work for you.
The problem is that the quote character you specified in the COPY INTO command is interpreted while skipping, and so needs to be balanced. And the quote characters on the last line that is to be skipped are not balanced: count them, there are 5. Because of this, the code that skips lines gets confused.
The code should just count record delimiters (or arguably, though it doesn't, line endings; note they can be different!).
I think you did find a genuine bug. Can you please report it in our bugzilla, https://bugs.monetdb.org/?
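The diagnosis above can be illustrated with a toy quote-aware scanner (a sketch, not MonetDB's actual code): since the COPY line carries 5 double quotes, the scanner is still "inside" a quoted field when it reaches the newline, so it never counts that record delimiter and sails right past the data.

```python
def skip_records(text: str, n: int, quote: str = '"', rdelim: str = "\n") -> int:
    """Return the index just past the n-th record delimiter, honouring the
    quote character the way a quote-aware scanner would (toy model only)."""
    in_quotes = False
    seen = 0
    for i, ch in enumerate(text):
        if ch == quote:
            in_quotes = not in_quotes
        elif ch == rdelim and not in_quotes:
            seen += 1
            if seen == n:
                return i + 1
    return len(text)  # ran off the end: every delimiter looked quoted

header = ('COPY 14173375 RECORDS INTO "sb_traf"."nrm_20180818" '
          'FROM stdin USING DELIMITERS \'\\t\',\'\\n\',\'"\';\n')
data = '1\t"MIA"\t7\n2\t"EZE"\t9\n'

assert header.count('"') == 5  # odd, i.e. unbalanced, as counted above
# Trying to skip just the header overshoots to the very end of the input,
# because the 5th quote leaves the scanner "inside" a quoted field.
assert skip_records(header + data, 1) == len(header + data)
```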
-- Sjoerd Mullender

I have made a change to the upcoming feature release, tentatively scheduled for June this year. The change is that OFFSET now counts newlines instead of record separators (i.e. the second string in the USING DELIMITERS clause). Also, the content of the skipped lines is no longer interpreted, so any quotes or backslashes, or even non-UTF-8 bytes, have no influence.
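The described new behaviour can be sketched as a byte-level scan that counts raw newlines and never inspects quoting (a toy model of the change, not the actual MonetDB implementation):

```python
def skip_lines(raw: bytes, n: int) -> int:
    """Return the offset just past the n-th newline, ignoring quotes,
    backslashes and even invalid UTF-8 -- a toy model of the new OFFSET."""
    pos = 0
    for _ in range(n):
        nl = raw.find(b"\n", pos)
        if nl < 0:
            return len(raw)  # fewer lines than requested
        pos = nl + 1
    return pos

# An unbalanced quote on the header line no longer matters:
dump = b'COPY ... "a"."b" \'"\';\n1\t"MIA"\t7\n'
header_len = dump.index(b"\n") + 1
assert skip_lines(dump, 1) == header_len  # lands exactly on the data
```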
In principle, the OFFSET 223 should make that reading data starts at line 223. There is one problem, however, which makes that this doesn't work for you. The problem is that the quote character that you specified on the COPY INTO command is interpreted and so needs to be balanced. And the quote characters on the last line that is to be skipped are not balanced. Count them, there are 5. Because of this, the code that is skipping lines gets confused.
The code should just count record delimiters (or arguably, but it doesn't, line endings--they can be different!).
I think you did find a genuine bug. Can you please report this in out bugzilla https://bugs.monetdb.org/.
On 31/03/2020 18.50, Ariel Abadi wrote:
Hi!
Firstly, I would like to thank for the prompt answer!!!
I will wait your answer Panos, to see wether is a bug or not.
In the case this is NOT a bug, and you confirm what Arjen said regarding the format, I think it would be interesting to add such comment on the documentation (https://www.monetdb.org/Documentation/Cookbooks/SQLrecipes/LoadingBulkData) which does not refer anything related to the format of the "titles".
Also, it would be great to use the OFFSET clause disregards of the format of the remainder.
Again, thank you very much to both!!!
Ariel
PS. The way I solved it in this case was to cut the first 222 rows of each file and import the rest with the bulk load, which worked perfectly.
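For reference, Ariel's workaround (dropping the first 222 header lines before loading) takes only a few lines of Python; the paths in the example call are hypothetical.

```python
def strip_header(src_path, dst_path, n_lines):
    """Copy src_path to dst_path, dropping the first n_lines lines.

    Reads in binary mode so quotes, backslashes, and non-UTF-8 bytes
    in the data pass through untouched.
    """
    with open(src_path, "rb") as src, open(dst_path, "wb") as dst:
        for i, line in enumerate(src):
            if i >= n_lines:
                dst.write(line)

# e.g. strip_header("/tmp/monet_list.txt", "/tmp/monet_data.txt", 222)
```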
On Tue, Mar 31, 2020 at 7:16 AM Panagiotis Koutsourakis <panagiotis.koutsourakis@monetdbsolutions.com <mailto:panagiotis.koutsourakis@monetdbsolutions.com>> wrote:
Hi Arjen,
please see my more recent email about the subject.
Best regards, Panos.
On 24 Mar 2020, at 23:12, Ariel Abadi <aabadi@starbi.com> wrote:
Hi!
Maybe somebody can help me, please!
I'm trying to upload into Monet a file with 14MM records. It is taking more than *24 minutes.*
It is a fresh new database.
Enclosed you will find the server configuration.
Monetdb Version Nov2019-SP3

*[root@mn tmp]$ lscpu*
Architecture:        x86_64
CPU op-mode(s):      32-bit, 64-bit
Byte Order:          Little Endian
CPU(s):              40
On-line CPU(s) list: 0-39
Thread(s) per core:  2
Core(s) per socket:  10
Socket(s):           2
NUMA node(s):        2
Vendor ID:           GenuineIntel
CPU family:          6
Model:               85
Model name:          Intel(R) Xeon(R) Silver 4114 CPU @ 2.20GHz
Stepping:            4
CPU MHz:             800.479
BogoMIPS:            4400.00
Virtualization:      VT-x
L1d cache:           32K
L1i cache:           32K
L2 cache:            1024K
L3 cache:            14080K
NUMA node0 CPU(s):   0,2,4,6,8,10,12,14,16,18,20,22,24,26,28,30,32,34,36,38
NUMA node1 CPU(s):   1,3,5,7,9,11,13,15,17,19,21,23,25,27,29,31,33,35,37,39
Flags:               fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm pbe syscall nx pdpe1gb rdtscp lm constant_tsc art arch_perfmon pebs bts rep_good nopl xtopology nonstop_tsc cpuid aperfmperf pni pclmulqdq dtes64 monitor ds_cpl vmx smx est tm2 ssse3 sdbg fma cx16 xtpr pdcm pcid dca sse4_1 sse4_2 x2apic movbe popcnt tsc_deadline_timer aes xsave avx f16c rdrand lahf_lm abm 3dnowprefetch cpuid_fault epb cat_l3 cdp_l3 invpcid_single pti intel_ppin ssbd mba ibrs ibpb stibp tpr_shadow vnmi flexpriority ept vpid ept_ad fsgsbase tsc_adjust bmi1 hle avx2 smep bmi2 erms invpcid rtm cqm mpx rdt_a avx512f avx512dq rdseed adx smap clflushopt clwb intel_pt avx512cd avx512bw avx512vl xsaveopt xsavec xgetbv1 xsaves cqm_llc cqm_occup_llc cqm_mbm_total cqm_mbm_local dtherm ida arat pln pts pku ospke md_clear flush_l1d

*[root@mn tmp]$ cat /proc/meminfo*
MemTotal:       83886080 kB
MemFree:        10535812 kB
MemAvailable:   80159744 kB
Buffers:               0 kB
Cached:         69623932 kB
SwapCached:            0 kB
Active:          6625436 kB
Inactive:       63737872 kB
Active(anon):     324456 kB
Inactive(anon):   417184 kB
Active(file):    6300980 kB
Inactive(file): 63320688 kB
Unevictable:           0 kB
Mlocked:            5320 kB
SwapTotal:       8388604 kB
SwapFree:        8381988 kB
Dirty:               220 kB
Writeback:             0 kB
AnonPages:      13626052 kB
Mapped:           186368 kB
Shmem:              1800 kB
KReclaimable:    4134960 kB
Slab:                  0 kB
SReclaimable:          0 kB
SUnreclaim:            0 kB
KernelStack:       16720 kB
PageTables:        56500 kB
NFS_Unstable:          0 kB
Bounce:                0 kB
WritebackTmp:          0 kB
CommitLimit:    74123472 kB
Committed_AS:   18337812 kB
VmallocTotal:   34359738367 kB
VmallocUsed:      415836 kB
VmallocChunk:          0 kB
Percpu:            39552 kB
HardwareCorrupted:     0 kB
AnonHugePages:         0 kB
ShmemHugePages:        0 kB
ShmemPmdMapped:        0 kB
CmaTotal:              0 kB
CmaFree:               0 kB
HugePages_Total:       0
HugePages_Free:        0
HugePages_Rsvd:        0
HugePages_Surp:        0
Hugepagesize:       2048 kB
Hugetlb:               0 kB
DirectMap4k:      816396 kB
DirectMap2M:    82513920 kB
DirectMap1G:    52428800 kB
-- Sjoerd Mullender

Hai Ariel, Just a side question: why are you creating an msqldump of your data to reload into MonetDB? Why don't you prepare CSV data directly and do the bulk loading? Regards, Jennie
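To make Jennie's suggestion concrete, here is a sketch of producing rows in the shape the thread's COPY INTO commands expect (tab field delimiter, newline record delimiter, double-quoted strings). The backslash-escaping convention used here is an assumption; verify the exact rules against the MonetDB CSV bulk-load documentation for your version.

```python
def to_copy_row(fields, quote='"'):
    """Format one record for COPY INTO ... USING DELIMITERS '\\t','\\n','"'.

    Strings are wrapped in the quote character; backslashes and embedded
    quotes are backslash-escaped (assumed convention -- check the MonetDB
    docs). Non-string values are rendered with str().
    """
    parts = []
    for f in fields:
        if isinstance(f, str):
            escaped = f.replace("\\", "\\\\").replace(quote, "\\" + quote)
            parts.append(quote + escaped + quote)
        else:
            parts.append(str(f))
    return "\t".join(parts) + "\n"

row = to_copy_row([1, "MIA", 7, "MIA|115331571990001804"])
# row is: 1<TAB>"MIA"<TAB>7<TAB>"MIA|115331571990001804"<newline>
```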
On 3/31/20 11:13 AM, Arjen P. de Vries wrote:
Hi Panos,
Great assistance here!
One thing: I can understand why Ariel tried to use OFFSET for that purpose, maybe nice to add to the documentation that skipped lines also should follow the format of the remainder!
Greetings,
Arjen
On Tue, 31 Mar 2020 at 10:51, Panagiotis Koutsourakis < panagiotis.koutsourakis@monetdbsolutions.com> wrote:
Hi Ariel,
The COPY INTO command recognizes a series of records of the form:
<data><field delimiter><data><field delimiter>...<data><record delimiter>
For example the command
COPY INTO tbl FROM '/path/to/file' USING DELIMITERS '\t','\n','"';
will recognize lines of the form:
1 "MIA" 7 "MIA|115331571990001804" 2000-08-01
but nothing else. Specifically, the first 222 lines in the monet_list.txt file will be rejected by COPY INTO. You need to delete those lines before you try to load the data with COPY INTO.
Take a look at the documentation of COPY INTO in the MonetDB web page [1].
Hope this helps, Panos.
[1] https://www.monetdb.org/Documentation/Cookbooks/SQLrecipes/CSV_bulk_loads
On 3/30/20 7:42 PM, Ariel Abadi wrote:
Hi Panos!
Enclosed you can find the (truncated) file that I used and the statement I ran on mclient.
COPY OFFSET 223 INTO "sb_traf"."nrm_20000818" FROM '/tmp/monet_list.txt' USING DELIMITERS '\t','\n','"';
Thks Ariel
On Mon, Mar 30, 2020 at 8:07 AM Panagiotis Koutsourakis < panagiotis.koutsourakis@monetdbsolutions.com> wrote:
Hi Ariel,
The first issue is not totally surprising to me. My best guess is that reading from a file is using buffered input while reading from stdin is not. See for example [1] and [2]. I am not sure there is a lot we can do at the MonetDB side about this.
About the second issue could you please post the exact COPY INTO command and the first few lines of the file you are trying to load so that we can figure out the exact issue?
Best regards, Panos.
[1]
https://stackoverflow.com/questions/9371238/why-is-reading-lines-from-stdin-...
[2]
https://discourse.julialang.org/t/stdin-stdout-10x-slower-than-file-open/172...
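Panos's buffering point can be demonstrated generically (this sketch times plain Python file reads; it is not MonetDB's I/O path): with buffering disabled, every small read becomes a system call, which is the kind of overhead a per-record stdin reader pays.

```python
import io
import os
import tempfile
import time

# Write a 1 MiB scratch file to read back two ways.
fd, path = tempfile.mkstemp()
data = b"x" * (1 << 20)
with os.fdopen(fd, "wb") as f:
    f.write(data)

def read_in_small_chunks(buffering):
    """Read the whole file 64 bytes at a time with the given buffering."""
    total = 0
    with open(path, "rb", buffering=buffering) as f:
        while True:
            chunk = f.read(64)
            if not chunk:
                break
            total += len(chunk)
    return total

t0 = time.perf_counter()
n_unbuf = read_in_small_chunks(0)                       # one syscall per read
t_unbuf = time.perf_counter() - t0

t0 = time.perf_counter()
n_buf = read_in_small_chunks(io.DEFAULT_BUFFER_SIZE)    # reads served from a buffer
t_buf = time.perf_counter() - t0

os.remove(path)
print(f"unbuffered: {t_unbuf:.3f}s, buffered: {t_buf:.3f}s")
```

Both calls read the same bytes; on most systems the unbuffered variant is noticeably slower, which is consistent with the STDIN-vs-FILE gap reported in this thread.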
On 3/26/20 11:51 AM, Ariel Abadi wrote:
Hi!
After a big battle against the DB, I found two issues.
1) Running COPY ... RECORDS FROM *STDIN* and from a *FILE* is totally different in terms of performance: the first takes 24 minutes, while the second takes only *90 secs.*
2) As I had generated all those files through msqldump (which emits the "CREATE TABLE" statement and then the COPY ... RECORDS), I tried to run COPY *XXXX OFFSET* RECORDS (in order to skip the CREATE TABLE statement)... but the query, because of the word "STDIN" in the file, was not skipping all the rows I asked for.
I hope I've been clear... if not, please let me know!
Ariel
Hi Jennie! I chose the msqldump option, firstly because I thought it was the way to do the backup. Secondly, msqldump creates the table and then does a BULK LOAD, so for me it was a good option. If I had chosen to export only the data, I would have had to create the tables either manually or through another msqldump (structure only). Rgds Ariel On Tue, Mar 31, 2020 at 6:16 PM Ying Zhang <Y.Zhang@cwi.nl> wrote:

Hi again Ariel,
Apparently I misread your previous email. Sorry about that!
I changed the file you sent a little bit: in it, I replaced your records with these two:
1	"MIA"
2	"MIA"
I was able to successfully load these tuples using
COPY OFFSET 223 INTO foo FROM '/tmp/monet_list.txt' USING DELIMITERS '\t','\n';
where foo was created as
CREATE TABLE foo (i INT, s TEXT);
On the other hand, the command
COPY OFFSET 223 INTO foo FROM '/tmp/monet_list.txt' USING DELIMITERS '\t','\n','"';
failed with the message:
Failed to import table 'foo', line 222: incomplete record at end of file
This error is a little confusing to me, and I would have to investigate further to determine whether this is a bug. To avoid the issue, I suggest that you create your TSV using the following command instead of msqldump:
COPY SELECT * FROM "sb_traf"."nrm_20000801" INTO '/tmp/nrm_20000801.tsv' USING DELIMITERS '\t','\n','"';
which you can then load using
COPY INTO foo FROM '/tmp/nrm_20000801.tsv' USING DELIMITERS '\t','\n','"';
Again, I apologize for the previous mistake.
I hope this is sufficiently clear,
Panos
On 3/31/20 10:47 AM, Panagiotis Koutsourakis wrote:
On Tue, Mar 24, 2020 at 11:33 PM Martin Kersten <martin.kersten@cwi.nl> wrote:
Hi
I am not sure what issue you are now dealing with, but since you are a company that seems to use (or considering) MonetDB in your commercial offerings, I gladly refer you to MonetDBSolutions(*), where we can give you the commercial support needed to diagnose your issues.
regards Martin
(*) https://www.monetdbsolutions.com/
On 24/03/2020 23:55, Ariel Abadi wrote:
Thanks Martin for your prompt response. But I did use msqldump from another database, and the upload is now being done through bulk load:
COPY 14173375 RECORDS INTO "sb_traf"."nrm_20180818" FROM stdin USING DELIMITERS '\t','\n','"';
Any other idea ? :'(
On Tue, Mar 24, 2020 at 7:37 PM Martin Kersten <martin.kersten@cwi.nl <mailto:martin.kersten@cwi.nl>> wrote:
Hi,
given the numbers I suspect you are loading them as SQL INSERT statements under autocommit mode. Please read the documentation on bulk loading.
regards, Martin
Sent from my iPad
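Martin's point can be illustrated with SQLite standing in for MonetDB (an analogy only, not MonetDB code): under autocommit every INSERT is its own transaction, while a bulk load pays the transaction overhead once. The table and row counts below are invented for the sketch:

```python
import os
import sqlite3
import tempfile
import time

# SQLite as a stand-in: the principle is the same in MonetDB -- each
# autocommitted statement is a full transaction of its own.
fd, path = tempfile.mkstemp(suffix=".db")
os.close(fd)
con = sqlite3.connect(path, isolation_level=None)  # autocommit mode
con.execute("CREATE TABLE t (i INTEGER)")

n = 500

# One transaction (with its durability cost) per row.
t0 = time.perf_counter()
for i in range(n):
    con.execute("INSERT INTO t VALUES (?)", (i,))
per_row = time.perf_counter() - t0

# One transaction for all rows.
t0 = time.perf_counter()
con.execute("BEGIN")
con.executemany("INSERT INTO t VALUES (?)", [(i,) for i in range(n)])
con.execute("COMMIT")
one_txn = time.perf_counter() - t0

total = con.execute("SELECT count(*) FROM t").fetchone()[0]
print(total, f"rows; per-row autocommit {per_row:.3f}s vs one txn {one_txn:.3f}s")
con.close()
os.remove(path)
```

In MonetDB the same idea is served by COPY INTO, which loads all rows inside a single transaction.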
> On 24 Mar 2020, at 23:12, Ariel Abadi <aabadi@starbi.com> wrote:
>
> Hi!
> Maybe somebody can help me, please!
>
> I'm trying to upload into Monet a file with 14MM records. It is taking more than 24 minutes.
> It is a fresh new database.
>
> Enclosed you will find the server configuration.
>
> MonetDB version Nov2019-SP3
>
> [root@mn tmp]$ lscpu
> Architecture:        x86_64
> CPU op-mode(s):      32-bit, 64-bit
> Byte Order:          Little Endian
> CPU(s):              40
> On-line CPU(s) list: 0-39
> Thread(s) per core:  2
> Core(s) per socket:  10
> Socket(s):           2
> NUMA node(s):        2
> Vendor ID:           GenuineIntel
> CPU family:          6
> Model:               85
> Model name:          Intel(R) Xeon(R) Silver 4114 CPU @ 2.20GHz
> Stepping:            4
> CPU MHz:             800.479
> BogoMIPS:            4400.00
> Virtualization:      VT-x
> L1d cache:           32K
> L1i cache:           32K
> L2 cache:            1024K
> L3 cache:            14080K
> NUMA node0 CPU(s):   0,2,4,6,8,10,12,14,16,18,20,22,24,26,28,30,32,34,36,38
> NUMA node1 CPU(s):   1,3,5,7,9,11,13,15,17,19,21,23,25,27,29,31,33,35,37,39
> Flags:               fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm pbe syscall nx pdpe1gb rdtscp lm constant_tsc art arch_perfmon pebs bts rep_good nopl xtopology nonstop_tsc cpuid aperfmperf pni pclmulqdq dtes64 monitor ds_cpl vmx smx est tm2 ssse3 sdbg fma cx16 xtpr pdcm pcid dca sse4_1 sse4_2 x2apic movbe popcnt tsc_deadline_timer aes xsave avx f16c rdrand lahf_lm abm 3dnowprefetch cpuid_fault epb cat_l3 cdp_l3 invpcid_single pti intel_ppin ssbd mba ibrs ibpb stibp tpr_shadow vnmi flexpriority ept vpid ept_ad fsgsbase tsc_adjust bmi1 hle avx2 smep bmi2 erms invpcid rtm cqm mpx rdt_a avx512f avx512dq rdseed adx smap clflushopt clwb intel_pt avx512cd avx512bw avx512vl xsaveopt xsavec xgetbv1 xsaves cqm_llc cqm_occup_llc cqm_mbm_total cqm_mbm_local dtherm ida arat pln pts pku ospke md_clear flush_l1d
>
> [root@mn tmp]$ cat /proc/meminfo
> MemTotal:          83886080 kB
> MemFree:           10535812 kB
> MemAvailable:      80159744 kB
> Buffers:                  0 kB
> Cached:            69623932 kB
> SwapCached:               0 kB
> Active:             6625436 kB
> Inactive:          63737872 kB
> Active(anon):        324456 kB
> Inactive(anon):      417184 kB
> Active(file):       6300980 kB
> Inactive(file):    63320688 kB
> Unevictable:              0 kB
> Mlocked:               5320 kB
> SwapTotal:          8388604 kB
> SwapFree:           8381988 kB
> Dirty:                  220 kB
> Writeback:                0 kB
> AnonPages:         13626052 kB
> Mapped:              186368 kB
> Shmem:                 1800 kB
> KReclaimable:       4134960 kB
> Slab:                     0 kB
> SReclaimable:             0 kB
> SUnreclaim:               0 kB
> KernelStack:          16720 kB
> PageTables:           56500 kB
> NFS_Unstable:             0 kB
> Bounce:                   0 kB
> WritebackTmp:             0 kB
> CommitLimit:       74123472 kB
> Committed_AS:      18337812 kB
> VmallocTotal:   34359738367 kB
> VmallocUsed:         415836 kB
> VmallocChunk:             0 kB
> Percpu:               39552 kB
> HardwareCorrupted:        0 kB
> AnonHugePages:            0 kB
> ShmemHugePages:           0 kB
> ShmemPmdMapped:           0 kB
> CmaTotal:                 0 kB
> CmaFree:                  0 kB
> HugePages_Total:          0
> HugePages_Free:           0
> HugePages_Rsvd:           0
> HugePages_Surp:           0
> Hugepagesize:          2048 kB
> Hugetlb:                  0 kB
> DirectMap4k:         816396 kB
> DirectMap2M:       82513920 kB
> DirectMap1G:       52428800 kB
>
> _______________________________________________
> users-list mailing list
> users-list@monetdb.org
> https://www.monetdb.org/mailman/listinfo/users-list
_______________________________________________ users-list mailing list users-list@monetdb.org https://www.monetdb.org/mailman/listinfo/users-list
On 3/26/20 11:51 AM, Ariel Abadi wrote:
participants (5)
- Ariel Abadi
- Arjen P. de Vries
- Panagiotis Koutsourakis
- Sjoerd Mullender
- Ying Zhang