[MonetDB-users] Copy Into problem

Hi there, I'm trying to use the copy into with csv file (utf-8 encoded) and I encounter some problem. The create statement is: CREATE TABLE "sa3e7e9f4d666429ea2c5bb61a532f5b9" ("A" DATE ,"B" VARCHAR (22) ,"C" DECIMAL (12, 5) ,"D" DECIMAL (12, 5) ,"E" DECIMAL (12, 5) ,"F" DECIMAL (12, 5) ) The copy into statement is: COPY INTO "sa3e7e9f4d666429ea2c5bb61a532f5b9" FROM 'T:\\Prism\\Applications\\Desktop\\PrismDesktop\\bin\\Debug\\LocalRepository\\ImportedCSV\\ImportedCSV0.csv' USING DELIMITERS '|' NULL AS 'A56A9FC261A143a48F6019F602ABF409'; And csv is attached. Any advice?!?!?! tnx, Alfred. http://www.nabble.com/file/p21200441/ImportedCSV0.csv ImportedCSV0.csv -- View this message in context: http://www.nabble.com/Copy-Into-problem-tp21200441p21200441.html Sent from the monetdb-users mailing list archive at Nabble.com.

Hi Alfred, On 29-12-2008 00:58:40 -0800, Alfred Nordman wrote:
Do you have any error message? I suspect the DATETIME to be a problem, as your table specifies a DATE, but my knowledge on SQL's date/time/timezone stuff is a bit rusty.

Hi Fabian, The error i get is: !SQLException:sql:value '?»¿1997-01-01 00:00:00' while parsing '?»¿1997-01-01 00:00:00|City Mall|1110|6165|33|8122' from line 0 field 0 not inserted, expecting type date !SQLException:importTable:failed to import table It seems like some chars (?»¿) are added to the first row inserted to the table. The chars do not appear in the csv file (attached). Alfred Fabian Groffen wrote:
-- View this message in context: http://www.nabble.com/Copy-Into-problem-tp21200441p21215000.html Sent from the monetdb-users mailing list archive at Nabble.com.

On 30-12-2008 01:50:32 -0800, Alfred Nordman wrote:
Yeah, looks like garbage in the rendering. But I think the most important part is "expecting type date". You specified a datetime. If you want to really have date (1997-01-01) you need to change your CSV (remove the 00:00:00), or do the modification in the database after loading your dataset with a datetime instead of a date for column "A".

Hi Fabian, It happens even if i change the table declartion so that column A is timestamp. Fabian Groffen wrote:
-- View this message in context: http://www.nabble.com/Copy-Into-problem-tp21200441p21215907.html Sent from the monetdb-users mailing list archive at Nabble.com.

Alfred Nordman wrote:
The last couple of days the COPY functionality has been improved and made more robust. It might be that you hit one of the bugs solved. The following experiment runs: mk> mclient -lsql sql>create table tmp_date ( d timestamp, s string); 0 tuples sql>COPY 1 RECORDs INTO tmp_date FROM stdin USING DELIMITERS '|', '\n'; more>1997-01-01 00:00:00|City Mall Rows affected 1 sql>select * from tmp_date; +----------------------------+-----------+ | d | s | +============================+===========+ | 1997-01-01 00:00:00.000000 | City Mall | +----------------------------+-----------+ 1 tuple

On Tue, Dec 30, 2008 at 01:50:32AM -0800, Alfred Nordman wrote:
With less you can see the following first chars <U+FEFF>. Niels
-- Niels Nes, Centre for Mathematics and Computer Science (CWI) Kruislaan 413, 1098 SJ Amsterdam, The Netherlands room C0.02, phone ++31 20 592-4098, fax ++31 20 592-4312 url: http://www.cwi.nl/~niels e-mail: Niels.Nes@cwi.nl

Hi Niels, the first chars do not appear in the csv!!!!! Niels Nes wrote:
-- View this message in context: http://www.nabble.com/Copy-Into-problem-tp21200441p21215876.html Sent from the monetdb-users mailing list archive at Nabble.com.

On Tue, Dec 30, 2008 at 03:19:42AM -0800, Alfred Nordman wrote:
Hi Niels,
the first chars do not appear in the csv!!!!!
Sorry they do, but they look different. Niels
-- Niels Nes, Centre for Mathematics and Computer Science (CWI) Kruislaan 413, 1098 SJ Amsterdam, The Netherlands room C0.02, phone ++31 20 592-4098, fax ++31 20 592-4312 url: http://www.cwi.nl/~niels e-mail: Niels.Nes@cwi.nl

On Tue, Dec 30, 2008 at 03:19:42AM -0800, Alfred Nordman wrote:
Hi Niels,
the first chars do not appear in the csv!!!!!
the first line of your csv file *does* start with "ef bb bf" (hex) *before* the first time stamp ("1997-01-01 00:00:00": ======== $ head -n1 ImportedCSV0.csv | hexdump -C -------- 00000000 ef bb bf 31 39 39 37 2d 30 31 2d 30 31 20 30 30 |...1997-01-01 00| 00000010 3a 30 30 3a 30 30 7c 43 69 74 79 20 4d 61 6c 6c |:00:00|City Mall| 00000020 7c 31 31 31 30 7c 36 31 36 35 7c 33 33 7c 38 31 ||1110|6165|33|81| 00000030 32 32 0d 0a |22..| 00000034 ======== $ head -n1 ImportedCSV0.csv | cut -c2- | hexdump -C -------- 00000000 31 39 39 37 2d 30 31 2d 30 31 20 30 30 3a 30 30 |1997-01-01 00:00| 00000010 3a 30 30 7c 43 69 74 79 20 4d 61 6c 6c 7c 31 31 |:00|City Mall|11| 00000020 31 30 7c 36 31 36 35 7c 33 33 7c 38 31 32 32 0d |10|6165|33|8122.| 00000030 0a |.| 00000031 ======== looks like some "magic byte" (-like) file type identifier left over by some nasty (Windows-?) program that created the csv file ... http://en.wikipedia.org/wiki/Magic_number_(programming) says: " Unicode text files encoded in UTF-16 often start with the Byte Order Mark to detect endianness ('FE FF' for big endian and 'FF FE' for little endian). UTF-8 text files often start with the UTF-8 encoding of the same character, 'EF BB BF'. " Stefan
-- | Dr. Stefan Manegold | mailto:Stefan.Manegold@cwi.nl | | CWI, P.O.Box 94079 | http://www.cwi.nl/~manegold/ | | 1090 GB Amsterdam | Tel.: +31 (20) 592-4212 | | The Netherlands | Fax : +31 (20) 592-4312 |

I haven't looked at your document, but it is probably the byte-order mark... See section: 3.2 Byte order mark (BOM) http://www.ietf.org/rfc/rfc2781.txt Normal editors/viewers hide it, as it isn't a visible character. I think MonetDB should have stripped it before showing you the error message. My2cts, Wouter 2008/12/30 Niels Nes <Niels.Nes@cwi.nl>:

On Tue, Dec 30, 2008 at 12:38:54PM +0100, Wouter Alink wrote:
If "magic numbers" and/or "byte order marks" are indeed to be expected in *plain* text (csv) files, then yes, ortherwise, it could only mention that the file does not comply to the expected *plain* text format. Stefan's €0.02 ;-)
-- | Dr. Stefan Manegold | mailto:Stefan.Manegold@cwi.nl | | CWI, P.O.Box 94079 | http://www.cwi.nl/~manegold/ | | 1090 GB Amsterdam | Tel.: +31 (20) 592-4212 | | The Netherlands | Fax : +31 (20) 592-4312 |

On 30-12-2008 13:35:13 +0100, Stefan Manegold wrote:
Could argue that since we only support UTF-8, we should try to support UTF-8 files in all kinds of flavours, or not? % file ~/Desktop/ImportedCSV0.csv ~/Desktop/ImportedCSV0.csv: UTF-8 Unicode (with BOM) text, with CRLF line terminators
participants (6)
-
Alfred Nordman
-
Fabian Groffen
-
Martin Kersten
-
Niels Nes
-
Stefan Manegold
-
Wouter Alink