
Hi Sjoerd,

I tried to use iconv with no luck; I always get an empty string. I assumed the encoding I get from sprintf is "ISO-8859-1". Can you please take a look at the following implementation:

    str
    UDFyearbracket(str *ret, const date *v)
    {
        if (*v == date_nil) {
            *ret = GDKstrdup(str_nil);
        } else {
            iconv_t cv = iconv_open("UTF-8", "ISO-8859-1");
            int factor = 4;
            size_t fromlen, tolen;
            int year;
            char *buf;
            char *retChar = (char *)*ret;
            fromdate(*v, NULL, NULL, &year);
            buf = (char *) GDKmalloc(15);
            sprintf(buf, "%d", year);
            fromlen = strlen(buf);
            tolen = factor * fromlen + 1;
            retChar = (char *) GDKmalloc(tolen);
            iconv(cv, &buf, &fromlen, &retChar, &tolen);
            iconv_close(cv);
        }
        return MAL_SUCCEED;
    }

Thanks

On Thu, Dec 29, 2016 at 1:08 PM, Sjoerd Mullender <sjoerd@monetdb.org> wrote:
Since the MonetDB server is UTF-8 *only*, you should *never* have non-UTF-8 strings inside the server. If you have strings in some other encoding, they should be converted to UTF-8 by whatever client program you're using. mclient has options to do this (-e option). If you want to do conversions yourself, take a look at the iconv related code in common/stream/stream.c. Also, the console_read and console_write functions in that file can give you inspiration. They convert Windows wide characters (16-bit encodings of Unicode code points) to and from UTF-8. This would be close to converting ints to UTF-8.
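For readers who want to try the iconv route mentioned above, here is a minimal sketch, assuming a POSIX iconv implementation is available. The helper name latin1_to_utf8 and the use of malloc (rather than a MonetDB allocator) are illustrative assumptions and are not taken from stream.c. One detail that is easy to miss: iconv() advances the input and output pointers it is given, so the start of the output buffer has to be saved before the call.

```c
#include <iconv.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper (not part of MonetDB): convert a NUL-terminated
 * ISO-8859-1 string to a newly malloc'ed UTF-8 string.
 * Returns NULL on failure; the caller frees the result. */
char *
latin1_to_utf8(const char *src)
{
    iconv_t cd = iconv_open("UTF-8", "ISO-8859-1");
    if (cd == (iconv_t) -1)
        return NULL;

    size_t inleft = strlen(src);
    /* each ISO-8859-1 byte becomes at most 2 UTF-8 bytes, plus the NUL */
    size_t outsize = 2 * inleft + 1;
    size_t outleft = outsize - 1;
    char *out = malloc(outsize);
    if (out == NULL) {
        iconv_close(cd);
        return NULL;
    }

    char *inp = (char *) src;   /* iconv takes char **; the input is not modified */
    char *outp = out;           /* iconv advances outp; out keeps the start */

    size_t rc = iconv(cd, &inp, &inleft, &outp, &outleft);
    iconv_close(cd);
    if (rc == (size_t) -1) {
        free(out);
        return NULL;
    }
    *outp = '\0';               /* iconv does not write a terminator */
    return out;
}
```

The result is read through out, not outp: after the call outp points just past the last converted byte, which is why reading from the advanced pointer looks like an empty string.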
Hi again Sjoerd,
After digging in the code I found GDKstrFromStr. Does this function handle conversion from a normal string to a UTF-8 string? Is this the correct way to call it?
    str
    UDFyearbracket(str *ret, const date *v)
    {
        if (*v == date_nil) {
            *ret = GDKstrdup(str_nil);
        } else {
            int year;
            fromdate(*v, NULL, NULL, &year);
            *ret = (str) GDKmalloc(15);
            sprintf(*ret, "%d", year);
            GDKstrFromStr((unsigned char *)*ret, (unsigned char *)*ret, 15);
        }
        return MAL_SUCCEED;
    }
Thank you.
On Wed, Dec 28, 2016 at 11:40 PM, imad hajj chahine <imad.hajj.chahine@gmail.com> wrote:
Thank you Sjoerd,
Any idea how to convert an integer to a UTF-8 string? Does sprintf come in a variant that can handle UTF-8?
Thank you.
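On the question above: the characters that "%d" produces (an optional minus sign and the digits 0-9) are all 7-bit ASCII, and ASCII is a subset of UTF-8, so the output of sprintf for an integer is already valid UTF-8 without any conversion. A minimal sketch in plain C follows; the helper names year_to_str and is_ascii are hypothetical, malloc stands in for whatever allocator the server expects, and snprintf is used instead of sprintf only to bound the write.

```c
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical helper: format a year as a newly malloc'ed decimal string.
 * Returns NULL on failure; the caller frees the result. */
char *
year_to_str(int year)
{
    /* 12 bytes hold any 32-bit int: sign, 10 digits, terminating NUL */
    size_t size = 12;
    char *buf = malloc(size);
    if (buf == NULL)
        return NULL;
    int n = snprintf(buf, size, "%d", year);
    if (n < 0 || (size_t) n >= size) {
        free(buf);
        return NULL;
    }
    return buf;
}

/* Returns 1 if every byte has its high bit clear (pure ASCII), else 0. */
int
is_ascii(const char *s)
{
    for (; *s != '\0'; s++)
        if ((unsigned char) *s & 0x80)
            return 0;
    return 1;
}
```

Calling is_ascii on the result of year_to_str returns 1 for any year, which is the sense in which no UTF-8 variant of sprintf is needed here.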
On Wed, Dec 28, 2016 at 11:08 PM, Sjoerd Mullender <sjoerd@monetdb.org> wrote:
See https://dev.monetdb.org/hg/MonetDB-extend/ for a tutorial on how to create a UDF in C. You can use the URL to clone from.
On 12/29/2016 01:10 AM, imad hajj chahine wrote:
On 12/28/2016 09:28 PM, Alberto Ferrari wrote:
> Imad, I hope you succeed with this. Please comment if you get it working.
> Could those new functions then be incorporated into a future version of
> Monet? Or maybe easily compiled into the current one? That way, in the
> future, users could suggest new useful functions (shame about SQL UDF
> performance).
>
> Regards!
>
> 2016-12-28 14:48 GMT-03:00 imad hajj chahine <imad.hajj.chahine@gmail.com>:
>
> > Hi,
> >
> > After reviewing all the other alternatives like SQL and Python UDFs,
> > I was stuck either on performance with SQL UDFs or on usability with
> > Python UDFs (unable to use them with aggregation, and not such great
> > performance with dates),
> >
> > so I decided to go the hard way with C functions. As a bonus it will
> > give me the possibility to change the functionality without
> > worrying about dependencies, which was not the case in other languages.
> >
> > The purpose is to create a set of formatting functions for Year,
> > Quarter, Month, Week and Day brackets, and of course I need to
> > create the bulk version of each function for performance.
> >
> > Starting from MTIMEdate_extract_year_bulk, I now have the simple
> > function working, and I am successfully calling it from mclient:
> >
> > str
> > UDFyearbracket(str *ret, const date *v)
> > {
> >     if (*v == date_nil) {
> >         *ret = GDKstrdup(str_nil);
> >     } else {
> >         int year;
> >         fromdate(*v, NULL, NULL, &year);
> >         *ret = (str) GDKmalloc(15);
> >         sprintf(*ret, "%d", year);
> >     }
> >     return MAL_SUCCEED;
> > }
> >
> > For the bulk version I get an error in the log:
> > gdk_atoms.c:1345: strPut: Assertion `(v[i] & 0x80) == 0' failed.
> >
> > str
> > UDFBATyearbracket(bat *ret, const bat *bid)
> > {
> >     BAT *b, *bn;
> >     BUN i,n;
> >     str *y;
> >     const date *t;
> >
> >     if ((b = BATdescriptor(*bid)) == NULL)
> >         throw(MAL, "UDF.BATyearbracket", "Cannot access descriptor");
> >     n = BATcount(b);
> >
> >     bn = COLnew(b->hseqbase, TYPE_str, BATcount(b), TRANSIENT);
> >     if (bn == NULL) {
> >         BBPunfix(b->batCacheid);
> >         throw(MAL, "UDF.BATyearbracket", "memory allocation failure");
> >     }
> >     bn->tnonil = 1;
> >     bn->tnil = 0;
> >
> >     t = (const date *) Tloc(b, 0);
> >     y = (str *) Tloc(bn, 0);
> >     for (i = 0; i < n; i++) {
> >         if (*t == date_nil) {
> >             *y = GDKstrdup(str_nil);
> >         } else
> >             UDFyearbracket(y, t);
> >         if (strcmp(*y, str_nil) == 0) {
> >             bn->tnonil = 0;
> >             bn->tnil = 1;
> >         }
> >         y++;
> >         t++;
> >     }
> >
> >     BATsetcount(bn, (BUN) (y - (str *) Tloc(bn, 0)));
> >
> >     bn->tsorted = BATcount(bn)<2;
> >     bn->trevsorted = BATcount(bn)<2;
> >
> >     BBPkeepref(*ret = bn->batCacheid);
> >     BBPunfix(b->batCacheid);
> >     return MAL_SUCCEED;
> > }
> >
> > PS: I am not a C expert, but I can find my way with basic operations
> > and pointers.
> >
> > Any help or suggestions are appreciated.
> >
> > Thank you.
> >
> > _______________________________________________
> > users-list mailing list
> > users-list@monetdb.org
> > https://www.monetdb.org/mailman/listinfo/users-list
-- Sjoerd Mullender
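
The expression in the assertion quoted above, (v[i] & 0x80) == 0, is true exactly when a byte has its high bit clear, that is, when it is plain ASCII. Since the server accepts only UTF-8, it can help while debugging a UDF to check that a buffer really holds well-formed UTF-8 before returning it. A minimal standalone sketch follows; is_valid_utf8 is a hypothetical helper written for illustration and is not a MonetDB function.

```c
#include <stddef.h>

/* Hypothetical debugging helper (not part of MonetDB):
 * returns 1 if the len bytes at s are well-formed UTF-8, else 0. */
int
is_valid_utf8(const unsigned char *s, size_t len)
{
    size_t i = 0;

    while (i < len) {
        unsigned char c = s[i];
        size_t n;           /* number of continuation bytes expected */
        unsigned int cp;    /* decoded code point */

        if ((c & 0x80) == 0) {          /* ASCII: single byte */
            i++;
            continue;
        } else if ((c & 0xE0) == 0xC0) {
            n = 1; cp = c & 0x1F;
        } else if ((c & 0xF0) == 0xE0) {
            n = 2; cp = c & 0x0F;
        } else if ((c & 0xF8) == 0xF0) {
            n = 3; cp = c & 0x07;
        } else {
            return 0;       /* stray continuation byte or invalid lead byte */
        }

        if (len - i <= n)
            return 0;       /* sequence truncated at end of buffer */
        for (size_t k = 1; k <= n; k++) {
            if ((s[i + k] & 0xC0) != 0x80)
                return 0;   /* not a continuation byte */
            cp = (cp << 6) | (s[i + k] & 0x3F);
        }

        /* reject overlong encodings, surrogates, and values above U+10FFFF */
        if ((n == 1 && cp < 0x80) || (n == 2 && cp < 0x800) ||
            (n == 3 && cp < 0x10000))
            return 0;
        if (cp >= 0xD800 && cp <= 0xDFFF)
            return 0;
        if (cp > 0x10FFFF)
            return 0;

        i += n + 1;
    }
    return 1;
}
```

A buffer such as "2016" passes this check, while a lone ISO-8859-1 byte such as 0xE9 does not. If a string made only of digits appears to fail a check like this, the bytes being examined are probably not the ones that sprintf wrote.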