Re: ALTER TABLE ALTER COLUMN SET STORAGE

-------- Forwarded Message --------
Subject: Re: ALTER TABLE ALTER COLUMN SET STORAGE
Date: Tue, 10 Sep 2019 15:15:09 +0200
From: aris <aris.koning@monetdbsolutions.com>
To: developers-list-request@monetdb.org

Hi Daniel,

On 10-09-19 12:00, developers-list-request@monetdb.org wrote:

If the question is whether you can plug the Mosaic library into an existing release out of the box, then the answer is no. There are some slight modifications in the GDK layer to accommodate the Mosaic module, and to interact with it from SQL there are also some code changes in the SQL layer. Besides those dependencies, I don't expect any issues with the particular (x)api frameworks, but nothing is guaranteed, obviously. It sounds like you want to back-port it, with some hacking, into custom builds of earlier releases. I wouldn't give it zero chance of success, but I do wish you much luck :)
Your observation about the joint life cycle of a Mosaic structure and its original column file is correct: currently Mosaic adds a compressed representation next to the existing uncompressed column. For the first milestone on the Mosaic road map we want to successfully apply compression to read-only pre-existing columns, where the purpose of compression is to potentially accelerate analytical queries on those columns. However, we are still looking into potentially freeing the uncompressed column once a compressed Mosaic heap is available. This would accommodate compression for the more traditional purpose of limiting the memory and/or disk footprint.
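To make the dual storage described above concrete, here is a minimal C sketch; the type and field names are illustrative assumptions of this note, not the actual GDK BAT descriptor or Mosaic heap layout:

    #include <stddef.h>

    /* Hypothetical sketch of the dual-storage idea; not the real
       GDK structures. */
    typedef struct {
        void   *tail;    /* original, uncompressed column heap */
        void   *mosaic;  /* compressed representation, NULL if absent */
        size_t  count;   /* number of values in the column */
    } column_t;

    /* Scans prefer the compressed heap once it has been built; the
       uncompressed heap currently stays around, hence the double
       footprint. */
    static const void *
    column_scan_source(const column_t *c)
    {
        return c->mosaic != NULL ? c->mosaic : c->tail;
    }

The point is that the Mosaic heap is an addition, not a replacement, which is why freeing the uncompressed column is what would turn this into a footprint win.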
I think it is an interesting idea. But I think it is part of a more general goal/problem: how to handle updates on compressed data. There are internal discussions on this topic, but whatever the outcome, this will only be relevant for a much later milestone on the road map.
> Thank you, Dan
Hope it helps.

Kind regards,
Aris

Thank you so much for your answer, Aris.

The good news is that compression is being considered and is going to be part of MonetDB in two releases or so (about one year). I will try to see if I can use the current code in a custom build; I am quite curious how that will affect performance. To be honest, I don't expect performance issues: even in the SSD era it is still faster to read compressed data and decompress it in memory, if the right ratio is there, of course, and I do expect a decent compression ratio on most of the data. However, at this stage Mosaic's dual compressed/uncompressed storage will obviously not give me the gain I need. Still, it is interesting to understand at least how query performance might look in the future.

The bad news is, of course, that the compression feature is going to happen in ... one year. But if it also comes with support for compressing in-memory results (I know, that wasn't promised), it might be worth the wait.

Best regards,

On Wed, Sep 11, 2019 at 12:18 PM aris <aris.koning@monetdbsolutions.com> wrote:
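As a back-of-envelope check of Daniel's point above that reading compressed data plus in-memory decompression can beat reading raw data, here is a small C sketch; every number in it is an assumption chosen for illustration, not a measurement:

    #include <stdio.h>

    /* Break-even model: the compressed path wins when
       (raw/ratio)/disk_bw + raw/decomp_bw < raw/disk_bw.
       I/O and decompression are assumed not to overlap, which is
       pessimistic for the compressed case. */
    int main(void)
    {
        double raw_gb    = 10.0; /* uncompressed size, GB (assumed) */
        double ratio     = 3.0;  /* compression ratio (assumed) */
        double disk_bw   = 2.0;  /* SSD read bandwidth, GB/s (assumed) */
        double decomp_bw = 5.0;  /* lz4-class decompression, GB/s (assumed) */

        double t_raw  = raw_gb / disk_bw;
        double t_comp = (raw_gb / ratio) / disk_bw + raw_gb / decomp_bw;

        printf("raw: %.2f s, compressed: %.2f s\n", t_raw, t_comp);
        return 0;
    }

With these assumed numbers the compressed path wins (3.67 s versus 5.00 s), and overlapping I/O with decompression would widen the gap; with a poor ratio or slow codec the inequality flips, which is the "right ratio" caveat above.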

Hello Aris,

With the latest code adjustments (thank you for that), the library does what it is supposed to do. For anyone who may want to try, I confirm that it can be integrated as-is into a recent version with a decent effort (it requires a bit of understanding of the MonetDB code logic). Here are some remarks/questions I have after checking out the code a bit (from an extensibility perspective) and its functionality so far.

I am curious whether the reason behind a constant MOSAICMAXCNT used for all types is related to the thetaselect or projection implementations (do the blocks need to have their counters aligned?). The truth is that for most of the compression techniques I have looked into (not from the MOSAIC project, but lz4, zstd, and blosc2 as general compressors, zfp and sz for floating-point numbers, Daniel Lemire's compression for integers, Judy and ART dictionaries for strings) the size of the to-be-compressed block is usually what matters most. Block sizes that fit in the L1, L2, or L3 cache usually give better results for certain compressors. Sure, a constant block counter per BAT matters, but would it be a problem to have, for instance, something like MOSAICMAXCNT/(sizeof(type) or twidth)?

The MOSAIC pipe optimizer, being derived from the sequential one, will inherit the performance penalties of the latter. Yet after I checked the results of mosaic.analysis, I am not even sure that the columns with a factor close to 1 were actually considered for compression (I guess the RAW alternative is used if the gain is under a certain threshold), therefore I have no clue how reliable the testing benchmarks are (for those cases am I mostly testing the RAW codec?). Anyway, for what I have tested, times seem to roughly double when the mosaic pipe is used, it being only slightly slower than sequential_pipe.

I assume that a complex, constant-size header for all compressions (MOSAICHEADER) is convenient to maintain, but I am wondering how that will look if more compressors are added. Is it mandatory to have a common header size for all mosaic heaps?

Thank you,

On Wed, Sep 11, 2019 at 1:53 PM Daniel Zvinca <daniel.zvinca@logbis.com> wrote:
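A literal rendering of the MOSAICMAXCNT/twidth idea raised above, as a hedged C sketch; the placeholder constant value and helper name are assumptions of this note, not the project's code:

    #include <stddef.h>

    /* Placeholder; the real MOSAICMAXCNT lives in the Mosaic sources. */
    #ifndef MOSAICMAXCNT
    #define MOSAICMAXCNT (1024 * 1024)
    #endif

    /* The suggestion, roughly: scale the per-block element count by the
       value width so that the block's byte size, not its element count,
       stays constant and can be aimed at a cache level. */
    static inline size_t
    mosaic_block_cnt(size_t twidth)   /* twidth = bytes per value */
    {
        return MOSAICMAXCNT / twidth; /* wider types -> shorter blocks */
    }

With a byte budget of 256 KiB, for example, this would yield 64 Ki elements per block for 4-byte integers but 32 Ki for doubles, keeping each block within the same cache level.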