Dear Tapomay,
            
            The error message is quite specific and certainly calls for the
            SQL schema/query to analyse. You might send me the MAL plan of that
            query (using the EXPLAIN command at SQL level).
            
            regards, Martin
            
            On 2/11/13 3:53 PM, Tapomay Dey wrote:
            > After analysing the logs I see the following as a lone single ERR
            > statement different from the rest of the repeating ERR logs stating
            > (crashed, manual intervention needed):
            > 2013-02-10 21:09:57 ERR msearch_stats_db[3073]: mserver5:
            > mal_dataflow.c:587: runMALdataflow: Assertion `workers[0]' failed.
            >
            > I also see "ERR merovingian[16831]: client error: unknown or impossible
            > state: 4" in the later stages.
            >
            > Thanks and Regards,
            > Tapomay.
            >
            > ------------------------------------------------------------------------
            > *From:* Tapomay Dey <tapomay@yahoo.com>
            > *To:* Communication channel for MonetDB users <users-list@monetdb.org>
            > *Sent:* Monday, February 11, 2013 4:46 PM
            > *Subject:* Re: monetdb status health
            >
            > Great info. Thanks a lot.
            > I am going to keep my db in this state. I will be able to perform your
            > suggestions tomorrow.
            > Until then this is what I see in the logs when I do a fresh monetdb start:
            > 2013-02-11 10:49:46 MSG merovingian[31990]: database 'msearch_stats_db'
            > (32018) was killed by signal SIGSEGV
            > 2013-02-11 10:49:46 ERR control[31990]: (local): failed to fork mserver:
            > database 'msearch_stats_db' has crashed after starting, manual
            > intervention ...
            >
            > FYI I have not included my UDF. I have done a basic configure-make-make
            > install with no modifications/extra options on Ubuntu 12.10 64 bit
            > (mercurial changeset: 46861:45c89b2e2ac2 Wed Feb 06 11:42:37).
            > Usage profile: There is a constant transactional insert/update load of
            > the order of 100 update attempts per second. There is a "select 1;"
            > fired every 10 seconds to check if the DB is alive. There has been no
            > significant select load yet (we are still loading historical data into
            > the db).
            >
            > Thanks and Regards,
            > Tapomay.
            >
            >
            > ------------------------------------------------------------------------
            > *From:* Stefan Manegold <Stefan.Manegold@cwi.nl>
            > *To:* Communication channel for MonetDB users <users-list@monetdb.org>
            > *Sent:* Monday, February 11, 2013 12:17 PM
            > *Subject:* Re: monetdb status health
            >
            > Tapomay,
            >
            > in addition to what Martin suggests, please consider also checking the
            > merovingian log for all details, in particular any server error messages.
            > To understand the cause of the problem, it is crucial to know the exact
            > error messages, in fact the exact sequence of events that led to the
            > current situation.
            > So, what did you do when (or just before) the server crashed first on
            > your database? What did you do then? Each step (and its outcome
            > including both error messages on the client and server errors in the
            > merovingian log) is important.
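[Editorial sketch, not part of the original message.] One way to collect the sequence of events Stefan asks for is to pull every ERR line and every "killed by signal" line out of the merovingian log, in order. The snippet below is a minimal sketch that assumes the line layout seen in the excerpts quoted in this thread (timestamp, level, source[pid], message); the names `LOG_LINE` and `crash_events` are made up for illustration and are not part of MonetDB.

```python
import re

# Line layout inferred from the log excerpts quoted in this thread, e.g.
#   2013-02-11 10:49:46 MSG merovingian[31990]: database '...' ...
# This is an assumption; other monetdbd versions may log differently.
LOG_LINE = re.compile(
    r"^(?P<ts>\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}) "
    r"(?P<level>[A-Z]+) "
    r"(?P<source>[^\[\s]+)\[(?P<pid>\d+)\]: "
    r"(?P<msg>.*)$"
)


def crash_events(lines):
    """Return (timestamp, level, source, message) tuples for ERR lines
    and for lines reporting that a process was killed by a signal,
    in the order they appear."""
    events = []
    for line in lines:
        m = LOG_LINE.match(line.strip())
        if m is None:
            continue  # continuation line or unknown layout: skip it
        if m["level"] == "ERR" or "killed by signal" in m["msg"]:
            events.append((m["ts"], m["level"], m["source"], m["msg"]))
    return events


# The two sample lines are taken from the messages in this thread
# (re-joined onto one line each).
SAMPLE = [
    "2013-02-11 10:49:46 MSG merovingian[31990]: database 'msearch_stats_db' "
    "(32018) was killed by signal SIGSEGV",
    "2013-02-10 21:09:57 ERR msearch_stats_db[3073]: mserver5: "
    "mal_dataflow.c:587: runMALdataflow: Assertion `workers[0]' failed.",
]

for ts, level, source, msg in crash_events(SAMPLE):
    print(ts, level, source, msg)
```

To run it against a real log, pass the open file object (for example `crash_events(open(path))`, where `path` points at your own merovingian.log) instead of `SAMPLE`.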
            >
            > If you use a genuine (i.e., non-modified) Feb2013 code base, the problem
            > (obviously) exists in that code. If you modified the code locally (e.g.,
            > by adding UDFs), the problem might also be in your code.
            >
            > You might also want to consider building a debug version (i.e.,
            > configured with --disable-optimize --enable-debug --enable-assert),
            > start mserver5 by hand on your database using the exact command line as
            > given in the merovingian log (possibly also in a debugger), and see
            > where (and why?) the crash occurs.
            >
            > Best,
            > Stefan
            >
            > ----- Original Message -----
            >  > Dear Tapomay,
            >  >
            >  > Taking the non-released revision is indeed "living on the edge".
            >  >
            >  > On 2/11/13 6:44 AM, Tapomay Dey wrote:
            >  > > 1. BTW I am running a non-released revision of Feb13 branch. Could
            >  > > this be the reason for such a crash?
            >  > > I am doing so coz I need a fix that Niels had made for fixing a
            >  > > concurrency issue that caused duplicate keys.
            >  > > Also planning to implement group_concat UDF as per the changed
            >  > > semantics of Feb13. I already have a partially running one for Oct12.
            >  > There are a few known cases where it may crash; they are being worked on.
            >  > In the testweb you can find these few cases. In general, you would be
            >  > the unlucky guy if you hit on them immediately. They seem rare.
            >  > >
            >  > > 2. As the DB crashes each time I try to start it I think it's a
            >  > > perfect state to gather more diagnostics. How do I do so?
            >  > > I really need that a DB never reaches a non-recoverable state.
            >  > If it never passes the initialization phase after restart, it is most
            >  > likely a corrupted database. This could happen as a result of a
            >  > hardware failure, or an unknown software error that caused a crash.
            >  > It may be your UDF that went haywire and caused the system to fail.
            >  > If it crashes without your UDF, then a run of the mserver using gdb
            >  > may provide a hint on the whereabouts
            >  > (see calling sequence in merovingian.log to start mserver directly)
            >  >
            >  > My approach would now be:
            >  > 1) restore database from backup (or a small testdb)
            >  > 2) ensure it is working correctly without your UDF
            >  > 3) prepare test cases for your UDF
            >  > 4) add your UDF
            >  > 5) start/stop after the first few calls of code with UDF
            >  >    to observe behavior.
            >  >
            >  > Success, Martin
            >  >
            >  > >
            >  > > My setup is such that there would be non-stop Inserts/updates into
            >  > > the DB 24/7.
            >  > >
            >  > > Thanks and Regards,
            >  > > Tapomay.
            >  > >
            >  > >
            >  > > ------------------------------------------------------------------------
            >  > > *From:* Tapomay Dey <tapomay@yahoo.com>
            >  > > *To:* Communication channel for MonetDB users <users-list@monetdb.org>
            >  > > *Sent:* Monday, February 11, 2013 10:47 AM
            >  > > *Subject:* Re: monetdb status health
            >  > >
            >  > > Thanks a lot.
            >  > > But since the time I asked the question the DB has gone into a
            >  > > state where it keeps logging
            >  > > 2013-02-11 04:12:40 ERR merovingian[15380]: client error: database
            >  > > 'msearch_stats_db' has crashed after starting, manual intervention
            >  > > needed, check monetdbd's logfile for details
            >  > >
            >  > > in merovingian.log.
            >  > >
            >  > > Health is 1%.
            >  > >
            >  > > What can I do at this stage?
            >  > >
            >  > > Thanks and Regards,
            >  > > Tapomay.
            >  > >
            >  > >
            >  > > ------------------------------------------------------------------------
            >  > > *From:* Fabian Groffen <fabian@monetdb.org>
            >  > > *To:* users-list@monetdb.org
            >  > > *Sent:* Sunday, February 10, 2013 11:33 PM
            >  > > *Subject:* Re: monetdb status health
            >  > >
            >  > > On 10-02-2013 09:44:09 -0800, Tapomay Dey wrote:
            >  > >  > My questions are simple:
            >  > >  >
            >  > >  > what causes crashes?
            >  > >
            >  > > The mserver5 (monetdb database) terminates in such a way that it
            >  > > can not be considered a clean shutdown, this is usually the case when
            >  > > the program gets terminated due to a condition that makes further
            >  > > execution impossible, e.g. memory faults.  These are almost always
            >  > > program errors.
            >  > >
            >  > >  > what is health?
            >  > >
            >  > > Health is the percentage of start-stop sequences compared to the
            >  > > number of times the database was actually started.  I.e., how many
            >  > > times a start was followed by a clean shutdown (hence no crash).
            >  > >
            >  > >  > how do we stop health from degrading?
            >  > >
            >  > > You can't, a database that crashes, and keeps on doing so will
            >  > > cause the health of the database to degrade.
            >  > >
            >  > >  > Following is the status of my db-
            >  > >  >  start count: 140
            >  > >  >  stop count: 1
            >  > >  >  crash count: 138
            >  > >
            >  > > So, essentially, every time you start your database, it never
            >  > > reaches a point where you stop it cleanly, but instead your database
            >  > > crashes all the time.
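[Editorial sketch, not part of the original message.] Taking Fabian's description at face value, health is the share of starts that ended in a clean stop. The snippet below works that out for the counts quoted above. It is only an illustration of the definition as worded here, not the code monetdbd actually uses, and the function name `health_percent` is hypothetical.

```python
def health_percent(start_count: int, clean_stop_count: int) -> float:
    """Share of starts that were followed by a clean shutdown, in percent.

    Follows the definition as worded in this thread; the real monetdbd
    computation may differ (e.g. in rounding or in how crashes are counted).
    """
    if start_count <= 0:
        raise ValueError("start_count must be positive")
    return 100.0 * clean_stop_count / start_count


# Counts quoted in this thread: start count 140, stop count 1.
print(f"{health_percent(140, 1):.1f}%")  # prints 0.7%
```

With 140 starts and a single clean stop the value is below one percent, which is why the only way to raise it is to get the database to start and stop cleanly again.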
            >  > >
            >  > >
            >  > > --
            >  > > Fabian Groffen fabian@monetdb.org
            >  > > column-store pioneer http://www.monetdb.org/Home
            >  > > _______________________________________________
            >  > > users-list mailing list
            >  > > users-list@monetdb.org
            >  > > http://mail.monetdb.org/mailman/listinfo/users-list
            >  >
            >
            > --
            > | Stefan.Manegold@CWI.nl | DB Architectures   (DA) |
            > | www.CWI.nl/~manegold/  | Science Park 123 (L321) |
            > | +31 (0)20 592-4212     | 1098 XG Amsterdam  (NL) |
            >
            >
            _______________________________________________
            users-list mailing list
            users-list@monetdb.org
            http://mail.monetdb.org/mailman/listinfo/users-list