Re: NThreads Parameter and the Performance Related to the Number of Database Instances

10 Apr 2015

On 15-04-09 03:10 AM, Martin Kersten wrote:
...
Hi
...
On 09/04/15 07:53, Joseph Mate wrote:
Hello MonetDB Users,
In short:
1) What does the nthreads setting mean?
It is the minimal number of worker threads per mserver5 instance and the basis for multi-core parallel processing. The number of worker threads can be set as a command-line parameter.
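For illustration only (this is not from the thread): the monetdb man page [1] lists a per-database property `nthreads`, which can be changed with `monetdb set nthreads=N <database>`. The sketch below just builds the command lines for stopping a database, setting the property and starting it again; it assumes the new value is picked up on restart, and the database name `tpch1` and the helper name are hypothetical.

```python
from typing import List


def nthreads_commands(dbname: str, nthreads: int) -> List[List[str]]:
    """Build monetdb CLI calls that change the nthreads property of one database.

    Assumption: the database is stopped before the property is changed and
    started again afterwards so the new worker-thread count is used.
    """
    if nthreads < 1:
        raise ValueError("nthreads must be at least 1")
    return [
        ["monetdb", "stop", dbname],
        ["monetdb", "set", f"nthreads={nthreads}", dbname],
        ["monetdb", "start", dbname],
    ]


if __name__ == "__main__":
    # "tpch1" is a hypothetical database name; only prints the commands.
    for cmd in nthreads_commands("tpch1", 8):
        print(" ".join(cmd))
```

Running the printed commands requires the `monetdb` tool on the PATH and a `monetdbd` daemon serving the farm.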
Can multiple worker threads work on a single query?
...
...
2) Why does performance increase as you increase the number of database instances of a single farm? Is there any way to avoid this?
No immediate answer. Your experiment seems to be a mixture of hot and cold runs, which affect the file system caches.
The small database size leads to keeping all data more-or-less in memory.
I forgot to mention that I allowed the TPCH workload to run for five 
minutes before taking measurements to ensure measurements were taken 
during steady state.
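A minimal sketch of applying such a five-minute warm-up when post-processing results, assuming each completed query was logged as a (completion time since start, response time) pair in seconds. That record layout and the function name are assumptions, not how the measurements in this thread were actually collected.

```python
from typing import List, Tuple

# (completion time in seconds since the run started, response time in seconds)
Sample = Tuple[float, float]


def steady_state(samples: List[Sample], warmup_s: float = 300.0) -> List[Sample]:
    """Keep only queries that completed after the warm-up window (default: five minutes)."""
    return [s for s in samples if s[0] >= warmup_s]


if __name__ == "__main__":
    runs = [(10.0, 2.5), (299.9, 0.8), (300.0, 0.7), (450.0, 0.6)]
    print(steady_state(runs))  # [(300.0, 0.7), (450.0, 0.6)]
```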

Thank you!
Joseph
...
regards, Martin
...
In long:
What does the nthreads setting mean? From the manpage[1] it's the number of worker threads that perform main processing. Is this the total number of threads for that database instance? Or is it per query? I've compared the performance of different settings. I find it strange that nthreads=8 would perform the best because I have 12 cores. I confirmed the number of cores by checking nproc. Here are my plots showing that 8 threads performs best on my system:
Throughput: https://cs.uwaterloo.ca/~jmate/nthreads-and-database-instances/throughput-vs...
Average Response Time: https://cs.uwaterloo.ca/~jmate/nthreads-and-database-instances/rsptm-avg-vs-...
99th Percentile Response Time: https://cs.uwaterloo.ca/~jmate/nthreads-and-database-instances/rsptm-p99-vs-...
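The three plotted metrics can be computed from per-query response times roughly as follows. This is a sketch: the thread does not say how the percentile was computed, so the nearest-rank method used here is an assumption, as is the function name.

```python
from statistics import mean
from typing import Dict, List


def summarize(response_times: List[float], window_s: float) -> Dict[str, float]:
    """Throughput (queries/s), average and 99th-percentile response time.

    The 99th percentile uses the nearest-rank method: the smallest sample such
    that at least 99% of all samples are less than or equal to it.
    """
    if not response_times or window_s <= 0:
        raise ValueError("need samples and a positive measurement window")
    ordered = sorted(response_times)
    n = len(ordered)
    rank = (99 * n + 99) // 100  # ceil(0.99 * n) in integer arithmetic, 1-based
    return {
        "throughput_qps": n / window_s,
        "avg_s": mean(ordered),
        "p99_s": ordered[rank - 1],
    }
```

Integer arithmetic for the rank avoids floating-point rounding at the percentile boundary.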
The second thing is: why does the performance increase if I distribute the concurrent clients over multiple database instances on the same data farm? I expected the opposite; performance should decrease. I thought I would increase the overhead by adding more database instances to the same farm. Is there any way to avoid this? Below are the plots I have. I used 40 concurrent TPC-H clients for all:
Throughput: https://cs.uwaterloo.ca/~jmate/nthreads-and-database-instances/throughput-vs...
Average Response Time: https://cs.uwaterloo.ca/~jmate/nthreads-and-database-instances/rsptm-avg-vs-...
99th Percentile Response Time: https://cs.uwaterloo.ca/~jmate/nthreads-and-database-instances/rsptm-p99-vs-...
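The thread does not say how the 40 clients were spread over the instances; one plausible scheme is round-robin, sketched below. The instance names and the helper are hypothetical.

```python
from collections import Counter
from typing import List


def assign_clients(num_clients: int, instances: List[str]) -> List[str]:
    """Round-robin: client i connects to instances[i % len(instances)]."""
    if not instances:
        raise ValueError("need at least one database instance")
    return [instances[i % len(instances)] for i in range(num_clients)]


if __name__ == "__main__":
    dbs = [f"tpch{k}" for k in range(1, 4)]  # hypothetical instance names
    plan = assign_clients(40, dbs)
    print(Counter(plan))  # how many clients each instance receives
```

With 40 clients and 3 instances, one instance gets 14 clients and the other two get 13, so the load per instance is not perfectly even when 40 is not a multiple of the instance count.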
Context:
To give you guys some context on my project: I'm a Master's student doing my research on DBaaS tenant placement. I am evaluating some placement algorithms by using the TPC-H workload and MonetDB as the database. I am using separate database instances on a single farm for isolation.
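For readers unfamiliar with this setup, a sketch of the commands that create one farm and several databases in it. `monetdbd` manages the farm and starts a separate mserver5 process per database, and `monetdb create` leaves a new database in maintenance mode until `monetdb release` is run. The farm path and database names are hypothetical, and the helper only builds the command lists.

```python
from typing import List


def farm_setup_commands(farm_path: str, dbnames: List[str]) -> List[List[str]]:
    """Commands for one farm served by a single monetdbd, holding several databases.

    Each database later runs in its own mserver5 process, while all of them
    share the same farm directory and daemon.
    """
    cmds = [["monetdbd", "create", farm_path], ["monetdbd", "start", farm_path]]
    for name in dbnames:
        cmds.append(["monetdb", "create", name])
        # New databases start in maintenance mode; release them so clients can connect.
        cmds.append(["monetdb", "release", name])
    return cmds


if __name__ == "__main__":
    # Hypothetical farm path and database names.
    for cmd in farm_setup_commands("/tmp/tpch-farm", ["tpch1", "tpch2"]):
        print(" ".join(cmd))
```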
The workload I'm testing with runs read-only queries from the TPC-H benchmark. Each TPC-H client is a thread with its own persistent connection to the database, running the following pseudocode:
while true
    for queryNum in 1 ... 22    # except query 15, which creates a tmp table
        run query queryNum
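A runnable sketch of that client loop using Python threads. The actual database call is left abstract: `run_query` is a hypothetical callable standing in for "execute TPC-H query n on this client's persistent connection", and a shared stop event replaces the infinite loop so a run can end after a fixed duration.

```python
import threading
import time
from typing import Callable

SKIPPED = {15}  # query 15 creates a temporary table, so the workload leaves it out


def tpch_client(run_query: Callable[[int], None], stop: threading.Event) -> None:
    """One client thread: run queries 1..22 in order, skipping 15, until stopped."""
    while not stop.is_set():
        for query_num in range(1, 23):
            if stop.is_set():
                return
            if query_num in SKIPPED:
                continue
            run_query(query_num)


def run_clients(make_runner: Callable[[int], Callable[[int], None]],
                num_clients: int, duration_s: float) -> None:
    """Start num_clients client threads, let them run for duration_s, then stop them.

    make_runner(i) should return client i's run_query callable (hypothetical),
    e.g. one bound to that client's own connection.
    """
    stop = threading.Event()
    threads = [threading.Thread(target=tpch_client, args=(make_runner(i), stop))
               for i in range(num_clients)]
    for t in threads:
        t.start()
    time.sleep(duration_s)
    stop.set()
    for t in threads:
        t.join()
```

Checking the stop event between queries lets every thread finish its current query and exit cleanly instead of being killed mid-query.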
Each database instance has 100MB of data.
Thank you for your time,
Joseph
References:
[1] https://www.monetdb.org/Documentation/monetdb-man-page
_______________________________________________
users-list mailing list
users-list at monetdb.org
https://www.monetdb.org/mailman/listinfo/users-list