
Hi Joseph,

a few questions and comments on your observations:

Which version of MonetDB are you using? With some older versions, nthreads was indeed (incorrectly) the number of threads per client context; with recent versions, however, it is the total number of worker threads of the server, i.e., the number of threads that do the actual query evaluation.

Performance, specifically parallel performance and in particular multi-core performance, is anything but simple, especially in a data management context where data access plays a more important role than pure compute power. Thus, more cores/threads does not necessarily mean higher performance.

A first question, in order to be able to interpret your results, is whether your machine has 12 physical cores or 12 virtual cores (due to hyperthreading), i.e., only 6 physical cores. If the latter, did you also try with 6 threads? Hyperthreading helps to improve performance only if some threads stall on a resource, so that other threads can use the idle instruction units.

Also, is your machine a single-socket or a multi-socket machine, i.e., could NUMA effects play a role?

Finally, is your machine idle except for the database server? Where do your clients run? On the same machine? Then they might also impact your results, in particular when trying to use all 12 cores for the server, which is then competing with the clients. In particular with your tiny 100MB database and thus fast query responses, clients will be rather busy constantly receiving results and issuing new queries. One explanation could be that with 8 server threads an "equilibrium" is reached such that 8 cores are busy with server threads and 4 with clients. The peak performance with exactly 4 clients IMHO supports this idea. Did you check your system load?

To fully understand the behavior, you might also want to use numbers of server threads and clients that are not powers of two, and not only because your machine has a non-power-of-two number of cores.
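To answer the hardware questions above on a Linux box, something along these lines might help. It is only a rough sketch: it assumes an x86-style /proc/cpuinfo that carries "physical id" and "core id" fields (some architectures do not), and the function names are made up for illustration.

```python
import os


def count_physical_cores(cpuinfo_text):
    """Return (sockets, physical_cores) from /proc/cpuinfo-style text.

    Assumption: each logical CPU block has "physical id" and "core id"
    lines, as on typical x86 Linux systems. Blocks without them are skipped.
    """
    sockets = set()
    cores = set()
    for block in cpuinfo_text.strip().split("\n\n"):
        fields = {}
        for line in block.splitlines():
            if ":" in line:
                key, _, value = line.partition(":")
                fields[key.strip()] = value.strip()
        if "physical id" in fields and "core id" in fields:
            sockets.add(fields["physical id"])
            cores.add((fields["physical id"], fields["core id"]))
    return len(sockets), len(cores)


def describe_machine(path="/proc/cpuinfo"):
    """Collect logical/physical core counts and the current load averages."""
    with open(path) as f:
        sockets, physical = count_physical_cores(f.read())
    return {
        "logical_cpus": os.cpu_count(),   # what nproc typically reports
        "sockets": sockets,               # more than 1 hints at possible NUMA effects
        "physical_cores": physical,
        "load_avg_1_5_15_min": os.getloadavg(),
    }
```

If logical_cpus comes out as twice physical_cores, hyperthreading is most likely enabled, and a run with nthreads set to the physical core count would be worth adding to the comparison.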
For the multi-DB-instance case, I don't have any idea yet. However, multiple DB instances also means multiple servers; thus, I assume your thread numbers are per server? Also, your data will be replicated, and data access on disk and in memory will be different...

Best,
Stefan

----- Original Message -----
Hello MonetDB Users,
In short: 1) What does the nthreads setting mean? 2) Why does performance increase as you increase the number of database instances of a single farm? Is there any way to avoid this?
In long: What does the nthreads setting mean? From the manpage[1] it's the number of worker threads that perform main processing. Is this the total number of threads for that database instance? Or is it per query? I've compared the performance of different settings. I find it strange that nthreads=8 would perform the best because I have 12 cores. I confirmed the number of cores by checking nproc. Here are my plots showing that 8 threads performs best on my system:
Throughput: https://cs.uwaterloo.ca/~jmate/nthreads-and-database-instances/throughput-vs...
Average Response Time: https://cs.uwaterloo.ca/~jmate/nthreads-and-database-instances/rsptm-avg-vs-...
99th Percentile Response Time: https://cs.uwaterloo.ca/~jmate/nthreads-and-database-instances/rsptm-p99-vs-...
The second thing is: why does the performance increase if I distribute the concurrent clients over multiple database instances on the same data farm? I expected the opposite; performance should decrease. I thought I would increase the overhead by adding more database instances to the same farm. Is there any way to avoid this? Below are the plots I have. I used 40 concurrent TPC-H clients for all:
Throughput: https://cs.uwaterloo.ca/~jmate/nthreads-and-database-instances/throughput-vs...
Average Response Time: https://cs.uwaterloo.ca/~jmate/nthreads-and-database-instances/rsptm-avg-vs-...
99th Percentile Response Time: https://cs.uwaterloo.ca/~jmate/nthreads-and-database-instances/rsptm-p99-vs-...
Context: To give you guys some context on my project: I'm a Master's student doing my research on DBaaS tenant placement. I am evaluating some placement algorithms by using the TPC-H workload and MonetDB as the database. I am using separate database instances on a single farm for isolation.
The workload I'm testing with runs read-only queries from the TPC-H benchmark. Each TPC-H client is a thread with its own persistent connection to the database, running the following pseudocode:
    while true
        for queryNum 1 ... 22  # except query 15, which creates a tmp table
            run query queryNum
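For reference, the loop above could be sketched in Python roughly as follows. This is not the actual benchmark driver: make_client is a hypothetical factory that opens one persistent connection and returns a callable that runs the TPC-H query with the given number, and the fixed run duration is an assumption added so the sketch terminates.

```python
import threading
import time

# Queries 1..22, skipping 15 because it creates a temporary table.
QUERY_NUMBERS = [q for q in range(1, 23) if q != 15]


def client_loop(run_query, stop_event, counts, client_id):
    """Cycle through the query set until stop_event is set."""
    completed = 0
    while not stop_event.is_set():
        for query_num in QUERY_NUMBERS:
            if stop_event.is_set():
                break
            run_query(query_num)
            completed += 1
    counts[client_id] = completed  # each thread writes only its own slot


def run_clients(make_client, num_clients, duration_s):
    """Start num_clients threads, each with its own client, for duration_s seconds.

    make_client is hypothetical: it should open one persistent connection
    and return a callable run_query(query_num).
    """
    stop_event = threading.Event()
    counts = [0] * num_clients
    clients = [make_client() for _ in range(num_clients)]
    threads = [
        threading.Thread(target=client_loop,
                         args=(clients[i], stop_event, counts, i))
        for i in range(num_clients)
    ]
    for t in threads:
        t.start()
    time.sleep(duration_s)
    stop_event.set()
    for t in threads:
        t.join()
    return counts  # completed queries per client
```

Dividing the sum of the returned counts by duration_s gives a throughput figure in queries per second.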
Each database instance has 100MB of data.
Thank you for your time, Joseph
References: [1] https://www.monetdb.org/Documentation/monetdb-man-page
-- | Stefan.Manegold@CWI.nl | DB Architectures (DA) | | www.CWI.nl/~manegold/ | Science Park 123 (L321) | | +31 (0)20 592-4212 | 1098 XG Amsterdam (NL) |