
Hi,

I would appreciate some help interpreting the following memory-related issues.

I've got an mserver5 instance running with cgroups v1 constraints:

- memory.limit_in_bytes = 17179869184 (16g)
- memory.memsw.limit_in_bytes = 9223372036854771712

gdk_mem_maxsize is initialised as 0.815 * 17179869184 = 14001593384.

So I get:

sql>select * from env() where name in ('gdk_mem_maxsize', 'gdk_vm_maxsize');
+-----------------+---------------+
| name            | value         |
+=================+===============+
| gdk_vm_maxsize  | 4398046511104 |
| gdk_mem_maxsize | 14001593384   |
+-----------------+---------------+

That looks good.

To my surprise, this instance gets frequently OOM-killed for reaching 16g of RSS (no swap used):

memory: usage 16777216kB, limit 16777216kB, failcnt 244063804
memory+swap: usage 16777964kB, limit 9007199254740988kB, failcnt 0
kmem: usage 0kB, limit 9007199254740988kB, failcnt 0
Memory cgroup out of memory: Kill process 975803 (mserver5) score 1003 or sacrifice child

Now, there are two different aspects: giving a process a memory cap, and making the process respect that cap without getting killed.

- If the process allocates more than defined with cgroups, then it gets killed. That is fine; it doesn't surprise me.
- The question is: why did MonetDB surpass the 16g limit?

Even more surprising, given that it "prudently" initialises itself at 80% of the available memory.

Perhaps I was under the wrong assumption that MonetDB would never allocate more than gdk_mem_maxsize, but now I seem to realise that it simply uses this value to optimise its memory management (e.g. to decide how early to mmap).

So, am I correct that setting gdk_mem_maxsize (indirectly via cgroups or directly via the memmaxsize parameter) does not guarantee that RSS will stay under that value?

If that is true, I am back at square one in my quest for how to cap RSS usage (without getting the process killed).

Thanks for your help.
Roberto
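(Just to show where that 14001593384 comes from: the little sketch below reads the same cgroup file and takes 81.5% of it. It is only a check of the arithmetic, not MonetDB code, and the cgroup mount path is an assumption on my side.)

/* quick hypothetical check of the derivation, not MonetDB code */
#include <stdio.h>

int main(void)
{
    FILE *f = fopen("/sys/fs/cgroup/memory/memory.limit_in_bytes", "r");
    unsigned long long limit;
    if (f == NULL || fscanf(f, "%llu", &limit) != 1)
        return 1;
    fclose(f);
    printf("cgroup limit  = %llu\n", limit);                 /* 17179869184 */
    printf("0.815 * limit = %llu\n",
           (unsigned long long)(0.815 * (double)limit));     /* 14001593384 */
    return 0;
}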

We do not in any way control RSS (resident set size). That is fully under control of the kernel.

gdk_mem_maxsize is the maximum amount of address space we want to allocate using malloc and friends.
gdk_vm_maxsize is the maximum amount of address space we want to allocate (malloc + mmap).

Neither value has anything to do with how much actual, physical memory is being used. They are just measures of how much address space is used, allocated either through malloc or through malloc+mmap. Allocated address space may or may not reside in physical memory. The kernel decides that.

Of course, if you're using the address space (however you got it), there must be physical memory to which the address space is mapped.

The difference between malloc and mmap is mostly where the physical, disk-based backing (if any) for the virtual memory is located, i.e. where the kernel can copy the memory to if it needs space. In the case of mmap (our use of it, anyway) it is files in the file system, and in the case of malloc it is swap (if you have it) or physical memory (if you don't).
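To make that concrete, here is a minimal, hypothetical sketch (nothing to do with MonetDB's code): both allocations below claim a gigabyte of address space, but the dirty pages of the file-backed mapping can be written back to their file and dropped, whereas the malloc'ed pages can only go to swap, or have to stay resident if there is no swap.

#include <fcntl.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <sys/mman.h>
#include <unistd.h>

int main(void)
{
    size_t sz = (size_t)1 << 30;    /* 1 GiB of address space, twice */

    /* anonymous memory: backed by swap if there is any, RAM otherwise */
    char *heap = malloc(sz);
    if (heap == NULL)
        return 1;
    memset(heap, 1, sz);            /* touching the pages makes them resident */

    /* file-backed memory: the kernel can flush dirty pages to the file
       and reclaim the RAM whenever it wants */
    int fd = open("backing.bin", O_RDWR | O_CREAT, 0600);
    if (fd < 0 || ftruncate(fd, (off_t)sz) < 0)
        return 1;
    char *map = mmap(NULL, sz, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);
    if (map == MAP_FAILED)
        return 1;
    memset(map, 1, sz);             /* also resident, until the kernel reclaims it */

    printf("2 GiB of address space allocated; how much of it is resident is up to the kernel\n");

    munmap(map, sz);
    close(fd);
    unlink("backing.bin");
    free(heap);
    return 0;
}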
-- Sjoerd Mullender

Sjoerd,

Thanks for these details. Let me focus on these two concepts:
> Allocated address space may or may not reside in physical memory. The kernel decides that.
Absolutely. Still, you can decide what's the max that malloc() can use:
> gdk_mem_maxsize is the maximum amount of address space we want to allocate using malloc and friends.
Isn't malloc() using RSS + swap to back its allocations? Does that mean that gdk_mem_maxsize should be a cap on what we want to be able to allocate on RSS + swap? In this case, actually, swap usage was 0.

So I still don't understand why mallocs for 16g happened, when gdk_mem_maxsize was 14g.

On 09/03/2021 17.44, Roberto Cornacchia wrote:
> Sjoerd,
> Thanks for these details.
> Let me focus on these two concepts:
> > Allocated address space may or may not reside in physical memory. The kernel decides that.
> Absolutely. Still, you can decide what's the max that malloc() can use:
> > gdk_mem_maxsize is the maximum amount of address space we want to allocate using malloc and friends.
GDK_mem_maxsize is not a hard limit. It's a value on which some decisions having to do with allocating memory are based. GDK_vm_maxsize is a fairly hard limit. We may go over it in critical code (during transaction commit), but otherwise allocations (both malloc and mmap) will fail if you were to go over this limit.
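To put it schematically (a simplified sketch of the idea, not our actual code; the names below are placeholders for the real GDK internals):

#include <stdbool.h>
#include <stddef.h>

/* Hypothetical illustration only, not MonetDB source code. */
extern size_t mem_maxsize;   /* advisory: steers the malloc-vs-mmap decision */
extern size_t vm_maxsize;    /* (fairly) hard: address-space budget */
extern size_t malloced;      /* estimate of currently malloc'ed bytes */
extern size_t vm_cursize;    /* currently allocated address space */

/* advisory: once the malloc budget would be exceeded, prefer a
   memory-mapped file instead of malloc; nothing fails here */
bool prefer_mmap(size_t request)
{
    return malloced + request > mem_maxsize;
}

/* hard(ish): an allocation that would push the address space past
   vm_maxsize is refused outright */
bool vm_allowed(size_t request)
{
    return vm_cursize + request <= vm_maxsize;
}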
> Isn't malloc() using RSS + swap to back its allocations? Does that mean that gdk_mem_maxsize should be a cap on what we want to be able to allocate on RSS + swap? In this case, actually, swap usage was 0.
> So I still don't understand why mallocs for 16g happened, when gdk_mem_maxsize was 14g.
This 16g is a combination of what was allocated (and used) through malloc and through mmap. Forget about memory being a malloc thing. Mmap also uses memory.

Why swap is unused I don't know. As I said, it's the kernel that does that. We have nothing to do with that. It may be you have kernel parameters set that cause the kernel to not use swap.

By using the debugger you can check how much MonetDB thinks it has allocated. Look at the values of the variables GDK_mallocedbytes_estimate and GDK_vm_cursize. But again, that is allocated address space, not memory. And there may be fragmentation as well, so the amount of address space in use by the process may well be higher (of course these numbers don't take space for declared variables into account).
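For example (a sketch only; it assumes you can attach to the running mserver5 and that those symbols are visible to the debugger):

$ gdb -p $(pgrep -x mserver5)
(gdb) print GDK_mallocedbytes_estimate
(gdb) print GDK_vm_cursize
(gdb) detach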
-- Sjoerd Mullender

Thanks again for these details and for the GDK estimates to look at. To me, the important thing here is not so much why swap wasn't used; the fact that no swap was used actually simplifies things.
> This 16g is a combination of what was allocated (and used) through malloc and through mmap. Forget about memory being a malloc thing. Mmap also uses memory.
The kernel seems to tell me that in this case the 16g used are physical RAM. Not swapped, not mmapped:

memory: usage 16777216kB, limit 16777216kB, failcnt 244063804
memory+swap: usage 16777964kB, limit 9007199254740988kB, failcnt 0
kmem: usage 0kB, limit 9007199254740988kB, failcnt 0
Memory cgroup out of memory: Kill process 975803 (mserver5) score 1003 or sacrifice child
Killed process 4134544 (mserver5), UID 0, total-vm:28382372kB, anon-rss:16776396kB, file-rss:13284kB, shmem-rss:0kB
> GDK_mem_maxsize is not a hard limit. It's a value on which some decisions having to do with allocating memory are based.
This is actually what I was asking confirmation about. I'm not complaining of anything, but this is not really documented and I'm just trying to understand exactly which tools I have: is it a soft limit in the sense that it actually tries to stay more or less below this limit, or is it really only used to estimate other values?

The point is not to force some exact behaviour, but to apply some resource management *without* having the server killed, and that is a big problem.

What I think is happening now is this. I assume that your hardware has plenty of memory available and that the system as a whole is not under memory pressure. It is just this one container that is using up its allocated space. I think the kernel is at fault here, in that it prefers to kill the process in the container over moving some of its memory pages to swap.

I have no idea whether you can tell the kernel to either use more memory for that container (since there is probably still plenty around) or to swap pages of that container out. In either case, that is not something that mserver5 can do or control.

The memory settings available to mserver5 are just how much virtual memory (or address space) it is allowed to use (a hard limit), and an indication of how soon to switch from allocating memory for BATs using malloc to using memory-mapped files. In either case, physical memory is used for those BATs, and if the kernel is unwilling to relinquish that memory (sending pages to swap/disk) to make space within the container, then you will run into this problem.
-- Sjoerd Mullender

Thanks again Sjoerd, I appreciate your thoughts on this.

Indeed, after the problem started I tried to play with kernel parameters (swappiness, memory overcommit, etc.), unfortunately with no improvement. Then I moved all parameters back to normal (thoroughly double-checked).

What is funny is that I have two more VMs that are exact copies of this one (same OS, same running containers, same limits assigned to containers, same data, same versions of everything - they were created programmatically). The one difference is that these two have less RAM assigned to the VM. And they don't fail. Ironic, isn't it? More physical memory available makes me think that more memory is used by the OS as cache. In principle this should not affect the amounts counted as RSS, but it's really the only difference I can see.

For the record, yesterday I tried to use the oom_kill_disable option on the container as a workaround, to prevent a kill that seemed unjustified. That didn't go well. OOM was indeed not invoked, but the cgroups hard limit was still there, which means the kernel still refused to let it allocate more memory than 16g. That resulted in:

!ERROR:MALException:mal.interpreter:GDK reported error: GDKload: cannot read: name=02/25/22515, ext=theap, 8192 bytes missing.
!ERROR:!OS: Cannot allocate memory
!ERROR:!ERROR: GDKload: cannot read: name=01/53/15346, ext=theap, 8192 bytes missing.
!ERROR:!OS: Cannot allocate memory

I could try to set *only* the soft limit in the container (sketched below). That is in principle a bit risky, as the container is then allowed to go rogue with memory, but let's at least see what happens.

To be clear: I understand the role of the kernel in this, and I don't expect MonetDB developers to debug kernel behaviour. However, correct memory management is crucial for MonetDB, so the two are rather related in practice. I ask and comment here thinking that unravelling the issue could be useful to other users in similar conditions.
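Concretely, what I mean by "only the soft limit" is roughly this cgroup v1 setup (the exact group path and how it is set via the container runtime are assumptions on my side):

- memory.soft_limit_in_bytes = 17179869184 (16g, a reclaim target under pressure, not enforced)
- memory.limit_in_bytes left at its default maximum (no hard cap, hence no memcg OOM kill at 16g)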