Why does rttest allocate 8GB of memory?

asked 2019-11-07 05:28:37 -0600

r7vme

updated 2019-11-07 06:13:24 -0600

Hello, I'm going through the rttest utility and was wondering why "lock_and_prefault_dynamic()" always locks 8GB of heap memory.

Steps to reproduce:
1. Build as described in README.md.
2. Set an unlimited memlock limit for the user (a line such as "user - memlock -1" in /etc/security/limits.conf), then log in again.
3. Run ./example_loop -i 10000.
4. In a separate terminal, check memory usage with top (RES column).

8GB is also mentioned in Real-Time-Programming.

From what I understand, lock_and_prefault_dynamic() sets malloc options (no trimming, no mmap) and then allocates memory in 64*4k chunks until an allocation causes NO page faults. The question is: why is it always 8GB?
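The loop described above can be sketched roughly as follows. This is a bounded illustration, not rttest's source: it assumes Linux with glibc (mallopt, M_TRIM_THRESHOLD, M_MMAP_MAX, getrusage), the function name prefault_until_no_faults and the max_chunks cap are hypothetical, and the memory-locking step rttest performs (hence the memlock limit in the steps above) is left out so the sketch runs without raised limits.

```cpp
// Bounded sketch of the prefault loop (NOT rttest's actual source).
// Assumptions: Linux + glibc (mallopt, M_TRIM_THRESHOLD, M_MMAP_MAX from <malloc.h>).
// Hypothetical: the function name and the max_chunks safety cap.
// rttest also locks memory; that step is omitted so this runs unprivileged.
#include <malloc.h>
#include <sys/resource.h>
#include <unistd.h>

#include <cstddef>
#include <cstring>
#include <vector>

// Allocates and touches 64-page chunks until one iteration adds no page
// faults, or until max_chunks chunks exist. Returns the number of chunks.
std::size_t prefault_until_no_faults(std::size_t max_chunks) {
  mallopt(M_TRIM_THRESHOLD, -1);  // never return freed heap memory to the OS
  mallopt(M_MMAP_MAX, 0);         // serve every allocation from the heap, not mmap

  const std::size_t page_size = static_cast<std::size_t>(sysconf(_SC_PAGESIZE));
  const std::size_t chunk = 64 * page_size;

  rusage usage{};
  getrusage(RUSAGE_SELF, &usage);
  long prev_minflt = usage.ru_minflt;
  long prev_majflt = usage.ru_majflt;
  long new_minflt = 1;
  long new_majflt = 1;

  std::vector<char*> prefaulter;
  while ((new_minflt > 0 || new_majflt > 0) && prefaulter.size() < max_chunks) {
    char* ptr = new char[chunk];
    std::memset(ptr, 0, chunk);  // touch every page so it is really faulted in
    prefaulter.push_back(ptr);   // may reallocate, freeing the old vector buffer

    getrusage(RUSAGE_SELF, &usage);
    new_minflt = usage.ru_minflt - prev_minflt;
    new_majflt = usage.ru_majflt - prev_majflt;
    prev_minflt = usage.ru_minflt;
    prev_majflt = usage.ru_majflt;
  }

  const std::size_t count = prefaulter.size();
  for (char* p : prefaulter) {
    delete[] p;  // memory goes back to the heap, not the OS (trimming is off)
  }
  return count;
}
```

Called with a small cap such as prefault_until_no_faults(16), it stops after at most 16 chunks (4MB with 4 KiB pages); without a cap, the loop keeps going until an iteration sees no new faults.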

I'm testing on NVIDIA Jetson AGX Xavier which has 16GB ram.

EDIT:

I also checked on a Google Cloud VM with 32GB of total RAM (no swap) and got the same number: 8GB.

ulimit -a on my Xavier:

core file size          (blocks, -c) 0
data seg size           (kbytes, -d) unlimited
scheduling priority             (-e) 0
file size               (blocks, -f) unlimited
pending signals                 (-i) 62481
max locked memory       (kbytes, -l) unlimited
max memory size         (kbytes, -m) unlimited
open files                      (-n) 1024
pipe size            (512 bytes, -p) 8
POSIX message queues     (bytes, -q) 819200
real-time priority              (-r) 0
stack size              (kbytes, -s) 81920
cpu time               (seconds, -t) unlimited
max user processes              (-u) 62481
virtual memory          (kbytes, -v) unlimited
file locks                      (-x) unlimited

2 Answers


answered 2019-11-22 15:28:28 -0600

r7vme

Thanks to @sgermanserrano for pointing out his workaround for decreasing memory usage; this gave me the first clue about what's going on.

What happens in the lock_and_prefault_dynamic function?

On every iteration, 262144 bytes (64 * page_size) are allocated on the heap as a new char array. There is also a std::vector that holds pointers to the allocated arrays, so on every iteration this vector grows by 8 bytes (the size of a 64-bit pointer). Whenever the vector runs out of capacity, it allocates a new heap buffer twice as large, copies the data there and deallocates the old buffer (see details here).
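The reallocation pattern can be observed in isolation with a sketch like the one below, assuming a 64-bit target. nullptr entries stand in for the 256 KiB chunks so nothing large is allocated; track_freed_buffers and GrowthResult are hypothetical names. The growth factor is implementation-defined (libstdc++ and libc++ double the capacity).

```cpp
#include <cstddef>
#include <vector>

struct GrowthResult {
  std::vector<std::size_t> freed_sizes;  // bytes released by each reallocation, in order
  std::size_t pushes = 0;                // push_back calls made before stopping
};

// Hypothetical helper: push pointers until a reallocation frees a buffer of
// at least target_bytes (or max_pushes is reached).
GrowthResult track_freed_buffers(std::size_t target_bytes, std::size_t max_pushes) {
  GrowthResult result;
  std::vector<char*> prefaulter;  // stands in for the vector of chunk pointers
  while (result.pushes < max_pushes) {
    const std::size_t old_capacity = prefaulter.capacity();
    prefaulter.push_back(nullptr);  // nullptr instead of a real 256 KiB chunk
    ++result.pushes;
    if (old_capacity > 0 && prefaulter.capacity() != old_capacity) {
      // Reallocation: the old buffer of old_capacity pointers was copied, then freed.
      const std::size_t freed = old_capacity * sizeof(char*);
      result.freed_sizes.push_back(freed);
      if (freed >= target_bytes) {
        break;  // a hole big enough for the next chunk now exists
      }
    }
  }
  return result;
}
```

With a doubling growth policy, track_freed_buffers(262144, 1000000) should record freed sizes 8, 16, 32, ... up to 262144 bytes (the same sequence the debug log in this answer shows), reached on push 32769.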

This means the heap becomes fragmented with free chunks left behind by the vector reallocations. As soon as a free chunk of 262144 bytes is available, it is used for the next char array; since those pages were already touched when the vector wrote into them, no minor page fault happens, and the while loop's exit condition (no new page faults) is met.
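The core effect (re-touching heap memory that was already faulted in and then freed costs no new minor faults) can be checked with a small sketch. It assumes Linux with glibc; measure_reuse, FaultCounts and g_sink are hypothetical names, and reuse of the same region is typical glibc behavior, not a guarantee.

```cpp
#include <malloc.h>
#include <sys/resource.h>

#include <cstddef>
#include <cstdlib>
#include <cstring>

// Storing each block here makes it escape, so the compiler keeps the memset.
static char* volatile g_sink = nullptr;

static long minor_faults() {
  rusage u{};
  getrusage(RUSAGE_SELF, &u);
  return u.ru_minflt;
}

struct FaultCounts {
  long first_touch;   // faults while touching a fresh block
  long second_touch;  // faults while touching a block carved from freed memory
};

// Hypothetical helper: allocate/touch/free a block, then do it again.
FaultCounts measure_reuse(std::size_t bytes) {
  mallopt(M_MMAP_MAX, 0);         // keep the block on the brk heap
  mallopt(M_TRIM_THRESHOLD, -1);  // keep freed heap pages mapped

  long before = minor_faults();
  char* a = static_cast<char*>(std::malloc(bytes));
  if (a == nullptr) return {0, 0};
  std::memset(a, 1, bytes);
  g_sink = a;
  const long first = minor_faults() - before;
  std::free(a);

  before = minor_faults();
  char* b = static_cast<char*>(std::malloc(bytes));  // typically the same region
  if (b == nullptr) return {first, 0};
  std::memset(b, 2, bytes);
  g_sink = b;
  const long second = minor_faults() - before;
  std::free(b);
  return {first, second};
}
```

For measure_reuse(64 * 4096), the first count is typically around 64 (one per 4 KiB page) and the second close to zero, which is the condition that ends the prefault loop.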

Why 8GB?

To reach a size of 262144 bytes, the vector needs 262144 / 8 = 32768 elements, i.e. 32768 iterations, i.e. 32768 char arrays of 64 * 4 * 1024 bytes each, which totals 8589934592 bytes (8GB). On top of that comes the memory from the reallocated vector data, which is not very significant (actually about two char arrays, 512KB).
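The same arithmetic as compile-time checks, assuming 4 KiB pages and 8-byte pointers as the answer does (the constant names are illustrative):

```cpp
#include <cstdint>

constexpr std::uint64_t kPageSize = 4 * 1024;                      // assumed page size
constexpr std::uint64_t kChunkBytes = 64 * kPageSize;              // one new char[] per iteration
constexpr std::uint64_t kPointerBytes = 8;                         // sizeof(char*) on 64-bit
constexpr std::uint64_t kIterations = kChunkBytes / kPointerBytes; // vector buffer reaches chunk size
constexpr std::uint64_t kTotalBytes = kIterations * kChunkBytes;   // memory held by the char arrays

static_assert(kChunkBytes == 262144, "64 pages of 4 KiB");
static_assert(kIterations == 32768, "262144 / 8");
static_assert(kTotalBytes == 8589934592ULL, "8 GiB");
```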

Code

I wrote a small patch with debug logging, which prints the following:

DEBUG: Memory chunk required 262144
DEBUG: Heap memory chunk of size 8 will be available between 0x55599f4fe0 - 0x55599f4fe0.
DEBUG: Heap memory chunk of size 16 will be available between 0x5559a7b220 - 0x5559a7b228.
DEBUG: Heap memory chunk of size 32 will be available between 0x5559abb250 - 0x5559abb268.
DEBUG: Heap memory chunk of size 64 will be available between 0x5559b3b2a0 - 0x5559b3b2d8.
DEBUG: Heap memory chunk of size 128 will be available between 0x5559c3b330 - 0x5559c3b3a8.
DEBUG: Heap memory chunk of size 256 will be available between 0x5559e3b440 - 0x5559e3b538.
DEBUG: Heap memory chunk of size 512 will be available between 0x555a23b650 - 0x555a23b848.
DEBUG: Heap memory chunk of size 1024 will be available between 0x555aa3ba60 - 0x555aa3be58.
DEBUG: Heap memory chunk of size 2048 will be available between 0x555ba3c270 - 0x555ba3ca68.
DEBUG: Heap memory chunk of size 4096 will be available between 0x555da3d280 - 0x555da3e278.
DEBUG: Heap memory chunk of size 8192 will be available between 0x5561a3f290 - 0x5561a41288.
DEBUG: Heap memory chunk of size 16384 will be available between 0x5569a432a0 - 0x5569a47298.
DEBUG: Heap memory chunk of size 32768 will be available between 0x5579a4b2b0 - 0x5579a532a8.
DEBUG: Heap memory chunk of size 65536 will be available between 0x5599a5b2c0 - 0x5599a6b2b8.
DEBUG: Heap memory chunk of size 131072 will be available between 0x55d9a7b2d0 - 0x55d9a9b2c8.
DEBUG: Heap memory chunk of size 262144 will be available between 0x5659abb2e0 - 0x5659afb2d8.
DEBUG: New char array address 0x5659abb2e0 is lower the old one 0x5759b7b2f0.
DEBUG: This means it was created in already allocated memory.

Patch

diff --git a/rttest/src/rttest.cpp b/rttest/src/rttest.cpp
index cb21d13..6d78bfb 100644
--- a/rttest/src/rttest.cpp
+++ b/rttest/src/rttest.cpp
@@ -676,6 +676,8 @@ int Rttest::lock_and_prefault_dynamic()
   size_t prev_majflts = usage.ru_majflt;
   size_t encountered_minflts = 1;
   size_t encountered_majflts = 1;
+
+  printf("DEBUG: Memory chunk required %zu\n", (64 ...

Comments

r7vme (2019-11-22 15:47:35 -0600)

answered 2019-11-22 08:16:19 -0600

sgermanserrano

updated 2019-11-22 08:17:44 -0600

@r7vme I did a little bit of research on this in a 96boards blog post. My results, which could be wrong, show that the default rttest library generates 2M+ page faults x 4KB ~= 8GB of allocated memory, so I needed to modify the library to be able to run the pendulum demo on the HiKey970.
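Assuming 4 KiB pages and one minor fault per freshly touched page, this figure matches the 32768 chunks of 64 pages from the other answer:

$$32768 \times 64 = 2^{15} \cdot 2^{6} = 2^{21} = 2\,097\,152 \text{ page faults}, \qquad 2^{21} \times 4096\ \text{B} = 2^{33}\ \text{B} = 8\ \text{GiB}$$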


Comments

I like your workaround, but I believe the reason is in std::vector (as I just found out). I see that page faults stop as soon as the vector is resized to 65536 (doubled from 32768, which is 8GB with 64 pages per pointer). But if I preallocate the vector to 65536, the program gets killed by the OOM killer.

Just add "prefaulter.reserve(65536);" after declaring the vector and the program will try to eat all the memory. This blows my mind, because the vector is stored on the stack, not on the heap. :thinking:

r7vme (2019-11-22 10:31:07 -0600)
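If the freed-buffer explanation in r7vme's answer holds, reserving 65536 slots up front removes every reallocation, so no 262144-byte hole is ever freed and the loop keeps faulting until memory runs out. A sketch that counts reallocations with and without reserve() (count_reallocations is a hypothetical helper; nullptr entries stand in for the real chunks):

```cpp
#include <cstddef>
#include <vector>

// Hypothetical helper: counts how many push_back calls changed the capacity,
// i.e. how many times an old buffer was copied and freed.
std::size_t count_reallocations(std::size_t reserve_n, std::size_t pushes) {
  std::vector<char*> prefaulter;
  prefaulter.reserve(reserve_n);  // reserve(0) is a no-op
  std::size_t reallocations = 0;
  for (std::size_t i = 0; i < pushes; ++i) {
    const std::size_t old_capacity = prefaulter.capacity();
    prefaulter.push_back(nullptr);
    if (prefaulter.capacity() != old_capacity) {
      ++reallocations;
    }
  }
  return reallocations;
}
```

count_reallocations(65536, 65536) returns 0, while without the reserve() call the same 65536 pushes trigger a reallocation at every capacity step.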

Ah, wait: std::vector allocates its memory on the heap. So as soon as a big allocation happens (32k+ * 4 bytes, the size of a pointer), this somehow interferes with memset. This is probably some specific memory model behavior...

r7vme (2019-11-22 10:40:52 -0600)

I think I've got the trick.

std::vector copies its data to new memory when it resizes itself, so there are some free memory chunks left in the heap.

The secret sauce: as soon as a free memory chunk of 262144 bytes (64 * 4k) becomes available, no page fault happens. And this happens exactly when the vector reaches a size of 262144 bytes (65536 elems * 4 bytes each). One thing is not clear though: if it's resized to 262144 bytes, then the freed memory chunk should be only 131072.

Anyway, this approach is crazy and smart at the same time. I would love for the author to comment here.

r7vme (2019-11-22 13:21:19 -0600)

Gotcha: the pointer size is 8 bytes (not 4), now it's all clear.

r7vme (2019-11-22 14:14:24 -0600)
