
Out of Memory Killer

In some cases, the operating system can decide to kill processes in order to keep the machine running. This mechanism is called the OOM Killer (Out-Of-Memory Killer) and is invoked when all other RAM recovery mechanisms have failed.

The OOM Killer is invoked when the swap space is completely full and the OS no longer has enough memory to allocate to running processes.

To determine which processes to kill, the OOM Killer applies a ranking heuristic. Every process is ranked using an oom_score_adj value between -1000 (never killed) and +1000 (killed first).
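You can inspect these values for any running process directly in /proc (replace <pid> with a real process ID):

BASH
# User-tunable adjustment, between -1000 and +1000
cat /proc/<pid>/oom_score_adj
# Resulting badness score the kernel uses for ranking
cat /proc/<pid>/oom_score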

The kill is brutal and does not give the sacrificed process a chance to write a final message to its own log.

This behavior occurs with Indexima when its configuration parameters violate the 20% memory rule described below.

Standalone INDEXIMA Cluster

In a standalone configuration, where nodes are dedicated to Indexima, the Linux free command allows you to estimate the maximum memory to allocate to the Indexima Java heap:

The free command

BASH
# free -h
              total        used        free      shared  buff/cache   available
Mem:            31G        139M         30G        241M        1.2G         30G
Swap:          1.0G          0B        1.0G

In this example, the maximum available RAM is 31GB.

The GALACTICA_MEM parameter must be set to the maximum available RAM minus 20%, i.e. at most 80% of the machine's memory.

This parameter is defined in the conf/galactica.conf file.

BASH
# Java heap size for INDEXIMA data engine
export GALACTICA_MEM=25000m
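As an illustration, here is a small shell sketch of the 80% rule based on the total memory reported by free in megabytes; adapt the result to your own environment:

BASH
# Total RAM in MB, as reported by free
TOTAL_MB=$(free -m | awk '/^Mem:/ {print $2}')
# Keep 20% headroom for the OS and allocate the remaining 80% to the Java heap
HEAP_MB=$((TOTAL_MB * 80 / 100))
echo "export GALACTICA_MEM=${HEAP_MB}m"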

Hadoop / Yarn configuration

The yarn.memory parameter defines the maximum RAM that can be allocated to an Indexima Hadoop container.

This parameter is also located in conf/galactica.conf and its default value is 1024.

The yarn.memory value must be greater than the Java heap size plus 20%.

BASH
yarn.memory = GALACTICA_MEM + 20%
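Following the formula above, with the GALACTICA_MEM=25000m setting used earlier this gives, for example:

BASH
# 25000 MB Java heap + 20% headroom = 30000 MB
yarn.memory = 30000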

When the oom-killer is invoked, its trace is written to the system log (/var/log/messages on CentOS 7).
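To find these entries, grep the system log for the oom-killer pattern (the log path is distribution-specific; /var/log/messages applies to CentOS 7):

BASH
# Search the system log for OOM killer invocations
grep -i "oom-killer" /var/log/messages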

Search result for the oom-killer pattern

BASH
15:15:05 ip-38-75 kernel: [1380440.636612] java invoked oom-killer: gfp_mask=0x14201ca(GFP_HIGHUSER_MOVABLE|__GFP_COLD), nodemask=(null),  order=0, oom_score_adj=0
15:15:05 ip-38-75 kernel: [1380440.646149] java cpuset=/ mems_allowed=0
15:15:05 ip-38-75 kernel: [1380440.649393] CPU: 0 PID: 6273 Comm: java Not tainted 4.14.62-65.117.amzn1.x86_64 #1
15:15:05 ip-38-75 kernel: [1380440.655182] Hardware name: Xen HVM domU, BIOS 4.2.amazon 08/24/2006
15:15:05 ip-38-75 kernel: [1380440.660166] Call Trace:
15:15:05 ip-38-75 kernel: [1380440.662481]  dump_stack+0x5c/0x82
15:15:05 ip-38-75 kernel: [1380440.665407]  dump_header+0x94/0x21c
15:15:05 ip-38-75 kernel: [1380440.675890]  oom_kill_process+0x213/0x410
15:15:05 ip-38-75 kernel: [1380440.679277]  out_of_memory+0x296/0x4c0
15:15:05 ip-38-75 kernel: [1380440.690514]  filemap_fault+0x1e3/0x5f0
15:15:05 ip-38-75 kernel: [1380440.703103]  __do_fault+0x20/0x60
15:15:05 ip-38-75 kernel: [1380440.706465]  __handle_mm_fault+0xcd2/0x13f0

15:15:05 ip-38-75 kernel: [1380440.724966] RIP: 5ed07690:0x3e8
15:15:05 ip-38-75 kernel: [1380440.728230] RSP: 59f3c800:00007f4831bb56d0 EFLAGS: 7f4831bb5660
15:15:05 ip-38-75 kernel: [1380440.728271] Mem-Info:
15:15:05 ip-38-75 kernel: [1380440.736040] active_anon:4033458 inactive_anon:13 isolated_anon:0
15:15:05 ip-38-75 kernel: [1380440.736040]  active_file:561 inactive_file:3996 isolated_file:382
15:15:05 ip-38-75 kernel: [1380440.736040]  unevictable:0 dirty:5 writeback:0 unstable:0
15:15:05 ip-38-75 kernel: [1380440.736040]  slab_reclaimable:5366 slab_unreclaimable:6806
15:15:05 ip-38-75 kernel: [1380440.736040]  mapped:572 shmem:19 pagetables:9400 bounce:0
15:15:05 ip-38-75 kernel: [1380440.736040]  free:32493 free_pcp:62 free_cma:0
15:15:05 ip-38-75 kernel: [1380440.802358] lowmem_reserve[]: 0 3720 16005 16005
15:15:05 ip-38-75 kernel: [1380440.832894] Node 0 Normal free:49936kB min:50188kB low:62764kB high:75340kB active_anon:12383276kB inactive_anon:52kB active_file:260kB inactive_file:12756kB unevictable:0kB writepending:12kB present:12845056kB managed:12579648kB mlocked:0kB kernel_stack:8048kB pagetables:30404kB bounce:0kB free_pcp:8kB local_pcp:8kB free_cma:0kB
15:15:05 ip-38-75 kernel: [1380440.872715] Node 0 DMA32: 296*4kB (UME) 215*8kB (UME) 212*16kB (UME) 70*32kB (UME) 50*64kB (UME) 27*128kB (UME) 12*256kB (UME) 4*512kB (UE) 2*1024kB (ME) 1*2048kB (M) 10*4096kB (UME) = 65368kB
15:15:05 ip-38-75 kernel: [1380440.900634] Node 0 hugepages_total=0 hugepages_free=0 hugepages_surp=0 hugepages_size=1048576kB
15:15:05 ip-38-75 kernel: [1380440.915330] 5183 total pagecache pages
15:15:05 ip-38-75 kernel: [1380440.919079] 0 pages in swap cache
15:15:05 ip-38-75 kernel: [1380440.927072] Free swap  = 0kB
15:15:05 ip-38-75 kernel: [1380440.930156] Total swap = 0kB
15:15:05 ip-38-75 kernel: [1380440.933397] 4194205 pages RAM
15:15:05 ip-38-75 kernel: [1380440.936526] 0 pages HighMem/MovableOnly
15:15:05 ip-38-75 kernel: [1380440.940323] 87612 pages reserved

In this example, no more RAM is available and the swap is completely exhausted (Free swap = 0kB).
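On a live system, the same condition can be checked before it becomes fatal, for instance with standard util-linux tools:

BASH
# Overall memory and swap usage
free -h
# Configured swap devices and how full they are
swapon --show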

List of all processes with their oom_score_adj values

BASH
15:15:05 ip-38-75 kernel: [1380440.943466] [ pid ]   uid  tgid total_vm      rss nr_ptes nr_pmds swapents oom_score_adj name
15:15:05 ip-38-75 kernel: [1380440.950603] [ 1797]     0  1797     2869      225      12       3        0         -1000 udevd
15:15:05 ip-38-75 kernel: [1380440.995381] [ 2568]    32  2568     8841      101      22       4        0             0 rpcbind
15:15:05 ip-38-75 kernel: [1380441.122893] [ 4802]     0  4802  1077373    41765     179       8        0             0 java
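On a running system, a similar ranking can be approximated by reading the current oom_score of every process; a minimal sketch using only procfs:

BASH
# Print pid, current oom_score and command name, highest scores first
for p in /proc/[0-9]*; do
  printf "%s %s %s\n" "${p##*/}" "$(cat "$p/oom_score" 2>/dev/null)" "$(cat "$p/comm" 2>/dev/null)"
done | sort -k2 -rn | head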

The process with the highest score is identified and immediately killed.

BASH
15:15:05 ip-38-75 kernel: [1380441.293974] Out of memory: Kill process 2965 (java) score 854 or sacrifice child
15:15:05 ip-38-75 kernel: [1380441.300828] Killed process 2965 (java) total-vm:16098684kB, anon-rss:14004696kB, file-rss:0kB, shmem-rss:0kB
15:15:05 ip-38-75 dhclient[4141]: bound to 172.31.38.75 -- renewal in 1461 seconds.
15:15:06 ip-38-75 kernel: [1380442.025206] oom_reaper: reaped process 2965 (java), now anon-rss:0kB, file-rss:0kB, shmem-rss:0kB
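If /var/log/messages has already been rotated, the same kill event can usually still be found in the kernel ring buffer:

BASH
# Recent kernel messages mentioning an OOM kill, with human-readable timestamps
dmesg -T | grep -i "killed process"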