Posted on April 11, 2019 2:00 am
 |  Asked by Aftab Siddiqui
 |  127 views

All of a sudden this switch is showing very high memory usage.
———–

top – 10:21:04 up 3:29, 2 users, load average: 0.21, 0.36, 0.32
Tasks: 294 total, 1 running, 293 sleeping, 0 stopped, 0 zombie
%Cpu(s): 10.3 us, 2.2 sy, 0.0 ni, 87.0 id, 0.0 wa, 0.2 hi, 0.2 si, 0.0 st
KiB Mem: 3820256 total, 3642120 used, 178136 free, 208096 buffers
KiB Swap: 0 total, 0 used, 0 free, 2298888 cached

PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND
1731 root 20 0 705m 380m 281m S 0.3 10.2 3:09.68 ConfigAgent
1675 root 20 0 589m 296m 227m S 0.3 7.9 1:20.28 Sysdb
2630 root 20 0 703m 262m 177m S 4.6 7.0 11:21.25 Strata
2599 root 20 0 590m 219m 165m S 0.0 5.9 0:09.23 StrataL3
3171 root 20 0 558m 199m 166m S 18.9 5.4 0:52.38 Snmp
2603 root 20 0 533m 165m 142m S 0.7 4.4 0:34.62 StrataL2
1738 root 20 0 549m 158m 127m S 0.0 4.3 0:08.37 Fru
2556 root 20 0 546m 157m 106m S 0.0 4.2 0:05.93 StrataVlanTopo
3198 root 20 0 233m 156m 140m S 1.3 4.2 1:37.58 Bgp-main
1745 root 20 0 534m 151m 121m S 0.0 4.1 0:16.41 Launcher
2279 root 20 0 536m 150m 121m S 1.0 4.0 1:57.91 Mlag
2043 root 20 0 620m 142m 108m S 0.3 3.8 1:18.14 SuperServer
2561 root 20 0 239m 139m 121m S 0.0 3.7 0:36.93 Ospf
2649 root 20 0 519m 138m 116m S 0.0 3.7 0:02.75 StrataLag
2617 root 20 0 529m 138m 116m S 0.0 3.7 0:04.97 Arp
3249 root 20 0 537m 134m 107m S 0.7 3.6 1:51.67 IgmpSnooping
2565 root 20 0 510m 133m 112m S 0.0 3.6 0:05.01 Lag

——
Software image version: 4.20.5F
Architecture: i386
Internal build version: 4.20.5F-8127914.4205F

Uptime: 10 hours and 38 minutes
Total memory: 3820256 kB
Free memory: 2510676 kB

0
Posted by Adam Levin
Answered on April 11, 2019 2:11 am

When you say “all of a sudden” what exactly do you mean? Have you seen this switch using less memory for a long time, and then suddenly the state of the memory usage changed, or do you mean that you just noticed that it appears to be using a lot of memory?

In fact the system is not using much memory. Notice that your “show version” output shows about 2.5 GB free out of roughly 3.8 GB. The top output shows different numbers because of the way UNIX uses memory. The kernel tries to put all memory to use, because memory sitting idle is wasted. If the user processes on the system don’t need the memory, the kernel fills it with buffers and file cache. Notice the two numbers at the right side of the memory lines in top: 208,096 KiB buffers and 2,298,888 KiB cached. That’s about 2.5 GB of memory held by the kernel. If a user process needs the memory, the kernel will release it and allocate it to that process. The output from “show version” takes that into account and counts the reclaimable memory as free, so it reflects only what the processes are actually consuming.

In other words, this is not a problem; the system is operating normally.
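
To make the arithmetic above concrete, here is a minimal sketch using the figures from the top output in the question. The function names are made up for illustration, and treating all buffers and cache as reclaimable is a simplifying assumption (some cached pages, such as tmpfs or shared memory, cannot be dropped).

```python
# Figures copied from the `top` output in the question (all in KiB).
TOTAL_KIB = 3820256
USED_KIB = 3642120
FREE_KIB = 178136
BUFFERS_KIB = 208096
CACHED_KIB = 2298888


def reclaimable_kib(buffers, cached):
    """Memory the kernel is holding as buffers and file cache.

    Assumption: all of it can be released on demand, which is a simplification.
    """
    return buffers + cached


def effectively_free_kib(free, buffers, cached):
    """Truly free memory plus what the kernel could hand back."""
    return free + reclaimable_kib(buffers, cached)


def used_by_processes_kib(used, buffers, cached):
    """The 'used' figure from top with the kernel's buffers and cache removed."""
    return used - reclaimable_kib(buffers, cached)


if __name__ == "__main__":
    print("reclaimable:      ", reclaimable_kib(BUFFERS_KIB, CACHED_KIB))
    print("effectively free: ", effectively_free_kib(FREE_KIB, BUFFERS_KIB, CACHED_KIB))
    print("used by processes:", used_by_processes_kib(USED_KIB, BUFFERS_KIB, CACHED_KIB))
```

With these inputs, about 2.5 GB of the “used” memory is buffers and cache, which leaves roughly 1.1 GB actually held by processes.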

2
Answered on April 11, 2019 9:11 am

I would suggest reading the following EOS Central article, which explains in more detail how memory is handled on Linux/EOS.

https://eos.arista.com/introduction-to-managing-eos-devices-memory-utilisation/

Hope this helps.

0
Posted by Aftab Siddiqui
Answered on April 11, 2019 3:44 pm

Yes, it was normally using less memory, in the 750xxx to 700xx range, and the behavior is consistent across the network. I have 2 switches at every PoP (4 PoPs). Two weeks ago I created MLAG with the TOR switches; nothing major has changed since then, I only added some VXLAN configs (these are all VXLAN EVPN nodes). Two switches stopped forwarding this morning and had to be rebooted. After the reboot the memory starts decreasing gradually. Currently it's 162448, and at this rate it may crash again. Could this be related to Security Advisory 0037?
https://www.arista.com/en/support/advisories-notices/security-advisories/5782-security-advisory-37

After a reboot the buffers and cache are cleared, so for several hours or days afterwards the kernel will gradually take over whatever memory the agents don’t need. So again, that sounds like normal behavior. If the agents themselves keep using more memory that never gets released, though, that could be a more serious issue.

I suggest you open a case with TAC so they can investigate the logs and determine why the switch stopped forwarding.

(Adam Levin at April 11, 2019 8:12 pm)
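
One rough way to tell the two cases apart is to track process memory separately from buffers and cache over time. The sketch below is only an illustration: it assumes text in the Linux /proc/meminfo format, the helper names are hypothetical, and the sample values are simply the figures from the top output in the question.

```python
def parse_meminfo(text):
    """Return {field name: value} from /proc/meminfo-style text (values in kB)."""
    values = {}
    for line in text.splitlines():
        name, sep, rest = line.partition(":")
        parts = rest.split()
        if sep and parts and parts[0].isdigit():
            values[name.strip()] = int(parts[0])
    return values


def process_used_kb(info):
    """Memory not free and not held as buffers/cache, i.e. roughly what processes hold."""
    return info["MemTotal"] - info["MemFree"] - info["Buffers"] - info["Cached"]


def is_steadily_growing(samples):
    """True if every sample is larger than the one before it."""
    return len(samples) >= 2 and all(b > a for a, b in zip(samples, samples[1:]))


# Illustrative sample, built from the figures in the question's top output.
SAMPLE = """MemTotal:        3820256 kB
MemFree:          178136 kB
Buffers:          208096 kB
Cached:          2298888 kB
"""

if __name__ == "__main__":
    print(process_used_kb(parse_meminfo(SAMPLE)))
```

On a live system the text would come from reading /proc/meminfo at intervals. If the process figure stays flat while free memory drops, the kernel cache is the likely cause; if the process figure keeps climbing, that points at the agents and is worth including in the TAC case.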
1
Answered on April 11, 2019 8:14 pm

Hi Aftab. Sorry to hear about the issues you have experienced. I would suggest opening a support ticket with Arista TAC (support@arista.com) so they can collect additional information and investigate this further.
