Contents
- Introduction
- Prerequisite
- Adapters
- Configuring ELK Stack
- Installing and Configuring openconfigbeat for EOS
- Configuration file for openconfigbeat
- Configuring TerminAttr and openconfigbeat daemon
- Setting up Kibana index pattern
- Using native OpenConfig CLI and gRPC transport
- Troubleshooting
- Example Configuration files
Introduction
This document explains how to set up an ELK (Elasticsearch/Logstash/Kibana) stack and stream EOS telemetry state from an Arista switch with openconfigbeat, which streams gRPC updates from OpenConfig or TerminAttr directly into Elasticsearch. Please note that this app was written as a proof of concept and is supported on a best-effort basis. The projects can be forked and modified to suit your needs. Feedback is always welcome, and issues can be filed on GitHub as for any other project.
Elasticsearch is the distributed search and analytics engine at the heart of the Elastic Stack. Logstash and Beats facilitate collecting, aggregating, and enriching your data and storing it in Elasticsearch. Kibana enables you to interactively explore, visualize, and share insights into your data and manage and monitor the stack.
Prerequisite
The following tools are required to proceed with this setup including cloning the repository and compiling openconfigbeat for EOS.
Adapters
Currently we have two adapters:
- openconfigbeat
- A Logstash configuration which utilises ockafka
Configuring ELK Stack
There are multiple ways to install the ELK stack. We will be using the quick and easy docker container installation in this guide.
Copy the following Docker Compose file to your system and save it as docker-compose.yml:
version: '2.2'
services:
  elastic-primary:
    image: docker.elastic.co/elasticsearch/elasticsearch:7.6.2
    container_name: elastic-primary
    environment:
      - node.name=elastic-primary
      - cluster.name=es-docker-cluster
      - discovery.seed_hosts=elastic-secondary,elastic-tertiary
      - cluster.initial_master_nodes=elastic-primary,elastic-secondary,elastic-tertiary
      - bootstrap.memory_lock=true
      - "ES_JAVA_OPTS=-Xms512m -Xmx512m"
      - xpack.license.self_generated.type=trial
      - xpack.security.enabled=true
      - xpack.monitoring.collection.enabled=true
      - http.cors.enabled=true
      - http.cors.allow-origin=*
      - "ELASTIC_PASSWORD=arastra"
    ulimits:
      memlock: {soft: -1, hard: -1}
    volumes:
      - data01:/usr/share/elasticsearch/data
    ports:
      - 9200:9200
      - 9300:9300
    networks:
      - elastic
  elastic-secondary:
    image: docker.elastic.co/elasticsearch/elasticsearch:7.6.2
    container_name: elastic-secondary
    environment:
      - node.name=elastic-secondary
      - cluster.name=es-docker-cluster
      - discovery.seed_hosts=elastic-primary,elastic-tertiary
      - cluster.initial_master_nodes=elastic-primary,elastic-secondary,elastic-tertiary
      - bootstrap.memory_lock=true
      - "ES_JAVA_OPTS=-Xms512m -Xmx512m"
      - xpack.license.self_generated.type=trial
      - xpack.security.enabled=true
      - xpack.monitoring.collection.enabled=true
      - http.cors.enabled=true
      - http.cors.allow-origin=*
      - "ELASTIC_PASSWORD=arastra"
    ulimits:
      memlock: {soft: -1, hard: -1}
    volumes:
      - data02:/usr/share/elasticsearch/data
    ports:
      - 9201:9201
      - 9301:9301
    networks:
      - elastic
  elastic-tertiary:
    image: docker.elastic.co/elasticsearch/elasticsearch:7.6.2
    container_name: elastic-tertiary
    environment:
      - node.name=elastic-tertiary
      - cluster.name=es-docker-cluster
      - discovery.seed_hosts=elastic-primary,elastic-secondary
      - cluster.initial_master_nodes=elastic-primary,elastic-secondary,elastic-tertiary
      - bootstrap.memory_lock=true
      - "ES_JAVA_OPTS=-Xms512m -Xmx512m"
      - xpack.license.self_generated.type=trial
      - xpack.security.enabled=true
      - xpack.monitoring.collection.enabled=true
      - http.cors.enabled=true
      - http.cors.allow-origin=*
      - "ELASTIC_PASSWORD=arastra"
    ulimits:
      memlock: {soft: -1, hard: -1}
    volumes:
      - data03:/usr/share/elasticsearch/data
    ports:
      - 9202:9202
      - 9302:9302
    networks:
      - elastic
  kibana:
    image: docker.elastic.co/kibana/kibana:7.6.2
    container_name: kibana
    ports:
      - 5601:5601
    environment:
      - ELASTICSEARCH_URL=http://elastic-primary:9200
      - ELASTICSEARCH_HOSTS=http://elastic-primary:9200
      - xpack.monitoring.ui.container.elasticsearch.enabled=true
      - ELASTICSEARCH_USERNAME=elastic
      - ELASTICSEARCH_PASSWORD=arastra
    networks:
      - elastic
  logstash:
    image: docker.elastic.co/logstash/logstash:7.6.2
    container_name: logstash
    ports:
      - "5000:5000/tcp"
      - "5000:5000/udp"
      - "9600:9600"
    environment:
      - xpack.monitoring.elasticsearch.hosts=http://elastic-primary:9200
      - xpack.monitoring.elasticsearch.url=http://elastic-primary:9200
      - "LS_JAVA_OPTS=-Xmx256m -Xms256m"
      - xpack.monitoring.enabled=true
      - xpack.monitoring.elasticsearch.username=elastic
      - xpack.monitoring.elasticsearch.password=arastra
    networks:
      - elastic
volumes:
  data01:
    driver: local
  data02:
    driver: local
  data03:
    driver: local
networks:
  elastic:
    driver: bridge
NOTE: Elasticsearch uses an mmapfs directory by default to store its indices. The default operating system limit on mmap counts (65530 on most Linux distributions) is likely to be too low, which may result in out-of-memory exceptions and prevent the container from starting.
On Linux, you can increase the limits by running the following command as root:
$ sysctl -w vm.max_map_count=262144
To set this value permanently, update the vm.max_map_count setting in /etc/sysctl.conf. To verify after rebooting, run sysctl vm.max_map_count.
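As a quick pre-flight check before starting the containers, the current limit can be compared against the 262144 value used above. The following is a minimal Python sketch, not part of the original setup: the function names are illustrative, and the /proc path assumes a Linux host.

```python
from pathlib import Path

REQUIRED_MAP_COUNT = 262144  # the value set with sysctl in this guide


def read_max_map_count(path: str = "/proc/sys/vm/max_map_count") -> int:
    """Read the current vm.max_map_count from procfs (Linux only)."""
    return int(Path(path).read_text().strip())


def map_count_ok(current: int, required: int = REQUIRED_MAP_COUNT) -> bool:
    """Return True when the current limit is at least the required one."""
    return current >= required
```

On the Docker host, `map_count_ok(read_max_map_count())` should return True once the sysctl change has been applied.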
Run the following commands to verify the config and bring up the three-node Elasticsearch cluster, Kibana and Logstash stack:
$ docker-compose config
$ docker-compose up -d
Wait for the stack to start up (you can check the container status with the docker ps command):
$ docker ps --format "{{.ID}} | {{.Names}} | {{.Status}} | {{.Ports}}"
3edc74e950de | elastic-primary | Up 2 days | 0.0.0.0:9200->9200/tcp, 0.0.0.0:9300->9300/tcp
b9e1845a1d2f | elastic-secondary | Up 2 days | 9200/tcp, 0.0.0.0:9201->9201/tcp, 9300/tcp, 0.0.0.0:9301->9301/tcp
bcb7677cde80 | elastic-tertiary | Up 2 days | 9200/tcp, 0.0.0.0:9202->9202/tcp, 9300/tcp, 0.0.0.0:9302->9302/tcp
a36d788eb605 | logstash | Up 2 days | 0.0.0.0:5000->5000/tcp, 0.0.0.0:9600->9600/tcp, 0.0.0.0:5000->5000/udp, 5044/tcp
833d7c962441 | kibana | Up 2 days | 0.0.0.0:5601->5601/tcp
Then log in to the Kibana UI at http://<SERVER-IP>:5601 using the elastic/arastra credentials.
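Besides docker ps, the cluster itself can be asked whether all three nodes have joined. The sketch below queries the Elasticsearch `_cluster/health` endpoint with HTTP basic authentication. It is an illustrative helper rather than part of the original procedure; the function names and the decision to accept a yellow status are assumptions.

```python
import base64
import json
import urllib.request


def fetch_cluster_health(base_url: str, user: str, password: str) -> dict:
    """GET <base_url>/_cluster/health with basic auth and parse the JSON body."""
    token = base64.b64encode(f"{user}:{password}".encode()).decode()
    req = urllib.request.Request(
        f"{base_url.rstrip('/')}/_cluster/health",
        headers={"Authorization": f"Basic {token}"},
    )
    with urllib.request.urlopen(req, timeout=10) as resp:
        return json.load(resp)


def cluster_ready(health: dict, expected_nodes: int = 3) -> bool:
    """True when all expected nodes have joined and the status is not red."""
    return (
        health.get("status") in ("green", "yellow")
        and health.get("number_of_nodes", 0) >= expected_nodes
    )
```

With the compose file from this guide, a call such as `cluster_ready(fetch_cluster_health("http://<SERVER-IP>:9200", "elastic", "arastra"))` would be the natural usage.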
You can bring down the containers and remove the volumes with the following command:
$ docker-compose down -v
Installing and Configuring openconfigbeat for EOS
Pull the repository from GitHub (or you can also use git clone):
$ go get github.com/aristanetworks/openconfigbeat
Go to the openconfigbeat directory (or the directory to which you have cloned the repo using git clone)
$ cd $GOPATH/src/github.com/aristanetworks/openconfigbeat
Compile the package for EOS
$ GOOS=linux GOARCH=386 go build
NOTE: For EOS with x86_64 architecture, compile the package as follows:
$ GOOS=linux GOARCH=amd64 go build
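To pick between the two build commands, the machine type reported by `uname -m` in the switch's bash shell can be mapped to a GOARCH value. This is a small sketch under the assumption that 32-bit EOS images report i386 or i686 and 64-bit images report x86_64; the mapping table and function names are illustrative.

```python
# Mapping from `uname -m` style machine names to Go's GOARCH values.
# Only the two architectures mentioned in this guide are covered.
GOARCH_BY_MACHINE = {
    "i386": "386",
    "i686": "386",
    "x86_64": "amd64",
}


def goarch_for(machine: str) -> str:
    """Return the GOARCH value for a machine name, or raise ValueError."""
    try:
        return GOARCH_BY_MACHINE[machine]
    except KeyError:
        raise ValueError(f"no GOARCH mapping for machine type {machine!r}") from None


def build_command(machine: str) -> str:
    """Compose the cross-compilation command line used in this guide."""
    return f"GOOS=linux GOARCH={goarch_for(machine)} go build"
```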
Copy the binary file to the /mnt/flash/ directory on the switch:
$ scp $GOPATH/src/github.com/aristanetworks/openconfigbeat/openconfigbeat admin@<switch-MGMT-IP>:/mnt/flash/
Configuration file for openconfigbeat
openconfigbeat requires a YAML configuration file that specifies the Sysdb/Smash/OpenConfig paths to subscribe to, as well as the Elasticsearch and gRPC server endpoints. The following is a very simple configuration file (most paths have been removed for brevity):
# The name of the shipper that publishes the network data. It can be used to group
# all the transactions sent by a single shipper in the web interface.
# If this option is not defined, the hostname is used.
name: spine

openconfigbeat:
  # The addresses of the OpenConfig devices to connect to.
  addresses: ["10.85.128.117"]

  # The OpenConfig/eos_native(TerminAttr) paths to subscribe to.
  paths:
    - "/Kernel/proc/cpu/utilization"

  # The default port to connect to if none is configured.
  default_port: 6042

  # The username on the switch.
  username: cvpadmin

  # The password for the user on the switch.
  password: arastra

  # Enable TLS.
  #tls: false

output.elasticsearch:
  # Elasticsearch host to connect to, the default port is 9200
  hosts: ["10.85.129.115"]

  # Optional protocol and basic auth credentials.
  protocol: http
  username: "elastic"
  password: "arastra"

# Sets log level. The default log level is info.
# Available log levels are: error, warning, info, debug
logging.level: error

# Enable debug output for selected components. To enable all selectors use ["*"]
# Other available selectors are "beat", "publish", "service"
# Multiple selectors can be chained.
logging.selectors: ["*"]
For details about subscription paths and metric path structures please visit: https://eos.arista.com/understanding-subscription-paths-for-open-source-telemetry-streaming
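Before copying the configuration to the switch, it can help to confirm that the basic keys are present. The sketch below works on an already-parsed configuration, that is, a plain dict such as a YAML loader of your choice would return, and checks only the keys shown in the example above. The function name and the selection of checks are assumptions, not openconfigbeat's own validation.

```python
def validate_config(cfg: dict) -> list:
    """Return a list of problems; an empty list means the basic keys are present."""
    problems = []

    beat = cfg.get("openconfigbeat") or {}
    if not beat.get("addresses"):
        problems.append("openconfigbeat.addresses is empty")
    if not beat.get("paths"):
        problems.append("openconfigbeat.paths is empty")

    # default_port is only checked when it is set.
    port = beat.get("default_port")
    if port is not None:
        valid = isinstance(port, int) and not isinstance(port, bool) and 1 <= port <= 65535
        if not valid:
            problems.append("openconfigbeat.default_port is not a valid TCP port")

    # The example file uses the dotted top-level key "output.elasticsearch".
    es = cfg.get("output.elasticsearch") or {}
    if not es.get("hosts"):
        problems.append("output.elasticsearch.hosts is empty")

    return problems
```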
openconfigbeat.yml file permissions
The openconfigbeat.yml file should have its permissions set to 750 (rwxr-x---). File permissions cannot be changed under /mnt/flash, so move the configuration file to another directory such as /persist/sys/ and then change the permissions with chmod.
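The error message reproduced in the Troubleshooting section states that the config file "can only be writable by the owner". The following sketch mirrors that rule with Python's stat module so a file can be checked before the daemon is started. It illustrates the rule and is not the beat's actual implementation; the function names are hypothetical.

```python
import os
import stat


def writable_by_owner_only(mode: int) -> bool:
    """True when neither group nor others have the write bit set."""
    return mode & (stat.S_IWGRP | stat.S_IWOTH) == 0


def check_config_file(path: str) -> bool:
    """Apply the same check to a file on disk."""
    return writable_by_owner_only(os.stat(path).st_mode)
```

Mode 750 passes this check, while 770 (the "-rwxrwx---" case from the log sample) does not.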
Configuring TerminAttr and openconfigbeat daemon
Default VRF without CVP
!
daemon TerminAttr
   exec /usr/bin/TerminAttr -disableaaa -grpcaddr 0.0.0.0:6042
   no shutdown
!
daemon openconfigbeat
   exec /mnt/flash/openconfigbeat -e -c /persist/sys/openconfigbeat.yml
!
Default VRF with CVP
!
daemon TerminAttr
   exec /usr/bin/TerminAttr -ingestgrpcurl=<CVP-IP>:9910 -cvcompression=gzip -taillogs -ingestauth=key,magickey -smashexcludes=ale,flexCounter,hardware,kni,pulse,strata -ingestexclude=/Sysdb/cell/1/agent,/Sysdb/cell/2/agent -disableaaa -grpcaddr 0.0.0.0:6042
   no shutdown
!
daemon openconfigbeat
   exec /mnt/flash/openconfigbeat -e -c /persist/sys/openconfigbeat.yml
!
VRF management without CVP
!
daemon TerminAttr
   exec /usr/bin/TerminAttr -disableaaa -grpcaddr management/0.0.0.0:6042
   no shutdown
!
!
daemon openconfigbeat
   exec /sbin/ip netns exec ns-management /mnt/flash/openconfigbeat -e -c /persist/sys/openconfigbeat.yml
   no shutdown
!
VRF management with CVP
!
daemon TerminAttr
   exec /usr/bin/TerminAttr -ingestgrpcurl=<CVP-IP>:9910 -cvcompression=gzip -taillogs -ingestvrf=management -ingestauth=key,magickey -smashexcludes=ale,flexCounter,hardware,kni,pulse,strata -ingestexclude=/Sysdb/cell/1/agent,/Sysdb/cell/2/agent -disableaaa -grpcaddr management/0.0.0.0:6042
   no shutdown
!
daemon openconfigbeat
   exec /sbin/ip netns exec ns-management /mnt/flash/openconfigbeat -e -c /persist/sys/openconfigbeat.yml
   no shutdown
!
VRF management without CVP, with authentication
!
daemon TerminAttr
   exec /usr/bin/TerminAttr -grpcaddr management/0.0.0.0:6042
   no shutdown
!
!
daemon openconfigbeat
   exec /sbin/ip netns exec ns-management /mnt/flash/openconfigbeat -e -c /persist/sys/openconfigbeat.yml
   no shutdown
!
[admin@switch sys]$ cat openconfigbeat.yml
name: spine
openconfigbeat:
  addresses: ["10.85.128.117"]
  paths:
    - "/Kernel/proc/cpu/utilization"
  default_port: 6042
  username: cvpadmin
  password: arastra
  #tls: false
output.elasticsearch:
  hosts: ["10.85.129.115"]
  protocol: http
  username: "elastic"
  password: "arastra"
logging.level: error
logging.selectors: ["*"]
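In the configurations above, the VRF named in the -grpcaddr value has to match the network namespace in which openconfigbeat is executed. The sketch below splits a -grpcaddr value into its parts and derives the matching exec prefix. It assumes the `<vrf>/<host>:<port>` form seen in this guide, IPv4 or hostname addresses only, and that the namespace for a VRF is named `ns-<vrf>`, generalised from the ns-management example. The function and class names are hypothetical.

```python
from typing import NamedTuple


class GrpcAddr(NamedTuple):
    vrf: str
    host: str
    port: int


def parse_grpcaddr(value: str) -> GrpcAddr:
    """Split '<vrf>/<host>:<port>' or '<host>:<port>' into its parts."""
    vrf, sep, rest = value.partition("/")
    if not sep:
        # No VRF prefix: the server listens in the default VRF.
        vrf, rest = "default", value
    host, colon, port = rest.rpartition(":")
    if not colon or not port.isdigit():
        raise ValueError(f"expected host:port, got {rest!r}")
    return GrpcAddr(vrf, host, int(port))


def exec_prefix(vrf: str) -> str:
    """Command prefix that runs a daemon in the namespace of the given VRF."""
    return "" if vrf == "default" else f"/sbin/ip netns exec ns-{vrf} "
```

For example, `exec_prefix(parse_grpcaddr("management/0.0.0.0:6042").vrf)` yields the `/sbin/ip netns exec ns-management ` prefix used in the daemon definitions above.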
Setting up Kibana index pattern
Kibana uses index patterns to retrieve data from Elasticsearch indices for things like visualizations. Thus we need to first create an index pattern.
1. Log in to the Kibana UI, go to the Management tab, and select Index Patterns under the Kibana section.
2. Define an index pattern.
3. Configure the Time Filter field name as @timestamp.
4. Click Create Index Pattern.
5. Go to the Discover page, select the created index pattern, and click the Refresh button.
6. You can now build different visualizations based on the available data.
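When defining the index pattern, it is useful to know which existing indices a wildcard pattern would select. The sketch below reproduces only the `*` wildcard matching on a list of index names. The index names and the `openconfigbeat-*` pattern in the usage note are hypothetical examples, not names confirmed by this guide.

```python
import re


def pattern_to_regex(pattern: str):
    """Translate a pattern where '*' matches any run of characters into a regex."""
    parts = (re.escape(part) for part in pattern.split("*"))
    return re.compile(".*".join(parts))


def indices_matching(index_names, pattern):
    """Return the index names that the wildcard pattern would select."""
    regex = pattern_to_regex(pattern)
    return [name for name in index_names if regex.fullmatch(name)]
```

Given index names `["openconfigbeat-2020.05.14", ".kibana_1"]`, the pattern `openconfigbeat-*` selects only the first one.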
Using native OpenConfig CLI and gRPC transport
Default VRF
!
management api gnmi
   transport grpc default
      no shutdown
      port 6030
      vrf default
!
daemon openconfigbeat
   exec /mnt/flash/openconfigbeat -e -c /persist/sys/openconfigbeat.yml
!
VRF management
!
management api gnmi
   transport grpc default
      no shutdown
      port 6030
      vrf management
!
!
daemon openconfigbeat
   exec /sbin/ip netns exec ns-management /mnt/flash/openconfigbeat -e -c /persist/sys/openconfigbeat.yml
   no shutdown
!
NOTE: In this case, the openconfigbeat.yml file must be changed to use OpenConfig paths, which differ from the eos_native paths used with TerminAttr. For more information on OpenConfig paths on EOS 4.23 see link and link.
Also, the OpenConfig endpoint has to be changed accordingly:
name: spine
openconfigbeat:
  addresses: ["localhost"]
  paths:
    - "components/component/cpu/utilization/state/"
  default_port: 6030
  username: cvpadmin
  password: arastra
  #tls: false
output.elasticsearch:
  hosts: ["10.85.129.115"]
  protocol: http
  username: "elastic"
  password: "arastra"
logging.level: error
logging.selectors: ["*"]
In Kibana the index pattern remains unchanged; the data simply appears under a different path.
Troubleshooting
The daemon/agent logs are stored in /var/log/agents directory on the Arista switch.
From the switch CLI you can check the logs using the below command:
# show agent openconfigbeat logs
Or from bash shell, you can use cat/more/less/vi/nano/tail
# bash cat /var/log/agents/<agent-log-file>
You can also increase the verbosity of the logs via the openconfigbeat.yml file:
$ sudo cat openconfigbeat.yml
name: spine
openconfigbeat:
  addresses: ["10.85.128.117"]
  paths:
    - "/Kernel/proc/cpu/utilization"
  default_port: 6042
  username: cvpadmin
  password: arastra
  #tls: false
output.elasticsearch:
  hosts: ["10.85.129.115"]
  protocol: http
  username: "elastic"
  password: "arastra"
logging.level: debug # This can be set to: error, warning, info, debug
logging.selectors: ["*"]
The "Authentication failed" message indicates that openconfigbeat cannot authenticate to the gRPC server that TerminAttr is serving, either because the disableaaa flag is not specified in the TerminAttr config or because the username and password in the openconfigbeat.yml file are incorrect. The same applies when using the native OpenConfig CLI.
===> /var/log/agents/openconfigbeat-15040 Thu May 14 09:23:23 2020 <===
===== Output from /mnt/flash/openconfigbeat ['-e', '-c', '/persist/sys/openconfigbeat.yml'] (PID=15040) started May 14 09:23:23.524335 ===
2020-05-14T09:23:23.714Z ERROR beater/openconfigbeat.go:159 error from 10.85.128.117: rpc error: code = Unauthenticated desc = Authentication failed
A file permission error is seen when the openconfigbeat.yml file does not have its permissions set to 750 (rwxr-x---). See the link for more details on this error.
===> /var/log/agents/openconfigbeat-15909 Thu May 14 09:38:27 2020 <===
===== Output from /mnt/flash/openconfigbeat ['-e', '-c', '/persist/sys/openconfigbeat.yml'] (PID=15909) started May 14 09:38:27.330174 ===
Exiting: error loading config file: config file ("/persist/sys/openconfigbeat.yml") can only be writable by the owner but the permissions are "-rwxrwx---" (to fix the permissions use: 'chmod go-w /persist/sys/openconfigbeat.yml')
A "connection refused" error means that the gRPC server is not reachable. In the example below, openconfigbeat is executed in the management VRF, but TerminAttr is running in the default VRF, and the gRPC server also runs in the default VRF by default (the same applies to the native OpenConfig gRPC server). To fix it, configure the gRPC server in the correct VRF.
===> /var/log/agents/openconfigbeat-13946 Thu May 14 10:17:04 2020 <===
===== Output from /sbin/ip ['netns', 'exec', 'ns-management', '/persist/sys/openconfigbeat', '-e', '-c', '/persist/sys/7060-beat.yml'] (PID=13946) started May 14 10:17:03.953754 ===
2020-05-14T10:17:04.010Z ERROR beater/openconfigbeat.go:159 error from 10.83.13.133: rpc error: code = Unavailable desc = all SubConns are in TransientFailure, latest connection error: connection error: desc = "transport: Error while dialing dial tcp 10.83.13.133:6042: connect: connection refused"
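A plain TCP connection attempt is often enough to tell whether an endpoint is listening at all. The sketch below is a generic reachability probe and not part of openconfigbeat; the function name and default timeout are illustrative. For the result to be meaningful, it has to run with the same network view (VRF or namespace) as openconfigbeat itself.

```python
import socket


def can_connect(host: str, port: int, timeout: float = 3.0) -> bool:
    """Try a plain TCP connect; True if the handshake completes."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers connection refused, timeouts and unreachable hosts.
        return False
```

For the gRPC server in this guide the probe would target port 6042 (TerminAttr) or 6030 (native OpenConfig), and port 9200 for Elasticsearch.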
The following error is seen when openconfigbeat cannot connect to the Elasticsearch endpoint. Check that the Elasticsearch container is up and running and that the firewall allows port 9200.
===> /var/log/agents/openconfigbeat-16552 Thu May 14 10:44:03 2020 <===
===== Output from /sbin/ip ['netns', 'exec', 'ns-management', '/persist/sys/openconfigbeat', '-e', '-c', '/persist/sys/7060-beat.yml'] (PID=16552) started May 14 10:44:01.035441 ===
2020-05-14T10:44:03.194Z ERROR pipeline/output.go:100 Failed to connect to backoff(elasticsearch(http://10.85.128.164:9200)): Get http://10.85.128.164:9200: dial tcp 10.85.128.164:9200: connect: connection refused
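The four failure modes described in this section can be recognised by fixed substrings of their log lines. The sketch below maps those substrings, copied from the log samples above, to a short hint. It is a convenience helper under the assumption that the messages keep this wording; the table and function name are hypothetical.

```python
# Substrings taken from the log samples in this section, each paired with a hint.
KNOWN_ERRORS = [
    ("Authentication failed",
     "check the disableaaa flag or the username/password in openconfigbeat.yml"),
    ("can only be writable by the owner",
     "remove group/other write permission from the config file"),
    ("Failed to connect to backoff(elasticsearch",
     "check that Elasticsearch is up and that port 9200 is reachable"),
    ("Error while dialing",
     "gRPC server unreachable: check that it runs in the same VRF"),
]


def diagnose(line: str):
    """Return a hint for the first known error found in a log line, else None."""
    for needle, hint in KNOWN_ERRORS:
        if needle in line:
            return hint
    return None
```

Feeding each line of a file under /var/log/agents to `diagnose` and printing the non-None results gives a quick summary of what went wrong.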
Example Configuration files
Detailed reference configuration file from GitHub.