This document covers the use of the Network Time Protocol (NTP) to synchronize the system clocks on Arista switches. Each switch has a local clock that can keep time without NTP, but every device’s clock slowly drifts out of sync. That drift causes problems such as incorrect timestamps in event logs, which make it difficult to correlate events between devices on the network, and an inability to correctly verify the validity periods of cryptographic certificates and signatures used by protocols such as TLS or DNSSEC.
EOS comes with support to act as both an NTP client and an NTP server; this document will only cover using the NTP client in EOS to synchronize to other NTP servers, and how to monitor the status of the local NTP client.
A list of NTP servers can be configured by hostname, IPv4 address, or IPv6 address:
SWITCH1(config)#ntp server 0.pool.ntp.org
SWITCH1(config)#ntp server 1.pool.ntp.org
SWITCH1(config)#ntp server 2.pool.ntp.org
SWITCH1(config)#ntp server 192.0.2.123
SWITCH1(config)#ntp server 2001:db8::123
EOS then starts the ntpd daemon, which uses NTP to query each configured server to estimate how far off the local system clock is relative to the given servers. If ntpd becomes confident that there is a large offset between local time and that received from the NTP servers, ntpd will step the local system clock once to bring it in line. Once that is performed, or if there isn’t a large offset, ntpd will monitor the local system’s clock offset, and run the system clock slightly fast or slightly slow to gradually bring it into alignment with the NTP servers while avoiding causing any gaps or overlaps in system time.
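The step-or-slew decision described above can be sketched in a few lines of Python. The 128 ms step threshold and the 500 PPM maximum slew rate are the defaults of the reference ntpd implementation and are assumptions here, not values confirmed for EOS; the function names are purely illustrative. Real ntpd adjusts the clock through a feedback loop rather than slewing at a constant maximum rate, so the duration computed below is only a lower bound.

```python
# Sketch of ntpd's step-versus-slew choice.
# ASSUMPTION: 0.128 s and 500 PPM are reference ntpd defaults, not EOS-specific values.
STEP_THRESHOLD_S = 0.128
MAX_SLEW_PPM = 500.0


def plan_correction(offset_s: float) -> str:
    """Return 'step' for a large offset and 'slew' for a small one."""
    return "step" if abs(offset_s) > STEP_THRESHOLD_S else "slew"


def min_slew_duration_s(offset_s: float) -> float:
    """Shortest time needed to absorb offset_s when slewing at the maximum rate."""
    return abs(offset_s) / (MAX_SLEW_PPM * 1e-6)


print(plan_correction(2.5))     # step
print(plan_correction(-0.045))  # slew
print(round(min_slew_duration_s(0.045)), "seconds at the maximum slew rate")
```

Under these assumptions, even a small offset takes on the order of minutes to remove by slewing, which is why a large offset is stepped instead.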
NTP Conceptual Overview
The Network Time Protocol synchronizes time based off of what it calls “stratum 0” time sources, which are not hosts speaking NTP, but physical reference clocks such as GPS receivers or atomic clocks. These can be fed into a network host to synchronize its system clock, which causes it to be considered a “stratum 1” time source, since its time is based off a stratum 0 time source.
Other network hosts can then use NTP to synchronize their clocks to this stratum 1 server, and indicate that they’re two hops away from the original reference clock by labeling themselves as “stratum 2” NTP servers. This redistribution of clock synchronization and incrementing of the stratum count may continue up to stratum 15, but in practice it is unusual to see NTP servers with a stratum higher than 4 or 5.
When an NTP server has just started and hasn’t successfully synced with another time source yet, or believes that it has lost its paths back to a physical reference clock, it labels itself as stratum 16 to inform other hosts that it cannot be an authoritative source of time.
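The stratum rules above reduce to a very small calculation. The following is a minimal sketch; the function and constant names are hypothetical and chosen only for illustration.

```python
from typing import Optional

UNSYNCHRONIZED = 16  # stratum value meaning "not a usable time source"


def advertised_stratum(upstream_stratum: Optional[int]) -> int:
    """Stratum a host advertises, given the stratum of its selected source.

    Pass 0 for a directly attached reference clock and None when no
    source has been selected yet.
    """
    if upstream_stratum is None or upstream_stratum + 1 >= UNSYNCHRONIZED:
        return UNSYNCHRONIZED
    return upstream_stratum + 1


print(advertised_stratum(0))     # 1: host attached to a GPS receiver or atomic clock
print(advertised_stratum(1))     # 2: client of a stratum 1 server
print(advertised_stratum(None))  # 16: freshly started, nothing selected yet
```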
When NTP is not yet configured, EOS will display the following:
SWITCH1#show ntp status
NTP is disabled.
Once NTP is configured, it will take a few minutes for it to move into a synchronized state as it takes measurements against the configured NTP servers. Until then, it will display:
SWITCH1#show ntp status
unsynchronised
   polling server every 8 s
Once NTP has selected a best server and synchronized to it, it gradually backs off its polling interval, starting at 64 seconds and eventually growing to 1024 seconds as it stabilizes the local clock:
SWITCH1#show ntp status
synchronised to NTP server (192.0.2.123) at stratum 3
   time correct to within 45 ms
   polling server every 1024 s
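For monitoring scripts, the three states shown above can be told apart with a few regular expressions. This is a rough sketch that assumes the output text matches the examples in this document exactly; the function name and the dictionary keys are hypothetical, and how the text is collected from the switch is left out.

```python
import re
from typing import Dict, Union


def parse_ntp_status(text: str) -> Dict[str, Union[str, int]]:
    """Classify 'show ntp status' output and extract the values it reports."""
    if "NTP is disabled" in text:
        return {"state": "disabled"}

    result: Dict[str, Union[str, int]] = {"state": "unsynchronised"}
    sync = re.search(r"synchronised to NTP server \(([^)]+)\) at stratum (\d+)", text)
    if sync:
        result.update(state="synchronised", server=sync.group(1), stratum=int(sync.group(2)))

    accuracy = re.search(r"time correct to within (\d+) ms", text)
    if accuracy:
        result["accuracy_ms"] = int(accuracy.group(1))

    poll = re.search(r"polling server every (\d+) s", text)
    if poll:
        result["poll_s"] = int(poll.group(1))
    return result


example = """synchronised to NTP server (192.0.2.123) at stratum 3
   time correct to within 45 ms
   polling server every 1024 s"""
print(parse_ntp_status(example))
```

A check built on this could, for example, alert when the state stays unsynchronised for longer than the few minutes that initial synchronization normally takes.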
To get a more detailed view into the status of each individual configured NTP source, you can run “show ntp associations”:
SWITCH1#show ntp associations
     remote           refid      st t when poll reach   delay   offset  jitter
===============================================================================
*ntp1.example.com 192.0.2.123     3 u   56   64  377    0.268   -0.022   0.075
+ntp2.example.com 192.0.2.123     3 u   54   64  377    0.196   -0.123   0.053
+ntp3.example.com 192.0.2.155     3 u   55   64  377    0.29    -0.111   0.048
This table gives quite a bit of detail per NTP server, based on the “ntpq” utility (which can provide this same table from the bash prompt or on other non-EOS systems):
- The first character is a flag indicating how the local NTP client has classified each server. The most common flags are:
- * (asterisk) – The selected system peer which the local time is based on.
- + (plus) – This server seems to be reporting good time and is a candidate for synchronizing the local time.
- ‘ ‘ (space) – This server has been rejected as being unreachable, being synchronized to the local system, or has too high of a stratum to be considered.
- - (minus) – This server has been rejected as an outlier relative to the other configured time sources.
- remote: This is the hostname or IP address of the configured NTP server. If you would prefer to see these with the raw IP addresses instead of the hostname, you can run ntpq directly with “bash ntpq -pn” to include the -n flag to force ntpq to output numeric IP addresses instead of canonical hostnames.
- refid: The reference ID is an identifier of the NTP server which this NTP server is synced to (so two hops away for the current device). You can see in this example that ntp1.example.com and ntp2.example.com are both synchronized to the same server, 192.0.2.123.
- st: This column indicates the stratum of each server, so you can see in this example that all three servers are stratum 3 servers.
- t: The type of most NTP servers will be u for unicast. Time sources can also be broadcast (b) or local (l).
- when: This indicates how many seconds ago the server was contacted for time. This value resets to 0 when a server is successfully polled for its current time.
- poll: This is the current interval for how often this server is contacted for the time in seconds. By default, when NTP starts it polls servers every 64 seconds, and then slowly (over the span of a few hours) increases that interval by powers of 2 until the polling interval reaches 1024 seconds when everything stabilizes. You may change the lower and upper bounds on the polling interval using the minpoll and maxpoll parameters when configuring an NTP server in global configuration mode.
- reach: This value is an octal (base eight) representation of the last eight attempts by the local NTP client to poll this server, with one bit per attempt recording whether a response was received. When freshly started, the first successful poll will set this value to 1 (b00000001), which will then change to 3 (b00000011), 7 (b00000111), 17 (b00001111), 37, 77, 177, and finally 377 (b11111111) to indicate that the server has been reachable for all eight of the previous polls. If you start seeing values other than 377 on a long-running client, this may indicate an issue with the specific server, or packet loss between the client and the server.
- delay: This is a measure of how long it took a poll to reach the remote server and come back, in milliseconds. Smaller numbers are better since they indicate that the server is closer to the client. In the example output above, all three servers are on the same subnet as the client, so the delays of roughly 0.2 to 0.3 ms are particularly low.
- offset: This is an estimate of how far off the remote server’s clock is from the local system clock in milliseconds. This value should ideally trend towards zero as NTP continues to calibrate the local clock.
- jitter: Jitter is a measure of the difference in round-trip time to the remote server between measurements. High jitter usually indicates congestion on the network between the NTP client and server, which makes the round-trip time unpredictable and sets a lower bound for how accurately the NTP client can measure the offset between the local and remote clocks. This metric can be improved by using closer NTP servers on the network, or by applying Quality of Service policies which give NTP traffic priority to reduce jitter.
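The column descriptions above can be turned into a small parser for a single row of the associations table. This is a minimal sketch that assumes the ten-column layout shown in the example, with the flag in the first character of the line; the function names and dictionary keys are hypothetical. It also expands the octal reach value into individual poll results, with the most recent poll in the lowest bit.

```python
from typing import Dict, List

# Flag meanings from the list above; any other character is reported as "other".
TALLY = {"*": "system peer", "+": "candidate", " ": "rejected", "-": "outlier"}
COLUMNS = ["remote", "refid", "st", "t", "when", "poll",
           "reach", "delay", "offset", "jitter"]


def decode_reach(reach_octal: str) -> List[bool]:
    """Expand the octal reach register into eight poll results, newest first."""
    register = int(reach_octal, 8)
    return [bool((register >> bit) & 1) for bit in range(8)]


def parse_association(line: str) -> Dict[str, object]:
    """Parse one data row of the table; line[0] is the flag character."""
    fields = dict(zip(COLUMNS, line[1:].split()))
    return {
        "status": TALLY.get(line[0], "other"),
        "remote": fields["remote"],
        "refid": fields["refid"],
        "stratum": int(fields["st"]),
        "poll_s": int(fields["poll"]),
        "missed_polls": decode_reach(fields["reach"]).count(False),
        "delay_ms": float(fields["delay"]),
        "offset_ms": float(fields["offset"]),
        "jitter_ms": float(fields["jitter"]),
    }


row = parse_association("*ntp1.example.com 192.0.2.123 3 u 56 64 377 0.268 -0.022 0.075")
print(row["status"], row["missed_polls"], row["offset_ms"])
print(decode_reach("376")[0])  # False: the most recent poll went unanswered
```

On a freshly started client, `missed_polls` also counts polls that have not happened yet, so it is only meaningful once eight polls have been attempted.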
For even deeper diagnostics of the local ntpd daemon, the ntpd process logs are available in /var/log/messages, which can be viewed by using bash commands as root to search for “ntpd”:
SWITCH1#bash sudo grep ntpd /var/log/messages
[... SNIP ...]
Feb 24 19:14:23 SWITCH1 ntpd: ntpd 4.2.6p3-RC10@1.2239-o Mon Sep 23 18:31:05 UTC 2019 (1)
Feb 24 19:14:23 SWITCH1 ntpd: proto: precision = 0.140 usec
Feb 24 19:14:23 SWITCH1 ntpd: 0.0.0.0 c01d 0d kern kernel time sync enabled
Feb 24 19:14:23 SWITCH1 ntpd: ntp_io: estimated max descriptors: 1048576, initial socket boundary: 16
Feb 24 19:14:23 SWITCH1 ntpd: Listen normally on 0 v4wildcard 0.0.0.0 UDP 123
Feb 24 19:14:23 SWITCH1 ntpd: Listen normally on 1 v6wildcard :: UDP 123
Feb 24 19:14:23 SWITCH1 ntpd: Listen normally on 2 lo 127.0.0.1 UDP 123
Feb 24 19:14:23 SWITCH1 ntpd: Listen normally on 7 lo ::1 UDP 123
Feb 24 19:14:23 SWITCH1 ntpd: Listening on routing socket on fd #20 for interface updates
Feb 24 19:14:23 SWITCH1 ntpd: 0.0.0.0 c016 06 restart
Feb 24 19:14:23 SWITCH1 ntpd: 0.0.0.0 c012 02 freq_set kernel 4.190 PPM
Feb 24 19:18:43 SWITCH1 ntpd: 0.0.0.0 c615 05 clock_sync
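One practical use of these logs is measuring how long ntpd took to reach a synchronized state after a restart. The sketch below assumes the syslog line format shown in the excerpt above, and identifies events by the trailing "restart" and "clock_sync" keywords. Syslog timestamps carry no year, so the caller has to supply one; the function name and the year used in the example are assumptions.

```python
from datetime import datetime
from typing import Iterable, Optional


def seconds_to_sync(log_lines: Iterable[str], year: int) -> Optional[float]:
    """Seconds from the most recent ntpd 'restart' event to the first 'clock_sync' after it."""
    restarted_at = None
    for line in log_lines:
        line = line.rstrip()
        if " ntpd: " not in line:
            continue
        # The first 15 characters hold a stamp such as "Feb 24 19:14:23".
        stamp = datetime.strptime(f"{year} {line[:15]}", "%Y %b %d %H:%M:%S")
        if line.endswith(" restart"):
            restarted_at = stamp
        elif line.endswith(" clock_sync") and restarted_at is not None:
            return (stamp - restarted_at).total_seconds()
    return None  # no restart followed by clock_sync was found


sample = [
    "Feb 24 19:14:23 SWITCH1 ntpd: 0.0.0.0 c016 06 restart",
    "Feb 24 19:14:23 SWITCH1 ntpd: 0.0.0.0 c012 02 freq_set kernel 4.190 PPM",
    "Feb 24 19:18:43 SWITCH1 ntpd: 0.0.0.0 c615 05 clock_sync",
]
print(seconds_to_sync(sample, year=2020))  # 260.0 (the year is an assumed value)
```

For the excerpt above this gives 4 minutes 20 seconds, consistent with the "few minutes" that initial synchronization is expected to take.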
Selecting NTP Servers
“A man with a watch knows what time it is. A man with two watches is never sure” – Segal’s Law
“An NTP client with three servers can calculate a confidence interval.” – Triple Modular Redundancy
When time and timestamps are critical to a business’s network or application, it is generally recommended not to depend on public NTP servers. Public NTP servers like those provided by the pool.ntp.org project are usually run by volunteers on a best-effort basis, so you cannot rely on those servers always being available, and you are also expected not to place an unreasonable load on the public pool of servers.
The best option is to build an entirely local NTP infrastructure which is not dependent on any public NTP providers. Off-the-shelf stratum 1 NTP appliances are available which can use primary time sources such as GPS receivers to provide a local stratum 1 NTP service. The scale of the network should guide how much NTP infrastructure is justified, be it a single stratum 1 appliance per site or multiple, and whether all the end devices are configured to query the stratum 1 time servers directly, or if a layer of stratum 2 NTP servers should be used to scale out capacity and reduce the query load on the often under-powered stratum 1 appliances.
If the number of local NTP clients or the importance of timestamps does not justify the expense of deploying stratum 1 time servers, it is still recommended that local NTP servers be used. This has two operational advantages over pointing each network device at random public NTP servers:
- By pointing all of the local network devices at a fixed pool of local servers, even if parts of the public NTP service become unavailable or misbehave, there will still be good local agreement between devices on what time it is, which ensures that it is still possible to correlate events between devices based on their local timestamps.
- By only having a small number of local NTP servers query the public NTP pool, the impact of the local network on the public servers is limited, regardless of how many network devices locally rely on this NTP service.
Related Information: https://www.arista.com/en/um-eos/eos-section-6-2-system-clock-and-time-protocols