Common challenges with TAP aggregation

 
 
Introduction

Capturing raw network packet data, whether it be from a mirror port or through an aggregation infrastructure, is often perceived to be a complex task.

In reality, most of the anomalies or limitations faced by those starting out with capture have simple explanations, and they usually lie with the capturing tool rather than the source devices. This article provides a brief overview of commonly reported issues and some suggested avenues of investigation.

Timestamping

Timestamps missing or corrupt

  • Check that timestamping is configured to match the host’s expectations (i.e. is the host looking in the right place for the timestamp?). Some NICs or drivers strip the packet FCS, which means software tools do not find the timestamp at the expected offset.

Timestamps appear on some packets but not others

  • Check that receive offloads on the capture host (LRO on the NIC, GRO in the host stack) are not coalescing TCP segments, as merged segments may lose their per-packet timestamps.
  • Check that the data received is actually a timestamp and not spurious data read from a mismatched offset.
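
A quick plausibility check can help tell a real timestamp from bytes read at the wrong offset. The sketch below is illustrative only: it assumes a hypothetical layout in which the last 8 bytes of the captured frame hold a 32-bit seconds field followed by a 32-bit nanoseconds field, both big-endian. Real timestamp formats differ between devices and modes, so adjust the offset and decoding to match your platform's documentation.

```python
import struct

def trailer_looks_like_timestamp(frame: bytes, now_s: int, window_s: int = 3600) -> bool:
    """Return True if the last 8 bytes decode to a plausible timestamp.

    ASSUMPTION (hypothetical layout): 32-bit seconds then 32-bit
    nanoseconds, big-endian, in the final 8 bytes of the frame.
    """
    if len(frame) < 8:
        return False
    seconds, nanos = struct.unpack("!II", frame[-8:])
    # Nanoseconds must be below one second, and the seconds field
    # should be close to the capture host's own clock.
    return nanos < 1_000_000_000 and abs(seconds - now_s) <= window_s
```

If a check like this passes on some packets and fails on others, the offset is probably shifting between packets, for example because the FCS is stripped only on some paths.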

Packet Corruption/Ordering

Packets received out of order

  • Normally due to parallel consumption of data in multiprocessor systems, where several cores drain receive queues independently – try pinning NIC receive processing to a single core.
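
To confirm that reordering is really happening before changing the host configuration, it can help to count it. The sketch below assumes you have already extracted a per-packet timestamp (or sequence number) in the order the capture tool delivered the packets; the function name is illustrative.

```python
def count_out_of_order(timestamps):
    """Count packets that arrive with a timestamp earlier than the
    latest timestamp already seen in the delivered order."""
    out_of_order = 0
    latest = None
    for ts in timestamps:
        if latest is not None and ts < latest:
            out_of_order += 1
        else:
            latest = ts
    return out_of_order
```

A count that drops to zero once receive processing is pinned to one core suggests the reordering was introduced by the capture host rather than the network.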

Packets received without VLAN tag

  • The VLAN tag may be stripped by the NIC, driver or software stack (for example by hardware VLAN offload on the capture NIC)

Packets received with VLAN tag but ID Tag missing

  • The first 802.1Q field is likely being removed by the capture device (see “Packets received without VLAN tag” above)
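
To see whether a tag survived the capture path, inspect the raw frame bytes directly instead of relying on a decoder's summary. The minimal sketch below assumes an untruncated Ethernet frame starting at the destination MAC address and checks only the outer tag with the standard 802.1Q TPID (0x8100); other TPIDs, such as those used for stacked tags, are not handled.

```python
import struct

TPID_8021Q = 0x8100  # standard 802.1Q tag protocol identifier

def vlan_id(frame: bytes):
    """Return the outer VLAN ID of an Ethernet frame, or None if untagged."""
    # 12 bytes of MAC addresses + 4-byte tag + 2-byte EtherType
    if len(frame) < 18:
        return None
    tpid, tci = struct.unpack("!HH", frame[12:16])
    if tpid != TPID_8021Q:
        return None
    # The VLAN ID is the low 12 bits of the tag control information.
    return tci & 0x0FFF
```

Running this over frames captured at two points (for example on the aggregator CPU and on the tool) shows where in the path the tag disappears.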

My <access-list | policy> is not being correctly applied

  • Check spelling and case usage – ACL and policy names are case sensitive
  • Try mirroring ingress traffic to the CPU and using tcpdump to validate that the traffic expected to hit the filter is present in the data stream
  • Check policy order and take care to follow the packet flow actions specified in the Traffic Steering guide

Performance and Physical Layer

Latency between mirror port and tool seems high

  • Ensure fiber lengths and transceivers are considered in any end to end calculation (a reasonable rule of thumb is 5-10ns per transceiver, 5ns per meter of fiber)
  • Queuing is an artefact of interface congestion – contended interfaces, common in n:1 designs, introduce queuing delay, which can be measured with LANZ.
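
The rule of thumb above can be turned into a quick estimate of the fixed (non-queuing) latency of a path. This is a sketch using only the figures quoted in this article; the function name and parameters are illustrative, and switching and queuing delay are deliberately excluded.

```python
def fixed_path_latency_ns(fiber_m, transceivers,
                          ns_per_transceiver=(5.0, 10.0),
                          ns_per_meter=5.0):
    """Estimate the (low, high) fixed latency in nanoseconds.

    Uses the rule of thumb of 5-10 ns per transceiver and 5 ns per
    meter of fiber. Queuing and switching delay are not included.
    """
    fiber_ns = fiber_m * ns_per_meter
    low = fiber_ns + transceivers * ns_per_transceiver[0]
    high = fiber_ns + transceivers * ns_per_transceiver[1]
    return low, high
```

For example, 20 m of fiber and 4 transceivers gives roughly 120 to 140 ns. Measured latency well above that range points towards queuing rather than the physical path.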

TAP interfaces do not come up on the aggregator

  • An optical tap provides two TX outputs – one from each direction of fiber. Connecting these to two transceivers on the appropriate side (right hand receptacle is RX) will bring the link up with no special settings on the aggregator as long as the TAPped link is healthy/live.
  • Insertion of TAPs does reduce the available optical budget so it should be ensured that a strong enough signal is received at the aggregator. An instrument such as a light meter can be used to measure receive power.
  • Mixing fiber types (SMF and MMF) causes significant signal losses and should be avoided
  • Wideband optics with greater sensitivity can be used to recover weak signals – try using a 10G-LR optic to recover a weak 10G-SR signal for example.
  • Optical TAPs operate in a unidirectional manner – weaker than expected signal splits may indicate the TAP is cabled in reverse.
  • When using 1G optical interfaces connected using simplex fibers to an optical TAP, it is advisable to disable autonegotiation to ensure the interface comes up reliably
Switch#conf t
Switch(config)#interface ethernet 1
Switch(config-if-Et1)#speed forced 1000full
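
To judge whether a TAP leaves enough optical budget, it can help to estimate the expected receive power before reaching for a light meter. The sketch below computes only the ideal splitting loss for a given split fraction; the connector and fiber loss passed in as `other_loss_db` is an assumption you must supply, and a real TAP adds some excess loss on top, so treat the result as an optimistic bound.

```python
import math

def split_loss_db(fraction):
    """Ideal loss in dB for a TAP port that receives `fraction` of the light."""
    if not 0 < fraction <= 1:
        raise ValueError("fraction must be in the range (0, 1]")
    return -10 * math.log10(fraction)

def expected_rx_dbm(tx_dbm, fraction, other_loss_db=0.0):
    """Estimate receive power after the split and any other known losses."""
    return tx_dbm - split_loss_db(fraction) - other_loss_db
```

A 50/50 split costs about 3 dB on each leg, while the 30% leg of a 70/30 split costs a little over 5 dB. If the measured power on the monitor port is far below the estimate, check for mixed fiber types or a TAP cabled in reverse.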

Configuration

My port-channel (load balancing group) does not come up

  • In aggregation mode, the device control plane is inactive – this prevents the injection of unwanted packets into the monitoring flow. As a result, the device does not send LACP packets, so an LACP port channel will not negotiate.
  • When creating a port channel to load-balance across tools, be sure to use static aggregation (channel-group <#> mode on) rather than LACP (mode active)

I can see LLDP neighbors on Tap ports but not on Tool ports

  • In aggregation mode, the device is intentionally silent – it does not generate its own protocol packets – to prevent unwanted packets in the monitoring flow.
  • On ingress however, LLDPDUs are “snooped” – copied to the control plane – to enable easy location of downstream devices.