Posted on December 22, 2014 12:37 pm
 |  Asked by Florian
 |  3810 views
RESOLVED

Hi,

We’re running a network with 3 racks. Each rack has 2 switches connected via MLAG, and we created a ring between the 3 racks (yes, that’s not optimal, and we’ll move away from the ring as soon as we have the 4th rack).

 

So it looks like this:

Rack1#1 <-MLAG-> Rack1#2 <-Fiber-> Rack2#1 <-MLAG-> Rack2#2 <-Fiber-> Rack3#1 <-MLAG-> Rack3#2 <-Fiber back to-> Rack1#1

Switches are 7048T on 4.12.4

One day I ran a tcpdump on a machine connected to rack1#1 (our machines are always connected to both switches in a rack, but active-passive) and saw ‘normal’ TCP traffic (not multicast or the like) between two machines that are connected to rack3#1 and rack3#2. I only saw traffic from machineX to machineY, not from Y to X.

After clearing the MAC table with “clear mac address-table dynamic” everything went back to normal.

So, now a couple of questions

1) What was going wrong there? I was kind of lost as to where to look and how to debug this issue. Can I analyze what the switch is doing with a packet, i.e. whether it is flooding the packets and, if so, why? =)

2) In the MAC address table I see an entry with 0000.0000.0000 as the MAC address and a high number of moves. Do I need to worry about this? Normally this means that the address isn’t known yet by the switch, doesn’t it? In a stable network, should that entry show up at all? For me it’s “hopping” a lot between various interfaces.

3) Could that be a firmware issue? I realized that we’re not running the minimum recommended version…

Thanks,

Florian

 

 

Posted by Andrei Dvornic
Answered on December 22, 2014 2:17 pm

Hi Florian,

A few questions:

  • Can you consistently reproduce this issue?
  • Is it also reproducible using the recommended min. EOS version?
  • Any chance you can attach the configs on your devices? How about the MAC address tables at the time of the ‘failure’?

Thanks,
Andrei

Posted by John Gill
Answered on December 22, 2014 6:32 pm

If you see traffic in only one direction, this usually happens when routing between VLANs/L3 interfaces and the gateways for the hosts in these interfaces are not located on the same router. The MAC address times out on a port, and the switch then floods any frame whose destination address is not found in its table.

A MAC address may have timed out (300 seconds) while the ARP entry has not, and routing between networks on different routers does not require L2 connectivity all the way between these hosts, so a switch on one path may never see the return traffic that would refresh its entry. The fix is ultimately to match the ARP and MAC address table timers. If you don’t have a large L2 domain, I would bump up the L2 timer: “mac address-table aging-time 14400”
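To make the timer mismatch concrete, here is a small Python sketch (an illustration only, not anything EOS runs). It computes how long a frame toward a silent host could be flooded because the router still holds a valid ARP entry while the switch has already aged out the MAC address. The 300 second MAC aging time is the value mentioned above; the 14400 second ARP timeout is assumed to be the value the suggested aging-time is meant to match. The function name and the simplified model are hypothetical.

```python
def flood_window(mac_aging_s: int, arp_timeout_s: int) -> int:
    """Seconds during which the ARP entry is still valid but the
    MAC address table entry has already aged out.

    Simplified model (assumption): the host is last heard from at t=0
    and nothing refreshes either table afterwards.
    """
    return max(0, arp_timeout_s - mac_aging_s)


# 300 s MAC aging against an assumed 14400 s ARP timeout
print(flood_window(300, 14400))
# After raising the MAC aging time to match the ARP timeout
print(flood_window(14400, 14400))
```

With matched timers the window closes, which is the point of the aging-time change suggested above.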

Now, 0000.0000.0000 sounds suspicious, and I would think the traffic you saw was not actually destined for this address. This is likely a separate issue. Any MAC address flapping should be looked into. If it is the only address showing a lot of moves (and maybe even showing up as flapping in “show log all”), then I would try to find the ports where it lives and look at that machine, as it is likely misconfigured.
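If the table is large, a short script can help pick out the addresses with the most moves. The following Python sketch is only an illustration: the sample text is made up, and the column layout it expects (VLAN, MAC address, type, port, moves, last move) is an assumption that may not match what your EOS version prints. The name top_movers and the threshold are hypothetical as well.

```python
import re

# Made-up sample rows; the column layout is an assumption, not captured output.
SAMPLE = """\
   1    0000.0000.0000    DYNAMIC     Et49       5120    0:00:01 ago
   1    001c.7300.0a01    DYNAMIC     Po10       1       2 days, 3:10:00 ago
  20    001c.7300.0b02    DYNAMIC     Et12       37      0:04:10 ago
"""

# One table row: VLAN, dotted MAC, entry type, port, move counter.
ROW = re.compile(
    r"^\s*(?P<vlan>\d+)\s+(?P<mac>[0-9a-f]{4}\.[0-9a-f]{4}\.[0-9a-f]{4})\s+"
    r"(?P<type>\S+)\s+(?P<port>\S+)\s+(?P<moves>\d+)\b",
    re.IGNORECASE,
)


def top_movers(text, threshold=10):
    """Return (moves, mac, vlan, port) tuples with moves >= threshold, busiest first."""
    rows = []
    for line in text.splitlines():
        m = ROW.match(line)
        if m and int(m["moves"]) >= threshold:
            rows.append((int(m["moves"]), m["mac"].lower(), int(m["vlan"]), m["port"]))
    return sorted(rows, reverse=True)


for moves, mac, vlan, port in top_movers(SAMPLE):
    print(f"{mac} vlan {vlan} on {port}: {moves} moves")
```

Feeding it the saved text of the MAC address table from each switch would show whether 0000.0000.0000 is the only address moving or one of many.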

If you see many addresses with similar moves, I would start to look at the stability of your spanning tree. One of the first things to look at when going down this path is: sh spanning-tree detail | egrep 'Port|changes'

Frequent topology change notifications indicate a problem in the STP config, a flaky link, or some other issue getting traffic from the port to the control plane.

Regards,

John
