Posted on July 10, 2020 2:00 pm
 |  Asked by Martin Wolf


I was wondering when it is recommended to use BFD echo.

For example, does it make sense to apply it on underlay and/or overlay BGP?

The Arista CLI also offers another BFD feature known as ‘bfd per-link rfc-7130’, which seems to be applied between two MLAG peers on their port-channel.

Arista deployment does also mention Link Fault Signalling, and I was wondering to what extent this can be seen as an alternative.

I would appreciate any guidance/advice on this matter.

Thanks, Martin

Answered on January 5, 2021 6:07 pm

Hi Martin,

The short answer is that it depends on what you are trying to achieve.

Normally you'd consider BFD in order to get a higher-level protocol, e.g. BGP, to detect failures faster than its native timers would allow.
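To put rough numbers on that, here is a minimal sketch comparing a BFD detection time with a BGP hold time. The detection time is simplified to interval times multiplier (the asynchronous-mode idea in RFC 5880), and every value below is an illustrative assumption rather than something taken from your configuration.

```python
def bfd_detection_time_ms(interval_ms: int, multiplier: int) -> int:
    """Simplified BFD detection time: negotiated interval times detect multiplier."""
    return interval_ms * multiplier


# Illustrative values only (assumptions, not read from any device).
BGP_HOLD_TIME_MS = 180_000   # assumed BGP hold time of 180 seconds
BFD_INTERVAL_MS = 300        # assumed BFD transmit/receive interval
BFD_MULTIPLIER = 3           # assumed detect multiplier

bfd_ms = bfd_detection_time_ms(BFD_INTERVAL_MS, BFD_MULTIPLIER)
print(f"BFD detects loss of the peer after about {bfd_ms} ms")
print(f"BGP alone would wait up to {BGP_HOLD_TIME_MS} ms")
```

With these assumed timers BFD reports the failure in under a second, whereas BGP on its own would wait for the hold timer to expire.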

I would generally not bother with BFD on the underlay BGP sessions because, if the link goes down, EOS will immediately take down the BGP session rather than waiting for the protocol timers to expire. This won't detect a failure where the physical link stays up, e.g. the peer side is in some weird state where it is just blackholing everything, but that is where you need to decide just what you are aiming to protect against and how likely the event is.

For the overlay, e.g. BGP peering between loopbacks, definitely use BFD, since we want to detect loss of reachability to the peer regardless of whether any event is seen in the underlay.

The bfd per-link feature is more interesting. In the normal case of BFD over a port-channel interface, if the member link over which the BFD control packets have been hashed goes down, there will be packet loss. If the duration of the packet loss (i.e. the time taken to detect the failure of the link and rehash the BFD control packets over an active link) exceeds the BFD timeout, the BFD session will go down and signal a failure to the client applications in spite of the fact that the port-channel itself is still up.
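The condition described above can be sketched as a simple comparison. This is a simplified model that ignores packet spacing and jitter; the function name and the timings are hypothetical and only there to illustrate the point.

```python
def session_survives_rehash(rehash_ms: int, detection_time_ms: int) -> bool:
    """True if BFD control packets resume on another member link
    before the BFD detection time expires (simplified model)."""
    return rehash_ms < detection_time_ms


# Assumed detection time of 900 ms (e.g. 300 ms interval x multiplier 3).
DETECTION_TIME_MS = 900

# Fast rehash: the session stays up, which is what we want
# because the port-channel itself never went down.
print(session_survives_rehash(200, DETECTION_TIME_MS))

# Slow rehash: the session times out although the port-channel is still up.
print(session_survives_rehash(1500, DETECTION_TIME_MS))
```

The second case is the one that per-link BFD is meant to avoid.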

In order to prevent these spurious failures, the bfd per-link feature runs micro BFD sessions over each active member of the peer link and reports the aggregate status of all these micro BFD sessions as the status of the session over the port-channel, i.e. if even one micro BFD session is Up, the port-channel session will report its state as Up.
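As a minimal sketch of that aggregation rule (not how EOS implements it internally), with hypothetical interface names and state strings:

```python
from typing import Mapping


def port_channel_bfd_state(micro_sessions: Mapping[str, str]) -> str:
    """Aggregate per-member micro BFD states into one port-channel state.

    The port-channel session is Up if at least one micro session is Up.
    """
    if any(state == "Up" for state in micro_sessions.values()):
        return "Up"
    return "Down"


# Hypothetical member links: one has failed, one is still healthy.
members = {"Ethernet1": "Down", "Ethernet2": "Up"}
print(port_channel_bfd_state(members))

# Only when every member's micro session is Down does the aggregate go Down.
all_down = {"Ethernet1": "Down", "Ethernet2": "Down"}
print(port_channel_bfd_state(all_down))
```

Losing a single member therefore no longer signals a failure to the client applications as long as another member's micro session is still Up.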

If you are running BFD over port-channels, then using the per-link flavor would be a reasonable option, especially if you've been having issues with BFD tearing down port-channels incorrectly.

LFS is more geared towards detecting L1/optical problems, e.g. if the FCS errors exceed a particular threshold, the switch can disable the link automatically or flag the issue via syslog. The question to consider when deploying LFS is whether it is better to keep the link up, with it occasionally (or indeed frequently) corrupting frames, or to shut it down so that (ideally) an alternative, good path is used.

I'd view LFS as more of a complement to BFD rather than an alternative.

