Posted on January 2, 2020 5:26 am
 |  Asked by shuja naqvi

Hi

I have been reading a lot about EVPN A-A multihoming. The one obvious advantage is that routing to an orphaned host or Layer 3 subnet will not suffer a blackhole, because both leaves advertise those hosts or subnets using their own separate next-hop addresses. The one big issue is convergence. With A-A multihoming, convergence depends entirely on the BGP EVPN fabric, whereas in MLAG convergence is handled in the data plane. Can someone explain a bit further whether any technique can speed up convergence in EVPN A-A multihoming with IRB, if one is to choose this model over EVPN-MLAG?

Answered on January 2, 2020 5:39 am

Hi Shuja,

EVPN A-A multihoming offers the following advantages compared to MLAG:
- No peer-link is needed
- Standards-based (RFC 7432) approach, so it offers cross-vendor interop
- Supports more than two switches for additional redundancy if needed, whereas MLAG is limited to two switches

With MLAG, both VTEPs share a common loopback address and act as one logical VTEP, unlike EVPN A-A, where each VTEP has a unique VTI address. Note that in both cases there should NOT be any blackholing (assuming correct configuration) when routing to singly connected hosts or overlay L3 subnets. However, if you were referring to sub-optimal forwarding via the peer-link as a con, EOS provides some knobs to address this. For additional details on this subject, please refer to these features:

https://eos.arista.com/eos-4-23-0f/evpn-mlag-single-homed-hosts/
https://eos.arista.com/eos-4-21-3f/evpn-mlag-shared-router-mac/

On the convergence aspect, MLAG offers superior performance compared to EVPN A-A, as you mentioned. This is because in EVPN A-A multihoming, failure signaling goes through the EVPN control plane. For example, an ES failure by design triggers churn, propagation, and processing of Type-1 and Type-2 route updates, and every node in the EVPN domain incurs that additional cost, which shows up as slower convergence. In MLAG, by contrast, an ES failure is treated as a "local" event, and the shared logical VTEP together with the MLAG peer-link ensures there are no convergence issues. In summary, there's not much you can do about convergence with EVPN A-A multihoming; it's really the price you pay for the other benefits it offers.

Cheers
Naveen

Posted by Aniket Bhowmick
Answered on January 2, 2020 6:41 am

Hi Shuja,

When it comes to link failover, MLAG data-plane convergence is definitely faster than control-plane convergence via EVPN routes.

EVPN A-A relies on the Type-1 (Ethernet Auto-Discovery) and Type-4 (Ethernet Segment) routes for multihoming and failover.
Consider two switches (Sw1 and Sw2) that connect to the same host over the same Ethernet segment, identified by a shared Ethernet Segment Identifier (ESI). The Type-4 route lets the switches attached to that segment discover each other and elect a designated forwarder. Both switches also advertise Type-1 routes (per Ethernet segment and per MAC-VRF) to the remote VTEPs (or PEs), and because those routes carry the same ESI, the remote VTEPs learn that the two switches are attached to the same host.
This allows a remote VTEP to perform L2 ECMP (in the case of a MAC-VRF) and forward traffic to either of the VTEPs, even if only one of them has advertised the host's MAC in a Type-2 route, thanks to aliasing!

Now if a link goes down (say the link between Sw2 and the host), Sw2 has to withdraw its Type-1 routes for that Ethernet segment. That tells the remote VTEPs that Sw2 is no longer directly connected to the host, so they stop forwarding traffic for that ESI to Sw2. The failover therefore depends on a route withdrawal (control plane). If there is any delay in delivering the withdrawal to a remote VTEP, that VTEP keeps sending traffic to Sw2, where it is blackholed.
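As a rough illustration of the aliasing and withdrawal behaviour described above, here is a minimal Python sketch of a remote VTEP's forwarding state. It is a toy model, not how EOS or any BGP stack is implemented: the `RemoteVtep` class, its method names, the ESI value, and the MAC address are all assumptions made up for the example. In RFC 7432 terms, the per-segment routes it tracks correspond to the Ethernet Auto-Discovery (Type-1) routes, and the MAC entry corresponds to a MAC/IP (Type-2) route.

```python
from dataclasses import dataclass, field


@dataclass
class RemoteVtep:
    """Toy model of a remote VTEP's view of multihomed Ethernet segments."""

    # ESI -> set of VTEPs currently advertising an A-D route for that segment
    ad_routes: dict = field(default_factory=dict)
    # MAC -> ESI, as learned from a MAC/IP advertisement
    mac_to_esi: dict = field(default_factory=dict)

    def receive_ad_route(self, esi, vtep):
        self.ad_routes.setdefault(esi, set()).add(vtep)

    def withdraw_ad_route(self, esi, vtep):
        self.ad_routes.get(esi, set()).discard(vtep)

    def receive_mac_route(self, mac, esi):
        self.mac_to_esi[mac] = esi

    def next_hops(self, mac):
        """Aliasing: every VTEP attached to the MAC's segment is a valid next hop."""
        esi = self.mac_to_esi.get(mac)
        return sorted(self.ad_routes.get(esi, set()))


ESI = "0011:1111:1111:1111:1111"   # hypothetical segment identifier
MAC = "00:00:5e:00:53:01"          # hypothetical host MAC

remote = RemoteVtep()
remote.receive_ad_route(ESI, "Sw1")
remote.receive_ad_route(ESI, "Sw2")
remote.receive_mac_route(MAC, ESI)   # only one switch needs to advertise the MAC

before = remote.next_hops(MAC)       # both switches are usable (L2 ECMP)
remote.withdraw_ad_route(ESI, "Sw2") # Sw2 lost its link to the host
after = remote.next_hops(MAC)        # Sw2 is removed only once the withdrawal arrives
```

Until `withdraw_ad_route` is processed, `next_hops` still includes Sw2, which is the window in which traffic sent to Sw2 would be blackholed.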

In the MLAG case, Sw1 and Sw2 use the same VXLAN loopback IP, so there is underlay ECMP from the spines to Sw1 and Sw2. If the link between Sw2 and the host goes down, traffic can still arrive at Sw2, but Sw2 uses the peer-link (which is a redundant link) to steer it to Sw1, and Sw1 then forwards it to the host. Traffic is not blackholed as long as the peer-link is up, and this failover doesn't depend on EVPN route exchange.

The only thing we have to ensure with MLAG is that the peer-link never goes down. For that reason the peer-link, which is a port-channel, should have more than one member interface.
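The MLAG behaviour can be sketched the same way. This is a simplified model under stated assumptions: the function name `mlag_forward` and its arguments are hypothetical, and it only captures the local decision a switch makes when traffic for the host lands on it.

```python
def mlag_forward(arrival_switch, host_link_up, peer_link_up=True):
    """Return the hops traffic takes to reach the host, or None if it is dropped.

    arrival_switch: "Sw1" or "Sw2", chosen by underlay ECMP toward the shared VTEP IP
    host_link_up:   dict mapping each switch to the state of its link to the host
    """
    peer = {"Sw1": "Sw2", "Sw2": "Sw1"}[arrival_switch]
    if host_link_up[arrival_switch]:
        return [arrival_switch, "host"]          # normal case: forward locally
    if peer_link_up and host_link_up[peer]:
        return [arrival_switch, peer, "host"]    # local failure: detour via peer-link
    return None                                  # no path left: traffic is dropped


links = {"Sw1": True, "Sw2": False}              # Sw2 lost its link to the host
path_via_sw2 = mlag_forward("Sw2", links)
path_without_peer_link = mlag_forward("Sw2", links, peer_link_up=False)
```

No route update is involved in the detour, which is why the failover is a data-plane event, and the last call shows why the peer-link itself needs to be redundant.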

If you need more clarification, feel free to contact our TAC Support line.


Thanks,
Aniket
