In the last blog I spent a lot of words (and funky formulas) just to say that the DCI circuits between two Data Centers need to be larger than the biggest DC WAN link plus the inter-DC traffic (which increases your cost). When stretching layer 2 across DCs, there is not much that can be done to force DC ingress traffic to come in via the DC WAN link where the destination workload (VM) is running. However, there are some things you can do to force the egress traffic to go out of the DC where the VM is located (and avoid using the DCI circuits), reducing some of the cost associated with the DCI links. For this blog post I’m going to assume that we have Active/Active DCs.
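As a quick recap of that sizing rule, here is a tiny worked example in Python (the Gbps numbers are made up purely for illustration):

```python
# Recap of the DCI sizing rule from the last post, with made-up numbers (Gbps):
# the DCI circuits need to be larger than the biggest DC WAN link plus the
# steady-state inter-DC traffic.
biggest_dc_wan_link = 20   # largest WAN link in either DC
inter_dc_traffic = 5       # normal traffic between the two DCs
dci_minimum = biggest_dc_wan_link + inter_dc_traffic
print(f"DCI circuits should be larger than {dci_minimum} Gbps")
```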
Dual Default Gateways
When you stretch layer 2 across the DCs, the default gateway for the stretched layer 2 segments could be physically located in just one of the two DCs. We don’t care about that use case (it would probably require a standalone blog post to cover the cons of that design). Instead, let’s assume that we have default gateway services in both DCs and, to provide redundancy, two routers acting as default gateways in each DC, running an FHRP (something like VRRP), as shown in the diagram.
In this design, the VM will forward traffic to its local default gateway, which in turn will forward the traffic out its local DC WAN. For this design to work, (1) there must be a mechanism to stop each pair of DC default gateways from seeing each other (otherwise you won’t get both pairs of FHRP routers to be Active with the same virtual MAC) and (2) a way to prevent a VM in one DC from receiving ARP replies for its default gateway from the other DC’s pair of default gateways. You could achieve this with access lists (too manual), or you could stretch the layer 2 with something like Cisco OTV, which has a built-in mechanism (less manual) to isolate FHRP within each DC.
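To make the virtual MAC collision concrete, here is a minimal Python sketch, assuming both DC pairs run VRRP with the same example VRID (10) for the stretched segment; the virtual MAC format itself comes from RFC 5798.

```python
# VRRP (IPv4) derives its virtual MAC as 00-00-5e-00-01-{VRID} (RFC 5798).
# VRID 10 is an arbitrary example for the stretched segment's gateway group.

def vrrp_virtual_mac(vrid: int) -> str:
    """Return the well-known VRRP IPv4 virtual MAC for a given VRID."""
    if not 1 <= vrid <= 255:
        raise ValueError("VRID must be between 1 and 255")
    return f"00:00:5e:00:01:{vrid:02x}"

# Both DC gateway pairs use the same VRID so VMs keep the same gateway when
# they move, which means both pairs answer ARP with an identical virtual MAC.
# Without FHRP isolation, the hellos would also cross the DCI and only one
# pair would remain Active.
dc_a_vmac = vrrp_virtual_mac(10)
dc_b_vmac = vrrp_virtual_mac(10)
assert dc_a_vmac == dc_b_vmac
print(dc_a_vmac)  # 00:00:5e:00:01:0a
```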
This design does have some potential issues that must be
taken into account. If each pair of default gateways uses different virtual MAC
addresses when replying to ARP, a VM that moves DCs will lose connectivity
(until it re-ARPs for its default gateway). Also, if both members of a pair of
default gateways go down, you may have to remove the FHRP isolation to allow
the impacted VMs to reach the default gateways in the other DC.
Distributed Default Gateway (Top of Rack)
An alternative to dual default gateways is to stretch the layer 2 using VXLAN (or another tunneling protocol) from the Top of Rack (ToR). In this design, every ToR hosts the Layer 3 boundary and acts as the default gateway for its own rack. Every time a ToR gets an ARP request for the gateway, the ToR responds to it locally (and provides the FHRP isolation), as shown in the diagram below. Two examples of this are Arista’s DCI with VXLAN and VARP, and Brocade’s IP Fabric with anycast gateway (for the time being, until Broadcom decides what to do with Brocade’s networking business).
One advantage of this design over the previous one: there are many more routers (the ToRs) acting as default gateways, with built-in FHRP isolation. If a ToR dies, only the rack where it resides is impacted, as opposed to the entire DC. Also, since all ToRs use the same virtual MAC, a VM that moves DCs continues to have uninterrupted Layer 3 connectivity. One disadvantage is that you would need to fiddle with route advertisements to ensure the ToRs forward traffic straight up the local DC WAN; this may not be as easy as it sounds.
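As a rough illustration of why a VM move is seamless here, below is a toy Python model (the gateway IP and virtual MAC are made up, and real ToRs do this in hardware): every ToR answers ARP for the same gateway IP with the same virtual MAC, so a VM’s cached ARP entry stays valid after it lands in the other DC.

```python
# Toy model of an anycast distributed gateway: every ToR answers ARP for the
# gateway IP with the same virtual MAC. The IP and MAC values are made up.

GATEWAY_IP = "10.1.1.1"
ANYCAST_VMAC = "02:00:0a:01:01:01"  # hypothetical shared virtual MAC

class ToR:
    def __init__(self, name):
        self.name = name

    def arp_reply(self, target_ip):
        # Each ToR answers ARP for the gateway locally; replies never cross the DCI.
        return ANYCAST_VMAC if target_ip == GATEWAY_IP else None

class VM:
    def __init__(self):
        self.arp_cache = {}

    def resolve(self, ip, local_tor):
        self.arp_cache[ip] = local_tor.arp_reply(ip)
        return self.arp_cache[ip]

tor_dc_a = ToR("DC-A, rack 12")
tor_dc_b = ToR("DC-B, rack 3")

vm = VM()
vm.resolve(GATEWAY_IP, tor_dc_a)   # VM ARPs for its gateway while in DC A
# ...the VM migrates to DC B; its ARP cache is untouched by the move...
assert vm.arp_cache[GATEWAY_IP] == tor_dc_b.arp_reply(GATEWAY_IP)
print("Cached gateway MAC still valid after the move:", vm.arp_cache[GATEWAY_IP])
```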
Side note: there is a variation of this design where the ToRs are strictly layer 2 (let’s call them Leafs) and the distribution switches (henceforth Spines) do the Layer 3, thus providing the default gateway services.
Distributed Default Gateway (Software)
Just like the physical version, you stretch the layer 2 using a tunneling protocol (like VXLAN or GENEVE), but you have a layer 3 process in each hypervisor that serves as the default gateway (e.g., a virtual router). Each virtual router has the same IP and virtual MAC (thus VMs can move between DCs at will) and responds locally to ARP requests. And like the physical version, you must manipulate routes to force each virtual router to send traffic to its local DC WAN, as shown in the diagram.
VMware’s NSX-v (with its distributed logical router) achieves this functionality. Each logical router instance is the “same” in every hypervisor except for its routing table. The logical router instances in each DC get only the routes relevant to that DC; this way, each logical router is “forced” to forward traffic using its local WAN.
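As a loose sketch of the “routes only relevant to it” idea (a toy Python model with made-up edge names, not NSX configuration), each DC’s logical router instances carry only a default route pointing at that DC’s local edge, so egress never has to cross the DCI:

```python
# Toy model of per-DC route filtering for a hypervisor-resident distributed
# gateway. Edge/next-hop names are made up; this is not NSX configuration.

EDGE_ROUTES = {
    "DC-A": {"0.0.0.0/0": "edge-dc-a"},  # DC A's WAN edge
    "DC-B": {"0.0.0.0/0": "edge-dc-b"},  # DC B's WAN edge
}

def logical_router_routes(dc):
    """Each logical router instance receives only its local DC's routes."""
    return EDGE_ROUTES[dc]

# The "same" logical router runs in every hypervisor; only the routing table
# differs per DC, so each instance forwards out its local DC WAN.
print("Hypervisor in DC-A:", logical_router_routes("DC-A"))
print("Hypervisor in DC-B:", logical_router_routes("DC-B"))
```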
Elver’s Opinion: This blog post should (mostly) conclude my thoughts on stretching Layer 2 across the DCI (think hard before doing it). At first I thought I would use this blog to also talk about local egress in NSX (to wrap up my thoughts on the matter), but as I wrote I realized I would need more space than I thought, so I’ll be writing another blog post just on local egress in NSX.