Note: I wrote this post in somewhat of a rush and didn’t have time to do
diagrams. I have had a bit of a hiatus and wanted to add something to the blog,
so I planned to add the diagrams at a later date. Update: the diagrams have been
added. Also, a question asked by a good friend inspired this post.
You conceded the point, and your team will be using a single
vCenter to manage multiple physical Data Centers. All right, not the end of the
world; you’ll be fine. But a few developers require Layer 2 across some of
those Data Centers. Again, not the end of the world; besides, that’s what NSX
is for. However, do you understand what the impact on the Virtual Workloads in
the stretched Layer 2 (VXLAN) would be if one of those physical Data Centers
loses network connectivity to the Management Plane (vCenter, NSX Manager, etc.)
and the Control Plane (NSX Controllers, Logical Router Control VMs, etc.)? To
keep the topic focused on the network impact, we will assume that Virtual
Workloads use Storage local to their Data Center.
Figure 1: DC isolation
Elver’s Opinion: Since we
have Logical Switches distributed among multiple physical Data Centers, I’ll
make the very safe assumption that you won’t be doing Layer 2 Bridging. If you
are trying to do Layer 2 Bridging, call me so I can talk you out of it.
To get the obvious out of the way, if you don’t have the
Management Plane, you can forget about vMotion, Storage vMotion and any NSX
Configuration changes.
Let’s tackle the slightly less obvious: Layer 2.
Virtual Workloads within the impacted Data Centers and in the same Logical
Switch will be able to talk to each other via VXLAN (Overlay), as well as to
other Virtual Workloads behind VTEPs in other Data Centers they can reach. NSX
is built such that the VTEPs (ESXi hosts) continue to communicate with each
other if the NSX Controllers are not reachable. There will be an uptick in
Broadcast (specifically ARP Request) and Unknown Unicast traffic replicated by
the VTEPs, but the uptick shouldn’t have much impact. At the Control Plane,
assuming the NSX Controllers are still operational, they will remove the
“isolated” VTEPs and their associated entries from all their tables
(Connection, VTEP, MAC, ARP) and tell the “reachable” VTEPs to remove the
“isolated” VTEPs from their VTEP Tables.
Figure 2: Inter-Logical Switch
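To make the Control Plane cleanup a bit more concrete, here is a minimal Python sketch. It assumes a heavily simplified, hypothetical model of the Controller’s per-VNI tables (plain dicts and sets keyed by VTEP IP); it is not the NSX Controller’s actual data model or API. It only illustrates the two effects described above: the isolated VTEPs, and everything learned through them, drop out of the Connection, VTEP, MAC and ARP tables, and each reachable VTEP is told which VNIs’ VTEP tables it must update.

```python
from dataclasses import dataclass, field


@dataclass
class ControllerTables:
    # Hypothetical, simplified model of a Controller's tables (illustration only).
    connection: set = field(default_factory=set)  # VTEP IPs with a live control session
    vtep: dict = field(default_factory=dict)      # VNI -> set of VTEP IPs in that VNI
    mac: dict = field(default_factory=dict)       # VNI -> {VM MAC: VTEP IP}
    arp: dict = field(default_factory=dict)       # VNI -> {VM IP: VM MAC}


def prune_isolated(tables: ControllerTables, isolated: set) -> dict:
    """Remove isolated VTEPs and their entries.

    Returns {reachable VTEP IP: set of VNIs} so each reachable VTEP can be told
    to drop the isolated VTEPs from its own VTEP table for those VNIs.
    """
    tables.connection -= isolated
    notify = {}
    for vni, members in tables.vtep.items():
        if members & isolated:
            members -= isolated  # mutates the set stored in the dict
            for vtep_ip in members:
                notify.setdefault(vtep_ip, set()).add(vni)
    for vni, macs in tables.mac.items():
        # MACs that were reachable only through an isolated VTEP.
        stale_macs = {m for m, vtep_ip in macs.items() if vtep_ip in isolated}
        for m in stale_macs:
            del macs[m]
        # ARP entries that resolve to those MACs go away too.
        arp = tables.arp.get(vni, {})
        for ip in [ip for ip, m in arp.items() if m in stale_macs]:
            del arp[ip]
    return notify


if __name__ == "__main__":
    # One VTEP in DC1 stays reachable; both VTEPs in DC2 are isolated.
    t = ControllerTables(
        connection={"10.1.0.11", "10.2.0.11", "10.2.0.12"},
        vtep={5001: {"10.1.0.11", "10.2.0.11", "10.2.0.12"}},
        mac={5001: {"00:50:56:aa:00:01": "10.1.0.11", "00:50:56:bb:00:01": "10.2.0.11"}},
        arp={5001: {"172.16.10.11": "00:50:56:aa:00:01", "172.16.10.21": "00:50:56:bb:00:01"}},
    )
    print(prune_isolated(t, {"10.2.0.11", "10.2.0.12"}))
    print(t)
```

Note that nothing here stops the isolated VTEPs from talking to each other; they simply fall back to replicating ARP Requests and Unknown Unicast themselves, which is the uptick mentioned above.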
If two Virtual Workloads within the impacted Data Centers
are in different Logical Switches (VXLANs) and those Logical Switches connect
to the same Logical Router, the Virtual Workloads will be able to talk to each
other; from the Logical Router’s perspective both subnets are directly
connected.
Figure 3: Intra-Logical Switch - Same Logical Router
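A minimal Python sketch of why this works: for traffic between two Logical Switches on the same Logical Router, the forwarding decision only needs the Router’s own directly connected interfaces, so no Control VM adjacency or upstream Appliance is involved. The interface names and subnets are hypothetical, and the lookup is a plain longest-prefix match over connected routes, not the NSX kernel module’s implementation.

```python
import ipaddress
from typing import Optional

# Hypothetical Logical Router interfaces, one per Logical Switch.
LOGICAL_ROUTER_LIFS = {
    "LS-Web (VNI 5001)": ipaddress.ip_interface("172.16.10.1/24"),
    "LS-App (VNI 5002)": ipaddress.ip_interface("172.16.20.1/24"),
}


def egress_lif(dst_ip: str) -> Optional[str]:
    """Return the interface whose directly connected subnet contains dst_ip, if any."""
    dst = ipaddress.ip_address(dst_ip)
    matches = [(name, lif) for name, lif in LOGICAL_ROUTER_LIFS.items() if dst in lif.network]
    if not matches:
        # Not connected: the Router would need a route towards an Edge/NFV Appliance.
        return None
    return max(matches, key=lambda m: m[1].network.prefixlen)[0]


def same_router_path(src_ip: str, dst_ip: str) -> bool:
    """True when both ends sit on subnets directly connected to this Logical Router."""
    return egress_lif(src_ip) is not None and egress_lif(dst_ip) is not None


if __name__ == "__main__":
    print(same_router_path("172.16.10.11", "172.16.20.21"))
    print(egress_lif("10.0.0.5"))
```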
The less obvious part (because of all the “it depends” involved) is the impact
on Layer 3 traffic that does not stay within the same Logical Router. The
impact can be narrowed down to two types of traffic flows: Type 1, where the
Source and Destination Workloads use different Logical Routers as their default
gateways, and Type 2, where the second Workload is not connected to a Logical
Switch (think a Physical Workload or a Virtual Machine in a VLAN):
Elver’s Opinion: Type 2 flows
are basically Layer 3 traffic between Virtual and Physical networks.
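The classification can be written as a tiny Python helper. The `Workload` type, the router names and the labels are hypothetical and only for illustration; the single input that matters is which Logical Router, if any, a Workload uses as its default gateway.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Workload:
    name: str
    # Logical Router used as the default gateway; None when the Workload is not on
    # a Logical Switch (a Physical Workload or a VM on a VLAN dvPortgroup).
    logical_router: Optional[str]


def classify_flow(a: Workload, b: Workload) -> str:
    if a.logical_router is None and b.logical_router is None:
        return "no overlay involved"   # outside the scope of this post
    if a.logical_router is None or b.logical_router is None:
        return "Type 2"                # Virtual <-> Physical/VLAN traffic
    if a.logical_router == b.logical_router:
        return "same Logical Router"   # both subnets directly connected
    return "Type 1"                    # needs an Edge/NFV Appliance between the Routers


if __name__ == "__main__":
    web = Workload("web", "DLR-A")
    db = Workload("db", "DLR-B")
    phys = Workload("bare-metal", None)
    print(classify_flow(web, db), classify_flow(web, phys))
```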
Type 1:
If the Source and Destination Workloads are hanging off
different Logical Routers, then you need an NSX Edge or another NFV Appliance to
do the routing (two Logical Routers can’t connect to the same Logical Switch or
the same dvPortgroup). Is this Appliance within the impacted Data Centers? If
not, the two Workloads won’t be able to talk to each other, because there would
be no logical path for the flow to reach the Appliance so that it can do the
routing (remember, the impacted Data Centers have some sort of “isolation”).
Figure 4: Intra-Logical Switch - Different Logical Routers
If the Appliance is within the impacted Data Centers, then
the two Workloads may reach each
other. I say may because it all
depends on whether there is a routing protocol between the Logical Routers and
the Appliance. If you are using static routes, then yes, the two Workloads can
talk to each other. But if you are running a routing protocol, can the Logical
Routers’ Control VMs reach the Appliance to exchange routing control traffic?
If the Appliance loses its connection to one or both of the Logical Routers’
Control VMs, it will remove the routes to the affected Workloads’ subnets from
its routing table, making those subnets unreachable from the Appliance.
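Here is a minimal Python sketch of that behavior from the Appliance’s point of view, assuming a simplified routing table where each dynamic route remembers which routing peer (here, a Logical Router’s Control VM) advertised it. When the adjacency with that peer goes down (for example, after the OSPF dead interval or BGP hold timer expires), its routes are withdrawn, while static routes stay. The addresses and peer names are hypothetical.

```python
import ipaddress
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class Route:
    prefix: ipaddress.IPv4Network
    next_hop: str                       # data-plane next hop (the Logical Router's forwarding address)
    source: str                         # "static", "ospf", "bgp", ...
    learned_from: Optional[str] = None  # routing peer that advertised it; None for static routes


def neighbor_down(rib: List[Route], peer: str) -> List[Route]:
    """Withdraw every dynamically learned route advertised by the peer that went away."""
    return [r for r in rib if r.learned_from != peer]


def lookup(rib: List[Route], dst_ip: str) -> Optional[Route]:
    """Longest-prefix match; None means the destination is unreachable from the Appliance."""
    dst = ipaddress.ip_address(dst_ip)
    matches = [r for r in rib if dst in r.prefix]
    return max(matches, key=lambda r: r.prefix.prefixlen, default=None)


if __name__ == "__main__":
    net = ipaddress.ip_network
    rib = [
        # Learned over OSPF from Logical Router A's Control VM.
        Route(net("172.16.10.0/24"), "192.168.100.2", "ospf", "dlr-a-control-vm"),
        # Static route towards Logical Router B: no Control VM adjacency involved.
        Route(net("172.16.20.0/24"), "192.168.100.3", "static"),
    ]
    rib = neighbor_down(rib, "dlr-a-control-vm")
    print(lookup(rib, "172.16.10.11"))  # None: route withdrawn
    print(lookup(rib, "172.16.20.21"))  # still resolves via the static route
```

The split between the routing peer (Control VM) and the data-plane next hop is the point: the Logical Router may still be forwarding packets perfectly well, but once its Control VM can’t hold the adjacency, the Appliance no longer knows where the subnet lives.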
While still on Type 1 flows, it is worth pointing out
that if two Workloads in impacted Data Centers but behind different Logical
Routers can talk to each other, then Virtual Workloads in non-impacted Data
Centers behind the same Logical Routers will NOT be able to communicate, because
they won’t be able to reach the NSX Edge or NFV Appliance.
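The Type 1 reasoning can be condensed into a small Python decision function. It is a sketch under stated assumptions: impacted Data Centers can still reach each other, non-impacted Data Centers can still reach each other, and nothing crosses between the two groups. Where the Logical Routers’ Control VMs run is not stated in the post, so `control_vm_dcs` is a hypothetical input.

```python
from typing import Iterable, Set


def same_side(dc_a: str, dc_b: str, impacted: Set[str]) -> bool:
    """Assumption: traffic never crosses between impacted and non-impacted Data Centers."""
    return (dc_a in impacted) == (dc_b in impacted)


def type1_can_talk(src_dc: str, dst_dc: str, appliance_dc: str,
                   control_vm_dcs: Iterable[str], impacted: Set[str],
                   dynamic_routing: bool) -> bool:
    # Both Workloads must reach the Edge/NFV Appliance that routes between the Logical Routers.
    if not (same_side(src_dc, appliance_dc, impacted)
            and same_side(dst_dc, appliance_dc, impacted)):
        return False
    # With a routing protocol, every Logical Router's Control VM must keep its adjacency
    # with the Appliance; otherwise that Router's subnets are withdrawn.
    if dynamic_routing and not all(same_side(c, appliance_dc, impacted) for c in control_vm_dcs):
        return False
    return True


if __name__ == "__main__":
    impacted = {"DC2"}
    # Appliance in DC2, both Workloads in DC2, static routes: they can talk.
    print(type1_can_talk("DC2", "DC2", "DC2", ["DC1", "DC1"], impacted, dynamic_routing=False))
    # Same Logical Routers, but the Workloads sit in DC1: they lost the Appliance.
    print(type1_can_talk("DC1", "DC1", "DC2", ["DC2", "DC2"], impacted, dynamic_routing=True))
```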
Type 2:
If the second Workload is not connected to a Logical Switch
(Physical Workload or VM in a VLAN), we definitely need a Perimeter Edge or an
NFV Appliance with an interface on a VLAN dvPortgroup. We will assume that we
are running a routing protocol. It is similar to the Layer 3 Type 1 flow, but
with a few variants.
Figure 5: Virtual to Physical
Variant 1: The Appliance can reach the Logical Router
Control VM AND the second Workload is in one of the impacted Data Centers. In
this instance, communication between the Workloads will happen. However, a
Virtual Workload hanging off the same Logical Router but not within the
impacted Data Centers will NOT be able to talk to the second Workload, because
the Appliance won’t be reachable from it.
Variant 2: The Appliance can reach the Logical Router
Control VM AND the second Workload is not in one of the impacted Data Centers.
In this case, the two won’t be able to communicate; remember, the impacted Data
Centers are “isolated”, so no traffic can come in or go out.
Variant 3: The Appliance can’t reach the Logical
Router Control VM. It doesn’t matter where the second Workload is, because the
Appliance will remove the route to the Virtual Workload’s network, making that
subnet unreachable from the Appliance. Thus, the Workloads won’t be able to talk
to each other. However, if you are using static routes, refer back to Variants
1 and 2.
Variant 4: The Appliance is not in the impacted Data
Centers. In this case, there is no way for the Logical Router to reach the
Appliance, so the Workloads won’t be able to talk to each other.
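The four variants can be folded into one Python function that returns both the outcome and the reason. Like the Type 1 sketch, it assumes that traffic never crosses between impacted and non-impacted Data Centers, and the location of the Logical Router Control VM (`control_vm_dc`) is a hypothetical input rather than something the post specifies.

```python
from typing import Set, Tuple


def same_side(dc_a: str, dc_b: str, impacted: Set[str]) -> bool:
    # Assumption: traffic never crosses between impacted and non-impacted Data Centers.
    return (dc_a in impacted) == (dc_b in impacted)


def type2_outcome(virtual_dc: str, second_dc: str, appliance_dc: str,
                  control_vm_dc: str, impacted: Set[str],
                  dynamic_routing: bool = True) -> Tuple[bool, str]:
    """Return (can_talk, reason) for a Virtual <-> Physical/VLAN flow."""
    if not same_side(virtual_dc, appliance_dc, impacted):
        # Variant 4, or the non-impacted Virtual Workload described in Variant 1.
        return False, "Virtual Workload cannot reach the Appliance"
    if dynamic_routing and not same_side(control_vm_dc, appliance_dc, impacted):
        # Variant 3: adjacency lost, so the route to the Virtual Workload's subnet is withdrawn.
        return False, "Appliance withdrew the route to the Virtual Workload's subnet"
    if not same_side(second_dc, appliance_dc, impacted):
        # Variant 2: the second Workload is on the other side of the isolation.
        return False, "second Workload is on the other side of the isolation"
    # Variant 1 (or a flow that never touches the impacted Data Centers).
    return True, "reachable"


if __name__ == "__main__":
    impacted = {"DC2"}
    print(type2_outcome("DC2", "DC2", "DC2", "DC2", impacted))  # Variant 1
    print(type2_outcome("DC2", "DC1", "DC2", "DC2", impacted))  # Variant 2
    print(type2_outcome("DC2", "DC2", "DC2", "DC1", impacted))  # Variant 3
    print(type2_outcome("DC2", "DC2", "DC1", "DC1", impacted))  # Variant 4
```

Passing `dynamic_routing=False` skips the Control VM check, which matches the “refer back to Variants 1 and 2” note for static routes.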
To wrap it up, please note that if the Perimeter Edge or NFV
Appliance is located in one of the impacted Data Centers, no Virtual to Physical
network traffic in the non-impacted Data Centers will be possible.