Better late (VERY late) than never, I suppose. Here is my last blog post on the DC DCIs, and it involves the Local Egress feature of NSX. I don’t quite remember everything I wrote in my earlier posts (which you can read here, here and here) since it has been a while since I wrote them (and I for some reason can’t get into reading stuff I’ve written); but I believe I explained what happens to egress traffic from the DC and how it affects the bandwidth requirements.
In version 6.2 of NSX-v, VMware introduced something they call Cross vCenter NSX. The architecture consists of multiple NSX Managers (one per vCenter) that can exchange some configuration information as well as share a common set of NSX Controllers. The nice part is that it allows you to have multiple vCenters managing different DCs while stretching a broadcast domain (using VXLAN) across all the DCs (up to 8 of them). For this blog post we will assume the DCs are physical and have DCIs between them.
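To make the moving parts a bit more concrete, here is a minimal sketch (the names are made up, not real NSX objects) of what a Cross vCenter NSX domain looks like: one NSX Manager per vCenter, one of them holding the primary role, all of them sharing a single Universal Controller Cluster and a Universal Transport Zone that spans the DCs.

```python
# Illustrative sketch only; the names are made up and not actual NSX objects.
cross_vcenter_nsx = {
    # One controller cluster is shared by every NSX Manager in the domain.
    "universal_controller_cluster": ["ctrl-01", "ctrl-02", "ctrl-03"],
    # One NSX Manager per vCenter; one is primary, the rest are secondary.
    "nsx_managers": [
        {"name": "nsx-mgr-dc1", "vcenter": "vc-dc1", "role": "primary"},
        {"name": "nsx-mgr-dc2", "vcenter": "vc-dc2", "role": "secondary"},
    ],
    # The Universal Transport Zone stretches the VXLAN broadcast domain
    # across all participating DCs (up to 8).
    "universal_transport_zone": "utz-01",
}
```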
On the surface this is a cool feature (putting aside that stretching a broadcast domain over multiple locations violates the 11th commandment) since it allows all VTEPs (belonging to the same Universal Transport Zone) in all DCs to be fully meshed with each other. However, this by itself doesn’t address the problem of keeping layer 3 egress traffic local so that it doesn’t cross the DCIs.
To illustrate the problem, please have a look at the diagram
below.
Notice that the local copy of the Logical Router (it is a Universal Logical Router, which is functionally the same as the Global Logical Router) will have two paths to reach the intranet: one via the NSX Edge in DC1 and another via the NSX Edge in DC2. By default, NSX components are unaware of locality (location), and thus to the Logical Router (specifically, the LR Control VM that owns the OSPF/BGP adjacencies with the NSX Edges) both NSX Edges are equally good for reaching external networks, and it can choose either NSX Edge to forward the traffic to. There is a bit more complexity here around route costs, but we’ll skip that for this blog post.
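As a rough sketch of the default behavior (purely illustrative, not how NSX represents routes internally): the Logical Router ends up with two equally attractive default routes, one per NSX Edge, and has no way to prefer the local one.

```python
# Illustrative only: two default routes learned by the LR Control VM, one from
# each NSX Edge. With equal costs the choice of next hop is effectively
# arbitrary, so egress traffic from DC1 may end up going to the Edge in DC2.
learned_routes = [
    {"prefix": "0.0.0.0/0", "next_hop": "edge-dc1", "cost": 1},
    {"prefix": "0.0.0.0/0", "next_hop": "edge-dc2", "cost": 1},
]

best_cost = min(r["cost"] for r in learned_routes)
candidates = [r["next_hop"] for r in learned_routes if r["cost"] == best_cost]
print("Equally good next hops:", candidates)  # ['edge-dc1', 'edge-dc2']
```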
So to handle the layer 3 egress traffic from the Logical Router and keep it from going over the DCI, NSX introduces a way to add location awareness to the creation of the routing table. The method is quite simple: assign an ID to all Universal Logical Router entities (the LR Control VM and ESXi hosts, but NOT the NSX Edges) that you want to belong to the same “location”. Using this location ID, or Locale ID as VMware calls it (which is 128 bits long and defaults to the UUID of the NSX Manager – but it can be changed), the NSX Controller (the Universal NSX Controller, since we are doing Cross vCenter NSX) will only push the routing table to the ESXi hosts that have the same Locale ID as the LR Control VM, as shown in the figure below.
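Here is a minimal sketch of that distribution logic, with made-up names; it only illustrates the idea that the Universal Controller pushes the routing table to the ESXi hosts whose Locale ID matches that of the LR Control VM.

```python
# Illustrative sketch of Locale ID based route distribution (made-up names).
control_vm = {"name": "udlr-control-vm-dc1", "locale_id": "locale-dc1"}

esxi_hosts = [
    {"name": "esxi-dc1-01", "locale_id": "locale-dc1"},
    {"name": "esxi-dc1-02", "locale_id": "locale-dc1"},
    {"name": "esxi-dc2-01", "locale_id": "locale-dc2"},
]

def hosts_receiving_routes(control_vm, hosts):
    """Only hosts with the same Locale ID as the Control VM get the routes."""
    return [h["name"] for h in hosts if h["locale_id"] == control_vm["locale_id"]]

print(hosts_receiving_routes(control_vm, esxi_hosts))
# ['esxi-dc1-01', 'esxi-dc1-02']  -> the DC2 host receives nothing
```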
If you have been following up to this point, you’ll have noticed that we introduced more problems than we actually solved (thus far). In the diagram above we have two different problems: 1) only ESXi hosts in DC1 would get the routing table, since they have the same Locale ID as the LR Control VM, and 2) the LR Control VM still sees the NSX Edges in both DCs.
To solve problem 1, NSX allows you to deploy a second LR Control VM from the other NSX Manager (up to 8 LR Control VMs, one per NSX Manager in the Cross vCenter NSX domain), and you assign that second LR Control VM the same Locale ID as the ESXi hosts in DC2. The solution to problem 1 is shown in the diagram below.
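Extending the earlier sketch (still with made-up names), the second LR Control VM carries DC2’s Locale ID, so every host now receives routes from the Control VM that shares its Locale ID:

```python
# Illustrative only: one Control VM per "location", each tagged with the same
# Locale ID as the ESXi hosts of that location.
control_vms = [
    {"name": "udlr-control-vm-dc1", "locale_id": "locale-dc1"},
    {"name": "udlr-control-vm-dc2", "locale_id": "locale-dc2"},
]

esxi_hosts = [
    {"name": "esxi-dc1-01", "locale_id": "locale-dc1"},
    {"name": "esxi-dc2-01", "locale_id": "locale-dc2"},
]

for cvm in control_vms:
    served = [h["name"] for h in esxi_hosts if h["locale_id"] == cvm["locale_id"]]
    print(cvm["name"], "->", served)
# udlr-control-vm-dc1 -> ['esxi-dc1-01']
# udlr-control-vm-dc2 -> ['esxi-dc2-01']
```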
To solve problem 2, it is necessary to prevent the LR Control VM in DC1 from forming routing adjacencies with the NSX Edge in DC2 (and vice versa for our second LR Control VM in DC2). The solution involves creating a Logical Switch (a Universal Logical Switch) per LR Control VM and connecting to each Logical Switch only the entities (NSX Edge, LR Control VM) that reside in the same “location”, as shown in the diagram below.
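As a last sketch (again with made-up names), the per-location transit Universal Logical Switch limits which routers can even see each other, so OSPF/BGP adjacencies only form between a Control VM and the NSX Edge in the same location:

```python
# Illustrative only: adjacencies can only form between routers attached to the
# same transit Universal Logical Switch.
transit_uls = {
    "uls-transit-dc1": ["udlr-control-vm-dc1", "edge-dc1"],
    "uls-transit-dc2": ["udlr-control-vm-dc2", "edge-dc2"],
}

for uls, routers in transit_uls.items():
    pairs = [(a, b) for i, a in enumerate(routers) for b in routers[i + 1:]]
    print(uls, "adjacencies:", pairs)
# uls-transit-dc1 adjacencies: [('udlr-control-vm-dc1', 'edge-dc1')]
# uls-transit-dc2 adjacencies: [('udlr-control-vm-dc2', 'edge-dc2')]
```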
And that’s it. The combination of Locale ID, multiple LR Control VMs and a dedicated Logical Switch per “location” ensures that egress layer 3 traffic from one location doesn’t cross the DCI.
Elver’s Opinion: I don’t think I have an opinion today…actually I do (now that I think about it). Locale ID is a good solution that NSX-v introduces to handle local egress, but I think there is a missed opportunity here. VMware could have allowed the NSX Edge to be assigned a Locale ID (after all, the LR Control VM IS a modified NSX Edge) and avoided the need to have a ULS per location. I’m not sure I know the reason why they didn’t design the Locale ID solution this way (or that they have a good one).
I have two questions regarding what seems to be an overly complex solution (made by VMware).
First, why can’t we just manipulate the route metrics that the Edge routers advertise to the Control VMs? Each ESG has two peers (Control VMs); we intentionally worsen the metrics advertised toward the opposite site. That way the DLRs would always use the local Edge router instead of the DCI link.
If for some reason we cannot solve this with metric manipulation, then there is another idea (instead of using two separate ULS switches for peering separation): can’t we just make each Edge router peer with its own local Control VM? Why do we have to make a full routing mesh between the two Edges and the two Control VMs?
FYI, I am just starting to understand NSX and haven’t had much hands-on practice yet, so my comments may be off base.