Friday, April 7, 2017

NSX Local Egress

Better late (VERY late) than never, I suppose. Here is my last blog post on DC DCIs, and it covers the Local Egress feature of NSX. I don’t quite remember everything I wrote in my earlier posts (which you can read here, here and here) since it has been a while since I wrote them (and for some reason I can’t bring myself to re-read my own writing); but I believe I explained what happens to egress traffic from the DC and how it affects the bandwidth requirements.

In version 6.1 (or was it 6.2?) of NSX-v, VMware introduced something they call Cross vCenter NSX. The architecture consists of multiple NSX Managers (one per vCenter) that can exchange some configuration information and share a common set of NSX Controllers. The good part is that it allows you to have multiple vCenters managing different DCs while stretching a broadcast domain (using VXLAN) across all the DCs (up to 8 of them). For this blog post we will assume the DCs are physical and have DCIs between them.

On the surface this is a cool feature (putting aside that stretching a broadcast domain over multiple locations violates the 11th commandment), since it allows all VTEPs (belonging to the same Universal Transport Zone) in all DCs to be fully meshed with each other. However, this by itself doesn’t keep layer 3 egress traffic local: nothing stops it from crossing the DCIs.

To illustrate the problem, please have a look at the diagram below.

Notice that the local copy of the Logical Router (it is a Universal Logical Router, which is functionally the same as the Global Logical Router) has two paths to reach the intranet: one via the NSX Edge in DC1 and another via the NSX Edge in DC2. By default, NSX components are unaware of locality (location), so to the Logical Router (specifically, the LR Control VM that owns the OSPF/BGP adjacencies with the NSX Edges) both NSX Edges are equally good paths to external networks, and it can choose either one to forward traffic to. There is a bit more complexity here around route costs, but we’ll skip that for this blog post.
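To make the problem concrete, here is a minimal sketch (all names hypothetical, not VMware’s actual data structures) of what the location-unaware Control VM sees: a default route from each NSX Edge at the same cost, so either next hop may be picked, DCI or not.

```python
# Toy model: the LR Control VM learns the default route from both NSX Edges
# at the same cost, so either next hop is equally valid.
routes = [
    {"prefix": "0.0.0.0/0", "next_hop": "edge-dc1", "cost": 1},
    {"prefix": "0.0.0.0/0", "next_hop": "edge-dc2", "cost": 1},
]

def best_next_hops(routes, prefix):
    """Return every next hop tied at the lowest cost for the given prefix."""
    candidates = [r for r in routes if r["prefix"] == prefix]
    lowest = min(r["cost"] for r in candidates)
    return [r["next_hop"] for r in candidates if r["cost"] == lowest]

# Both Edges tie, so traffic sourced in DC1 may be handed to edge-dc2,
# which means it crosses the DCI for no good reason.
print(best_next_hops(routes, "0.0.0.0/0"))
```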

So to keep the Logical Router’s layer 3 egress traffic from going over the DCI, NSX introduces a way to make routing table creation location-aware. The method is quite simple: assign an ID to all Universal Logical Router entities (the LR Control VM and the ESXi hosts, but NOT the NSX Edges) that you want to belong to the same “location”. Using this location ID, or Locale ID as VMware calls it (it is 128 bits long and defaults to the UUID of the NSX Manager, but it can be changed), the NSX Controller (the Universal NSX Controller, since we are doing Cross vCenter NSX) will populate the routing table only to those ESXi hosts that have the same Locale ID as the LR Control VM, as shown in the figure below.
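The Controller-side behavior can be sketched as a toy model (hypothetical names throughout; this is not VMware’s implementation, just the filtering idea): routes learned by an LR Control VM are distributed only to the ESXi hosts whose Locale ID matches the Control VM’s.

```python
from uuid import uuid4

# Toy model of the Locale ID filter. In practice the Locale ID defaults to
# the UUID of the NSX Manager, hence the uuid4() stand-ins here.
LOCALE_DC1 = str(uuid4())
LOCALE_DC2 = str(uuid4())

hosts = [
    {"name": "esxi-dc1-01", "locale_id": LOCALE_DC1},
    {"name": "esxi-dc1-02", "locale_id": LOCALE_DC1},
    {"name": "esxi-dc2-01", "locale_id": LOCALE_DC2},
]

def distribute_routes(control_vm_locale_id, hosts, routes):
    """Push routes only to hosts whose Locale ID matches the LR Control VM's."""
    return {h["name"]: routes
            for h in hosts if h["locale_id"] == control_vm_locale_id}

# Only the DC1 hosts receive the routing table; esxi-dc2-01 gets nothing,
# which is exactly problem 1 described below.
tables = distribute_routes(LOCALE_DC1, hosts, [("0.0.0.0/0", "edge-dc1")])
```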
If you have been following along up to this point, you noticed that we introduced more problems than we actually solved (thus far). In the diagram above we have two different problems: 1) only the ESXi hosts in DC1 would get the routing table, since they have the same Locale ID as the LR Control VM, and 2) the LR Control VM still sees the NSX Edges in both DCs.

To solve problem 1, NSX allows you to deploy a second LR Control VM from the other NSX Manager (up to 8 LR Control VMs, one per NSX Manager in the Cross vCenter NSX domain), and you assign that second LR Control VM the same Locale ID as the ESXi hosts in DC2. The solution to problem 1 is shown in the diagram below.
To solve problem 2, it is necessary to prevent the LR Control VM in DC1 from forming routing adjacencies with the NSX Edge in DC2 (and vice versa for our second LR Control VM in DC2). The solution involves creating a Logical Switch (a Universal Logical Switch) per LR Control VM and connecting to it only the entities (NSX Edge, LR Control VM) that reside in the same “location”, as shown in the diagram below.
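The effect of the per-location transit Logical Switch can also be sketched (again with hypothetical names): an OSPF/BGP adjacency can only form between entities attached to the same segment, so giving each location its own ULS confines each LR Control VM to its local Edge.

```python
# Hypothetical sketch: adjacencies only form between entities attached to the
# same Logical Switch, so a dedicated ULS per "location" keeps each LR Control
# VM peering only with its local NSX Edge.
segments = {
    "uls-transit-dc1": {"lr-control-vm-dc1", "edge-dc1"},
    "uls-transit-dc2": {"lr-control-vm-dc2", "edge-dc2"},
}

def possible_adjacencies(segments):
    """Return the set of (a, b) pairs that share a segment and can peer."""
    pairs = set()
    for members in segments.values():
        for a in members:
            for b in members:
                if a < b:  # skip self-pairs and duplicate orderings
                    pairs.add((a, b))
    return pairs

peerings = possible_adjacencies(segments)
# lr-control-vm-dc1 shares no segment with edge-dc2, so they never peer.
```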

And that’s it. The combination of Locale ID, multiple LR Control VMs and a dedicated Logical Switch per “location” ensures that layer 3 egress traffic from one location doesn’t cross the DCI.

Elver’s Opinion: I didn’t think I had an opinion today…actually I do (now that I think about it). Locale ID is a good solution that NSX-v introduces to handle local egress, but I think there is a missed opportunity here. VMware could have allowed the NSX Edge to be assigned a Locale ID (after all, the LR Control VM IS a modified NSX Edge) and avoided the need for a ULS per location. I’m not sure I know why they didn’t design the Locale ID solution that way (or that they have a good reason).

1 comment:

  1. I have two questions regarding what seems to be an overly complex solution (made by VMware).
    First, why can’t we just manipulate the route metrics that the edge routers advertise to the Control VMs? Each ESG has two peers (Control VMs); we intentionally worsen the metrics sent to the ESG on the opposite site. That way the DLRs will always use the local edge router instead of the DCI link.
    If for some reason we cannot solve this with metric manipulation, then there is another idea (instead of using two separate ULS switches for peering separation): can’t we just make each edge router peer with its own local Control VM? Why do we have to make a full routing mesh between the two edge and two Control VM routers?

    FYI, I am starting to understand NSX and still haven’t had much hands-on practice, so my comments may seem inappropriate.