Friday, September 16, 2016

Restoring NSX from Old Backup - With Control VM


Ok, yesterday I posted Restoring NSX from Old Backup - Impact on Distributed Network, where I said I was 5 sigma sure the Control VM wouldn’t make a difference to the restore. 5 sigma is probably not as good as 6 sigma (whatever that is), so this post shows the NSX Manager restore being done when the Logical Router is deployed with its Control VM.

Here is the logical view of the network with a Global Logical Router that has a Control VM:


That’s right, this diagram is the same diagram from yesterday, with no Control VM. That’s because the logical diagram depicts the Data Plane, not the Control Plane (gotcha). However, the Control VM does have a connection for its HA interface (formerly known as the Management Interface), which I dropped in the dvPortgroup COM_A1-VMMGT. Below is a diagram of the vDS after deploying the Control VM (this time I showed the Uplinks so you can see the two ESXi hosts…sorry for missing that yesterday).


So, I removed the logical router (default+edge-6) that was there, made a backup (Backup4) of NSX Manager, deployed a new logical router (piratas+edge-7) with its Control VM (that’s how I got the above vDS screenshot) and did one more backup (Backup5) so I could easily return to the end state. Below is a screenshot of the new logical router.


Bonus Point 1: What is the Tenant name of the new Logical Router? Answer provided at the end of this post… Now back to the show.

And here is what com-a1-esxi01 sees:
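Since a screenshot doesn’t travel well in text, here is roughly what that host-level check looks like from the ESXi shell. This is a sketch from memory of the NSX-v net-vdr output; the field values below are illustrative, not a copy of my lab’s actual numbers:

    # list the DLR instances this host knows about
    net-vdr --instance -l

    VDR Instance Information :
    ---------------------------
    Vdr Name:              piratas+edge-7
    Number of Lifs:        2
    Number of Routes:      2
    State:                 Enabled
    Control Plane Active:  Yes
    Edge Active:           No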


Bonus Point 2: Why does the output for Edge Active say No? Answer provided at the end of the post.

And here is the same output (with some additional show commands to find the host-id of com-a1-esxi01), but taken from NSXMGR:
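If you’d rather run that from NSX Manager’s central CLI (available since 6.2) instead of SSHing to the host, the sequence goes more or less like this. Syntax is from memory and the cluster/host IDs are made up for illustration, so lean on the CLI’s list/? completion on your own setup:

    show cluster all                        # grab the cluster ID (domain-cXX)
    show cluster domain-c26                 # list the hosts in it and their host-XX IDs
    show logical-router list all            # confirm piratas+edge-7 is there
    show logical-router host host-29 dlr edge-7 verbose   # what com-a1-esxi01 holds for the DLR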


And here is me consoling into the Control VM and showing its routing table:
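For anyone who hasn’t consoled into a Control VM before, it speaks the same CLI as an NSX Edge, so the routing table comes from show ip route. A minimal sketch of what you’d see, with made-up subnets standing in for the two LIF networks:

    show ip route

    Total number of routes: 2

    C    172.16.10.0/24    [0/0]    via 172.16.10.1
    C    172.16.20.0/24    [0/0]    via 172.16.20.1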


After restoring Backup4 (no logical router), here is what the com-a1-esxi01 host sees.
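Same net-vdr command as before, but this time the instance list comes back empty (again a sketch, not the actual screenshot):

    net-vdr --instance -l

    VDR Instance Information :
    ---------------------------
    (nothing listed -- piratas+edge-7 is gone from the host)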



Even NSX Manager forgot about it:


However, vCenter still sees the Control VM (it is a VM after all):


We can also console into the Control VM (or, if we had bothered to put an IP on the HA interface and enable SSH, we could've gone in-band) and show the routing table:


Are you surprised the Control VM still shows the LIFs as connected? Let’s ponder this for a bit. The Control VM doesn’t communicate directly with the ESXi hosts, so it has no clue that all of them dropped the Logical Router. It receives its information (configuration-wise, like the LIFs and IPs) from NSX Manager. NSX Manager has not told the Control VM (since it forgot about the Control VM's existence) that the Logical Router is no longer around, thus the Control VM continues to believe all is good and the LIFs are still connected (up/up)...even after a few hours of not “hearing” from NSX Manager.

After a few hours, I restored from Backup5 (the end state), the logical router came back, and NSX Manager remembered the Control VM.

Elver’s Opinion: I don’t think I have an opinion today (something all wise married men know how to do too well)…but I would brag a little that I was right when I said yesterday that the restore would have the same impact to the logical router whether it has a Control VM or not.

Bonus Points Answers: Gotcha again (actually, I lied this time). Instead of giving you the answers, how about you tweet the answers to me, @ElverS_Opinion? The first person to tweet both answers will get a signed copy, in two languages mind you, of the VCP6-NV Official Cert Book1. Just make sure you follow me so you can send me your mailing address via private IM.


1 Offer only valid for those who can locate the Seven Kingdoms on a map, agree that Citizen Kane is the best movie EVER, and know what Bachata is.

Thursday, September 15, 2016

Restoring NSX from Old Backup - Impact on Distributed Network

I’ve been slacking (from writing) for a few months now, but at VMworld 2016 @LuisChanu reminded me of a blog I had promised him. My first ever blog was NSX Manager Backup and Restore, but he wanted to know about a few “what-ifs”, like what would happen if you restore NSX Manager using an old backup. So this post is to fulfill (better late than never) my promise to Luis and write about what happens to the distributed network when you use old backups to restore NSX Manager.

To get us started, below is a logical diagram of the NSX setup. We have one Global Logical Router and two Global Logical Switches. Logical Switch 1 has VM ServerWeb01 and Logical Switch 2 has VMs ServerApp01 and ServerApp02. ServerWeb01 and ServerApp02 are running in the same ESXi host, com-a1-esxi01 (not shown in the diagram).


I used a single cluster with ESXi hosts com-a1-esxi01 and com-a1-esxi02, both members of the same vDS. The initial (no logical switches deployed yet) vDS topology is shown below.


I made two backups (actually three backups) of NSX Manager to an FTP server. Backup1 does not have any logical switches or the logical router. Backup2 has the logical switches but not the logical router. Backup3 is my end-state with all configurations (I did it so I could quickly go back to a working state during testing).

Elver’s Opinion: I’ve used the built-in backup feature of NSX to do this lab. I’m 5 sigma confident that the same result would’ve been obtained if you used another method that backs up the NSX Manager Appliance. Btw, if the writing-slacking is really behind me, I’ll soon do a follow-up post to cover the impact to NSX Security when restoring NSX Manager from an old backup.
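By the way, if you’d rather script the backups than click through the appliance UI, the NSX Manager appliance-management API has a backuprestore branch for it. The calls below are from memory of the NSX-v API guide, and the hostname and credentials are placeholders, so verify the paths against your version (there is also a call in the same branch to kick off an on-demand backup, but double-check its exact method in the API guide before relying on it):

    # read the current backup settings (FTP server, schedule, pass phrase)
    curl -k -u admin:'<password>' \
      https://nsxmgr.corp.local/api/1.0/appliance-management/backuprestore/backupsettings

    # list the backup files NSX Manager can see on that FTP server
    curl -k -u admin:'<password>' \
      https://nsxmgr.corp.local/api/1.0/appliance-management/backuprestore/backups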

So we have our vSphere/NSX environment working the way we want it (end-state) when a gamma ray hits the right (or wrong, depending on how you look at it) chip in one of the DIMMs that happened to be hosting the memory pages of NSX Manager, corrupting its database and rendering it useless (yes, it could happen, especially if your ESXi host is onboard the International Space Station).

Elver’s Opinion: Instead of restoring NSX in this ET event, you could call VMware support. They have some tricks up their sleeves to recover from some types of database corruptions.

Just before the gamma ray hit the RAM, this is what our vDS looked like:


And the deployed Logical Switches:


And the deployed Logical Router:


And what com-a1-esxi01 saw:


Good to know: A quick detour to point out something about the CLI output. Notice that both logical switches (VXLAN 32000 and 32001) have a Port Count of 2. Each logical switch has one connected VM running in com-a1-esxi01 plus one LIF from the logical router.
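The command behind that output is net-vdl2, by the way. A trimmed-down sketch of the per-logical-switch section (replication mode and counters are illustrative, not my lab’s actual values):

    net-vdl2 -l

    ...
        VXLAN network:  32000
                Multicast IP:    N/A (headend replication)
                Port count:      2
        VXLAN network:  32001
                Multicast IP:    N/A (headend replication)
                Port count:      2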

Now back on the road, we did some ping tests (from ServerApp01 to ServerWeb01) to show that traffic is flowing between the two logical switches, via the logical router.


Let’s go ahead and restore from Backup2, the one that has the logical switches but not the logical router. After NSX Manager finishes the restore, we log back in to the Web Client and see the logical router missing from the Network and Security view (which is what would be expected since we restored from a backup that didn’t have a logical router).




One thing NSX Manager does after reestablishing the connection to vCenter is reach out to the ESXi hosts (vCenter has nothing to do with this) and ask them (politely) to get rid of any logical router that it does not know about (actually, NSX Manager pushes the logical routers that it does know about to the ESXi hosts and the hosts purge everything else). Below is a CLI output from com-a1-esxi01 showing the logical switches with the Port Count field down to 1 and no logical router present.
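The same two host commands from earlier tell the story after the restore (a sketch, not the actual capture):

    net-vdl2 -l            # Port count for VXLAN 32000 and 32001 is now 1: just the VM, no LIF
    net-vdr --instance -l  # no DLR instance listed anymore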


If we try to ping from the VMs to the default gateway (remember the LIFs are gone), the pings fail.


Just for kicks and giggles, I restored Backup3 and the logical router returned. I was able to ping between Layer 2 segments via the logical router.

Elver’s Opinion: I deployed the logical router without a Control VM as I expect (again, with 5 sigma certainty, for which I expect to be nominated for a Novell Prize) that the results would be the same as if I had deployed the Control VM.

Now to restore from Backup1, with no logical switches and no logical router. After the usual routine of waiting for NSX Manager to finish the restore and logging back in to the Web Client, I confirmed there were no logical switches in the Network and Security view.



However (and this should’ve been expected by you), the dvPortgroups representing the logical switches remained. dvPortgroups are owned by vCenter and vCenter was not part of the restore process. Looking at the ESXi host, it still had the information for the logical switches:


Again, this should’ve been expected because the difference between a VLAN dvPortgroup and a VXLAN dvPortgroup is the Opaque Network fields (VXLAN ID, multicast address) in the VXLAN dvPortgroup, which were pushed by vCenter to each of the ESXi hosts in the vDS. NSX Manager gave the Opaque Network field values to vCenter. When NSX Manager is restored from the old backup, it is not aware of the VXLAN dvPortgroups, thus it has no way of telling vCenter to clean up (which is a good thing, by the way). You won’t be able to make any changes to those logical switches (VXLAN dvPortgroups), but the Data Plane will continue to run.
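If you want to poke at what the host still carries for those logical switches, NSX-v adds a vxlan namespace to esxcli. Something along these lines should do it; the vDS name is whatever yours is called, and I’m going from memory on the exact sub-commands, so confirm them with tab completion:

    # vDS-level VXLAN state the host received through vCenter's opaque data
    esxcli network vswitch dvs vmware vxlan list

    # per-logical-switch view: VXLAN ID, multicast address, controller, port count, etc.
    esxcli network vswitch dvs vmware vxlan network list --vds-name COM_A1-vDS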

A quick ping between ServerApp01 and ServerApp02 (which were running in different hosts) proved VXLAN was working between the VTEPs.
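The guest-to-guest ping is the proof that matters, but if you want to test the VTEP-to-VTEP path directly from the hosts, the usual trick is to ping over the vxlan netstack with don’t-fragment set and a payload sized for the transport MTU. The vmknic and destination VTEP IP here are illustrative:

    # from com-a1-esxi01: a 1572-byte payload plus headers only fits if the transport MTU is ~1600
    ping ++netstack=vxlan -d -s 1572 -I vmk3 192.168.250.52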



Elver’s Opinion: So we have a split verdict on the impact to the distributed network of restoring NSX from old backups. The Layer 3 (logical routers) would get affected (this is bad) while the Layer 2 (this is good) would not. As an aside, I didn’t test with the NSX Edge appliance as once it is deployed (configs pushed by NSX Manager) the Edge goes about its business in the Control/Data planes irrespective of what happens to NSX Manager.

Thursday, April 21, 2016

Software iSCSI and LACP - They Like Each Other

Every now and again I end up recommending LACP for a Software iSCSI deployment in a vSphere environment. And every now and then I get pushback because VMware recommends that LACP not be used, ahem, discourages the use of LACP to carry iSCSI traffic. And every time I have to explain that we need to read the fine print before we take a vendor’s recommendation against the use of a widely used protocol and just run with it.

Everything else being equal, iSCSI will provide more efficient load sharing among two or more links between the Software iSCSI initiator with MPIO and a physical switch than LACP between the vDS and the physical switch. LACP will use at most the VLAN and frame information (L2/L3/L4) to define a “flow”, and pin that "flow" to one of the links in the Port Channel. MPIO uses application (iSCSI) information, such as connection IDs, to determine the link on which to send the egress iSCSI traffic, which provides more granularity on what a “flow” is (as compared to LACP). MPIO will then pin that narrower-defined (is that even a word?) “flow” to one of the available uplinks.

But how do we know when "everything else" is not "equal"? To help decide, we should get additional information on the following:

Can LACP even be configured?
If no, carry on; nothing to see here. Chassis-based servers (server blades) often don’t support LACP to the blades. If yes, then read on.

What type of performance does the initiator require?
If you have a situation where the initiator needs higher throughput capacity than what is available over a single link, you probably want to go with MPIO, as it will load share the iSCSI traffic more efficiently than LACP. This will reduce the risk of sending traffic over an over-subscribed link and having the traffic dropped.
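For reference, the MPIO load sharing comes from binding two (or more) dedicated vmkernel ports to the Software iSCSI adapter, one per uplink. A minimal esxcli sketch, where vmhba33, vmk1 and vmk2 are example names, not anything from a specific environment:

    # find the Software iSCSI adapter name
    esxcli iscsi adapter list

    # bind two dedicated vmkernel ports to it, one per uplink
    esxcli iscsi networkportal add --adapter vmhba33 --nic vmk1
    esxcli iscsi networkportal add --adapter vmhba33 --nic vmk2

    # confirm the bindings
    esxcli iscsi networkportal list --adapter vmhba33

Each bound vmkernel port gets its own iSCSI session(s), which is what gives MPIO the per-session granularity described above.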

How fast must failover take place in case of link failure?
With MPIO, the initiator will fail over the affected egress iSCSI traffic immediately upon detecting the link down. However, the physical switch will not fail over the traffic until it re-learns the initiator’s iSCSI MAC address from the failed link over one of the remaining links. Thus the physical switch has a direct dependency on the initiator. Contrast that with LACP, where both the initiator and the physical switch will do the failover immediately and independently of each other.

Elver’s Opinion: You should have some sort of business guidance on how many link failures should be tolerated in your uplinks (between the initiator and the first Physical switch). It is not unusual for a 2:1 rule to be applied; have twice the number of links you need so you can tolerate half the links failing.

Does the initiator have sufficient NICs to dedicate to iSCSI?
MPIO is like a 2-year old (and some adults I know): it does not share with others. Thus to use MPIO, you must have dedicated NICs for it. All other Ethernet traffic must use other NICs. If you don’t have sufficient NICs, you must share and should use LACP. If you want to add the additional NICs, you need to analyze the cost, and the level-of-effort to do so, vs. the reward. From experience, odds would tend to favor not getting the extra NICs.

Elver’s Opinion: LACP is not really required here but if you don’t configure it, the initiator would have less hashing options for load sharing all Ethernet traffic (iSCSI and non-iSCSI) among the available uplinks, and the physical switch will not do ANY load sharing of its own.

Does the Physical switch have sufficient ports to dedicate to iSCSI?
If you need to add port capacity in the switch, you might need to weigh the cost and level-of-effort required to add that capacity. Many times, it won’t be as simple as just replacing the switch or adding a new one.

How many Arrays (IPs) would the initiator talk to?
The more Arrays the initiator has to communicate with, the closer in load sharing performance LACP will get to MPIO. If you have a single Array (IP), LACP will see a single flow between the initiator and the target (unless the session drops and gets reestablished with a different TCP source port). LACP can be configured to use the destination IP as the load sharing hash, and the more destination IPs there are, the better the distribution over the uplinks.

Elver’s Opinion: Note of caution here. For the hash to do the best load sharing job it can, the IPs must have some sort of variance in the last octet, which is related to the number of active links in the Port Channel. If there are only two links in the Port Channel, you should try to get a similar number of even-numbered and odd-numbered Array IPs.
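Where that hash gets set depends on the side: on the vDS you pick the LAG’s load balancing mode when you create the LAG in the Web Client; on the physical switch it’s a global or per-port-channel knob. As an illustrative sketch on a Catalyst-style IOS switch (command names vary by vendor and platform, so check yours):

    ! hash on source/destination IP so multiple Array IPs spread across the member links
    port-channel load-balance src-dst-ip
    ! verify what the switch is actually hashing on
    show etherchannel load-balance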

It looks like a tie, so which should we choose?

Ask Operations. They are the ones that will wake up at 2 a.m. to fix problems. The DC trend has been to consolidate as much as possible to maximize the use of physical resources. However, not all Enterprises have had their Operations teams update their processes and knowledge to take on a change in direction. Choosing the "wrong" one here may increase OpEx to the business.

Wednesday, April 20, 2016

A vCenter, NSX Manager, Multiple DCs...and I Can't Reach Them

Note: I wrote this post in somewhat of a rush and didn’t have time to do diagrams. I’ve had a bit of a hiatus and wanted to add something to the blog, so I planned to update the content with the diagrams at a later date. (Update: the diagrams have been added.) Also, a question asked by a good friend inspired this post.

You conceded the point and your team will be using a single vCenter to manage multiple physical Data Centers. All right, not the end of the world, you’ll be fine. But a few developers are requiring Layer 2 across some of those Data Centers. Again, not the end of the world; besides, that’s what NSX is for. However, do you understand what the impact to the Virtual Workloads in the stretched Layer 2 (VXLAN) would be if one of those physical Data Centers loses network connection to the Management Plane (vCenter, NSX Manager, etc…) and the Control Plane (NSX Controllers, Logical Router Control VMs, etc…)? To keep the topic to the network impact, we will assume that Virtual Workloads are using Storage local to their Data Center.

Figure 1: DC isolation

Elver’s Opinion: Since we have Logical Switches distributed among multiple physical Data Centers, I’ll make the very safe assumption that you won’t be doing Layer 2 Bridging. If you are trying to do Layer 2 Bridging, call me so I can talk you out of it.

To get the obvious out of the way, if you don’t have the Management Plane, you can forget about vMotion, Storage vMotion and any NSX Configuration changes.

Let’s tackle the slightly less obvious within Layer 2. Virtual Workloads within the impacted Data Centers, and in the same Logical Switch, will be able to talk to each other via VXLAN (Overlay), as well as to other Virtual Workloads running in VTEPs within other Data Centers they can reach. NSX is built such that the VTEPs (ESXi hosts) will continue to communicate with each other in the event the NSX Controllers are not reachable. There will be an uptick of Broadcasts (specifically ARP Requests) and Unknown Unicast traffic being replicated by the VTEPs, but the uptick shouldn’t have much of an impact. At the Control Plane, assuming the NSX Controllers are still operational, they will remove the “isolated” VTEPs, and their associated entries, from all their tables (Connection, VTEP, MAC, ARP) and tell the “reachable” VTEPs to remove the “isolated” VTEPs from their VTEP Tables.

Figure 2: Inter-Logical Switch

If two Virtual Workloads within the impacted Data Centers are in different Logical Switches (VXLANs) and those Logical Switches connect to the same Logical Router, the Virtual Workloads will be able to talk to each other; from the Logical Router’s perspective both subnets are directly connected.

Figure 3: Inter-Logical Switch - Same Logical Router

The not so obvious (because of the depends involved) is the impact on Layer 3 traffic that does not stay confined within the same Logical Router. The impact can be narrowed down to two types of traffic flows. One where the Source and Destination Workloads are hanging off different Logical Routers as their default gateways, and the other where the second Workload is not connected to a Logical Switch (think a Physical Workload or a Virtual Machine in a VLAN):

Elver’s Opinion: Type 2 flows are basically Layer 3 traffic between Virtual and Physical networks.

Type 1:
If the Source and Destination Workloads are hanging off different Logical Routers then you need an NSX Edge or another NFV Appliance to do the routing (two Logical Routers can’t connect to the same Logical Switch nor to the same dvPortgroup). Is this Appliance within the impacted Data Centers? If not, the two Workloads won’t be able to talk to each other because there would be no logical path for the flow to reach the Appliance so that it can do the routing (remember the impacted Data Centers have some sort of “isolation”).

Figure 4: Inter-Logical Switch - Different Logical Routers

If the Appliance is within the impacted Data Center, then the two Workloads may reach each other. I say may because it all depends on whether there is a routing protocol running between the Logical Routers and the Appliance. If you are using static routes, then yes, the two Workloads can talk to each other. But if you are running a routing protocol, can the Logical Routers’ Control VMs reach the Appliance to exchange routing control traffic? If the Appliance lost connection to one or both of the Logical Routers’ Control VMs, then the Appliance will remove the routes to the Workloads’ subnets from its routing table, thus making those subnets unreachable to it.
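A quick way to see which side of that “may” you landed on is to check from the Appliance (an NSX Edge in this sketch) whether the adjacencies with the Control VMs survived and whether the DLR subnets are still in its table. This is the regular Edge CLI, with the protocol command depending on what you actually run:

    show ip ospf neighbor    # or show ip bgp neighbors, if BGP is what you run
    show ip route            # the DLR-advertised subnets drop out once the adjacency is lost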

While still in the Type 1 flow, it is worthwhile to point out that if two Workloads in impacted Data Centers but in different Logical Routers can talk to each other, then Virtual Workloads in non-impacted Data Centers but hanging off those same Logical Routers will NOT be able to communicate across Logical Routers, because they won’t be able to reach the NSX Edge or NFV Appliance.

Type 2:
If the second Workload is not connected to a Logical Switch (Physical Workload or VM in a VLAN), we definitely need a Perimeter Edge or an NFV Appliance with a connection to a VLAN dvPortgroup. We will assume that we are running a routing protocol. It is similar to the Layer 3 Type 1 flow but with a few variants.

Figure 5: Virtual to Physical

Variant 1: The Appliance can reach the Logical Router Control VM AND the second Workload is in one of the impacted Data Centers. In this instance communication between the Workloads will happen. However a Virtual Workload hanging off the same Logical Router but not within the impacted Data Centers will NOT be able to talk to the second Workload because the Appliance wouldn’t be reachable to it.

Variant 2: The Appliance can reach the Logical Router Control VM AND the second Workload is not in one of the impacted Data Centers. In this case the two won’t be able to communicate; remember the impacted Data Centers are “isolated” thus no traffic can come in or go out.

Variant 3: The Appliance can’t reach the Logical Router Control VM. It doesn’t matter where the second Workload is because the Appliance will remove the Virtual Workload’s network from its routing table, thus making that subnet unreachable to it, and the Workloads won’t be able to talk to each other. However, if you are using static routes, refer back to Variants 1 and 2.

Variant 4: The Appliance is not in the impacted Data Centers. In this case there is no way for the Logical Router to reach the Appliance, thus the Workloads won’t be able to talk to each other.

To wrap it up, please note that if the Perimeter Edge or NFV Appliance is located in one of the impacted Data Centers, no Virtual to Physical network traffic in the non-impacted Data Centers will be possible.