Saturday, June 24, 2017

Customer Loyalty or Short-Term Profits

I’m fortunate to be very busy these days (and looking for partners with the right mindset/chemistry to help us out), but a downside is that I’m now making last-minute travel decisions, which inevitably lead to mistakes on my part. So what should travel companies (airlines, hotels, car rentals, etc.) do when I make these mistakes?

For example, I booked myself to fly from EWR to TPA on two different flights four hours apart. I didn’t notice this until I had to check in last night, so I called United to inquire about it. The lady who answered my call confirmed that I indeed had two flights, asked me which one I wanted to cancel and provided a refund. I’m sure that my status with the airline played a role in her decision to provide the refund, but she could have read me the EUA and claimed that I was at fault, or partially at fault, for being too stupid to book two flights out of the same airport four hours apart. They also waived a $200 change fee in May when I mistakenly booked a different flight for the wrong days and noticed after 24 hours had elapsed. In any case, United decided to sacrifice short-term profits (twice) for customer loyalty. You bet that I will continue to fly United.

Which leads me to Avis. I’ve been booking with Avis for years and only booked with other car rental companies when Avis was not available. Earlier this week, on my way back from Toronto (en route to NYC), I stopped for the day in Philly to present at the VMUG UserCon. I reasoned that it would be more efficient (and less costly) to rent a car for the day than to take a Lyft (I avoid Uber as much as I can; I haven’t liked their business ethics for a long time). I needed to go from PHL to the VMUG (about 35 miles) and then to the Amtrak station in Philly to catch the train to Penn Station (about another 30 miles). So far so good.

While at the VMUG, Michael Fleischer (@michaelfleisher) told me I could just go over the border to Trenton (about 30-40 miles from the VMUG, about the same distance as to the Amtrak station in Philly) and catch the NJ Transit train for about $16 (versus $112 for Amtrak). Thinking that was a no-brainer, I did just that (calling Avis along the way to confirm the location of the Avis office in Trenton), dropped off the car and was handed a bill for over $300. My first thought was “This is a mistake,” so I asked the guy at the counter to explain the bill. He told me that I was charged for miles ($201 plus taxes) and that I needed to call Avis customer service if I had further questions. I thanked him for his time and left to catch my train.

I called Avis (billing department), provided the rental agreement number and asked why I was charged for miles. The lady who answered my call explained that, per the EUA, if I change the drop-off location (no mention that the drop-off location was in another state) I automatically get charged for miles (which she later clarified by stating the rental rate could change in such cases). Throughout the entire conversation she was very professional and respectful. I acknowledged that I didn’t know what the EUA said (or that I had ever read it, for that matter…I never do) and that I now understood that per the EUA I was fully responsible for the charges. I then asked her what could be done to remove the charges (I have rented one-way with Avis before and have never been charged by the mile). She proceeded to quote the EUA again.

Then I decided to explain to her (or remind her, I suppose, since she had my account info in front of her) that I was a long-time Avis customer, that I only rent cars with them, and that I would have expected, since this was a first-time occurrence, the per-mile fees to be waived. She said no, they wouldn’t be waived because…the EUA. Sigh.

I asked her if I could speak to a supervisor and, to my surprise, she was it (or as she put it, “I am in the line of escalation”). At this point she offered to refund 25% of the per-mile charges. I politely refused, acknowledging again that I was liable for the charges per the EUA (really, who reads travel companies’ EUAs anyway?) and explaining that my expectation, as a very long-term loyal customer of Avis, was that Avis would forgive this one-time occurrence. She said she couldn’t and proceeded to quote the EUA again (and no, I’m not about to start reading travel companies’ EUAs; I have no time for that. I will continue to “initial here, here and here, sign here”).

Somehow she thought I was still claiming that I was not responsible for the charges and offered 50% off so I could still pay my part for being not-too-smart-for-not-reading-the-EUA (my words, not hers). I politely refused and explained myself again: I was not looking for a discount; I was looking for Avis to show me they value their loyal customers, and if this was Avis’s position (to fall back on the EUA for minor things) then I would be electing not to do business with Avis (why should I do business with such a company?). Guess what she said next? Exactly, she proceeded to quote from the EUA. After she finished quoting, I thanked her for her time and wished her a great day. The call lasted less than 8 minutes.

After the call I emailed Avis customer service asking what the process is to cancel my account. I didn’t include in the email the reasons why I want to cancel the account, hoping someone from Avis would reach out, hear me out and tell me that I had a bad dream, that yes, Avis values customer loyalty over short-term profits, washes and vacuums the cars after each return, and will reimburse the mileage charges (I hate breakups, so I’m giving them one last chance). We’ll see what happens. They are supposed to reply to my email within 3-4 days (no, I didn’t get this information from the EUA; I got it from their automatic email acknowledging they received mine).


Elver’s Opinion: Avis is choosing short-term profits (very SMALL short-term profits) over customer loyalty. Which is fine by me. That’s not how I run my business with my customers, but the Avis execs can run their company however they think is best. I spend my money where I feel valued.

Friday, April 7, 2017

NSX Local Egress

Better late (VERY late) than never, I suppose. Here is the last blog post in my DC DCI series, and it involves the Local Egress feature of NSX. I don’t quite remember everything I wrote in my earlier posts (which you can read here, here and here) since it has been a while since I wrote them (and for some reason I can’t get into reading stuff I’ve written), but I believe I explained what happens to traffic egressing the DC and how it affects the bandwidth requirements.

In version 6.1 (or was it 6.2?) of NSX-v, VMware introduced something they call Cross vCenter NSX. The architecture consists of multiple NSX Managers (one per vCenter) that can exchange some configuration information as well as share a common set of NSX Controllers. The nice part is that it allows you to have multiple vCenters managing different DCs while stretching a broadcast domain (using VXLAN) across all the DCs (up to 8 of them). For this blog post we will assume the DCs are physical and have DCIs between them.

On the surface this is a cool feature (putting aside that stretching a broadcast domain over multiple locations violates the 11th commandment), since it allows all VTEPs (belonging to the same Universal Transport Zone) in all DCs to be fully meshed with each other. However, this by itself doesn’t address the problem of keeping layer 3 egress traffic local and off the DCIs.

To illustrate the problem, please have a look at the diagram below.

Notice that the local copy of the Logical Router (it is a Universal Logical Router, which is functionally the same as the Global Logical Router) will have two paths to reach the intranet: one via the NSX Edge in DC1 and another via the NSX Edge in DC2. By default, NSX components are unaware of locality (location), so to the Logical Router (specifically, the LR Control VM, which owns the OSPF/BGP adjacencies with the NSX Edges) both NSX Edges are equally good paths to external networks, and it can choose either one to forward traffic to. There is a bit more complexity here around route costs, but we’ll skip that for this blog post.

So, to handle the layer 3 egress traffic from the Logical Router and keep it from going over the DCI, NSX introduces a way to make routing-table creation location aware. The method is quite simple: assign an ID to all Universal Logical Router entities (the LR Control VM and ESXi hosts, but NOT the NSX Edges) that you want to belong to the same “location”. Using this location ID, or Locale ID as VMware calls it (which is 128 bits long and defaults to the UUID of the NSX Manager, but can be changed), the NSX Controller (the Universal NSX Controller, since we are doing Cross vCenter NSX) will only populate the routing table on those ESXi hosts that have the same Locale ID as the LR Control VM, as shown in the figure below.
If you have been following along up to this point, you’ll have noticed that we have introduced more problems than we have actually solved (so far). In the diagram above we have two different problems: 1) only the ESXi hosts in DC1 get the routing table, since they have the same Locale ID as the LR Control VM, and 2) the LR Control VM still sees the NSX Edges in both DCs.
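To make the Locale ID filtering (and problem 1) concrete, here is a minimal Python sketch of the idea. The data structures and names are mine for illustration only; this is not the NSX API, just a model of what the Universal Controller is doing.

```python
# Minimal model of Locale-ID-based route distribution (illustrative only;
# these objects and names are NOT the NSX API).
from dataclasses import dataclass, field

@dataclass
class Host:
    name: str
    locale_id: str                  # in NSX-v it defaults to the NSX Manager's UUID
    routes: list = field(default_factory=list)

@dataclass
class ControlVM:
    name: str
    locale_id: str
    learned_routes: list            # routes learned via OSPF/BGP from its local NSX Edge

def push_routes(hosts, control_vm):
    """The Universal Controller only programs hosts whose Locale ID matches
    the Locale ID of the LR Control VM that learned the routes."""
    for host in hosts:
        if host.locale_id == control_vm.locale_id:
            host.routes.extend(control_vm.learned_routes)

# Example: only the DC1 hosts get the routes learned by the DC1 Control VM.
dc1_hosts = [Host("esx-dc1-01", "locale-dc1"), Host("esx-dc1-02", "locale-dc1")]
dc2_hosts = [Host("esx-dc2-01", "locale-dc2")]
dc1_control_vm = ControlVM("udlr-cvm-dc1", "locale-dc1", ["0.0.0.0/0 via edge-dc1"])

push_routes(dc1_hosts + dc2_hosts, dc1_control_vm)
print([h.routes for h in dc1_hosts + dc2_hosts])
# [['0.0.0.0/0 via edge-dc1'], ['0.0.0.0/0 via edge-dc1'], []]  <- DC2 host gets nothing
```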

To solve problem 1, NSX allows you to deploy a second LR Control VM from the other NSX Manager (up to 8 LR Control VMs, one per NSX Manager in the Cross vCenter NSX domain), and you assign that second LR Control VM the same Locale ID as the ESXi hosts in DC2. The solution to problem 1 is shown in the diagram below.
To solve problem 2, it is necessary to prevent the LR Control VM in DC1 from forming routing adjacencies with the NSX Edge in DC2 (and vice versa for our second LR Control VM in DC2). The solution involves creating a Logical Switch (a Universal Logical Switch) per LR Control VM and connecting to it only the entities (NSX Edge, LR Control VM) that reside in the same “location”, as shown in the diagram below.

And that’s it. The combination of Locale IDs, multiple LR Control VMs and a dedicated Logical Switch per “location” ensures that layer 3 egress traffic from one location doesn’t cross the DCI.
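If it helps, here is the per-location transit Logical Switch idea as a toy model (again, my own abstraction, not an NSX construct): a routing adjacency can only form between routers attached to the same segment, so keeping each Control VM and its local Edge on their own ULS is what removes the cross-DC adjacencies.

```python
# Toy model of why one transit ULS per location limits adjacencies:
# an adjacency can only form between routers attached to the same segment.
transit_segments = {
    "uls-transit-dc1": {"udlr-cvm-dc1", "edge-dc1"},
    "uls-transit-dc2": {"udlr-cvm-dc2", "edge-dc2"},
}

def adjacencies(segments):
    pairs = set()
    for members in segments.values():
        for a in members:
            for b in members:
                if a < b:
                    pairs.add((a, b))
    return pairs

print(adjacencies(transit_segments))
# {('edge-dc1', 'udlr-cvm-dc1'), ('edge-dc2', 'udlr-cvm-dc2')}
# No cross-DC pairs, so the DC1 Control VM never learns routes via edge-dc2.
```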

Elver’s Opinion: I don’t think I have an opinion today…actually I do (now that I think about it). Locale ID is a good solution that NSX-v introduces to handle local egress, but I think there is a missed opportunity here. VMware could have allowed the NSX Edge to be assigned a Locale ID (after all, the LR Control VM IS a modified NSX Edge) and avoided the need for a ULS per location. I’m not sure I know why they didn’t design the Locale ID solution that way (or that they have a good reason).

Monday, February 13, 2017

DC Egress Traffic with Stretched Layer 2

In the last blog post I spent a lot of writing (and funky formulas) just to say that the DCI circuits between two Data Centers need to be larger than the biggest DC WAN link plus the inter-DC traffic (which increases your cost). When stretching layer 2 across DCs, there is not much that can be done to force ingress traffic to come in via the WAN link of the DC where the destination workload (VM) is running. However, there are some things you can do to force egress traffic to go out of the DC where the VM is located (and avoid using the DCI circuits), reducing some of the cost associated with the DCI links. For this blog post I’m going to assume that we have Active/Active DCs.

Dual Default Gateways
When you stretch layer 2 across the DCs, the default gateway for the stretched layer 2 segments could be physically located in one of the two DCs. We don’t care about that use case (it would probably require a standalone blog post to cover the cons of that design). Instead, let’s assume that we have default gateway services in both DCs and, to provide redundancy, two routers acting as default gateways in each DC, running an FHRP (something like VRRP), as shown in the diagram.



In this design, the VM forwards traffic to its local default gateway, which in turn forwards the traffic out its local DC WAN. For this design to work, (1) there must be a mechanism to stop each pair of DC default gateways from seeing the other pair (otherwise you won’t get both pairs of FHRP routers to be Active with the same virtual MAC) and (2) you must prevent a VM in one DC from receiving ARP replies for its default gateway from the other DC’s pair of default gateways. You could achieve this with access lists (too manual), or you could stretch the layer 2 with something like Cisco OTV, which has a built-in mechanism (less manual) to isolate the FHRP within each DC.
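As a rough sketch of what the FHRP isolation boils down to (a toy model, not how OTV actually implements it), think of the DCI edge dropping the FHRP hellos so each pair of gateways never hears the other pair:

```python
# Toy DCI-edge filter: drop FHRP hellos so the gateway pair in each DC stays
# Active with the same virtual IP/MAC, while normal stretched-L2 traffic passes.
VRRP_MCAST = "224.0.0.18"   # VRRP advertisements (IP protocol 112)
HSRP_MCAST = "224.0.0.2"    # HSRPv1 hellos (UDP/1985)

def forward_over_dci(frame):
    """Return True if the frame may cross the DCI, False if it is filtered."""
    if frame.get("dst_ip") in (VRRP_MCAST, HSRP_MCAST):
        return False          # keep FHRP local to each DC
    return True               # everything else crosses the stretched layer 2

print(forward_over_dci({"dst_ip": "224.0.0.18"}))   # False: VRRP hello stays local
print(forward_over_dci({"dst_ip": "10.1.1.20"}))    # True: normal VM traffic
```

A real deployment also has to stop ARP replies for the virtual gateway IP from leaking across the DCI (point 2 above); OTV’s FHRP isolation handles both, whereas with plain access lists you would have to build each filter yourself.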

This design does have some potential issues that must be taken into account. If each pair of default gateways uses a different virtual MAC address when replying to ARP, a VM that moves between DCs will lose connectivity (until it re-ARPs for its default gateway). Also, if both members of a pair of default gateways go down, you may have to remove the FHRP isolation to allow the impacted VMs to reach the default gateways in the other DC.

Distributed Default Gateway (Top of Rack)
An alternative to dual default gateways is to stretch the layer 2 using VXLAN (or another tunneling protocol) from the Top of Rack (ToR) switches. In this design, all ToRs hold the layer 3 boundary and are the default gateways for their own racks. Every time a ToR gets an ARP request for the gateway, it responds to it locally (providing the FHRP isolation), as shown in the diagram below. Two examples of this are Arista’s DCI with VXLAN and VARP, and Brocade’s IP Fabric with Anycast Gateway (for the time being, until Broadcom decides what to do with Brocade’s networking business).



One advantage of this design over the previous one is that there are a lot more routers (the ToRs) acting as default gateways, with built-in FHRP isolation. If a ToR dies, only its rack is impacted, as opposed to the entire DC. Also, since all ToRs use the same virtual MAC, when a VM moves between DCs it continues to have uninterrupted layer 3 connectivity. One disadvantage is that you need to fiddle with route advertisements to ensure the ToRs forward traffic straight up the local DC WAN; this may not be as easy as it sounds.
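Here is a quick sketch of the anycast-gateway behavior (a generic model, not Arista’s or Brocade’s actual implementation): every ToR owns the same gateway IP and virtual MAC and answers ARP for it locally, which is why a VM keeps its cached gateway MAC after a move.

```python
# Generic anycast-gateway model: every ToR answers ARP for the gateway IP
# with the same shared virtual MAC and never floods the request over the DCI.
GATEWAY_IP = "10.1.1.1"
ANYCAST_VMAC = "00:00:5e:00:53:01"   # same on every ToR in both DCs

class ToR:
    def __init__(self, name):
        self.name = name

    def handle_arp_request(self, target_ip):
        if target_ip == GATEWAY_IP:
            # Reply locally with the shared virtual MAC; suppress flooding.
            return {"replied_by": self.name, "mac": ANYCAST_VMAC}
        return None  # not for the gateway; normal ARP handling applies

# A VM that moves from a DC1 rack to a DC2 rack gets the same answer,
# so its cached gateway MAC stays valid and L3 connectivity is uninterrupted.
print(ToR("tor-dc1-rack3").handle_arp_request("10.1.1.1")["mac"])
print(ToR("tor-dc2-rack7").handle_arp_request("10.1.1.1")["mac"])
```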

Side note: there is a variation of this design where the ToRs are strictly layer 2 (let’s call them Leafs) and the distribution switches (henceforth Spines) do the layer 3, thus providing the default gateway services.

Distributed Default Gateway (software)
Just like the physical version, you stretch the layer 2 using a tunneling protocol (like VXLAN or GENEVE), but a layer 3 process in each hypervisor serves as the default gateway (e.g., a virtual router). Each virtual router has the same IP and virtual MAC (so VMs can move between DCs at will) and responds locally to ARP requests. And like the physical version, you must manipulate routes to force each virtual router to send traffic out its local DC WAN, as shown in the diagram.



VMware’s NSX-v (with its distributed logical router) achieves this functionality. Each logical router instance is the “same” in every hypervisor except for its routing table. The logical router instances in each DC get only the routes relevant to them. This way, each logical router is “forced” to forward traffic using its local WAN.

Elver’s Opinion: This blog post should (mostly) conclude my thoughts on stretching layer 2 across the DCI (think hard before doing it). At first I thought I would use this post to also talk about local egress in NSX (to wrap up my thoughts on the matter), but as I wrote I realized I would need more space than I thought, so I’ll be writing another blog post just on local egress in NSX.

Friday, January 13, 2017

Impact of Stretched Layer 2 on DCI

I was not clear in my DC Ingress blog post as to why it matters which DC is the entry/exit point for flows coming from or going to the outside world for the application that is using the stretched layer 2, in an infrastructure supporting BC with an Active/Active WAN architecture. One word summarizes why it matters: cost. The moment you allow traffic sourced from or destined to points outside the DCs to ride the links between the Data Centers, those links become transit segments and you must increase their speed to accommodate the additional traffic.

Let me put back up the $1 diagram I used last time, but now showing the connection between the Data Centers (the Data Center Interconnect, or DCI).


The DCI between the two DCs needs to be big enough to handle all inter-DC traffic (traffic with source and destination within the DCs; this doesn’t include transit traffic coming from or going outside the DCs). Let’s call the traffic from DC1 to DC2 DCI1, and the traffic from DC2 to DC1 DCI2. The speed of your DC1 WAN circuit must be at least as big as the amount of ingress traffic into DC1; same goes for DC2. If we call the DC ingress traffic DCi1 and DCi2, and we are not doing any sort of route manipulation, then some DCi1 traffic will transit the DCI to reach VMs in DC2 and some DCi2 traffic will transit the DCI to reach VMs in DC1.

Since we don’t know how much “some” is going to be, we should architect for worst-case scenario, like a WAN disruption changing flow patterns, or risk having some traffic dropped before it goes over the DCI. So this is how much traffic the DCI would have to handle:

If DCI1 + DCi1 > DCI2 + DCi2, then DCI1 + DCi1, else DCI2 + DCi2


What this little formula says is that the speed of the DCI link must be at least as big as the larger of the traffic coming from DC1 or from DC2 (I’m assuming the DCI is symmetrical; none of that asymmetrical bandwidth you get from your home ISP).

But this formula is not complete. You see, the VMs will be sending traffic back to the user (egress traffic). Let’s pretend the traffic flows back the same way the ingress traffic came (worst case again, as we can’t predict what will happen in the WAN). Using DCe1 to represent the VMs in DC1 replying to the user and DCe2 to represent the VMs in DC2 replying to the user, the formula becomes this:

If DCI1 + DCi1 + DCe1 > DCI2 + DCi2 + DCe2, then DCI1 + DCi1 + DCe1, else DCI2 + DCi2 + DCe2

This formula is a bit long, so let’s do some thinking and see if we can simplify it. Since we are architecting for the worst-case scenario and we are thinking BC, we can take the larger of DCi1 or DCi2 and call it DCiB. DCiB will be coming in one of the DCs’ WAN circuits. Let’s give the same treatment to DCe1 and DCe2 and call the larger of the two DCeB. DCeB will be going out one of the DCs’ WAN circuits.

Elver’s Opinion: Since flow patterns are never static, it is a good idea to make the WAN circuits in both DCs the same size: the larger of DCiB or DCeB.

For sizing our DCI, we actually care about the larger of DCiB or DCeB; let’s call it DCB. The reason is that, in the event of a WAN failure at DC2, all ingress traffic comes in via DC1 and transits the DCI to DC2, and all egress traffic goes from DC2 across the DCI and out DC1 (and the following week, the flow patterns reverse). This allows us to replace DCiX + DCeX with DCB.

We now make some substitutions to get this:

If DCI1 + DCB > DCI2 + DCB, then DCI1 + DCB, else DCI2 + DCB

Which can be rewritten as:

If DCI1 > DCI2, then DCI1 + DCB, else DCI2 + DCB

All of this writing and these formulas just to say that the DCI speed must be at least as big as your largest DC WAN circuit plus the largest inter-DC traffic. Or put another way, the DCI circuit speed will be the inter-DC traffic plus the transit traffic…and the transit traffic, as we established, is the ingress/egress traffic in support of the application that is using the stretched layer 2.
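If it helps, here is the final formula as a few lines of Python with made-up numbers (Gbps), just to see it produce a number:

```python
# Back-of-the-envelope version of the final formula, with made-up numbers (Gbps).
# DCI1/DCI2 = inter-DC traffic in each direction; DCB = the larger of the
# worst-case ingress (DCiB) and egress (DCeB) traffic that could end up
# transiting the DCI after a WAN failure.

def required_dci_speed(dci1, dci2, dcib, dceb):
    dcb = max(dcib, dceb)            # worst-case transit traffic
    return max(dci1, dci2) + dcb     # if DCI1 > DCI2 then DCI1 + DCB, else DCI2 + DCB

# Example: 4 Gbps of replication DC1->DC2, 2 Gbps DC2->DC1,
# 6 Gbps worst-case ingress, 5 Gbps worst-case egress.
print(required_dci_speed(dci1=4, dci2=2, dcib=6, dceb=5))   # 10 (Gbps, symmetric circuit)
```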

The higher the speed of the DCI circuit(s), the higher the cost. It might not be obvious, but the higher cost is not just for the actual circuit; it is also for the hardware needed at both ends of the circuit to support it, and for the intra-DC hardware required to support any other higher-speed links that will have to carry the transit traffic.

I’ll write another post to discuss how to minimize the egress traffic becoming DCI transit traffic. It is quite straightforward to accomplish nowadays, with most major network vendors providing solutions for it. I will give special attention to NSX, as it has to achieve it by doing something different from what the other vendors do.

Elver’s Opinion: Yes, there are traffic pattern schemes that would allow a smaller DCI than the last formula above suggests. However, those cases don’t occur much in the wild when you are tasked with providing an infrastructure that supports BC with Active/Active WAN and stretched layer 2 for applications.

Wednesday, January 11, 2017

DC Ingress Traffic with Stretched Layer 2

Thank you, 1970s, for giving us two great things: yours truly and TCP/IP. One thing TCP/IP assumes is that a subnet resides in a single location (you only have one gateway, and it must reside somewhere). However, developers love(d) to code so that their application components reside in the same subnet (and the same layer 2, so they don’t have to worry about default gateways and whatnot).

During DR (Disaster Recovery) scenarios it was typical to migrate an application to the backup DC without re-IPing it. So far so good; the subnet still resides in “one location” at a time. However, DR evolved into BC (Business Continuity - think about it: why drop a bunch of money on gear, space, and such, and not use it?) and Active/Active DCs, and our good friends the developers decided to make it an infrastructure technical requirement to stretch the layer 2 their applications were using across multiple DCs (heaven forbid they re-architect their applications, or that you suggest GLB). TCP/IP is not happy. Neither is Elver.

All this presents a problem (sorry, an opportunity) for network designers. It is probably better to illustrate it first, and then explain it.


In the diagram, a user wants to reach the presentation layer of some application that serves requests out of two DCs. If the user wants to reach a VM that happens to be in DC2, there is no native way for the network to know where the VM resides and thus forward the traffic directly to DC2. It is a 50/50 chance which DC will receive (ingress) the traffic (more on that below). This is because the network knows about subnets, not individual IPs. When a router does a lookup in its routing table to decide the next hop for a packet in transit, it looks for the smallest subnet (the most specific prefix) in its routing table that matches the destination IP. If the router has two or more next hops as options for the matching subnet, it selects one (mostly based on some hashing of the packet’s header) and forwards the packet to the selected next hop.
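If you want to play with that lookup behavior, here is a toy longest-prefix-match in Python (purely illustrative; real routers obviously don’t work off a Python list). It also previews the host-route (/32) trick discussed further down.

```python
# Toy routing lookup: longest-prefix match first, then a hash of the flow
# picks among equal next hops (roughly what ECMP does).
import ipaddress

routing_table = [
    ("10.1.1.0/24", ["DC1-WAN", "DC2-WAN"]),   # stretched subnet advertised by both DCs
    ("10.1.1.50/32", ["DC2-WAN"]),             # host route for one VM (the /32 trick)
]

def lookup(dst_ip, flow_id):
    dst = ipaddress.ip_address(dst_ip)
    matches = [(ipaddress.ip_network(prefix), hops)
               for prefix, hops in routing_table
               if dst in ipaddress.ip_network(prefix)]
    prefix, hops = max(matches, key=lambda m: m[0].prefixlen)  # most specific wins
    return hops[hash(flow_id) % len(hops)]                     # hash picks among equals

print(lookup("10.1.1.20", "user-a->10.1.1.20"))  # DC1-WAN or DC2-WAN: roughly 50/50
print(lookup("10.1.1.50", "user-a->10.1.1.50"))  # always DC2-WAN, thanks to the /32
```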

If the user happens to be “closer” to DC2 than to DC1, then it is most likely that the user’s traffic will ingress via DC2. However, “closer” is not about physical proximity but about network path cost and other variables. Also, the network is not a static entity; changes happen frequently enough that they may affect the “closeness” of the user to the DC/VM.

Why am I telling you all this? Because recently I got into a lively conversation while discussing x-vCenter NSX. x-vCenter NSX allows layer 2 to be stretched across multiple DCs while providing gateway/FHR (First Hop Router/Routing) services. There is nothing in NSX that can force the user’s ingress traffic in via the DC where the destination VM is. If anyone ever tells you otherwise, whatever solution they provide is not unique to NSX but rather a general networking trick.

So what are those networking tricks? Here are some of them (not an all-inclusive list), with their potential impacts:

Active/Passive Ingress – Allow the layer 2 to be stretched across both DCs, but advertise the subnet out of only one of the two DCs. If this feels like cheating, it is because it is cheating. You only solve the ingress problem for some of the VMs and not the others. You also don’t really have BC here because, if the “Active” DC goes down, some intervention will be required to advertise the subnet out of the “Passive” DC; there will be an outage for the application.

Active/"Active" Ingress – Here you advertise the subnet out of both DCs, but you make one DC look “really farther away” than the other by manipulating the cost of the subnet in the routing protocol (like BGP AS pre-pending). You would have BC since network failover is automated, but again there is cheating here because you are (mostly) solving the problem for some of the VMs and not the others. Also you could have users that are “so close” to the "backup" Active DC that no feasible amount of cost manipulation would affect them.

Advertise Host Routes – There is nothing that prevents turning a VM’s IP into a /32 subnet and injecting it into the routing process. You can achieve this by adding a static route for each VM IP (/32) in the presentation layer and redistributing the routes into the routing process. Since you can’t get a subnet more specific than a /32, there would never be a router (outside the DCs) with two equal-cost paths to the /32 pointing to different DCs. You truly get ingress traffic into the DC where the destination VM is. But before I continue explaining this one, let me just note that the burning sensation you are feeling right now on the back of your neck is the Operations Manager giving you the evil look. With this solution you SUBSTANTIALLY increase the size of the routing table and the complexity of the network. And this solution breaks down when a VM changes DCs, as there is no automated way to update where the /32 is being injected into the routing table.

Cisco LISP – To wrap it up, it is worth mentioning Cisco LISP (Locator/ID Separation Protocol). LISP attempts to solve the ingress situation by leveraging the /32 trick while restricting where the /32s are sent. The idea is to create a network “bubble” around the DCs and place LISP routers at the edge of the bubble. All users must reside outside of the bubble, so all ingress traffic goes through the LISP routers. The LISP routers in turn communicate with the FHRs for the subnet in question (the stretched layer 2 in both DCs) to find out where each VM (IP) resides. When user traffic reaches a LISP router, the router looks up where the destination IP is located and forwards the traffic to the right FHR (via a tunnel). If a VM moves DCs, the FHRs update the LISP routers with the new VM (IP) location. The problem with this solution is the bubble: where do you place the LISP routers, and what do you do in a brownfield deployment? It can get expensive and very complicated to achieve.
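To show just the map-and-encap idea behind it (a toy model; none of this is the actual LISP protocol or its wire format): the edge router looks up which DC currently hosts the destination IP and tunnels the packet to that DC’s router, and the mapping is updated when the VM moves.

```python
# Toy map-and-encap model: the bubble-edge router maps a destination IP (EID)
# to the DC router (RLOC) that currently hosts it and tunnels the packet there.
eid_to_rloc = {
    "10.1.1.50": "DC2-edge",   # VM currently lives in DC2
    "10.1.1.20": "DC1-edge",
}

def ingress_forward(dst_ip):
    rloc = eid_to_rloc.get(dst_ip)
    return f"encap to {rloc}" if rloc else "forward natively"

print(ingress_forward("10.1.1.50"))     # encap to DC2-edge

# When the VM moves to DC1, the mapping is updated and new flows follow it.
eid_to_rloc["10.1.1.50"] = "DC1-edge"
print(ingress_forward("10.1.1.50"))     # encap to DC1-edge
```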

Elver’s Opinion: As developers continue to better understand the impact of their design decisions on infrastructure (DevOps), they are building applications that work within the constraints of infrastructure protocols (Cloud Native Apps). So the need to stretch layer 2 across DCs is becoming less and less of an infrastructure technical requirement.

Monday, November 7, 2016

vRNI - Initial Setup


We now have vRNI installed and ready to go. The first thing you probably want to do is change the default passwords. You can either set up LDAP/AD or create a local admin account to log in to the Platform. Either way, you want to avoid having to use the default admin@local account.

To set up AD or create a local user account, scoot over to the top right of the screen, click the cog and choose Settings (where we will spend most of our time in this post). I didn’t get around to setting up my LDAP server, so I’ll be skipping that part (you can always google how to configure an LDAP server if you don’t already know). So I just created a new user (under User Management), elver@piratas.caribe, and gave it the administrator role (the user must be in the form of an email address). I then logged off admin@local and logged back in with the new user. Returning to User Management, you now have the option of deleting the admin@local account.


Elver’s Opinion: You also want to change the CLI user password but I couldn’t figure out how to do it. I reached out to some folks at VMware and will put an update here once I hear back from them.

Next you want to add some Data Sources. vRNI’s whole reason for existence is to gather data from different Data Center infrastructure entities, such as vCenter, NSX Manager (the main vRNI selling point), physical servers and network devices (another vRNI selling point), and do some wizardry on that data. Collectively these are referred to as Data Sources. Two Data Sources you really want to add are vCenter and NSX Manager. There does not seem to be a limit on how many of each you can add; however, every NSX Manager must be linked to an existing vCenter Data Source (so vCenter must always be added first).

When adding a Data Source you select the type of Data Source you want and then populate the required fields. For vCenter, you must provide:

  • The vRNI Proxy to use (if the Platform has two or more Proxies associated with it. More on that in a future post)
  • The IP or FQDN of vCenter
  • Admin credentials for vCenter


Once vRNI validates that it can authenticate with vCenter, you have the option to enable IPFIX (or NetFlow, if you prefer Cisco’s terminology) on any vDS that exists in vCenter. If you do enable IPFIX on the vDS, you will also have the option to enable it per dvPortgroup. Then give your vCenter a vRNI nickname and save it (Submit). By the way, enabling IPFIX will cause vRNI to configure IPFIX for you on the vDS using the Proxy’s IP as the collector. If your Proxy is behind a NAT, you will need to go to vCenter and manually edit the collector’s IP to the NATted IP, AND punch a hole in the NAT router to allow IPFIX traffic to get back to the Proxy (UDP, default port 2055).
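If you want a quick sanity check that the NAT hole is actually open, a throwaway script like the one below (plain Python sockets, nothing vRNI-specific; the collector IP is a placeholder) can fire a test datagram at the NATted address on UDP 2055. Run a listener such as tcpdump on the Proxy to confirm it arrives, since UDP itself won’t tell you.

```python
# Quick-and-dirty path check (not a vRNI tool): send a test UDP datagram to the
# NATted collector address on the IPFIX port from a machine on the ESXi side of
# the NAT, then verify on the Proxy (e.g. with tcpdump) that it actually arrived.
import socket

COLLECTOR_IP = "203.0.113.10"   # placeholder: the NATted IP configured as the vDS collector
IPFIX_PORT = 2055               # default IPFIX/NetFlow port used by the vRNI Proxy

with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as s:
    s.sendto(b"ipfix-path-test", (COLLECTOR_IP, IPFIX_PORT))
    print(f"test datagram sent to {COLLECTOR_IP}:{IPFIX_PORT}")
```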

Elver’s Opinion: Be careful with enabling IPFIX/NetFlow in a production environment, as it may tax the ESXi hosts. Only enable it if there is business value in doing so AND your ESXi hosts are not already strained by their production workloads.

The steps to add NSX Manager are similar to vCenter’s, but you need to select the vCenter that is associated with the NSX Manager (otherwise how would vRNI correlate NSX Manager’s data with vCenter’s?). In addition, you can have vRNI connect to the NSX Controllers to collect control plane data, and to the NSX Edges (directly via SSH to the NSX Edges, or via NSX Manager’s central CLI, which requires NSX 6.2).

Elver’s Opinion: I added a Brocade VDX as a source but I couldn’t get SNMP collection to work. Seriously, it is SNMP; it should just work. I’ll keep trying and put something up in a future post if I’m successful. I’m also going to add my UCS once I get my mobile AC up and running in the server room.

And speaking of data, what exactly is vRNI collecting from vCenter? For starters, it collects a list of all VMs under vCenter’s management, as well as compute, storage, VM placement (what host/cluster the VM is on) and network information (basically the same info you get from vCenter’s native monitoring views). From NSX Manager, it collects info such as which Logical Switches the VMs connect to and what their default gateway is (this is where the NSX Manager to vCenter correlation comes in).


Now, the last paragraph is no reason to go buy vRNI. Hell, there are a million and one tools/collectors that can do this, many of them free or low cost. However, what vRNI can do (enter the Platform) is correlate all the data and events collected from all the sources, something that in the past would take an operations team hours to do (which is why the Platform appliance has such a BIG CPU/memory footprint). It has built-in modules that can link vCenter and NSX data, and it presents nice pictures and charts to help identify problems in the environment (in particular, the network infrastructure). This is a time saver (and, for a business, higher uptime with less negative reputational/financial impact).

I’ll see about writing the next post on how to use some of the operations and troubleshooting goodies of vRNI. I can’t promise when I will get around to it, but I do promise that I will.

Elver's Opinion: Do you see the Topology chart in the last picture? I don't like it. It is a poor attempt to put unrelated information (storage, network, hosts, etc.) about the VM into one picture. Luckily, you can drag charts around and move them somewhere where they bother you less.