Thursday, April 21, 2016

Software iSCSI and LACP - They Like Each Other

Every now and again I end up recommending LACP for a Software iSCSI deployment in a vSphere environment. And every now and then I get pushback because VMware recommends that LACP not be used, ahem, discourages the use of LACP to carry iSCSI traffic. And every time I have to explain that we need to read the fine print before we take a vendor’s recommendation against the use of a widely used protocol and just run with it.

Everything else being equal, Software iSCSI with MPIO will load share more efficiently among two or more links between the initiator and a physical switch than LACP between the vDS and the physical switch will. LACP uses at most the VLAN and frame information (L2/L3/L4) to define a “flow”, and pins that “flow” to one of the links in the Port Channel. MPIO uses application (iSCSI) information, such as connection IDs, to determine the link for the egress iSCSI traffic, which gives a more granular definition of a “flow” than LACP’s. MPIO then pins that more narrowly defined (is that even a word?) “flow” to one of the available links.
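To make the “narrower flow” point concrete, here is a toy Python model. The hash functions are illustrative only (real switch hash policies and MPIO path selection are vendor-specific); the point is that a single initiator/target pair collapses to one L3/L4 flow under LACP, while MPIO can spread several iSCSI connections:

```python
import hashlib

def lacp_link(src_ip, dst_ip, src_port, dst_port, num_links):
    """LACP-style pinning: hash the L3/L4 header fields and pin the
    whole flow to one member link of the Port Channel."""
    key = f"{src_ip}-{dst_ip}-{src_port}-{dst_port}".encode()
    return int(hashlib.md5(key).hexdigest(), 16) % num_links

def mpio_link(iscsi_connection_id, num_links):
    """MPIO-style distribution: the initiator spreads individual iSCSI
    connections (a narrower 'flow') across the available links."""
    return iscsi_connection_id % num_links

# A single initiator/target session: LACP sees one flow, hence one link...
lacp = {lacp_link("10.0.0.10", "10.0.0.20", 51000, 3260, 2)}
# ...while MPIO spreads four iSCSI connections over both links.
mpio = {mpio_link(cid, 2) for cid in range(4)}
print(len(lacp), len(mpio))  # 1 2
```

The exact link each flow lands on doesn’t matter; what matters is how many links end up carrying traffic.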

But how do we know when "Everything else" is not "equal"? To help decide, we should gather additional information on the following:

Can LACP even be configured?
If no, carry on; nothing to see here. Chassis-based servers (server blades) don’t support LACP to the blades. If yes, then read on.

What type of performance does the initiator require?
If you have a situation where the initiator needs higher throughput capacity than what is available over a single link, you probably want to go with MPIO, as it will load share the iSCSI traffic more efficiently than LACP. This will reduce the risk of sending traffic over an over-subscribed link and having the traffic dropped.

How fast must failover take place in case of link failure?
With MPIO, the initiator will fail over the affected egress iSCSI traffic immediately upon detecting the link down. However, the physical switch will not fail over the traffic until it re-learns the initiator’s iSCSI MAC address from the failed link over one of the remaining links. Thus the physical switch has a direct dependency on the initiator. Contrast that with LACP, where both the initiator and the physical switch will do the failover immediately and independently of each other.

Elver’s Opinion: You should have some sort of business guidance on how many link failures should be tolerated in your uplinks (between the initiator and the first Physical switch). It is not unusual for a 2:1 rule to be applied; have twice the number of links you need so you can tolerate half the links failing.

Does the initiator have sufficient NICs to dedicate to iSCSI?
MPIO is like a 2-year old (and some adults I know): it does not share with others. Thus to use MPIO, you must have dedicated NICs for it. All other Ethernet traffic must use other NICs. If you don’t have sufficient NICs, you must share and should use LACP. If you want to add the additional NICs, you need to analyze the cost, and the level-of-effort to do so, vs. the reward. From experience, odds would tend to favor not getting the extra NICs.

Elver’s Opinion: LACP is not really required here but if you don’t configure it, the initiator would have less hashing options for load sharing all Ethernet traffic (iSCSI and non-iSCSI) among the available uplinks, and the physical switch will not do ANY load sharing of its own.

Does the Physical switch have sufficient ports to dedicate to the iSCSI?
If you need to add port capacity in the switch, you might need to weigh the cost and level-of-effort required to add that capacity. Many times, it won’t be as simple as just replacing the switch or adding a new one.

How many Arrays (IPs) would the initiator talk to?
The more Arrays the initiator has to communicate with, the closer LACP gets to MPIO in load sharing performance. If you have a single Array (IP), LACP will see a single flow between the initiator and the target (unless the session drops and gets reestablished with a different TCP source port). LACP can be configured to use the destination IP as the load sharing hash, and the more destination IPs there are, the better the distribution over the uplinks.

Elver’s Opinion: Note of caution here. For the hash to do the best load sharing job it can, the IPs must have some sort of variance in the last octet, which is related to the number of active links in the Port Channel. If there are only two links in the Port Channel, you should try to get a similar number of even-numbered and odd-numbered Array IPs.
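A toy model of a destination-IP hash shows why the even/odd mix matters with two links (real vendor hashes differ, but many fold in the low-order bits of the address):

```python
def dst_ip_hash_link(dst_ip, num_links):
    # Toy hash: use the last octet of the destination IP to pick a
    # member link. Vendor implementations vary; this is illustrative.
    last_octet = int(dst_ip.rsplit(".", 1)[1])
    return last_octet % num_links

# Two even-numbered Array IPs: both pin to the same link of a 2-link channel.
bad = {dst_ip_hash_link(ip, 2) for ip in ["192.168.1.10", "192.168.1.12"]}
# One even and one odd Array IP: traffic spreads across both links.
good = {dst_ip_hash_link(ip, 2) for ip in ["192.168.1.10", "192.168.1.11"]}
print(len(bad), len(good))  # 1 2
```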

It looks like a tie, so which should we choose?

Ask Operations. They are the ones that will wake up at 2am to fix problems. The DC trend has been to consolidate as much as possible to maximize the use of physical resources. However, not all Enterprises have had their Operations teams update processes and knowledge transfer to take on a change in direction. Choosing the "wrong" one here may increase OpEx to the business.

Wednesday, April 20, 2016

A vCenter, NSX Manager, Multiple DCs...and I Can't Reach Them

Note: I wrote this post in somewhat of a rush and didn’t have time to do diagrams. I have had a bit of a hiatus and wanted to add something to the blog. (Update: the diagrams have been added.) Also, a question asked by a good friend inspired this post.

You conceded the point and your team will be using a single vCenter to manage multiple physical Data Centers. All right, not the end of the world, you’ll be fine. But a few developers are requiring Layer 2 across some of those Data Centers. Again, not the end of the world; besides, that’s what NSX is for. However, do you understand what the impact on the Virtual Workloads in the stretched Layer 2 (VXLAN) would be if one of those physical Data Centers loses network connection to the Management Plane (vCenter, NSX Manager, etc.) and the Control Plane (NSX Controllers, Logical Router Control VMs, etc.)? To keep the topic to the network impact, we will assume that Virtual Workloads are using Storage local to their Data Center.

Figure 1: DC isolation

Elver’s Opinion: Since we have Logical Switches distributed among multiple physical Data Centers, I’ll make the very safe assumption that you won’t be doing Layer 2 Bridging. If you are trying to do Layer 2 Bridging, call me so I can talk you out of it.

To get the obvious out of the way, if you don’t have the Management Plane, you can forget about vMotion, Storage vMotion and any NSX Configuration changes.

Let’s tackle the slightly less obvious within Layer 2. Virtual Workloads within the impacted Data Centers, and in the same Logical Switch, will be able to talk to each other via VXLAN (Overlay), as well as to other Virtual Workloads running in VTEPs in other Data Centers they can still reach. NSX is built such that the VTEPs (ESXi hosts) will continue to communicate with each other even when the NSX Controllers are not reachable. There will be an uptick of Broadcasts (specifically ARP Requests) and Unknown Unicast traffic being replicated by the VTEPs, but the uptick shouldn’t have much of an impact. At the Control Plane, assuming the NSX Controllers are still operational, they will remove the “isolated” VTEPs, and their associated entries, from all their tables (Connection, VTEP, MAC, ARP) and tell the “reachable” VTEPs to remove the “isolated” VTEPs from their VTEP Tables.
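The controller-side cleanup can be sketched as a toy model (the table layout here is purely illustrative, not NSX’s actual data structures):

```python
# Hypothetical controller state: per-VTEP entries across the tables.
vtep_table = {
    "vtep-dc1": {"mac": ["aa:bb:cc:00:00:01"], "arp": ["10.1.1.1"]},
    "vtep-dc2": {"mac": ["aa:bb:cc:00:00:02"], "arp": ["10.1.1.2"]},
    "vtep-dc3": {"mac": ["aa:bb:cc:00:00:03"], "arp": ["10.1.1.3"]},
}

def prune_isolated(tables, isolated):
    """Drop the isolated VTEPs and their associated entries; the
    controller then pushes the surviving VTEP list to the reachable
    hosts so they update their own VTEP Tables."""
    for vtep in isolated:
        tables.pop(vtep, None)
    return sorted(tables)  # the VTEP list pushed to the survivors

print(prune_isolated(vtep_table, ["vtep-dc2"]))
# ['vtep-dc1', 'vtep-dc3']
```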

Figure 2: Inter-Logical Switch

If two Virtual Workloads within the impacted Data Centers are in different Logical Switches (VXLANs) and those Logical Switches connect to the same Logical Router, the Virtual Workloads will be able to talk to each other; from the Logical Router’s perspective both subnets are directly connected.

Figure 3: Intra-Logical Switch - Same Logical Router

The not so obvious (because of the dependencies involved) is the impact on Layer 3 traffic that does not stay confined within the same Logical Router. The impact can be narrowed down to two types of traffic flows: one where the Source and Destination Workloads are hanging off different Logical Routers as their default gateways, and the other where the second Workload is not connected to a Logical Switch (think a Physical Workload or a Virtual Machine in a VLAN):

Elver’s Opinion: Type 2 flows are basically Layer 3 traffic between Virtual and Physical networks.

Type 1:
If the Source and Destination Workloads are hanging off different Logical Routers, then you need an NSX Edge or another NFV Appliance to do routing (two Logical Routers can’t connect to the same Logical Switch nor to the same dvPortgroup). Is this Appliance within the impacted Data Centers? If not, the two Workloads won’t be able to talk to each other because there would be no logical path for the flow to reach the Appliance so that it can do the routing (remember the impacted Data Centers have some sort of “isolation”).

Figure 4: Intra-Logical Switch - Different Logical Routers

If the Appliance is within the impacted Data Center, then the two Workloads may reach each other. I say may because it all depends on whether there is a routing protocol between the Logical Routers and the Appliance. If you are using static routes, then yes, the two Workloads can talk to each other. But if you are running a routing protocol, can the Logical Routers’ Control VMs reach the Appliance to exchange routing control traffic? If the Appliance lost connection to one or both of the Logical Routers’ Control VMs, then the Appliance will remove the routes to those Workloads’ subnets from its routing table, making those subnets unreachable from the Appliance.
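The route-withdrawal behavior can be modeled in a few lines (the names are hypothetical, and "ospf" stands in for whichever routing protocol is in use; static routes survive regardless of neighbor reachability):

```python
def routes_after_isolation(routing_table, neighbors_reachable):
    """If the protocol neighbor that advertised a route (here, a Logical
    Router's Control VM) is unreachable, the dynamic route is withdrawn;
    static routes stay in the table regardless."""
    return {
        prefix: r for prefix, r in routing_table.items()
        if r["type"] == "static" or r["neighbor"] in neighbors_reachable
    }

table = {
    "10.10.1.0/24": {"type": "ospf", "neighbor": "dlr1-control-vm"},
    "10.10.2.0/24": {"type": "ospf", "neighbor": "dlr2-control-vm"},
    "10.10.3.0/24": {"type": "static", "neighbor": None},
}
# dlr2's Control VM is cut off: its subnet disappears from the Appliance.
print(sorted(routes_after_isolation(table, {"dlr1-control-vm"})))
# ['10.10.1.0/24', '10.10.3.0/24']
```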

While still on Type 1 flows, it is worth pointing out that if two Workloads in the impacted Data Centers, but behind different Logical Routers, can talk to each other, then Virtual Workloads in non-impacted Data Centers behind those same Logical Routers will NOT be able to communicate, because they won’t be able to reach the NSX Edge or NFV Appliance.

Type 2:
If the second Workload is not connected to a Logical Switch (a Physical Workload or a VM in a VLAN), we definitely need a Perimeter Edge or an NFV Appliance with a connection to a VLAN dvPortgroup. We will assume that we are running a routing protocol. This is similar to the Layer 3 Type 1 flow, but with a few variants.

Figure 5: Virtual to Physical

Variant 1: The Appliance can reach the Logical Router Control VM AND the second Workload is in one of the impacted Data Centers. In this instance communication between the Workloads will happen. However a Virtual Workload hanging off the same Logical Router but not within the impacted Data Centers will NOT be able to talk to the second Workload because the Appliance wouldn’t be reachable to it.

Variant 2: The Appliance can reach the Logical Router Control VM AND the second Workload is not in one of the impacted Data Centers. In this case the two won’t be able to communicate; remember the impacted Data Centers are “isolated” thus no traffic can come in or go out.

Variant 3: The Appliance can’t reach the Logical Router Control VM. It doesn’t matter where the second Workload is, because the Appliance will remove the route to the Virtual Workload’s network, making that subnet unreachable from the Appliance. Thus the Workloads won’t be able to talk to each other. However, if you are using static routes, refer back to Variants 1 and 2.

Variant 4: The Appliance is not in the impacted Data Centers. In this case there is no way for the Logical Router to reach the Appliance, thus the Workloads won’t be able to talk to each other.

To wrap it up, please note that if the Perimeter Edge or NFV Appliance is located in one of the impacted Data Centers, no Virtual to Physical network traffic in the non-impacted Data Centers will be possible. 

Saturday, February 6, 2016

10 Network Experts in a Room, 11 Opinions on SDN

I’m shocked, shocked I tell you (I’m not actually). In a room full of Network Experts (most of them bloggers, some of whom I know and respect), there was disagreement on what Software Defined Networking (SDN) means; I recommend you watch the video. I landed on the video via Ivan Pepelnjak, who referenced John Herbert, who linked to the video. I have written a piece before about what SDN is and is not (consider it the twelfth opinion, if you have not read it yet).

The participants in the room did not seem to fully understand the forces pushing for SDN in the Data Center, as their opinions were very Network centric. So why SDN? The primary driver behind SDN is that traditional Networks are the last component of the Data Center Infrastructure (Compute, Network and Storage) that continues to hinder Application and Business Services evolution. All Data Center workloads will be virtualized, and there is no way to get an efficient Cloud Infrastructure deployed, to support the Virtualized Workloads, with a Traditional Network. And good luck with having a zero-touch Disaster Recovery Plan with just a Traditional Network.

Many people think of the Cloud as a way to run your Virtualized Workloads in someone else’s servers and disks. Although there is some truth to that (and leaving the distinction between Private and Public Clouds aside), it is my opinion we should look at the Cloud from the perspective of the Clients (both Public and Private sectors, henceforth Business). The Client’s job boils down to either 1) growing the metric by which the Business measures success (such as revenue or service delivery) or 2) increasing Business efficiency (such as reducing expenses or providing more services with the same or less). It is the same job the Client has been doing since, well, always. The Cloud is a Client tool that facilitates the successful execution of the Client’s job by reducing the perceived technology complexity of the IT Infrastructure while improving its efficiency and minimizing time-to-market of Business Services.

You know what has little perceived technology complexity in the Data Center? Compute and Storage (Compute more so than Storage, but the gap is rapidly narrowing). You know what is really efficient in the Data Center today? Compute and Storage. And you know what offers minimal time-to-market of Business Services? You are correct! Compute and Storage.

SDN allows the removal of the perceived Traditional Network complexity by facilitating the deployment of networks that do not have some of the inherent limitations of Traditional Networks. For example, there is no way to get a Layer 2 loop in an SDN solution, thus you won’t have to worry about configuring STP within the SDN. Have you heard of an OSPF Virtual Link? You will never see one in an SDN, because the idea of a broken OSPF Backbone just doesn’t exist.

Since I mentioned STP, let me use it to also point out that in an SDN without STP you have much more efficient utilization of your links. Better yet, if you have Virtual Workloads in the same Hypervisor, traffic between those Virtual Workloads will get all of its Network Services without leaving the Hypervisor. It is hard to see how you get more Network-efficient than that. If the workloads are physical and they connect to the same SDN Top of Rack (ToR) switch, then the traffic between those workloads gets all its Network Services in the same ToR.

As for time-to-market, all I need to say is this: deploy 50 identical networks consisting of 20 routers and 87 Layer 2 domains each, and allow internal traffic to reach the same external entity. GO. That’s right, it would take you months to accomplish this task with a Traditional Network. With SDN, minutes. Seriously, minutes.
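The reason "minutes" is plausible is that under SDN the network is just data: 50 tenants is a loop. A hedged sketch against an imaginary controller API (all names here are made up for illustration; a real deployment would call the controller’s REST API):

```python
# Hypothetical declarative spec, expanded per tenant. Illustrative only.
def build_tenant_network(tenant_id, num_routers=20, num_l2_domains=87):
    routers = [f"tenant{tenant_id}-rtr{i}" for i in range(num_routers)]
    segments = [f"tenant{tenant_id}-seg{i}" for i in range(num_l2_domains)]
    # One uplink per tenant toward the shared external entity.
    return {"routers": routers, "segments": segments,
            "uplink": f"tenant{tenant_id}-to-external"}

networks = [build_tenant_network(t) for t in range(50)]
print(len(networks), len(networks[0]["routers"]), len(networks[0]["segments"]))
# 50 20 87
```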

Disaster Recovery
Many people still think of a DR (not to be confused with the Dominican Republic) event as a sinkhole swallowing the primary Data Center (or if the Data Center is in Florida, we add to the list a hurricane dropping your Data Center in the sea). Thinking about it this way omits the Client’s perspective. To the Client, a DR event may just be not having a Business Critical Service available. The reason for that Business Critical Service not being available could be as simple as a Tier in the Application becoming unreachable due to the Tier's default gateway having a tantrum.

TCP/IP was developed with the assumption that a subnet resides in a single physical location. If you want the subnet to be reachable via two or more physical locations, about the only tool at your disposal is to stretch the Layer 2 domain where the subnet resides (I will not discuss the alternative of using BGP to advertise the subnet from multiple locations, as the point below would be the same).

Traditional Networks do not allow you to automatically migrate subnets. So when a workload needs to be moved to, or recovered at, a new location (such as the Backup Data Center), the RTO almost always includes the involvement of a Network Administrator; which involves a Change Control; which potentially leads to some red tape; which adds more time to the RTO. SDN does not natively understand what a location is, and therefore can facilitate automatic recovery from about any DR situation the business may encounter.

For example, a hypervisor’s number came up, taking down with it all the Business Critical Virtual Workloads (bad design by the way, but it happens). The Virtualization Plane would then recover all those Virtual Workloads in new hypervisors. By the time those Virtual Workloads are fully powered up, the SDN solution would have updated all necessary components in the hypervisors with the necessary Network information to provide the Network Services to those Virtual Workloads. SDN doesn’t care where those hypervisors are located, and not a single Network Administrator’s involvement was required (nor change controls). That translates to a potentially much lower RTO.

Circling back to the default gateway having a tantrum, let's assume a design where not all members of the same Tier run in the same hypervisor/off the same ToR (you shouldn't place all your eggs in one basket). SDN has a Data Plane distributed among all hypervisors/ToRs, so a default gateway having a tantrum will only affect the members of the Tier that are running in the hypervisor/off the ToR where the faulty default gateway resides. All other Tier members will continue to receive normal Network Services, and the Client never even experiences a DR event.

Another benefit of SDN, as it relates to DR, is that SDN allows you to rethink the architecture to support DR events. Since SDN does not have native location awareness, there is no such thing as Primary and Backup Data Centers. They are just Data Centers in Active/Active configurations. Heck, you can extend this idea to multiple Data Centers and have them all be Active. This can potentially translate to smaller Data Center footprints (square meters/feet) with increased geographical resiliency.

Elver’s Opinion: We can't answer where we are going unless we understand where we are coming from. And to understand that we need feedback from others outside of our field. I feel the conversation among the smart people in the video about what SDN is could’ve benefited from having more Virtualization and Storage Experts (and some Application Developers too) in the room.

Tuesday, February 2, 2016

Virtual SAN Node Communication

Our Network Security Administrator gave us some guidance on how to architect the Physical Infrastructure to provide Network Security for Virtual SAN. What she has not yet clarified for us is which Virtual SAN node talks to which Virtual SAN node, and over which IPs, Multicast Groups and ports.

Each Virtual SAN node has one of three roles: Master, Backup and Agent. Among the Master’s responsibilities is disseminating Virtual SAN Metadata (the Virtual SAN Datastore). The Backup keeps an eye on the Master and is ready to take over if the Master becomes unavailable. Every other Virtual SAN node in the same Virtual SAN cluster is an Agent.

So which Virtual SAN nodes chat with other Virtual SAN nodes? VMware has provided some of that information, but it is not granular enough to deploy an effective Network Security policy. Thus, without further ado, here is what the Network Security Administrator discovered while doing some packet captures of Virtual SAN traffic with no powered on Virtual Machines (think of this as Virtual SAN Control Plane traffic).

The Master communicates with every other member of the same Virtual SAN cluster. One of the communications takes place over Unicast. The Master will send Unicast traffic to all Agents and the Backup over TCP port 2233. The interesting part here is that the Master has two Unicast flows going with each other node. One flow is initiated by the Master using the destination TCP port 2233. Other nodes, also using the destination TCP port 2233, initiate the other Unicast flow towards the Master.

Below is some output of two Unicast flows between the same two nodes. Both egress packets were captured in the Master.

The Master also communicates with everyone (Agents and Backup) using the Agent Group Multicast Address with a destination UDP port of 23451. This communication is used to update Metadata with all Virtual SAN cluster members. The default Agent Group Multicast Group is 224.2.3.4. Both the Agent Group MG and the destination UDP port can be changed via the CLI.

The Master listens to both the Agent Group MG and the Master Group MG.

The majority of the Backup's Unicast communication takes place with the Master, although some infrequent Unicast traffic can flow between the Backup and some Agents. The Backup communicates with the Master using the same dual Unicast flows on TCP port 2233.

The Backup also communicates with everyone (Agents and Master) using the Agent Group Multicast Address with a destination UDP port of 23451. Minor detail: the Backup is also kind of an Agent.

The Backup also communicates with the Master using the Master Group Multicast Address with a destination UDP port of 12345. The default Master Group Multicast Group is 224.1.2.3. Both the Master Group MG and the destination UDP port can be changed via the CLI.

The Backup listens to both the Agent Group MG and the Master Group MG.

The Agents communicate via Unicast with both the Master and the Backup, although the communication with the Backup is infrequent.

The Agents will send updates to the Agent Group Multicast Address with a destination port of 23451.

The Agents will listen to the Agent Group MG.

Powered On Virtual Machines
When Virtual Machines are powered on in the Virtual SAN Datastore (think Virtual SAN Data Plane), the above traffic flows continue (the Control Plane doesn't stop). In addition, all members of the Virtual SAN cluster that have powered on Virtual Machines in the Virtual SAN Datastore will send Unicast packets, on source TCP port 2233, to all other nodes that have a copy of the Virtual Machines' VMDKs.

Thus from a security perspective, every Virtual SAN node in the same Virtual SAN cluster is expected, at some point, to communicate via Unicast on TCP port 2233 with all other Virtual SAN nodes.

Elver's Opinion: The biggest security risk with Virtual SAN is unauthorized access to the Virtual SAN Datastore. Since access to the Virtual SAN Datastore takes place over TCP 2233, blocking that port from unauthorized end-points will substantially help reduce the risk of a breach. At a minimum, 1) the Virtual SAN Layer 2 domains should have direct connectivity locked down to only the Virtual SAN VMkernel ports of the Virtual SAN nodes in the same Virtual SAN cluster, and 2) an Access List restricting flows over TCP 2233 should be placed in the Virtual SAN segments' default gateways.
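As a sanity check, the TCP 2233 portion of that recommendation can be expressed as a small predicate. The subnets below are made-up examples, and this covers only the unicast rule (the UDP multicast ports would need their own entries):

```python
import ipaddress

# Example Virtual SAN VMkernel subnets -- substitute your own.
VSAN_SUBNETS = [ipaddress.ip_network("192.168.50.0/24"),
                ipaddress.ip_network("192.168.51.0/24")]

def permit_vsan_flow(src_ip, dst_ip, proto, dst_port):
    """ACL sketch: traffic destined to a Virtual SAN subnet is allowed
    only if it is sourced from a Virtual SAN subnet on TCP 2233."""
    src_in = any(ipaddress.ip_address(src_ip) in n for n in VSAN_SUBNETS)
    dst_in = any(ipaddress.ip_address(dst_ip) in n for n in VSAN_SUBNETS)
    if not dst_in:
        return True          # not destined to Virtual SAN; not this ACL's job
    return src_in and proto == "tcp" and dst_port == 2233

print(permit_vsan_flow("192.168.50.11", "192.168.51.12", "tcp", 2233))  # True
print(permit_vsan_flow("10.9.9.9", "192.168.50.11", "tcp", 2233))       # False
```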

Sunday, January 31, 2016

Virtual SAN Network Security

In our last episode we left our Security Administrator simmering in the fact that Virtual SAN is like many other Infrastructure-Consumer Solutions that rely on the Network Infrastructure to provide Security. Luckily she has seen this movie before. Matter of fact, it used to be the norm that the Physical Network would provide almost all of the Security for Applications. Thus there are well known ways to provide Network Security for Virtual SAN.

As a reference note, it is hard to separate the roles/services of Networking and Security when talking about the physical IP infrastructure. For shortness (and convenience) the physical IP infrastructure is referred to as the Physical Network, and the Physical Network provides both roles/services to Infrastructure-Consumer Solutions. As a second reference note, anything that uses the Network's Data Plane is considered an Application. So Virtual SAN is an Application from the Physical Network's perspective...and so is iSCSI, NFS, FCoE...

...and now back to our regular programming...

Before you can provide any type of Network Security, you need to enlist the help of a Network Administrator and understand the Application's behavior.

In this instance our Security Administrator is already well-versed in the matters of Networking (yes, she is that talented and for that her job title shall henceforth be Network Security Administrator), so we are good there. As for our Application, Virtual SAN, we know the following:

1) It uses a combination of IP Unicast and IP Multicast to communicate.
2) The Application works across Layer 3 boundaries.
3) The Application end points (the Virtual SAN nodes) are known.
4) The Application uses known port numbers.

So using this information, below is the Network Security Administrator's recommendation for providing Network Security for Virtual SAN with the goal of restricting unauthorized access to the Virtual SAN Datastore.

There are two strategic concepts for executing Network Security over Layer 3: route obscurity and end-point access restrictions. Route obscurity is achieved by limiting the number of routers that know of an end-point's subnet. If a router has no path to a destination, the packets for that destination will be dropped. End-point access restriction is achieved with your old fashioned Access List (or one of its few derivatives). Identify which end-points need to chat with each other, and on what ports, and only allow those conversations.
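Route obscurity in a few lines: a longest-prefix-match lookup where "no route" means "drop". The prefixes are example values only:

```python
import ipaddress

def next_hop(routing_table, dst_ip):
    """Longest-prefix match. With no matching route (and no default
    route), the packet is dropped -- which is exactly the behavior
    route obscurity relies on."""
    dst = ipaddress.ip_address(dst_ip)
    matches = [(net, hop) for net, hop in routing_table if dst in net]
    if not matches:
        return "drop"
    return max(matches, key=lambda m: m[0].prefixlen)[1]

# This router knows 10.0.0.0/8 but was never told about the VSAN subnet.
table = [(ipaddress.ip_network("10.0.0.0/8"), "core-rtr")]
print(next_hop(table, "10.1.2.3"))       # core-rtr
print(next_hop(table, "192.168.50.11"))  # drop (unadvertised subnet)
```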

Elver's Opinion: If you can achieve 100% route obscurity then you may not need to employ end-point access restrictions. However, it is VERY challenging to obtain 100% route obscurity...unless you don't need to talk to anyone outside of your subnet.

Layer 2 Only Deployment
Virtual SAN was originally designed to only work in Layer 2 (meaning, you couldn't get a VMware supported deployment of Virtual SAN over Layer 3 boundaries). Thus in this deployment all Virtual SAN nodes will have the Virtual SAN VMkernel interface in the same subnet/VLAN (which should not be VLAN 1). There is absolutely no need for the Virtual SAN nodes to communicate over the Virtual SAN VMkernel interface with an entity not in the Virtual SAN VLAN. Thus the only security considerations you need to cover for are these:

1) Ensure that only ESXi hosts that are part of the Virtual SAN cluster have access to the Virtual SAN VLAN.
2) Limit the diameter of the Virtual SAN VLAN to those Access switches that have Virtual SAN ESXi hosts connected and any Distribution switches providing a Data Path between the Access switches.

Route Obscurity
The Virtual SAN subnet does not need to be advertised to the rest of the Network. A subnet that is not advertised to the rest of the Network is called an isolated subnet. Isolated subnets are the easiest to protect. It is not possible to reach an end-point, via IP, in an isolated subnet if you are not already part of the isolated subnet.

End-Point Access Restriction
You should restrict connectivity to the Virtual SAN VLAN to only the ESXi hosts that are part of the Virtual SAN cluster. Since most Data Center switches will have their interfaces in a default VLAN different from the Virtual SAN VLAN, this becomes an Operations exercise with some regular audits.

If you want to get really protective about it, you could use MAC Access Lists as well. However, experience has demonstrated that managing MAC Access Lists is very cumbersome and is not worth the trade off when compared to the added level of security that you get in return.

You may also be tempted to create Community Private VLANs. Elver's Opinion: Don't do it. If you lock down connectivity to the Virtual SAN VLAN to only those ESXi hosts in the Virtual SAN cluster, Private VLANs will buy you nothing. Absolutely nothing...You would get more value out of pounding sand. Plus, you would be adding unnecessary complexity to the deployment.

Layer 3 Deployment
With Virtual SAN 6.0, you can deploy multiple ESXi hosts in the same Virtual SAN cluster in different Virtual SAN subnets. This is a bit more involved compared to a Layer 2 only deployment, however it is doable and manageable as long as some planning is put in place first.

Route Obscurity
Each Virtual SAN subnet, per Virtual SAN cluster, must be identified (you don't want to be making Virtual SAN related Network and Security changes every 3 months). VMware only supports the Virtual SAN VMkernel port in the Default TCP/IP stack, which is also used by the Management VMkernel interface.

To ensure egress traffic goes out the correct VMkernel interface, create static routes to the remote Virtual SAN subnets (in each Virtual SAN node) pointing to the local Virtual SAN subnet's default gateway. You should NOT place the Virtual SAN VMkernel port in the same subnet as the Management VMkernel port. Would it work if you placed them in the same subnet? Of course it would, much like your New Year's resolution: fun for a bit, but it gets old fast once the reality sinks in of ensuring Security is not breached between entities in the same subnet that shouldn't talk to each other. It is not worth it.
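A small generator for the per-node static routes just described (the subnets and gateway are placeholders; the esxcli route-add syntax shown is the standard form, but verify it against your ESXi release before use):

```python
def vsan_static_routes(local_subnet, local_gateway, all_vsan_subnets):
    """Each node needs a route to every *remote* Virtual SAN subnet,
    pointing at its own Virtual SAN subnet's default gateway."""
    return [f"esxcli network ip route ipv4 add -n {remote} -g {local_gateway}"
            for remote in all_vsan_subnets if remote != local_subnet]

cmds = vsan_static_routes("192.168.50.0/24", "192.168.50.1",
                          ["192.168.50.0/24", "192.168.51.0/24"])
print(cmds[0])
# esxcli network ip route ipv4 add -n 192.168.51.0/24 -g 192.168.50.1
```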

If you only have a single router (with multiple interfaces) acting as the default gateway for all Virtual SAN subnets, then you have nothing to advertise as the Virtual SAN subnets would be unreachable to anyone without a direct connection to one of the router subnets (similar to a Layer 2 Only Deployment).

However if two or more routers are the default gateways (not counting the VRRP scenario), then you have to figure out how to announce to all those routers (and the routers in between on a need to know basis) how to reach the other Virtual SAN subnets. Depending on the Physical Network layout, you may need to advertise the Virtual SAN subnets via a routing protocol. In such a case, consider using a routing protocol (such as BGP) that supports route filtering and use those filters to advertise the Virtual SAN subnets to only those routers that need them.

If you have a small number of routers, consider using static routes in the routers to reach the Virtual SAN subnets. With static routes you can manually control which routers have a Data Path to the Virtual SAN subnets.

Alternatively, if the routers support VRFs, create a Virtual SAN VRF. Today almost all routers (and Layer 3 switches) that are deployed in the Data Center support VRFs. Make the Virtual SAN default gateways CEs/PEs, and restrict the Virtual SAN VRF to the smallest feasible diameter within the confines of the Data Center.

End-Point Access Restriction
Since we are aware of the Virtual SAN subnets, it is relatively straightforward to provide some end-point security for the Virtual SAN nodes. In the Virtual SAN default gateways, put Access Lists permitting only traffic sourced from the Virtual SAN subnets, on the Virtual SAN port numbers, destined to other Virtual SAN subnets and the Virtual SAN Multicast Groups. Also, if using an RP (Rendezvous Point), consider restricting membership to only the Virtual SAN Multicast Groups.

If the Virtual SAN subnets must be advertised widely in the Network, create Access Lists in the entry points where the Data Center Firewalls are located. If you have the ability to do so, move these Firewalls as close as possible to the Virtual SAN subnets. No entity outside the Data Center should be allowed to reach the Virtual SAN subnets (on any port) nor be the source or listener of the Virtual SAN Multicast Groups.

Elver's Opinion: None of these Network Security suggestions would matter much if you don't have physical security for the Physical Network and the vSphere Infrastructure.


Friday, January 29, 2016

A Tale of a Solution And Three Points of View

What are a vSphere Administrator, a Storage Administrator and a Security Administrator doing having lunch together? They are embracing the new world order. That’s the Data Center world we live in now. There are great expectations that as we move towards Services-based infrastructure, the line between Data Center IT silos is blurring, forcing Administrators from “unrelated” fields to cooperate much more than they did before. Who knows; maybe not too far down the road there will only be two of them left.

Let’s talk about a neat (unsupported) feature of Virtual SAN. Virtual SAN relies on the IP Network to provide the transport between its clients (the Operating System, such as ESXi) and its servers (the Storage Container, such as the Virtual SAN hosts) to securely store Virtual Machines. The Architecture of Virtual SAN is such that the Configuration Plane resides with vCenter and the Control/Data Planes reside with the Virtual SAN nodes. Or is it? If you read about Virtual SAN in William Lam’s VirtualGhetto (or if you prefer your readings to be in Spanish, stop by Leandro Leonhardt’s page, BlogVMware), you will realize (as the vSphere Administrator has) that you can do significant Virtual SAN configurations in the ESXi hosts via the CLI. VMware only supports Virtual SAN for nodes that are members of the same vSphere cluster, but what got my attention was the fact that you can add an ESXi host to any Virtual SAN cluster. This would get the attention of the Storage Administrator and the Security Administrator.

So I set up a 3-node Virtual SAN cluster using vCenter (all in the same cluster, following VMware’s instructions), and SSHed into one of the nodes to get the Virtual SAN cluster UUID (using the command esxcli vsan cluster get; the field is called Sub-Cluster Master UUID).
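For reference, the lookup goes roughly like this (output abridged, and every UUID below is made up for illustration):

```shell
# On one of the existing Virtual SAN nodes
esxcli vsan cluster get
#    Cluster Information
#       Enabled: true
#       Local Node UUID: 5670f8ae-a1b2-c3d4-e5f6-001b21a9c4d0
#       Local Node State: MASTER
#       Sub-Cluster Master UUID: 5670f8ae-a1b2-c3d4-e5f6-001b21a9c4d0
#       Sub-Cluster UUID: 52f8c3b2-d621-98cd-2b28-c5f0e2f3b5c4
```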

In a fourth ESXi host in a different vSphere cluster (it doesn’t have to be in a cluster, or even the same vCenter, but I thought "why not? put it in a cluster anyway"), I created a VMkernel port using the vSphere Web Client and enabled it for Virtual SAN. I could’ve done this via the CLI but I was feeling lazy. I then SSHed into the fourth ESXi host and ran the join command:
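The join itself is a one-liner; as a hedged sketch (the UUID below is a made-up placeholder — use the Sub-Cluster UUID collected from the existing cluster):

```shell
# -u takes the Virtual SAN cluster UUID obtained with "esxcli vsan cluster get"
esxcli vsan cluster join -u 52f8c3b2-d621-98cd-2b28-c5f0e2f3b5c4
```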

I gave it a minute and sure enough the fourth host had joined the Virtual SAN cluster. Just to be sure I did it for a fifth host. Below is the output of the esxcli vsan cluster get in the fifth host.

Not only did the two hosts from a different vSphere cluster join the Virtual SAN cluster, but one of them (the fifth one above) actually became the Virtual SAN Master for the Virtual SAN cluster.

Back in vCenter…let’s just say that it wasn’t happy. The first thing I noticed is that vCenter spit out an error message under Network status, as shown in the figure below.

From what I’ve been able to gather, this is a generic error message that means something like “Something is going on with the Network and I, vCenter, have no clue what it is. Go grab the Network Administrator and the two of you go do some Network troubleshooting. Don’t forget to check the Network Multicast configuration while you are at it”. By the way, you will also lose some of the Virtual SAN monitoring capabilities that vCenter provides.

Now, looking again at the picture above, notice that the Total capacity of the Virtual SAN datastore is 59.23 GB. That’s because each of the 3 original Virtual SAN nodes (the 3 that are in the same vSphere cluster) is contributing 20 GB (plus 4 GB SSD) to the Virtual SAN Datastore. And more importantly, the two new nodes (cuarto and quinto) were able to access this new Datastore, a fact that will most definitely intrigue the Security Administrator.

Here is a screenshot of the fifth host’s Datastores.

So that got me thinking (and it didn’t hurt): Is there a way to make the fourth node provide storage capacity to the Virtual SAN Datastore? It turns out there is, which will probably please the Storage Administrator (AGAIN, this is UNSUPPORTED by VMware). First, identify the disks (at least one SSD) in the host that you want to add to the Virtual SAN Datastore and run this command (where -s is the SSD disk and -d is the HDD disk) to create the Virtual SAN disk group in the host:
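The disk group creation looks roughly like this (the device names below are made up; pull the real SSD and HDD identifiers from esxcli storage core device list on the host):

```shell
# -s is the SSD device, -d the HDD device backing the new disk group
esxcli vsan storage add -s mpx.vmhba1:C0:T1:L0 -d mpx.vmhba1:C0:T2:L0
```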

You won’t get an acknowledgement that the join was successful, so run this command to confirm the Virtual SAN disk group got created:
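The confirmation command and the kind of output to expect (abridged, with made-up values):

```shell
esxcli vsan storage list
#    mpx.vmhba1:C0:T2:L0
#       Device: mpx.vmhba1:C0:T2:L0
#       Is SSD: false
#       VSAN UUID: 52e1d1a6-7c2f-41b0-9d33-aa10b7f2c911
#       Used by this host: true
#       In CMMDS: true
```

If the disk group was created, the SSD and HDD both show up here with a VSAN UUID assigned.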

Back again in vCenter, it will report the storage we just added by adding it to the 59.23 GB from earlier. Now the Total capacity of Virtual SAN datastore reads 78.97 GB:

And for the record, all disks in the Virtual SAN Datastore were usable. I created a storage policy with Host Failures to Tolerate set to 2, added disks from the fifth host to the Virtual SAN Datastore (since a Host Failures to Tolerate of n uses the formula 2n + 1 to determine how many hosts are required to provide storage to the Virtual SAN Datastore; with n = 2 that is 5 hosts), assigned a Virtual Machine to it and sure enough…

In the image above we can see the consumed space in the Virtual SAN Datastore is greater than 0. I can only see the used storage for the Virtual SAN Disk Groups seen by vCenter (the original 3 nodes). So of the 3.32 GB currently used in the Virtual SAN Datastore, about 2.2 GB (1.1 GB + 372 MB + 768 MB) is being consumed by the original 3-node Virtual SAN cluster, and the remainder is being stored on the two other ESXi hosts.

Let me stop here since the lunch break is over and everyone must get back to work.

Elver’s Opinion: The vSphere Administrator thinks this is awesome. Although not supported by VMware in Virtual SAN 6.1, the foundation is already embedded in the Virtual SAN code to support a Multi-vSphere Clusters Virtual SAN deployment. One application that quickly comes to mind: a single dedicated Virtual SAN cluster for the vSphere environment’s storage needs that can be quickly and relatively inexpensively scaled up.

For the Storage Administrator, it is a mixed bag. The good of it is that his job just got so much easier. Now he will have some time back to be more efficient at his day-to-day job. He can now go on vacation without having to announce his plans 8 months in advance and be concerned that the sky will fall when he's not at the office. The bad of it is that his job just got so much easier. Unless he figures out a way to continue to be productive and keep providing value to the business, his role might be chopped up and distributed among other Data Center Administrators. For a reference case of this, google Voice Engineer circa 2003 or Server Administrator circa 2007.

The Security Administrator, on the other hand, is probably concerned. But she is not concerned about job security. She just witnessed how two potentially rogue entities (the two ESXi hosts in their own little cluster) were able to access company data in the Virtual SAN Datastore with nothing more than the Virtual SAN cluster UUID. What gives her some peace of mind is that you would need access to one of the Virtual SAN nodes to obtain the Virtual SAN cluster UUID. However, she knows she has some work ahead of her to tighten security around Virtual SAN, as Virtual SAN is an "Infrastructure-Consumer Solution" that does not have built-in security access-restriction mechanisms.


Monday, January 25, 2016

VSAN ESXCLI MG and Cluster Information

I have a confession to make: I’m lazy. Quite lazy; but not so lazy that I’ve started programming yet, although I’m close to it. If you ask me to do something (or I know I have to do something), I’ll google the “best and least burdensome” way to do it. The more spoon-feeding the process, the better.

So I’ve been getting my hands dirty with Virtual SAN (the official name, since VSAN is already taken by Cisco to represent multiple SANs in their MDS; kind of a play on VLAN). Virtual SAN is configured from the vSphere Web Client (although you can also do it via the CLI, as explained here in William Lam’s article), and as hard as I looked, there is limited non-storage information available regarding the state of the Virtual SAN cluster and its members. I can’t tell if this is by design or what, but I don’t like it.

When configuring a Virtual SAN, I want to know things like who the Master is, what MGs are being used and which hosts are the Virtual SAN members. True to my nature, I googled around a bit (not that hard really, just the top 4-5 hits on Google) and I couldn’t quite find what I was looking for. Thus I had no choice but to go into the CLI and get the answers myself. I’m putting this here as 1) a parking lot for me to reference in the future and 2) a page to help fellow slackers when they need to find out the state of their Virtual SAN deployment.

Without further ado, these are the states I’m interested in looking up, followed by the commands that will provide you the information:

  1. VMkernel port used for Virtual SAN (this you can get from the vSphere Web Client by looking at the host’s VMkernel ports in Manage → Networking)
  2. Multicast Group (MG) used by Virtual SAN cluster.
  3. The Port Numbers used by Virtual SAN’s MG traffic.
  4. The Virtual SAN host role (Master, Backup or Agent)

The command esxcli vsan network list provides me 1, 2 and 3 above. It returns the Multicast information, including the Time To Live (TTL) for the Multicast traffic. The default value is 5, which means that if Virtual SAN is being deployed over Layer 3, this host can’t be more than 5 routers away from the other hosts. Of course, this setting can be changed with the command esxcli vsan network ipv4 set --interface-name vmk# --multicast-ttl {0<#<256}.
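Here is roughly what the output looks like (abridged; the VMkernel port name is made up, and the group addresses and ports shown are the Virtual SAN defaults):

```shell
esxcli vsan network list
#    Interface
#       VmkNic Name: vmk1
#       Agent Group Multicast Address: 224.2.3.4
#       Agent Group Multicast Port: 23451
#       Master Group Multicast Address: 224.1.2.3
#       Master Group Multicast Port: 12345
#       Multicast TTL: 5
```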

On a side note, you should seriously consider changing the MGs if you are deploying Virtual SAN over Layer 3. Chances are pretty good that the Data Center network team has an MG addressing plan that does not include the default MGs for Virtual SAN. Remember that all Virtual SAN cluster members must have the same Master MG address and Agent MG address. The command to change the default MGs is esxcli vsan network ipv4 set --interface-name vmk# --agent-mc-addr x.x.x.x --master-mc-addr y.y.y.y.

Elver’s Opinion: If you have multiple Virtual SAN clusters over Layer 3, then change the default MGs for each Virtual SAN cluster.

The command esxcli vsan cluster get (the same one used earlier to obtain the cluster UUID) provides me with 4 above. It tells you who the Master and Backup Virtual SAN nodes are and whether this host is either of them or an Agent. There are only three node roles in Virtual SAN. I believe this command doesn’t need explanation, other than to clarify that I have a three-node Virtual SAN cluster.