Elver's Opinion

Tuesday, December 5, 2017

We Moved...

But don’t despair, as I didn’t go too far. I am going to be blogging on Hydra 1303’s blog page, https://www.hydra1303.com/all-posts/. The hope is that we will have regular posts there (as opposed to my posts here). I hope you will come visit.

Thursday, August 3, 2017

My Home Router is a VyOS VM

I had my home’s Netgear hacked with DD-WRT and I was running a PPTP server until Apple decided to drop support for PPTP VPNs last year. Since then I’ve been without a VPN service I could use when traveling abroad (setting up OpenVPN in DD-WRT ended up being more work than it was worth). But then the new season of Game of Thrones was about to start and I wasn’t going to go a week or more without watching the episodes. I’m not paying someone else to provide me VPN service…and no, I’m not using the free ones either; they have to sell something to make money and guess what that something is.

I also needed to create a DMZ to separate Hydra’s lab gear (I have a small room with a rack and servers and stuff, which is used by other ~~A.I.M~~ Hydra members) from my personal stuff, so I decided to give VyOS a go at the task (L2TP VPN and DMZ). I already have servers running 24x7, so there was no reason why I couldn’t add one more VM (VyOS) to the mix and reconfigure my network. I will skip the details of all the trial and error I had to go through to get this to work (1- VyOS documentation is limited and 2- my brain decided to shut off for 2 days) and go straight to the VyOS configs with some explanations.

Below is a diagram of the original state with the Netgear. Subnet External is there for historical reasons (I’ve had it for something like 10 years now) and I just didn’t feel like removing it (sentimental reasons I suppose). Also, my APs’ management IPs are in that subnet (I have three UBNT UniFi AP-ACs dotted around the house). Router 1 is a Layer 3 switch (also from UBNT) that I got two years ago to provide PoE to the APs. For completeness let me add that I manage the APs using the UniFi Controller running in a VM (also in subnet External); subnet External has no Internet access.

And this is what the new environment looks like. Subnet HydraLab is treated as an untrusted segment. It has access to the Internet but can’t initiate any sessions to the Internal subnets (hanging off Router 1). I also moved Subnet Cameras to the VyOS because otherwise I would’ve had to do a src and dst NAT when accessing the cameras from inside the house (the Netgear/DD-WRT did it automatically whereas the VyOS needs to be configured to do it; I took the path of least resistance).

And these are the VyOS configurations with some explanation - where I think it is helpful. If the command is self-explanatory or is for a well-established protocol, I skipped the explaining.

Interfaces

set interfaces ethernet eth0 address dhcp

set interfaces ethernet eth0 description Internet

set interfaces ethernet eth1 address x.x.x.x/y

set interfaces ethernet eth1 description HydraLab

set interfaces ethernet eth2 address x.x.x.x/y

set interfaces ethernet eth2 description Interno

set interfaces ethernet eth3 address x.x.x.x/y

set interfaces ethernet eth3 description Cameras

L2TP VPN

set vpn ipsec ipsec-interfaces interface eth0

!The following commands tell the VyOS what the source IP of the VPN client can be. I assume I’ll be connecting from RFC 1918 addresses. You can also use 0.0.0.0/0 to just allow from any subnet.

set vpn ipsec nat-networks allowed-network 10.0.0.0/8

set vpn ipsec nat-networks allowed-network 172.16.0.0/12

set vpn ipsec nat-networks allowed-network 192.168.0.0/16

set vpn ipsec nat-traversal enable

set vpn l2tp remote-access authentication local-users username User password password

set vpn l2tp remote-access authentication mode local

!The following command tells the VyOS what IP range to use for IP assignment. It doesn’t matter much where the IP comes from as long as the IPs are available/unused. I used a range from the HydraLab subnet to conserve IPs.

set vpn l2tp remote-access client-ip-pool start First IP

set vpn l2tp remote-access client-ip-pool stop Last IP

set vpn l2tp remote-access dns-servers server-1 8.8.8.8

set vpn l2tp remote-access ipsec-settings authentication mode pre-shared-secret

set vpn l2tp remote-access ipsec-settings authentication pre-shared-secret Secret

set vpn l2tp remote-access ipsec-settings ike-lifetime 3600

!The following command is one of two that bomb with VyOS. I need to use the IP of the Internet interface (eth0), which is acquired by DHCP. This command only takes an IP (not an interface). The day will come when my ISP (Frontier, formerly Verizon) decides to give me a different IP and my VPN will be broken. I have a backup plan for such an event.

set vpn l2tp remote-access outside-address Internet Interface IP
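One possible way to live with a changing DHCP address (not necessarily the backup plan mentioned above) is to regenerate the IP-dependent commands whenever eth0 gets a new address. Below is a minimal Python sketch of that idea. The function name, the default rule number tuple, and the sample address are made up for illustration; how you detect the new address and push the commands to the router is left out on purpose.

```python
import ipaddress


def commands_for_new_ip(new_ip, dnat_rules=(101,)):
    """Return the VyOS set commands to re-enter when eth0 gets a new IP.

    new_ip: the address currently assigned to eth0 (hypothetical input).
    dnat_rules: numbers of the Internet-facing destination NAT rules that
                match on the eth0 address (rule 101 in the NAT Rules section).
    """
    # Validate the address so a typo never ends up in the router config.
    ip = ipaddress.ip_address(new_ip)
    cmds = [f"set vpn l2tp remote-access outside-address {ip}"]
    # Each Internet-facing destination NAT rule also carries the eth0 address.
    for rule in dnat_rules:
        cmds.append(f"set nat destination rule {rule} destination address {ip}")
    return cmds


if __name__ == "__main__":
    # 203.0.113.10 is a documentation address, used here as a stand-in.
    for line in commands_for_new_ip("203.0.113.10"):
        print(line)
```

The output is just text; you would still have to paste it in configure mode and commit it yourself.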

NAT Rules

!The rule numbers can be whatever you want.

set nat source rule 1 description Internet Access for Hydra LAB

set nat source rule 1 outbound-interface eth0

set nat source rule 1 source address HydraLab Subnet

set nat source rule 1 translation address masquerade

set nat source rule 2 description Internet Traffic for Internal

set nat source rule 2 outbound-interface eth0

set nat source rule 2 source address Internal Subnets

set nat source rule 2 translation address masquerade

!The following rule allows NAT traffic going to the cameras from the Internet. I only listed one camera. You would need to repeat these commands, with different rule numbers, for each camera you have.

set nat destination rule 101 description Camera1

!!Just like the remote-access outside-address in the VPN section, this next command needs an actual IP. This is the second command that sucks.

set nat destination rule 101 destination address Internet Interface IP

!!Specify the port number you use to connect to the camera.

set nat destination rule 101 destination port xxx

set nat destination rule 101 inbound-interface eth0

set nat destination rule 101 protocol tcp

set nat destination rule 101 translation address CAMERA1 IP

!!Specify the port number the camera listens to.

set nat destination rule 101 translation port yyy

!The following rule allows NAT traffic going to the cameras from inside the house. I only listed one camera. You would need to repeat these commands, with different rule numbers, for each camera you have.

set nat destination rule 201 description Camera1

!!Specify the port number you use to connect to the camera.

set nat destination rule 201 destination port xxx

set nat destination rule 201 inbound-interface eth2

set nat destination rule 201 protocol tcp

set nat destination rule 201 translation address CAMERA1 IP

!!Specify the port number the camera listens to.

set nat destination rule 201 translation port yyy
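Since the camera rules have to be repeated once per camera, a small generator saves some typing. This is only a sketch in Python, assuming every camera uses TCP like Camera1 above; the function name, the tuple layout, and the sample IPs and ports are all invented for the example.

```python
def camera_dnat_commands(cameras, wan_ip, first_rule=101, inbound="eth0"):
    """Build destination NAT commands for each camera.

    cameras: list of (name, camera_ip, outside_port, camera_port) tuples.
    wan_ip: destination address to match, or None to leave that line out
            (as in the inside-the-house rule 201).
    first_rule: rule number for the first camera; later cameras count up.
    inbound: interface the traffic arrives on (eth0 or eth2 in this post).
    """
    cmds = []
    for offset, (name, cam_ip, out_port, cam_port) in enumerate(cameras):
        base = f"set nat destination rule {first_rule + offset}"
        cmds.append(f"{base} description {name}")
        if wan_ip is not None:
            cmds.append(f"{base} destination address {wan_ip}")
        cmds.append(f"{base} destination port {out_port}")
        cmds.append(f"{base} inbound-interface {inbound}")
        cmds.append(f"{base} protocol tcp")
        cmds.append(f"{base} translation address {cam_ip}")
        cmds.append(f"{base} translation port {cam_port}")
    return cmds


if __name__ == "__main__":
    # Hypothetical cameras; replace names, IPs and ports with your own.
    cams = [("Camera1", "192.0.2.11", 8081, 80),
            ("Camera2", "192.0.2.12", 8082, 80)]
    # Rules for traffic coming from the Internet (101, 102, ...).
    print("\n".join(camera_dnat_commands(cams, "203.0.113.10")))
    # Rules for traffic coming from inside the house (201, 202, ...).
    print("\n".join(camera_dnat_commands(cams, None, first_rule=201, inbound="eth2")))
```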

Security Rules

!Global (optional) Commands.

set firewall all-ping enable

set firewall broadcast-ping disable

set firewall config-trap disable

set firewall group

set firewall ipv6-receive-redirects disable

set firewall ipv6-src-route disable

set firewall ip-src-route disable

set firewall log-martians enable

set firewall receive-redirects disable

set firewall send-redirects disable

set firewall source-validation disable

set firewall state-policy invalid action reject

set firewall syn-cookies enable

set firewall twa-hazards-protection disable

!These are the rules that will be enforced for inter-zone traffic. The rule numbers can be whatever you want.

set firewall name ESTABLISHED default-action drop

set firewall name ESTABLISHED enable-default-log

set firewall name ESTABLISHED rule 1001 action accept

set firewall name ESTABLISHED rule 1001 state established enable

set firewall name ESTABLISHED rule 1001 state related enable

set firewall name FromHYDRALAB default-action drop

set firewall name FromHYDRALAB enable-default-log

set firewall name FromHYDRALAB rule 2001 action accept

set firewall name FromHYDRALAB rule 2001 state established enable

set firewall name FromHYDRALAB rule 2001 state related enable

set firewall name LOCALMGT default-action drop

set firewall name LOCALMGT rule 3001 action accept

set firewall name LOCALMGT rule 3001 log enable

set firewall name LOCALMGT rule 3001 state established enable

set firewall name LOCALMGT rule 3001 state new enable

set firewall name LOCALMGT rule 3001 state related enable

set firewall name OUTGOING default-action drop

set firewall name OUTGOING enable-default-log

set firewall name OUTGOING rule 4001 action accept

set firewall name OUTGOING rule 4001 state established enable

set firewall name OUTGOING rule 4001 state new enable

set firewall name OUTGOING rule 4001 state related enable

set firewall name ToROUTER default-action drop

set firewall name ToROUTER enable-default-log

set firewall name ToROUTER rule 5001 action accept

set firewall name ToROUTER rule 5001 destination port 4500

set firewall name ToROUTER rule 5001 log enable

set firewall name ToROUTER rule 5001 protocol udp

set firewall name ToROUTER rule 5002 action accept

set firewall name ToROUTER rule 5002 destination port 500

set firewall name ToROUTER rule 5002 protocol udp

set firewall name ToROUTER rule 5003 action accept

set firewall name ToROUTER rule 5003 destination port 1701

set firewall name ToROUTER rule 5003 ipsec match-ipsec

set firewall name ToROUTER rule 5003 protocol udp

set firewall name ToROUTER rule 5004 action accept

set firewall name ToROUTER rule 5004 protocol esp

set firewall name ToROUTER rule 5005 action accept

set firewall name ToROUTER rule 5005 state established enable

set firewall name ToROUTER rule 5005 state related enable

set firewall name ToHYDRALAB default-action drop

set firewall name ToHYDRALAB enable-default-log

set firewall name ToHYDRALAB rule 6001 action accept

set firewall name ToHYDRALAB rule 6001 state established enable

set firewall name ToHYDRALAB rule 6001 state related enable

set firewall name ToINTERNO default-action drop

set firewall name ToINTERNO enable-default-log

set firewall name ToINTERNO rule 7001 action accept

set firewall name ToINTERNO rule 7001 state established enable

set firewall name ToINTERNO rule 7001 state related enable

set firewall name ToCAMERA default-action drop

set firewall name ToCAMERA rule 8001 action accept

set firewall name ToCAMERA rule 8001 destination port port number the cameras listen to

set firewall name ToCAMERA rule 8001 log enable

set firewall name ToCAMERA rule 8001 protocol tcp

Inter-Zone Security Rule Mapping (DMZ)

!Indicate traffic that is allowed to reach Internet.

set zone-policy zone INTERNET default-action drop

set zone-policy zone INTERNET from HYDRALAB firewall name OUTGOING

set zone-policy zone INTERNET from INTERNO firewall name OUTGOING

set zone-policy zone INTERNET from LOCAL firewall name OUTGOING

set zone-policy zone INTERNET from CAMERA firewall name ESTABLISHED

set zone-policy zone INTERNET from VPN firewall name OUTGOING

set zone-policy zone INTERNET interface eth0

!Indicate traffic that can enter HydraLab zone.

set zone-policy zone HYDRALAB default-action drop

set zone-policy zone HYDRALAB from INTERNET firewall name ToHYDRALAB

set zone-policy zone HYDRALAB from INTERNO firewall name OUTGOING

set zone-policy zone HYDRALAB from LOCAL firewall name OUTGOING

set zone-policy zone HYDRALAB from VPN firewall name OUTGOING

set zone-policy zone HYDRALAB interface eth1

!Indicate traffic that can enter Internal zone.

set zone-policy zone INTERNO default-action drop

set zone-policy zone INTERNO from HYDRALAB firewall name FromHYDRALAB

set zone-policy zone INTERNO from INTERNET firewall name ToINTERNO

set zone-policy zone INTERNO from LOCAL firewall name ESTABLISHED

set zone-policy zone INTERNO from CAMERA firewall name ESTABLISHED

set zone-policy zone INTERNO interface eth2

!Indicate traffic that can reach Cameras.

set zone-policy zone CAMERA default-action drop

set zone-policy zone CAMERA from INTERNET firewall name ToCAMERA

set zone-policy zone CAMERA from INTERNO firewall name ToCAMERA

set zone-policy zone CAMERA interface eth3

!Indicate traffic that can reach the router (think management, VPN, SSH, etc…).

set zone-policy zone LOCAL default-action drop

set zone-policy zone LOCAL from HYDRALAB firewall name ESTABLISHED

set zone-policy zone LOCAL from INTERNET firewall name ToROUTER

set zone-policy zone LOCAL from INTERNO firewall name LOCALMGT

set zone-policy zone LOCAL local-zone

!Indicate return traffic VPN users will see.

set zone-policy zone VPN from HYDRALAB firewall name FromHYDRALAB

set zone-policy zone VPN from INTERNET firewall name ESTABLISHED

!!I added support for four concurrent VPN connections (iPad, iPhone, Laptop, and JustInCase)

set zone-policy zone VPN interface l2tp0

set zone-policy zone VPN interface l2tp1

set zone-policy zone VPN interface l2tp2

set zone-policy zone VPN interface l2tp3
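To sanity-check the zone mapping, it helps to write it down as a table and ask "who can start a session to whom?". The Python sketch below is a toy model, not something that talks to VyOS. It copies the zone-policy lines above into a dictionary and classifies each ruleset by whether it accepts new sessions. It assumes a zone pair with no "from" statement is simply dropped; the function name and the three answers ("yes", "limited", "no") are my own labels.

```python
# (destination zone, source zone) -> firewall ruleset, copied from the config above.
ZONE_POLICY = {
    ("INTERNET", "HYDRALAB"): "OUTGOING",
    ("INTERNET", "INTERNO"): "OUTGOING",
    ("INTERNET", "LOCAL"): "OUTGOING",
    ("INTERNET", "CAMERA"): "ESTABLISHED",
    ("INTERNET", "VPN"): "OUTGOING",
    ("HYDRALAB", "INTERNET"): "ToHYDRALAB",
    ("HYDRALAB", "INTERNO"): "OUTGOING",
    ("HYDRALAB", "LOCAL"): "OUTGOING",
    ("HYDRALAB", "VPN"): "OUTGOING",
    ("INTERNO", "HYDRALAB"): "FromHYDRALAB",
    ("INTERNO", "INTERNET"): "ToINTERNO",
    ("INTERNO", "LOCAL"): "ESTABLISHED",
    ("INTERNO", "CAMERA"): "ESTABLISHED",
    ("CAMERA", "INTERNET"): "ToCAMERA",
    ("CAMERA", "INTERNO"): "ToCAMERA",
    ("LOCAL", "HYDRALAB"): "ESTABLISHED",
    ("LOCAL", "INTERNET"): "ToROUTER",
    ("LOCAL", "INTERNO"): "LOCALMGT",
    ("VPN", "HYDRALAB"): "FromHYDRALAB",
    ("VPN", "INTERNET"): "ESTABLISHED",
}

# Rulesets with a "state new enable" rule accept any new session.
ANY_NEW = {"OUTGOING", "LOCALMGT"}
# Rulesets that accept new sessions only on specific ports/protocols.
PORT_LIMITED = {"ToROUTER", "ToCAMERA"}


def can_initiate(src, dst):
    """Return 'yes', 'limited' or 'no' for a new session from src zone to dst zone."""
    ruleset = ZONE_POLICY.get((dst, src))
    if ruleset is None:
        return "no"  # assumption: no policy for the pair means drop
    if ruleset in ANY_NEW:
        return "yes"
    if ruleset in PORT_LIMITED:
        return "limited"
    return "no"  # established/related only: return traffic, no new sessions


if __name__ == "__main__":
    for src, dst in [("HYDRALAB", "INTERNET"), ("HYDRALAB", "INTERNO"),
                     ("INTERNO", "HYDRALAB"), ("INTERNET", "LOCAL")]:
        print(f"{src} -> {dst}: {can_initiate(src, dst)}")
```

The first two answers are the ones that matter for the DMZ: HydraLab can open sessions to the Internet but not to the Internal subnets.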

Elver’s Opinion: I didn’t find the VyOS configs to be too challenging; the challenge was finding the correct command references to get this done. There are a few more details on how my environment is set up and I may put up more in a later blog post (time permitting).

Saturday, June 24, 2017

Customer Loyalty or Short-Term Profits

I’m fortunate that I’m very busy these days (and looking for partners with the right mindset/chemistry to help us out) but a downside of it is that I’m now making last-minute travel decisions, which inevitably lead to mistakes on my part. So what should travel companies (airlines, hotels, car rentals, etc…) do when I make these mistakes?

For example, I booked myself to fly out of EWR to TPA on two different flights four hours apart. I didn’t notice this until I had to check in last night, so I called United to inquire about it. The lady that answered my call confirmed that I indeed had two flights, asked me which one I wanted to cancel and provided a refund. I’m sure that my status with the airline played a role in her decision to provide the refund, but she could’ve read me the EUA and claimed that I was at fault or partially at fault for being too stupid to notice I had booked two flights out of the same airport four hours apart. They also waived a change fee of $200 in May when I mistakenly booked a different flight for the wrong days and noticed after 24 hours had elapsed. In any case, United decided to sacrifice short-term profits (twice) for customer loyalty. You bet that I will continue to fly United.

Which leads me to Avis. I’ve been booking with Avis for years and only booked with other car rentals when Avis was not available. Earlier this week on my way back from Toronto (en route to NYC), I stopped for the day in Philly to present at the VMUG UserCon. I reasoned that it would be more efficient (and less costly) to rent the car for a day than it would be taking a Lyft (I avoid Uber as much as I can; I haven’t liked their business ethics for a long time). I needed to go from the PHL to the VMUG (about 35 miles) and then to the Amtrak station in Philly to catch the train to Penn Station (about another 30 miles). So far so good.

While at the VMUG Michael Fleischer (@michaelfleisher) tells me I could just go over the border to Trenton (about 30-40 miles from the VMUG, about the same distance as to the Amtrak station in Philly) and catch the NJ Transit train for about $16 (versus $112 for Amtrak). Thinking that was a no-brainer I did just that (while in the process calling Avis to confirm the location of the Avis office in Trenton) and dropped off the car, only to be handed a bill for over $300. My first thought was “This is a mistake” so I asked the guy at the counter to explain the bill and he tells me that I was charged for miles ($201 plus taxes), and that I needed to call Avis customer service if I had further questions. I thanked him for his time and left to catch my train.

I called Avis (billing department), provided the rental agreement number and asked why I was charged for miles. The lady that answered my call proceeded to explain that per the EUA, if I change the drop-off location (no mention that the drop-off location was in another State) I automatically get charged for miles (which she later clarified by stating the rental rate could change in such cases). Throughout the entire conversation she was very professional and respectful. I acknowledged that I didn’t know what the EUA said (or that I had read it for that matter…I never do) and that now I understood that per the EUA I was fully responsible for the charges. I then asked her what could be done to remove the charges (I have rented one-way before with Avis and I have never been charged by the mile). She proceeded to quote the EUA again.

Then I decided to explain to her (or remind her I suppose since she had my account info in front of her) that I was a long time Avis customer, only rent cars with them and I would’ve expected, since this was a first time occurrence, the per-mile fees would be waived. She said no they won’t be waived because…the EUA. Sigh.

I asked her if I could speak to a supervisor and to my surprise, she was it (or as she stated it “I am in the line of escalation”). At this point she offered to refund 25% of the per-mile charges. I politely refused, acknowledging again that I was liable for the charges per the EUA (really, who reads the travel companies’ EUA anyway?) and explaining to her that my expectation, as a very long-term loyal customer of Avis, was that Avis would forgive this one-time occurrence. She said no, she couldn’t, and then proceeded to quote the EUA again (and no I’m not about to start reading travel companies’ EUA; I have no time for that. I will continue to “initial here, here and here, sign here”).

Somehow she thought I was still claiming that I was not responsible for the charges and offered 50% off so I could still pay for my part of being not-too-smart-for-not-reading-the-EUA (my words, not hers). I politely refused and I explained myself again: I was not looking for a discount, I was looking for Avis to show me they value their loyal customers, and if this was Avis’s position (to fall back on the EUA for minor things) then I would be electing not to do business with Avis (why should I do business with such a company?). Guess what she said now? Exactly, she proceeded to quote from the EUA. After she finished quoting I thanked her for her time and wished her a great day. The call lasted less than 8 minutes.

After the call I emailed Avis customer service asking them what the process is to cancel my account. I didn’t include in the email the reasons why I want to cancel the account, hoping someone from Avis would reach out, hear me out and tell me that I had a bad dream, yes Avis values customer loyalty over short-term profits, wash and vacuum the cars after each return, and they will reimburse the mile charges (I hate breakups so I’m giving them one last chance). We’ll see what happens. They are supposed to reply to my email within 3-4 days (no, I didn’t get this information from the EUA :) I got it from their automatic email acknowledging they received my email).

Elver’s Opinion: Avis is choosing short-term profits (very SMALL short-term profits) in lieu of customer loyalty. Which is fine by me. That’s not how I run my business with my customers, but the Avis execs can run their company however they think is best. I spend my money where I feel valued.

Friday, April 7, 2017

NSX Local Egress

Better late (VERY late) than never I suppose. Here is my last blog post on the DC DCIs and it involves the Local Egress feature of NSX. I don’t quite remember all I wrote in my earlier posts (which you can read here, here and here) since it has been a bit since I wrote them (and I for some reason can’t get into reading stuff I’ve written); but I believe I explained what happens to egress traffic from the DC and how it affects the bandwidth requirements.

In version 6.2 of NSX-v, VMware introduced something they are calling Cross vCenter NSX. The architecture consists of having multiple NSX Managers (one per vCenter) that can exchange some configuration information as well as share a common set of NSX Controllers. The good part of it is that it allows you to have multiple vCenters managing different DCs while stretching a broadcast domain (using VXLAN) among all the DCs (up to 8 DCs). For this blog post we will assume the DCs are physical and have DCIs between them.

On the surface this is a cool feature (putting aside that stretching a broadcast domain over multiple locations violates the 11th commandment) since it allows all VTEPs (belonging to the same Universal Transport Zone) in all DCs to be fully meshed with each other. However, this by itself doesn’t address the problem of keeping layer 3 egress traffic local so that it doesn’t cross the DCIs.

To illustrate the problem, please have a look at the diagram below.

Notice that the local copy of the Logical Router (it is a Universal Logical Router, which is functionally the same as the Global Logical Router) will have two paths to reach the intranet: one via the NSX Edge in DC1 and another via the NSX Edge in DC2. By default NSX components are unaware of locality (location), and thus to the Logical Router (specifically, the LR Control VM that owns the OSPF/BGP adjacency with the NSX Edges) both NSX Edges are equally good to reach external networks and it can choose either NSX Edge to forward the traffic to. There is a bit more complexity here about route costs, but we’ll skip that for this blog post.

So to handle the layer 3 egress traffic from the Logical Router and keep it from going over the DCI, NSX introduces a method to provide location awareness to the routing table creation. And the method is quite simple: assign an ID to all Universal Logical Router entities (the LR Control VM and ESXi hosts but NOT the NSX Edges) that you want to belong to the same “location”. Using this location ID, or Locale ID as VMware calls it (which is 128 bits long and defaults to the UUID of the NSX Manager, but it can be changed), the NSX Controller (the Universal NSX Controller since we are doing Cross vCenter NSX) will push the routing table only to those ESXi hosts that have the same Locale ID as the LR Control VM, as shown in the figure below.

If you have been following up to this point then you noticed that we introduced more problems than we actually solved (thus far). In the diagram above we have two different problems: 1) Only ESXi hosts in DC1 would get the routing table since they have the same Locale ID as the LR Control VM and 2) The LR Control VM still sees the NSX Edges in both DCs.

To solve problem 1, NSX allows you to deploy a second LR Control VM from the other NSX Manager (up to 8 LR Control VMs, one per NSX Manager in the Cross vCenter NSX domain) and you assign that second LR Control VM the same Locale ID as the ESXi hosts in DC2. The solution to problem 1 is shown in the diagram below.

To solve problem 2, it is necessary to prevent the LR Control VM in DC1 from forming routing adjacencies with the NSX Edge in DC2 (and vice-versa for our second LR Control VM in DC2). The solution involves creating a Logical Switch (Universal Logical Switch) per LR Control VM and connecting only entities (NSX Edge, LR Control VM) that reside in the same “location” to the Logical Switch, as shown in the diagram below.

And that’s it. The combination of Locale ID, multiple LR Control VMs and a dedicated Logical Switch per “location” ensures that egress layer 3 traffic from one location doesn’t cross the DCI.
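The Locale ID filtering can be pictured with a few lines of Python. This is a toy model of the behavior described in this post, not NSX code and not an NSX API: the class, the function, the "dc1"/"dc2" IDs and the route strings are all invented for illustration. Each LR Control VM only learns routes from the NSX Edge on its own logical switch, and a host only receives routes from the Control VM whose Locale ID matches its own.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ControlVM:
    name: str
    locale_id: str
    # Routes learned from the NSX Edge on this Control VM's own logical switch.
    routes: tuple


def routes_for_host(host_locale_id, control_vms):
    """Return the routes a host would receive under the Locale ID rule.

    Only routes from Control VMs with the same Locale ID as the host are
    included; everything else is filtered out.
    """
    routes = []
    for vm in control_vms:
        if vm.locale_id == host_locale_id:
            routes.extend(vm.routes)
    return routes


if __name__ == "__main__":
    # Hypothetical two-DC setup: one Control VM per location.
    control_vms = [
        ControlVM("lr-control-dc1", "dc1", ("0.0.0.0/0 via edge-dc1",)),
        ControlVM("lr-control-dc2", "dc2", ("0.0.0.0/0 via edge-dc2",)),
    ]
    print(routes_for_host("dc1", control_vms))
    print(routes_for_host("dc2", control_vms))
```

A host in DC1 only ever sees the default route through the DC1 Edge, so its egress traffic has no reason to cross the DCI.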

Elver’s Opinion: I don’t think I have an opinion today…actually I do (now that I think about it). Locale ID is a good solution that NSX-v introduces to handle local egress, but I think there is a missed opportunity here. VMware could allow the NSX Edge to be assigned a Locale ID (after all the LR Control VM IS a modified NSX Edge) and avoid the need to have a ULS per location. I’m not sure that I know the reason why they didn’t design the Locale ID solution in this way (or that they have a good one).

Monday, February 13, 2017

DC Egress Traffic with Stretched Layer 2

In the last blog I spent a lot of writing (and funky formulas) just to say that the DCI circuits between two Data Centers need to be larger than the biggest DC WAN link plus the inter-DC traffic (which increases your cost). When stretching layer 2 across DCs, there is not much that can be done to force DC ingress traffic to come in via the DC WAN link where the destination workload (VM) is running. However, there are some things you can do to force the egress traffic to go out the DC where the VM is located (and avoid using the DCI circuits) to reduce some of the cost associated with the DCI links. For this blog post I’m going to assume that we have Active/Active DCs.

Dual Default Gateways

When you stretch the layer 2 across the DCs, the default gateway for the stretched layer 2 segments could be physically located in one of the two DCs. We don’t care about that use case (that would probably require a standalone blog post to talk about the cons of this design). Instead let’s assume that we have default gateway services in both DCs, and to provide redundancy, we will have two routers as default gateways in each DC, running an FHRP (something like VRRP), as shown in the diagram.

In this design, the VM will forward traffic to its local default gateway, which in turn will forward the traffic out of its local DC WAN. For this design to work there must be a mechanism (1) to stop each pair of DC default gateways from seeing each other (otherwise you won’t get both pairs of FHRP routers to be Active with the same virtual MAC) and (2) to prevent a VM in one DC from receiving ARP replies for its default gateway from the other DC’s pair of default gateways. You could achieve this with access lists (too manual) or you could stretch the layer 2 with something like Cisco OTV, which has a built-in mechanism (less manual) to isolate FHRP in each DC.

This design does have some potential issues that must be taken into account. If each pair of default gateways uses different virtual MAC addresses when replying to ARP, a VM that moves DCs will lose connectivity (until it re-ARPs for its default gateway). Also, if both members of a pair of default gateways go down, you may have to remove the FHRP isolation to allow the impacted VMs to reach the default gateways in the other DC.

Distributed Default Gateway (Top of Rack)

An alternative to dual default gateways is to stretch the layer 2 using VXLAN (or another tunneling protocol) from the Top of Rack (ToR). In this design all ToRs will have the Layer 3 boundary and be the default gateways for their own racks. Every time a ToR gets an ARP request, it will respond to it (and provide local FHRP isolation), as shown in the diagram below. Two examples of this are Arista’s DCI with VXLAN and VARP, and Brocade’s IP Fabric with Anycast gateway (for the time being, until Broadcom decides what to do with Brocade’s Network business).

This design has a couple of advantages over the previous one: there are a lot more routers (the ToRs) acting as default gateways, and FHRP isolation is built in. If a ToR dies, only the rack where it resides will be impacted as opposed to the entire DC. Also, since all ToRs use the same virtual MAC, when a VM moves DCs, it continues to have uninterrupted Layer 3 connectivity. One disadvantage is that you would need to fiddle with route advertisements to ensure the ToRs forward traffic straight up the local DC WAN; this may not be as easily done as it sounds.

Side note: there is a variation of this design where the ToRs are strictly layer 2 (let’s call them Leafs) and the distribution switches (henceforth Spines) do the Layer 3, thus providing the default gateway services.

Distributed Default Gateway (software)

Just like the physical version, you stretch the layer 2 using a tunneling protocol (like VXLAN or GENEVE) but you have a layer 3 process in each hypervisor that serves as the default gateway (e.g. virtual router). Each virtual router will have the same IP and virtual MAC (thus VMs can move between DCs at will) and locally respond to ARP requests. And like the physical version, you must manipulate routes to force each virtual router to send traffic to its local DC WAN, as shown in the diagram.

VMware’s NSX-v (distributed logical router) achieves this functionality. Each logical router is the “same” in each hypervisor except for their routing tables. Each logical router in each DC will get routes only relevant to it. This way, each logical router is “forced” to forward traffic using its local WAN.

Elver’s Opinion: This blog post should (mostly) conclude my thoughts on stretching Layer 2 across the DCI (think hard before doing it). At first I thought I would use this blog to also talk about local egress in NSX (to wrap up my thoughts on the matter), but as I wrote I realized I would need more space than I thought, so I’ll be writing another blog post just on local egress in NSX.

Friday, January 13, 2017

Impact of Stretched Layer 2 on DCI

I was not clear in my DC Ingress blog post as to why the entry/exit point matters for flows coming from/going outside the DCs, for an application that uses the stretched layer 2 in an infrastructure supporting BC with an Active/Active WAN architecture. One word can summarize why it matters: cost. The moment you allow traffic that does not both originate and terminate in the DCs to go over the links between the Data Centers, that link becomes a transit segment and you must increase its speed to accommodate the additional traffic.

Let me put back up the $1 diagram I used last time, but now showing the connection between the Data Centers (the Data Center Interconnect, or DCI).

The DCI between the two DCs needs to be big enough to handle all inter-DC traffic (traffic with both source and destination in the DCs; this doesn’t include transit traffic coming from/going outside the DCs). Let’s call traffic from DC1 to DC2 DCI1 and traffic from DC2 to DC1 DCI2. The speed of your DC1 WAN circuit must be as big as the amount of ingress traffic in DC1. Same goes for DC2. If we call the DC ingress traffic DCi1 and DCi2 (lowercase i, for ingress), and we are not doing any sort of route manipulation, then some DCi1 traffic will transit the DCI to reach VMs in DC2 and some DCi2 traffic will transit the DCI to reach VMs in DC1.

Since we don’t know how much “some” is going to be, we should architect for worst-case scenario, like a WAN disruption changing flow patterns, or risk having some traffic dropped before it goes over the DCI. So this is how much traffic the DCI would have to handle:

If DCI1 + DCi1 ≥ DCI2 + DCi2 then DCI1 + DCi1, else DCI2 + DCi2

What this little formula says is that the speed of the DCI link must be as big as the larger of traffic from DC1 or from DC2 (I’m making the assumption the DCI is symmetrical; none of that asymmetrical bandwidth you get from your home ISP).

But this formula is not complete. You see, the VMs will be sending traffic back to the user (egress traffic). Let’s pretend the traffic flow goes back the same way the ingress traffic came (worst-case again, as we can't predict what would happen in the WAN). Using DCe1 to represent the VMs in DC1 replying back to the user and DCe2 to represent the VMs in DC2 replying back to the user, the formula becomes this:

If DCI1 + DCi1 + DCe1 ≥ DCI2 + DCi2 + DCe2 then DCI1 + DCi1 + DCe1, else DCI2 + DCi2 + DCe2

This formula is a bit long, so let’s do some thinking and see if we can simplify this. Since we are architecting for worst-case scenario and we are thinking BC, we can use the larger of DCi1 or DCi2 and call it DCiB. DCiB will be coming in one of the WAN circuits of the DCs. Let’s give the same treatment to DCe1 and DCe2, and call it DCeB. DCeB will be going out of one of the WAN circuits in the DCs.

Elver’s Opinion: Since flow patterns are never static, it is a good idea to make the WAN circuits in both DCs the same size, and be the larger of DCiB or DCeB.

For sizing our DCI, we actually care about the larger of DCiB or DCeB; let’s call it DCB. The reason for this is that in the event of a WAN failure at DC2 all ingress traffic comes in via DC1 and transits over the DCI to DC2, and all egress traffic will go from DC2 and transit the DCI to DC1 (and the following week, flow patterns reverse). This allows us to replace DCiX + DCeX with DCB.

We now make some substitutions to get this:

If DCI1 + DCB ≥ DCI2 + DCB then DCI1 + DCB, else DCI2 + DCB

Which can be rewritten as:

If DCI1 ≥ DCI2 then DCI1 + DCB, else DCI2 + DCB

All of this writing and formulas just to say that the DCI speed must be at least as big as your largest DC WAN circuit plus the largest inter-DC traffic. Or put another way, the DCI circuit speed will be the inter-DC traffic plus the transit traffic…and transit traffic we established is the ingress/egress traffic in support of the application that is using the stretched layer 2.
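For those who prefer code to formulas, here is the last formula as a small Python function. It is only a sketch of the simplification worked out above: the function and parameter names are mine (the post uses DCI1/DCI2, DCi1/DCi2 and DCe1/DCe2), the units are whatever you measure bandwidth in, and the sample numbers are made up.

```python
def dci_size(inter_dc1, inter_dc2, ingress1, ingress2, egress1, egress2):
    """Worst-case DCI bandwidth: max(DCI1, DCI2) + DCB.

    inter_dc1, inter_dc2: inter-DC traffic DC1->DC2 (DCI1) and DC2->DC1 (DCI2).
    ingress1, ingress2:   WAN ingress traffic per DC (DCi1, DCi2).
    egress1, egress2:     WAN egress traffic per DC (DCe1, DCe2).
    """
    dcib = max(ingress1, ingress2)  # DCiB: larger of the two ingress loads
    dceb = max(egress1, egress2)    # DCeB: larger of the two egress loads
    dcb = max(dcib, dceb)           # DCB: worst-case transit traffic
    return max(inter_dc1, inter_dc2) + dcb


if __name__ == "__main__":
    # Made-up numbers in Gbps: inter-DC 2 and 3, ingress 1 and 4, egress 5 and 2.
    print(dci_size(2, 3, 1, 4, 5, 2))  # 3 + max(4, 5) = 8
```

With these sample numbers the largest WAN load (5) plus the largest inter-DC load (3) gives a DCI of 8, which is the "largest DC WAN circuit plus the largest inter-DC traffic" statement in code form.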

The higher the speed of the DCI circuit(s), the higher the cost. It might not be as obvious, but the higher cost is not just for the actual circuit. It is also for the hardware that is needed at both ends of the circuit to support it and the intra-DC hardware required to support any other higher-speed links that will have to carry the transit traffic.

I’ll write another post to discuss how to minimize the egress traffic becoming DCI transit traffic. It is quite straightforward nowadays to accomplish, with most major network vendors providing solutions for it. I will give special placement to NSX, as it has to achieve it doing something different from what the other vendors do.

Elver’s Opinion: Yes there are traffic pattern schemes that would leverage a smaller size DCI than the last formula above. However those cases don’t occur much in the wild when you are tasked to provide an infrastructure that supports BC with Active/Active WAN and stretched layer 2 for applications.