But don’t despair as I didn’t go too far. I am going to be blogging on Hydra 1303’s blog page, https://www.hydra1303.com/all-posts/. The hope is that we will have regular posts there (as opposed to my posts here). I hope you will come visit.
Elver's Opinion
Quasi random neuron firings on SDx and other stuff
Tuesday, December 5, 2017
Thursday, August 3, 2017
My Home Router is a VyOS VM
I had my home’s Netgear hacked with DD-WRT and I was running
a PPTP server until Apple decided to stop supporting PPTP VPNs last year. Since
then I’ve been without a VPN service I could use when traveling abroad (setting up OpenVPN in DD-WRT ended up being more work than it was worth). But
then the new season of Game of Thrones was about to start and I wasn’t
going to go a week or more without watching the episodes. I’m not paying
someone else to provide me VPN service…and no, I’m not using the free ones
either; they have to sell something to make money and guess what that something is.
I also had a need to create a DMZ to separate Hydra’s lab
gear (I have a small room with a rack and servers and stuff, which is used by
other A.I.M Hydra members) from my personal stuff so I decided to give
VyOS a go at the task (L2TP VPN and DMZ). I already have servers running 24x7 so no reason why I
couldn’t add one more VM (VyOS) to the mix and reconfigure my network. I will
skip the details of all the trial and error I had to go through to get this to
work (1- VyOS documentation is limited and 2- my brain decided to shut off for 2
days) and go straight to the VyOS configs with some explanations.
Below is a diagram of the original state with the Netgear.
Subnet External is there because of
historical reasons (I’ve had it for something like 10 years now) and I just didn’t
feel like removing it (sentimental reasons I suppose). Also, my APs’ management IPs are
in that subnet (I have three UBNT UniFi AP-AC units dotted around the house). Router
1 is a Layer 3 switch (also from UBNT) that I got two years ago to provide PoE to
the APs. For completeness let me add
that I manage the APs using the UniFi Controller running in a VM (also running
in subnet External); subnet External
has no Internet access.
And this is what the new environment looks like. Subnet HydraLab is treated as an untrusted
segment. It has access to the Internet but can’t initiate any sessions to the
Internal subnets (hanging off Router 1). I also moved Subnet Cameras to the VyOS because otherwise I would’ve
had to do a src and dst NAT when accessing the cameras from inside the house
(the Netgear/DD-WRT did it automatically whereas the VyOS needs to be
configured to do it; I took the path of least resistance).
And these are the VyOS configurations with some explanation - where I think it is helpful. If the command is self-explanatory or is for a
well-established protocol, I skipped the explaining.
Interfaces
set interfaces ethernet eth0 address dhcp
set interfaces ethernet eth0 description Internet
set interfaces ethernet eth1 address x.x.x.x/y
set interfaces ethernet eth1 description HydraLab
set interfaces ethernet eth2 address x.x.x.x/y
set interfaces ethernet eth2 description Interno
set interfaces ethernet eth3 address x.x.x.x/y
set interfaces ethernet eth3 description Cameras
L2TP VPN
set vpn ipsec ipsec-interfaces interface eth0
!The following commands tell the VyOS what the source IP of the VPN client can be. I assume I’ll be connecting from RFC 1918 addresses. You can also use 0.0.0.0/0 to just allow from any subnet.
set vpn ipsec nat-networks allowed-network 10.0.0.0/8
set vpn ipsec nat-networks allowed-network 172.16.0.0/12
set vpn ipsec nat-networks allowed-network 192.168.0.0/16
set vpn ipsec nat-traversal enable
set vpn l2tp remote-access authentication local-users username User password password
set vpn l2tp remote-access authentication mode local
!The following commands tell the VyOS what IP range to use for IP assignment. It doesn’t matter much where the IPs come from as long as they are available/unused. I used a range from the HydraLab subnet to conserve IPs.
set vpn l2tp remote-access client-ip-pool start <First IP>
set vpn l2tp remote-access client-ip-pool stop <Last IP>
set vpn l2tp remote-access dns-servers server-1 8.8.8.8
set vpn l2tp remote-access ipsec-settings authentication mode pre-shared-secret
set vpn l2tp remote-access ipsec-settings authentication pre-shared-secret Secret
set vpn l2tp remote-access ipsec-settings ike-lifetime 3600
!The following command is one of two that bomb with VyOS. I need to use the IP of the Internet interface (eth0), which is acquired by DHCP. This command only takes an IP (not an interface). The day will come when my ISP (Frontier, formerly Verizon) decides to give me a different IP and my VPN will be broken. I have a backup plan for such an event.
set vpn l2tp remote-access outside-address <Internet Interface IP>
NAT Rules
!The rule numbers can be whatever you want.
set nat source rule 1 description 'Internet Access for Hydra LAB'
set nat source rule 1 outbound-interface eth0
set nat source rule 1 source address <HydraLab Subnet>
set nat source rule 1 translation address masquerade
set nat source rule 2 description 'Internet Traffic for Internal'
set nat source rule 2 outbound-interface eth0
set nat source rule 2 source address <Internal Subnets>
set nat source rule 2 translation address masquerade
!The following rule allows NAT traffic going to the cameras from the Internet. I only listed one camera. You would need to repeat these commands, with different rule numbers, for each camera you have.
set nat destination rule 101 description Camera1
!!Just like the remote-access outside-address in the VPN section, this next command needs an actual IP. This is the second command that sucks.
set nat destination rule 101 destination address <Internet Interface IP>
!!Specify the port number you use to connect to the camera.
set nat destination rule 101 destination port xxx
set nat destination rule 101 inbound-interface eth0
set nat destination rule 101 protocol tcp
set nat destination rule 101 translation address <CAMERA1 IP>
!!Specify the port number the camera listens to.
set nat destination rule 101 translation port yyy
!The following rule allows NAT traffic going to the cameras from inside the house. I only listed one camera. You would need to repeat these commands, with different rule numbers, for each camera you have.
set nat destination rule 201 description Camera1
!!Specify the port number you use to connect to the camera.
set nat destination rule 201 destination port xxx
set nat destination rule 201 inbound-interface eth2
set nat destination rule 201 protocol tcp
set nat destination rule 201 translation address <CAMERA1 IP>
!!Specify the port number the camera listens to.
set nat destination rule 201 translation port yyy
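Two commands above hard-code the eth0 address: the L2TP outside-address and the destination address of each Internet-facing camera rule. The sketch below is one way to regenerate just those commands when the DHCP lease changes. It is not the backup plan mentioned earlier (the post does not describe it); the function name, the example address and the idea of scripting this at all are assumptions for illustration.

```python
import ipaddress


def ip_dependent_commands(wan_ip, camera_rules=(101,)):
    """Build the 'set' commands from this post that need the literal eth0 IP.

    wan_ip: the address currently leased to eth0 (hypothetical input).
    camera_rules: destination NAT rule numbers that face the Internet.
    """
    ip = ipaddress.ip_address(wan_ip)  # raises ValueError on a malformed address
    commands = [f"set vpn l2tp remote-access outside-address {ip}"]
    for rule in camera_rules:
        commands.append(f"set nat destination rule {rule} destination address {ip}")
    return commands


# 203.0.113.10 is a documentation address, not a real lease.
for line in ip_dependent_commands("203.0.113.10"):
    print(line)
```

The output still has to be pasted into (or pushed to) a VyOS configure session and committed; the sketch only builds the text.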
Security Rules
!Global (optional) Commands.
set firewall all-ping enable
set firewall broadcast-ping disable
set firewall config-trap disable
set firewall group
set firewall ipv6-receive-redirects disable
set firewall ipv6-src-route disable
set firewall ip-src-route disable
set firewall log-martians enable
set firewall receive-redirects disable
set firewall send-redirects disable
set firewall source-validation disable
set firewall state-policy invalid action reject
set firewall syn-cookies enable
set firewall twa-hazards-protection disable
!These are the rules that will be enforced for inter-zone traffic. The rule numbers can be whatever you want.
set firewall name ESTABLISHED default-action drop
set firewall name ESTABLISHED enable-default-log
set firewall name ESTABLISHED rule 1001 action accept
set firewall name ESTABLISHED rule 1001 state established enable
set firewall name ESTABLISHED rule 1001 state related enable
set firewall name FromHYDRALAB default-action drop
set firewall name FromHYDRALAB enable-default-log
set firewall name FromHYDRALAB rule 2001 action accept
set firewall name FromHYDRALAB rule 2001 state established enable
set firewall name FromHYDRALAB rule 2001 state related enable
set firewall name LOCALMGT default-action drop
set firewall name LOCALMGT rule 3001 action accept
set firewall name LOCALMGT rule 3001 log enable
set firewall name LOCALMGT rule 3001 state established enable
set firewall name LOCALMGT rule 3001 state new enable
set firewall name LOCALMGT rule 3001 state related enable
set firewall name OUTGOING default-action drop
set firewall name OUTGOING enable-default-log
set firewall name OUTGOING rule 4001 action accept
set firewall name OUTGOING rule 4001 state established enable
set firewall name OUTGOING rule 4001 state new enable
set firewall name OUTGOING rule 4001 state related enable
set firewall name ToROUTER default-action drop
set firewall name ToROUTER enable-default-log
set firewall name ToROUTER rule 5001 action accept
set firewall name ToROUTER rule 5001 destination port 4500
set firewall name ToROUTER rule 5001 log enable
set firewall name ToROUTER rule 5001 protocol udp
set firewall name ToROUTER rule 5002 action accept
set firewall name ToROUTER rule 5002 destination port 500
set firewall name ToROUTER rule 5002 protocol udp
set firewall name ToROUTER rule 5003 action accept
set firewall name ToROUTER rule 5003 destination port 1701
set firewall name ToROUTER rule 5003 ipsec match-ipsec
set firewall name ToROUTER rule 5003 protocol udp
set firewall name ToROUTER rule 5004 action accept
set firewall name ToROUTER rule 5004 protocol esp
set firewall name ToROUTER rule 5005 action accept
set firewall name ToROUTER rule 5005 state established enable
set firewall name ToROUTER rule 5005 state related enable
set firewall name ToHYDRALAB default-action drop
set firewall name ToHYDRALAB enable-default-log
set firewall name ToHYDRALAB rule 6001 action accept
set firewall name ToHYDRALAB rule 6001 state established enable
set firewall name ToHYDRALAB rule 6001 state related enable
set firewall name ToINTERNO default-action drop
set firewall name ToINTERNO enable-default-log
set firewall name ToINTERNO rule 7001 action accept
set firewall name ToINTERNO rule 7001 state established enable
set firewall name ToINTERNO rule 7001 state related enable
set firewall name ToCAMERA default-action drop
set firewall name ToCAMERA rule 8001 action accept
set firewall name ToCAMERA rule 8001 destination port <port number the cameras listen to>
set firewall name ToCAMERA rule 8001 log enable
set firewall name ToCAMERA rule 8001 protocol tcp
Inter-Zone Security Rule Mapping (DMZ)
!Indicate traffic that is allowed to reach Internet.
set zone-policy zone INTERNET default-action drop
set zone-policy zone INTERNET from HYDRALAB firewall name OUTGOING
set zone-policy zone INTERNET from INTERNO firewall name OUTGOING
set zone-policy zone INTERNET from LOCAL firewall name OUTGOING
set zone-policy zone INTERNET from CAMERA firewall name ESTABLISHED
set zone-policy zone INTERNET from VPN firewall name OUTGOING
set zone-policy zone INTERNET interface eth0
!Indicate traffic that can enter HydraLab zone.
set zone-policy zone HYDRALAB default-action drop
set zone-policy zone HYDRALAB from INTERNET firewall name ToHYDRALAB
set zone-policy zone HYDRALAB from INTERNO firewall name OUTGOING
set zone-policy zone HYDRALAB from LOCAL firewall name OUTGOING
set zone-policy zone HYDRALAB from VPN firewall name OUTGOING
set zone-policy zone HYDRALAB interface eth1
!Indicate traffic that can enter Internal zone.
set zone-policy zone INTERNO default-action drop
set zone-policy zone INTERNO from HYDRALAB firewall name FromHYDRALAB
set zone-policy zone INTERNO from INTERNET firewall name ToINTERNO
set zone-policy zone INTERNO from LOCAL firewall name ESTABLISHED
set zone-policy zone INTERNO from CAMERA firewall name ESTABLISHED
set zone-policy zone INTERNO interface eth2
!Indicate traffic that can reach Cameras.
set zone-policy zone CAMERA default-action drop
set zone-policy zone CAMERA from INTERNET firewall name ToCAMERA
set zone-policy zone CAMERA from INTERNO firewall name ToCAMERA
set zone-policy zone CAMERA interface eth3
!Indicate traffic that can reach the router (think management, VPN, SSH, etc…).
set zone-policy zone LOCAL default-action drop
set zone-policy zone LOCAL from HYDRALAB firewall name ESTABLISHED
set zone-policy zone LOCAL from INTERNET firewall name ToROUTER
set zone-policy zone LOCAL from INTERNO firewall name LOCALMGT
set zone-policy zone LOCAL local-zone
!Indicate return traffic VPN users will see.
set zone-policy zone VPN from HYDRALAB firewall name FromHYDRALAB
set zone-policy zone VPN from INTERNET firewall name ESTABLISHED
!!I added support for four concurrent VPN connections (iPad, iPhone, Laptop, and JustInCase)
set zone-policy zone VPN interface l2tp0
set zone-policy zone VPN interface l2tp1
set zone-policy zone VPN interface l2tp2
set zone-policy zone VPN interface l2tp3
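The zone-policy commands above amount to a lookup table: for each (destination zone, source zone) pair, one named ruleset decides what gets through. As a rough way to sanity-check the DMZ behavior, the table can be written out in Python. The zone and ruleset names come from the config; the three-way summary of each ruleset ("any", "limited", "none") is my simplification, and the function name is made up for this sketch.

```python
# How each ruleset treats NEW sessions (simplified reading of the rules above):
#   "any"     accepts new, established and related traffic
#   "limited" accepts new sessions only on the listed ports/protocols
#   "none"    accepts established/related traffic only
NEW_SESSIONS = {
    "OUTGOING": "any",
    "LOCALMGT": "any",
    "ToROUTER": "limited",
    "ToCAMERA": "limited",
    "ESTABLISHED": "none",
    "FromHYDRALAB": "none",
    "ToHYDRALAB": "none",
    "ToINTERNO": "none",
}

# (destination zone, source zone) -> ruleset, copied from the zone-policy commands.
ZONE_POLICY = {
    ("INTERNET", "HYDRALAB"): "OUTGOING",
    ("INTERNET", "INTERNO"): "OUTGOING",
    ("INTERNET", "LOCAL"): "OUTGOING",
    ("INTERNET", "CAMERA"): "ESTABLISHED",
    ("INTERNET", "VPN"): "OUTGOING",
    ("HYDRALAB", "INTERNET"): "ToHYDRALAB",
    ("HYDRALAB", "INTERNO"): "OUTGOING",
    ("HYDRALAB", "LOCAL"): "OUTGOING",
    ("HYDRALAB", "VPN"): "OUTGOING",
    ("INTERNO", "HYDRALAB"): "FromHYDRALAB",
    ("INTERNO", "INTERNET"): "ToINTERNO",
    ("INTERNO", "LOCAL"): "ESTABLISHED",
    ("INTERNO", "CAMERA"): "ESTABLISHED",
    ("CAMERA", "INTERNET"): "ToCAMERA",
    ("CAMERA", "INTERNO"): "ToCAMERA",
    ("LOCAL", "HYDRALAB"): "ESTABLISHED",
    ("LOCAL", "INTERNET"): "ToROUTER",
    ("LOCAL", "INTERNO"): "LOCALMGT",
    ("VPN", "HYDRALAB"): "FromHYDRALAB",
    ("VPN", "INTERNET"): "ESTABLISHED",
}


def new_sessions(src, dst):
    """Summarize whether src may open new sessions toward dst."""
    ruleset = ZONE_POLICY.get((dst, src))
    if ruleset is None:
        return "dropped"  # no ruleset for this pair, so nothing is accepted
    return NEW_SESSIONS[ruleset]


print(new_sessions("HYDRALAB", "INTERNO"))  # the lab cannot start sessions inward
print(new_sessions("INTERNO", "HYDRALAB"))  # Internal can reach the lab
```

Writing the pairs out this way also makes the gaps visible: any pair missing from the table, such as HYDRALAB to CAMERA, has no ruleset at all.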
Elver’s Opinion: I didn’t
find the VyOS configs to be too challenging; the challenge was finding the
correct command references to get this done. There are a few more details on
how my environment is set up and I may put up more in a later blog post (time
permitting).
Saturday, June 24, 2017
Customer Loyalty or Short-Term Profits
I’m fortunate that I’m very busy these days (and looking for
partners with the right mindset/chemistry to help us out) but a downside of it
is that I’m now making last-minute travel decisions, which inevitably leads to
mistakes on my part. So what should travel companies (airlines, hotels,
car rentals, etc…) do when I make these mistakes?
For example, I booked myself to fly out of EWR to TPA on
two different flights four hours apart. I didn’t notice this until I had to
check in last night, so I called United to inquire about this. The lady that
answered my call confirmed that I indeed had two flights, asked me which one I
wanted to cancel and provided a refund. I’m sure that my status with the
airline played a role in her decision to provide the refund but she could’ve
read me the EUA and claimed that I was at fault or partially at fault for being
stupid enough to book two flights out of the same airport four hours apart. They
also waived a change fee of $200 in May when I mistakenly booked a different
flight for the wrong days and noticed after 24 hours had elapsed. In any case,
United decided to sacrifice short-term profits (twice) for customer loyalty.
You bet that I will continue to fly United.
Which leads me to Avis. I’ve been booking with Avis for
years and only booked with other car rentals when Avis was not available.
Earlier this week on my way back from Toronto (en route to NYC), I stopped for
the day in Philly to present at the VMUG UserCon. I reasoned that it would be
more efficient (and less costly) to rent the car for a day than it would be taking
a Lyft (I avoid Uber as much as I can; I haven’t liked their business ethics
for a long time). I needed to go from the PHL to the VMUG (about 35 miles) and
then to the Amtrak station in Philly to catch the train to Penn Station (about
another 30 miles). So far so good.
While at the VMUG Michael Fleischer (@michaelfleisher) tells me I could just go
over the border to Trenton (about 30-40 miles from the VMUG, about same
distance to the Amtrak station in Philly) and catch the NJ Transit train for
about $16 (versus $112 for Amtrak). Thinking that was a no brainer I did just
that (while in the process calling Avis to confirm the location of the Avis
office in Trenton) and dropped off the car to then be given an over $300 bill.
My first thought was “This is a mistake” so I asked the guy at the counter to
explain the bill and he tells me that I was charged for miles ($201 plus taxes),
and that I needed to call Avis customer service if I had further questions. I
thanked him for his time and left to catch my train.
I called Avis (billing department), provided the rental
agreement number and asked why I was charged for miles. The lady that answered
my call proceeded to explain that, per the EUA, if I change the drop off
location (no mention that the drop off location was in another State) I
automatically get charged for miles (which later she clarified by stating the
rental rate could change in such
cases). Throughout the entire conversation she was very professional and
respectful. I acknowledged that I didn’t know what the EUA said (or that I had
read it for that matter…I never do) and that now I understood that per the EUA
I was fully responsible for the charges. I then asked her what could be done to
remove the charges (I have rented one-way before with Avis and I have never
been charged by the mile). She proceeds to quote the EUA again.
Then I decided to explain to her (or remind her I suppose
since she had my account info in front of her) that I was a long time Avis
customer, only rent cars with them and I would’ve expected, since this was a
first time occurrence, the per-mile fees would be waived. She said no they
won’t be waived because…the EUA. Sigh.
I asked her if I could speak to a supervisor and to my
surprise, she was it (or as she stated it “I am in the line of escalation”). At
this point she offered to refund 25% off the per-mile charges. I politely
refused, acknowledging again that I was liable for the charges per the EUA
(really, who reads the travel companies’ EUA anyway?) explaining to her that my
expectation for being a very long-term loyal customer of Avis is that Avis
would forgive this one-time occurrence. She said no that she couldn’t and then
proceeded to quote the EUA again (and no I’m not about to start reading travel
companies’ EUA; I have no time for that. I will continue to “initial here, here
and here, sign here”).
Somehow she thought I was still claiming that I was not
responsible for the charges and offered 50% off so I could still pay for my
part of being not-too-smart-for-not-reading-the-EUA (my words, not hers). I
politely refused and I explained myself again: I was not looking for a
discount, I was looking for Avis to show me they value their loyal customers
and if this was Avis’s position (to fall back on the EUA for minor things) then I
would be electing not to do business with Avis (why should I do business with
such a company?). Guess what she said now? Exactly, she proceeded to quote from
the EUA. After she finished quoting I thanked her for her time and wished her a
great day. The call lasted less than 8 minutes.
After the call I emailed Avis customer service asking them
what the process is to cancel my account. I didn’t include in the email the
reasons why I want to cancel the account hoping someone from Avis would reach
out, hear me out and tell me that I had a bad dream, yes Avis values customer
loyalty over short-term profits, wash and vacuum the cars after each return,
and they will reimburse the mile charges (I hate breakups so giving them one
last chance). We’ll see what happens. They are supposed to reply to my email
within 3-4 days (no, I didn’t get this information from the EUA :); I got it from their
automatic email acknowledging they received my email).
Elver’s Opinion: Avis is
choosing short-term profits (very SMALL short-term profits) in lieu of customer
loyalty. Which is fine by me. That's not how I run my business with my customers but the Avis execs can run their company however they think is
best. I spend my money where I feel valued.
Friday, April 7, 2017
NSX Local Egress
Better late (VERY late) than never I suppose. Here is my
last blog post on the DC DCIs and it involves the Local Egress feature of NSX.
I don’t quite remember all I wrote in my earlier posts (which you can read
here, here and here) since it has been a bit since I wrote them (and I for some
reason can’t get into reading stuff I’ve written); but I believe I explained
what happens to egress traffic to the DC and how it affects the bandwidth
requirements.
In version 6.1 (or was it 6.2?) of NSX-v, VMware introduced
something they are calling Cross vCenter NSX. The architecture consists of
having multiple NSX Managers (one per vCenter) that can exchange some
configuration information as well as share a common set of NSX Controllers. The
good part of it is that it allows you to have multiple vCenters managing
different DCs while stretching a broadcast domain (using VXLAN) among all the
DCs (up to 8 DCs). For this blog post we will assume the DCs are physical and
have DCIs between them.
On the surface this is a cool feature (putting aside that
stretching a broadcast domain over multiple locations violates the 11th
commandment) since it allows all VTEPs (belonging to the same Universal Transport Zone) in all DCs to be fully meshed with each other. However, this by
itself doesn’t address the problem of keeping local layer 3 egress traffic
local, without crossing the DCIs.
To illustrate the problem, please have a look at the diagram
below.
Notice that the local copy of the Logical Router (it is a
Universal Logical Router, which is functionally the same as the Global Logical
Router) will have two paths to reach the intranet: one via the NSX Edge in DC1
and another via the NSX Edge in DC2. By default NSX components are unaware of
locality (location) and thus to the Logical Router (specifically, the LR
Control VM that owns the OSPF/BGP adjacencies with the NSX Edges) both NSX Edges are
equally good to reach external networks and it can choose either NSX Edge to
forward the traffic to. There is a bit more complexity here around route
costs, but we’ll skip that for this blog post.
So to handle the layer 3 egress traffic from the Logical
Router and keep it from going over the DCI, NSX introduces a method to provide
location awareness to the routing table creation. And the method is quite
simple: Assign an ID to all Universal Logical Router entities (the LR Control
VM and ESXi hosts but NOT the NSX Edges) that you want to belong to the same
“location”. Using this location ID, or Locale ID as VMware calls it (which is
128 bits long and defaults to the UUID of the NSX Manager – but it can be
changed), the NSX Controller (the Universal NSX Controller since we are doing
Cross vCenter NSX) will only populate the routing table to those ESXi hosts
that have the same Locale ID as the LR Control VM, as shown in the figure
below.
If you have been following up to this point then you noticed
that we introduced more problems than we actually solved (thus far). In the
diagram above we have two different problems: 1) Only ESXi hosts in DC1 would get
the routing table since they have the same Locale ID as the LR Control VM and
2) The LR Control VM still sees the NSX Edges in both DCs.
To solve problem 1, NSX allows you to deploy a second LR
Control VM from the other NSX Manager (up to 8 LR Control VMs, one per NSX
Manager in the Cross vCenter NSX domain) and you assign that second LR Control
VM the same Locale ID as the ESXi hosts in DC2. The solution to problem 1
is shown in the diagram below.
To solve problem 2, it is necessary to prevent the LR
Control VM in DC1 from forming routing adjacencies with the NSX Edge in DC2
(and vice-versa for our second LR Control VM in DC2). The solution involves
creating a Logical Switch (Universal Logical Switch) per LR Control VM and
connecting only entities (NSX Edge, LR Control VM) that reside in the same “location”
to the Logical Switch, as shown in the diagram below.
And that’s it. The combination of Locale ID, multiple LR Control VMs
and a dedicated Logical Switch per “location” ensures that egress layer 3
traffic from one location doesn’t cross the DCI.
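To make the two matching rules concrete, here is a toy model in Python. It is only a sketch of the logic described in this post, not how NSX implements it: the class names, the Locale ID strings and the switch names are all invented for illustration. Rule 1: a host only gets routes from the LR Control VM with the same Locale ID. Rule 2: an LR Control VM only peers with NSX Edges on its own Logical Switch.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ControlVM:
    name: str
    locale_id: str      # hypothetical short IDs; real Locale IDs are 128 bits
    uplink_switch: str  # the per-location Universal Logical Switch


@dataclass(frozen=True)
class Edge:
    name: str
    uplink_switch: str


def route_source_for_host(host_locale_id, control_vms):
    """Rule 1: return the Control VM whose routes this host receives, or None."""
    for vm in control_vms:
        if vm.locale_id == host_locale_id:
            return vm
    return None


def adjacent_edges(vm, edges):
    """Rule 2: a Control VM only peers with Edges on its own logical switch."""
    return [edge.name for edge in edges if edge.uplink_switch == vm.uplink_switch]


control_vms = [
    ControlVM("lr-control-dc1", "locale-dc1", "uls-transit-dc1"),
    ControlVM("lr-control-dc2", "locale-dc2", "uls-transit-dc2"),
]
edges = [Edge("edge-dc1", "uls-transit-dc1"), Edge("edge-dc2", "uls-transit-dc2")]

vm = route_source_for_host("locale-dc2", control_vms)
print(vm.name, adjacent_edges(vm, edges))
```

With only the first Control VM in the list, a DC2 host would match nothing and get no routes, which is problem 1 from earlier. With both Control VMs on one shared switch, each would peer with both Edges, which is problem 2.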
Elver’s Opinion: I don’t
think I have an opinion today…actually I do (now that I think about it). Locale
ID is a good solution that NSX-v introduces to handle local egress, but I
think there is a missed opportunity here. VMware could allow the NSX Edge to be
assigned a Locale ID (after all the LR Control VM IS a modified NSX Edge) and
avoid the need to have a ULS per location. I’m not sure that I know the reason
why they didn’t design the Locale ID solution in this way (or that they have a good one).
Monday, February 13, 2017
DC Egress Traffic with Stretched Layer 2
In the last blog I spent a lot of words (and funky
formulas) just to say that the DCI circuits between two Data Centers need to be
larger than the biggest DC WAN link plus the inter-DC traffic (which increases
your cost). When stretching layer 2 across DCs, there is not much that can be
done to force DC ingress traffic to come in via the DC WAN link where the
destination workload (VM) is running. However, there are some things you can do
to force the egress traffic to go out the DC where the VM is located (and avoid
using the DCI circuits) to reduce some of the cost associated with the DCI
links. For this blog post I’m going to assume that we have Active/Active DCs.
Dual Default Gateways
When you stretch the layer 2 across the DCs, the default
gateway for the stretched layer 2 segments could be physically located in one
of the two DCs. We don’t care about that use case (that would probably require
a standalone blog post to talk about the cons of this design). Instead let’s
assume that we have default gateway services in both DCs, and to provide
redundancy, we will have two routers as default gateways in each DC, running FHRP
(something like VRRP), as shown in the diagram.
In this design, the VM will forward traffic to its local
default gateway, which in turn will forward the traffic out of its local DC WAN.
For this design to work there must be a mechanism (1) to stop each pair of DC
default gateways from seeing each other (otherwise you won’t get both pairs of
FHRP routers to be Active with the same virtual MAC) and (2) to prevent a VM in
one DC from receiving ARP replies for its default gateway from the other DC’s
pair of default gateways. You could achieve this with access lists (too manual)
or you could stretch the layer 2 with something like Cisco OTV, which has
a built-in mechanism (less manual) to isolate FHRP in each DC.
This design does have some potential issues that must be
taken into account. If each pair of default gateways uses different virtual MAC
addresses when replying to ARP, a VM that moves DCs will lose connectivity
(until it re-ARPs for its default gateway). Also, if both members of a pair of
default gateways go down, you may have to remove the FHRP isolation to allow
the impacted VMs to reach the default gateways in the other DC.
Distributed Default Gateway (Top of Rack)
An alternative to dual default gateways is to stretch the
layer 2 using VXLAN (or another tunneling protocol) from the Top of Rack (ToR).
In this design all ToR will have the Layer 3 boundary and be the default
gateways for their own racks. Every time the ToR gets an ARP request, the ToR
will respond to it (and provide local FHRP isolation), as shown in the diagram
below. Two examples of this are Arista’s DCI with VXLAN and VARP, and Brocade’s IP
Fabric with Anycast gateway (for the time being until Broadcom decides what to
do with Brocade’s Network business).
One advantage of this design over the previous one: there
are a lot more routers (the ToRs) acting as default gateways, with built-in FHRP
isolation. If a ToR dies, only the rack where it resides will be impacted as
opposed to the entire DC. Also, since all ToRs use the same virtual MAC,
when a VM moves DCs, it continues to have uninterrupted Layer 3 connectivity.
One disadvantage is that you would need to fiddle with route advertisements to
ensure the ToRs forward traffic straight up the local DC WAN; this may not be
as easily done as it sounds.
Side note: there
is a variation of this design where the ToR are strictly layer 2 (let’s call
them Leafs) and the distribution switches (henceforth Spines) do the Layer 3, thus
providing the default gateway services.
Distributed Default Gateway (software)
Just like the physical version, you stretch the layer 2
using a tunneling protocol (like VXLAN or GENEVE) but you have a layer 3 process
in each hypervisor that serves as the default gateway (e.g. virtual router).
Each virtual router will have the same IP and virtual MAC (thus VMs can move
between DCs at will) and locally respond to ARP requests. And like the physical
version, you must manipulate routes to force each virtual router to send
traffic to its local DC WAN, as shown in the diagram.
VMware’s NSX-v (distributed logical router) achieves this
functionality. Each logical router is the “same” in each hypervisor except for
their routing tables. Each logical router in each DC will get routes only relevant
to it. This way, each logical router is “forced” to forward traffic using its
local WAN.
Elver’s Opinion: This blog
post should (mostly) conclude my thoughts on stretching Layer 2 across the DCI
(think hard before doing it). At first I thought I would use this blog to also
talk about local egress in NSX (to wrap up my thoughts on the matter), but as I
wrote I realize I would need more space than I thought, so I’ll be writing
another blog post just on local egress in NSX.
Friday, January 13, 2017
Impact of Stretched Layer 2 on DCI
I was not clear in my DC Ingress blog post as to why it
matters which is the entry/exit point for flows coming from/going outside the
DCs for the application that is using the stretched layer 2 in an
infrastructure supporting BC with an Active/Active WAN architecture. One word
can summarize why it matters: cost. The moment you allow traffic not sourced from/destined
to the DCs to go over the link between the Data Centers, that link becomes a
transit segment and you must increase its speed to accommodate the additional
traffic.
Let me put back up the $1 diagram I used last time, but now
showing the connection between the Data Centers (the Data Center Interconnect,
or DCI).
The DCI between the two DCs needs to be big enough to handle
all inter-DC traffic (traffic with source and destination of the DCs; doesn’t
include transit traffic coming from/going outside the DCs). Let’s call traffic
from DC1 to DC2 DCI1, and traffic from DC2 to DC1 DCI2. The speed of your DC1 WAN circuit
must be as big as the amount of ingress traffic in DC1. Same goes for DC2. If
we call the DC ingress traffic DCi1 and DCi2, and we are not doing any sort of
route manipulation, then some DCi1 traffic will transit the DCI to reach VMs in
DC2 and some DCi2 traffic will transit the DCI to reach VMs in DC1.
Since we don’t know how much “some” is going to be, we should
architect for worst-case scenario, like a WAN disruption changing flow patterns, or risk having some traffic dropped before
it goes over the DCI. So this is how much traffic the DCI would have to
handle:
If DCI1 + DCi1 ≥ DCI2 + DCi2 then DCI1 + DCi1, else DCI2 + DCi2
What this little formula says is that the speed of the DCI
link must be as big as the larger of traffic from DC1 or from DC2 (I’m making
the assumption the DCI is symmetrical; none of that asymmetrical bandwidth you
get from your home ISP).
But this formula is not complete. You see, the VMs will be
sending traffic back to the user (egress traffic). Let’s pretend the traffic
flow goes back the same way the ingress traffic came (worst-case again, as we
can't predict what would happen in the WAN). Using DCe1 to represent the VMs
in DC1 replying back to the user and DCe2 to represent the VMs in DC2 replying
back to the user, the formula becomes this:
If DCI1 + DCi1 + DCe1 ≥ DCI2 + DCi2 + DCe2 then DCI1 + DCi1 + DCe1, else DCI2 + DCi2 + DCe2
This formula is a bit long, so let’s do some thinking and
see if we can simplify this. Since we are architecting for worst-case scenario
and we are thinking BC, we can use the larger of DCi1 or DCi2 and call it DCiB.
DCiB will be coming in one of the WAN circuits of the DCs. Let’s give the same
treatment to DCe1 and DCe2, and call it DCeB. DCeB will be going out of one of the
WAN circuits in the DCs.
Elver’s Opinion: Since flow patterns are never static, it is a good idea to make the WAN circuits in both DCs the same size, and be the larger of DCiB or DCeB.
For sizing our DCI, we actually care about the larger of
DCiB or DCeB; let’s call it DCB. The reason for this is that in the event of WAN failure at DC2 all ingress
traffic comes in DC1 and transits over the DCI to DC2, and all egress traffic
will go from DC2 and transit the DCI to DC1 (and the following week, flow patterns reverse). This allows us to replace DCiX + DCeX with DCB.
We now make some substitutions to get this:
If DCI1 + DCB ≥ DCI2 + DCB then DCI1 + DCB, else DCI2 + DCB
Which can be rewritten as:
If DCI1 ≥ DCI2 then DCI1 + DCB, else DCI2 + DCB
All of this writing and formulas just to say that the DCI
speed must be at least as big as your largest DC WAN circuit plus the largest
inter-DC traffic. Or put another way, the DCI circuit speed will be the
inter-DC traffic plus the transit traffic…and transit traffic we established is
the ingress/egress traffic in support of the application that is using the
stretched layer 2.
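For those who prefer code to formulas, the final formula can be sketched in a few lines of Python. The function and parameter names are mine, and the numbers in the example are made up (think of them as Gbps); the arithmetic follows the simplification above: DCB is the larger of DCiB and DCeB, and the DCI must carry the larger inter-DC flow plus DCB.

```python
def dcb(ingress_1, ingress_2, egress_1, egress_2):
    """DCB: the larger of DCiB (biggest ingress) and DCeB (biggest egress)."""
    dcib = max(ingress_1, ingress_2)  # DCiB = larger of DCi1, DCi2
    dceb = max(egress_1, egress_2)    # DCeB = larger of DCe1, DCe2
    return max(dcib, dceb)


def dci_size(inter_1to2, inter_2to1, ingress_1, ingress_2, egress_1, egress_2):
    """If DCI1 >= DCI2 then DCI1 + DCB, else DCI2 + DCB."""
    return max(inter_1to2, inter_2to1) + dcb(ingress_1, ingress_2, egress_1, egress_2)


# Made-up example: DCI1=2, DCI2=3, DCi1=4, DCi2=1, DCe1=6, DCe2=5
print(dci_size(2, 3, 4, 1, 6, 5))  # 3 + 6 = 9
```

The same dcb() value is what I would use to size the WAN circuit in each DC, per the opinion stated earlier.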
The higher the speed of the
DCI circuit(s), the higher the cost. It might not be as obvious, but the higher cost is not just for the
actual circuit. It is also for the hardware that is needed at both ends of the
circuit to support it and the intra-DC hardware required to support any other higher-speed
links that will have to carry the transit traffic.
I’ll write another post to discuss how to minimize the
egress traffic becoming DCI transit traffic. It is quite straightforward
nowadays to accomplish, with most major network vendors providing solutions for
it. I will give special placement to NSX, as it has to achieve it doing
something different from what the other vendors do.
Elver’s Opinion: Yes there
are traffic pattern schemes that would leverage a smaller size DCI than the
last formula above. However those cases don’t occur much in the wild when you
are tasked to provide an infrastructure that supports BC with Active/Active WAN
and stretched layer 2 for applications.