Friday, January 13, 2017

Impact of Stretched Layer 2 on DCI

I was not clear in my DC Ingress blog post as to why it matters which is the entry/exit point for flows coming from/going outside the DCs for the application that is using the stretched layer 2 in an infrastructure supporting BC with an Active/Active WAN architecture. One word can summarize why it matters: cost. The moment you allow traffic not sourced to/destined in the DCs to go over the links between the Data Centers, that link becomes a transit segment and you must increase its speed to accommodate the additional traffic.

Let me put back up the $1 diagram I used last time, but now showing the connection between the Data Centers (the Data Center Interconnect, or DCI).


The DCI between the two DCs needs to be big enough to handle all inter-DC traffic (traffic with source and destination of the DCs; doesn’t include transit traffic coming from/going outside the DCs). Lets call traffic from DC1 to DC2 DCI1 and DC2 to DC1 DCI2. The speed of your DC1 WAN circuit must be as big as the amount of ingress traffic in DC1. Same goes for DC2. If we call the DC ingress traffic DCi1 and DCi2, and we are not doing any sort of route manipulation, then some DCi1 traffic will transit the DCI to reach VMs in DC2 and some DCi2 traffic will transit the DCI to reach VMs in DC1.

Since we don’t know how much “some” is going to be, we should architect for worst-case scenario, like a WAN disruption changing flow patterns, or risk having some traffic dropped before it goes over the DCI. So this is how much traffic the DCI would have to handle:

If DCI1 + DCi1  DCI2 + DCi2 then DCI1 + DCi1, else DCI2 + DCi2


What this little formula says is that the speed of the DCI link must be as big as the larger of traffic from DC1 or from DC2 (I’m making the assumption the DCI is symmetrical; none of that asymmetrical bandwidth you get from your home ISP).

But this formula is not complete. You see, the VMs will be sending traffic back to the user (egress traffic). Let’s pretend the traffic flow goes back the same way the ingress traffic came (worst-case again, as we can't predict what would happen in the WAN). Using DCe1 to represent the VMs in DC1 replying back to the user and DCe2 to represent the VMs in DC2 replying back to the user, the formula becomes this:

If DCI1 + DCi1 + DCe1  DCI2 + DCi2 + DCe2 then DCI1 + DCi1 + DCe1, else DCI2 + DCi2 + DCe2
This formula is a bit long, so let’s do some thinking and see if we can simplify this. Since we are architecting for worst-case scenario and we are thinking BC, we can use the larger of DCi1 or DCi2 and call it DCiB. DCiB will be coming in one of the WAN circuits of the DCs. Let’s give the same treatment to DCe1 and DCe2, and call it DCeB. DCeB will be going out of one of the WAN circuits in the DCs.

Elver’s Opinion: Since flow patterns are never static, it is a good idea to make the WAN circuits in both DCs the same size, and be the larger of DCiB or DCeB.

For sizing our DCI, we actually care about the larger of DCiB or DCeB; let’s call it DCB. The reason for this is that in the event of WAN failure at DC2 all ingress traffic comes in DC1 and transits over the DCI to DC2, and all egress traffic will go from DC2 and transit the DCI to DC1 (and the following week, flow patterns reverse). This allows us to replace DCiX + DCeX for DCB.

We now make some substitutions to get this:

If DCI1 + DCB  DCI2 + DCB then DCI1 + DCB, else DCI2 + DCB

Which can be rewritten as:

If DCI1  DCI2 then DCI1 + DCB, else DCI2 + DCB

All of this writing and formulas just to say that the DCI speed must be at least as big as your largest DC WAN circuit plus the largest inter-DC traffic. Or put another way, the DCI circuit speed will be the inter-DC traffic plus the transit traffic…and transit traffic we established is the ingress/egress traffic in support of the application that is using the stretched layer 2.

The higher the speed of the DCI circuit(s), the higher the cost. It might not be as obvious, but the higher cost is not just for the actual circuit. It is also for the hardware that is needed at both ends of the circuit to support it and the intra-DC hardware required to support any other higher-speed links that will have to carry the transit traffic.

I’ll write another post to discuss how to minimize the egress traffic becoming DCI transit traffic. It is quite straightforward nowadays to accomplish, with most major network vendors providing solutions for it. I will give special placement to NSX, as it has to achieve it doing something different from what the other vendors do.

Elver’s Opinion: Yes there are traffic pattern schemes that would leverage a smaller size DCI than the last formula above. However those cases don’t occur much in the wild when you are tasked to provide an infrastructure that supports BC with Active/Active WAN and stretched layer 2 for applications.

No comments:

Post a Comment