Saturday, February 6, 2016

10 Network Experts in a Room, 11 Opinions on SDN

I’m shocked, shocked I tell you (I’m not actually). In a room full of Network Experts (most of them bloggers, some of whom I know and respect), there was disagreement on what Software Defined Networking (SDN) means; I recommend you watch the video. I landed on the video via Ivan Pepelnjak, who referenced John Herbert, who linked to the video. I wrote an earlier piece about what SDN is and is not (in case you have not yet read the twelfth opinion).

The participants in the room did not appear to fully understand the forces pushing for SDN in the Data Center, as their opinions seemed to be very Network centric. So why SDN? The primary driver behind SDN is that traditional Networks are the last component of the Data Center Infrastructure (Compute, Network and Storage) that continues to hinder Application and Business Services evolution. All Data Center workloads will be virtualized, and there is no way to deploy an efficient Cloud Infrastructure to support those Virtualized Workloads with a Traditional Network. And good luck having a zero-touch Disaster Recovery Plan with just a Traditional Network.

Many people think of the Cloud as a way to run your Virtualized Workloads on someone else’s servers and disks. Although there is some truth to that (and leaving the distinction between Private and Public Clouds aside), it is my opinion that we should look at the Cloud from the perspective of the Clients (both Public and Private sectors, henceforth Business). The Client’s job boils down to either 1) growing the metric by which the Business measures success (such as revenue or service delivery) or 2) increasing Business efficiency (such as reducing expenses or providing more services with the same or fewer resources). It is the same job the Client has been doing since, well, always. The Cloud is a Client tool that facilitates the successful execution of the Client’s job by reducing the perceived technology complexity of the IT Infrastructure while improving its efficiency and minimizing time-to-market of Business Services.

You know what has little perceived technology complexity in the Data Center? Compute and Storage (Compute more so than Storage, but the gap is rapidly narrowing). You know what is really efficient in the Data Center today? Compute and Storage. And you know what offers minimal time-to-market of Business Services? You are correct! Compute and Storage.

SDN removes the perceived complexity of the Traditional Network by facilitating the deployment of networks that do not have some of the inherited limitations of Traditional Networks. For example, there is no way to get a Layer 2 loop in an SDN solution, thus you won’t have to worry about configuring STP within the SDN. Have you heard of an OSPF Virtual Link? You will never see that in an SDN because the idea of a broken OSPF Backbone just doesn’t exist.

Since I mentioned STP, let me use it to also point out that in an SDN without STP you get far more efficient utilization of your links, since no link sits idle in a blocking state. Better yet, if Virtual Workloads are in the same Hypervisor, traffic between those Virtual Workloads will get all of its Network Services without leaving the Hypervisor. It is hard to see how you get more Network-efficient than that. If the workloads are physical and they connect to the same SDN Top of Rack (ToR), then the traffic between those workloads would get all the Network Services in the same ToR.

As for time-to-market, all I need to say is this: deploy 50 identical networks, each consisting of 20 routers and 87 Layer 2 domains, and allow internal traffic in each of them to reach the same external entity. GO. That’s right, it would take you months with a Traditional Network to accomplish this task. With SDN, minutes. Seriously, minutes.
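To make the "minutes" claim a little more concrete, here is a minimal sketch of why the task shrinks once networks are described as data. It is not any vendor's API: `NetworkSpec`, `build_network`, `build_all` and the `"external-entity"` label are all hypothetical names invented for illustration. The point is only that 50 identical networks become one template and a loop, which an SDN controller could then realize.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class NetworkSpec:
    """Hypothetical declarative description of one network (not a real SDN API)."""
    name: str
    routers: tuple
    l2_domains: tuple
    external_peer: str


def build_network(index, routers=20, l2_domains=87, external_peer="external-entity"):
    """Stamp out one network from the template; all names are illustrative."""
    name = f"net-{index:02d}"
    return NetworkSpec(
        name=name,
        routers=tuple(f"{name}-r{n:02d}" for n in range(1, routers + 1)),
        l2_domains=tuple(f"{name}-l2-{n:02d}" for n in range(1, l2_domains + 1)),
        external_peer=external_peer,
    )


def build_all(count=50):
    """Generate the full set of identical networks as plain data."""
    return [build_network(i) for i in range(1, count + 1)]


if __name__ == "__main__":
    networks = build_all()
    total_routers = sum(len(n.routers) for n in networks)
    total_l2 = sum(len(n.l2_domains) for n in networks)
    print(len(networks), total_routers, total_l2)
```

With the defaults above, the loop describes 1,000 routers and 4,350 Layer 2 domains. Doing the same by hand, device by device, is where the months go.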

Disaster Recovery
Many people still think of a DR (not to be confused with Dominican Republic) event as a sinkhole swallowing the primary Data Center (or if the Data Center is in Florida, we add to the list a hurricane dropping your Data Center in the sea). Thinking about it this way omits the Client’s perspective. To the Client, a DR event may just be not having a Business Critical Service available. The reason for that Business Critical Service not being available could be as simple as a Tier in the Application becoming unreachable due to the Tier's default gateway having a tantrum.

TCP/IP was developed with the assumption that a subnet resides in a single physical location. If you wanted the subnet to be reachable via two or more physical locations, about the only tool you had at your disposal was to stretch the Layer 2 where the subnet resides (I will not discuss the alternative of using BGP to advertise the subnet from multiple locations as the point below would be the same).

Traditional Networks do not allow you to automatically migrate subnets. So when a workload needs to be moved to, or recovered at, a new location (such as the Backup Data Center), the Recovery Time Objective (RTO) almost always includes involvement of a Network Administrator; which involves a Change Control; which potentially leads to some red tape; which adds more time to the RTO. SDN does not really natively understand what a location is and therefore can facilitate automatic recovery from just about any DR situation the Business may encounter.

For example, suppose a hypervisor’s number comes up, taking down with it all the Business Critical Virtual Workloads (bad design by the way, but it happens). The Virtualization Plane would then recover all those Virtual Workloads on new hypervisors. By the time those Virtual Workloads are fully powered up, the SDN solution would have updated all necessary components in the hypervisors with the necessary Network information to provide the Network Services to those Virtual Workloads. SDN doesn’t care where those hypervisors are located, and not a single Network Administrator’s involvement was required (nor any change controls). That translates to a potentially much lower RTO.
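The recovery flow just described can be sketched as a toy model. This is an assumption-laden illustration, not how any particular SDN product works: `ToyController`, its methods, and the hypervisor and workload names are all hypothetical. What it tries to show is that the network profile is keyed by workload rather than by location, so it follows the workload to whichever hypervisor it lands on.

```python
class ToyController:
    """Hypothetical controller: network profiles follow workloads, not locations."""

    def __init__(self, policies):
        self.policies = dict(policies)  # workload -> network profile
        self.location = {}              # workload -> hypervisor

    def place(self, workload, hypervisor):
        """Record where a workload currently runs."""
        self.location[workload] = hypervisor

    def programmed_state(self):
        """What each hypervisor would be programmed with, derived from placement."""
        state = {}
        for workload, hv in self.location.items():
            state.setdefault(hv, {})[workload] = self.policies[workload]
        return state

    def recover(self, failed_hv, spare_hvs):
        """Re-place every workload from the failed hypervisor onto the spares."""
        if not spare_hvs:
            raise ValueError("at least one spare hypervisor is required")
        victims = sorted(w for w, hv in self.location.items() if hv == failed_hv)
        for i, workload in enumerate(victims):
            self.place(workload, spare_hvs[i % len(spare_hvs)])
        return victims


if __name__ == "__main__":
    policies = {
        "web-1": {"subnet": "10.0.1.0/24", "gateway": "10.0.1.1"},
        "app-1": {"subnet": "10.0.2.0/24", "gateway": "10.0.2.1"},
        "db-1": {"subnet": "10.0.3.0/24", "gateway": "10.0.3.1"},
    }
    ctl = ToyController(policies)
    for name in policies:
        ctl.place(name, "hv-1")  # the bad design: everything on one hypervisor
    ctl.recover("hv-1", ["hv-3", "hv-4"])
    print(ctl.programmed_state())
```

Nothing in `recover` touches a subnet or a gateway address. Only the placement changes, which is why no Network Administrator or change control shows up in this flow.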

Circling back to the default gateway having a tantrum, let's assume a design where not all members of the same Tier run in the same hypervisor/off the same ToR (you shouldn't place all your eggs in one basket). SDN has a distributed Data Plane among all hypervisors/ToRs, and a default gateway having a tantrum will only affect the members of the Tier that are running in the hypervisor/off the same ToR where the faulty default gateway resides. All other Tier members will continue to receive normal Network Services and the Client never even experiences a DR event.
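A small sketch of that blast-radius argument, under stated assumptions: the placement map, the host names, and the `impacted` function are hypothetical, and the "centralized" case simply models a single shared default gateway for the whole Tier.

```python
def impacted(placement, faulty_host, distributed=True):
    """Return the Tier members that lose their default gateway.

    placement: dict of Tier member -> hypervisor/ToR it runs on (illustrative).
    distributed=True models a gateway instance on every hypervisor/ToR, so a
    fault only hits members local to faulty_host. distributed=False models one
    shared gateway, so a fault hits every member of the Tier.
    """
    if distributed:
        return sorted(m for m, host in placement.items() if host == faulty_host)
    return sorted(placement)


if __name__ == "__main__":
    # Tier members spread across hosts (eggs in more than one basket).
    placement = {"web-1": "hv-a", "web-2": "hv-b", "web-3": "hv-c", "web-4": "hv-a"}
    print(impacted(placement, "hv-a"))                     # distributed gateway
    print(impacted(placement, "hv-a", distributed=False))  # single shared gateway
```

In the distributed case only the members on the faulty host are hit, and the rest of the Tier keeps serving. In the shared-gateway case the whole Tier goes dark, which is the outage the Client would actually notice.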

Another benefit of SDN, as it relates to DR, is that SDN allows you to rethink the architecture to support DR events. Since SDN does not have native location awareness, there is no such thing as Primary and Backup Data Centers. They are just Data Centers in Active/Active configurations. Heck, you can extend this idea to multiple Data Centers and have them all be Active. This can potentially translate to smaller Data Center footprints (square meters/feet) with increased geographical resiliency.

Elver’s Opinion: We can't answer where we are going unless we understand where we are coming from. And to understand that, we need feedback from others outside of our field. I feel the conversation among the smart people in the video about what SDN is could’ve benefited from having more Virtualization and Storage Experts (and some Application Developers too) in the room.
