Thursday, April 21, 2016

Software iSCSI and LACP - They Like Each Other

Every now and again I end up recommending LACP for a Software iSCSI deployment in a vSphere environment. And every now and then I get pushback because VMware recommends that LACP not be used, ahem, discourages the use of LACP to carry iSCSI traffic. And every time I have to explain that we need to read the fine print before we take a vendor’s recommendation against the use of a widely used protocol and just run with it.

Everything else being equal, a Software iSCSI initiator with MPIO will provide more efficient load sharing across two or more links to a physical switch than LACP between the vDS and the physical switch. LACP will use at most the VLAN and frame information (L2/L3/L4) to define a “flow”, and pin that “flow” to one of the links in the Port Channel. MPIO uses application (iSCSI) information, such as connection IDs, to determine which link carries the egress iSCSI traffic, which provides more granularity on what a “flow” is (as compared to LACP). MPIO will then pin that narrower-defined (is that even a word?) “flow” to one of the available links.
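To make the “flow” idea concrete, here is a minimal Python sketch. Everything in it is an assumption for illustration: the function names, the CRC32 hash (real switches and the vDS use their own, vendor-specific hash functions), and the round-robin path selection standing in for MPIO. It only shows the principle that a header-based hash sends every frame of one TCP connection down the same link, while a path-aware initiator can spread I/O over all of its paths.

```python
import zlib
from itertools import cycle


def lacp_link(src_ip, dst_ip, src_port, dst_port, num_links):
    """Toy L3/L4 hash (hypothetical): the same 4-tuple always maps to the same link."""
    key = f"{src_ip}|{dst_ip}|{src_port}|{dst_port}".encode()
    return zlib.crc32(key) % num_links


def mpio_round_robin(paths, num_ios):
    """Toy MPIO policy (assumed round robin): consecutive I/Os rotate over the paths."""
    rotation = cycle(paths)
    return [next(rotation) for _ in range(num_ios)]


if __name__ == "__main__":
    # One iSCSI TCP connection: every frame hashes to the same link index.
    print(lacp_link("10.0.0.11", "10.0.0.50", 51234, 3260, num_links=2))
    # Two dedicated uplinks (names are made up): I/Os alternate between them.
    print(mpio_round_robin(["vmnic2", "vmnic3"], num_ios=4))
```

With a single target IP and a single TCP connection, the hash has only one input to work with, so the second link in the Port Channel carries none of that iSCSI traffic.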

But how do we know when "Everything else" is not "equal"? To help decide, we should get additional information on the following:

Can LACP even be configured?
If no, carry on; nothing to see here. Chassis-based servers (server blades) don’t support LACP to the blades. If yes, then read on.

What type of performance does the initiator require?
If you have a situation where the initiator needs higher throughput capacity than what is available over a single link, you probably want to go with MPIO, as it will load share the iSCSI traffic more efficiently than LACP. This will reduce the risk of sending traffic over an over-subscribed link and having the traffic dropped.

How fast must failover take place in case of link failure?
With MPIO, the initiator will fail over the affected egress iSCSI traffic immediately upon detecting the link down. However, the physical switch will not fail over the return traffic until it re-learns the initiator’s iSCSI MAC address, previously seen on the failed link, over one of the remaining links. Thus the physical switch has a direct dependency on the initiator. Contrast that with LACP, where both the initiator and the physical switch will do the failover immediately and independently of each other.

Elver’s Opinion: You should have some sort of business guidance on how many link failures should be tolerated in your uplinks (between the initiator and the first Physical switch). It is not unusual for a 2:1 rule to be applied; have twice the number of links you need so you can tolerate half the links failing.
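The 2:1 rule is simple arithmetic, sketched below in Python. The function name, its parameters, and the sample throughput numbers are all made up for illustration; plug in whatever your business guidance says.

```python
import math


def links_to_provision(required_gbps, link_gbps, redundancy_factor=2):
    """Return (links needed for capacity, links to provision, failures tolerated)."""
    needed = math.ceil(required_gbps / link_gbps)
    provisioned = needed * redundancy_factor
    return needed, provisioned, provisioned - needed


if __name__ == "__main__":
    # Hypothetical initiator needing 15 Gbps over 10 Gbps uplinks.
    print(links_to_provision(15, 10))
```

With these sample numbers, two links cover the required capacity, so the 2:1 rule asks for four uplinks and tolerates two of them failing.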

Does the initiator have sufficient NICs to dedicate to iSCSI?
MPIO is like a 2-year-old (and some adults I know): it does not share with others. Thus to use MPIO, you must have dedicated NICs for it. All other Ethernet traffic must use other NICs. If you don’t have sufficient NICs, you must share and should use LACP. If you want to add the additional NICs, you need to analyze the cost and the level of effort to do so vs. the reward. From experience, odds would tend to favor not getting the extra NICs.

Elver’s Opinion: LACP is not really required here, but if you don’t configure it, the initiator would have fewer hashing options for load sharing all Ethernet traffic (iSCSI and non-iSCSI) among the available uplinks, and the physical switch will not do ANY load sharing of its own.

Does the Physical switch have sufficient ports to dedicate to the iSCSI?
If you need to add port capacity in the switch, you might need to weigh the cost and level of effort required to add that capacity. Many times, it won’t be as simple as just replacing the switch or adding a new one.

How many Arrays (IPs) would the initiator talk to?
The more Arrays the initiator has to communicate with, the closer in load sharing performance LACP will get to MPIO. If you have a single Array (IP), LACP will see a single flow between the initiator and the target (unless the session drops and gets reestablished with a different TCP source port). LACP can be configured to use the destination IP as the load sharing hash, and the more destination IPs there are, the better the distribution over the uplinks.

Elver’s Opinion: Note of caution here. For the hash to do the best load sharing job it can, the Array IPs must vary in the last octet, and how much variance you need depends on the number of active links in the Port Channel. If there are only two links in the Port Channel, you should try to get a similar number of even-numbered and odd-numbered Array IPs.
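The even/odd advice can be sketched in a few lines of Python. This is a simplified model and an assumption on my part: it takes the last octet of the destination IP modulo the number of links, whereas the actual hash is vendor-specific. The function names and the IP addresses are made up.

```python
from collections import Counter


def link_for_dst_ip(dst_ip, num_links):
    """Toy destination-IP hash (assumed): last octet modulo the number of links."""
    last_octet = int(dst_ip.split(".")[-1])
    return last_octet % num_links


def distribution(array_ips, num_links):
    """Count how many Array IPs land on each link index."""
    return Counter(link_for_dst_ip(ip, num_links) for ip in array_ips)


if __name__ == "__main__":
    all_even = ["10.0.0.10", "10.0.0.12", "10.0.0.14", "10.0.0.16"]
    mixed = ["10.0.0.10", "10.0.0.11", "10.0.0.12", "10.0.0.13"]
    print(dict(distribution(all_even, num_links=2)))
    print(dict(distribution(mixed, num_links=2)))
```

Under this model, four even-numbered Array IPs all land on the same link of a two-link Port Channel, while two even and two odd IPs split evenly across both links.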

It looks like a tie, so which should we choose?

Ask Operations. They are the ones that will wake up at 2am to fix problems. The DC trend has been to consolidate as much as possible to maximize the use of physical resources. However, not all Enterprises have had their Operations teams update processes and knowledge transfer to take on a change in direction. Choosing the "wrong" one here may increase OpEx to the business.
