Every now and again I end up recommending LACP for a
Software iSCSI deployment in a vSphere environment. And every now and then I
get pushed back because VMware recommends that LACP not be used, ahem, discourages
the use of LACP to carry iSCSI traffic. And every time I have to explain that
we need to read the fine print before we take a vendor’s recommendation against
the use of a widely used protocol and just run with it.
Everything else being equal, a Software iSCSI initiator using MPIO will load
share traffic across two or more links to a physical switch more efficiently
than LACP between the vDS and the physical switch will. LACP uses at most the
VLAN and frame information (L2/L3/L4) to define a “flow”, and pins that “flow”
to one of the links in the Port Channel. MPIO uses application (iSCSI)
information, such as connection IDs, to choose the link for egress iSCSI
traffic, which gives a much more granular definition of a “flow” than LACP’s.
MPIO then pins that narrower-defined (is that even a word?) “flow” to one of
the available links.
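To make the difference in granularity concrete, here is a minimal Python
sketch. This is not VMware code; the CRC hash, the vmnic names, and the
connection IDs are hypothetical stand-ins for whatever the switch and the
initiator actually compute.

```python
# Minimal sketch of the flow-granularity difference (hypothetical values,
# not the actual vDS or switch hashing algorithms).
from zlib import crc32

UPLINKS = ["vmnic0", "vmnic1"]

def lacp_pick(src_ip, dst_ip, src_port, dst_port, vlan=100):
    """LACP-style: hash L2/L3/L4 fields of the frame and pin the whole flow."""
    key = f"{vlan}|{src_ip}|{dst_ip}|{src_port}|{dst_port}".encode()
    return UPLINKS[crc32(key) % len(UPLINKS)]

def mpio_pick(iscsi_connection_id):
    """MPIO-style: the initiator chooses a path per iSCSI connection,
    so it can deliberately place each connection on a different uplink."""
    return UPLINKS[iscsi_connection_id % len(UPLINKS)]

# One initiator talking to one target IP: LACP sees a single flow...
print(lacp_pick("10.0.0.11", "10.0.0.50", 51000, 3260))
# ...while MPIO can still use both uplinks, one per iSCSI connection/path.
print([mpio_pick(cid) for cid in range(2)])
```

The point is not the hash itself but who gets to define a flow: the Port
Channel hash sees only packet headers, while the initiator sees individual
iSCSI connections.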
But how do we know when “Everything else” is not “equal”? To help decide, we
should get additional information on the following:
Can LACP even be configured?
If no, carry on; nothing to see here. Chassis-based servers (server blades)
don’t support LACP to the blades. If yes, then read on.
What type of performance does the initiator require?
If the initiator needs more throughput capacity than a single link can
provide, you probably want to go with MPIO, as it will load share the iSCSI
traffic more efficiently than LACP. This reduces the risk of sending traffic
over an oversubscribed link and having it dropped.
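A quick back-of-the-envelope check makes the point; the link speed and the
required throughput below are made-up numbers:

```python
# Back-of-the-envelope capacity check (hypothetical numbers).
link_gbps = 10          # speed of each uplink
links = 2               # uplinks available for iSCSI
required_gbps = 14      # what the initiator needs to push

# A single iSCSI session hashed by LACP stays on one member link.
lacp_single_flow_cap = link_gbps
# MPIO can drive every dedicated link with its own iSCSI connection.
mpio_cap = link_gbps * links

print(f"LACP, one flow : {lacp_single_flow_cap} Gbps (short of {required_gbps})")
print(f"MPIO, {links} paths: {mpio_cap} Gbps")
```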
How fast must failover take place in case of link failure?
With MPIO, the initiator will fail over the affected egress iSCSI traffic
immediately upon detecting the link down. However, the physical switch will
not fail over the return traffic until it re-learns the initiator’s iSCSI MAC
address from the failed link over one of the remaining links. Thus the
physical switch has a direct dependency on the initiator. Contrast that with
LACP, where both the initiator and the physical switch fail over immediately
and independently of each other.
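Here is a toy timeline of that dependency; every millisecond figure is
invented for illustration, not measured:

```python
# Toy failover timeline (all times are made-up, for illustration only).
initiator_detects_link_down_ms = 10   # initiator notices the dead uplink
switch_relearns_mac_ms = 5            # after the initiator transmits on the new link

# MPIO: the switch waits for the initiator's first frame on the surviving
# link before it relearns the MAC, so its recovery chains off the initiator's.
mpio_recovery = initiator_detects_link_down_ms + switch_relearns_mac_ms

# LACP: both ends see the member link go down and reconverge independently;
# overall recovery is whichever side takes longer, not the sum.
switch_detects_link_down_ms = 10
lacp_recovery = max(initiator_detects_link_down_ms, switch_detects_link_down_ms)

print(f"MPIO recovery ~{mpio_recovery} ms, LACP recovery ~{lacp_recovery} ms")
```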
Elver’s Opinion: You should have some business guidance on how many link
failures must be tolerated in your uplinks (between the initiator and the
first physical switch). It is not unusual for a 2:1 rule to be applied: have
twice the number of links you need so you can tolerate half of them failing.
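For example, a quick sizing sketch of the 2:1 rule; the throughput and link
speed are hypothetical:

```python
# Sizing sketch for the 2:1 rule (hypothetical numbers).
needed_gbps = 20        # throughput the workload actually needs
link_gbps = 10          # speed of each uplink

links_for_throughput = -(-needed_gbps // link_gbps)   # ceiling division -> 2
links_with_2_to_1 = links_for_throughput * 2           # survive losing half -> 4

print(f"Provision {links_with_2_to_1} links so {links_for_throughput} can fail "
      f"and {needed_gbps} Gbps is still available.")
```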
Does the initiator have sufficient NICs to dedicate to iSCSI?
MPIO is like a 2-year-old (and some adults I know): it does not share with
others. Thus, to use MPIO you must have NICs dedicated to it; all other
Ethernet traffic must use other NICs. If you don’t have sufficient NICs, you
must share, and should use LACP. If you want to add the additional NICs, you
need to weigh the cost and level of effort of doing so against the reward.
From experience, the odds tend to favor not getting the extra NICs.
Elver’s Opinion: LACP is not really required here, but if you don’t configure
it, the initiator will have fewer hashing options for load sharing all
Ethernet traffic (iSCSI and non-iSCSI) among the available uplinks, and the
physical switch will not do ANY load sharing of its own.
Does the physical switch have sufficient ports to dedicate to iSCSI?
If you need to add port capacity in the switch, you might need to weigh the
cost and level of effort required to add that capacity. Many times, it won’t
be as simple as just replacing the switch or adding a new one.
How many Arrays (IPs) would the initiator talk to?
The more Arrays the initiator has to communicate with, the closer LACP’s load
sharing performance gets to MPIO’s. If you have a single Array (IP), LACP will
see a single flow between the initiator and the target (unless the session
drops and gets re-established with a different TCP source port). LACP can be
configured to use the destination IP in its load sharing hash, and the more
destination IPs there are, the better the distribution over the uplinks.
Elver’s Opinion: A note of caution here. For the hash to do the best load
sharing job it can, the Array IPs must have some variance in the last octet
relative to the number of active links in the Port Channel. If there are only
two links in the Port Channel, you should try to get a similar number of
even-numbered and odd-numbered Array IPs.
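Here is a minimal sketch of why the last octet matters over a two-link Port
Channel; the modulo hash and the Array IPs are illustrative, not the exact
vDS or switch algorithm:

```python
# Sketch: destination-IP hashing over a 2-link Port Channel
# (illustrative modulo hash, not the exact switch/vDS algorithm).
links = 2

def pick_link(array_ip):
    last_octet = int(array_ip.split(".")[-1])
    return last_octet % links

# Two even-numbered Array IPs: both flows land on the same member link.
print([pick_link(ip) for ip in ["10.0.0.20", "10.0.0.22"]])   # [0, 0]

# One even and one odd last octet: the flows spread across both links.
print([pick_link(ip) for ip in ["10.0.0.20", "10.0.0.21"]])   # [0, 1]
```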
It looks like a tie, so which should we choose?
Ask Operations. They are the ones who will wake up at 2 AM to fix problems.
The data center trend has been to consolidate as much as possible to maximize
the use of physical resources. However, not all Enterprises have had their
Operations teams update their processes and knowledge to take on such a change
in direction. Choosing the “wrong” one here may increase OpEx for the
business.