LACP and vSphere (ESXi) hosts: not a very good marriage

I receive a lot of questions from customers about whether or not they should implement LACP, so without further ado:

In this blog I’m going to discuss whether it is a good idea to configure LACP between your ESXi hosts and the physical switched network. The Link Aggregation Control Protocol delivers enhanced features for a Link Aggregation Group (LAG), ensuring a stable connection between two network devices over multiple physical links by exchanging LACPDUs. Sounds good, right? And yes, from a network perspective it surely does! But from a vSphere ESXi host perspective there are other things to think about, which I will cover in the following chapters.

ESXi virtual switches and LACP

ESXi has two options when it comes to virtual networking: the vSphere Standard Switch (VSS) and the Distributed vSwitch (DVS). The VSS is local to ESXi, which means it can only be managed from the ESXi host itself, whereas the DVS can only be managed through the vCenter Server and its configuration is distributed to all connected ESXi hosts using Host Proxy Switches. The DVS offers several improvements over a VSS, such as LACP support; the VSS does not support LACP.

When configuring LACP you have to configure the LAG on the DVS and manually add the physical NICs (pNICs or vmnics) of each individual host to the LAG uplinks (or script it, as you should).

The number of uplink ports in a LAG has to be configured globally, and that number of LAG uplinks is distributed to all Host Proxy Switches. This means that when two LAG uplinks are configured at the DVS/vCenter Server level, all hosts connected to that DVS will receive two LAG uplinks. The connected vSphere ESXi hosts cannot deviate from that number of LAG uplinks.

The LAG as a logical link is handled as a DVS uplink itself, so it shows up next to the normal DVS uplinks in the teaming and failover configuration of your port groups.

The benefit is that the LAG as a logical link can utilize all the available bandwidth, and you can add extra bandwidth by adding additional physical links. It also helps in case of a failed physical connection: the failure is automatically detected by LACP and the failed link is removed from the logical link. This is what we call a Layer 2 high-availability solution: the logical path is controlled by LACP and automatically scales when needed, creating an optimal path between two devices.

Let me clarify the “logical link can utilize all the available bandwidth” feature: LACP uses IP hashing as its load-balancing algorithm, which means that a single network flow cannot exceed the bandwidth of a single physical connection.

In the worst case you can end up with two elephant flows and one mouse flow: the two elephant flows can land on the same physical link and the mouse flow on the other. The two elephant flows then have to share the available bandwidth of that single physical connection, resulting in poor network performance, while the mouse flow has plenty of bandwidth available.
That’s just the nature of IP hashing.
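To make the IP-hash behaviour concrete, here is a minimal Python sketch (my own illustration, not VMware’s actual hashing code) that picks an uplink by hashing the source/destination IP pair. Whichever uplink a flow hashes to, it stays there, so two elephant flows that happen to hash to the same vmnic will share that single link’s bandwidth:

# Minimal sketch of IP-hash uplink selection (illustrative only,
# not VMware's actual implementation).
import zlib

UPLINKS = ["vmnic0", "vmnic1"]  # the two physical links in the LAG

def select_uplink(src_ip: str, dst_ip: str) -> str:
    """Pick an uplink by hashing the source/destination IP pair."""
    key = f"{src_ip}-{dst_ip}".encode()
    return UPLINKS[zlib.crc32(key) % len(UPLINKS)]

flows = {
    "elephant-1": ("10.0.0.10", "10.0.1.20"),
    "elephant-2": ("10.0.0.11", "10.0.1.20"),
    "mouse-1":    ("10.0.0.12", "10.0.1.30"),
}

for name, (src, dst) in flows.items():
    # Each flow is pinned to whatever the hash returns; if both elephant
    # flows map to the same vmnic, they share that link's bandwidth.
    print(name, "->", select_uplink(src, dst))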

Comparing Virtual Port ID and LACP

A DVS and/or VSS offers multiple load-balancing options. By default, load balancing based on Virtual Port ID (sometimes called source MAC pinning) is used on both the VSS and the DVS. It has the same drawback as IP hashing, but the good news is that with this type of load balancing you do not have to configure the physical switch for Layer 2 availability (LACP/IP hash): a virtual machine is pinned to an uplink and it stays there as long as no failure occurs. When a failure on the pNIC occurs, the VM is pinned to another available pNIC and (when configured properly) a RARP packet is sent to inform the physical switch, so it can learn the MAC address on the new interface, minimizing the outage time.
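As a mental model, a simplified sketch of the Virtual Port ID pinning and failover behaviour described above could look like this (my own illustration, assuming a simple modulo over the active uplinks; the real ESXi teaming logic is more involved):

# Simplified model of Virtual Port ID pinning and failover
# (illustrative only, not the actual ESXi teaming code).

active_uplinks = ["vmnic0", "vmnic1"]

def pin_vm(virtual_port_id: int, uplinks: list[str]) -> str:
    """Pin a VM's virtual port to one uplink; it stays there until a failure."""
    return uplinks[virtual_port_id % len(uplinks)]

# Normal operation: the VM on virtual port 7 is pinned to one vmnic.
print(pin_vm(7, active_uplinks))            # -> vmnic1

# vmnic1 fails: remove it from the active list and re-pin the VM.
# (ESXi then sends a RARP so the switch learns the MAC on the new port.)
active_uplinks.remove("vmnic1")
print(pin_vm(7, active_uplinks))            # -> vmnic0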

So let’s compare LACP/IP hash and source MAC pinning/Virtual Port ID: they both offer the possibility to utilize the available bandwidth and both offer a form of physical interface resiliency. The downside is that for LACP you need to manually configure the LACP bundles on the physical switch AND on the DVS AND manually add the pNICs to the LAG (read: a lot of manual, error-prone configuration).

Comparing LBT and LACP

A DVS offers the possibility to use the Load Based Teaming (LBT) load-balancing option (a good word for Scrabble). You can see LBT as the enhanced Virtual Port ID load-balancing option: it acts the same, but with one major difference: it monitors the bandwidth utilization every 30 seconds and redistributes the MAC addresses (VMs) over the available pNICs when the bandwidth utilization of a pNIC exceeds 75%. This spreads the bandwidth utilization evenly over all available pNICs and overcomes the “pinning” problem that the “normal” Virtual Port ID load-balancing option has. With LACP it isn’t possible to distribute the load evenly over the available physical links: a given source/destination IP pair stays pinned to a physical uplink. So in this battle of VMware vs LACP, this point goes to VMware, as it offers a more advanced feature than LACP.
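To illustrate the rebalancing idea, here is a rough sketch using the 30-second/75% numbers mentioned above (my own simplification, not VMware’s actual scheduler):

# Rough sketch of the Load Based Teaming idea: on every check (every
# 30 seconds in ESXi), move a VM away from any uplink that is above
# 75% utilization. Illustrative only, not the actual implementation.

THRESHOLD = 0.75

def rebalance(pinning: dict[str, str], load: dict[str, float]) -> dict[str, str]:
    """pinning: VM -> uplink, load: uplink -> utilization between 0.0 and 1.0."""
    for uplink, util in load.items():
        if util > THRESHOLD:
            vms_on_uplink = [vm for vm, up in pinning.items() if up == uplink]
            if vms_on_uplink:
                least_loaded = min(load, key=load.get)
                pinning[vms_on_uplink[0]] = least_loaded   # move one VM over
    return pinning

pinning = {"vm-a": "vmnic0", "vm-b": "vmnic0", "vm-c": "vmnic1"}
load = {"vmnic0": 0.90, "vmnic1": 0.20}
print(rebalance(pinning, load))   # vm-a moves from vmnic0 to vmnic1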

Monitoring LACP on a vSphere ESXi host

So let’s continue with an operational task: monitoring the LACP logical link.
Why is this important, you may ask yourself? The answer is quite simple: the more operational tasks are needed, the more complex the solution and the easier it is to make mistakes.

Keep in mind that with both the Virtual Port ID and Load Based Teaming (LBT) options, there is no configuration needed on the physical switch to enable the distribution of the VM workloads over the available physical NICs. Both teaming policies follow the networking standards and utilize all physical links by default (as long as they are active). Less configuration equals easier configuration, which lowers the operational complexity.

In the example below, I’m showing what is needed to monitor the LACP logical link between an ESXi host and a switch:

In this example I’m assuming that the network cables are connected correctly and that LACP is also configured correctly.

Let’s start from the physical switch side.
From a physical switch perspective you can utilize the following command:

show port-channel brief

This command will show you the status of a LACP port channel:

[Screenshot: output of the show port-channel brief command]

As you can see, LACP is configured as port channel 2 (Po2), it is a Layer 2 port channel and it is up (status: SU). The configured ports are Fa0/10 and Fa0/11, and both ports are active (status: P).

If you have multiple ESXi hosts, each host has its own port channel ID. With a large number of port channels it becomes a daunting task: you have to start tracking down the ports and the connected ESXi host, and that piece of information is not shown by this command. You can utilize CDP and/or LLDP or, when those are not available, consult your (hopefully not outdated) network documentation to track down the correct ESXi host. #prone-to-error

Monitoring LACP from a vSphere ESXi perspective is even a little bit harder.
The LACP configuration is provisioned from vCenter and distributed to the proxy switches hosted on the ESXi hosts: it is centrally managed. On the other hand, the LACP port channel status has to be monitored from the ESXi host itself. So if you have a large number of ESXi hosts, you have to log into each individual ESXi host to check the status.

You cannot check the LACP status from the GUI; you have to SSH into the ESXi host. And enabling SSH on an ESXi host isn’t a best practice in the first place!
After logging in you have to execute the following command:

esxcli network vswitch dvs vmware lacp status get

You will receive a result like this:

[Screenshot: output of the lacp status get command]

Not a very nice overview, if you ask me.
To summarize, you can see that both physical links (vmnic0 and vmnic1) have a status flag of “SA”:
– they use slow LACPDUs.
– they are active.
You can state that monitoring the LACP LAG isn’t easy with VMware ESXi. The reason behind this is that LACP was only integrated into ESXi because network admins kept asking VMware to implement it. It feels like it was implemented as an afterthought, as VMware ESXi offers good alternatives.

Using LACP with multiple physical switches.

In the above example we connected the ESXi host to one physical switch, which isn’t a best practice either: if the physical switch dies, you end up empty-handed. Connecting an ESXi host to multiple switches introduces new challenges in regard to LACP.

LACP uses PDUs (Protocol Data Units) to establish a logical connection between two devices over multiple links. The devices exchange PDUs to determine which links should be placed into the logical link (and which should not). When an ESXi host is connected to multiple physical switches, the switches aren’t able to process these PDUs correctly: they work independently and have separate management planes, which makes them unusable for a single LAG. There is a solution to this problem called Multi-Chassis Link Aggregation (MLAG). With MLAG, two independent switches can build a LAG to a single device (or to another MLAG pair) by acting as a single device.
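The underlying reason is that LACP only bundles links whose partner advertises the same system ID (and key) in its LACPDUs. A small sketch (with made-up MAC addresses as system IDs) shows why links towards two independent switches end up in two separate groups instead of one LAG, and why MLAG, which presents one shared system ID, fixes this:

# Sketch: why one LAG cannot span two independent switches.
# LACP only aggregates links whose partner advertises the same system ID;
# the MAC addresses below are made up for illustration.
from collections import defaultdict

# host uplink -> partner system ID seen in the received LACPDUs
received_pdus = {
    "vmnic0": "aa:aa:aa:aa:aa:01",   # switch A
    "vmnic1": "bb:bb:bb:bb:bb:02",   # switch B (independent switch)
}

groups = defaultdict(list)
for uplink, partner_id in received_pdus.items():
    groups[partner_id].append(uplink)

print(dict(groups))
# Two different partner system IDs -> two separate bundles, not one LAG.
# With MLAG the switches present a single shared system ID, so vmnic0
# and vmnic1 would land in the same group.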

The switches need to be interconnected, which enables them to:
– see if their neighbor is still alive (usually over a separate physical link called the keepalive link);
– send/receive packets to/from the other physical link (via the neighboring switch).

Below is a schematic overview of an MLAG to another switch.

[Diagram: MLAG overview]

In the network world MLAGs are very common these days, as they overcome traditional network problems caused by loops. They still require manual configuration AND the configuration on both MLAG peer switches must be exactly the same, which again introduces operational challenges, because you now have three devices that need to be configured correctly in order for this to work. The number of dependencies keeps growing (with MLAG), and without an actual need, as Virtual Port ID and LBT do not need a LAG and/or MLAG. You can connect your ESXi host to two (or more) independent switches, use either Virtual Port ID or LBT, and you are good to go.

Spanning Tree?

A little bit off-topic.
You may now ask yourself: but what about the traditional network problems caused by loops with ESXi? The answer is simple: the virtual switch within ESXi prohibits loops by default. In the physical network you use Spanning Tree to disable redundant links, which protects against loops. With ESXi there is no need for Spanning Tree, and the switch ports facing the ESXi host should be set to “portfast” mode, which essentially disables Spanning Tree for the ESXi host.

Conclusion.

With the load-balancing algorithms available in vSphere there is no need for a complex, error-prone LACP configuration, as VMware offers good (and sometimes better) options from a configuration, utilization and availability perspective.

So why do some customers still use LACP, you might ask? Usually it is because of a lack of (VMware) knowledge, combined with the good experiences network admins have had with LACP in the past.

LACP between switches and bare-metal servers is still a very good option, but VMware offers enhancements that make LACP unnecessary for vSphere environments.

3 thoughts on “LACP and vSphere (ESXi) hosts: not a very good marriage”

    1. I disagree with ALL of your statements:
      – Starting with “performance of LACP is better”: LACP uses IP hashing, where a single flow cannot exceed the bandwidth of a physical link (which is the same for Virtual Port ID and LBT). So no performance gain for LACP here.
      LBT even offers performance improvements, as it allows traffic flows to be migrated to a less-utilized physical link. This point goes to LBT.
      – Regarding the “detection of upstream network issues”: LACP only uses LACPDU handshakes for its upstream device issue detection, while LBT depends on the link state (by default) or can also utilize beacon probing. Again, both solutions provide upstream device issue detection.
      – “LACP is also the VSAN recommendation because VSAN cannot utilize LBT”, this statement is completely wrong as per https://docs.vmware.com/en/VMware-vSphere/6.7/com.vmware.vsphere.vsan-planning.doc/GUID-031F9637-EE29-4684-8644-7A93B9FD8D7B.html
      LBT (Route based on physical network adapter load) is supported and the recommended option. This recommendation is part of the VMware Validated Design: https://docs.vmware.com/en/VMware-Validated-Design/5.1/sddc-architecture-and-design/GUID-A60671C3-BBBE-4A87-A55F-0243A003F4F7.html

  1. This is a really useful article that addresses some of the issues I’m currently trying to resolve around upstream link failures going undetected by VMs. I’ll do some more reading on LBT and beacon probing. Many thanks!
