Hyper-V dVMQ

Deep Dive: Configuring dVMQ in Hyper-V

Virtual Machine Queue (VMQ) is a mechanism for mapping physical queues in a NIC to the virtual NIC (vNIC) in a VM partition (parent or guest). This mapping makes the handling of network traffic more efficient: the parent partition spends less CPU time on it and network latency drops. Without VMQ, all traffic for a vSwitch on a particular network interface is handled by a single CPU core, which limits total throughput on a 10Gb interface to ~2.5-4.5 Gbit/s (results will depend on the speed of the core and the nature of the traffic). VMQ is especially helpful on workloads that process a large amount of traffic, such as backup or deployment servers. For dVMQ to work with RSS, the parent partition must be running Server 2012R2; otherwise RSS cannot coexist with VMQ.

VMQs are a finite resource. A VMQ is allocated when a virtual machine is powered on. A queue will be assigned to each vNIC with VMQ enabled until all of the VMQs are exhausted. That assignment will remain in place until the VM is powered off or migrated to another Hyper-V node. If you have more vNICs in your environment than VMQs on your physical adapter, then you should only enable VMQ on the vNICs that will be handling the most traffic.

Static VMQ

[Image: NovsVMQ]
This image represents a Hyper-V host configured without VMQ in place. All network traffic for all the VMs is handled by a single core. With static VMQ (available in 2008R2), a VMQ is assigned to a specific CPU and stays on that CPU regardless of workload.

Dynamic VMQ

[Image: dVMQ]
This image introduces both Dynamic Virtual Machine Queue (dVMQ) and the load balancing mode for NIC teaming. These features are new to Server 2012. dVMQ is very similar to VMQ with one major difference: dynamic VMQ scales the number of CPU cores used to handle the VMQs across a pool of CPU cores. When the network workload is light, all the dVMQs are handled by a single CPU core, but as the network workload increases so does the number of CPU cores used. With dVMQ in 2012, each queue can only use one CPU core at a time. Also, a vNIC can only have one VMQ assigned to it.

Sum Mode/Min Mode

In our video we recommend Hyper-V Port load balancing AND Switch Independent teaming for a Load Balancing/Failover (LBFO) team configuration on switches supporting Hyper-V workloads. This combination of load balancing mode and teaming mode puts the vSwitch in Sum mode. This means that we have the sum of all the VMQs from the NICs in the LBFO team. In the left image above we have 2 NICs in the team, each with 2 VMQs. With the team in Sum mode we have a total of 4 VMQs to allocate to vNICs. If we use an AddressHash OR Switch Dependent configuration on the team, it will be placed in Min mode. In the right image above, the same hardware now only offers 2 VMQs for vNICs. This is because inbound traffic for a particular vNIC may come in on any network interface in the team, so the vNIC needs a queue on every NIC. This may be a desirable configuration if you have very few vNICs on a vSwitch (vNIC count equal to or less than the fewest VMQs on any NIC in the team).
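
The Sum/Min arithmetic above can be sketched in a few lines of PowerShell. Get-TeamVmqCount is a hypothetical helper written for this post, not a built-in cmdlet, and the mode names it accepts are simplified labels for the options discussed here.

```powershell
# Hypothetical helper (not a built-in cmdlet): estimates how many VMQs a team
# can offer to vNICs, following the Sum mode / Min mode rule described above.
function Get-TeamVmqCount {
    param(
        [Parameter(Mandatory)][int[]]$QueuesPerNic,
        [Parameter(Mandatory)][ValidateSet('SwitchIndependent','SwitchDependent')][string]$TeamingMode,
        [Parameter(Mandatory)][ValidateSet('HyperVPort','AddressHash')][string]$LoadBalancing
    )
    $stats = $QueuesPerNic | Measure-Object -Sum -Minimum
    # Only Switch Independent + Hyper-V Port gives Sum mode
    if ($TeamingMode -eq 'SwitchIndependent' -and $LoadBalancing -eq 'HyperVPort') {
        return [int]$stats.Sum
    }
    # Everything else is Min mode: limited by the NIC with the fewest queues
    return [int]$stats.Minimum
}

# Two NICs with 2 VMQs each, as in the images
Get-TeamVmqCount -QueuesPerNic 2,2 -TeamingMode SwitchIndependent -LoadBalancing HyperVPort   # 4
Get-TeamVmqCount -QueuesPerNic 2,2 -TeamingMode SwitchIndependent -LoadBalancing AddressHash  # 2
```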

Virtual Receive Side Scaling

Server 2012R2 introduces Virtual Receive Side Scaling (vRSS). This feature works with VMQ to distribute the CPU workload of receive traffic across multiple CPU cores in the VM. This effectively eliminates the single-core bottleneck we otherwise experience with a single vNIC. To take full advantage of this feature both the host and guest need to be running 2012R2. Enabling vRSS does come at the cost of extra CPU load in the VM and the parent partition. For this reason, vRSS should only be enabled on vNICs that will exceed 2.5 Gbit/s on a regular basis.
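
As a rough sketch, vRSS is switched on from inside the guest by enabling RSS on the vNIC. The adapter name 'Ethernet' below is an assumption; substitute the name of the vNIC in your VM.

```powershell
# Run inside the 2012R2 guest. 'Ethernet' is an assumed adapter name.
Get-NetAdapterRss -Name 'Ethernet'      # check the current RSS state
Enable-NetAdapterRss -Name 'Ethernet'   # enable RSS on the vNIC
```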

Base and Max CPU

The Base and Max CPU properties configure which CPU cores VMQ will use. The base processor is the first core in the group and max is the size of the group. For example, with Hyper-Threading disabled, base=2 max=4 would assign cores 2-5. VMQ will not leverage Hyper-Threading (HT): if HT is enabled, only even-numbered cores are used. For example, with HT enabled, base=2 max=4 would assign the even-numbered cores 2-8 (2, 4, 6 and 8). Whenever possible it is best to choose a base value greater than 0 (or greater than 1 with HT enabled). Creating CPU bottlenecks on core 0 has caused performance issues in some implementations.
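
To sanity check a Base/Max pair before applying it, the rule above can be written as a small function. Get-VmqProcessorList is a hypothetical helper for illustration only; it just lists the cores a given pair would cover and does not query or change any adapter.

```powershell
# Hypothetical helper: lists the cores covered by a Base/Max pair.
# With HT enabled only every second logical processor is counted.
function Get-VmqProcessorList {
    param(
        [Parameter(Mandatory)][int]$BaseProcessor,
        [Parameter(Mandatory)][ValidateRange(1, 1024)][int]$MaxProcessors,
        [switch]$HyperThreading
    )
    $step = if ($HyperThreading) { 2 } else { 1 }
    0..($MaxProcessors - 1) | ForEach-Object { $BaseProcessor + $_ * $step }
}

(Get-VmqProcessorList -BaseProcessor 2 -MaxProcessors 4) -join ','                  # 2,3,4,5
(Get-VmqProcessorList -BaseProcessor 2 -MaxProcessors 4 -HyperThreading) -join ','  # 2,4,6,8
```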

Requirements and Configuration for VMQ

The following are required to use VMQ:
-Server 2008R2 (static VMQ), Server 2012 (dVMQ), Server 2012R2 (dVMQ+vRSS)
-Physical NICs must support VMQ
-BelowTenGigVmqEnabled = 1 for 1Gb NICs (VMQ is enabled automatically on 10Gb NICs)

Follow these steps from the video to enable VMQ:
0. Enable VMQ for 1Gb NICs if required
–HKEY_LOCAL_MACHINE\SYSTEM\CurrentControlSet\services\VMSMP\Parameters\BelowTenGigVmqEnabled = 1
1. Install the latest NIC driver/firmware
2. Enable VMQ in the driver for the NIC (Process will vary by NIC model and manufacturer)
3. Determine values for Base and Max CPU based on hardware configuration
4. Assign values for Base and Max CPU
5. Configure VMs
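
The steps above map roughly onto the following commands. This is a sketch under assumptions: 'NIC1' and 'VM01' are placeholder names, the Base/Max values are examples only, and step 2 may have to be done in the vendor's driver settings instead, depending on the NIC.

```powershell
# Step 0: only for 1Gb NICs
Set-ItemProperty -Path 'HKLM:\SYSTEM\CurrentControlSet\services\VMSMP\Parameters' `
    -Name 'BelowTenGigVmqEnabled' -Value 1 -Type DWord

# Step 2: enable VMQ on the physical NIC ('NIC1' is a placeholder)
Enable-NetAdapterVmq -Name 'NIC1'

# Step 3: review the queue count and current Base/Max settings
Get-NetAdapterVmq -Name 'NIC1'

# Step 4: assign Base and Max CPU (example values, pick your own)
Set-NetAdapterVmq -Name 'NIC1' -BaseProcessorNumber 2 -MaxProcessors 4

# Step 5: enable VMQ on the vNICs of a VM ('VM01' is a placeholder); a weight of 0 disables it
Set-VMNetworkAdapter -VMName 'VM01' -VmqWeight 100
```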

Recommendations for VMQ/dVMQ/vRSS

-Use Switch Independent + Hyper-V Port to ensure the vSwitch is in Sum mode
-Always assign a base CPU other than CPU0 to ensure best performance and resiliency
-Remember that with Hyper-Threading enabled only even-numbered cores are used when assigning Base/Max CPU
-Multiplexor adapters will show a Base:Max of 0:0; do not change this
-Configure Base and Max CPU for each NIC with as little overlap as possible
-Only assign Max Processor values of 1, 2, 4 or 8
–It is OK for Max Processor to extend past the last CPU core or the number of VMQs on the NIC

Troubleshooting VMQ

Here are a few things we have seen in the field when supporting VMQ:

  • Most issues with VMQ are resolved by updating to the latest version of the NIC driver!
  • VMQ appears enabled but is showing 0 queues. This may even only impact a single port on a multiport NIC.
    • *RssOrVmqPreference = 1 must be set on all NICs that will leverage VMQ
    • HKEY_LOCAL_MACHINE\SYSTEM\CurrentControlSet\Control\Class\{4D36E972-E325-11CE-BFC1-08002BE10318}\[GUID of NIC Port]\*RssOrVmqPreference = 1
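
A quick way to check for the zero-queue symptom and the preference setting is sketched below. 'NIC1' is a placeholder name. The sketch also assumes the driver exposes *RssOrVmqPreference as an advanced property; if it does not, Set-NetAdapterAdvancedProperty will fail and the value has to be set at the registry path listed above.

```powershell
$nic = 'NIC1'   # placeholder adapter name

# Is VMQ enabled, and how many receive queues does the port report?
Get-NetAdapterVmq -Name $nic

# Which queues are currently allocated, and to which VMs?
Get-NetAdapterVmqQueue -Name $nic

# Read the current preference, if the driver exposes it
Get-NetAdapterAdvancedProperty -Name $nic -RegistryKeyword '*RssOrVmqPreference' -ErrorAction SilentlyContinue

# Prefer VMQ over RSS on this port
Set-NetAdapterAdvancedProperty -Name $nic -RegistryKeyword '*RssOrVmqPreference' -RegistryValue 1
```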

If you have experienced an issue in your environment that is not listed here, let me know so I can add it to the list!

PowerShell Code to Auto Configure VMQ Base/Max Processor

ConfigureVMQ.ps1

# Proposes a VMQ Base/Max processor for every NIC in every LBFO team.
# As written this is a dry run: it only prints the values it would set.
$Teams = Get-NetLbfoTeam
$proc  = Get-WmiObject -Class Win32_Processor
$cores = $proc | Measure-Object -Property NumberOfCores -Sum | Select-Object -ExpandProperty Sum
$LPs   = $proc | Measure-Object -Property NumberOfLogicalProcessors -Sum | Select-Object -ExpandProperty Sum
# Hyper-Threading is assumed to be on when logical processors outnumber physical cores
$HT = $cores -ne $LPs
function SetVMQsettings ($NIC, $base, $max){
    # Uncomment the next line to apply the settings instead of only reporting them
    #$NIC | Set-NetAdapterVmq -BaseProcessorNumber $base -MaxProcessors $max
    Write-Host "$($NIC.Name):: Proc:$base, Max:$max"
}
#$LPs = 4 #testing var
#$HT = $false #testing var
foreach ($team in $Teams){
    $VmqAdapters = Get-NetAdapterVmq -Name ($team.Members)
    $VMQindex = 0
    foreach ($VmqAdapter in $VmqAdapters){
        $VmqAdapterVMQs = $VmqAdapter.NumberOfReceiveQueues
        #$VmqAdapterVMQs = 2 #testing var
        # Cores usable by VMQ (with HT only every other logical processor counts)
        $usableCores = $LPs / (1 + $HT)
        # Max is the smaller of (usable cores minus core 0) and the NIC's queue count
        $max = ($usableCores - 1), $VmqAdapterVMQs | Sort-Object | Select-Object -Index 0
        # -ge rather than the original -gt: with queues equal to cores the step
        # calculation below could produce a base of 0
        if ($VMQindex -eq 0 -or $VmqAdapterVMQs -ge $usableCores){
            # First team NIC, or queues meet or exceed the core count: start just past core 0
            $base = 1 + [int]$HT
        }
        else{
            # More cores than queues, so the remaining NICs can be spread across the cores
            $StepSize = [int](($usableCores - $VmqAdapterVMQs - 1) / ($VmqAdapters.Count - 1)) * $VMQindex + 1
            $base = $StepSize * (1 + $HT)
        }
        SetVMQsettings -NIC $VmqAdapter -base $base -max $max
        $VMQindex++
    }
}

Resources

TechNet Networking Blog: Deep Dive VMQ Part 1, 2, 3