CLI madness part III – Useful vsish and esxcli commands for iSCSI environments
With most ESXi deployments using the software iSCSI initiator, it’s important to know which NICs are dedicated to the vmk port carrying your iSCSI storage traffic. Remember, it is not just the ESX host or the storage array – the network in between is crucial to your success in building out a solid foundation (“private cloud” – COUGH, damn marketing jargon :-P). Here’s an example of a NIC driver setting tweak that could impact performance: adjusting the interrupt throttle rate of the Intel igb driver could boost performance for 1G Intel NICs (check out this VMW KB).
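If you want to poke at that igb setting yourself, something along these lines should do it – a rough sketch only; the InterruptThrottleRate parameter name comes from the Intel igb driver docs, so verify it (and the values) against the KB and your driver version before changing anything, and remember module options need a host reboot to take effect:
# esxcli system module parameters list -m igb   <— see which options the igb module exposes and what they are currently set to
# esxcli system module parameters set -m igb -p "InterruptThrottleRate=8000,8000"   <— example values only; use whatever the KB recommends for your ports
Now here’s the fun part: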
Q. How the hell do I find out what NIC driver my iSCSI vmnics are using?
A. This is totally NOT obvious in the VI Client – I tried clicking through “Network Adapter” and other places in the VI Client and couldn’t find that shit anywhere. If you know of an easy way that I overlooked, please post a comment so I stand corrected. With the CLI, you can find it pretty easily:
# esxcli iscsi networkportal list
In the output, you can easily see which vmhba# the sw/iscsi initiator maps to and which NIC driver it rides on. From there, you can determine whether specific ESX patches contain updates to the driver your iSCSI storage traffic depends on, or whether there are tunable parameters that could help boost performance or resolve issues (like the one in the recent VMW KB above).
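If you just want the driver-related bits of that output, piping it through grep in the ESXi shell works fine – a rough sketch; the exact field names can vary a little between ESXi builds, so tweak the pattern to match whatever your output shows:
# esxcli iscsi networkportal list | grep -i -E 'adapter|vmknic|driver'   <— trims the output down to the adapter/vmknic/NIC driver lines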
The same info can be found in vsish as well – first find out which vmnic the sw/iscsi vmhba is tied to. The quickest way is in the VI Client: click on the ESX host -> “Configuration” tab -> “Storage Adapters” -> click on the iSCSI Software Adapter -> “Properties” -> “Network Configuration” tab.
Once you’ve found the vmnic, do the following in the ESXi shell:
# vsish
/> cat /net/pNics/vmnic2/properties
The output clearly shows the driver the vmnic uses, its version, as well as the NIC capabilities.
Server class NICs nowadays have a pretty rich set of capabilities to offload tasks such as packet checksumming from the OS. If you care to know what each capability means, this site has a pretty good reference for each one. If I were you, I’d simply check to make sure the TSO capability exists and is activated.
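If you’d rather not drop into the interactive vsish shell for that check, a one-liner does the same thing (assuming your build’s vsish accepts one-shot commands via -e; if not, just run the interactive cat above and eyeball it):
# vsish -e cat /net/pNics/vmnic2/properties | grep -i -E 'tso|cap'   <— quick sanity check that TSO shows up among the activated capabilities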
Another interesting place to look is the pNIC stats:
/> cd /net/pNics/vmnic2
/net/pNics/vmnic2/> cat stats
Pay attention to the Tx/Rx error and CRC error counters. A non-zero value for these counters doesn’t necessarily mean your network is screwed up. The typical rule of thumb is no more than 1 CRC error for every 5,000 packets received, and no more than 1 transmit error for every 5,000 packets transmitted. If you do see a large number of errors, try swapping out the network adapter cable first, trigger some I/O, then check the counters again. The typical culprit for NIC adapter errors is layer 1 (cables, NIC/switch ports), so do the usual problem isolation and test one factor at a time. Layer 2 can also trigger these errors – a common culprit is a speed/duplex mismatch between the NIC and the switch port. esxtop is useful for seeing real-time performance data and network packet stats on the ESX host, but tracking historical stats is much easier with vsish.
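If you want a crude way to watch those counters over time straight from the ESXi shell, a simple loop like the one below works (same assumption as earlier that your build’s vsish supports one-shot -e commands; swap in your own vmnic, grep pattern, and interval):
# while true; do date; vsish -e cat /net/pNics/vmnic2/stats | grep -i -E 'err|crc|drop'; sleep 300; done   <— prints the error/CRC/drop counters every 5 minutes so you can spot them creeping up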
Now that you have checked the stats for the NICs that your iSCSI storage relies on, here’s a quick command to list the iSCSI targets (the ones backing your datastores/LUNs) that have been presented to your ESX host from the iSCSI array:
# esxcli iscsi adapter target list
Pay attention to the last column, “Last Error”, to make sure there are no errors associated with any of the targets on your ESX host.
Before wrapping up this post, here’s a little food for thought on further tuning for iSCSI environments.
# esxcli iscsi adapter param get -A vmhba37   <— this command gets the list of parameters for the sw/iscsi adapter; I’ve noticed a few that look interesting. There are three things I plan to try out next to see what effect they have on the ESXi environment, if any… if you have tried these out, feel free to share your results!
Name                Current     Default     Min  Max       Settable  Inherit
------------------  ----------  ----------  ---  --------  --------  -------
FirstBurstLength    262144      262144      512  16777215  true      false
MaxBurstLength      262144      262144      512  16777215  true      false
MaxRecvDataSegment  131072      131072      512  16777215  true      false
MaxOutstandingR2T   1           1           1    8         true      false
LoginTimeout        5           5           1    60        true      false
NoopOutInterval     15          15          1    60        true      false
NoopOutTimeout      10          10          10   30        true      false
RecoveryTimeout     10          10          1    120       true      false
DelayedAck          true        true        na   na        true      false
HeaderDigest        prohibited  prohibited  na   na        true      false
DataDigest          prohibited  prohibited  na   na        true      false
Food for thought 1: increase MaxOutstandingR2T (this parameter controls how many R2T (Ready to Transfer) PDUs can be outstanding while earlier PDUs have not yet been acknowledged)
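I haven’t flipped this one yet, but based on the param get output above, the matching set command should look something like this (the value 8 is simply the max from the table, not a recommendation – test with real I/O before and after):
# esxcli iscsi adapter param set -A vmhba37 -k MaxOutstandingR2T -v 8   <— bump the outstanding R2Ts on the sw/iscsi adapter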
Food for thought 2: the LUN queue depth for the sw/iscsi adapter is also tunable (the default is 128 on ESXi 5); per the vSphere documentation, increasing it reduces the number of LUNs you can support in the environment
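This knob lives on the iscsi_vmk module rather than in the adapter params above – per VMware’s KB on changing the software iSCSI LUN queue depth, something like the following should do it (64 is just an example value, and a reboot is needed for it to take effect):
# esxcli system module parameters set -m iscsi_vmk -p iscsivmk_LunQDepth=64   <— set the sw/iscsi LUN queue depth; pick a value that matches your array vendor’s guidance
# esxcli system module parameters list -m iscsi_vmk   <— confirm the parameter stuck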
Food for thought 3: the PSP_RR (Round Robin) policy is recommended for iSCSI storage so all active paths get utilized; by default, the switching point is 1000 IOPS per path. If we reduce it to 10, I/O would move to the next path sooner, at the cost of some CPU overhead.
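Untested on my end as well, but the IOPS switching point is set per device through the NMP round robin deviceconfig namespace – a sketch (the naa ID below is a placeholder for one of your iSCSI LUNs):
# esxcli storage nmp device set -d naa.xxxxxxxxxxxxxxxx --psp=VMW_PSP_RR   <— make sure the LUN is on round robin first
# esxcli storage nmp psp roundrobin deviceconfig set -d naa.xxxxxxxxxxxxxxxx --type=iops --iops=10   <— switch paths every 10 I/Os instead of the default 1000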