Diagnostic Tools for InfiniBand Fabric Information


There are several diagnostic tools for gathering InfiniBand fabric information. Use man to look up the parameters for each command.

  1. ibnodes – (Show InfiniBand nodes in topology)
  2. ibhosts – (Show InfiniBand host nodes in topology)
  3. ibswitches – (Show InfiniBand switch nodes in topology)
  4. ibnetdiscover – (Discover InfiniBand topology)
  5. ibchecknet – (Validate IB subnet and report errors)
  6. ibdiag – (Scan the fabric using directed route packets and extract all the available information regarding its connectivity and devices)
  7. perfquery – (Find errors on one or more HCA and switch ports)

ibnodes (Show InfiniBand nodes in topology)

ibnodes is a script which either walks the IB subnet topology or uses an already saved topology file and extracts the IB nodes (CAs and switches).

# ibnodes
.....
Ca      : 0x0000000000009b02 ports 2 "c00 HCA-1"
Ca      : 0x0000000000005af0 ports 1 "h00 HCA-1"
Switch  : 0x00000000000000fa ports 36 "IBM HSSM" enhanced port 0 lid 19 lmc 0
.....
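
As a rough illustration of working with this output, the sketch below counts CAs and switches in ibnodes-style text read from standard input. The function name count_ib_nodes is hypothetical, and the parsing assumes every node line starts with "Ca" or "Switch" as in the listing above; other versions of the tools may format their output differently.

```shell
# Count node types in ibnodes-style output read from stdin.
# Assumption: node lines begin with "Ca" or "Switch", as in the sample listing.
count_ib_nodes() {
    awk '
        $1 == "Ca"     { ca++ }
        $1 == "Switch" { sw++ }
        END { printf "CAs: %d, switches: %d\n", ca, sw }
    '
}

# Possible usage on a live fabric (requires the InfiniBand diagnostic tools):
#   ibnodes | count_ib_nodes
```

Fed the three node lines shown above, this would report two CAs and one switch.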

ibhosts (Show InfiniBand host nodes in topology)

ibhosts is a script which either walks the IB subnet topology or uses an already saved topology file and extracts the CA nodes.

# ibhosts
Ca      : 0x0000000000009b02 ports 2 "c00 HCA-1"
Ca      : 0x0000000000005af0 ports 1 "h00 HCA-1"

ibswitches (Show InfiniBand switch nodes in topology)

ibswitches is a script which either walks the IB subnet topology or uses an already saved topology file and extracts the switch nodes.

# ibswitches
Switch  : 0x00000000000003fa ports 36 "IBM HSSM" enhanced port 0 lid 19 lmc 0
Switch  : 0x00000000000003cc ports 36 "IBM HSSM" enhanced port 0 lid 16 lmc 0
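
If only the GUID and LID of each switch are needed, for example to pass a LID to another tool, the listing can be reduced with a small filter. This is a minimal sketch: switch_lids is a hypothetical helper name, and it assumes the GUID is the third whitespace-separated field and that the LID follows the last "lid" keyword on the line, as in the output above.

```shell
# Print "<guid> <lid>" for every switch line read from stdin.
# Assumption: the line layout matches the ibswitches sample above.
switch_lids() {
    awk '
        $1 == "Switch" {
            # Scan from the end so that a quoted description that happens
            # to contain the word "lid" is not picked up by mistake.
            for (i = NF - 1; i >= 1; i--) {
                if ($i == "lid") { print $3, $(i + 1); break }
            }
        }
    '
}

# Possible usage on a live fabric:
#   ibswitches | switch_lids
```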

ibnetdiscover (Discover InfiniBand topology)

ibnetdiscover performs IB subnet discovery and outputs a human-readable topology file. GUIDs, node types, and port numbers are displayed as well as port LIDs and NodeDescriptions. All nodes (and links) are displayed (full topology). Optionally, this utility can be used to list the current connected nodes by node type. The output is printed to standard output unless a topology file is specified.

# ibnetdiscover
#
# Topology file: generated on Mon Jan 28 14:19:57 2013
#
# Initiated from node 0000000000000080 port 0000090300451281

vendid=0x2c9
devid=0xc738
sysimgguid=0x2c90000000000
switchguid=0x2c90000000080(0000000000080)
Switch  36 "S-0002c9030071ba80"         # "MF0;switch-6260a0:SX90Y3245/U1" enhanced port 0 lid 2 lmc 0
[2]     "H-00000000000011e0"[1](00000000000e1)          # "node-c01 HCA-1" lid 3 4xQDR
[3]     "H-00000000000012d0"[1](00000000000d1)          # "node-c02 HCA-1" lid 4 4xQDR
....
....
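
The per-port link lines in this topology output can be turned into a compact table. The sketch below is one way to do that and rests on assumptions: list_switch_links is a hypothetical name, and the parsing expects link lines that start with a bracketed port number and end with a quoted description followed by "lid <n> <width/speed>", exactly as in the sample above.

```shell
# Read ibnetdiscover-style output from stdin and print one tab-separated
# line per link: <local port> <peer description> <peer lid> <width/speed>.
# Assumption: link lines look like the "[2] ... # "node-c01 HCA-1" lid 3 4xQDR"
# lines in the sample above.
list_switch_links() {
    awk '
        /^\[[0-9]+\]/ {
            port = $1
            gsub(/\[|\]/, "", port)          # "[2]" -> "2"
            n = split($0, q, "\"")           # q[4] is the description after "#"
            if (n < 5) next
            split(q[5], t, " ")              # t[1]="lid", t[2]=<lid>, t[3]=<speed>
            printf "%s\t%s\t%s\t%s\n", port, q[4], t[2], t[3]
        }
    '
}

# Possible usage on a live fabric:
#   ibnetdiscover | list_switch_links
```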

ibchecknet (Validate IB subnet and report errors)

# ibchecknet
......
......
## Summary: 31 nodes checked, 0 bad nodes found
##          88 ports checked, 59 bad ports found
##          12 ports have errors beyond threshold
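
For use in a script or a monitoring check, the summary lines can be reduced to an exit status. The following is a sketch only: check_ibchecknet_summary is a hypothetical helper, and it assumes the summary wording shown above ("N bad nodes found", "N bad ports found").

```shell
# Read ibchecknet-style output from stdin, print the bad node and bad port
# counts, and return a non-zero status if either count is above zero.
# Assumption: the summary lines are worded as in the sample above.
check_ibchecknet_summary() {
    awk '
        /^##/ {
            for (i = 2; i < NF; i++) {
                if ($i == "bad" && $(i + 1) == "nodes") bad_nodes = $(i - 1)
                if ($i == "bad" && $(i + 1) == "ports") bad_ports = $(i - 1)
            }
        }
        END {
            printf "bad nodes: %d, bad ports: %d\n", bad_nodes, bad_ports
            if (bad_nodes + bad_ports > 0) exit 1
        }
    '
}

# Possible usage on a live fabric:
#   ibchecknet | check_ibchecknet_summary || echo "fabric needs attention"
```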

perfquery command

The perfquery command is useful for finding errors on one or more HCA and switch ports. You can also use perfquery to reset HCA and switch port counters.

# Port counters: Lid 1 port 1
PortSelect:......................1
CounterSelect:...................0x1400
SymbolErrorCounter:..............0
LinkErrorRecoveryCounter:........0
LinkDownedCounter:...............0
PortRcvErrors:...................13
PortRcvRemotePhysicalErrors:.....0
PortRcvSwitchRelayErrors:........0
PortXmitDiscards:................0
PortXmitConstraintErrors:........0
PortRcvConstraintErrors:.........0
CounterSelect2:..................0x00
LocalLinkIntegrityErrors:........0
ExcessiveBufferOverrunErrors:....0
VL15Dropped:.....................0
PortXmitData:....................199578830
PortRcvData:.....................504398997
PortXmitPkts:....................15649860
PortRcvPkts:.....................15645526
PortXmitWait:....................0
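
Most counters in a healthy port dump are zero, so it can help to show only the error counters that are not. The sketch below does this for output in the format above. The helper name nonzero_error_counters is hypothetical, and the choice of which counters count as errors (names containing Error, Discards, Dropped or Downed) is an assumption made for this example rather than anything defined by perfquery.

```shell
# Read perfquery-style output from stdin and print "<counter>=<value>" for
# every error-type counter with a value above zero.
# Assumption: lines look like "PortRcvErrors:...................13", and a
# counter is treated as an error counter if its name contains one of
# Error, Discards, Dropped or Downed.
nonzero_error_counters() {
    awk -F: '
        {
            name = $1
            val = $2
            gsub(/\./, "", val)              # strip the dot padding
            if (name ~ /Error|Discards|Dropped|Downed/ && val + 0 > 0)
                print name "=" (val + 0)
        }
    '
}

# Possible usage on a live fabric, querying LID 1 port 1:
#   perfquery 1 1 | nonzero_error_counters
```

For the counters listed above, the only line this filter would print is PortRcvErrors=13.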

References:

  1. Appendix B. InfiniBand Fabric Troubleshooting