SR-IOV on Mellanox ConnectX-6 InfiniBand: Struggles & Learnings

Kailash Chander
Published in Geek Culture
Jun 5, 2022 · 2 min read

Key learnings from my struggle to connect a VM to InfiniBand through an assigned virtual function.

I followed the documentation to configure SR-IOV on a system with a Mellanox InfiniBand card. Everything worked as expected, except that the VM had no InfiniBand connectivity because the virtual function's port state was Down.

[root@mymy858 ~]# ibstat
CA 'mlx5_0'
    CA type: MT4124
    Number of ports: 1
    Firmware version: 20.28.4512
    Hardware version: 0
    Node GUID: 0x000000
    System image GUID: 0xbbbbbb
    Port 1:
        State: Down
        Physical state: LinkUp
        Rate: 10
        Base lid: 65535
        LMC: 0
        SM lid: 1
        Capability mask: 0x22221
        Port GUID: 0x000000
        Link layer: InfiniBand
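
The symptom in this output is a port whose physical state is LinkUp while its logical state is Down. Here is a minimal sketch of how that check could be scripted, assuming the "key: value" layout shown above. The names parse_ibstat_ports and link_up_but_port_down are hypothetical helpers, not part of any Mellanox tool, and the sample text is a shortened copy of the output in this post.

```python
import re

# Shortened copy of the ibstat output shown in this post.
SAMPLE_IBSTAT = """\
CA 'mlx5_0'
    CA type: MT4124
    Number of ports: 1
    Port 1:
        State: Down
        Physical state: LinkUp
        Rate: 10
        Base lid: 65535
        SM lid: 1
        Port GUID: 0x000000
        Link layer: InfiniBand
"""

_CA_HEADER = re.compile(r"CA '(.+)'")
_PORT_HEADER = re.compile(r"Port (\d+):")


def parse_ibstat_ports(text):
    """Return {(ca_name, port_number): {field: value}} for every port section."""
    ports = {}
    ca = None
    port = None
    for raw in text.splitlines():
        line = raw.strip()  # indentation is ignored, so flattened output also works
        ca_match = _CA_HEADER.fullmatch(line)
        port_match = _PORT_HEADER.fullmatch(line)
        if ca_match:
            ca, port = ca_match.group(1), None
        elif port_match:
            port = int(port_match.group(1))
            ports[(ca, port)] = {}
        elif port is not None and ":" in line:
            # Only fields inside a port section are kept; CA-level fields are skipped.
            key, _, value = line.partition(":")
            ports[(ca, port)][key.strip()] = value.strip()
    return ports


def link_up_but_port_down(fields):
    """True when the cable/link is up but the port is still logically Down."""
    return fields.get("Physical state") == "LinkUp" and fields.get("State") == "Down"
```

On a host with the ibstat utility installed, the text could come from subprocess.run(["ibstat"], capture_output=True, text=True).stdout instead of the sample string.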

Inside the virtual machine, running ip a may show NO-CARRIER among the flags of the ib0 link.
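
The same check can be done programmatically from the JSON output of iproute2 (ip -j addr show). This is a rough sketch: the helper names are hypothetical, and the sample JSON is hand-written for illustration with only the fields the check needs; it is not captured from the system in this post.

```python
import json

# Hand-written illustrative sample, trimmed to the fields used below.
SAMPLE_IP_JSON = (
    '[{"ifname": "lo", "flags": ["LOOPBACK", "UP", "LOWER_UP"], "operstate": "UNKNOWN"},'
    ' {"ifname": "ib0", "flags": ["NO-CARRIER", "BROADCAST", "MULTICAST", "UP"],'
    ' "operstate": "DOWN"}]'
)


def link_flags(ip_json_text, ifname):
    """Return the set of flags reported for one interface."""
    for link in json.loads(ip_json_text):
        if link.get("ifname") == ifname:
            return set(link.get("flags", []))
    raise KeyError(f"interface {ifname!r} not found")


def has_no_carrier(ip_json_text, ifname="ib0"):
    """True when the interface is administratively up but has no carrier."""
    return "NO-CARRIER" in link_flags(ip_json_text, ifname)
```

Inside the VM, the input text could be produced with subprocess.run(["ip", "-j", "addr", "show"], capture_output=True, text=True).stdout.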

Identified Issues

  • Multiple subnet managers (opensm) were running on different systems in this environment.
  • Some of these subnet managers had non-default priorities configured.
  • We didn’t know which subnet manager was the master, or whether it had virtualisation enabled.

Learnings

  • Do not configure multiple subnet managers. By default, a Mellanox switch comes with an SM on it. Keep this SM as your primary SM. You can set its priority to 15 (the maximum) through the switch UI.
  • Identify how many SMs are running. Run ibdiagnet to find out how many SMs are active and which one has the highest priority. You will find this information in the output of ibdiagnet.
Master SM: Port=0 LID=1 GUID=0x0abcdefghijkl devid=123456 Priority:14 Node_Type=SW Node_Description=MF0;switch-bkbkbk:MSB0012/U1
Standby SM : No Standby SM
  • In most cases, it is best to keep only a single SM running, preferably on the switch.
  • Enable virtualisation on the SM running on the switch by executing the command below in the Mellanox switch console.
ib sm virt enable
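
The SM checks from the list above can be sketched as a small parser for the "Master SM" / "Standby SM" summary lines. This assumes the line format is exactly what the ibdiagnet output in this post shows; other ibdiagnet versions may print it differently. The function names parse_sm_summary and check_sm_layout are hypothetical.

```python
import re

# The two summary lines as they appear in this post.
SAMPLE_SM_SUMMARY = """\
Master SM: Port=0 LID=1 GUID=0x0abcdefghijkl devid=123456 Priority:14 Node_Type=SW Node_Description=MF0;switch-bkbkbk:MSB0012/U1
Standby SM : No Standby SM
"""

_SM_LINE = re.compile(r"^\s*(Master|Standby) SM\s*:\s*(.*)$")
_FIELD = re.compile(r"(\w+)[=:](\S+)")


def parse_sm_summary(text):
    """Return one dict per reported SM, with its role and key/value fields."""
    sms = []
    for line in text.splitlines():
        match = _SM_LINE.match(line)
        if not match:
            continue
        role, rest = match.groups()
        if rest.startswith("No "):
            continue  # e.g. "No Standby SM": nothing to record
        # The description may contain ':' or spaces, so split it off first.
        head, sep, description = rest.partition("Node_Description=")
        fields = dict(_FIELD.findall(head))
        if sep:
            fields["Node_Description"] = description.strip()
        fields["role"] = role
        sms.append(fields)
    return sms


def check_sm_layout(sms):
    """Return a list of problems relative to 'one SM, running on the switch'."""
    problems = []
    masters = [sm for sm in sms if sm["role"] == "Master"]
    standbys = [sm for sm in sms if sm["role"] == "Standby"]
    if len(masters) != 1:
        problems.append(f"expected exactly one master SM, found {len(masters)}")
    if standbys:
        problems.append(f"{len(standbys)} standby SM(s) still running")
    for sm in masters:
        if sm.get("Node_Type") != "SW":
            problems.append("master SM is not running on a switch")
    return problems
```

An empty list from check_sm_layout means the fabric matches the single-SM-on-the-switch layout recommended here. It does not tell you whether virtualisation is enabled on that SM; that still has to be checked on the switch itself.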

Finally, the virtual function link came up and connectivity between the VM and InfiniBand over ib0 was working 🍪

[root@mymy858 ~]# ibstat
CA 'mlx5_0'
    CA type: MT4124
    Number of ports: 1
    Firmware version: 20.28.4512
    Hardware version: 0
    Node GUID: 0x000000
    System image GUID: 0xbbbbbb
    Port 1:
        State: Active
        Physical state: LinkUp
        Rate: 100
        Base lid: 11
        LMC: 0
        SM lid: 1
        Capability mask: 0x22221
        Port GUID: 0x000000
        Link layer: InfiniBand

Summary

I hope this post provides helpful pointers to anyone stuck in a similar situation. Feel free to comment if you have any questions about SR-IOV on InfiniBand.

Good Luck.
