Live Migration Support for GPU with SR-IOV
[email protected] [email protected] [email protected] [email protected]

Agenda – Hypervisor/QEMU
• GPU Virtualization Solutions
• SR-IOV GPU Virtualization Hypervisor View
• Current Migration Status
• Migration Sequence
• Challenge of Hypervisor’s View

Common GPU Virtualization
[Diagram: side-by-side stacks for Pass-through, SR-IOV, MDEV, and Others (Virtio-GPU, rCUDA, others). Labels include VM OS, GIM, host vGPU, MDEV emulation, Hypervisor, GPU PF/VF, and IOMMU (PT mode, remapping).]
• Full GPU capability, full featured
• High performance
• Production and commercialized
• Advanced features: Live Migration support for SR-IOV and MDEV

Virtualization of SR-IOV GPU
• PCIe PF/VF interface
• GPU graphics engine partitioned to support multiple VFs
• GPU video encoder engine partitioned to support multiple VFs
• Host driver (gim.ko) controls VF scheduling
• No display for server GPU
[Diagram: Host and Guest stacks. Labels include libvirt, qemu, Guest GPU driver, User, Kernel, mmio, Doorbell, FB, VFIO, vfio-pci, vfio type1, gim.ko, KVM, IOMMU, PCI VF bars, GPU PF, VF0…15. Caption: GIM control of GPU internal resources: engines, bandwidth, framebuffers, etc.]
Live Migration for SR-IOV GPU
• Collaboration between Alibaba Cloud Virtualization Team and AMD Virtualization Team
• Prototype solution based on AMD GPU MI25
• Supports 3D graphics rendering migration
• Support planned for MM encoding engine migration in the future
• Supports VM with SR-IOV VF checkpoint
• Service downtime: ~500 ms with 1 GB of graphics memory
[Figure: Source VM and Target VM]

Live Migration Evaluation Result
[Chart: Service Downtime (ms), y-axis 0 to 700; workloads: Desktop idle, Unigine Heaven; series: 512MB and 1024MB GPU FB]
Guest Configuration:
• 8 vCPU, 1 GPU
• 2GB System RAM
• GPU FB: 512MB/1024MB

QEMU High Level Migration Sequence
Source QEMU:
• Migration start
• SRC notify start of migration
• Migration log_sync
• Iterate-Round n: get and transfer FB and system page info
• Migrate VMState
• Stop scheduling (service downtime begins)
• Last round: get and transfer FB and system page info
• Get VF state; transfer VF state to target VF
• Migration end; notify end of VF
Target QEMU:
• Migration start
• DST notify start of migration
• Migration log_sync
• Iterate-Round n: transfer FB content to Dst VF; recv FB page info; recv system page info
• Migrate VMState
• Last round: transfer FB content to Dst VF; recv FB page info; recv system page info
• Restore VF state
• Migration end
• Add into scheduler (service downtime ends)

Challenges
• Who should stop first: CPU or GPU
• Memory tracking
  • GPU -> system memory tracking
  • GPU -> Framebuffer tracking
• GPU workload preemption/World Switch
• GPU internal status migration
  • Page table
  • Interrupts
  • Context
  • Registers save/restore

Agenda – GPU
• SR-IOV Architecture
• SR-IOV SW Stack
• SR-IOV Advantage for VF Migration
• SR-IOV VF Migration
• Demo Video
• Challenge

Single-Root I/O-Virtualization (SR-IOV)
• Defines hierarchy of Physical Functions (PF) / Virtual Functions (VF) with a single root complex
• Mix of PFs and VFs
• SR-IOV Capability Structure defines VF Capabilities associated with each PF
• Each VF is uniquely addressable with RID
• VFs have their own Configuration Spaces and Capability Structures
• PCIe endpoint is responsible for VF-PF scheduling and HW resource sharing
• SR-IOV is built on PCIe base spec v1.1 or later
Source from SR-IOV spec 1.1
SI: system image / virtual machine
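
The point that each VF is uniquely addressable with a RID can be made concrete with a small sketch. In the SR-IOV Capability Structure, the PF exposes a First VF Offset and a VF Stride, and the Routing ID of the n-th VF (1-based) is the PF's Routing ID plus the offset plus (n - 1) times the stride, using 16-bit arithmetic. The function names and the example values below (PF at 03:00.0, offset 8, stride 1) are assumptions for illustration, not values taken from the MI25.

```python
def vf_routing_id(pf_rid: int, first_vf_offset: int, vf_stride: int, vf_index: int) -> int:
    """Return the 16-bit Routing ID of VF number vf_index (1-based)."""
    if vf_index < 1:
        raise ValueError("VF numbering starts at 1")
    # Routing ID arithmetic wraps at 16 bits.
    return (pf_rid + first_vf_offset + (vf_index - 1) * vf_stride) & 0xFFFF


def format_bdf(rid: int) -> str:
    """Format a Routing ID as bus:device.function (non-ARI interpretation)."""
    bus = (rid >> 8) & 0xFF
    device = (rid >> 3) & 0x1F
    function = rid & 0x7
    return f"{bus:02x}:{device:02x}.{function}"


# Hypothetical PF at 03:00.0 with First VF Offset = 8 and VF Stride = 1.
PF_RID = 0x0300
first_vf = vf_routing_id(PF_RID, first_vf_offset=8, vf_stride=1, vf_index=1)
last_vf = vf_routing_id(PF_RID, first_vf_offset=8, vf_stride=1, vf_index=16)
print(format_bdf(first_vf))  # 03:01.0
print(format_bdf(last_vf))   # 03:02.7
```

With these assumed values, the 16 VFs (VF0…15 in the earlier diagram, numbered 1 to 16 here) occupy consecutive Routing IDs directly after the offset, which is what lets the IOMMU and vfio-pci treat each VF as a separate PCI function.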
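
The QEMU High Level Migration Sequence slide describes an iterative pre-copy: dirty framebuffer and system pages are synced and transferred round by round while the guest runs, then the VF is taken off the scheduler, a last round is copied, and the VF state is moved to the target. The following is a minimal simulation of that flow, not the prototype's code. `TrackedMemory`, `copy_dirty`, `migrate`, the round limit, and the dirty-page threshold are all hypothetical names and parameters chosen for illustration.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, Set, Tuple


@dataclass
class TrackedMemory:
    """A memory region (framebuffer or system RAM) with a dirty-page log."""
    pages: Dict[int, bytes]
    dirty: Set[int] = field(default_factory=set)

    def write(self, page: int, data: bytes) -> None:
        self.pages[page] = data
        self.dirty.add(page)

    def log_sync(self) -> Set[int]:
        """Return pages dirtied since the last sync and clear the log."""
        dirty, self.dirty = self.dirty, set()
        return dirty


def copy_dirty(src: TrackedMemory, dst: Dict[int, bytes]) -> int:
    """Transfer every page in the dirty log to the destination."""
    dirty = src.log_sync()
    for page in dirty:
        dst[page] = src.pages[page]
    return len(dirty)


def migrate(fb: TrackedMemory, ram: TrackedMemory, vf_state: dict,
            run_guest: Callable[[int], None],
            max_rounds: int = 8, threshold: int = 1) -> Tuple[dict, dict, dict, int]:
    dst_fb: Dict[int, bytes] = {}
    dst_ram: Dict[int, bytes] = {}
    # Migration start: every page counts as dirty for the first round.
    fb.dirty, ram.dirty = set(fb.pages), set(ram.pages)
    rounds = 0
    while rounds < max_rounds:
        # Iterate-Round n: log_sync, then transfer FB and system pages.
        copy_dirty(fb, dst_fb)
        copy_dirty(ram, dst_ram)
        rounds += 1
        run_guest(rounds)  # the guest keeps running and dirtying pages
        if len(fb.dirty) + len(ram.dirty) <= threshold:
            break
    # Stop scheduling the VF: service downtime begins, guest no longer runs.
    copy_dirty(fb, dst_fb)    # last round
    copy_dirty(ram, dst_ram)
    dst_vf_state = dict(vf_state)  # get VF state on source, restore on target
    # Target VF would now be added back into the scheduler.
    return dst_fb, dst_ram, dst_vf_state, rounds


fb = TrackedMemory({i: b"fb0" for i in range(4)})
ram = TrackedMemory({i: b"ram0" for i in range(4)})


def run_guest(round_no: int) -> None:
    # Simulated workload: renders during the first two rounds, then goes idle.
    if round_no < 3:
        fb.write(0, f"frame{round_no}".encode())
        fb.write(1, f"frame{round_no}".encode())
        ram.write(1, f"cmd{round_no}".encode())


dst_fb, dst_ram, dst_state, rounds = migrate(fb, ram, {"regs": [1, 2, 3]}, run_guest)
print(rounds, dst_fb == fb.pages, dst_ram == ram.pages)
```

The sketch stops the guest before the final copy, so the last round is exactly the window that contributes to service downtime. It leaves open the question raised under Challenges of whether the CPU or the GPU should stop first.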