Hey there and welcome back! Excited to dive into today’s topic!
If you read my last post, you know I recently fought my way out of a nasty Proxmox boot loop right after a power outage. That whole mess kicked off because I was trying to configure GPU passthrough for an NVIDIA RTX 3060.
As promised in response to a reader asking “What if I genuinely need the graphics card on my server?” – I dove back in after completing my backups. I wanted to figure out why my server crashed and how to actually configure GPU passthrough cleanly and safely.
This is a walk-through of how I diagnosed and fixed the boot loops, cleaned up my configs, and successfully isolated my NVIDIA GPU. These are the exact steps and lessons learned so you can avoid the same pitfalls.
- Context and Hardware
- The Symptoms and The Core Problem
- Initial (Problematic) Configuration
- Step-by-Step Fix
- Post-Reboot Verification
- Lessons Learned
- Conclusion
Context and Hardware
Before we jump into the commands, here is the hardware setup I am working with:
- Host OS: Proxmox VE
- Platform: Intel Xeon E5 v2 (C600/X79 chipset)
- Host Console GPU: ASPEED BMC GPU (built into the motherboard)
- Passthrough GPU: NVIDIA RTX 3060
My Goal: Use the RTX 3060 exclusively for GPU passthrough to virtual machines, keep the host usable via the local console (using the ASPEED GPU), and avoid unsafe hacks that cause boot failures.
The Symptoms and The Core Problem
After installing the NVIDIA card and enabling passthrough, my host would completely fail to boot after a power outage.
To bypass the boot loop, I had to drop into a root shell (by appending init=/bin/bash in GRUB), rename my VFIO configuration files, and comment out the VFIO modules to stop them from loading. This clearly pointed to a boot-time VFIO / IOMMU / interrupt configuration problem.
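For the curious, that rescue procedure can be sketched roughly like this. It is a minimal sketch, assuming the VFIO options live in files matching *vfio*.conf under /etc/modprobe.d and that the VFIO modules are listed in /etc/modules; the function name disable_vfio_configs is my own, purely for illustration. From an init=/bin/bash shell the root filesystem is normally mounted read-only, so it has to be remounted read-write first.

```shell
#!/bin/bash
# Sketch: park VFIO configs so the next boot skips them.
# The optional argument is the filesystem root; it defaults to / and can
# point at a scratch copy if you want to rehearse first.
disable_vfio_configs() {
    local root="${1:-/}"
    local f
    # modprobe only reads files ending in .conf, so renaming is enough.
    for f in "$root"/etc/modprobe.d/*vfio*.conf; do
        [ -e "$f" ] || continue
        mv "$f" "$f.disabled"
    done
    # Comment out vfio module lines in /etc/modules (backup kept as .bak).
    if [ -f "$root/etc/modules" ]; then
        sed -i.bak 's/^[[:space:]]*vfio/#&/' "$root/etc/modules"
    fi
}

# In the rescue shell (assumed workflow):
#   mount -o remount,rw /
#   disable_vfio_configs /
#   update-initramfs -u -k all
```

Renaming instead of deleting keeps the old files around, which made it much easier for me to see afterwards what had actually been configured.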

Initial (Problematic) Configuration
To understand the fix, you need to see the mistakes I made initially. These settings appeared to work at first, but ultimately made the system unstable.
- Unsafe Interrupts: I had a file (/etc/modprobe.d/iommu_unsafe_interrupts.conf) with options vfio_iommu_type1 allow_unsafe_interrupts=1. This forces VFIO to operate without proper interrupt remapping guarantees. This is generally unnecessary on enterprise hardware and highly unstable.
- Messy GRUB Kernel Command Line: My /etc/default/grub looked like this: GRUB_CMDLINE_LINUX_DEFAULT="quiet intel_iommu=on iommu=pt pcie_acs_override=downstream,multifunction nofb nomodeset video=vesafb:off,efifb:off"
- The pcie_acs_override was an unnecessary hack for my setup.
- The framebuffer flags (nofb, nomodeset, video=...off) were completely pointless because my host has its own dedicated ASPEED GPU for video output.
Combined, this “spaghetti” of forum copy/paste solutions made the system fragile.
Step-by-Step Fix
Let’s clean this up the right way.
1. Verify IOMMU and Interrupt Remapping
First, check if your hardware actually supports IOMMU and interrupt remapping out of the box.
# Check the kernel ring buffer for IOMMU and interrupt remapping messages
dmesg | grep -i -e iommu -e 'interrupt remapping'
You want to see lines like DMAR: IOMMU enabled, DMAR-IR: IOAPIC id ... IOMMU ..., and iommu: Default domain type: Passthrough. This confirms that your hardware supports proper interrupt remapping, meaning you do not need the unsafe interrupts hack.
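If you would rather not eyeball the log, here is a small sketch that does the same check. It only looks for the Intel-style strings quoted above ("IOMMU enabled" and the DMAR-IR prefix), so treat it as a rough filter rather than proof; the function name check_iommu_log is made up for this post.

```shell
#!/bin/bash
# Reads kernel log text on stdin and reports whether it mentions an
# enabled IOMMU and any interrupt remapping (DMAR-IR) lines.
# Returns 0 only if both are found. Intel-specific strings are assumed.
check_iommu_log() {
    local log status=0
    log=$(cat)
    if printf '%s\n' "$log" | grep -qi 'IOMMU enabled'; then
        echo "iommu: enabled"
    else
        echo "iommu: NOT found"
        status=1
    fi
    if printf '%s\n' "$log" | grep -q 'DMAR-IR'; then
        echo "interrupt remapping: present"
    else
        echo "interrupt remapping: NOT found"
        status=1
    fi
    return "$status"
}

# Usage on the host:
#   dmesg | check_iommu_log
```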
2. Inspect IOMMU Groups
Next, verify that your GPU is properly isolated in its own IOMMU group. If it shares a group with devices the host still needs, the ACS override starts to look tempting, but on most server boards the groups are already clean and you don’t need it.
# List all PCI devices by their IOMMU groups
for d in /sys/kernel/iommu_groups/*/devices/*; do
    n=${d#*/iommu_groups/*}; n=${n%%/*}
    printf 'IOMMU Group %s ' "$n"
    lspci -nns "${d##*/}"
done
In my output, the RTX 3060 (Group 25) and its Audio controller (Group 26) were nicely separated from the ASPEED Graphics (Group 30). Lesson learned: No need for pcie_acs_override!
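On a box with dozens of groups, scanning that list by hand gets tedious. As a rough sketch, the output of the loop above can be piped into a filter that prints only the devices sharing a group with the GPU. The function name group_mates and the default vendor ID 10de (NVIDIA) are my assumptions for illustration; empty output means nothing else sits in the GPU's groups.

```shell
#!/bin/bash
# Reads "IOMMU Group N <lspci -nn line>" lines on stdin and prints every
# device that shares a group with a device from the given PCI vendor ID,
# excluding that vendor's own devices. Vendor defaults to 10de (NVIDIA).
group_mates() {
    local vendor="${1:-10de}"
    awk -v v="[$vendor:" '
        { lines[NR] = $0; grp[NR] = $3 }        # remember line and group number
        index($0, v) { wanted[$3] = 1 }          # group contains a vendor device
        END {
            for (i = 1; i <= NR; i++)
                if ((grp[i] in wanted) && index(lines[i], v) == 0)
                    print lines[i]
        }'
}

# Usage: run the group-listing loop and pipe it in, e.g.
#   ./list-iommu-groups.sh | group_mates 10de     # script name is hypothetical
```

Anything this prints deserves a closer look before you pass the GPU through, since every device in a group goes to the guest together.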
3. Clean up GRUB
Let’s remove the unsafe and unnecessary flags from the bootloader.
# Always backup before editing!
cp /etc/default/grub /etc/default/grub.bak
# Edit the file
nano /etc/default/grub
Change your GRUB_CMDLINE_LINUX_DEFAULT to just the essentials:
GRUB_CMDLINE_LINUX_DEFAULT="quiet intel_iommu=on iommu=pt"
(Note: If you are on an AMD system, drop intel_iommu=on. The kernel enables the AMD IOMMU by default, so iommu=pt on its own is typically enough.)
Save the file and apply the changes:
update-grub
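Once the host has been rebooted (step 5), it is worth confirming that the running kernel really picked up the trimmed command line. Here is a minimal sketch; check_cmdline is a helper name I made up, and it takes the command line as its first argument so you can try it on any string.

```shell
#!/bin/bash
# Prints "ok" or "MISSING" for each expected flag and returns non-zero
# if any flag is absent from the given kernel command line.
#   $1      the kernel command line to inspect
#   $2...   flags that must appear as whole words
check_cmdline() {
    local cmdline=" $1 "
    shift
    local flag missing=0
    for flag in "$@"; do
        case "$cmdline" in
            *" $flag "*) echo "ok $flag" ;;
            *)           echo "MISSING $flag"; missing=1 ;;
        esac
    done
    return "$missing"
}

# After the reboot:
#   check_cmdline "$(cat /proc/cmdline)" intel_iommu=on iommu=pt
```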
4. Clean up modprobe.d Configs
We need to remove the unsafe interrupts file entirely and ensure our blacklists are correct.
# Delete the dangerous config file
rm /etc/modprobe.d/iommu_unsafe_interrupts.conf
Next, ensure your /etc/modprobe.d/vfio.conf binds directly to the specific hardware IDs of your GPU and its audio controller:
options vfio-pci ids=10de:2487,10de:228b disable_vga=1
(Replace 10de:2487,10de:228b with your specific GPU and Audio PCI IDs found via lspci -nn).
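To avoid typos when copying the IDs by hand, something like the following sketch can build the line for you from lspci -nn output. The helper name vfio_ids_line and the default match pattern NVIDIA are assumptions for illustration; double-check the result against your own lspci output before pasting it into vfio.conf.

```shell
#!/bin/bash
# Reads `lspci -nn` output on stdin and prints an
# "options vfio-pci ids=..." line for all devices whose description
# matches the given pattern (default: NVIDIA). Returns 1 if none match.
vfio_ids_line() {
    local pattern="${1:-NVIDIA}"
    local ids
    ids=$(grep -i -- "$pattern" \
        | grep -oE '\[[0-9a-f]{4}:[0-9a-f]{4}\]' \
        | sed 's/^\[//; s/\]$//' \
        | awk '!seen[$0]++' \
        | paste -sd, -)
    [ -n "$ids" ] || return 1
    echo "options vfio-pci ids=$ids disable_vga=1"
}

# Usage on the host:
#   lspci -nn | vfio_ids_line NVIDIA
```

The class codes in brackets (such as [0300]) have no colon, so only the vendor:device pairs are picked up.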
Finally, update your driver blacklists (/etc/modprobe.d/blacklist.conf) to stop the host from grabbing the GPU or its audio:
blacklist radeon
blacklist nouveau
blacklist nvidia
blacklist snd_hda_intel
blacklist snd_hda_codec_hdmi
5. Rebuild Initramfs and Reboot
Whenever you touch modprobe.d configurations, you must rebuild your initial RAM disk so the changes take effect during early boot. (The GRUB changes are picked up by update-grub, which we already ran in step 3.)
# Rebuild the initramfs
update-initramfs -u -k all
# Safely reboot the host
reboot
Post-Reboot Verification
Once the server is back online, let’s verify that the vfio-pci driver has successfully captured both the NVIDIA GPU and its audio controller.
lspci -nnk | grep -A3 -E "10de:2487|10de:228b"
Look at the output. For both the VGA controller and the Audio device, you should see:
Kernel driver in use: vfio-pci
To confirm the host is still properly using the ASPEED GPU for the local console:
lspci -nnk | grep -A3 1a03:2000
Expect to see Kernel driver in use: ast. Perfect!
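If you want this check in a script (say, to run after every kernel update), here is a minimal sketch that pulls the bound driver for a given vendor:device ID out of lspci -nnk output. The function name driver_in_use is hypothetical, and it assumes the usual lspci layout where each device starts with its bus address in the first column.

```shell
#!/bin/bash
# Reads `lspci -nnk` output on stdin and prints the driver bound to the
# device with the given vendor:device ID (prints nothing if none is bound).
driver_in_use() {
    local id="$1"
    awk -v id="[$id]" '
        /^[0-9a-f]/ { current = (index($0, id) > 0) }   # device header line
        current && /Kernel driver in use:/ { print $NF }
    '
}

# Usage on the host:
#   [ "$(lspci -nnk | driver_in_use 10de:2487)" = "vfio-pci" ] && echo "GPU ok"
#   [ "$(lspci -nnk | driver_in_use 1a03:2000)" = "ast" ] && echo "console ok"
```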
Lessons Learned
- Avoid allow_unsafe_interrupts unless you truly have no choice. On enterprise or modern hardware, it’s unnecessary and unstable.
- Don’t use pcie_acs_override if your IOMMU groups are already clean. If your GPU is isolated, overriding ACS just adds risk.
- Bind everything. Always bind both the GPU and its audio function to vfio-pci, and blacklist the host audio drivers.
- Keep GRUB simple. intel_iommu=on iommu=pt is usually all it takes.
- Rebuild your initramfs! If you forget update-initramfs -u, you’re just hunting ghosts on old configs.
Conclusion
GPU passthrough can be a bit of a dark art. It’s incredibly easy to fall into the trap of aggressively copy-pasting kernel parameters from forums until something works, only for it to spectacularly fail during the next power cycle.
I hope this deep dive saves you some headaches. Until next time, happy virtualizing!
