Most of this is rather undocumented, so use at your own risk. If this is not self-explanatory, then leave it be.
This will prevent pmxcfs wearing out SSDs by writing to disk every few seconds.
systemctl stop pve-ha-lrm ; systemctl stop pve-ha-crm
systemctl disable pve-ha-lrm ; systemctl disable pve-ha-crm
Running a non-clustered host and being fed up with CPU spikes?
systemctl edit --full pvesr.timer
Careful, don't live-migrate VMs with nested virtualization enabled, I think that wouldn't work. Nested Virtualization is rather something for your standalone hypervisors.
echo "options kvm-intel nested=Y" > /etc/modprobe.d/kvm-intel.conf
modprobe -r kvm_intel
modprobe kvm_intel
cat /sys/module/kvm_intel/parameters/nested
It's weird, but ksm-control-daemon from Proxmox does not really do anything (tested with PVE 5.4). It seems to be a replacement package for ksmtuned but I do not see ksm included. So we remove ksm-control-daemon (if installed) and then use the one that comes with Debian:
apt remove ksm-control-daemon
apt install ksmtuned
echo 1 >/sys/kernel/mm/ksm/run
systemctl enable ksm
systemctl enable ksmtuned
systemctl start ksm
systemctl start ksmtuned
Hint: To stop KSM and remove shared memory pages, stop ksmtuned and ksm, then echo 2 >/sys/kernel/mm/ksm/run
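To see whether KSM is actually merging anything, the counters next to the run file can be read. A minimal sketch, assuming the standard kernel sysfs layout where pages_sharing roughly counts the pages saved; the function name ksm_saved_mib is made up for illustration:

```shell
# Rough estimate of memory saved by KSM, in MiB.
# $1: KSM sysfs directory (default: /sys/kernel/mm/ksm)
# $2: page size in bytes (default: whatever getconf reports)
ksm_saved_mib() {
    dir="${1:-/sys/kernel/mm/ksm}"
    page_size="${2:-$(getconf PAGESIZE)}"
    pages_sharing=$(cat "$dir/pages_sharing")
    # pages_sharing * page size = bytes saved; convert to MiB
    echo $(( pages_sharing * page_size / 1048576 ))
}

# Usage: ksm_saved_mib
```

A value that stays at 0 while several similar VMs are running suggests KSM is not doing anything.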
On Hetzner Intel i9-9900K boxes, I've seen the Proxmox kernel 5.3 hang randomly, requiring a hardware reset. To work around this, I added "consoleblank=0 intel_idle.max_cstate=1" as kernel parameters in /etc/default/grub.
Afterwards, run the following to apply this change:
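A minimal sketch of the edit, assuming the stock Debian layout where the parameters go into the GRUB_CMDLINE_LINUX_DEFAULT line of /etc/default/grub, and assuming update-grub is what regenerates the boot config afterwards. The helper name add_kernel_params is hypothetical; try it on a copy of the file first.

```shell
# Hypothetical helper: append kernel parameters to GRUB_CMDLINE_LINUX_DEFAULT
# in the given file, skipping parameters that are already present.
add_kernel_params() {
    file="$1"; shift
    for param in "$@"; do
        # Already on the GRUB_CMDLINE_LINUX_DEFAULT line? Then skip it.
        if grep '^GRUB_CMDLINE_LINUX_DEFAULT=' "$file" | grep -qF -- "$param"; then
            continue
        fi
        # Insert the parameter just before the closing quote.
        sed "s|^\(GRUB_CMDLINE_LINUX_DEFAULT=\".*\)\"|\1 ${param}\"|" "$file" > "$file.tmp" &&
            mv "$file.tmp" "$file"
    done
}

# Usage (as root, assuming GRUB is the active bootloader):
#   add_kernel_params /etc/default/grub consoleblank=0 intel_idle.max_cstate=1
#   update-grub
```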
This problem doesn't exist when installing from the Proxmox ISO. Run the following command and set the general configuration to Internet Site.
Set maximum ARC Cache on boot:
# cat /etc/modprobe.d/zfs.conf
options zfs zfs_arc_max=17179869184
If your root file system is ZFS you must update your initramfs every time this value changes:
# update-initramfs -u
Set it on the fly (will only go into effect after dropping caches):
echo "17179869184" > /sys/module/zfs/parameters/zfs_arc_max
echo 3 > /proc/sys/vm/drop_caches
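The value 17179869184 is 16 GiB expressed in bytes. A small sketch for computing other sizes; the function name gib_to_bytes is made up for illustration:

```shell
# Convert a size in GiB to bytes, as expected by zfs_arc_max.
gib_to_bytes() {
    echo $(( $1 * 1024 * 1024 * 1024 ))
}

# Usage, e.g. for an 8 GiB limit:
#   echo "options zfs zfs_arc_max=$(gib_to_bytes 8)" > /etc/modprobe.d/zfs.conf
```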
Needed for debugging with Ballooning and ZFS, does not need to be reverted to 0:
echo 3 >/proc/sys/vm/drop_caches
If networking doesn't work in a Fedora container, try this:
touch /etc/sysconfig/network
systemctl enable network
reboot
update-crypto-policies --set LEGACY
This is how to convert a vdi image to vmdk using vboxmanage provided by VirtualBox:
vboxmanage clonehd whatever.vdi whatever.vmdk --format VMDK
This example shows how to convert vdi to qcow2 using qemu-img provided by Proxmox:
qemu-img convert -f vdi -O qcow2 whatever.vdi whatever.qcow2
qemu-img convert -f vmdk whatever.vmdk -O qcow2 whatever.qcow2
qm importdisk 100 whatever.qcow2 local-zfs
While still running on VMware/VirtualBox:
dnf install dracut-config-generic
dracut -f
cd /boot ; mkinitrd <current initramfs> <current kernel> --force
lsinitrd <current initramfs> | grep virtio
After booting on Proxmox with virtio-scsi, do the following:
dnf remove dracut-config-generic
dracut -f
cd /boot ; mkinitrd <current initramfs> <current kernel> --force
lsinitrd <current initramfs> | grep virtio
Example on how to mkinitrd:
mkinitrd initramfs-4.18.0-193.28.1.el8_2.x86_64.img 4.18.0-193.28.1.el8_2.x86_64 --force
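The file name in that example follows the pattern initramfs-<kernel version>.img, so both arguments can be derived from uname -r. A small sketch, assuming the guest uses this naming scheme (as in the example above); the function name initramfs_for is hypothetical:

```shell
# Print the initramfs file name for a kernel version
# (defaults to the currently running kernel).
initramfs_for() {
    kver="${1:-$(uname -r)}"
    echo "initramfs-${kver}.img"
}

# Usage (from /boot, for the running kernel):
#   mkinitrd "$(initramfs_for)" "$(uname -r)" --force
#   lsinitrd "$(initramfs_for)" | grep virtio
```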
Remember to remove VirtualBox/VMware guest extensions, install/enable qemu-agent and then enable Qemu Agent in Proxmox. You will also have to fix the network interface names before networking works. Make sure Discard is enabled in VM settings, then finally shrink the volume from inside the guest:
fstrim -a -v
Since the Proxmox documentation is useless and sgdisk totally hosed one of my SSD partition layouts, I came up with this solution. On EFI systems booting from two ZFS drives (RAID1), take a backup of the partition layouts of both drives while they are still working:
sfdisk -d /dev/sdc > /root/sdc-partition-layout.txt
sfdisk -d /dev/sdd > /root/sdd-partition-layout.txt
Now when a drive errors out, check which drive is broken, then detach from RAID and replace the hardware:
zpool status
zpool detach rpool <drive>
Let's say /dev/sdd was broken. Make a copy of the sdd partition layout created earlier, and remove all lines/parts stating UUIDs:
cp -a /root/sdd-partition-layout.txt /root/sdd-partition-layout-import.txt
nano /root/sdd-partition-layout-import.txt
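Instead of editing by hand, the UUIDs can be stripped with sed. This is only a sketch: it assumes the dump has a "label-id:" header line and ", uuid=..." fields on the partition lines, as recent sfdisk versions write them. The "type=" GUIDs are kept on purpose, since they describe the partition type and not the individual disk. The function name strip_sfdisk_uuids is made up for illustration.

```shell
# Print an sfdisk dump with the disk label-id and per-partition UUIDs removed.
strip_sfdisk_uuids() {
    sed -e '/^label-id:/d' -e 's/, *uuid=[^,]*//' "$1"
}

# Usage:
#   strip_sfdisk_uuids /root/sdd-partition-layout.txt > /root/sdd-partition-layout-import.txt
```

Check the result before writing it to the new disk.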
Write partition layout to new sdd disk:
sfdisk /dev/sdd < /root/sdd-partition-layout-import.txt
Take another dump of the partition layout to double-check that the UUIDs are different from the broken drive:
sfdisk -d /dev/sdd > /root/sdd-partition-layout-test.txt
diff /root/sdd-partition-layout.txt /root/sdd-partition-layout-test.txt
Find the partition ID of the new drive:
ls -al /dev/disk/by-id
Attach the new ZFS partition to the pool. The first device is the existing ZFS partition that still works, the second is the new one:
zpool attach rpool ata-SAMSUNG_MZ7LH960HAJR-00005_S45NNA0N448234-part3 ata-SAMSUNG_MZ7LH960HAJR-00005_S45NNE0N204633-part3
Wait until resilvering completes, keep checking with:
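zpool status (used above to find the broken drive) also reports resilver progress. A small polling sketch, assuming the status output contains the phrase "resilver in progress" while the resilver runs, as it does on current ZFS on Linux; the function names and the 60 second interval are arbitrary choices for illustration:

```shell
# Reads `zpool status` output on stdin; succeeds while a resilver is running.
resilver_in_progress() {
    grep -q 'resilver in progress'
}

# Poll until the resilver is done, then print the final status.
# $1: pool name (default rpool), $2: seconds between checks (default 60)
wait_for_resilver() {
    pool="${1:-rpool}"
    while zpool status "$pool" | resilver_in_progress; do
        sleep "${2:-60}"
    done
    zpool status "$pool"
}

# Usage: wait_for_resilver rpool
```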
Format and init the EFI partition:
pve-efiboot-tool format /dev/sdd2
pve-efiboot-tool init /dev/sdd2
Remove the no longer working UUID from pve efiboot config:
Test that there are no more errors by updating initramfs:
Remove temporary files and create a new backup of the partition layout:
rm -f /root/sdd-partition-layout-test.txt
rm -f /root/sdd-partition-layout-import.txt
sfdisk -d /dev/sdd > /root/sdd-partition-layout.txt
Now reboot and select the new drive as boot device to see if the system is able to boot (in case the other one dies).
Prevent ZFS volumes from being scanned and blocked by LVM (duh!):
cp -a /etc/lvm/lvm*dist /etc/lvm/lvm.conf
Turn off asynchronous I/O for proper snapshotting:
[mariadb]
innodb_use_native_aio = 0
Windows Balloon Service needs specific location before enabling:
Copy and rename as Administrator the directory from the virtio.iso to: "c:/Program files/Balloon"
qm monitor <vm id>
balloon 500
/usr/share/perl5/PVE/LXC/Setup# diff Fedora.pm Fedora.pm.custom
14c14
< die "unsupported fedora release\n" if !($version >= 22 && $version <= 33);
---
> die "unsupported fedora release\n" if !($version >= 22 && $version <= 34);
a) shut down container
b) edit e.g. /etc/pve/lxc/108.conf
c) append:
lxc.mount.entry: /dev/net dev/net none bind,create=dir
lxc.cgroup.devices.allow: c 10:200 rwm
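If this has to be done for several containers, the append step can be scripted so that running it twice does not duplicate lines. A minimal sketch, assuming the config file has no snapshot sections at the end (otherwise appended lines would land in the last snapshot); the helper name append_once is hypothetical:

```shell
# Append a line to a config file unless the exact line is already there.
append_once() {
    conf="$1"; line="$2"
    grep -qxF -- "$line" "$conf" 2>/dev/null || printf '%s\n' "$line" >> "$conf"
}

# Usage (container 108 as in the example above, with the container shut down):
#   append_once /etc/pve/lxc/108.conf 'lxc.mount.entry: /dev/net dev/net none bind,create=dir'
#   append_once /etc/pve/lxc/108.conf 'lxc.cgroup.devices.allow: c 10:200 rwm'
```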
Two options:
a) Ignore and hide in monitoring, everything's normal.
b) Optional: Inspect and remove contents from /lib/sysctl.d/
Edit /etc/pve/lxc/whatever.conf and append:
lxc.apparmor.profile: unconfined
lxc.cgroup.devices.allow: a
lxc.cap.drop:
Double-check that both the IP address and the hostname are updated in /etc/hosts, /etc/hostname and /etc/pve/nodes.
pvecm updatecerts --force
echo "sleep 1" >> /etc/default/qemu-guest-agent
reboot
Run pvecm expected with the number of still online nodes, e.g. 1. Set back to previous number when done.
pvecm expected 1