
Collection of various Proxmox commands

Most of this is rather undocumented, so use at your own risk. If something is not self-explanatory, better leave it be.

Disable PVE cluster services on standalone node

This will prevent pmxcfs from wearing out SSDs by writing to disk every few seconds.

systemctl stop pve-ha-lrm ; systemctl stop pve-ha-crm
systemctl disable pve-ha-lrm ; systemctl disable pve-ha-crm

Change Replication Runner from minutely to monthly

Running a non-clustered host and being fed up with CPU spikes?

systemctl edit --full pvesr.timer
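
For reference, the stock timer fires every minute; in the editor, change the OnCalendar line in the [Timer] section (assuming the shipped unit uses OnCalendar=minutely) to something like:

[Timer]
OnCalendar=monthly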

Enable Nested Virtualization

Careful: don't live-migrate VMs with nested virtualization enabled; I don't think that would work. Nested virtualization is rather something for standalone hypervisors. The following is for Intel CPUs:

echo "options kvm-intel nested=Y" > /etc/modprobe.d/kvm-intel.conf
modprobe -r kvm_intel
modprobe kvm_intel
cat /sys/module/kvm_intel/parameters/nested
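
On AMD hosts the same idea should work with the kvm-amd module instead (a sketch, not tested here):

echo "options kvm-amd nested=1" > /etc/modprobe.d/kvm-amd.conf
modprobe -r kvm_amd
modprobe kvm_amd
cat /sys/module/kvm_amd/parameters/nested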

Install and enable KSM (Kernel Samepage Merging)

It's weird, but the ksm-control-daemon package from Proxmox does not really do anything (tested with PVE 5.4). It seems to be a replacement package for ksmtuned, but I do not see ksm included. So we remove ksm-control-daemon (if installed) and then use the packages that come with Debian:

apt remove ksm-control-daemon
apt install ksmtuned
echo 1 >/sys/kernel/mm/ksm/run
systemctl enable ksm
systemctl enable ksmtuned
systemctl start ksm
systemctl start ksmtuned

Hint: To stop KSM and remove shared memory pages, stop ksmtuned and ksm, then echo 2 >/sys/kernel/mm/ksm/run
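
To check whether KSM is actually merging pages, look at the counters in sysfs:

cat /sys/kernel/mm/ksm/pages_shared
cat /sys/kernel/mm/ksm/pages_sharing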

Proxmox kernel hangs randomly (intel_idle)

On Hetzner Intel i9-9900K boxes, I've seen the Proxmox 5.3 kernel hang randomly, requiring a hardware reset. I've applied the following change to /etc/default/grub to add “consoleblank=0 intel_idle.max_cstate=1” as kernel parameters:

GRUB_CMDLINE_LINUX_DEFAULT="consoleblank=0 intel_idle.max_cstate=1"

Afterwards, run the following to apply this change:

update-grub
reboot
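
After the reboot, confirm the parameters are active:

cat /proc/cmdline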

Proxmox installed from Hetzner image causes postfix to bounce mail

This problem doesn't exist when installing from the Proxmox ISO. Run the following command and set the general configuration to Internet Site.

dpkg-reconfigure postfix

Configure ZFS ARC Cache

Set maximum ARC Cache on boot:

# cat /etc/modprobe.d/zfs.conf
options zfs zfs_arc_max=17179869184
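
The value is in bytes; 17179869184 is 16 GiB, which you can sanity-check in the shell:

echo $((16 * 1024 * 1024 * 1024))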

If your root file system is ZFS you must update your initramfs every time this value changes:

# update-initramfs -u

Set it on the fly (will only go into effect after dropping caches):

echo "17179869184" > /sys/module/zfs/parameters/zfs_arc_max
echo 3 > /proc/sys/vm/drop_caches

Drop Memory Caches

Useful when debugging ballooning and ZFS; the value does not need to be reverted to 0:

echo 3 >/proc/sys/vm/drop_caches

Enable network on older Fedora LXC containers

If networking doesn't work in a Fedora container, try this:

touch /etc/sysconfig/network
systemctl enable network
reboot

Can't SSH into Fedora 33 using pubkey authentication

update-crypto-policies --set LEGACY
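
Once the clients use keys and algorithms that the default policy accepts, you can switch back:

update-crypto-policies --set DEFAULT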

Convert Virtualbox vdi using vboxmanage

This is how to convert a vdi image to vmdk using vboxmanage provided by VirtualBox:

vboxmanage clonehd whatever.vdi whatever.vmdk --format VMDK

Convert Virtualbox vdi using qemu-img

This example shows how to convert vdi to qcow2 using qemu-img provided by Proxmox:

qemu-img convert -f vdi -O qcow2 whatever.vdi whatever.qcow2 

Convert vmdk and import to thinpool

qemu-img convert -f vmdk whatever.vmdk -O qcow2 whatever.qcow2
qm importdisk 100 whatever.qcow2 local-zfs
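
qm importdisk only adds the image as an unused disk; to actually use it, attach it to a bus. The disk name below is an assumption, check qm config for the real one:

qm config 100
qm set 100 --scsi0 local-zfs:vm-100-disk-0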

Convert Fedora/CentOS/RedHat VirtualBox/VMware VM to use virtio-scsi

While still running on VMware/VirtualBox:

dnf install dracut-config-generic
dracut -f
cd /boot ; mkinitrd <current initramfs> <current kernel> --force
lsinitrd <current initramfs> | grep virtio

After booting on Proxmox with virtio-scsi, do the following:

dnf remove dracut-config-generic
dracut -f
cd /boot ; mkinitrd <current initramfs> <current kernel> --force
lsinitrd <current initramfs> | grep virtio

Example on how to mkinitrd:

mkinitrd initramfs-4.18.0-193.28.1.el8_2.x86_64.img 4.18.0-193.28.1.el8_2.x86_64 --force

Remember to remove the VirtualBox/VMware guest extensions, install and enable qemu-guest-agent (a sketch follows below), and then enable the QEMU Guest Agent option in Proxmox. You will also have to fix the network interface names before networking works. Make sure Discard is enabled in the VM's disk settings, then finally trim the volume from inside the guest so unused space is released back to the storage:

fstrim -a -v
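
To install and enable the guest agent inside a Fedora/CentOS/RHEL guest, a minimal sketch (on some versions the unit is static and starts automatically via udev, in which case the enable step is a no-op):

dnf install qemu-guest-agent
systemctl enable --now qemu-guest-agent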

Replacing a failed drive on a Proxmox 6.3 system booting from ZFS

Since the Proxmox documentation is useless and sgdisk totally hosed one of my SSD partition layouts, I came up with this solution. On EFI systems booting from two ZFS drives (RAID1), take a backup of the partition layout of both drives while they are still working:

sfdisk -d /dev/sdc > /root/sdc-partition-layout.txt
sfdisk -d /dev/sdd > /root/sdd-partition-layout.txt

Now when a drive errors out, check which drive is broken, then detach it from the pool and replace the hardware:

zpool status
zpool detach rpool <drive>

Let's say /dev/sdd was broken. Make a copy of the sdd partition layout created earlier, and remove all lines/parts stating UUIDs:

cp -a /root/sdd-partition-layout.txt /root/sdd-partition-layout-import.txt
nano /root/sdd-partition-layout-import.txt
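
If you prefer not to edit by hand, this sed sketch does the same, assuming the dump format has a label-id: header line and per-partition uuid= fields:

sed -i -e '/^label-id:/d' -e 's/, *uuid=[0-9A-Fa-f-]*//g' /root/sdd-partition-layout-import.txt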

Write partition layout to new sdd disk:

sfdisk /dev/sdd < /root/sdd-partition-layout-import.txt

Take another dump of the partition layout to double-check that the UUIDs are different from the broken drive:

sfdisk -d /dev/sdd > /root/sdd-partition-layout-test.txt
diff /root/sdd-partition-layout.txt /root/sdd-partition-layout-test.txt

Find the partition ID of the new drive:

ls -al /dev/disk/by-id

Attach the new ZFS partition to the pool by attaching it to the existing ZFS partition that still works (first the working partition, then the new one):

zpool attach rpool ata-SAMSUNG_MZ7LH960HAJR-00005_S45NNA0N448234-part3 ata-SAMSUNG_MZ7LH960HAJR-00005_S45NNE0N204633-part3

Wait until resilvering completes, keep checking with:

zpool status

Format and init the EFI partition:

pve-efiboot-tool format /dev/sdd2
pve-efiboot-tool init /dev/sdd2

Remove the UUID of the failed drive from the pve-efiboot config:

nano /etc/kernel/pve-efiboot-uuids

Test that there are no more errors by updating initramfs:

update-initramfs -u

Remove temporary files and create a new backup of the partition layout:

rm -f /root/sdd-partition-layout-test.txt
rm -f /root/sdd-partition-layout-import.txt
sfdisk -d /dev/sdd > /root/sdd-partition-layout.txt

Now reboot and select the new drive as the boot device to verify the system can boot from it (in case the other one dies).

Running ZFS alongside LVM volumes on Proxmox host

Prevent ZFS volumes from being scanned and blocked by LVM (duh!):

cp -a /etc/lvm/lvm*dist /etc/lvm/lvm.conf
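
To confirm the filter is in place after copying (the shipped config should contain a global_filter rejecting /dev/zd* zvol devices; the exact value may differ between versions):

grep global_filter /etc/lvm/lvm.conf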

Running MariaDB in vm/container with ZFS/LVM-thin on host

Turn off asynchronous I/O for proper snapshotting:

[mariadb]
innodb_use_native_aio = 0
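
Where this goes depends on the guest; on a Debian/Ubuntu-style MariaDB a minimal sketch would be (path and file name are assumptions):

# cat /etc/mysql/mariadb.conf.d/99-innodb-aio.cnf
[mariadb]
innodb_use_native_aio = 0
# systemctl restart mariadb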

Windows Balloon Service requires manual path

Windows Balloon Service needs to be in a specific location before it can be enabled:

As Administrator, copy the Balloon driver directory from the virtio ISO and rename it to:
"C:\Program Files\Balloon"

Having fun with Ballooning

Connect to the QEMU monitor of a VM and set its memory balloon target to 500 MB:

qm monitor <vm id>
balloon 500

Fedora 34 LXC: unsupported release

Patch Fedora.pm so that release 34 is accepted; this diff shows the required change:

/usr/share/perl5/PVE/LXC/Setup# diff Fedora.pm Fedora.pm.custom

14c14
<     die "unsupported fedora release\n" if !($version >= 22 && $version <= 33);
---
>     die "unsupported fedora release\n" if !($version >= 22 && $version <= 34);

Give lxc container access to tun/tap devices (e.g. OpenVPN)

a) shut down container
b) edit e.g. /etc/pve/lxc/108.conf
c) append:
lxc.mount.entry: /dev/net dev/net none bind,create=dir
lxc.cgroup.devices.allow: c 10:200 rwm
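
After starting the container, the device should be visible inside it:

ls -al /dev/net/tun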

CentOS 7.5 container: systemd logs read-only file system errors

Two options:
a) Ignore and hide it in monitoring; everything is working normally.
b) Inspect and remove the offending entries from /lib/sysctl.d/ inside the container.

Run Docker inside LXC container

Edit /etc/pve/lxc/whatever.conf and append:

lxc.apparmor.profile: unconfined
lxc.cgroup.devices.allow: a
lxc.cap.drop:

Re-generate certificates before creating cluster

Double-check that both the IP address and the hostname are updated in /etc/hosts, /etc/hostname and /etc/pve/nodes.

pvecm updatecerts --force
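
If the Web UI still serves the old certificate afterwards, restarting the proxy should make it pick up the new one (not strictly documented, but harmless):

systemctl restart pveproxy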

qemu-guest-agent fails to start on boot (Debian/Ubuntu)

echo "sleep 1" >> /etc/default/qemu-guest-agent
reboot

Can't log in to Web UI when cluster is degraded

Run pvecm expected with the number of nodes that are still online, e.g. 1. Set it back to the previous value once the cluster is healthy again.

pvecm expected 1
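
To check the current quorum state and expected votes:

pvecm status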


