
benchmark: disable CPU & RAM mitigations to speed up Linux gaming

i ran the Monster Hunter Wilds benchmark at 4K ultra with no frame gen, via the Steam store on Linux, on the Xorg display server with “TearFree” enabled, running the open-source amdgpu kernel driver:

68.51 FPS average | 23243 score ::: regular Linux Kernel
71.87 FPS average | 24526 score ::: Kernel boot options: mitigations=off init_on_alloc=0 init_on_free=0 randomize_kstack_offset=0
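
to put the two runs in relative terms, here is a small sketch that only does the arithmetic on the numbers above. the helper name `relative_gain` is made up for this example; nothing here measures anything.

```python
def relative_gain(baseline: float, tuned: float) -> float:
    """Return the percentage change going from baseline to tuned."""
    return (tuned - baseline) / baseline * 100.0


# figures copied from the two benchmark runs above
fps_gain = relative_gain(68.51, 71.87)
score_gain = relative_gain(23243, 24526)

print(f"FPS:   +{fps_gain:.2f}%")    # prints "FPS:   +4.90%"
print(f"score: +{score_gain:.2f}%")  # prints "score: +5.52%"
```

so the boot options are worth roughly 5% on this machine in this one benchmark. it is a single run per configuration, so treat it as an indication rather than a precise figure.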

7900XTX GPU
3800X CPU

Linux 6.14.0 built using:
clang + full LTO, debian default kernel config via make oldconfig, with some things already tweaked at build time: timer frequency changed to 1000Hz, non-essential debug stuff disabled (even stuff the kernel documentation says has “minimal runtime overhead”), intel components etc. removed since this runs on AMD.

mitigations=off, init_on_alloc=0, init_on_free=0, etc. are switched off on the kernel command line at boot for this benchmark, not at build time (mitigations are _always_ on by default, on every linux distro and default kernel. init_on_alloc/free defaults vary by distro: some enable one or the other, some enable both)
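
a quick way to confirm the options actually took effect after a reboot is to read them back from `/proc/cmdline`. this is a minimal sketch, not a complete tool: the names `WANTED`, `parse_cmdline` and `missing_options` are made up for this example, and the parser assumes no option value contains quoted spaces.

```python
from pathlib import Path

# the four options used in the second benchmark run
WANTED = {
    "mitigations": "off",
    "init_on_alloc": "0",
    "init_on_free": "0",
    "randomize_kstack_offset": "0",
}


def parse_cmdline(text: str) -> dict[str, str]:
    """Split a kernel command line into {option: value}; bare flags map to ''."""
    options = {}
    for token in text.split():
        key, _, value = token.partition("=")
        options[key] = value  # if an option repeats, the last one is kept
    return options


def missing_options(cmdline: str) -> list[str]:
    """Return the wanted options that are absent or set to another value."""
    active = parse_cmdline(cmdline)
    return [f"{k}={v}" for k, v in WANTED.items() if active.get(k) != v]


if __name__ == "__main__":
    cmdline = Path("/proc/cmdline")
    if cmdline.exists():  # only present on Linux
        print("not set:", missing_options(cmdline.read_text()) or "none")

    # the kernel also reports the status of each CPU mitigation here
    vulns = Path("/sys/devices/system/cpu/vulnerabilities")
    if vulns.is_dir():
        for entry in sorted(vulns.iterdir()):
            print(f"{entry.name}: {entry.read_text().strip()}")
```

with mitigations=off the entries under `vulnerabilities` should mostly read "Vulnerable" or "Not affected" instead of naming an active mitigation.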

this is a system-wide speed improvement: it reduces latency and speeds up both CPU and memory operations, at the cost of security hardening. register zeroing is another, newer kernel CPU security option we can choose to enable or disable (security vs speed)

it will be nice to test this with ntsync now that it is mainlined into the kernel! ^-^: https://www.kernel.org/doc/html/v6.14-rc7/userspace-api/ntsync.html

besides other kernel memory & cpu options, another potential kernel-level improvement we can get is from AutoFDO when building the kernel using clang (though, i probably need to upgrade from my Zen 2 gen CPU for a start):
https://www.kernel.org/doc/html/next/dev-tools/autofdo.html
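
on the CPU upgrade point: as far as i recall from that AutoFDO page, AMD support needs hardware branch sampling, which shows up as the `brs` (Zen 3) or `amd_lbr_v2` (Zen 4) flag in `/proc/cpuinfo`. the flag names are my assumption from memory of the doc, so check them against the page itself. a minimal sketch of that check, with a made-up helper name:

```python
from pathlib import Path

# assumption: flag names as recalled from the kernel AutoFDO documentation
BRANCH_SAMPLING_FLAGS = {"brs", "amd_lbr_v2"}


def branch_sampling_flags(cpuinfo: str) -> set[str]:
    """Return which of the branch-sampling flags appear in /proc/cpuinfo text."""
    for line in cpuinfo.splitlines():
        if line.startswith("flags"):
            flags = set(line.partition(":")[2].split())
            return flags & BRANCH_SAMPLING_FLAGS
    return set()  # no "flags" line found at all


if __name__ == "__main__":
    cpuinfo = Path("/proc/cpuinfo")
    if cpuinfo.exists():  # only present on Linux
        found = branch_sampling_flags(cpuinfo.read_text())
        print("branch sampling flags:", sorted(found) or "none found")
```

on the 3800X used here i would expect this to print "none found".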

“AutoFDO (Auto-Feedback-Directed Optimization) is a type of profile-guided optimization (PGO) used to enhance the performance of binary executables. It gathers information about the frequency of execution of various code paths within a binary using hardware sampling. This data is then used to guide the compiler’s optimization decisions, resulting in a more efficient binary. AutoFDO is a powerful optimization technique, and data indicates that it can significantly improve kernel performance. It’s especially beneficial for workloads affected by front-end stalls.

For AutoFDO builds, unlike non-FDO builds, the user must supply a profile. Acquiring an AutoFDO profile can be done in several ways. AutoFDO profiles are created by converting hardware sampling using the “perf” tool. It is crucial that the workload used to create these perf files is representative; they must exhibit runtime characteristics similar to the workloads that are intended to be optimized. Failure to do so will result in the compiler optimizing for the wrong objective. […]