Credit-Based CPU Scheduler
The credit scheduler is a proportional fair share CPU scheduler built from the ground up to be work conserving on SMP hosts. It is now the default scheduler in the xen-unstable trunk. The SEDF and BVT schedulers are still optionally available but the plan of record is for them to be phased out and eventually removed.
Each domain (including Host OS) is assigned a weight and a cap.
A domain with a weight of 512 will get twice as much CPU as a domain with a weight of 256 on a contended host. Legal weights range from 1 to 65535 and the default is 256.
The cap optionally fixes the maximum amount of CPU a domain will be able to consume, even if the host system has idle CPU cycles. The cap is expressed in percentage of one physical CPU: 100 is 1 physical CPU, 50 is half a CPU, 400 is 4 CPUs, etc... The default, 0, means there is no upper cap.
SMP load balancing
The credit scheduler automatically load balances guest VCPUs across all available physical CPUs on an SMP host. The administrator does not need to manually pin VCPUs to load balance the system. However, she can restrict which CPUs a particular VCPU may run on using the generic vcpu-pin interface.
The xm sched-credit command may be used to tune the per VM guest scheduler parameters.
|xm sched-credit -d <domain>|
|xm sched-credit -d <domain> -w <weight>|
|xm sched-credit -d <domain> -c <cap>|
Each CPU manages a local run queue of runnable VCPUs. This queue is sorted by VCPU priority. A VCPU's priority can be one of two value: over or under representing wether this VCPU has or hasn't yet exceeded its fair share of CPU resource in the ongoing accounting period. When inserting a VCPU onto a run queue, it is put after all other VCPUs of equal priority to it.
As a VCPU runs, it consumes credits. Every so often, a system-wide accounting thread recomputes how many credits each active VM has earned and bumps the credits. Negative credits imply a priority of over. Until a VCPU consumes its alloted credits, it priority is under.
On each CPU, at every scheduling decision (when a VCPU blocks, yields, completes its time slice, or is awaken), the next VCPU to run is picked off the head of the run queue. The scheduling decision is the common path of the scheduler and is therefore designed to be light weight and efficient. No accounting takes place in this code path.
When a CPU doesn't find a VCPU of priority under on its local run queue, it will look on other CPUs for one. This load balancing guarantees each VM receives its fair share of CPU resources system-wide. Before a CPU goes idle, it will look on other CPUs to find any runnable VCPU. This guarantees that no CPU idles when there is runnable work in the system.
Glossary of Terms
- ms: millisecond
- Host: The physical hardware running Xen and hosting guest VMs.
- VM: guest, virtual machine.
- VCPU: Virtual CPU (one or more per VM)
- CPU/PCPU: Physical host CPU.
- Tick: Clock tick period (10ms)
- Time-slice: The time-slice a VCPU receives before being preempted to run another (30ms)
- Period: The accounting period (30ms). Once per period, credits earned are recomputed.
- Weight: Proportional share of CPU per guest VM
- Cap: An optional upper limit on the CPU time consumable by a particular VM.