From Xen

This page is dedicated to the work done for GSoC to use SeaBIOS in Xen.

The overall process:

  1. Set up the xenstore connection.
  2. Initialise the front rings.
  3. Initialise the grant table.
  4. Take the ring page's MFN and set it up as a grant table entry.
  5. Create an unbound event-channel port for front/back ring communication, then perform the xenbus handshake:
    1. Change the frontend state to XenbusStateInitialising.
    2. Write the ring-ref entry (step 4).
    3. Write the port entry (step 5).
    4. Wait for the backend state to be XenbusStateInitWait.
    5. Change the frontend state to XenbusStateInitialised.
    6. If the backend state is XenbusStateClosing, there is an error or something is missing.
  6. On success, end.
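The decision logic of the xenbus handshake can be sketched as a small state machine. The numeric state values follow the XenbusState enumeration in Xen's public io/xenbus.h; the action enum and frontend_step are hypothetical names, and real code would read the backend state from xenstore rather than take it as an argument. Unlike the list, this sketch waits for the backend to reach XenbusStateInitWait before publishing ring-ref and port, and treats XenbusStateConnected (which the backend enters after the frontend reports Initialised) as success.

```c
#include <assert.h>

/* XenbusState values as defined in Xen's public io/xenbus.h. */
enum xenbus_state {
    XenbusStateUnknown      = 0,
    XenbusStateInitialising = 1,
    XenbusStateInitWait     = 2,
    XenbusStateInitialised  = 3,
    XenbusStateConnected    = 4,
    XenbusStateClosing      = 5,
    XenbusStateClosed       = 6
};

/* Hypothetical actions for the SeaBIOS frontend. */
enum frontend_action {
    ACTION_WAIT,         /* backend not ready yet: keep polling xenstore */
    ACTION_PUBLISH_RING, /* write ring-ref and port, then switch to Initialised */
    ACTION_DONE,         /* backend connected: handshake succeeded */
    ACTION_FAIL          /* backend closing/closed: error or missing entry */
};

/* Decide the frontend's next action from the backend's current state. */
static enum frontend_action frontend_step(enum xenbus_state backend)
{
    assert(backend <= XenbusStateClosed);
    switch (backend) {
    case XenbusStateInitWait:
        return ACTION_PUBLISH_RING;
    case XenbusStateConnected:
        return ACTION_DONE;
    case XenbusStateClosing:
    case XenbusStateClosed:
        return ACTION_FAIL;
    default:
        return ACTION_WAIT;
    }
}
```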

SeaBIOS currently talks to the I/O ports associated with an IDE device. These ports are statically defined for the first four IDE disks, a convention dating back to the original PC XT/AT.

In this document we use the following terminology: "backend" is the thing that implements the PV device, as opposed to the PV driver, the "frontend". For emulated devices we tend to use "device model" for the thing that implements the device.

Note: SeaBIOS already has full block support for virtio block devices (virtio-blk), covering both disk and block operations, with the disk devices discovered over PCI. The virtio driver creates an encapsulating struct, implements all the disk commands, and hooks the drive into the BIOS disk interrupt handling using add_bcv_internal. The Xen frontend will follow the same pattern: an encapsulating struct will contain SeaBIOS's drive_s, so that the ring pointers can be located when commands are issued on the drive.

To write the frontend we will need to do the following:

  • SeaBIOS: Insert init code for Xen on POST
  • init xenbus rings
  • init blk rings
  • Allocate xendrive struct with malloc_fseg for later recovery.
  • Init grant table
  • Init grant entries
  • Share a single grant entry for the blk rings; the gref is stored in xendrive->info->ring_ref.
  • TODO: share the gref with dom0 via the xenbus rings.
  • When process_xen_op is called: retrieve the xendrive and, using its blkfront_info (aka info), fill request structs and place them on the blk rings. (TODO: fill and place.)
  • TODO: on exit, destroy or deallocate everything.
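The "allocate xendrive for later recovery" and "retrieve xendrive" steps rely on the encapsulating-struct idea: the BIOS only hands the disk code a drive_s, and the frontend steps back from it to its own state, much as SeaBIOS's virtio-blk driver does with container_of. A minimal sketch, assuming a cut-down drive_s and illustrative fields for xendrive and blkfront_info (only the names xendrive, info and ring_ref come from the plan above); XEN_CONTAINER_OF and xendrive_info are hypothetical helpers, and the malloc_fseg allocation is not shown.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Cut-down stand-in for SeaBIOS's struct drive_s (the real one has more fields). */
struct drive_s {
    uint8_t type;
    uint64_t sectors;
};

/* Illustrative per-device frontend state. */
struct blkfront_info {
    uint32_t ring_ref;   /* grant reference of the shared blk ring page */
    uint32_t evtchn;     /* event channel port used to notify the backend */
};

/* Encapsulating struct: the generic drive plus a pointer to the Xen state. */
struct xendrive {
    struct drive_s drive;
    struct blkfront_info *info;
};

/* Step back from a member pointer to the struct that contains it. */
#define XEN_CONTAINER_OF(ptr, type, member) \
    ((type *)((char *)(ptr) - offsetof(type, member)))

/* Given the drive_s passed to a disk operation, recover the frontend state. */
static struct blkfront_info *xendrive_info(struct drive_s *drive)
{
    assert(drive != NULL);
    struct xendrive *xd = XEN_CONTAINER_OF(drive, struct xendrive, drive);
    return xd->info;
}
```

Embedding drive_s (rather than pointing to it) is what makes the pointer arithmetic valid: the disk layer never needs to know about the Xen-specific fields.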

Later I will place the final code that corresponds to each step and explain it all.

This project is focused only on the SeaBIOS frontend.


  1. The code starts in BIOS POST, where the init code is called.
  2. Init validates Xen and sets up all the required rings, event channels and structs.
  3. A single ring and event channel are created for communication with the backend driver.
  4. The operation function is registered in the interrupt vector so the guest can call the new frontend.
  5. Upon notification from the guest (an interrupt may be required), the rings and event channels in BIOS space are destroyed.

The HVM guest itself is responsible for allocating a page and giving it to the hypervisor (via XENMEM_add_to_physmap with XENMAPSPACE_shared_info) so that it is set up as the shared info page. For a guest that does not already use PV drivers, two further points apply. First, the xenbus driver should be hung off the vendor/device ID of the Xen "platform PCI device". Second, that device is a virtual PCI device model, provided by qemu, whose sole purpose is to hint to a kernel that it should try to set up Xen PV drivers. The device model already exists, as does a driver for Linux, but the SeaBIOS driver is part of this project. Xen's PCI vendor ID is 0x5853 ("XS" in ASCII).
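A minimal sketch of building the shared-info request, assuming the struct xen_add_to_physmap layout and the XENMEM_add_to_physmap, XENMAPSPACE_shared_info and DOMID_SELF values from Xen's public memory.h and xen.h on x86 (where xen_ulong_t and xen_pfn_t are unsigned long). The helper shared_info_request is hypothetical, and the hypercall is left as a comment because issuing it requires the hypercall page to be set up first.

```c
#include <assert.h>
#include <stdint.h>

#define XENMEM_add_to_physmap   7        /* memory_op sub-command */
#define XENMAPSPACE_shared_info 0        /* map the shared info page */
#define DOMID_SELF              0x7FF0u  /* "the calling domain" */

/* Argument block for XENMEM_add_to_physmap (x86 layout). */
struct xen_add_to_physmap {
    uint16_t domid;        /* domain whose physmap changes */
    uint16_t size;         /* only used by the gmfn_range space */
    unsigned int space;    /* XENMAPSPACE_* */
    unsigned long idx;     /* index within the space being mapped */
    unsigned long gpfn;    /* guest frame where the page should appear */
};

/* Ask Xen to place the shared info page at 'gpfn', a page the guest allocated.
 * The caller would then issue
 * HYPERVISOR_memory_op(XENMEM_add_to_physmap, &xatp). */
static struct xen_add_to_physmap shared_info_request(unsigned long gpfn)
{
    struct xen_add_to_physmap xatp = {0};
    xatp.domid = DOMID_SELF;
    xatp.space = XENMAPSPACE_shared_info;
    xatp.idx   = 0;        /* there is only one shared info page */
    xatp.gpfn  = gpfn;
    return xatp;
}
```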

Test and Clear Bit Explained

Xen uses the following code:

```c
/* nr: bit index, e.g. an event channel port; addr: bit string, e.g.
 * shinfo->evtchn_pending, declared as unsigned long evtchn_pending[sizeof(unsigned long) * 8] */
static inline int test_and_clear_bit(int nr, volatile void *addr)
{
    int oldbit;                                     /* %0 */
    asm volatile (
        "lock ; btrl %2,%1 ; sbbl %0,%0"           /* atomic BTR, then SBB turns CF into 0/-1 */
        : "=r" (oldbit),                            /* %0: write-only, any general register */
          "=m" (*(volatile long *)addr)             /* %1: write-only memory operand */
        : "Ir" (nr),                                /* %2: constant 0-31 (I) or register (r) */
          "m" (*(volatile long *)addr)              /* %3: the same memory as an input */
        : "memory");                                /* memory changes in ways GCC cannot see */
    return oldbit;                                  /* -1 if the bit was set, 0 if it was clear */
}
```
"=" : Means that this operand is write-only for this instruction; the previous value is discarded and replaced by output data.

General form of a GCC extended asm statement:

```
asm ( assembler template
      : output operands                  /* optional */
      : input operands                   /* optional */
      : list of clobbered registers      /* optional */
    );
```
LOCK: this prefix causes the processor's LOCK# signal to be asserted during execution of the accompanying instruction (turning it into an atomic instruction). In a multiprocessor environment, the LOCK# signal ensures that the processor has exclusive use of any shared memory while the signal is asserted.

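BTR (Bit Test and Reset) selects the bit at offset nr in the bit string at addr, copies its old value into the carry flag (CF), and clears the bit to 0.
SBB (Integer Subtraction with Borrow) computes DEST = DEST - (SRC + CF). With the same register as both operands (sbbl %0,%0) the result is -CF: 0 if the bit was clear, -1 (all bits set) if it was set.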

PCI Hook

It's not really much of a driver: it just kicks off the xenbus setup, and the emulated hardware itself has no real functionality. The platform device has device ID 0x0001; the code will be introduced in pci_ids.h. For the purpose of this code, the PCI hook will provide interrupts to the code. Since the PCI code has not been implemented yet, the code will be left in a state that allows an easy update but provides no functionality. The PCI code is planned to be written soon; after that, etherboot will be developed.
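A minimal sketch of recognising the platform device from its PCI IDs. The values (vendor 0x5853, device 0x0001) come from the text above; the macro names follow Linux's pci_ids.h naming and are an assumption for SeaBIOS, as are the helper names. The second helper decodes a raw 32-bit read of config-space offset 0, where the vendor ID is in the low 16 bits and the device ID in the high 16 bits.

```c
#include <assert.h>
#include <stdint.h>

/* Xen platform PCI device IDs (vendor 0x5853 is "XS" in ASCII). */
#define PCI_VENDOR_ID_XEN          0x5853
#define PCI_DEVICE_ID_XEN_PLATFORM 0x0001

/* Nonzero if these IDs identify the Xen platform device, i.e. the hook
 * that should kick off xenbus setup. */
static int is_xen_platform_device(uint16_t vendor, uint16_t device)
{
    return vendor == PCI_VENDOR_ID_XEN && device == PCI_DEVICE_ID_XEN_PLATFORM;
}

/* Same check on a raw dword read from config offset 0. */
static int is_xen_platform_id_dword(uint32_t id)
{
    return is_xen_platform_device((uint16_t)(id & 0xFFFFu), (uint16_t)(id >> 16));
}
```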

Real Mode Problem

In order to give PV drivers to SeaBIOS we will need to solve a few problems. One is the following: does a booting kernel inform the BIOS that it will leave real mode and not use it again? When the booting kernel uses CPU real mode for the last time, how can we (Xen or SeaBIOS) know that real mode will no longer be used, and hence that BIOS calls will not be issued? Upon the last real-mode usage we want to leave all Xen PV information in a clean state: this means closing the channel and ring between the newly created domain and the host system.


The ACPI spec does define a mechanism for the OS to inform the BIOS that it is transitioning from "Legacy state" to "Working state" via an SMI. SeaBIOS does have code for this (see src/smm.c), but it doesn't currently do anything interesting. Unfortunately, this is only available for OSs that support ACPI.


You can look at the Linux source code and see what the first thing it does is. With GPLPV, the first thing I do is set up logging to /var/log/qemu-dm-<domu name>.log (iowrites which are caught by qemu), but only under the checked drivers. The next thing is to balloon down the memory before Windows touches it too much. Then I disable the qemu devices (iowrites which are caught by qemu). Finally I check the CPUID for the xen signature (should probably do that first) and then set up the rights etc.

I think the cheapest way to do it would be to trap the iowrites and use them as the trigger to tear down the rings etc., as the iowrites are already processed in qemu, which should make them easier to intercept. But the Xen guys would need to comment on whether you can guarantee that this is always done by any reasonably recent version of Linux with PV drivers. There may well be lots of current installations that pre-date those iowrites.
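The iowrite that disables the qemu devices is the emulated-device unplug request. As we understand Xen's docs/misc/hvm-emulated-unplug document, a PV-aware guest writes a 16-bit mask to I/O port 0x10, where bit 0 unplugs the IDE disks, bit 1 the emulated NICs, and bit 2 the auxiliary IDE disks. A minimal sketch of using that write as the teardown trigger, assuming the hook runs somewhere that sees every trapped port write (for example in qemu); the function name and constants are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

#define XEN_UNPLUG_PORT          0x10
#define XEN_UNPLUG_ALL_IDE_DISKS 0x1
#define XEN_UNPLUG_ALL_NICS      0x2
#define XEN_UNPLUG_AUX_IDE_DISKS 0x4

/* Hypothetical hook for a trapped port write of 'size' bytes. Returns 1 when
 * the guest is unplugging emulated disks, i.e. switching to its own PV block
 * drivers, which is when the BIOS rings and event channels should go away. */
static int should_teardown_on_iowrite(uint16_t port, uint16_t value, unsigned size)
{
    if (port != XEN_UNPLUG_PORT || size != 2)
        return 0;
    return (value & (XEN_UNPLUG_ALL_IDE_DISKS | XEN_UNPLUG_AUX_IDE_DISKS)) != 0;
}
```

Unplugging only the NICs is deliberately ignored here, since it says nothing about whether the guest will keep issuing INT 13h disk calls.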

Next, I guess you could look for the WRMSR instruction used to copy the hypercall pages in, or look for an OS querying the CPUID leaves where the Xen signatures live, but the Hyper-V signatures are there too and I don't know when Windows queries those. These are possibly harder to trap, as Xen would need to signal either qemu or SeaBIOS directly that this had happened.

Alternatively, seeing the HVM_PARAM_CALLBACK_IRQ, HVM_PARAM_STORE_PFN, and HVM_PARAM_STORE_EVTCHN hypercalls (hvm set op) is the definitive way to know that the OS is initialising the xenbus interface. SeaBIOS would need to trap the calls (all three I guess in case they were executed in an order you didn't expect) before they were executed, which would be harder as I think qemu never sees it. This early intervention would be required as you'd need to use xenbus to tear down the interfaces which is probably asking a bit much.


Keir proposes: There's no easy way. Best effort might be to hook off the guest OS setting up its PV drivers. One of the first steps of that would be getting a hypercall transfer page, and also setting up event-channel delivery. It may be necessary for the hypervisor to give the BIOS some help by delivering a pre-registered callback on one of those events, to clean up. This is made uglier by the fact you don't know what execution mode the OS might be in when it triggers the callback. Needs a bit more thought.


Tim and I just had a bit of a think about whether or not this could be done from AML }:-). (Let's ignore the fact that requiring ACPI support in the guest for this functionality would be a bit lame...)

It turns out it cannot (phew!) without adding some very hacky way to make hypercalls (e.g. via an I/O write): hypercalls are needed to kick the xenstore evtchn and also to close any other evtchns. The rest, such as clearing down grant entries and zeroing the xenstore ring, could be done from AML, we reckon.

FWIW the set of things which needs to be done seems to be:

  • xenbus writes to move devices to state 5 (provoking backend reset), notify xenbus evtchn, wait for responses to complete (or otherwise interlock against the xenstore ring reset below).
  • make hypercalls to close event channels
  • clear grant table entries
  • reset the xenstore ring ready for use by next OS.
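The list above can be encoded as an ordered plan, which also makes the AML argument explicit: only the steps that kick or close event channels need hypercalls. A minimal sketch; the enum, the plan array and step_needs_hypercall are hypothetical, and the comments name Xen's EVTCHNOP_close sub-op and state 5 (XenbusStateClosing) only to say what each step would do.

```c
#include <assert.h>
#include <stddef.h>

/* The cleanup steps, in the order they must run: the xenbus close must finish
 * (or be interlocked) before the xenstore ring is reset. */
enum teardown_step {
    STEP_XENBUS_CLOSE,        /* write state 5 (XenbusStateClosing), kick evtchn, wait */
    STEP_CLOSE_EVTCHNS,       /* EVTCHNOP_close for each event channel */
    STEP_CLEAR_GRANTS,        /* zero the grant table entries we created */
    STEP_RESET_XENSTORE_RING  /* leave the ring ready for the next OS */
};

static const enum teardown_step teardown_plan[] = {
    STEP_XENBUS_CLOSE, STEP_CLOSE_EVTCHNS, STEP_CLEAR_GRANTS, STEP_RESET_XENSTORE_RING
};

/* Kicking the xenstore evtchn and closing evtchns need hypercalls; clearing
 * grant entries and the ring are plain memory writes (doable from AML). */
static int step_needs_hypercall(enum teardown_step s)
{
    return s == STEP_XENBUS_CLOSE || s == STEP_CLOSE_EVTCHNS;
}

/* Count the hypercall-dependent steps in the plan. */
static size_t hypercall_steps(void)
{
    size_t n = 0;
    for (size_t i = 0; i < sizeof teardown_plan / sizeof teardown_plan[0]; i++)
        n += (size_t)step_needs_hypercall(teardown_plan[i]);
    return n;
}
```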

So it looks like some sort of SMM alike thing is going to be the best answer here, although "real virtual" SMM looks like a complete snake/tar pit. A simpler callback with flat segments seems plausibly doable.

As an aside we will also need to handle the case where the guest is not PV aware and hence uses the emulated devices and never triggers any of the above activities. So we need to ensure that the backends are sync'd even if none of the above takes place. The PV devices will remain open but that needn't be a problem if the guest never uses them.

Possibly this means making sure all writes via this PV interface go straight to disk (using the appropriate barriers) or by having qemu do the necessary flush when the emulated device is first used.

Keir responds: The SMM-type thing (maybe not really emulated SMM, but kind of inspired by the principle of SMM) is the best idea I have so far. That was the kind of thing in my mind when I replied yesterday.


How can you be certain an OS won't switch back to real mode even after an extended period of up-time? Or that such switching back would affect you (could be calling e.g. the video or PCI BIOS functions only).

There is INT 15h AX=EC00h with BX specifying the target operating mode, but that's apparently being called only before entering long mode (i.e. it wouldn't cover 32-bit OSes). Nor would it guarantee that the OS won't later return to real mode.
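A minimal sketch of decoding that call in a BIOS INT 15h handler. The mode values (BL = 1 legacy mode only, 2 long mode only, 3 both) are as we understand AMD's BIOS documentation; Linux's x86-64 boot code, for example, passes BX=2. The function name and return convention are hypothetical, and, as noted above, the result is at best a hint about the OS's intentions.

```c
#include <assert.h>
#include <stdint.h>

/* Target operating modes passed in BL to INT 15h, AX=EC00h. */
#define TARGET_MODE_LEGACY_ONLY 1
#define TARGET_MODE_LONG_ONLY   2
#define TARGET_MODE_MIXED       3

/* Return the announced mode, 0 if this is not the EC00h call,
 * or -1 for a reserved BL value. */
static int int15_ec00_target_mode(uint16_t ax, uint16_t bx)
{
    uint8_t bl = (uint8_t)(bx & 0xFFu);
    if (ax != 0xEC00u)
        return 0;             /* some other INT 15h function */
    if (bl < TARGET_MODE_LEGACY_ONLY || bl > TARGET_MODE_MIXED)
        return -1;            /* reserved value */
    return bl;
}
```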

Jurguen adds: Wouldn't it be possible for the BIOS to re-establish the connection to Xen in this case? This might be the best solution: close the channel and ring at some specific event (which might even be timer based) and open them again if really needed.

Tim answers: You can't, but you could always try to re-establish PV connections if the guest starts making INT 13h calls again. In any case, the existing BIOS has this problem if the PV drivers have turned off the emulated devices.

As for how you tidy up cleanly, I can't think of anything better than a sort of virtual SMM, where you register an area of code to be run in a known sane environment and have Xen trigger it based on, e.g. the disable-my-devices ioport write. It's pretty ugly but at least it'd be fairly self-contained compared to having Xen or qemu try to tear down grant-table entries &c