Xen Roadmap/4.4

Revision as of 18:17, 10 February 2014 by Dunlapg

Rather than try to predict precisely what will make it into what release (which was something of a disaster last release), I'm just going to borrow a term from the Agile world and call all uncompleted features the "Backlog". I'll still track who is doing what, and when we get close, what state things seem to be in.

As mentioned in another e-mail, we'll also be working on improving the regression tester. Feel free to join us.

And as always, if you are working on a feature / bug that you want tracked, please respond to this e-mail.


As discussed elsewhere, I am proposing a 6-month release cycle. Xen 4.3 was released on 9 July 2013, which would give us a release on 9 January 2014. Because that is shortly after the Christmas season, I propose moving the estimated release date later, to 21 January, to allow nearly two extra weeks for the holiday season:

  • Feature freeze: 18 October 2013
  • Code freezing point: 18 November 2013
  • First RCs: 6 December 2013 <<== WE ARE HERE
  • Release: When It's Ready 2014. (Probably mid-to-end of February.)

Feedback on the estimated dates is welcome.

Last updated: 10 February 2014

Exception guidelines for after the code freeze

We have now reached the code freeze, so we need to think carefully about every patch we accept. Below is a brief overview of the criteria the maintainers and the release coordinator will use to make decisions; you can help us out a lot by including your own analysis with any patch you post. Remember that the more conservative you are in your own analysis, the less strict we need to be in ours.

Our goals for the release are, in this order:

  1. A bug-free release
  2. An awesome release
  3. An on-time release

Accepting any patch at this point may fix some bugs or enable some functionality, but it also carries a risk of introducing other bugs (breaking other functionality). Such a bug may be found before the release (threatening #3), or it may not be found until after the release (threatening #1).

The "expected value" of a risk is how bad the outcome would be, multiplied by the probability that it happens.

So when considering a bug fix, five questions need to be asked:

  1. What functionality is being fixed / enabled by this patch?
  2. If there was a bug in this patch, what functionality might be broken?
  3. What is the probability that this patch has a bug?
  4. If the patch had a bug, what is the probability it would be found before the release?
  5. Given the above benefit and risk, is this patch worth it?

When answering #2, I think we need to consider the 95th-percentile outcome, weighted by probability. It is always conceivable that a minor change will cause a cascading series of failures leading to global thermonuclear war and the annihilation of the human race; but if we always thought like that, we'd never do anything at all.

When answering #3, take into account things like the complexity of the patch, the complexity of the underlying code, and the reviewers' confidence.
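Purely as an illustration, the five questions above can be folded into a rough expected-value comparison. The sketch below is a hypothetical Python model, not a tool the maintainers actually use: the field names, the arbitrary "units" for benefit and severity, and the two weights (a bug that escapes into the release is assumed to cost more than one caught beforehand, mirroring goal #1 ranking above goal #3) are all assumptions made for the example.

```python
from dataclasses import dataclass

# Hypothetical weights (assumptions, not project policy): a bug that escapes
# into the release threatens goal #1 (bug-free), so it is weighted more heavily
# than a bug caught beforehand, which mainly threatens goal #3 (on time).
ESCAPED_BUG_WEIGHT = 1.0
CAUGHT_BUG_WEIGHT = 0.3


@dataclass
class PatchAssessment:
    benefit: float                 # Q1: value of the fixed/enabled functionality (arbitrary units)
    breakage_severity: float       # Q2: how bad a bug would be (95th-percentile estimate)
    p_bug: float                   # Q3: probability that the patch has a bug
    p_found_before_release: float  # Q4: probability such a bug is caught before release


def expected_risk(a: PatchAssessment) -> float:
    """Expected cost = severity x probability, split by when the bug is found."""
    p_escaped = a.p_bug * (1.0 - a.p_found_before_release)
    p_caught = a.p_bug * a.p_found_before_release
    return a.breakage_severity * (p_escaped * ESCAPED_BUG_WEIGHT
                                  + p_caught * CAUGHT_BUG_WEIGHT)


def worth_accepting(a: PatchAssessment) -> bool:
    """Q5: accept only if the benefit outweighs the expected cost of a bug."""
    return a.benefit > expected_risk(a)


if __name__ == "__main__":
    # A small, well-reviewed fix to important functionality.
    small_fix = PatchAssessment(benefit=5.0, breakage_severity=10.0,
                                p_bug=0.1, p_found_before_release=0.8)
    # A large, intrusive change to core code that is hard to test.
    risky_change = PatchAssessment(benefit=2.0, breakage_severity=50.0,
                                   p_bug=0.4, p_found_before_release=0.5)
    for name, patch in [("small_fix", small_fix), ("risky_change", risky_change)]:
        print(name, round(expected_risk(patch), 2), worth_accepting(patch))
```

With these made-up numbers the small fix's expected cost stays well below its benefit, while the intrusive change's does not. The real judgement is qualitative; the point of the sketch is only that a low probability of a bug can still be outweighed by a severe enough breakage.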

You can find a short example of this kind of analysis here: http://marc.info/?l=xen-devel&m=138617980623176


Completed

* Event channel scalability (FIFO event channels)

* Non-udev scripts for driver domains (non-Linux driver domains)

* Multi-vector PCI MSI (Hypervisor side)

* Improved Spice support on libxl
 - Added Spice vdagent support
 - Added Spice clipboard sharing support
 - Spice USB redirection support for upstream qemu

* PVH domU (experimental only)

* pvgrub2 checked into grub upstream

* ARM64 guest

* Guest EFI booting (tianocore)

* kexec

* Testing: Xen on ARM

* Update to SeaBIOS

* Update to qemu 1.6.2

* SWIOTLB (in Linux 3.13)

* Disk: indirect descriptors (in 3.11)

* Reworked ocaml bindings 

Resolved since last update

* qemu-* parses "008" as octal in USB bus.addr format

* Claim mode and PoD

* Disable IOMMU if no southbridge

* osstest windows-install failures

* libxl / libvirt races


Open

* Win2k3 SP2 RTC infinite loops
   > Regression introduced late in Xen-4.3 development
   owner: andrew.cooper@citrix
   status: patches posted, undergoing review.

* PVH regression

* dirty vram / IOMMU bug
 > http://bugs.xenproject.org/xen/bug/38
 status: Patch posted

* RHEL 7 pygrub patches
 > http://bugs.xenproject.org/xen/bug/39
 status: Wait for 4.4.1?

* credit2 runqueues
 > http://bugs.xenproject.org/xen/bug/36

* RHEL 5.x ocaml build bug
  status: patch posted

* libxl / xl does not handle failure of remote qemu gracefully
  > Related to http://bugs.xenproject.org/xen/bug/30
  > Easiest way to reproduce: 
  >  - set "vncunused=0" and do a local migrate
  >  - The "remote" qemu will fail because the vnc port is in use
  > The failure isn't the problem, but everything being stuck afterwards is
 Ian J investigating

* qemu memory leak?
  > http://lists.xen.org/archives/html/xen-users/2013-03/msg00276.html

(Open, not for 4.4)

* qemu-upstream not freeing pirq 
 > http://www.gossamer-threads.com/lists/xen/devel/281498
 > http://marc.info/?l=xen-devel&m=137265766424502
 status: patches posted; latest patches need testing,
 but they haven't been tested yet because of the other passthrough issues.

 Not a blocker.

* Race in PV shutdown between tool detection and shutdown watch
 > http://www.gossamer-threads.com/lists/xen/devel/282467
 > Nothing to do with ACPI
 status: Patches posted, need more work, will be stalled for some time
 The fix is to the Linux side of things.
 Not a blocker.

* xl does not support specifying virtual function for passthrough device
 > http://bugs.xenproject.org/xen/bug/22
 Too much work to be a blocker.

* xl does not handle migrate interruption gracefully
  > If you start a localhost migrate, and press "Ctrl-C" in the middle,
  > you get two hung domains
 Ian J investigated -- can of worms, too big to be a blocker for 4.4

* HPET interrupt stack overflow (when using hpet_broadcast mode and MSI-capable HPETs)
  owner: andyh@citrix
  status: patches posted, undergoing review iteration.
  > andyhhp: I have more work to do on the HPET series
  > andyhhp: no way it is going to be ready or safe for 4.4

* PCI hole resize support hvmloader/qemu-traditional/qemu-upstream with PCI/GPU passthrough
  > http://bugs.xenproject.org/xen/bug/28
  > http://lists.xen.org/archives/html/xen-devel/2013-05/msg02813.html
  > Where Stefano writes:
  > 2) for Xen 4.4 rework the two patches above and improve
  > i440fx_update_pci_mem_hole: resizing the pci_hole subregion is not
  > enough, it also needs to be able to resize the system memory region
  > (xen.ram) to make room for the bigger pci_hole

  status: not going to be fixed for 4.4 either. Created bug #28.


Testing coverage

* new libxl w/ previous versions of xl

* Host S3 suspend
 @bguthro, @dariof

* Default [example] XSM policy
 @Stefano to ask Daniel D	

* Storage driver domains

* HVM pci passthrough

* PV pci passthrough
 @konrad (or @george if he gets to it first)

* Network driver domains

* Nested virt?
 @intel (chased by George)

* Fix SR-IOV test (chase Intel)

* Fix bisector to e-mail blame-worthy parties

* Fix xl shutdown

* stub domains

* performance benchmarks

Meta-items (composed of other items)

* Meta: PVIO NUMA improvements
 - NUMA affinity for vcpus (4.4 possible)
 - PV guest NUMA interface (4.4 possible)
 - Sensible dom0 NUMA layout 
 - Toolstack pinning backend thread / virq to appropriate d0 vcpu
 - NUMA-aware ballooning

* xend still in tree (x)
 - xl list -l on a dom0-only system
 - xl list -l doesn't contain tty console port
 - xl Alternate transport support for migration*
 - xl PVSCSI support
 - xl PVUSB support

Big ticket items

* PVH dom0 (w/ Linux) 
  owner: mukesh@oracle, george@citrix
  status (Linux): Acked, waiting for ABI to be nailed down
  status (Xen): v6 posted; no longer considered a blocker

* libvirt/libxl integration (external)
 - owner: jfehlig@suse, dario@citrix
 - patches posted (should be released before 4.4)
  - migration
  - PCI pass-through
 - In progress
  - integration w/ libvirt's lock manager
  - improved concurrency

Missed the feature freeze

* libxl network buffering support for Remus
   status: patches posted
   prognosis: fair

* xencrashd
   owner: don@verizon
   status: v2 posted
  > http://lists.xen.org/archives/html/xen-devel/2013-11/msg02569.html

* ARM Live Migration Support
  owner: Jaeyong Yoo <jaeyong.yoo@samsung.com>
  status: Not for 4.4

* soft affinity for vcpus (was NUMA affinity for vcpus)
    owner: Dario
    status: v2 posted

* PV guest NUMA interface
    owner: Elena 
    status: v3 posted

* xl USB pass-through for HVM guests using Qemu USB emulation
  owner: George
  status: v6 patch series posted

* Sensible dom0 NUMA layout

* Toolstack pinning backend thread / virq to appropriate d0 vcpu

* NUMA Memory migration 
  owner: dario@citrix
  status: In progress

* NUMA-aware ballooning
   owner: Li Yechen
   status: in progress

* xl migrate transport improvements
 owner: None
 > See discussion here: http://bugs.xenproject.org/xen/bug/19
 - Option to connect over a plain TCP socket rather than ssh
 - xl migrate-receive suitable for running in inetd
 - option for above to redirect log output somewhere useful
 - Documentation for setting up alternate transports

* HVM guest NUMA
  owner: Matt Wilson@amazon
  status: in progress (?)

* qemu-upstream stubdom, Linux
   owner: anthony@citrix
   status: in progress
   prognosis: ?
   qemu-upstream needs a more fully-featured libc than exists in
   mini-os.  Either work on a minimalist Linux-based stubdom with
   glibc, or port one of the BSD libcs to mini-os.

* qemu-upstream stubdom, BSD libc
  prognosis: ?
  owner: ianj@citrix

* Network performance improvements
  owner: wei@citrix

* Xen EFI feature: Xen can boot from grub.efi
 owner: Daniel Kiper
 status: in progress

* Default to credit2
 status: Probably not for 4.4
 - cpu pinning
 - NUMA affinity
 - cpu "reservation"

* xenperf
  prognosis: Deferred to 4.5
  Owner: Boris Ostrovsky
  status: v2 patches posted

* Nested virtualization on Intel

* Nested virtualization on AMD

* Multi-vector PCI MSI (upstream Linux)
  owner: boris@oracle

* xl: pass more configuration defaults in xl.conf
  owner: ?
  There are a number of options for which it might be useful to pass a
  default in xl.conf.  For example, if we could have a default
  "backend" parameter for vifs, then it would be easy to switch back
  and forth between a backend in a driver domain and a backend in dom0.

* xl PVUSB pass-through for PV guests
* xl PVUSB pass-through for HVM guests
  owner: George
  status: ?
  xm/xend supports PVUSB pass-through to guests with PVUSB drivers (both PV and HVM guests).
  - port the xm/xend functionality to xl.
  - this PVUSB feature does not require support or emulation from Qemu.
  - upstream the Linux frontend/backend drivers. Current work-in-progress versions are in Konrad's git tree.
  - James Harper's GPLPV drivers for Windows include PVUSB frontend drivers.

* Xen EFI feature: pvops dom0 able to make use of EFI run-time services (external)
 owner: Daniel Kiper
 status: Just begun


* MAC address changes on reboot if not specified in config file
  > Needs a robust way to "add" to the config

* ACPI WAET table vs RTC emulation mode
  owner: jan@suse
  prognosis: ?
 > An overly simplified fix was posted a while ago
 > (http://lists.xenproject.org/archives/html/xen-devel/2013-07/msg00122.html),
 > but Tim's objection is rather valid. I can't, however, estimate
 > if/when I would find time to learn what tools side changes are
 > necessary to accommodate a new HVM param, and hence this is
 > currently stalled.  The current solution (as of 3fa7fb8b ["x86/HVM:
 > RTC code must be in line with WAET flags passed by hvmloader"])
 > isn't desirable to be kept for 4.4.

* Polish up xenbugtool
  owner: wei.liu2@citrix.com

* Sort out better memory / ballooning / dom0 autoballooning thing
 > Don't forget NUMA angle
 - Inaccurate / incomplete info from HV

* Implement Xen hypervisor dmesg log entry timestamps
 > https://xenorg.uservoice.com/forums/172169-xen-development/suggestions/3924048-implement-xen-hypervisor-dmesg-log-entry-timestamp
 > Request seems to be for a shorter stamp (seconds-only, rather than full date)

* Make network driver domains easier to set up / more useful
 - Make it easy to make a device assignable (in discussion)
 - Automatically start/shutdown (xendomains?)
 - Pause booting of other domains until network driver domain is up (necessary?)

* libxl: More fine-grained control over when to pass through a device
 > Some IOMMUs are secure; some are merely functional; some are not present.
 > Allow the administrator to set the default.

* qxl
  > http://bugs.xenproject.org/xen/bug/11
  - Uninitialized struct element in qemu
  - Revert 5479961 to re-enable qxl in xl,libxl
  - Option in Xen top-level to enable qxl support in qemu tree
  - Fix sse2 MMIO issue
   - make word size arbitrary

* libxl config file
  > "non-xl toolstacks which use libxl could specify configuration
  > options for some things.... things like locations of binaries come
  > to mind; maybe so that distros could package up libxl and say
  > where things were, and other programs could link against it..."
  > "There are some settings that you'd want to be host-wide for any libxl
  > using toolstacks sharing a host (e.g. xl and xapi). Default
  > vif-scripts and policy WRT selecting disk backends are two which
  > spring to mind.  Probably a great deal of xl.conf actually belongs
  > in libxl.conf"

* libxl: Don't use RAW format for "URL"-based qdisks (e.g., rbd:rbd/foo.img)
  - Figure out whether to use a generic URL or have a specific type for each one
  - Check existence of disk file for all RAW

* acpi-related xenstore entries not propagated on migrate
 > http://www.gossamer-threads.com/lists/xen/devel/282466
 > Only used by hvmloader; only a clean-up, not a bug.

Wishlist / someday

* Make storage migration possible
  owner: ?
  status: none
  There needs to be a way, either via command-line or via some hooks,
  that someone can build a "storage migration" feature on top of libxl
  or xl.

* Full-VM snapshotting
  owner: ?
  status: none
  Have a way of coordinating the taking and restoring of VM memory and
  disk snapshots.  This would involve some investigation into the best
  way to accomplish this.

* VM Cloning
  owner: ?
  status: none
  Again, a way of coordinating the memory and disk aspects.  Research
  into the best way to do this would probably go along with the
  snapshotting feature.

* xl vm-{export,import}
  owner: ?
  status: none
  Allow xl to import and export VMs to other formats; particularly
  OVF, perhaps the XenServer format, or more.

* Memory: Replace PoD with paging mechanism
  owner: george@citrix
  status: none

* PV audio (audio for stubdom qemu)
  owner: stefano.panella@citrix
  status: ?

* Wait queues for mm
 > Needed for more advanced paging schemes
  owner: ?
  status: Draft posted Feb 2012; more work to do.

* V4V: Inter-domain communication
  owner (Xen): dominic.curran@citrix.com
  status (Xen): patches submitted
  owner (Linux driver):  stefano.panella@citrix
  status (Linux driver): in progress

* Serial console improvements
  owner: ?
  status: Stalled (see below)
  - xHCI debug port (needs hardware)
  - Firewire (needs hardware)