Article written by guest author Stefano Stabellini, ZEDi Advisor, Virtualization Architect at Aporeto, Xen Committer and ARM maintainer
The announcement of the new ACRN hypervisor by Intel a couple of weeks ago took many by surprise. However, it is entirely aligned with the broader industry movement toward the use of virtualization to run mixed-criticality workloads in embedded and IoT. At Embedded Linux Conference and OpenIoT Summit North America we had at least four presentations discussing this issue and proposing solutions based on a variety of open source projects, from Xen on ARM to OpenAMP and ARM TrustZone.
Let’s start with the problem statement. It has become increasingly common across industries that a critical software component needs to run alongside a non-critical software component on the same IoT device. Take, for instance, the autopilot software (critical) and the video streaming software (non-critical) on a flying camera drone. If the video streaming crashes, the user might lose the camera feed, but the drone can still land safely. If the autopilot software crashes, the drone will be damaged and might cause damage to other property. There might be liability involved. Other examples of mixed-criticality can be easily found in automotive; in industrial robotics, where the safety of workers and the production rate might be adversely affected by software failures; and in healthcare, where patients could be at risk.
Typically, the critical component is small, its code is carefully written and well audited, and its failures have serious consequences. It is often based on a real-time operating system. The non-critical component is usually far larger and based on a common Linux distribution or Android; it is more prone to crashing, and its failure is just an inconvenience. In these situations, it is crucial to fully isolate the two components so that they cannot affect each other. In particular, the non-critical elements cannot be allowed to interfere with the critical elements in any way.
It was great to see such a strong validation by Intel of embedded hypervisors as a solution to this well-known problem. Embedded hypervisors are an excellent fit for this use-case: they offer the needed isolation and security properties to the applications running on top while meeting real-time requirements. Unlike traditional cloud hypervisors, they introduce only a small overhead, and they are quick to boot. Most importantly, embedded hypervisors need to be small in size, because a larger code base tends to mean more overhead and more bugs.
In the Xen on ARM community, we started seeing the hypervisor being used to solve this problem as far back as 2014. Galois, an R&D institute based in Portland, joined the Xen Development Summit that year to show a demo of a Parrot drone using Xen on ARM to isolate SMACCMPilot, an open source autopilot software, from the rest of the system. In fact, supporting mixed-criticality workloads has been the primary use-case for Xen on ARM during the last four years.
Although Xen was born as an x86 hypervisor for servers (Amazon Web Services created “the cloud” with Xen), Xen on ARM is a more recent development effort, mostly popular in embedded. Many don’t know that Xen is a natural fit for embedded: it has a micro-kernel design, and as such, it is small and very flexible. It is more similar to the L4 family of micro-kernels than to a monolithic kernel like Linux. Xen on ARM tends to be exceptionally small because it isn’t just a port of Xen to the ARM instruction set; it was purposely rearchitected to take advantage of the latest hardware advancements. As a consequence, Xen on ARM, while feature-rich, has a far smaller code base than the older Xen on x86.
Today, Xen on ARM is stable and mature, and it is about 65K LOC in size. It is larger than ACRN but still small and certifiable. It supports both ARMv7 and ARMv8 architectures (ACRN is for Intel processors). Xen on ARM comes with a Kconfig infrastructure which can be used to tailor the size of the hypervisor to user needs. As a community, we haven’t put much effort into reducing Xen on ARM further so far, because 60-70K LOC is small enough for most users, including those who are looking for certifications. However, we should be able to cut down the size to as low as 30K LOC if needed.
Xen on x86 is different: it is larger, and the code base is older and less “clean” than Xen on ARM. Fortunately, experts in the Xen community are making significant progress toward redesigning Xen on x86 to match Xen on ARM’s elegance and simplicity.
Roger Pau Monne, FreeBSD maintainer, is working on completing a new execution mode called “PVH”, which is effectively the x86 equivalent of a Xen on ARM virtual machine. PVH is lightweight and exploits the hardware as much as possible. Roger wrote: “With PVH Dom0, we’ll be able to have HVM/PVH-only deployments, which will lead to a much smaller code base in both the Xen hypervisor and Dom0 kernels.”
Wei Liu, Xen tools maintainer and longtime contributor to the project, is working on introducing new Kconfig options in the hypervisor to chop the code base into smaller pieces, drastically reducing the line count of the final build. Wei aims to go down to 130K LOC soon, then move further down, ideally reaching Xen on ARM’s size.
Starting from an older code base like Xen on x86 to support a new use-case has its challenges, but in exchange, the project comes with many features, and it is very well-tested. It also comes with a healthy community, a security team, a transparent security process, a CI loop, and stable trees maintained for up to three years. Xen on ARM has been in the very fortunate position of being able to reuse the existing enterprise-grade infrastructure of Xen Project while starting from a mostly new code base.