KOS Release 2022.7

July, 2022
April 30, 2024

2022 July release

This release is for version 0.11.0 of the KOS platform. This is intended as the last scheduled minor release before the first major release, 1.0.0, is delivered around the end of 2022.

The goal for this release is to improve the developer experience when working with KOS, to enable persistent upgrades and to improve KOS’s reliability.

Highlights

Persistent Upgrades

The KOS boot process can now load the system manifest and individual app images from a file system on a block storage device. KOS can also write new manifests and app images to the file system when they are received during a remote upgrade.

Previously, booting a device from storage required app images to be combined with the seL4 kernel into a single image loaded by a boot loader, and remotely delivered upgrades were ephemeral. Both behaviors remain the default. To enable booting from storage, define the manifest property persistent-boot in the root manifest node.

Manifests and app images are now cached in storage indefinitely, identified by their SHA256 hashes. Remote upgrades only need to deliver files that are not already cached on the device. The system tracks manifest generations, and only updates the link to the current generation when an upgrade completes successfully.

This also enables robust handling of failed upgrades, since the system can fall back to the most recent manifest version that successfully booted.
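The generation tracking and fallback behaviour can be pictured with a small model. The sketch below is illustrative only: the class, its methods and the placeholder hash strings are hypothetical and do not reflect the actual KOS implementation or on-disk layout. It shows a "current" link that moves to a new generation only after a successful upgrade, so a failed upgrade leaves the device on the previous generation.

```python
class GenerationStore:
    """Toy model of manifest generations (hypothetical, not the KOS API)."""

    def __init__(self, initial_manifest_hash):
        # Every generation ever staged, oldest first.
        self.generations = [initial_manifest_hash]
        # Index of the generation the device currently boots from.
        self.current = 0

    def upgrade(self, manifest_hash, boots_ok):
        """Stage a new generation; move the link only if the upgrade succeeds."""
        self.generations.append(manifest_hash)
        candidate = len(self.generations) - 1
        if boots_ok(manifest_hash):
            self.current = candidate
            return True
        # Failed upgrade: the link still points at the last good generation.
        return False

    def current_manifest(self):
        return self.generations[self.current]


store = GenerationStore("hash-gen0")
store.upgrade("hash-gen1", boots_ok=lambda h: True)   # succeeds
store.upgrade("hash-gen2", boots_ok=lambda h: False)  # fails
print(store.current_manifest())
```

In this model the failed generation stays cached, which mirrors the release's statement that files are kept in storage indefinitely.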

Support scripts for Arm QEMU simulation of KOS systems now build a file system, and use it for the initial system boot. For real devices, a build-bootfs script is available for building a file system suitable for flashing a physical storage medium.

Block device and file system support

This release includes new internal libraries for low-level access to block devices and file systems.

These are currently intended to support booting from storage and persisting upgrade files to storage, but might be extended to user applications in the future.

Block devices initially supported include:

  • A virtio block device that can be used with QEMU simulation.
  • The MMCSD controller on the AM335X (BeagleBone Black).

File systems initially supported include:

  • ext4

Cached App Upgrades

If the persistent-boot option is enabled, so that images are stored in a persistent file store, a system upgrade only needs to download the binaries that have changed. Unmodified applications are not downloaded again. This speeds up the download phase of an upgrade and reduces the total amount of data that needs to be transferred.
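A minimal sketch of the idea, assuming files are identified purely by their SHA256 hash as described under Persistent Upgrades. The function names and the sample image contents are hypothetical and are not part of KOS.

```python
import hashlib


def sha256_hex(data: bytes) -> str:
    """Hex-encoded SHA256 digest used as the cache key."""
    return hashlib.sha256(data).hexdigest()


def files_to_download(required_hashes, cached_hashes):
    """Return the hashes an upgrade must deliver, preserving release order."""
    cached = set(cached_hashes)
    return [h for h in required_hashes if h not in cached]


# Hypothetical image contents standing in for real app images.
admin_v1 = sha256_hex(b"admin image v1")
tunnel_v1 = sha256_hex(b"tunnel image v1")
tunnel_v2 = sha256_hex(b"tunnel image v2")

cache = {admin_v1, tunnel_v1}        # already on the device
new_release = [admin_v1, tunnel_v2]  # what the new manifest refers to
missing = files_to_download(new_release, cache)
print(len(missing), "file(s) to download")
```

Here only the changed tunnel image is transferred; the unchanged admin image is served from the cache.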

Improved Fault Handling

The KOS device-tree manifest now supports a new top-level supervision node for specifying additional apps that should be restarted when an individual app faults. Each property in the supervision node associates an app with a list of additional apps that depend on it. For example, the kos_core example includes the following supervision node:

supervision {
 ethernet = "tunnel", "hello-phx", "admin";
 tunnel = "hello-phx", "admin";
};

This specifies that if the ethernet app faults, then four apps (ethernet, tunnel, hello-phx and admin) will be torn down in reverse order, and then reinitialised in forward order.

Similarly, if the tunnel app faults, then it and the hello-phx and admin apps will be restarted.
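The ordering described above can be sketched as follows. This is a model of the documented behaviour, not KOS code: the function name is hypothetical, and it assumes that "forward order" means the faulting app followed by its dependants in the order they are listed, and that an app with no supervision entry is restarted on its own.

```python
def restart_plan(supervision, faulted_app):
    """Return (teardown_order, startup_order) for a faulting app."""
    dependants = supervision.get(faulted_app, [])
    # Forward order: the faulting app first, then its listed dependants.
    startup = [faulted_app] + list(dependants)
    # Teardown happens in the reverse of the startup order.
    teardown = list(reversed(startup))
    return teardown, startup


# Mirrors the supervision node from the kos_core example.
supervision = {
    "ethernet": ["tunnel", "hello-phx", "admin"],
    "tunnel": ["hello-phx", "admin"],
}

teardown, startup = restart_plan(supervision, "ethernet")
print(teardown)  # ['admin', 'hello-phx', 'tunnel', 'ethernet']
print(startup)   # ['ethernet', 'tunnel', 'hello-phx', 'admin']
```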

Key store

A new KOS core app, key_store, provides a centralized private key storage service. It lets programs access their own keys and prevents unauthorised programs from gaining logical access to keys.

Apps which previously obtained keys from hard-coded values or ad-hoc configuration files now retrieve keys from the key store. These include the admin component (keys for authenticating to Orbit) and the tunnel component (VPN keys).

This is an initial implementation, with all protections implemented via software. Future releases will build on this to add hardware-protected key stores.

New Orbit Concepts

Orbit now includes the following concepts:

  • Organisation - a company that devices, users and related concepts belong to. This is currently used to determine access rights.
  • Role - a user may either be an “owner” of an organisation, granting them the right to make any change to the organisation (such as adding or removing users, devices, etc.), or a “user” who is only able to view the organisation and its devices, users, etc.
  • Project - a collection of devices (and related structures) that together perform one function, e.g. a group of weather stations.
  • Deployment - a collection of devices used for a specific part of a project. For example, a project may contain a “development” deployment and a “production” deployment. Devices must now belong to one deployment.
  • Release - the manifest and associated applications that are (or should be) deployed to all devices in a deployment.

In addition to upgrading the software on a single device, Orbit now supports attaching a release to a deployment, where it will be automatically rolled out to all devices in the deployment. The devices will either be updated immediately or, if they are not connected when the new release is attached, they will be updated when they next connect to Orbit.
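The rollout rule can be modelled in a few lines. The sketch below is a simplified illustration with hypothetical names and device identifiers. It does not represent Orbit's API or data model, and it treats an update as instantaneous.

```python
class Deployment:
    """Toy model of a deployment and its attached release (hypothetical)."""

    def __init__(self, device_ids):
        self.connected = {d: False for d in device_ids}
        self.installed = {d: None for d in device_ids}
        self.release = None

    def attach_release(self, release):
        """Attach a release and update every device that is connected now."""
        self.release = release
        for device, online in self.connected.items():
            if online:
                self.installed[device] = release

    def device_connected(self, device):
        """A device connects; bring it up to date if it is behind."""
        self.connected[device] = True
        if self.release is not None and self.installed[device] != self.release:
            self.installed[device] = self.release

    def device_disconnected(self, device):
        self.connected[device] = False


dep = Deployment(["station-1", "station-2"])
dep.device_connected("station-1")
dep.attach_release("release-A")    # station-1 is updated immediately
offline_before = dep.installed["station-2"]
dep.device_connected("station-2")  # station-2 is updated on connect
print(dep.installed)
```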

Kry10 Studio

A new repository, studio, contains an early version of Kry10 Studio, which provides developer tooling for managing and developing Kry10 platform devices. This early version provides the following components:

  • kos_build.dockerfile describes a docker container that uses mounted volumes and a nix-shell to form a basic development environment. Docker is currently used as the outermost wrapper to minimize installation friction.
  • kos-init tool: kos-init is a shell script found in the path of the development container. It is used to inject a KOS platform installation into a new CMake build. If the KOS platform hasn’t been built before, the script will create a build and install it to the /kry10/binaries/ directory under a platform spec name. A platform spec is an ordered list of comma-separated positional arguments: PLATFORM,MCS,Build-mode. If an argument is left empty, its default is used. A PLATFORM must always be provided. Some examples: qemu-arm-virt, am335x, qemu-arm-virt,mcs,release, qemu-arm-virt,,release, qemu-arm-virt,mcs,debug.
  • Platform management tools: These tools are intended to be used outside of the container, to run a system built inside the container:
    • reload_board: Reload a dev board using Fastboot over U-Boot
    • reload_board-macos: macOS version of reload_board
    • simulate-arm: Launch a QEMU simulation of the qemu-arm-virt platform
    • simulate-arm-macos: macOS version of simulate-arm
    • simulate-x86: Launch a QEMU simulation of the x86_64 platform
    • simulate-x86-macos: macOS version of simulate-x86
    • build-bootfs: Build a bootfs image sourced from a KOS build
    • kostrace2json: Convert KOS trace binary data to Google’s JSON trace format
  • Studio command line tool: An Erlang escript for uploading releases to Orbit. It can be run using studio <subcommand> <subcommand args>. Currently the only subcommand available is upload_release.
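To make the kos-init platform spec format concrete, here is a small parser sketch. It is not taken from kos-init (which is a shell script), and the names are hypothetical. Because the default values are not stated in these notes, an omitted or empty field is returned as None, meaning "use the default".

```python
from typing import NamedTuple, Optional


class PlatformSpec(NamedTuple):
    platform: str
    mcs: Optional[str]         # None means "use the default"
    build_mode: Optional[str]  # None means "use the default"


def parse_platform_spec(spec: str) -> PlatformSpec:
    """Parse a PLATFORM,MCS,Build-mode spec with optional trailing fields."""
    fields = [f.strip() for f in spec.split(",")]
    if len(fields) > 3:
        raise ValueError("expected at most three comma-separated fields")
    # Pad missing trailing fields so that unpacking always works.
    fields += [""] * (3 - len(fields))
    platform, mcs, build_mode = fields
    if not platform:
        raise ValueError("a PLATFORM must always be provided")
    return PlatformSpec(platform, mcs or None, build_mode or None)


for example in ["am335x", "qemu-arm-virt,,release", "qemu-arm-virt,mcs,debug"]:
    print(parse_platform_spec(example))
```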

Improved Manifest Language

Simple systems can now be described using Elixir rather than needing to write a dts file. Future releases will expand this to allow for more sophisticated system description than the existing manifest.dts files.

A mix task can be used to turn the Elixir code into a .dts file to support extra scrutiny if desired.

See kos_core/kos_manifest and kos_testbed/tracing_example/manifest for examples.

App Environment Variable support

The system manifest now supports setting environment variables for each app. These environment variables are set up by the application loader, and each application can access them via the C standard library function char *getenv(const char *name).

Device Aux scripts

An experimental mechanism allows per-app initialization of shared system resources, such as a clock controller, through a device script. The script is supplied via aux_device_setup_script, a new property in the app’s resources node in the manifest. It is executed each time the app starts and is intended for common device functions, such as enabling and configuring device clocks, without giving the configuring app unrestricted access to the clock controller device.

This feature works similarly to device reset scripts, but it can also access an extra set of hardware resources that are defined globally. Currently this is only implemented for the am335x platform, where it provides access to the clock controller and EDMA channel controller modules. Examples of this type of device script can be found in the can_test and block_test apps.

Inspecting Manifests

Orbit now enables users to view the apps and protocol connections declared in manifests. The manifests can either be uploaded to Orbit as part of a release, or can be fetched from a connected device.

Tracing administration from Orbit

The trace server includes a new control plane that allows starting and stopping individual tracer apps, and collecting trace data from tracer apps. The control plane is enabled by a command-line option on the tracing server. If enabled, the tracing server publishes a new protocol to the message server (kos_trace_admin_protocol).

This means that the trace server can now publish two protocols:

  • The existing kos_trace_protocol allows apps that emit trace events (“tracer” apps) to establish tracing sessions with the trace server. The trace server locally manages low-level tracing sessions on the device.
  • The new kos_trace_admin_protocol (with kos_trace_admin client library) allows a management app to control the trace server. This protocol is intended to support remote access to tracing services.

The Elixir device admin component has also been extended to allow such remote management of tracing services from Orbit. Orbit can start and stop tracing for individual tracer apps on a remote device, and allows Orbit users to download trace data from the device. The SDK includes a script kostrace2json which converts binary trace data to a JSON format supported by third-party tools such as Perfetto.
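For readers unfamiliar with the JSON trace format that Perfetto accepts, the sketch below builds a minimal document in that format from already-decoded events. It is not kostrace2json: the KOS binary trace layout is not described here, so the input dictionaries and their field names are hypothetical. Only the output keys (traceEvents, name, cat, ph, ts, dur, pid, tid) follow the Trace Event Format, with timestamps in microseconds.

```python
import json


def to_trace_event_json(events):
    """Render decoded events as Trace Event Format 'complete' (ph = X) events."""
    trace_events = [
        {
            "name": e["name"],
            "cat": e["category"],
            "ph": "X",               # complete event: start time plus duration
            "ts": e["start_us"],     # microseconds
            "dur": e["duration_us"], # microseconds
            "pid": e["pid"],
            "tid": e["tid"],
        }
        for e in events
    ]
    return json.dumps({"traceEvents": trace_events}, indent=2)


# Hypothetical decoded event from a tracer app.
decoded = [
    {"name": "handle_packet", "category": "net", "start_us": 1000,
     "duration_us": 250, "pid": 1, "tid": 1},
]
output = to_trace_event_json(decoded)
print(output)
```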

Multiple message server instances

For systems that require applications to be grouped into multiple zones operating at different security levels, this release includes support for multiple message server instances. The configuration for each app specifies which message server instances the app can connect to.

This is a breaking change to the device-tree manifest configuration structure. The changes are as follows:

  • The device-tree manifest includes a new msg_servers top-level node, whose sub-nodes will specify the message server instances in the system.
  • Each message server instance configuration must contain its own dir_access_control sub-node describing that instance’s access control matrix. There is no longer a top-level dir_access_control node.
  • Each app that needs to connect to one or more message server instances will require a msg_servers property explicitly listing those message server instances. Any app that does not have a msg_servers property will not be connected to any message server instance.
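The three rules above can be expressed as a simple consistency check. The sketch below models a manifest as nested Python dictionaries purely for illustration. It is not the device-tree schema or any KOS tool, and the "apps" key, the instance names and the app names are hypothetical.

```python
def check_msg_servers(manifest):
    """Return a list of problems found against the new manifest structure."""
    problems = []
    servers = manifest.get("msg_servers", {})
    if "dir_access_control" in manifest:
        problems.append("top-level dir_access_control is no longer supported")
    for name, config in servers.items():
        if "dir_access_control" not in config:
            problems.append(f"msg server '{name}' lacks dir_access_control")
    for app, config in manifest.get("apps", {}).items():
        # An app without a msg_servers property connects to no instance.
        for server in config.get("msg_servers", []):
            if server not in servers:
                problems.append(f"app '{app}' lists unknown msg server '{server}'")
    return problems


manifest = {
    "msg_servers": {
        "secure": {"dir_access_control": {}},
        "public": {"dir_access_control": {}},
    },
    "apps": {
        "admin": {"msg_servers": ["secure", "public"]},
        "sensor": {"msg_servers": ["public"]},
        "standalone": {},  # not connected to any message server instance
    },
}
print(check_msg_servers(manifest))  # []
```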

The Elixir tool for generating device-tree manifests (see Improved Manifest Language above) supports multiple message server instances, and generates device-tree manifests in the new format.

Existing apps that are only intended to connect to a single message server instance can continue to use the existing message server APIs, and should therefore be source compatible with the new release.

For new apps that need to connect to multiple message server instances, there is a new API kos_msg_set_active_server_id, which selects the message server instance to use in subsequent message server API calls.

There is updated message server documentation in kos_core/msg_server/design.md, and an example system with multiple message server instances in kos_testbed/multi_msg_server.

Consolidated core apps

Several apps that were previously considered “core” apps are now treated as regular user apps: rng_server, logging and self_test. These apps are now optional. Systems that want to include these apps must include them in the user apps manifest.

BeagleBone Black RTC support

The clock core app on the BeagleBone Black platform now uses the am335x’s real-time clock (RTC) module to initialize its monotonic timestamp counter. This means that the value of the monotonic counter is no longer reset across system restarts and continues to increase monotonically. Additionally, if the RTC module is initialized to a specific wall-clock value before KOS is loaded, KOS will respect this setting. The RTC module on the am335x requires external power to maintain its register values, so if power to the BeagleBone Black is lost, the clock will be reset. The reset value of the clock is 1 January 2000, 00:00:00. In the future we plan to add support for synchronizing a real-time clock with an external clock source using some form of network time protocol mechanism.

Internal improvements

Hardening

To reduce the attack surface, dependencies on some third-party libraries have been removed from critical components. For example, the root server no longer depends on seL4_libs, libplatsupport or musl-libc.

Build system modularisation

CMake and Nix builds have been modularised, making dependencies clearer and reducing CMake namespace pollution.

API incompatibilities

Tracing API

The KOS tracing API now requires a tracer app to declare trace event categories using a new macro KOS_TRACE_DEFINE_CATEGORY. This allows more efficient handling of tracing events. See kos_testbed/tracing_example for usage.

Removed features

C-based system configurations

Previous releases included basic support for specifying system configurations directly in C code. It has not been possible to maintain parity with device-tree-based system configuration, so the ability to define system configurations in C code has been removed.

Third-party software dependencies

For convenience, this section lists specific versions of some key third-party dependencies of this release of the Kry10 Operating System (KOS). We don’t list commit hashes of repositories included in the KOS repo manifest (e.g. seL4), since these are readily accessible from a checkout of the repo manifest.

See also the corresponding section of kos_core/readme.md.

  • KOS development environment:
    • Erlang/OTP 23.3.4.4 (from nixpkgs)
    • Elixir 1.12.1 (from nixpkgs)
    • GCC cross compilers (from nixpkgs):
      • armv7l-unknown-linux-gnueabihf-gcc (GCC) 10.3.0
      • x86_64-unknown-linux-gnu-gcc (GCC) 10.3.0
      • aarch64-unknown-linux-gnu-gcc (GCC) 11.1.0
    • Clang (from nixpkgs): 11.1.0
    • Rust (custom build): 1.62.0-nightly (8f36334ca 2022-04-06)
  • KOS on-device software:
    • Erlang/OTP 23.3.4.16 (custom build)
    • OpenSSL 1.1.1q (custom build)