# Chromium OS Embedded Controller Runtime

## Design Principles

1. Never do at runtime what you can do at compile time. The goal is saving
   flash space and computation: use compile-time configuration until you
   really need to switch at runtime.
2. Real-time: guarantee low latency (e.g. < 20 us): no interrupt disabling,
   bounded code in interrupt handlers.
3. Keep it simple: design for the subset of microcontrollers we use, targeted
   at 32-bit single-core CPUs for small systems: 4kB to 64kB data RAM,
   possibly execute-in-place from flash.

## Execution Contexts

This is a pre-emptible runtime with static tasks. It has only 2 possible
execution contexts:

- the regular [tasks](#tasks)
- the [interrupt handlers](#interrupts)

The initial startup is an exception, as described in the
[dedicated paragraph](#startup).

### Tasks

The tasks are statically defined at compile-time. They are described for each
*board* in the [board/$board/ec.tasklist](../board/host/ec.tasklist) file. They
also have a static fixed priority, implicitly defined at compile-time by their
order in the [ec.tasklist](../board/host/ec.tasklist) file (the top-most one
being the lowest priority, aka *task 1*). As a consequence, two different tasks
cannot have the same priority.

In order to store its context, each task has its own stack, whose (*small*)
size is defined at compile-time in the
[ec.tasklist](../board/host/ec.tasklist) file.

A task can normally be preempted at any time by either interrupts or higher
priority tasks; see the [preemption section](#scheduling-and-preemption) for
details and the [locking section](#locking-and-atomicity) for the few cases
where you need to avoid it.

### Interrupts

The hardware interrupt requests are connected to the interrupt handling *C*
routines declared by the `DECLARE_IRQ` macros, through some chip/core specific
mechanisms (e.g. depending on whether we have a vectored interrupt controller,
slave interrupt controllers...).

The interrupts can be nested (i.e. interrupted by a higher priority interrupt).
All the interrupt vectors are assigned a priority as defined in their
`DECLARE_IRQ` macro. The number of available priority levels is
architecture-specific (e.g. 4 on Cortex-M0, 8 on Cortex-M3/M4) and several
interrupt handlers can have the same priority. An interrupt handler can only be
interrupted by a handler having a priority **strictly greater** than its own.

In most cases, the exceptions (e.g. data/prefetch aborts, software interrupt)
can be seen as interrupts with a priority strictly greater than all IRQ
vectors. So they can interrupt any IRQ handler using the same nesting
mechanism. All fatal exceptions should ultimately lead to a reboot.

### Events

Each task has a *pending* events bitmap \[1] implemented as a 32-bit word.
Several events are pre-defined for all tasks; the most significant bits of the
32-bit bitmap are reserved for them: the timer pending event on bit 31
([see the corresponding section](#time)), the event to kick the waiters on a
mutex (bit 30), and the requested task wake (bit 29), along with a few hardware
specific events. The 19 least significant bits are available for task-specific
meanings.

Those event bits are used in the inter-task communication and scheduling
mechanisms: other tasks **and** interrupt handlers can atomically set them to
request specific actions from the task. Therefore, the presence of pending
events in a task bitmap has an impact on its scheduling, as described in the
[scheduling section](#scheduling-and-preemption).

These requests are done using the `task_set_event()` and `task_wake()`
primitives. The two typical use-cases are:

- a task sends a message to another task (simply using some common memory
  structures, [see explanation](#single-address-space)) and wants it to be
  processed now.
- a hardware IRQ occurred and we need to do some long processing to respond to
  it (e.g. an I2C transaction). The associated interrupt handler cannot do it
  (for latency reasons), so it raises an event to ask a task to do it.

The task code chooses to consume them (or a subset of them) when it's running,
through the `task_wait_event()` and `task_wait_event_mask()` primitives.
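As an illustration of the second use-case, here is a minimal sketch of an
interrupt handler deferring long processing to a task through an event. The
task, IRQ, and event names (`TASK_ID_MYPROC`, `MYCHIP_IRQ_SENSOR`,
`EVENT_SENSOR_DATA`) are hypothetical, the task would still need an entry in
`ec.tasklist`, and the exact `task_set_event()` signature has varied across EC
versions:

```c
/* Hedged sketch: an IRQ defers slow work to a task through an event.
 * TASK_ID_MYPROC, MYCHIP_IRQ_SENSOR and EVENT_SENSOR_DATA are made up.
 */
#include "common.h"
#include "task.h"

/* One of the 19 task-specific event bits. */
#define EVENT_SENSOR_DATA TASK_EVENT_CUSTOM_BIT(0)

static void sensor_irq_handler(void)
{
	/* Keep the handler short and bounded: acknowledge the hardware,
	 * then ask the task to do the long processing (e.g. an I2C read).
	 */
	task_set_event(TASK_ID_MYPROC, EVENT_SENSOR_DATA);
}
DECLARE_IRQ(MYCHIP_IRQ_SENSOR, sensor_irq_handler, 2);

void myproc_task(void *unused)
{
	while (1) {
		/* Sleep until at least one event is pending (no timeout). */
		uint32_t evt = task_wait_event(-1);

		if (evt & EVENT_SENSOR_DATA) {
			/* Long processing runs here, in task context, where
			 * it can block without hurting interrupt latency.
			 */
		}
	}
}
```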
### Scheduling and Preemption

The system has a global bitmap \[1] called `tasks_ready`, containing one bit
per task and indicating whether it is *ready to run* (i.e. wants/needs to be
scheduled). A task's ready bit can only be cleared when the task itself calls
one of the functions explicitly triggering a re-scheduling (e.g.
`task_wait_event()` or `task_set_event()`) **and** it has no pending event. The
ready bit is set by any task or interrupt handler setting an event bit for the
task (i.e. `task_set_event()`).

The scheduling is based on (and *only* on) the `tasks_ready` bitmap (which is
derived from all the event bitmaps of the tasks, as explained above). The
scheduling policy to find which task should run is then simply finding the most
significant bit set in the `tasks_ready` bitmap and scheduling the
corresponding task.

Important note: the re-scheduling happens **only** when we are exiting the
interrupt context. It is done in a non-preemptible context (likely with the
highest priority). Indeed, a re-scheduling is actually needed only when the
highest priority ready task has changed. There are 3 distinct cases where this
can happen:

- an interrupt handler sets a new event for a task. In this case,
  `task_set_event` will detect that it is executed in interrupt context and
  record in the `need_resched_or_profiling` variable that it might need to
  re-schedule at interrupt return. When the current interrupt is about to
  return, it will see this bit and take the slow path, making a new scheduling
  decision and possibly a context switch, instead of the fast path returning
  to the interrupted task.
- a task sets an event on another task. The runtime will trigger a software
  interrupt to force a re-scheduling at its exit.
- the running task voluntarily relinquishes its current execution rights by
  calling `task_wait_event()` or a similar function. This triggers the
  software interrupt, as in the previous case.

On the re-scheduling path, if the highest-priority ready task does not match
the currently running one, the runtime performs a context switch: it saves all
the processor registers on the current task stack, switches the stack pointer
to the stack of the newly scheduled task, and restores the registers from the
context previously saved there.

### Hooks and Deferred Functions

The lowest priority task (i.e. task 1, aka `TASK_ID_HOOKS`) is reserved to
execute repetitive actions and actions deferred in time, without blocking the
current task or creating a dedicated task (whose stack memory allocation would
waste precious RAM).

The HOOKS task has a list of deferred functions and their next deadlines. Every
time it is woken up, it runs through the list and calls the ones whose deadline
has expired. Before going back to sleep, it arms a timer for the closest
deadline. The deferred functions can be created using the `DECLARE_DEFERRED()`
macro.
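As a sketch of the usual pattern (the `debounce_done()` function and the 30 ms
delay are made-up examples, and the exact `hook_call_deferred()` signature has
changed over EC versions), a deferred function is declared once and then
scheduled to run in the future:

```c
/* Hedged sketch of a deferred function; debounce_done() is a made-up name. */
#include "hooks.h"
#include "timer.h"

static void debounce_done(void)
{
	/* Runs in the HOOKS task context, ~30 ms after being scheduled. */
}
DECLARE_DEFERRED(debounce_done);

void gpio_changed(void)
{
	/* (Re-)arm the call: run debounce_done() 30 ms from now. Calling
	 * this again before the deadline simply pushes the deadline back,
	 * which is what implements the debouncing.
	 */
	hook_call_deferred(&debounce_done_data, 30 * MSEC);
}
```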
Similarly, the `HOOK_SECOND` and `HOOK_TICK` hooks are called periodically by
the HOOKS task loop (the *tick* duration is platform-defined and shorter than
the second). For examples of hooks and deferred functions, see the
[`hooks` testcases].

Note: be especially careful about priority inversion when accessing resources
protected by a mutex (e.g. a shared I2C controller) in a deferred function.
Indeed, the HOOKS task being the lowest priority task, it might be de-scheduled
for a long time and starve higher priority tasks trying to access the resource,
given there is no priority boosting implemented for this case.

Also, be careful about long delays (e.g. more than a few hundred microseconds)
in hook or deferred function handlers, since those will starve other hooks of
execution time. It is better to implement a state machine, where you set up a
subsequent call to a deferred function, than to have a long delay in your
handler.

### Watchdog

The system is always protected against misbehaving tasks and interrupt handlers
by a hardware watchdog rebooting the CPU when it is not attended. The watchdog
is petted in the HOOKS task, typically by declaring a `HOOK_TICK` doing it at
regular intervals. Given this is the lowest priority task, this guarantees that
all tasks are getting some run time during the watchdog period.

Note: that's also why one should not sprinkle code with `watchdog_reload()` to
paper over long-running routine issues.

To help debug bad sequences triggering watchdog reboots, most platforms
implement a warning mechanism defined under `CONFIG_WATCHDOG_HELP`. It's a
timer firing in the middle of the watchdog period, if the watchdog hasn't been
petted by then, and dumping on the console the current state of the execution,
mainly to help find a stuck task or handler. Normal execution resumes after
this alert though.

### Startup

The startup sequence goes through the following steps:

- the assembly entry routine clears the `.bss` (uninitialized data), copies the
  initialized data (and optionally the code if we are not executing from
  flash), and sets a stack pointer.
- we can jump to the `main()` C routine at this point.
- then we go through the hardware pre-init (before we have all the clocks to
  run the peripherals normally) and init routines, in this rough order: memory
  protection if any, GPIOs in their default state, prepare the interrupt
  controller, set the clocks, then timers, enable interrupts, init the debug
  UART and the watchdog.
- finally, start the tasks.

For the tasks startup, initially only the HOOKS task is marked as ready, so it
is the first to start and can call all the `HOOK_INIT` handlers, performing
initializations before actually executing any real task code. Then all tasks
are marked as ready, and the highest priority one is given control.

During the whole startup sequence, until control is given to the first task, we
are using a special stack called the 'system stack', which will later be
re-used as the interrupt and exception stack.

To prepare the first context switch, the code in `task_pre_init()` stuffs all
the task stacks with a *fake* saved context whose program counter contains the
task start address and whose stack pointer points to its reserved stack space.

### Locking and Atomicity

The two main concurrency primitives are lightweight atomic variables and
heavier mutexes.

The atomic variables are 32-bit integers (which can usually be loaded/stored
atomically on the architectures we are supporting). The `atomic.h` header
includes primitives to perform various bit and arithmetic operations on them
atomically, using either load-linked/store-conditional (aka
load-exclusive/store-exclusive) instruction pairs or simpler mechanisms,
depending on what is available.

The mutexes are actually statically allocated binary semaphores. In case of
contention, they make the waiting task sleep (removing its ready bit) and use
the [event mechanism](#events) to wake up the other waiters on unlocking.

Note: the mutexes do NOT implement any priority boosting to counter the
priority inversion phenomenon.

Given the runtime is running on a single-core CPU, spinlocks would be
equivalent to masking interrupts with `interrupt_disable()`; this is strongly
discouraged though, as it harms the real-time characteristics of the runtime.
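As a hedged sketch of how the two primitives are typically used (the variable,
mutex, and function names are invented, and the exact `atomic_*` helper
prototypes have varied between EC versions):

```c
/* Hedged sketch of the two concurrency primitives; my_flags, my_i2c_mutex
 * and read_sensor() are made-up names.
 */
#include "atomic.h"
#include "common.h"
#include "task.h"

static uint32_t my_flags;	  /* updated atomically, no lock needed */
static struct mutex my_i2c_mutex; /* statically allocated, zero-initialized */

void read_sensor(void)
{
	/* Atomic bit set: safe against both tasks and interrupt handlers. */
	atomic_or(&my_flags, BIT(0));

	/* Mutex: may sleep on contention, so task context only (never in
	 * an interrupt handler).
	 */
	mutex_lock(&my_i2c_mutex);
	/* ... perform the shared I2C transaction ... */
	mutex_unlock(&my_i2c_mutex);
}
```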
## Time

### Time Keeping

In the runtime, the time is accounted everywhere using a **64-bit microsecond**
count since the microcontroller **cold boot**.

Note: the runtime has no notion of wall-time/date, even though a few platforms
have an RTC inside the microcontroller.

These microsecond timestamps are implemented in the code using the
`timestamp_t` type, and the current timestamp is returned by the `get_time()`
function.

The time-keeping is preferably implemented using a 32-bit hardware free-running
counter at 1 MHz, plus a 32-bit word in memory keeping track of the high word
of the 64-bit absolute time. This word is incremented by the 32-bit timer
rollover interrupt.

Note: as a consequence of this implementation, when the 64-bit timestamp is
read in interrupt context in a handler having a higher priority than the timer
IRQ (which is somewhat rare), the high 32-bit word might be incoherent (off by
one).

### Timer Event

The runtime offers *one* (and only one) timer per task. All the task timers are
multiplexed on a single hardware timer (which can be just a *match interrupt*
on the free-running counter mentioned in the
[previous paragraph](#time-keeping)).

Every time a timer is armed or expires, the runtime finds the task timer having
the closest deadline and programs it in the hardware to get an interrupt. At
the same time, it sets the `TASK_EVENT_TIMER` event in all tasks whose timer
deadline has expired. The next deadline is computed in interrupt context.

Note: given each task has a **single** timer, which is also used to wake up the
task when `task_wait_event()` is called with a timeout, one needs to be careful
when using the `timer_arm()` function directly: if that timer is still running
on the next `task_wait_event()` call, the call will fail due to the lack of an
available timer.
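For instance (a hedged sketch; `poll_task` is a hypothetical task routine), a
task usually relies on the timeout of `task_wait_event()`, which transparently
uses the per-task timer, rather than calling `timer_arm()` directly:

```c
/* Hedged sketch of time-keeping usage; poll_task is a made-up task. */
#include "task.h"
#include "timer.h"

void poll_task(void *unused)
{
	while (1) {
		/* Sleep until an event arrives or 500 ms elapse; the timeout
		 * is implemented with this task's single timer.
		 */
		uint32_t evt = task_wait_event(500 * MSEC);

		if (evt & TASK_EVENT_TIMER) {
			/* Woken up by the timeout: get_time() returns the
			 * 64-bit microsecond count since cold boot.
			 */
			timestamp_t now = get_time();
			/* ... use now.val (a uint64_t) for bookkeeping ... */
		}
	}
}
```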
## Memory

### Single Address Space

There is no memory isolation between tasks (i.e. they all live in the same
address space). Some architectures implement a memory protection mechanism,
albeit only to differentiate executable areas (e.g. `.code`) from writable
areas (e.g. `.bss` or `.data`), as there is a **single privilege** level for
all execution contexts.

As all the memory is implicitly shared between the tasks, inter-task
communication can be done by simply writing the data structures in memory and
using events to wake up the other task (given the concurrent accesses to those
structures have been properly thought through).

### Heap

The data structures should be statically allocated at compile time.

Note: there is no dynamic allocator available (e.g. `malloc()`), not because it
would be impossible to create one, but to avoid the negative side effects of
having one: i.e. poor/unpredictable real-time behavior and possible leaks
leading to a long tail of failures.

- TODO: talk about shared memory
- TODO: where/how we store *panic memory* and *sysjump parameters*.

### Stacks

Each task has its own stack; in addition, there is a system stack used for
startup and interrupts/exceptions.

Note 1: each task stack is relatively small (e.g. 512 bytes), so one needs to
be careful about stack usage when implementing features.

Note 2: at the same time, the total size of RAM used by stacks is a big chunk
of the total RAM consumption, so their sizes need to be carefully tuned.
(Please refer to the [debugging paragraph](#debugging) for additional input on
this topic.)

## Firmware Code Organization and Multiple Copies

- TODO: Detail the classical RO / RW partitions and how we sysjump.

## Power Management

- TODO: talk about the idle task + WFI (note: interrupts are disabled!)
- TODO: more about low power idle and the sleep-disable bitmap
- TODO: adjusting the microsecond timer at wake-up

## Debugging

- TODO: our main tool: serial console ... (but non-blocking / discard overflow,
  cflush DO/DONT)
- TODO: else JTAG stop and go: careful with watchdog and timer
- TODO: panics and software panics
- TODO: stack size tuning and canarying
- TODO: Address the rest of the comments from https://crrev.com/c/445941

\[1]: bitmap: array of bits.

[`hooks` testcases]: ../test/hooks.c