author: Vadim Bendebury <vbendeb@chromium.org> 2020-06-23 13:01:14 -0700
committer: Commit Bot <commit-bot@chromium.org> 2020-06-26 00:03:56 +0000
commit: 8196d442a20bac595e5befb01f155419163c8eda
tree: 4f53da447e59e7d6da5a278813c19cd04ec6fd85
parent: 3f0b2cb3b3b21b324f899e1912d8402c94ccb07e
Cr50: vboot troubleshooting doc
A document describing how the AP and H1 interact during the boot up process, and the typical failures that cause a Chrome OS device to fall into recovery mode.

BUG=none
TEST=none

Signed-off-by: Vadim Bendebury <vbendeb@chromium.org>
Change-Id: Ib71ffbc9c7dadd5f42923c0bfac038ae7f0ca8e5
Reviewed-on: https://chromium-review.googlesource.com/c/chromiumos/platform/ec/+/2261318
Reviewed-by: Andrey Pronin <apronin@chromium.org>
-rw-r--r-- docs/cr50_vboot_troubleshooting.md 223
-rw-r--r-- docs/images/bobba_boot.sal bin 0 -> 44606 bytes
-rw-r--r-- docs/images/read_with_wake_pulse.png bin 0 -> 37846 bytes
-rw-r--r-- docs/images/typical_boot.png bin 0 -> 24653 bytes
-rw-r--r-- docs/images/typical_read.png bin 0 -> 35884 bytes
5 files changed, 223 insertions, 0 deletions
diff --git a/docs/cr50_vboot_troubleshooting.md b/docs/cr50_vboot_troubleshooting.md
new file mode 100644
index 0000000000..a81eb62afc
--- /dev/null
+++ b/docs/cr50_vboot_troubleshooting.md
@@ -0,0 +1,223 @@
+# Cr50 And Chrome OS Verified Boot Troubleshooting
+
+H1 is a Google security chip installed on most Chrome OS devices. Cr50 is the
+firmware running on the H1. A high level overview of hardware and firmware can
+be found in [this
+presentation](https://2018.osfc.io/uploads/talk/paper/7/gsc_copy.pdf).
+
+This write-up is an attempt to explain how Cr50 participates in the Chrome OS
+device boot process, and what the possible reasons are for the dreaded "Chrome
+OS Missing Or Damaged" screen showing up when a Chrome OS device reboots.
+
+## Basic overview
+
+The H1 controls reset lines of the EC (embedded controller) and the AP
+(application processor, or SOC). During normal Chromebook operation the H1 is
+always powered up as long as the battery retains even a minimal amount of
+charge. In Chromeboxes the H1 powers on with the rest of the system.
+
+One of the important functions of the H1 in the system is to provide a subset
+of TPM (Trusted Platform Module) functionality. The TPM stores verified boot
+information, which is why any **problems communicating with the TPM during
+the boot up process** result in the Chrome OS device falling into **recovery
+mode**.
+
+Another important function of the H1 in the system is CCD ([case closed
+debugging](https://chromium.googlesource.com/chromiumos/platform/ec/+/fe6ca90e/docs/case_closed_debugging_cr50.md#)).
+
+## H1 power states and CCD
+
+During periods of inactivity H1 could enter a *sleep* or *deep sleep* state.
+In *sleep* state most of the clocks are turned off and power consumption is
+minimized, but SRAM contents and the CPU state are maintained. In *deep sleep*
+state the H1 is practically shut down.
+
+The H1 never enters the *deep sleep* state during the Chrome OS boot process,
+but could enter the *sleep* state if the Chrome OS device boot process is
+delayed for whatever reason, and **only when CCD is not active**. This could
+be one of the reasons that there are boot failures when CCD is not connected,
+but the failures go away if CCD is on (the debug cable is plugged in).
+
+To make sure the H1 exits the *sleep* state the AP triggers a wake up event,
+details of which are described below.
+
+## H1 communications with the AP
+
+The H1 can be connected to the AP over either the I2C or the SPI bus. The same
+Cr50 firmware is used in both cases; the decision on which of the two
+interfaces to use is based on resistor straps the Cr50 reads at startup.
+
+Neither interface fully complies with its respective bus standard: the I2C
+controller does not support clock stretching, and the SPI controller cannot be
+clocked faster than 2 MHz.
+
+Look for a text line like the following in the Cr50 console output right after
+power up
+
+> [0.005657 Valid strap: 0xa properties: 0x41]
+
+to confirm that the straps were read properly.
+
+The `brdprop` Cr50 console command shows which interface is used to
+communicate with the AP:
+
+> \> brdprop<br>
+> properties = 0x1141
+
+If the least significant bit of the value is set, the H1 is using the SPI
+interface; if the bit is cleared, the H1 is using the I2C interface.
+
+Using the H1 imposes additional requirements on the AP interface: the H1 might
+have to be woken up from sleep, and it flow-controls the AP using an
+additional `AP_INT_L` signal. Both are described in more detail below.
+
+## TPM reset
+
+The H1 stays up until power is removed, unless it falls into deep sleep. The
+TPM is just one of the components of the Cr50 firmware, and it must be reset
+whenever the AP resets.
+
+There are differences between ARM and X86 reset circuit architectures. ARM
+SOCs have a bidirectional reset signal called `SYS_RST_L`. They (or, rather,
+most of them, but let's not worry about the outliers) generate a pulse on this
+line when the SOC reboots. An external device can toggle this line to reset
+the SOC asynchronously, which is what the Cr50 does to reset ARM SOCs.
+
+The X86 SOCs have two separate signals: one output, `PLT_RST_L`, which is held
+low while the AP is in reset or in low power mode, and one input,
+`SYS_RST_ODL`, which Cr50 toggles to reset the SOC.
+
+In case of X86, when `PLT_RST_L` is held low longer than a second, the Cr50
+considers this an indication of the AP going into a low power mode (S5 or
+lower), which means that the AP will start from the reset vector when it wakes
+up, so Cr50 can take H1 into *deep sleep* mode as well.
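The one-second rule above can be sketched as a small predicate. This is a minimal illustration and not actual Cr50 code: the function name, the microsecond timestamps and the threshold macro are assumptions made for the example.

```c
#include <stdbool.h>
#include <stdint.h>

/* Threshold from the text: PLT_RST_L held low for longer than a second. */
#define PLT_RST_LOW_THRESHOLD_US 1000000ULL

/*
 * Hypothetical helper: returns true when PLT_RST_L has been continuously
 * low for long enough to be treated as the AP entering S5 or lower.
 * low_since_us is the time at which the line was last seen going low.
 */
bool ap_in_low_power(bool plt_rst_l_is_low,
                     uint64_t low_since_us, uint64_t now_us)
{
    if (!plt_rst_l_is_low)
        return false;
    return (now_us - low_since_us) > PLT_RST_LOW_THRESHOLD_US;
}
```

A short reset pulse therefore only resets the TPM, while a long low period would additionally allow the H1 to go into *deep sleep*.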
+
+On top of that, ARM based Chrome OS devices have some additional logic which
+forces `SYS_RST_L` to behave similarly to `PLT_RST_L`: it stays low while the
+SOC is in a low power mode from which it will resume operation at the reset
+vector. This allows H1 to enter deep sleep on ARM devices as well.
+
+Resistor bootstraps tell the Cr50 which kind of reset architecture to expect.
+The SOC reset indication is used both to reset the TPM component and to enter
+the *deep sleep* mode as appropriate.
+
+In the `brdprop` command output, bit D5, when set, signifies the `SYS_RST_L`
+('regular' ARM devices) type of reset, and bit D6 signifies the `PLT_RST_L`
+(X86 and modified ARM) type of reset.
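To make the bit positions concrete, here is a small sketch that decodes a `brdprop` properties value. Only the three bits described in this document are covered; the macro and function names are illustrative and are not taken from the Cr50 sources.

```c
#include <stdbool.h>
#include <stdint.h>

/* Bit positions as described in the text; the names are made up here. */
#define PROP_SPI_BIT       (1u << 0) /* set: SPI, cleared: I2C */
#define PROP_SYS_RST_L_BIT (1u << 5) /* D5: 'regular' ARM reset */
#define PROP_PLT_RST_L_BIT (1u << 6) /* D6: X86 and modified ARM reset */

bool prop_uses_spi(uint32_t properties)
{
    return (properties & PROP_SPI_BIT) != 0;
}

bool prop_uses_sys_rst_l(uint32_t properties)
{
    return (properties & PROP_SYS_RST_L_BIT) != 0;
}

bool prop_uses_plt_rst_l(uint32_t properties)
{
    return (properties & PROP_PLT_RST_L_BIT) != 0;
}
```

Applied to the value `0x1141` from the console example, this reports an SPI interface with the `PLT_RST_L` type of reset.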
+
+Boot problems can arise when the AP reboots without Cr50 seeing a pulse on
+the `SYS_RST_L` or `PLT_RST_L` signal: in this case the very first TPM_Startup
+command sent by coreboot returns an error, and the Chrome OS device falls into
+recovery mode.
+
+
+## Cr50 operations synchronization
+
+The H1 microcontroller is very slow (clocked at 24 MHz), while the AP is
+usually hundreds of times faster, so the AP needs to be slowed down when it
+talks to the TPM during the boot up process. The issue is complicated by the
+inability of the I2C controller to stretch the clock.
+
+In both I2C and SPI modes the AP\_INT\_L H1 output signal is used to indicate
+to the AP that the H1 is ready for the next I2C or SPI transaction. By default
+this signal is a 4+ us long low pulse. Some X86 platforms require a pulse of
+100+ us; this pulse extension mode can be configured by setting a bit in a TPM
+register (I2C register address 0x1c or SPI register address 0xfe0).
+
+In any case, it is important that the AP firmware properly configures the pin
+where the AP\_INT\_L signal is connected as an edge sensitive GPIO, which
+latches on either the falling or the rising edge of the signal.
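One plausible way to see why the edge sensitive configuration matters is a toy simulation: a pulse of a few microseconds can fall entirely between two samples of a level-polling loop, while an edge latch records it whenever it happens. The model below is an illustration only; the structure, the function names and the 1 us time step are assumptions, and no real GPIO API is involved.

```c
#include <stdbool.h>
#include <stdint.h>

/* Simulated AP_INT_L: high except for one low pulse. */
struct pulse {
    uint32_t start_us; /* time of the falling edge */
    uint32_t width_us; /* how long the line stays low */
};

/* Line level at time t_us: false means low (pulse active). */
bool ap_int_l_level(const struct pulse *p, uint32_t t_us)
{
    return !(t_us >= p->start_us && t_us < p->start_us + p->width_us);
}

/* Level polling: samples the line every period_us (must be non-zero). */
bool level_poll_sees_pulse(const struct pulse *p, uint32_t period_us,
                           uint32_t end_us)
{
    for (uint32_t t = 0; t < end_us; t += period_us)
        if (!ap_int_l_level(p, t))
            return true;
    return false;
}

/* Edge latch: models hardware that latches any falling edge. */
bool edge_latch_sees_pulse(const struct pulse *p, uint32_t end_us)
{
    bool prev = true;
    bool latched = false;

    for (uint32_t t = 0; t < end_us; t++) { /* 1 us resolution */
        bool cur = ap_int_l_level(p, t);

        if (prev && !cur)
            latched = true;
        prev = cur;
    }
    return latched;
}
```

With a 4 us pulse starting at 103 us, polling the level every 10 us samples at 100 us and 110 us and misses the pulse, whereas the edge latch catches it.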
+
+AP firmware missing these synchronization pulses results in the boot process
+taking a very long time, and in the AP firmware log including messages
+
+> Timeout wait for TPM IRQ!
+
+in case of SPI or
+
+> Cr50 i2c TPM IRQ timeout!
+
+in case of I2C.
+
+## Waking H1 up from sleep
+
+The I2C Start sequence is sufficient for the H1 to resume operation; the AP
+does not have to do anything special. In the case of SPI, matters are more
+complicated.
+
+Technically speaking, the assertion of the CS SPI bus signal is enough to wake
+up the H1, but it takes time for it to become fully operational, and the AP
+could already be transmitting the message by the time the H1 SPI controller is
+ready. This is why, if the previous SPI transaction was a second or more ago,
+the SPI driver is required to first issue a CS pulse without transferring any
+data, just to wake up the H1, then wait for 100 us to let the H1 wake up, and
+then continue with a regular SPI transaction.
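The wake protocol above could be sketched roughly as follows. This is not code from any existing driver: the `spi_hal` hooks, the function names and the way the time of the previous transaction is tracked are all hypothetical, and only the one second and 100 us figures come from the text.

```c
#include <stdbool.h>
#include <stdint.h>

#define H1_IDLE_THRESHOLD_US 1000000ULL /* "a second or more ago" */
#define H1_WAKE_DELAY_US     100ULL     /* time given to the H1 to wake up */

/* Hypothetical platform hooks; a real driver would supply its own. */
struct spi_hal {
    void (*cs_assert)(void);
    void (*cs_deassert)(void);
    void (*delay_us)(uint64_t us);
};

/* True if the H1 may have gone to sleep since the previous transaction. */
bool h1_needs_wake_pulse(uint64_t last_xfer_end_us, uint64_t now_us)
{
    return (now_us - last_xfer_end_us) >= H1_IDLE_THRESHOLD_US;
}

/*
 * Call before each SPI transaction. Issues a data-less CS pulse and waits
 * when the bus has been idle for too long. Returns true if a wake pulse
 * was sent.
 */
bool h1_prepare_transaction(const struct spi_hal *hal,
                            uint64_t last_xfer_end_us, uint64_t now_us)
{
    if (!h1_needs_wake_pulse(last_xfer_end_us, now_us))
        return false;

    hal->cs_assert();   /* CS pulse with no clocks and no data */
    hal->cs_deassert();
    hal->delay_us(H1_WAKE_DELAY_US);
    return true;
}
```

The caller is expected to record the end time of every transaction and pass it in as `last_xfer_end_us` on the next call.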
+
+If the AP does not follow this protocol and starts transmitting before H1 is
+ready, communications failures are likely, resulting in the Chrome OS device
+falling into recovery. This often happens when the device took a long time to
+find the kernel to boot, and then the AP is trying to lock the TPM state
+before starting up the kernel, but fails, because the H1 was asleep by this
+time and was not properly woken up.
+
+## SPI Message Synchronization
+
+The SPI interface is synchronous, and both read and write accesses happen
+within a single transaction. The Trusted Computing Group (TCG) came up with a
+hardware protocol on top of the SPI specification to allow the slow device to
+flow control the fast host controller.
+
+The basic idea is that each time the AP wants to read or write a TPM register,
+it sends a SPI packet, which consists of the header and data fields.
+
+The header field is always present. It is 4 bytes in size, and includes the
+operation type (read or write), data length and register address.
+
+The header is sent out as soon as the SPI transaction starts, then the AP
+starts monitoring the MISO line, one byte at a time, paying attention to bit
+D0. The Cr50 keeps sending zeros on that bit until it is ready to proceed with
+the operation requested in the transaction header. Once the Cr50 is ready, it
+responds with a byte with bit D0 set to one. At this point the AP knows that
+starting with the next byte the actual data of the transaction can start
+flowing, so it either sends the data in case of a write or reads it from the
+TPM in case of a read.
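As a rough illustration of the header and of the wait-state handshake, the sketch below builds the 4 byte header and scans a captured MISO byte stream for the "ready" byte. The header layout (bit 7 of the first byte set for reads, low six bits holding the length minus one, followed by a 24-bit address, most significant byte first) follows the TCG specification as understood here; the function names are made up for the example, and operating on a captured buffer rather than a live bus is a simplification.

```c
#include <stddef.h>
#include <stdint.h>

/* Build the 4 byte header: operation type, data length, register address. */
void tpm_spi_build_header(uint8_t hdr[4], int is_read, size_t len,
                          uint32_t addr)
{
    hdr[0] = (uint8_t)((is_read ? 0x80 : 0x00) | ((len - 1) & 0x3f));
    hdr[1] = (uint8_t)(addr >> 16);
    hdr[2] = (uint8_t)(addr >> 8);
    hdr[3] = (uint8_t)addr;
}

/*
 * Scan the bytes seen on MISO after the header. Returns the index of the
 * first data byte (the byte right after the one with bit D0 set), or -1 if
 * the device never signalled that it was ready.
 */
int tpm_spi_wait_ready(const uint8_t *miso, size_t n)
{
    for (size_t i = 0; i < n; i++)
        if (miso[i] & 0x01)
            return (int)(i + 1);
    return -1;
}
```

For a read of four bytes from register address 0xd40f00 this produces the header `83 d4 0f 00`. A driver that starts transferring data without waiting for the byte with bit D0 set would be exchanging data before the TPM is ready.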
+
+This is described in detail in [TCG PC Client Platform TPM Profile (PTP)
+Specification Family "2.0" Level 00 Revision
+00.43](https://drive.google.com/file/d/16r1vDhf1fnggI4BkOBuTXPqOQt4LaFvk/view?usp=sharing)
+in section "6.4 SPI Hardware Protocol".
+
+The AP ignoring this flow control mechanism is yet another common problem
+causing failures to boot, because the driver starts sending or receiving data
+before TPM is ready. This failure is more likely to happen when developing new
+SPI drivers.
+
+## Boot up process examples
+
+A trace of a typical Chrome OS device boot process was collected using the
+[Saleae](https://www.saleae.com/) Logic Pro 16 logic analyzer.
+
+The [full trace](./images/bobba_boot.sal) can be examined in detail using the
+Saleae application in the trace analysis mode.
+
+A few detailed snapshots of this trace are shown below (click to expand):
+
+### Full boot sequence
+[![Full boot sequence](./images/typical_boot.png)](./images/typical_boot.png) shows communications between the AP
+and the H1 during a typical Chrome OS boot: first a flurry of communications
+between coreboot and the H1, then some time spent verifying and loading
+various firmware stages, then a block of communications between Depthcharge
+and the H1.
+
+### Typical read sequence
+[![Typical read sequence](./images/typical_read.png)](./images/typical_read.png) shows the 4 byte header
+where the read of four bytes from register address 0xd40f00 is requested. The
+TPM is not ready and sends all zeros on the MISO line for three cycles, then
+sends a byte of 01, and then the AP reads four bytes of the actual register
+value (0xe01a2800). Then, once the H1 is ready to accept the next SPI
+transaction, it generates a pulse on AP\_INT\_L.
+
+### Read with wake pulse sequence
+[![Read with wake pulse](./images/read_with_wake_pulse.png)](./images/read_with_wake_pulse.png) is an example of a
+case where the AP toggles the CS line first, without sending any data, and
+then, 100 us later, starts the actual SPI transaction, which is completed with
+the AP\_INT\_L pulse.
diff --git a/docs/images/bobba_boot.sal b/docs/images/bobba_boot.sal
new file mode 100644
index 0000000000..ea68f4ef58
--- /dev/null
+++ b/docs/images/bobba_boot.sal
Binary files differ
diff --git a/docs/images/read_with_wake_pulse.png b/docs/images/read_with_wake_pulse.png
new file mode 100644
index 0000000000..8c411a7506
--- /dev/null
+++ b/docs/images/read_with_wake_pulse.png
Binary files differ
diff --git a/docs/images/typical_boot.png b/docs/images/typical_boot.png
new file mode 100644
index 0000000000..c825b0441e
--- /dev/null
+++ b/docs/images/typical_boot.png
Binary files differ
diff --git a/docs/images/typical_read.png b/docs/images/typical_read.png
new file mode 100644
index 0000000000..63988cbf99
--- /dev/null
+++ b/docs/images/typical_read.png
Binary files differ