OTA - Over The Air firmware update

The main design principle of the OTA mechanism is reliability: never end up with a bricked device, and roll back on any failure. Therefore, an OTA process never updates firmware code or data in place, because any failure during an in-place update (e.g. power loss) can leave the device broken. Instead, the OTA mechanism uses independent, self-contained flash partitions to hold firmware images (code and data), and an intelligent boot loader decides which partition to boot.

Here is a high-level overview of the OTA procedure:

  1. OTA is triggered via one of the many supported methods: an HTTP POST request, a periodic timer that polls a well-known location, an AWS IoT device shadow change, an OTA.Update RPC command, or other. You can create your own method using the OTA API.
  2. A separate flash partition is created to hold a new firmware image - code and data (root filesystem).
  3. A new firmware image is downloaded to the new flash partition. Any failure during that process aborts the OTA.
  4. When the new firmware image is successfully copied:
    • All files from the old FS that do not exist in the new FS are moved to the new FS. This is an important mechanism for preserving user data and device-specific configuration, like the conf2.json - conf9.json configuration files, or any other files. Remember: if a firmware image contains a file, it will overwrite the existing file during OTA. Never put files like conf9.json in your firmware.
    • The boot loader configuration is updated to say that a new partition exists and that the boot loader must boot from it. The new partition is marked dirty, and the "commit interval" time is stored in the boot configuration.
  5. The device reboots. The boot loader boots the new partition. It figures out from the boot configuration that the partition is dirty (unsafe), because the "commit" flag is not set. Therefore it starts a hardware timer that will fire after the "commit interval", and executes the new image.
  6. The new image starts and performs the usual boot sequence. At some point mgos_upd_commit() is called, which sets the "commit" flag in the boot config, marking this firmware "OK". The commit call can be made automatically after health checks, or manually by the user. If the commit is not made, the "commit" flag in the boot config remains unset.
  7. When the commit interval expires, the boot loader timer handler kicks in. It checks the commit flag. If the flag is set, the handler silently exits. If it is not set, i.e. the firmware is still dirty, a rollback is performed: the image to boot and the commit flag are set to their previous values, and the device reboots.
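
The commit and rollback logic of steps 4 to 7 can be sketched as a small state machine. This is a minimal in-memory illustration, not the actual boot loader code: struct boot_cfg, ota_apply(), ota_commit() and ota_on_commit_timeout() are hypothetical names, and a real implementation would persist the configuration to flash and reboot the device.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical in-memory model of the boot configuration. */
struct boot_cfg {
  int active_slot;       /* partition the boot loader boots next */
  int prev_slot;         /* partition to return to on rollback */
  bool committed;        /* "commit" flag: true once firmware is marked OK */
  int commit_interval_s; /* time the new image has to commit, in seconds */
};

/* Step 4: point the boot loader at the new partition and mark it dirty. */
void ota_apply(struct boot_cfg *cfg, int new_slot, int commit_interval_s) {
  assert(cfg != NULL);
  cfg->prev_slot = cfg->active_slot;
  cfg->active_slot = new_slot;
  cfg->committed = false; /* dirty until the new firmware commits */
  cfg->commit_interval_s = commit_interval_s;
}

/* Step 6: called by the new firmware once it considers itself healthy. */
void ota_commit(struct boot_cfg *cfg) {
  assert(cfg != NULL);
  cfg->committed = true;
}

/*
 * Step 7: runs when the commit timer fires.
 * Returns true if a rollback was performed (the caller should then reboot),
 * false if the firmware was already committed.
 */
bool ota_on_commit_timeout(struct boot_cfg *cfg) {
  assert(cfg != NULL);
  if (cfg->committed) return false; /* firmware is OK, nothing to do */
  cfg->active_slot = cfg->prev_slot; /* boot the previous image again */
  cfg->committed = true;             /* the previous image was committed */
  return true;
}
```

In this model, calling ota_apply() and then letting the timer fire without ota_commit() restores the previous partition, while a call to ota_commit() before the timeout makes the timer handler a no-op.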

An in-depth example of OTA on CC3200 is given in the embedded.com article "Updating firmware reliably".

OTA using HTTP POST

This is the simplest method, and it is very useful for development. Of course, it works only if the device is directly visible, i.e. reachable over the network from the machine that pushes the update. To enable the HTTP POST OTA handler, include the ota-http-server library in your mos.yml. Then you can build a new firmware and push it using this command:

curl -v -F file=@build/fw.zip -F commit_timeout=60 http://IP_ADDR/update

Boot configuration section

If the boot config is stored in only one location, it is susceptible to failure during updates, which are usually performed as a read-erase-write operation: a reboot after the erase and before the write is complete could render the device unbootable. The time between the two is short, but we set out to make our update process safe at all points, so we have to deal with it. We do this by using two config files with versioning, or sequencing. A sequencer is a monotonically decreasing number, so of the two files the one with the smaller sequencer is more recent - on figure 2, config 1 is selected as active because it has the smaller sequencer. When writing a new config file, we always use the currently inactive (older) slot, and it does not become the newer one until it is fully written: an erased config is older than any valid one, because erased NOR flash is filled with all 1s and therefore reads as the largest possible sequencer.
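
The two-slot scheme can be illustrated with a short sketch. It is a simplified in-memory model under stated assumptions: struct cfg_slot, cfg_pick_active() and cfg_write() are hypothetical names, the sequencer is assumed to be a 32-bit value, and validity checks such as a checksum are left out. The one property it relies on is taken from the text above: an erased slot reads as all 1s, i.e. as the largest possible sequencer.

```c
#include <assert.h>
#include <stdint.h>

/* An erased NOR flash word reads as all 1s. */
#define SEQ_ERASED 0xFFFFFFFFu

/* Hypothetical layout of one boot config slot. */
struct cfg_slot {
  uint32_t seq;       /* sequencer: smaller means more recent */
  int active_fw_slot; /* example payload: firmware partition to boot */
};

/* Returns the index (0 or 1) of the most recent slot, or -1 if both
 * slots are erased. */
int cfg_pick_active(const struct cfg_slot slots[2]) {
  if (slots[0].seq == SEQ_ERASED && slots[1].seq == SEQ_ERASED) return -1;
  return (slots[0].seq <= slots[1].seq) ? 0 : 1;
}

/* Writes a new config into the currently inactive slot and returns the
 * index of the slot that was written. */
int cfg_write(struct cfg_slot slots[2], int active_fw_slot) {
  int cur = cfg_pick_active(slots);
  int dst = (cur == 0) ? 1 : 0; /* inactive slot; slot 0 if both erased */
  uint32_t cur_seq = (cur < 0) ? SEQ_ERASED : slots[cur].seq;
  assert(cur_seq > 0); /* sequencer exhaustion is not handled in this sketch */

  /* Erase: from here until the final write the slot looks oldest, so a
   * reboot at this point still selects the other, untouched slot. */
  slots[dst].seq = SEQ_ERASED;
  slots[dst].active_fw_slot = -1;

  /* Write the payload first and the sequencer last, so the slot becomes
   * the newer one only once its contents are complete. */
  slots[dst].active_fw_slot = active_fw_slot;
  slots[dst].seq = cur_seq - 1;
  return dst;
}
```

Because the active slot is never touched during a write, an interruption at any point leaves at least one complete config for the boot loader to select.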