Machine Learning at the Edge with ESP32

What if we could do machine learning and computer vision on a tiny microcontroller without an internet connection? Read about hands-on testing of on-chip face recognition on a $7 microcontroller.

For years Microsoft, Amazon and Google have pushed their cloud-based machine learning and artificial intelligence services. These services offer many impressive features and are very simple to integrate. However, they raise concerns about privacy, require a constant internet connection, add latency and, in many cases, cost money.

Can we get around these limitations and do machine learning without the cloud?

Many machine learning algorithms require huge amounts of processing power, especially when training the models. In many cases, however, the trained model is quite small and needs little storage, RAM or CPU to run. Such models could potentially run directly on a microcontroller.

Being able to do machine learning and computer vision on a tiny microcontroller without an internet connection would enable lots of interesting products and services. It turns out that Espressif, the maker of the very popular “IoT chips” ESP8266 and ESP32, has just launched ESP-WHO, a face detection and recognition platform that runs on the ESP32.

What is ESP32?

ESP32 is a series of low-cost, low-power system-on-a-chip microcontrollers with a Tensilica Xtensa LX6 microprocessor and integrated Wi-Fi and Bluetooth. Development boards are very cheap, widely available and relatively easy to get started with.

How to get started?

To use the ESP-WHO platform you will need an ESP32, an OV2640 camera module and at least 4 MB of PSRAM. You can build this yourself with components from eBay, but the simplest way to get started is to buy an ESP32 development board with a built-in OV2640 camera and enough PSRAM. Since these boards only recently became available, they can be a bit difficult to obtain. Thankfully, after a lot of cancelled orders on eBay, I managed to get my hands on a few M5Stack M5Cam modules that claimed to have both the camera and the required amount of PSRAM. As an extra benefit, they come in a nice Lego-compatible case, which helps protect the board and camera.

Does it work?

After googling and reading blog posts about ESP-WHO, I was surprised that I couldn’t find anyone who had tried it, could confirm that it works, provided (working) instructions for getting it up and running, or said anything about how good the face recognition was. The only way to find out for sure was to try it.

Getting the right board.

There are several boards with an ESP32 and an OV2640 camera module, but not all of them have the required PSRAM. The only one I have gotten to work is the cased version of the M5Cam. (The uncased version is slightly cheaper, but it does not have PSRAM, and its unprotected camera cable snaps very easily.)

Installing the USB Driver.

To connect to the M5Cam development board you will need the CP210x USB-to-UART driver from Silicon Labs, which can be downloaded from the Silicon Labs website. If you are using Windows 10, make sure to install the Windows 8 driver, as the Windows 10 driver does not work with the board.

Installing the development environment.

Camera and face recognition are not (yet) supported by the Arduino IDE for ESP32, so to use these advanced features you will have to set up Espressif's toolchain. This can feel a bit unfamiliar if you are used to the Arduino environment. To get it up and running, complete the following steps:
 
1. Set up the toolchain
2. Get ESP-IDF from GitHub
3. Download the ESP-WHO example code
4. Modify the code to work with the M5Stack camera
 
Setting up the toolchain:
There are several ways to set up the toolchain, and a detailed description can be found in Espressif's ESP-IDF documentation. On Windows, the simplest way to get started is to download the Windows all-in-one toolchain & MSYS2 zip file from Espressif.

 
Extract the zip to C:\ (it unpacks into a C:\msys32 directory).
Open an MSYS2 MINGW32 terminal window by running C:\msys32\mingw32.exe. The environment in this window is a bash shell.
 
Get ESP-IDF:
Create the directory ~/esp and navigate to it (mkdir ~/esp and cd ~/esp), then clone ESP-IDF:
git clone --recursive https://github.com/espressif/esp-idf.git
(Remember --recursive; it also fetches the submodules ESP-IDF depends on.)
 
Set the path to ESP-IDF:
export IDF_PATH="C:/msys32/home/<user-name>/esp/esp-idf"
 

Downloading the example code:
 
git clone --recursive https://github.com/espressif/esp-who.git
 

Modifying the code to work with M5Stack:

Since the wiring in the M5Stack camera is slightly different from the ESP-WHO example code, you will have to make a few small changes to the example code to make it work. Replace the GPIO settings in include\app_camera.h with the following (the change is pretty much the same for all of the examples):

/* Camera power-down and reset */
#define PWDN_GPIO_NUM 0
#define RESET_GPIO_NUM 15
/* 8-bit parallel data bus (Y2..Y9) */
#define Y2_GPIO_NUM 32
#define Y3_GPIO_NUM 35
#define Y4_GPIO_NUM 34
#define Y5_GPIO_NUM 5
#define Y6_GPIO_NUM 39
#define Y7_GPIO_NUM 18
#define Y8_GPIO_NUM 36
#define Y9_GPIO_NUM 19
/* Clocks and sync signals */
#define XCLK_GPIO_NUM 27
#define PCLK_GPIO_NUM 21
#define HREF_GPIO_NUM 26
#define VSYNC_GPIO_NUM 22
/* SCCB (I2C-style) configuration bus */
#define SIOD_GPIO_NUM 25
#define SIOC_GPIO_NUM 23

 

Make menuconfig

To flash the example code to the ESP32, you first have to configure the serial flasher with the correct port.

From the example's directory, run "make menuconfig":

1. Select "Serial flasher config".
2. Select the correct serial port (you can find the port number in Device Manager).
3. Save the changes.

To flash the example code to the development board, run "make flash".

When the code has been flashed to your development board, run "make monitor" to connect to the serial monitor.

Screenshots: startup, face detection and enrollment, and face recognition.

Conclusion

It actually works! The camera needs quite a lot of light to detect a face. I haven't had time to test it on several faces yet, but so far the face recognition seems good enough for many use cases. I probably would not use it for access control to my front door, but for many less critical tasks it would probably work well.