Deploying a Home Assistant Voice Assistant on an OrangePi Zero using Balena
This yak shave started when I installed Tandoor so I could store all of our recipes in one place. It has a handy meal planner that has been super useful for organising our weekly meal plans, and a shopping list builder, which is also pretty handy.
However, we currently use a Rube Goldberg setup of Microsoft Todo and Amazon Alexa (via a third-party Alexa app) for shopping lists, because it means my wife and I can easily share the list, I can see it on my watch while shopping, and we can add things to it using voice commands (which is very useful when cooking).
The problem is that both Microsoft Todo and Alexa have been on a steady path of enshittification. Todo’s watch integration has been getting worse and worse with each version, and Alexa has slowly been getting dementia.
I began to wonder if I could build a little shim app that would allow me to monitor the contents of the Amazon Alexa Shopping list, and populate the Tandoor list, allowing us to get rid of Microsoft Todo completely.
It turns out the enshittification is going to accelerate in June, when the Alexa team drops support for its lists REST API. I’m pretty sure this will kill the Todo integration, and with it one of the main use cases for our Alexa.
Great.
I’ve been meaning to try out the fruits of Home Assistant’s Year of the Voice for a while - maybe it was time.
Setting up Home Assistant
First things first, I needed to work out how local voice assistants worked in Home Assistant.
Clearly, my search skills aren’t what they used to be (lol, no - search engines are just shit now), but it seemed like I needed to install Rhasspy, which is no longer the case.
The main developer behind Rhasspy now works for Nabu Casa (the company that builds Home Assistant), and has created a bunch of new packages: whisper for speech-to-text and piper for text-to-speech. These packages are glued together using a new protocol called Wyoming.
If I were using HASSOS to run Home Assistant, it would have been a straightforward matter of installing the add-ons and enabling the integrations. However, I run Home Assistant in a Docker container, which made things slightly more complicated.
After a bit more searching, I managed to find references to the Docker containers that HASSOS uses under the hood, and I was able to translate them into docker-compose entries.
---
version: "3.0"
services:
  wyoming-piper:
    image: rhasspy/wyoming-piper
    container_name: wyoming-piper
    restart: unless-stopped
    volumes:
      - /config/piper/data:/data
    ports:
      - "10200:10200"
    command: --voice en_US-lessac-medium
  wyoming-whisper:
    image: rhasspy/wyoming-whisper
    container_name: wyoming-whisper
    restart: unless-stopped
    volumes:
      - /config/whisper/data:/data
    ports:
      - "10300:10300"
    command: --model tiny-int8 --language en
Next, I added the Wyoming integration. The trick here is to add two instances, one on port 10200 and one on port 10300; the integration automatically works out which is which and adds the correct entities.
Finally, I set up a new Assistant, using faster-whisper for speech-to-text and piper for text-to-speech.
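If you want to see which port is which before adding the integration (or just check both containers are answering), you can ask a Wyoming server to describe itself. A minimal bash sketch, assuming bash’s /dev/tcp support and coreutils’ timeout: it sends one describe event, which Wyoming encodes as a single line of JSON, and prints the info event that comes back. The function names are made up for this example.

```shell
#!/usr/bin/env bash
# Sketch: ask a Wyoming server to describe itself.
# Requires bash (for /dev/tcp) and coreutils `timeout`.

# A Wyoming event header is a single line of JSON with a "type" field.
wyoming_event() {
  printf '{"type": "%s"}\n' "$1"
}

# Send a "describe" event and print whatever comes back for a couple of seconds.
wyoming_describe() {
  local host="$1" port="$2"
  # Open a read/write TCP connection on file descriptor 3.
  exec 3<>"/dev/tcp/${host}/${port}" || return 1
  wyoming_event describe >&3
  # The server keeps the connection open, so stop reading after 2 seconds.
  timeout 2 cat <&3 || true
  exec 3<&-
}
```

Running wyoming_describe with your Docker host’s address and port 10200 should print an info event with piper listed under "tts", while port 10300 should list faster-whisper under "asr".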
After testing it out using the Assist icon in the web app, I confirmed it was all working!
Time for some hardware
Having to open a web browser and click an icon to ask Home Assistant to do a thing isn’t quite the same experience as using an Amazon Echo Dot.
First of all, we’d need a wake word to replace the icon click, and maybe some sort of standalone hardware to replace the web browser.
Thankfully, there are numerous examples of prior art.
The two main options seem to be using a Raspberry Pi (or equivalent) with a USB microphone and speaker, or an ESP32 with a MEMS microphone and I2S speaker.
While I actually had all the bits to do the latter kicking around my workshop, there are a few disadvantages with this setup.
The ESP32 doesn’t really have the grunt to do the AI processing for wake word detection, so it basically streams audio back to Home Assistant constantly, which does the analysis there. I don’t love this - it seems like a bit of a waste of resources to chew up bandwidth and CPU just to look for a wake word.
Now, they have started another project to use an even smaller and more efficient wake word detector that can run on the ESP32 itself, but it’s not ready yet.
So, I started looking into some sort of cheap, low-power embedded Linux system that could run wake word detection, and probably speech-to-text and text-to-speech too, effectively just pushing JSON intents to Home Assistant and receiving text back.
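To put a rough number on that bandwidth, here is the arithmetic as a small bash sketch. It assumes the satellite streams uncompressed 16 kHz, 16-bit, mono PCM (the format Home Assistant’s speech-to-text pipeline works with); the function name is made up.

```shell
#!/usr/bin/env bash
# Sketch: bytes per day for a continuous, uncompressed PCM audio stream.
# Arguments: sample rate (Hz), bytes per sample, channel count.
audio_bytes_per_day() {
  local rate_hz="$1" bytes_per_sample="$2" channels="$3"
  local bytes_per_second=$(( rate_hz * bytes_per_sample * channels ))
  # 60 s * 60 min * 24 h
  echo $(( bytes_per_second * 60 * 60 * 24 ))
}
```

audio_bytes_per_day 16000 2 1 prints 2764800000: 32 kB every second, or roughly 2.8 GB per satellite per day, just to listen for a wake word.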
I have a bunch of OrangePi Zeros kicking around, which I purchased ages ago when they were $10 each (they are now $30 - thanks Covid). Couple one of those with a USB speakerphone, and I reckon they could make a cute little solution.
Because nothing I ever do is easy, I decided to deploy these using Balena. I’d had some good success doing this for my zigbee2mqtt device, which is also on an OrangePi Zero.
I started out looking at some instructions for setting up Raspberry Pis, and saw there are three components needed: wyoming-mic-external for accessing microphones, wyoming-snd-external for accessing speakers, and wyoming-satellite, which ties everything together.
I grabbed a copy of all the relevant repositories, and set up the following docker-compose.yaml file to see how far I’d get.
version: "2.4"
services:
  microphone:
    build:
      context: ./services/wyoming-mic-external
    ports:
      - "10600:10600"
    devices:
      - /dev/snd:/dev/snd
    group_add:
      - audio
    command:
      - "--device"
      - "sysdefault"
      - "--debug"
  playback:
    build:
      context: ./services/wyoming-snd-external
    ports:
      - "10601:10601"
    devices:
      - /dev/snd:/dev/snd
    group_add:
      - audio
    command:
      - "--device"
      - "sysdefault"
      - "--debug"
  satellite:
    build:
      context: ./services/wyoming-satellite
    ports:
      - "10700:10700"
    command:
      - "--name"
      - "my satellite"
      - "--mic-uri"
      - "tcp://microphone:10600"
      - "--snd-uri"
      - "tcp://playback:10601"
      - "--debug"
Not super far. The default BalenaOS kernel is very bare bones - there were no sound drivers.
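A quick way to confirm this from a shell on the device: ALSA lists its cards in /proc/asound/cards, and that file doesn’t exist at all until the sound core is loaded. A small bash sketch (the function name is made up; the path argument exists only so the check can be pointed at a test file):

```shell
#!/usr/bin/env bash
# Sketch: does the running kernel have any ALSA sound cards?
# The argument defaults to /proc/asound/cards; it is a parameter only so the
# check can be exercised against a test file off the device.
has_sound_cards() {
  local cards_file="${1:-/proc/asound/cards}"
  # The file only exists once the ALSA core (snd) is loaded.
  [[ -r "$cards_file" ]] || return 1
  # With ALSA loaded but no cards, the kernel prints "--- no soundcards ---".
  ! grep -q 'no soundcards' "$cards_file"
}
```

On the stock kernel, has_sound_cards || echo "no sound drivers" would print the message.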
Compiling “custom” modules for balena
The BalenaOS image for the OrangePi Zero was community-submitted, and is very old. One thought I had was to figure out how to rebuild it with the modules I needed, and ideally bring it up to date.
But after a bit of research it became clear I’d need to learn Yocto, and a bunch of other esoteric embedded things, to do that. While that sounds fun, it was beyond the scope of what I wanted to do on this project, so I needed to figure out a different way to get these modules onto the device.
It turns out a privileged Docker container can insert modules into kernel space, and the accepted way to load custom modules is to build them into a container and insmod them from there.
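A minimal sketch of what that loading step could look like, as a bash script run inside the privileged container. The module list and load order are assumptions based on the usual dependency chain for snd-usb-audio on a 5.4 kernel (running modinfo -F depends on each .ko shows the real one), and the module root /lib/modules/5.4.18/kernel is an assumed location. Modules already listed in /sys/module are skipped so the container can restart cleanly; DRY_RUN=1 prints the commands instead.

```shell
#!/usr/bin/env bash
# Sketch: load ALSA + USB audio modules with insmod from a privileged container.
# ASSUMPTIONS: modules live under MODULE_ROOT using the in-tree layout below,
# and this order satisfies their dependencies (check with `modinfo -F depends`).
# Set DRY_RUN=1 to print the insmod commands instead of running them.

MODULE_ROOT="${MODULE_ROOT:-/lib/modules/5.4.18/kernel}"

# Core modules first, the USB audio driver last.
SOUND_MODULES=(
  sound/soundcore.ko
  sound/core/snd.ko
  sound/core/snd-timer.ko
  sound/core/snd-pcm.ko
  sound/core/snd-hwdep.ko
  sound/core/snd-rawmidi.ko
  sound/usb/snd-usbmidi-lib.ko
  sound/usb/snd-usb-audio.ko
)

load_sound_modules() {
  local sysfs="${SYSFS_MODULE_DIR:-/sys/module}" mod name
  for mod in "${SOUND_MODULES[@]}"; do
    # Loaded modules appear in sysfs with dashes turned into underscores.
    name="$(basename "$mod" .ko)"
    name="${name//-/_}"
    if [[ -d "${sysfs}/${name}" ]]; then
      continue  # already loaded, e.g. after a container restart
    fi
    if [[ "${DRY_RUN:-0}" == 1 ]]; then
      echo "insmod ${MODULE_ROOT}/${mod}"
    else
      insmod "${MODULE_ROOT}/${mod}" || return 1
    fi
  done
}
```

Calling load_sound_modules before starting anything that needs audio, in a service with privileged: true, is the idea; insmod (unlike modprobe) doesn’t resolve dependencies, hence the explicit order.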
Setting up a cross compile docker container
I started down this track, then thought about the example from balena’s GitHub, which doesn’t do any explicit cross-compiling: it builds on the host but runs on the target. The balena build process must already be set up for cross-compiling. Let’s test that theory…
Compiling a test program
I cribbed this little beauty, and saved it as test.c
#include <stdio.h>

int main(void)
{
    printf("Hello World!\n");
    return 0;
}
Then I created this Dockerfile:
FROM debian:latest
RUN apt-get update && apt-get install -y build-essential
RUN mkdir -p /usr/src/test
COPY ./test.c /usr/src/test/test.c
WORKDIR /usr/src/test
RUN gcc -o /usr/bin/hello_world test.c
ENTRYPOINT [ "/usr/bin/hello_world" ]
Then I added this to docker-compose.yml:
---
hello_world:
  build:
    context: ./services/test
  privileged: true
  restart: on-failure
Success!
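If you want to check what a build actually produced without deploying it, one option is to read the e_machine field of the binary’s ELF header: a 16-bit value at byte offset 18, little-endian on these platforms. The OrangePi Zero’s Allwinner H2+ is a 32-bit ARM (Cortex-A7) chip, so a binary built for it should report arm. A bash sketch with a made-up function name that only maps a few machine codes:

```shell
#!/usr/bin/env bash
# Sketch: report the CPU architecture an ELF binary was built for.
# Reads e_machine (bytes 18-19); assumes a little-endian ELF file.
elf_machine() {
  local file="$1" magic lo hi code
  # ELF files start with the bytes 0x7f 'E' 'L' 'F'.
  magic="$(od -An -t x1 -N 4 "$file" | tr -d ' \n')"
  if [[ "$magic" != 7f454c46 ]]; then
    echo "not-elf"
    return 1
  fi
  read -r lo hi < <(od -An -t u1 -j 18 -N 2 "$file")
  code=$(( lo + hi * 256 ))
  case "$code" in
    3)   echo "x86" ;;
    40)  echo "arm" ;;
    62)  echo "x86_64" ;;
    183) echo "aarch64" ;;
    *)   echo "unknown($code)" ;;
  esac
}
```

For example, elf_machine /usr/bin/hello_world inside the built image should print arm for an OrangePi Zero target.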
Compiling a vanilla kernel
Next step - can we recompile a kernel with some modules?
Luckily, the config.gz file was present in the /proc directory, so we can use that as a starting point.
To get it off the device in the first place, I used ssh. In development mode, balena devices have an open SSH port on 22222, so after finding the IP address, I ran:
ssh -t -p 22222 root@192.168.1.244 "gunzip -k -c /proc/config.gz" > .config
Why not scp? It didn’t work. There was something about the way it was set up on the remote that caused it to fail.
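One thing worth knowing about that command: -t makes ssh allocate a pseudo-terminal on the remote end, and a terminal translates each newline into a carriage return plus newline by default, so the saved .config can end up with CRLF line endings. A bash sketch that fetches the file without -t and sanity-checks the result; the function names are made up, and the port is balena’s development-mode SSH port from above.

```shell
#!/usr/bin/env bash
# Sketch: copy the running kernel's config off a balena device in development
# mode (host OS SSH on port 22222) and sanity-check the result.
fetch_kernel_config() {
  local host="$1" out="$2"
  # No -t: without a remote pseudo-terminal the bytes arrive unmodified.
  ssh -p 22222 "root@${host}" "gunzip -c /proc/config.gz" > "$out"
}

# Fails if the file contains carriage returns or has no CONFIG_ lines.
check_kernel_config() {
  local file="$1"
  if grep -q $'\r' "$file"; then
    echo "error: $file has CRLF line endings (was ssh -t used?)" >&2
    return 1
  fi
  grep -q '^CONFIG_' "$file"
}
```

Usage would be fetch_kernel_config 192.168.1.244 .config && check_kernel_config .config; an existing CRLF copy can be repaired with tr -d '\r' < .config > .config.fixed.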
Next, we need to enable the sound stuff. Add this to the .config file
CONFIG_SOUND=m
CONFIG_SND=m
# CONFIG_SND_OSSEMUL is not set
CONFIG_SND_PCM_TIMER=y
# CONFIG_SND_HRTIMER is not set
# CONFIG_SND_DYNAMIC_MINORS is not set
CONFIG_SND_SUPPORT_OLD_API=y
CONFIG_SND_PROC_FS=y
CONFIG_SND_VERBOSE_PROCFS=y
# CONFIG_SND_VERBOSE_PRINTK is not set
# CONFIG_SND_DEBUG is not set
# CONFIG_SND_SEQUENCER is not set
CONFIG_SND_DRIVERS=y
# CONFIG_SND_DUMMY is not set
# CONFIG_SND_ALOOP is not set
# CONFIG_SND_MTPAV is not set
# CONFIG_SND_SERIAL_U16550 is not set
# CONFIG_SND_MPU401 is not set
#
# HD-Audio
#
# end of HD-Audio
CONFIG_SND_HDA_PREALLOC_SIZE=64
CONFIG_SND_SPI=y
CONFIG_SND_USB=y
# CONFIG_SND_USB_AUDIO is not set
# CONFIG_SND_USB_UA101 is not set
# CONFIG_SND_USB_CAIAQ is not set
# CONFIG_SND_USB_6FIRE is not set
# CONFIG_SND_USB_HIFACE is not set
# CONFIG_SND_BCD2000 is not set
# CONFIG_SND_USB_POD is not set
# CONFIG_SND_USB_PODHD is not set
# CONFIG_SND_USB_TONEPORT is not set
# CONFIG_SND_USB_VARIAX is not set
# CONFIG_SND_SOC is not set
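Rather than hand-editing, the options can be saved to their own fragment file and merged in. A bash sketch of that merge (the function name is made up): for every option line in the fragment, it drops any existing CONFIG_X=... or "# CONFIG_X is not set" line from .config and appends the fragment’s version. The kernel tree ships scripts/kconfig/merge_config.sh for the same job, and either way, running make olddefconfig afterwards fills in any dependent options.

```shell
#!/usr/bin/env bash
# Sketch: merge a kconfig fragment into an existing .config file.
# Each option line in FRAGMENT replaces any existing setting for the same
# symbol in CONFIG; other fragment lines (plain comments, blanks) are ignored.
apply_config_fragment() {
  local config="$1" fragment="$2" line sym tmp
  local set_re='^(CONFIG_[A-Za-z0-9_]+)='
  local unset_re='^# (CONFIG_[A-Za-z0-9_]+) is not set$'
  tmp="$(mktemp)" || return 1
  cp "$config" "$tmp"
  while IFS= read -r line; do
    if [[ "$line" =~ $set_re || "$line" =~ $unset_re ]]; then
      sym="${BASH_REMATCH[1]}"
    else
      continue
    fi
    # Drop both forms of any existing entry for this symbol, then append.
    grep -v -E "^(${sym}=|# ${sym} is not set\$)" "$tmp" > "$tmp.new" || true
    mv "$tmp.new" "$tmp"
    printf '%s\n' "$line" >> "$tmp"
  done < "$fragment"
  mv "$tmp" "$config"
}
```

One thing to double-check in the options above: they leave CONFIG_SND_USB_AUDIO unset, and that option builds snd-usb-audio, the driver a USB speakerphone normally relies on, so a fragment line of CONFIG_SND_USB_AUDIO=m is probably wanted.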
Using debian:latest failed because (I think) GCC 11 has issues with something in this kernel (it was some sort of double import; I didn’t investigate beyond some cursory googling), so I tried to find the oldest version of Debian that would still work. At the time of writing, that was debian:buster, which comes out of LTS on 30 June 2024. After that, it may stop working as packages disappear from the mirrors. Who knows (in reality, you can probably find a mirror that works…).
So that took 8 hours on my MacBook Pro (because Docker). I fired up a real Linux machine so I could build things a bit faster, and then spotted a previous attempt at doing this nonsense: a patched 5.4.20-sunxi kernel from sunxi. I wondered if I could just load its modules. I mean, it was a long shot, and it didn’t work:
modprobe: can't load module soundcore (kernel/sound/soundcore.ko): invalid module format
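"Invalid module format" commonly means the module’s vermagic string doesn’t match the running kernel. That string starts with the kernel release the module was built against (5.4.20-sunxi here), followed by build options such as SMP and the ARM variant, and the kernel refuses modules where these disagree. A bash sketch of a first check, comparing the release field of modinfo -F vermagic on a module with uname -r on the device (the function name is made up; a matching release is necessary but not sufficient):

```shell
#!/usr/bin/env bash
# Sketch: does a module's vermagic release match the target kernel release?
# VERMAGIC is the output of `modinfo -F vermagic some-module.ko`, which begins
# with the kernel release; RELEASE is `uname -r` from the device.
# A match is necessary but not sufficient: the remaining flags must agree too.
vermagic_matches() {
  local vermagic="$1" release="$2"
  # Keep only the first space-separated field of the vermagic string.
  [[ "${vermagic%% *}" == "$release" ]]
}
```

For example, vermagic_matches "$(modinfo -F vermagic soundcore.ko)" "<device uname -r>" returns non-zero for the sunxi modules unless the device happens to run that exact release.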
So, let’s build on a real Linux box. No good - balena uses QEMU when building containers (which makes sense: a Docker recipe’s binaries have to run on the target, not on the build machine).
Back to cross compiling?
docker build . -t wyoming-satellite-kernel-builder
docker run --mount type=bind,source="$(pwd)/linux-5.4.18",target=/usr/src/linux --mount type=bind,source="$(pwd)/modules",target=/lib/modules/5.4.18 wyoming-satellite-kernel-builder make oldconfig
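That make oldconfig only configures the tree. The build itself needs to target 32-bit ARM explicitly: unless the builder image sets ARCH and CROSS_COMPILE in its environment, kbuild assumes the build machine’s architecture. A bash sketch of the remaining steps, assuming the image has Debian’s gcc-arm-linux-gnueabihf cross toolchain plus the usual kernel build dependencies, and that the tree and .config match the device’s running kernel. The function name and output directory are made up; DRY_RUN=1 prints the commands instead of running them.

```shell
#!/usr/bin/env bash
# Sketch: cross-compile and install kernel modules for 32-bit ARM.
# ASSUMPTIONS: Debian's gcc-arm-linux-gnueabihf toolchain is installed, the
# kernel tree (with the .config from the device) is at SRC, and modules are
# installed under OUT. Set DRY_RUN=1 to print the make commands instead.
build_arm_modules() {
  local src="${1:-/usr/src/linux}" out="${2:-/out}"
  # Pass ARCH/CROSS_COMPILE on every make call so kbuild never falls back
  # to the build machine's own architecture.
  local -a make_args=(-C "$src" ARCH=arm CROSS_COMPILE=arm-linux-gnueabihf-)
  local -a run=()
  if [[ "${DRY_RUN:-0}" == 1 ]]; then
    run=(echo)
  fi
  # Answer any new config symbols with their defaults, non-interactively.
  "${run[@]}" make "${make_args[@]}" olddefconfig || return 1
  "${run[@]}" make "${make_args[@]}" -j"$(nproc)" modules || return 1
  # Installs into OUT/lib/modules/<kernel release>/.
  "${run[@]}" make "${make_args[@]}" INSTALL_MOD_PATH="$out" modules_install
}
```

modules_install puts the results under OUT/lib/modules/<release>/kernel/, which is one place the sound directory copied into the image below could come from.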
FROM alpine
RUN mkdir -p /lib/modules/5.4.18/kernel
COPY modules.builtin /lib/modules/5.4.18/modules.builtin
COPY sound /lib/modules/5.4.18/kernel/sound
ENTRYPOINT ["/bin/sh"]