InfraRed Image Of Someone Hiding In The Dark

If you've read my article Infra-Red Imaging With The Grid-EYE AMG8833 Sensor, you'll have seen how easy it is to connect the sensor to a Raspberry Pi and start getting Infra-Red images. The software that I'm using to display the images, from https://github.com/makerportal/AMG8833_IR_cam, gets the job done but it's quite slow. How slow? Well, on a Raspberry Pi 3 it displays 3 or 4 frames per second at best, although the sensor itself can deliver 10 frames per second. I know it's provided as a demo to get you going with the sensor, but faster is always better!

Why Is The Demo Software Slow? What's Going On?

The code used to retrieve and display the data has three things happening, one after the other:

  1. Retrieve the data from the sensor, and convert it to a usable format
  2. Scale the data into pixels for displaying on the screen, using interpolation to smooth out the gaps
  3. Actually display the data on the screen

There are a few things you might be wondering at this point: can we use multithreading or some other type of parallel programming? Would making things asynchronous help? This article looks like it might help: Concurrency, Multiprocessing, Multi-threading and Asynchronous Code! Microservices are trendy - what if we used those?

All of those are good questions, and they all look awesome on your resume, but there's one thing that we need to do before anything else - measure the performance of the system. This might sound boring, but we can automate it. Using Python's time.perf_counter(), we can record the time each step takes, run the read/interpolate/display loop, say, 10,000 times to get a decent sample, and then calculate the mean times.

I added the timing code to the demo from the original article by changing the original loop:

#####################################
# Plot AMG8833 temps in real-time
#####################################
#
pix_to_read = 64 # read all 64 pixels
while True:
    status,pixels = sensor.read_temp(pix_to_read) # read pixels with status
    if status: # if error in pixel, re-enter loop and try again
        continue
    
    T_thermistor = sensor.read_thermistor() # read thermistor temp
    fig.canvas.restore_region(ax_bgnd) # restore background (speeds up run)
    new_z = interp(np.reshape(pixels,pix_res)) # interpolated image
    im1.set_data(new_z) # update plot with new interpolated temps
    ax.draw_artist(im1) # draw image again
    fig.canvas.blit(ax.bbox) # blitting - for speeding up run
    fig.canvas.flush_events() # for real-time plot

to:

pix_to_read = 64 # read all 64 pixels
read_times = []
interp_times = []
draw_times = []

for _ in range(0, 10000):
    read_start_time = time.perf_counter()

    status,pixels = sensor.read_temp(pix_to_read) # read pixels with status
    if status: # if error in pixel, re-enter loop and try again
        continue
    
    T_thermistor = sensor.read_thermistor() # read thermistor temp
    read_end_time = time.perf_counter()
    
    fig.canvas.restore_region(ax_bgnd) # restore background (speeds up run)
    new_z = interp(np.reshape(pixels,pix_res)) # interpolated image

    interp_end_time = time.perf_counter()
    
    im1.set_data(new_z) # update plot with new interpolated temps
    ax.draw_artist(im1) # draw image again
    fig.canvas.blit(ax.bbox) # blitting - for speeding up run
    fig.canvas.flush_events() # for real-time plot
    draw_end_time = time.perf_counter()
    
    
    read_times.append(read_end_time - read_start_time)
    interp_times.append(interp_end_time - read_end_time)
    draw_times.append(draw_end_time - interp_end_time)
    
print(f"Read:   {(statistics.mean(read_times) * 1000):.2f}ms")
print(f"Interp: {(statistics.mean(interp_times) * 1000):.2f}ms")
print(f"Draw:   {(statistics.mean(draw_times) * 1000):.2f}ms")

Note - you'll need to add:

import statistics

to the imports if you want to duplicate this (and import time as well, if the demo code doesn't already import it).

The results I got for the times to read and draw each image were as follows:

Mean times:
Read:   1.51 ms
Interp: 3.55 ms
Draw:   231.38 ms

It's obvious that reading the sensor and doing the interpolation aren't that demanding. Displaying each new image on the screen takes around 230ms, which is pretty dreadful performance.

I tried changing the display memory on the Raspberry Pi 3 from its initial 76MB to 256MB, just in case there wasn't enough memory for the processing that was going on. This made no difference at all.

Normally at this point I'd wonder what was going on in the demo code that was so slow, especially as it's quite straightforward code - it's meant to provide you with a basic working system after all. I've read a few articles that showed techniques they said would drastically reduce the time to display each frame. All of them either didn't make a difference, or failed because the versions of numpy and scipy on the Raspberry Pi 3 didn't support the changes.

My next idea, upgrading Python, numpy and scipy, came to a halt when I couldn't find a way to successfully build scipy on the Pi. Again, quite a few forum answers with "it works on my machine", but nothing that worked on a fairly clean Raspberry Pi. First matplotlib wouldn't build, then jpeg and Pillow, then I got those working and couldn't rebuild scipy, and then I decided I'd gone far enough down that particular rabbit hole. The library that did most of the drawing, matplotlib, was already at its latest version, so there didn't seem to be much I could do with that.

It might look like everything was grinding to a halt, and that I was out of options at this point. Maybe 3 or 4 frames per second is all that software was going to get out of the Raspberry Pi.

This isn't a particularly unusual situation to find yourself in when you're developing a system. Sometimes you do the initial development and find you've taken a wrong turn, or that the system design needs to be corrected, or that the hardware just isn't up to the job. What matters is what you do next. If you can solve the original problem by changing the system design or optimizing part of it, that's probably the best option. Otherwise you'll need to talk to the client and see what they want.

So, to summarize: the drawing stage alone takes too long for the loop to keep up with the sensor, and we need to do some redesigning.

Then, A Delay

This article was started when Python 3.9 was the current version. Then, it got put to one side for a while, a couple of years went by, and I had time to pick it up again. When I re-ran the tests on a Raspberry Pi 3 with newer versions of the Raspberry Pi OS and Python, I got the following averages:

Read:   10.25ms
Interp: 3.54ms
Draw:   103.40ms

Compared with the original figures:

Read:   1.51 ms
Interp: 3.55 ms
Draw:   231.38 ms

The drawing process takes about half of the original time. This is quite possibly because the OS is a 64-bit version now, instead of the original 32-bit, and matplotlib and scipy have been updated since the last time I looked at this.

The interpolation routines take around the same time, so the 64-bit processing hasn't made any difference there.

The odd part is reading from the I2C bus, which now takes about 7 times as long, even with the I2C bus clock explicitly set to run at 400kHz. Without digging too far into the cause, I would point the finger at the smbus library being six years old; that's quite old for software and it won't have kept up with the latest Python features. It might even be running in some sort of emulation mode. In any case, we could dig around in the source code, or just accept that we need to look into this later.

So, with the newer figures, assuming we want 10 frames per second from the AMG8833, we have the following total times to retrieve and display 10 frames:

Read:   102.5 ms
Interp: 35.4 ms
Draw:   1034.0 ms
Total:  1171.9 ms

This is just under 20% slower than we need, so despite the improvements that the new OS and libraries have brought us, we still need to re-architect the solution.

What's Next?

The first thing we can see from the timing figures is that the drawing part of the code alone is too slow to allow 10fps to be displayed. We might be able to do something clever with multiple threads, but that would mean relying on the data retrieval and drawing code running on separate threads, and then passing the data between them once it's been retrieved. It's possible to do this, but even running on its own thread it's doubtful that the drawing could hit 10fps on the Raspberry Pi.

Alternatively, we could just pass the drawing work over to something that can do it quicker - for example a remote machine. This would separate the sensor from the display as well, so remotely viewing the output could be done anywhere on your network.

The first thing to do was to separate the sensor code and make some sort of server. The features I decided I wanted were:

  • Access to the AMG8833 data (ambient and sensor temperatures) from a REST API
  • Work with more modern SIMD libraries - hopefully they'll be faster than the ones used in the original example code
  • Don't use more resources than I have to - constantly-open connections to busses etc. tend to cause problems
  • Make the code simple - I'm not aiming for an enterprise customer-site system; the Pi and sensor sit on my home network in the dining room!
  • No need for HTTPS or authorization yet (see the dining room comment above!)

The data I wanted to get back from the sensor API should include:

  • Ambient temperature
  • The sensor temperatures
  • An error status to see whether the data was read successfully
  • Maybe details of the sensor - name, resolution, etc. This can be added later because I know exactly what I'm using at the moment

I decided to use Python with a FastAPI server to do this because:

  • Python is a language I'm comfortable with
  • FastAPI works asynchronously, so it's going to be more responsive than Flask when my dining room achieves web-scale!
  • I've never used FastAPI to create a REST API, so I'm indulging in a bit of resume-driven development...
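
Before getting into the real repositories further down, here's a minimal sketch of the kind of endpoint I had in mind. The route, the field names and the read_sensor() stub are all illustrative - the actual code is linked below - but it shows how little FastAPI needs in order to serve the data described above:

from fastapi import FastAPI

app = FastAPI()

def read_sensor() -> tuple[float, list[float], int]:
    """Hypothetical stand-in for the AMG8833 handler class described below."""
    return 21.5, [20.0] * 64, 0  # ambient temp, 64 pixel temps, error status

@app.get("/temperatures")
async def read_temperatures() -> dict:
    ambient, pixels, error = read_sensor()
    return {
        "ambient": ambient,  # thermistor temperature in Celsius
        "pixels": pixels,    # 64 pixel temperatures, row-major
        "error": error,      # non-zero means the read failed
    }

# Run with: uvicorn main:app --host 0.0.0.0 --port 8000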

Like any other engineer, I started this part of the project by having a rough idea of how I wanted to structure the system, and went on a quest to find some code I could "gain inspiration from". This wasn't entirely successful. There's a lot of code that lets you access and display the data from the AMG8833 on a screen, but it tends to have been written a few years ago with older Python standards, in a procedural style. I decided I'd write my own handler code for the sensor so that:

  • The caller can just create a class whenever they want to access the sensor
  • I can attach another AMG8833 to the Raspberry Pi and use both of them at the same time (This is a "future goal")
  • The code is encapsulated so callers don't need to know much about the sensor; just that it has data and will let you read it.
  • The caller can close the connection to the device at any time. I added close() methods for this, and then added context management __enter__ and __exit__ methods. Now the sensor class can be used as a context manager and will close all the connections regardless of whether exceptions occur or not (there's a short sketch of this shape after the list).
  • I can make the code to read the sensor asynchronous. Again, another "future goal", but where would we be without those?
  • The messages, registers and values to be sent to the AMG8833 should be written as enumerated types. This improves type safety and, in conjunction with linting, reduces the chances of an invalid configuration being sent to the device. The datasheet has warnings about this and, unlike software, some hardware can actually be damaged by sending the wrong configuration. It's an expensive mistake so it's best avoided.
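
To make the close/context-manager and enumerated-type points concrete, here's a minimal sketch of the shape I was aiming for. It isn't the real module (that's linked below); the register addresses come from my reading of the AMG8833 datasheet, smbus2 is assumed as the I2C library, and the thermistor conversion ignores negative temperatures for brevity:

from enum import IntEnum
from smbus2 import SMBus  # assumed I2C library - the linked code may differ

class Register(IntEnum):
    """AMG8833 register addresses, as I read them from the datasheet."""
    POWER_CONTROL = 0x00
    FRAME_RATE = 0x02      # 0x00 = 10fps, 0x01 = 1fps
    THERMISTOR_LOW = 0x0E  # thermistor value, low byte first
    PIXEL_BASE = 0x80      # 64 pixels x 2 bytes, 0x80-0xFF

class Mode(IntEnum):
    NORMAL = 0x00
    SLEEP = 0x10

class AMG8833:
    """Illustrative wrapper - shows the shape, not the real implementation."""

    def __init__(self, bus_number: int = 1, address: int = 0x69):
        self._address = address
        self._bus = SMBus(bus_number)
        # Only enumerated values get written here, which keeps invalid
        # configurations away from the hardware
        self._bus.write_byte_data(self._address, Register.POWER_CONTROL, Mode.NORMAL)

    def read_thermistor(self) -> float:
        # 12-bit value, 0.0625C per bit (positive temperatures only, for brevity)
        raw = self._bus.read_word_data(self._address, Register.THERMISTOR_LOW)
        return (raw & 0x0FFF) * 0.0625

    def close(self) -> None:
        self._bus.close()

    # Context management - the bus gets closed even if an exception is raised
    def __enter__(self) -> "AMG8833":
        return self

    def __exit__(self, exc_type, exc, tb) -> bool:
        self.close()
        return False  # don't swallow exceptions

# Usage: the connection is closed automatically when the block exits
with AMG8833() as sensor:
    print(f"Ambient: {sensor.read_thermistor():.2f}C")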

The code I wrote to handle the sensor is available at:

https://github.com/big-jr/sensor-api-server/blob/main/src/sensors/amg8833.py

and the server code, using FastAPI, is at:

https://github.com/big-jr/sensor-api-server/blob/main/src/api_server/main.py

After I wrote this, I adapted the original test and display code to show interpolated data from the REST API, instead of directly from the sensor. The code was run on a laptop instead of the Raspberry Pi, inside an Ubuntu Virtual Machine (VM), and gave the following timings:

Read:    33.41ms
Interp:  0.58ms
Draw:    14.09ms

Why did I run this on a VM? Because the older versions of scipy that the demo code needed wouldn't compile on Windows without a Fortran compiler! Even when I installed Intel's free Fortran compiler, it failed because Intel had changed the compiler executable's name. At that point I decided it was easier to just run it on something I knew would work: an Ubuntu system.

These figures show that, over an HTTP connection, the time to read the data has roughly tripled compared with reading directly from the sensor. I know it now includes serializing the data to JSON, adding it to the response to an HTTP request and then deserializing it, but that's still far too long. On the plus side, interpolating and drawing the data has dropped by nearly 90%, to about 15ms. This brings the average time to retrieve and draw a frame, in Python, to under 50ms. That's well under our limit of 100ms. At last!
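
For reference, the client-side read is now little more than an HTTP GET and a reshape. Here's a hedged sketch - the hostname, endpoint path and field names are my own illustration, and the real client is linked further down:

import numpy as np
import requests

# Hypothetical address and route - substitute your Pi's hostname and the
# path exposed by the server
API_URL = "http://raspberrypi.local:8000/temperatures"

def read_frame() -> np.ndarray:
    """Fetch one frame from the sensor API and return it as an 8x8 array."""
    payload = requests.get(API_URL, timeout=1.0).json()
    if payload["error"]:  # field names are illustrative
        raise RuntimeError("sensor read failed")
    return np.reshape(payload["pixels"], (8, 8))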

You can see the outline of someone in front of the sensor

Awesome! Can You Make It Even Better?

Funny you should say that... Yes, I can. Now that I have a server sitting there, providing data to as many clients as can connect to it, there's a bottleneck that becomes immediately obvious. The AMG8833 can provide 10 frames per second, but there's nothing stopping it being read much more often than that - you just get duplicate data back.

The sensor hangs off a 400kHz I2C bus, which sounds fast enough, but the clients connect over a WiFi network that allegedly handles 1.9Gb/s. That's something of a mismatch, so we probably want to fix it. As I said earlier, web-scale awaits my dining room, and I don't want anything standing in its way.

If only there were an easy way of fixing this; thankfully there is! One of my early articles - System Performance – Part 2 – Cache Assets! - explains why and when things should be cached, so here's a cached version of the class:

https://github.com/big-jr/sensor-api-server/blob/main/src/sensors/cached_amg8833.py

This restricts how often the AMG8833 actually gets read, improving the speed of the server so that the figures from the client are now:

Read:    21.41ms
Interp:  0.51ms
Draw:    15.33ms

...or about 27fps with the AMG8833 being read 10 times per second.
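
The idea behind the cached class is simple enough to sketch. This isn't the code in the repository - the names and structure here are illustrative - but it shows the time-based cache that caps how often the sensor itself is touched:

import time

class CachedAMG8833:
    """Illustrative time-based cache around a sensor object."""

    def __init__(self, sensor, max_age_s: float = 0.1):
        self._sensor = sensor
        self._max_age_s = max_age_s  # 0.1s matches the sensor's 10fps limit
        self._cached = None
        self._read_at = 0.0

    def read_temp(self, pixel_count: int):
        now = time.monotonic()
        # Only touch the I2C bus if the cached frame is stale
        if self._cached is None or (now - self._read_at) > self._max_age_s:
            self._cached = self._sensor.read_temp(pixel_count)
            self._read_at = now
        return self._cached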

Totally Not Bending Down To Steal Stuff!

What Do We Have Now?

After this work, there's now a repository on GitHub containing the code for a server to run on a Raspberry Pi. The server reads temperature data from the AMG8833 and returns it as JSON data. It works, although it needs tidying a little.

There's another repository containing a Python client project, which you can run on a remote system connected to the Raspberry Pi via your network. It will read a number of frames and then give you the timings. It's based heavily on the original demo code, but has a Poetry environment and its dependencies have been updated to stop it complaining about missing methods, and to allow it to run on Windows. If you'd like to take a look at it, and maybe try it out, it's at:

https://github.com/big-jr/sensor-api-client

Using these two repositories together gives you the client AND server code you need to allow MULTIPLE connections to the server, letting you watch as things in front of your sensor are displayed in all their infra-red glory!

Two Simultaneous Connections In Separate Operating Systems

Can This Be Improved Even More?

Things can always be improved! There are a few ways to improve the system, including the "future goals" I mentioned above, but they wander off into different territory. The client code just runs round a loop, reading and drawing as fast as it can. It's currently around 27fps, but with the sensor limited to 10 frames per second, shouldn't we reduce the number of reads? Especially if we have more clients trying to connect one day?
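
One simple option would be to pace the client loop so it only asks for a new frame roughly every 100ms, matching the sensor's 10fps. A minimal sketch, with stand-ins for the read and draw steps (the real client code does rather more than print a line):

import time

FRAME_INTERVAL_S = 0.1  # match the AMG8833's 10 frames per second

def read_frame():
    """Stand-in for the API read sketched earlier."""
    return [20.0] * 64

def draw_frame(frame) -> None:
    """Stand-in for the interpolate-and-draw step."""
    print(f"frame received, {len(frame)} pixels")

while True:
    started = time.monotonic()
    draw_frame(read_frame())
    # Sleep away whatever is left of the 100ms budget
    time.sleep(max(0.0, FRAME_INTERVAL_S - (time.monotonic() - started)))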

I'll take a look at these, and maybe use them to explain some other software principles, in the near future.
