Copyright 2017-2018 Jason Ross, All Rights Reserved

Lambda, A Greek Letter

In my last article on AWS Lambdas, Creating An AWS Lambda With Dependencies Using Python, I described how to create an AWS Lambda using Python when the code depends on other packages such as requests and fails with errors like:

Unable to import module 'lambda_function': No module named 'requests'

That method works well for many packages, but there are limits.

Eventually, as you write more and more Lambdas, you’ll decide you want to include something like NumPy for some serious data processing or maybe Pillow for image processing. You’ll add it to your requirements.txt file and import it in your Python source code. Then you’ll upload the resultant ZIP file to the Lambda, trigger it, and watch as it throws an error like:

Unable to import module 'lambda_function': cannot import name '_imaging' from 'PIL' (/opt/python/lib/python3.7/site-packages/PIL/__init__.py)

or maybe:

/opt/python/lib/python3.7/site-packages/PIL/_imaging.so: undefined symbol: _Py_ZeroStruct

or

Unable to import module 'lambda_function': Missing required dependencies ['numpy']

or pretty much any other error saying that NumPy, Pillow or another large package failed to import.

I’ve encountered something like this before, and after a lot of exploration found the following explanation:

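Some packages, such as NumPy and Pillow, include extensions written in C. When you install them, pip either downloads a binary built for your operating system and Python version, or downloads the C source code and builds it on the local machine using the local operating system’s C compiler. Either way, the result is tied to your hardware, operating system and Python version.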

So, if you install one of these packages on your machine it will be built for your local operating system, but when you package it and upload it to the Lambda it will run on a version of Amazon Linux. Although AWS Lambdas abstract away the operating system, they’re really running on Amazon Linux. Your system and Amazon Linux are not entirely the same, so import and linking errors get thrown.
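To see which of your dependencies are affected, it can help to look for compiled extension modules in the directory that pip installed into. The sketch below is only an illustration: the function name list_compiled_extensions and the ./libs directory are assumptions, not part of the original method.

```shell
#!/bin/bash

# List compiled extension modules (*.so files) under a directory of
# installed packages. For Python 3 the filenames usually embed the
# interpreter version and platform they were built for, for example
# _imaging.cpython-37m-x86_64-linux-gnu.so
list_compiled_extensions() {
  local package_dir="$1"
  find "${package_dir}" -type f -name '*.so' | sort
}

# Hypothetical usage, assuming the packages were installed into ./libs:
#   list_compiled_extensions ./libs
```

If the platform part of a filename does not look like Linux on the architecture your Lambda uses, that package was built for the wrong system.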

To stop this happening, you need to build the package on an Amazon Linux instance that is compatible with Lambdas. You can use an EC2 instance, and that should work nicely, or you can make it simpler and use Docker. That way you can create a Docker container using the same version of Amazon Linux as the one used to run Lambdas, build the dependencies inside it, build the package containing the Lambda and upload it to your AWS account.

Please note that up to now I’ve only ever done this on Linux. It’s probably possible to do it on Windows as well, but I’ve never tried.

Now is probably as good a time as any to point out that packages such as NumPy are BIG. Certainly big enough that repeatedly uploading them during the development phase will slow everything down. It’s also very likely that if you have one Lambda using something that large, you’ll have several using the same package. As duplication is generally a bad thing, you might want to share the package between a number of Lambdas.

Luckily, AWS have already thought of this. Although they weren’t released until late 2018, Lambda Layers let you share code between Lambdas, so you upload the packages once and then use them from multiple Lambdas. Lambda Layers are built in a similar way to Lambdas, and can be uploaded in the same way.

It’s also very possible to create Lambda Layers containing compiled packages in exactly the same way as you create Lambdas, using Docker and a container running Amazon Linux.
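Once a layer ZIP file exists, one way to upload it is the AWS CLI's publish-layer-version command, which takes the compatible runtimes as identifiers such as python3.7. The sketch below assumes the AWS CLI is installed and configured. The helper runtimes_from_versions, the layer name my-python-deps and the file name package.zip are all hypothetical, chosen for illustration only.

```shell
#!/bin/bash

# Convert a comma-separated list of Python versions (e.g. "3.7,3.8") into
# the runtime identifiers Lambda expects (e.g. "python3.7 python3.8").
runtimes_from_versions() {
  local versions="$1" version runtimes=""
  local IFS=','
  # With IFS set to a comma, the unquoted expansion splits on commas
  for version in ${versions}; do
    runtimes="${runtimes:+${runtimes} }python${version}"
  done
  echo "${runtimes}"
}

# Hypothetical usage (placeholder layer and file names):
#   aws lambda publish-layer-version \
#     --layer-name my-python-deps \
#     --zip-file fileb://package.zip \
#     --compatible-runtimes $(runtimes_from_versions "3.7,3.8")
```

The command reports the ARN of the new layer version, which you can then attach to each Lambda that needs the packages, for example with aws lambda update-function-configuration and its --layers option.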

To show how this can be done, below is a Bash script which runs on Linux and allows the creation of either a Lambda deployment ZIP package or a Lambda Layer deployment ZIP package. It’s a little rough around the edges, but it works and takes Python files and requirements.txt files, creates a deployment package and then compresses it into a ZIP file.

#!/bin/bash

TEMP_PACKAGE_NAME="temp_package.zip"


function usage()
{
  echo "Usage: cmd SOURCE [-o FILENAME] [-p VERSION | -v VERSIONS]"
  echo "     "
  echo "  SOURCE The directory containing the source code for the package"
  echo "     "
  echo "  -o, --output           The path and name of the package ZIP file"
  echo "     e.g.:"
  echo "         -o mypackage.zip"
  echo "     "
  echo "  -p, --package_runtime  Create a Lambda deployment package for Python version VERSION"
  echo "     e.g.:"
  echo "         -p 3.8"
  echo "     "
  echo "  -v, --layer_versions   Create a Lambda Layer package including specified dependencies"
  echo "     for Python versions specified in comma-separated list, e.g.:"
  echo "         -v 3.7,3.8"
}


function build_layer_package()
{
  # Build a Lambda Layer package
  # Arguments: $1: Source Directory
  #            $2: Output path
  #            $3: Lambda Layer versions
  echo "Building a layer package with runtimes: ${3}"
  versions=${3};
  for i in $(echo ${versions} | sed "s/,/ /g")
  do
    echo "Python version ${i}"
    # Create the target inside the source directory, where pip installs to
    mkdir -p "${1}/python/lib/python${i}/site-packages"
    docker run -v "$PWD":/var/task "lambci/lambda:build-python${i}" /bin/sh -c "pip install --upgrade -r ${1}/requirements.txt -t ${1}/python/lib/python${i}/site-packages/; exit"
  done

  # Ensure the ZIP file contains the correct paths
  cd "${1}"
  zip -r ${TEMP_PACKAGE_NAME} ./python > /dev/null
  rm -Rf ./python
  cd "$OLDPWD"
  mv "${1}/${TEMP_PACKAGE_NAME}" "${2}"
}


function build_lambda_package()
{
  # Build a Lambda package
  # Arguments: $1: Source Directory
  #            $2: Output path
  #            $3: Lambda runtime version
  echo "Building a lambda package with runtime: ${3}"
  version=${3};
  docker run -v "$PWD":/var/task "lambci/lambda:build-python${version}" /bin/sh -c "pip install --upgrade -r ${1}/requirements.txt -t ${1}/libs; exit"

  # Ensure the ZIP file contains the correct paths
  cd "${1}"

  zip -r ${TEMP_PACKAGE_NAME} ./libs > /dev/null
  zip -r ${TEMP_PACKAGE_NAME} ./*.py > /dev/null
  zip -r ${TEMP_PACKAGE_NAME} ./requirements.txt > /dev/null
  rm -Rf ./libs
  cd "$OLDPWD"
  mv "${1}/${TEMP_PACKAGE_NAME}" "${2}"
}

# If the first argument is non-existent or requests help, show it and exit
if [[ (-z $1 || "$1" == "-h" || "$1" == "--help") ]]; then
  usage
  exit
fi

# Otherwise, first argument should be the source directory
source_directory="$1"
shift

layer_versions=
package_runtime=
output_file="package.zip"


while [ "$1" != "" ]; do
    case $1 in
        -v | --layer_versions )
            shift
            layer_versions="$1"
            echo "Layer versions: ${layer_versions}"
            if [ -z "${layer_versions}" ]; then
              # layer_versions argument length is 0
              echo "Invalid option: v/layer_versions requires an argument"
              exit 1
            elif [ -n "${package_runtime}" ]; then
              # package_runtime length > 0
              echo "Invalid option: v/layer_versions cannot be specified as well as p/package_runtime"
              echo "Package runtime: ${package_runtime}"
              exit 1
            fi
          ;;

        -p | --package_runtime )
            shift
            package_runtime="$1"
            echo "Package runtime: ${package_runtime}"
            if [ -z "${package_runtime}" ]; then
              # package_runtime argument length is 0
              echo "Invalid option: p/package_runtime requires an argument"
              exit 1
            elif [ -n "${layer_versions}" ]; then
              # layer_versions length > 0
              echo "Invalid option: p/package_runtime cannot be specified as well as v/layer_versions"
              echo "Layer versions: ${layer_versions}"
              exit 1              
            fi
          ;;

        -o | --output )
            shift
            output_file="$1"
          ;;

        -h | --help )
            usage
            exit
          ;;

        * )
            usage
            exit 1
          ;;
    esac
    shift
done

# Actually do the building
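if [ -n "${layer_versions}" ]; then
  build_layer_package "${source_directory}" "${output_file}" "${layer_versions}"
elif [ -n "${package_runtime}" ]; then
  build_lambda_package "${source_directory}" "${output_file}" "${package_runtime}"
else
  # Neither -p nor -v was given, so there is nothing to build
  echo "Nothing to build: specify either -p/--package_runtime or -v/--layer_versions"
  usage
  exit 1
fi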

exit 0

This solves the problems of compiled Python packages not running when they’re uploaded to AWS Lambdas.
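As a quick sanity check after building a layer, you can confirm that every entry in the ZIP file sits under python/, which is where the Lambda runtime looks for Python packages in a layer. This is a sketch under assumptions: the script is saved under the hypothetical name build_package.sh, unzip is installed, and the function name entries_are_layer_paths is my own.

```shell
#!/bin/bash

# Read archive entry names from stdin, one per line, and succeed only if
# there is at least one entry and every entry is under python/.
entries_are_layer_paths() {
  local entry seen=0
  while IFS= read -r entry; do
    seen=1
    case "${entry}" in
      python/*) ;;
      *) echo "Unexpected entry: ${entry}" >&2; return 1 ;;
    esac
  done
  [ "${seen}" -eq 1 ]
}

# Hypothetical usage:
#   ./build_package.sh ./src -v 3.7,3.8 -o layer.zip
#   unzip -Z1 layer.zip | entries_are_layer_paths && echo "Layout looks right"
```

A Lambda deployment package built with -p has a different layout (libs/ plus your .py files), so this check only applies to layer packages.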

Citations

The actual method of using Docker to create the Python package was derived from the code published at: How do I create a Lambda layer using a simulated Lambda environment with Docker?