Introduction

Geolambda is a Docker-based builder for AWS Lambda function packages.

If we replace the GDAL installation step in the Dockerfile with GEE, we can build a Lambda function that runs on AWS. The GDAL step looks like this:

# GDAL
RUN \
    wget http://download.osgeo.org/gdal/$GDAL_VERSION/gdal-$GDAL_VERSION.tar.gz; \
    tar -xzvf gdal-$GDAL_VERSION.tar.gz; \
    cd gdal-$GDAL_VERSION;

The base image of Geolambda is specified in Dockerfile.base:

FROM lambdalinux/baseimage-amzn:2017.03-004

Problem

docker-compose hangs while compiling the geos package on macOS Mojave:

libtool: compile:  g++ -DHAVE_CONFIG_H -I. -I../../../include -I../../../include/geos -I../../../include -DGEOS_INLINE -pedantic -Wall -ansi -Wno-long-long -ffloat-store -g -O2 -MT MonotoneChainSelectAction.lo -MD -MP -MF .deps/MonotoneChainSelectAction.Tpo -c MonotoneChainSelectAction.cpp  -fPIC -DPIC -o .libs/MonotoneChainSelectAction.o


If that happens, restart the Docker app. At the end, the GDAL build fails with an error that HDF5 is missing:

checking for H5Fopen in -lhdf5... no
configure: error: HDF5 support requested with arg /usr/local, but no hdf5 lib found
GNUmakefile:1: GDALmake.opt: No such file or directory
./config.status --recheck
make: ./config.status: Command not found
make: *** [config.status] Error 127
GNUmakefile:1: GDALmake.opt: No such file or directory
./config.status --recheck
make: ./config.status: Command not found
make: *** [config.status] Error 127

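The above error can be fixed by setting the HDF5 library version to 1.10.4 in the Dockerfile. After that change, the packaging script fails because it still looks for the old shared library name (see step 2 of the Solution):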

[root@befa35da0698 bin]# ./lambda-package.sh
./lambda-package.sh
Creating deploy package for Python 2.7
cp: cannot stat /usr/local/lib/libhdf5.so.101: No such file or directory

The Lambda package created by Geolambda works well. However, /vsicurl doesn't work with gdal.Open() for a CERES HDF file:

 Runtime failure is likely !
ERROR 4: `/vsicurl/https://gamma.hdfgroup.org/ftp/pub/outgoing/NASAHDF/CER_ES4_TRMM-PFM_Edition2_019018.19808.hdf' not recognized as a supported file format.

HDF5 Open Failure for GDAL version 2.3.1

  GDAL 2.3.1 can't open the MOP03T HDF5 file through /vsicurl. It seems that GDAL 2.4.0 is necessary for /vsicurl with HDF5 [3].

Error detected in HDF5 (1.10.4) thread 0:
  #000: H5F.c line 509 in H5Fopen(): unable to open file
    major: File accessibilty
    minor: Unable to open file
  #001: H5Fint.c line 1400 in H5F__open(): unable to open file
    major: File accessibilty
    minor: Unable to open file
  #002: H5Fint.c line 1546 in H5F_open(): unable to open file: time = Fri Jan 18 02:59:01 2019
, name = '/vsicurl/https://gamma.hdfgroup.org/ftp/pub/outgoing/NASAHDF/MOP03T-20131129-L3V5.2.1.he5', tent_flags = 0
    major: File accessibilty
    minor: Unable to open file
  #003: H5FD.c line 734 in H5FD_open(): open failed
    major: Virtual File Layer
    minor: Unable to initialize object
  #004: H5FDsec2.c line 346 in H5FD_sec2_open(): unable to open file: name = '/vsicurl/https://gamma.hdfgroup.org/ftp/pub/outgoing/NASAHDF/MOP03T-20131129-L3V5.2.1.he5'
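
Since the failure above appears to depend on the GDAL release, a Lambda can check the version before attempting a /vsicurl open on HDF5. The following is a minimal sketch, assuming the release string has the plain numeric form shown in these notes (for example 2.3.1); the helper names are hypothetical, and the 2.4.0 threshold is only the observation recorded above, not a documented guarantee.

```python
def parse_version(text):
    """Turn a dotted release string such as '2.3.1' into a tuple of ints."""
    return tuple(int(part) for part in text.strip().split("."))


def meets_minimum(release, minimum="2.4.0"):
    """Return True when `release` is at least `minimum`."""
    return parse_version(release) >= parse_version(minimum)


# Inside the Lambda the release string would come from
# gdal.VersionInfo("RELEASE_NAME"); literals are used here so the
# sketch runs without GDAL installed.
print(meets_minimum("2.3.1"))  # False
print(meets_minimum("2.4.0"))  # True
```

Comparing tuples of integers avoids the string-comparison trap where "2.10.0" would sort before "2.4.0".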

Identifying Band from HDF5

Warning 1: GDAL was built against curl 7.53.1, but is running against 7.51.0. Runtime failure is likely !
'NoneType' object has no attribute 'GetStatistics': AttributeError
Traceback (most recent call last):
  File "/var/task/lambda.py", line 26, in handler
    stats = band.GetStatistics(0, 1)
AttributeError: 'NoneType' object has no attribute 'GetStatistics'
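
The traceback above shows that band was None when GetStatistics() was called. A likely cause, stated here as an assumption, is that the HDF5 file was opened as a container that exposes subdatasets rather than raster bands. The sketch below checks each step and reports which one failed; the function name is hypothetical, and dataset stands for whatever gdal.Open() returned.

```python
def first_band_statistics(dataset):
    """Return [min, max, mean, stddev] for band 1, or raise a clear error.

    `dataset` is the return value of gdal.Open(); it is None when the
    open failed, and a container dataset can report RasterCount == 0.
    """
    if dataset is None:
        raise IOError("gdal.Open() returned None; the file could not be opened")
    if dataset.RasterCount < 1:
        # GetSubDatasets() returns (name, description) pairs.
        names = [name for name, _description in dataset.GetSubDatasets()]
        raise ValueError("no raster bands; open one of the subdatasets: %r" % names)
    band = dataset.GetRasterBand(1)
    if band is None:
        raise ValueError("GetRasterBand(1) returned None")
    # Same call as in the traceback: approx_ok=0, force=1.
    return band.GetStatistics(0, 1)
```

With this guard, the Lambda log names the subdatasets that can be opened instead of ending in an AttributeError.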


/vsis3 looks up s3.amazonaws.com

If the URL-encoded subdataset name HDF5:%22%2Fvsis3%2Ftest%2FMOP03T-20131129-L3V5.2.1.he5%22:%2F%2FHDFEOS%2FGRIDS%2FMOP03%2FData_Fields%2FRetrievedSurfaceTemperatureDay is submitted, /vsis3 looks up amazonaws.com, not LocalStack S3:

HTTP response code on https://test.s3.amazonaws.com/MOP03T-20131129-L3V5.2.1.he5: 403
HDF5-DIAG: Error detected in HDF5 (1.10.4) thread 0:
 #000: H5F.c line 509 in H5Fopen(): unable to open file

The above behavior can be corrected by setting AWS_S3_ENDPOINT, but GDAL still uses https:// for data retrieval, and LocalStack does not accept https:// by default. Define USE_SSL=true when you start LocalStack to enable it. However, SSL is not supported by mockit [5].
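
To make the lookup easier to follow, the sketch below decodes the percent-encoded subdataset name and reproduces the two URL shapes involved: the virtual-hosted form seen in the 403 log line and the path-style form used with a custom endpoint. It only mirrors the URLs in these notes and is not GDAL's implementation. The function names are hypothetical, and localhost:4572 is an assumed LocalStack S3 address.

```python
try:
    from urllib.parse import unquote  # Python 3
except ImportError:
    from urllib import unquote  # Python 2.7, as used by the deploy package

ENCODED = ("HDF5:%22%2Fvsis3%2Ftest%2FMOP03T-20131129-L3V5.2.1.he5%22:"
           "%2F%2FHDFEOS%2FGRIDS%2FMOP03%2FData_Fields%2F"
           "RetrievedSurfaceTemperatureDay")


def vsi_path(subdataset_name):
    """Extract the quoted file path from HDF5:"<path>"://<dataset>."""
    return subdataset_name.split('"')[1]


def s3_request_url(vsis3_path, endpoint="s3.amazonaws.com", virtual_hosting=True):
    """Show the URL shape for a /vsis3/<bucket>/<key> path."""
    bucket, _, key = vsis3_path[len("/vsis3/"):].partition("/")
    if virtual_hosting:
        return "https://%s.%s/%s" % (bucket, endpoint, key)
    return "https://%s/%s/%s" % (endpoint, bucket, key)


decoded = unquote(ENCODED)
path = vsi_path(decoded)
print(decoded)
print(s3_request_url(path))
# Path-style form; localhost:4572 is an assumption, not from the log.
print(s3_request_url(path, endpoint="localhost:4572", virtual_hosting=False))
```

The second print reproduces the URL from the 403 response, which shows that the first path component after /vsis3/ (test) is treated as the bucket name.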

Python gdal.Open(opendap_url)

GDAL downloads the file and then opens the downloaded copy through /vsimem:

  #002: H5Fint.c line 1546 in H5F_open(): unable to open file: time = Tue Feb  5 04:45:52 2019
, name = '/vsimem/http_1/grid_1_2d.h5', tent_flags = 0

This happens because the DODS driver is missing [6].

Background

docker-compose.yml specifies two services:

  1. base
  2. core

base doesn't include GDAL. core is our main target. Although Geolambda's README says to run docker-compose run base, we should run core:

$docker-compose run core

GDAL provides the AWS_S3_ENDPOINT configuration option, which is set together with the credential and addressing options:

gdal.SetConfigOption(b'AWS_S3_ENDPOINT', AWS_S3_ENDPOINT.encode())
gdal.SetConfigOption(b'AWS_ACCESS_KEY_ID', AWS_ACCESS_KEY_ID.encode())
gdal.SetConfigOption(b'AWS_SECRET_ACCESS_KEY', AWS_SECRET_ACCESS_KEY.encode())
gdal.SetConfigOption(b'CPL_CURL_VERBOSE', b'YES')  
gdal.SetConfigOption(b'AWS_VIRTUAL_HOSTING', b'NO')
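
The options above can be collected in one place before they are handed to GDAL. The following is a minimal sketch with hypothetical helper names. It assumes AWS_S3_ENDPOINT expects host[:port] without a scheme, and it adds AWS_HTTPS, an option described in GDAL's /vsis3 documentation that was not tried in these notes, so treat that entry as an assumption.

```python
def localstack_s3_options(endpoint_url, access_key, secret_key):
    """Build GDAL config options for a LocalStack S3 endpoint."""
    # AWS_S3_ENDPOINT takes host[:port], so split off any scheme prefix.
    scheme, separator, host = endpoint_url.partition("://")
    if not separator:
        scheme, host = "https", endpoint_url
    return {
        "AWS_S3_ENDPOINT": host.rstrip("/"),
        "AWS_ACCESS_KEY_ID": access_key,
        "AWS_SECRET_ACCESS_KEY": secret_key,
        "AWS_VIRTUAL_HOSTING": "NO",
        "CPL_CURL_VERBOSE": "YES",
        # Assumption: untested here; see the lead-in.
        "AWS_HTTPS": "YES" if scheme == "https" else "NO",
    }


def apply_options(options, set_option):
    """Call set_option(key, value) for every option, in sorted key order."""
    for key in sorted(options):
        set_option(key, options[key])


# In the Lambda: apply_options(opts, gdal.SetConfigOption)
```

Passing the setter as an argument keeps the helper importable without GDAL, which makes it easy to check on the host before building the package.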

Use the following command to generate a session token:

$aws sts get-session-token --duration-seconds 129600
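
The command prints a JSON document with a Credentials object. The sketch below maps its fields to the corresponding GDAL configuration option names, including AWS_SESSION_TOKEN; the function name is hypothetical and the sample values are placeholders, not real credentials.

```python
import json


def session_options(sts_json):
    """Map `aws sts get-session-token` output to GDAL config option names."""
    credentials = json.loads(sts_json)["Credentials"]
    return {
        "AWS_ACCESS_KEY_ID": credentials["AccessKeyId"],
        "AWS_SECRET_ACCESS_KEY": credentials["SecretAccessKey"],
        "AWS_SESSION_TOKEN": credentials["SessionToken"],
    }


# Placeholder output in the shape the CLI returns.
SAMPLE = """
{
  "Credentials": {
    "AccessKeyId": "EXAMPLEKEYID",
    "SecretAccessKey": "example-secret",
    "SessionToken": "example-token",
    "Expiration": "2019-01-19T14:59:01Z"
  }
}
"""

options = session_options(SAMPLE)
print(sorted(options))
```

The duration of 129600 seconds is 36 hours, so the resulting options need to be refreshed after the Expiration time in the output.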


Solution

  1. Modify Dockerfile: HDF5 version to 1.10.4. GDAL version to 2.4.0.
  2. Modify lambda-package-base.sh: libhdf5.so.101 to 103
  3. docker-compose build (--no-cache)
  4. docker-compose run core
  5. sftp lambda-deploy.zip to host system (e.g. #sftp hyoklee@nene) to save the lambda function.
  6. Unzip and add your lambda.py. Zip it again.
  7. Deploy to LocalStack.
  8. Check lambda functions executed: $docker ps -a
  9. Check output logs: $docker logs [container_id]
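
Step 6 can also be scripted. The sketch below uses a different technique from the unzip and zip cycle described in the list: it opens the archive in append mode and adds lambda.py directly. The function name is hypothetical; the file names are the ones used in the steps above.

```python
import zipfile


def add_handler(zip_path, handler_path, arcname="lambda.py"):
    """Append the handler file to an existing deploy package."""
    with zipfile.ZipFile(zip_path, "a", zipfile.ZIP_DEFLATED) as archive:
        # Appending a second entry with the same name would leave a
        # duplicate in the archive, so refuse instead of overwriting.
        if arcname in archive.namelist():
            raise ValueError("%s already contains %s" % (zip_path, arcname))
        archive.write(handler_path, arcname)


# Example: add_handler("lambda-deploy.zip", "lambda.py")
```

Append mode leaves the existing entries untouched, so the shared libraries packaged by Geolambda are not recompressed.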

Experiment

  Geolambda's GDAL image ran successfully on LocalStack for a GEE sample GeoTIFF file, https://gamma.hdfgroup.org/ftp/pub/outgoing/joe/gee/MOP03T.45.tif. /vsicurl worked fine with gdal.Open().

Warning 1: GDAL was built against curl 7.53.1, but is running against 7.51.0. Runtime failure is likely !
[-9999.0, 444.1781311035156, -9000.95712966157, 3016.201309916306]
[DEBUG] 2019-01-17T03:38:47.852Z        228987c5-4707-4171-b0c5-da5e55b6f4fd    [-9999.0, 444.1781311035156, -9000.95712966157, 3016.201309916306]

END RequestId: 228987c5-4707-4171-b0c5-da5e55b6f4fd
REPORT RequestId: 228987c5-4707-4171-b0c5-da5e55b6f4fd Duration: 529 ms Billed Duration: 600 ms Memory Size: 1536 MB Max Memory Used: 41 MB

{"body": [-9999.0, 444.1781311035156, -9000.95712966157, 3016.201309916306], "statusCode": 200}
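
A handler that produces a response of this shape could look like the sketch below. It is not the lambda.py used in the experiment: the event key url, the open_dataset parameter, and the 500 response are hypothetical and are there only so the sketch can be exercised without GDAL installed.

```python
def handler(event, context, open_dataset=None):
    """Return band 1 statistics for the file at event["url"]."""
    if open_dataset is None:
        # Imported lazily so the module loads where GDAL is absent.
        from osgeo import gdal
        open_dataset = gdal.Open
    dataset = open_dataset("/vsicurl/" + event["url"])
    if dataset is None:
        # Hypothetical error shape; the experiment only shows the 200 case.
        return {"statusCode": 500, "body": "GDAL could not open " + event["url"]}
    # [min, max, mean, stddev], as in the log above.
    stats = dataset.GetRasterBand(1).GetStatistics(0, 1)
    print(stats)
    return {"statusCode": 200, "body": stats}
```

The print() call is what produces the bare list in the log; the returned dictionary is what appears as the final JSON line.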


Performance

  The baseline is Geolambda Python on Docker for Mac OS X.


References

  1. https://lists.osgeo.org/pipermail/gdal-dev/2013-May/036359.html
  2. https://www.gdal.org/gdal_virtual_file_systems.html#gdal_virtual_file_systems_drivers (Notable exceptions are netCDF/HDF4/HDF5)
  3. https://github.com/OSGeo/gdal/pull/786 (Recent support of VSI on netCDF/HDF5)
  4. https://github.com/koordinates/gdal-vsis3-tests/blob/master/test_gdal_s3.py
  5. https://github.com/localstack/localstack/issues/55
  6. http://osgeo-org.1560.x6.nabble.com/gdal-dev-optimal-vsicurl-settings-for-merging-range-requests-td5389484.html
  7. https://hub.docker.com/r/cumuluss/cumulus-geolambda/


When you use Docker as the LocalStack Lambda executor, Python print() output goes to the Docker logs.