I have a project I'd like to run on AWS Lambda but it is exceeding the 50MB zipped limit. Right now it is at 128MB zipped and the project folder with the virtual environment sits at 623MB and includes (top users of space):
- scipy (~187MB)
- pandas (~108MB)
- numpy (~74.4MB)
- lambda_packages (~71.4MB)
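For reference, figures like the ones above can be reproduced with a short script. This is only a sketch: the helper names `dir_size` and `largest_entries` are made up for illustration, and the site-packages path depends on how the virtualenv is laid out.

```python
from pathlib import Path


def dir_size(path):
    """Total size in bytes of all regular files below `path`."""
    return sum(f.stat().st_size for f in Path(path).rglob("*") if f.is_file())


def largest_entries(site_packages, top=5):
    """Return (name, bytes) for the largest top-level entries, biggest first."""
    sizes = []
    for entry in Path(site_packages).iterdir():
        # Packages are directories; single-module distributions are plain files.
        size = dir_size(entry) if entry.is_dir() else entry.stat().st_size
        sizes.append((entry.name, size))
    return sorted(sizes, key=lambda item: item[1], reverse=True)[:top]
```

Calling `largest_entries(...)` on the virtualenv's site-packages directory and printing `size / 1e6` for each entry gives the per-package sizes in MB.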
Without the virtualenv the project is <2MB. The requirements.txt is:
click==6.7
cycler==0.10.0
ecdsa==0.13
Flask==0.12.2
Flask-Cors==3.0.3
future==0.16.0
itsdangerous==0.24
Jinja2==2.10
MarkupSafe==1.0
matplotlib==2.1.2
mpmath==1.0.0
numericalunits==1.19
numpy==1.14.0
pandas==0.22.0
pycryptodome==3.4.7
pyparsing==2.2.0
python-dateutil==2.6.1
python-dotenv==0.7.1
python-jose==2.0.2
pytz==2017.3
scipy==1.0.0
six==1.11.0
sympy==1.1.1
Werkzeug==0.14.1
xlrd==1.1.0
I deploy using Zappa, so my understanding of the underlying infrastructure is limited. As I understand it, a few of the libraries are not uploaded from my machine: for numpy, for example, my local copy is skipped and a version that is already available in Amazon's environment is used instead.
I propose the following workflow (without using S3 buckets for slim_handler):
- delete all the files that match "test_*.py" in all packages
- manually tree shake scipy, as I only use scipy.optimize.minimize, by deleting most of it and re-running my tests
- minify all the code and obfuscate using pyminifier
- zappa deploy
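The first step of this workflow could be sketched roughly as follows. The function name `remove_test_files` and its `dry_run` flag are hypothetical, and the sketch assumes nothing in the packages imports its own test modules at runtime, which is why the project's tests would need to be re-run afterwards.

```python
from pathlib import Path


def remove_test_files(root, pattern="test_*.py", dry_run=True):
    """Find files matching `pattern` anywhere below `root`.

    With dry_run=True nothing is deleted; the matches are only reported.
    Returns the list of matching paths.
    """
    # Materialise the matches first so the tree is not modified while walking it.
    matches = [p for p in Path(root).rglob(pattern) if p.is_file()]
    if not dry_run:
        for path in matches:
            path.unlink()
    return matches
```

Running it once with `dry_run=True` on the site-packages directory shows what would be removed before anything is actually deleted.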
Or:
- run compileall to get .pyc files
- delete all *.py files and let zappa upload .pyc files instead
- zappa deploy
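A minimal sketch of this second workflow, assuming Python 3: there, a .pyc inside `__pycache__` is not imported once its source is gone, so the bytecode has to be written next to the source with `legacy=True`. The function name `compile_and_strip` is hypothetical. The .pyc files are also tied to the interpreter version that produced them, so they would have to be compiled with the same Python version as the Lambda runtime.

```python
import compileall
from pathlib import Path


def compile_and_strip(root):
    """Compile every .py below `root` to a sibling .pyc, then delete the sources.

    Returns the number of source files removed.
    """
    # legacy=True writes module.pyc beside module.py instead of into __pycache__.
    ok = compileall.compile_dir(str(root), legacy=True, quiet=1, force=True)
    if not ok:
        raise RuntimeError("some files failed to compile; sources left untouched")

    removed = 0
    for source in list(Path(root).rglob("*.py")):
        # Only drop a source file once its bytecode twin actually exists.
        if source.with_suffix(".pyc").exists():
            source.unlink()
            removed += 1
    return removed
```

Trying this on a copy of the project folder first seems safer, since the deletion is not reversible.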
I've had issues with slim_handler: true. Either my connection drops and the upload fails, or some other error occurs: at ~25% of the upload to S3 I get "Could not connect to the endpoint URL". For the purposes of this question, I'd like to get the dependencies down to manageable levels.
Nevertheless, over half a gig of dependencies with the main app being less than 2MB has to be some sort of record.
My questions are:
- What is the unzipped limit for AWS? Is it 250MB or 500MB?
- Am I on the right track with the above method for reducing package sizes?
- Is it possible to go a step further and use .pyz files?
- Are there any standard utilities out there that help with the above?
- Is there no tree shaking library for python?
Update: I ended up using the slim_handler: true option and, to get around the connectivity issues, bundled the whole thing on one of Amazon's VMs. I have not figured out how to slim down the project other than rewriting my code so that it no longer depends on the packages above.