I am working on an AWS Lambda function for an end-to-end data engineering project involving YouTube data analysis. The function is designed to read JSON data from an S3 bucket, process it, and write the results back to another S3 bucket using AWS Glue.

I have set up the environment variables, and the S3 buckets are created. However, when I test the Lambda function using the S3-put option, the execution result is not as expected. The response I receive is:

{
  "statusCode": 200,
  "body": "\"Hello from Lambda!\""
}

This is not the expected result, and I suspect there might be an issue with my Lambda function. I have verified the environment variables, IAM permissions, and the input event JSON. Could someone please review my Lambda function code and help me identify the issue?

import awswrangler as wr
import pandas as pd
import urllib.parse
import os

os_input_s3_cleansed_layer = os.environ['s3_cleansed_layer']
os_input_glue_catalog_db_name = os.environ['glue_catalog_db_name']
os_input_glue_catalog_table_name = os.environ['glue_catalog_table_name']
os_input_write_data_operation = os.environ['write_data_operation']


def lambda_handler(event, context):
    # Get the bucket name and object key from the S3 event record
    bucket = event['Records'][0]['s3']['bucket']['name']
    key = urllib.parse.unquote_plus(event['Records'][0]['s3']['object']['key'], encoding='utf-8')
    try:

        # Creating DF from content
        df_raw = wr.s3.read_json('s3://{}/{}'.format(bucket, key))

        # Extract required columns:
        df_step_1 = pd.json_normalize(df_raw['items'])

        # Write to S3
        wr_response = wr.s3.to_parquet(
            df=df_step_1,
            path=os_input_s3_cleansed_layer,
            dataset=True,
            database=os_input_glue_catalog_db_name,
            table=os_input_glue_catalog_table_name,
            mode=os_input_write_data_operation
        )

        return wr_response
    except Exception as e:
        print(e)
        print('Error getting object {} from bucket {}. Make sure they exist and your bucket is in the same region as this function.'.format(key, bucket))
        raise e

I have also updated the except block to provide more detailed error messages, but the logs show no output from the function at all:

Function Logs
START RequestId: 0589a447-4f50-4dc9-b58e-9d0c7ff1b2de Version: $LATEST
END RequestId: 0589a447-4f50-4dc9-b58e-9d0c7ff1b2de
REPORT RequestId: 0589a447-4f50-4dc9-b58e-9d0c7ff1b2de  Duration: 1.24 ms   Billed Duration: 2 ms   Memory Size: 128 MB Max Memory Used: 39 MB

  • It looks like you started with a template Lambda function and then modified it. You need to save and deploy the changes. Commented Jan 14, 2024 at 14:47
  • "write the results back to another S3 bucket using AWS Glue" - why do you need Glue when you are already using a Lambda function? It would be much better to perform all of the tasks in your Lambda function. Commented Jan 14, 2024 at 15:48
  • @jarmod It worked! Thank you so much! I'm sorry, I'm a total beginner at this stuff, which is why I was stuck on this silly mistake! Commented Jan 14, 2024 at 19:48

1 Answer

If you are using the AWS Lambda console to write/edit your Lambda function code, then you need to deploy any changes that you have made before you can run them.

When there are undeployed changes, the Deploy button will be available to press.
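
The response in the question is a strong hint that this is what happened: it matches what the stock Python handler created by the Lambda console returns. As a rough sketch, the default template looks like this (reproduced from memory, so the exact comment text may differ between console versions):

```python
import json


def lambda_handler(event, context):
    # Placeholder body generated by the console; it ignores the event entirely
    return {
        'statusCode': 200,
        'body': json.dumps('Hello from Lambda!')
    }
```

Because json.dumps wraps the string in quotes, the body shows up as "\"Hello from Lambda!\"" in the test result, exactly as in the question. The 1.24 ms duration in the logs is also consistent with the placeholder running instead of the awswrangler code.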
