
Here is my full scenario: I have a static site hosted on S3. It's just an interface for uploading a file. The upload goes through API Gateway as an endpoint, which triggers a Lambda function.

This Lambda function reads the XLSX file, processes it, and uploads the resulting JSON to an S3 bucket. The PUT event on the bucket triggers another Lambda function, which reads this JSON and sends its contents as messages to an SQS queue. Then, as soon as a message arrives, it spins up an EC2 instance for processing.

The problem is: my output is getting totally scrambled. It looks as if nothing is decoding the base64 after it is received.

Here is an example of my output:

[Screenshot: output example]

The API Gateway has been set up correctly; I'm pretty sure about that:

  1. I've set the binary media type as application/vnd.openxmlformats-officedocument.spreadsheetml.sheet
  2. Added body mapping templates

When testing by sending a file through Postman, it works without problems. Based on this, I suppose the API has been set up fine, and so has the function; it's something happening between the browser and the Lambda function / API call.

This is what my log on the lambda shows:

2024-09-03T22:13:31.337Z 2024-09-03T22:13:31.337Z a107932e-0bde-4b4c-ad8d-478625019503 INFO Processed data: [ [ { "numCpfCnpj": "Content-Disposition: form-data", "numAcordo": " name="file"" }

], null, ]....lots of binary data

This part especially:

"numCpfCnpj": "Content-Disposition: form-data",
"numAcordo": " name=\"file\"" }

caught my attention, because it looks like something is wrong with the decoding: those headers are present where there should be data.
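For context, a raw multipart/form-data body looks roughly like this (the boundary and filename below are made up for illustration), which matches the header lines that leaked into my output:

```javascript
// Hypothetical reconstruction of what event.body contains when the browser
// sends multipart/form-data without any base64 encoding (boundary is made up):
const boundary = '----WebKitFormBoundaryABC123';
const multipartBody = [
  `--${boundary}`,
  'Content-Disposition: form-data; name="file"; filename="data.xlsx"',
  'Content-Type: application/vnd.openxmlformats-officedocument.spreadsheetml.sheet',
  '',
  '<raw xlsx bytes would be here>',
  `--${boundary}--`,
].join('\r\n');

// Feeding this straight to XLSX.read keeps the framing lines in the data,
// which is exactly the kind of text showing up in the JSON output:
console.log(multipartBody.includes('Content-Disposition: form-data')); // → true
```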

This is the code I'm using to read the XLSX file:

import XLSX from 'xlsx';
import { S3Client, PutObjectCommand } from '@aws-sdk/client-s3';

const s3Client = new S3Client({ region: process.env.AWS_REGION });

const processXlsxFile = (fileContent) => {
    const workbook = XLSX.read(fileContent, { type: 'buffer' });
    const sheet = workbook.Sheets[workbook.SheetNames[0]];

    // Convert all rows to JSON, treating every cell as text to preserve leading zeros
    const data = XLSX.utils.sheet_to_json(sheet, {
        header: 1, // Output each row as an array of cell values
        raw: false // Format all cells as strings (preserves leading zeros)
    }).slice(1); // Skip the header row

    // Process the data and pad CPF/CNPJ numbers as needed
    return data.reduce((acc, row, index) => {
        // Skip empty or invalid rows
        if (!row[0] || !row[1]) return acc;

        // Determine if the number is CPF (11 digits) or CNPJ (14 digits)
        const numCpfCnpj = String(row[0]).length <= 11 
            ? String(row[0]).padStart(11, '0')  // Pad CPF to 11 digits
            : String(row[0]).padStart(14, '0'); // Pad CNPJ to 14 digits

        const groupIndex = Math.floor(index / 2);
        if (!acc[groupIndex]) acc[groupIndex] = [];
        acc[groupIndex].push({
            numCpfCnpj,
            numAcordo: String(row[1])
        });
        return acc;
    }, []); 
};

export const handler = async (event) => {
    try {
        console.log("Received event:", JSON.stringify(event, null, 2));
        
        if (event.body) {
            // Decode the base64-encoded file content
            const fileContent = Buffer.from(event.body, 'base64');
            console.log("File content received");

            // Process the Excel file content
            const processedData = processXlsxFile(fileContent);
            console.log("Processed data:", JSON.stringify(processedData, null, 2));

            // Flatten the processed data to a single array
            const flatData = processedData.flat();
            console.log("Flattened data:", JSON.stringify(flatData, null, 2));

            // Convert the result to a JSON string
            const jsonString = JSON.stringify(flatData);
            const jsonFileName = `converted_data_${Date.now()}.json`;
            const bucketName = process.env.BUCKET_NAME;

            // Upload the JSON to S3
            const putObjectParams = {
                Bucket: bucketName,
                Key: jsonFileName,
                Body: jsonString,
                ContentType: 'application/json'
            };

            const command = new PutObjectCommand(putObjectParams);
            await s3Client.send(command);

            console.log(`JSON file uploaded successfully to S3: ${jsonFileName}`);

            return {
                statusCode: 200,
                body: JSON.stringify({ message: 'File processed and JSON uploaded to S3 successfully.' }),
                headers: {
                    'Content-Type': 'application/json'
                }
            };
        } else {
            return {
                statusCode: 400,
                body: JSON.stringify({ message: 'No file uploaded' }),
                headers: {
                    'Content-Type': 'application/json'
                }
            };
        }
    } catch (error) {
        console.error("Error processing file:", error);
        return {
            statusCode: 500,
            body: JSON.stringify({
                message: 'Internal Server Error',
                error: error.message
            }),
            headers: {
                'Content-Type': 'application/json'
            }
        };
    }
};
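For debugging, one thing worth logging before the decode is whether API Gateway actually flagged the body as base64, and which Content-Type arrived. This is a hedged sketch; describeBody is a made-up helper, not part of the code above:

```javascript
// Hypothetical diagnostic helper: report whether API Gateway marked the body
// as base64-encoded and which Content-Type the client actually sent.
function describeBody(event) {
  const headers = event.headers || {};
  return {
    isBase64Encoded: Boolean(event.isBase64Encoded),
    contentType: headers['content-type'] || headers['Content-Type'] || 'unknown',
  };
}

// Example with a multipart request that API Gateway did not base64-encode:
console.log(describeBody({
  isBase64Encoded: false,
  headers: { 'Content-Type': 'multipart/form-data; boundary=XYZ' },
}));
// → { isBase64Encoded: false, contentType: 'multipart/form-data; boundary=XYZ' }
```

If isBase64Encoded is false, Buffer.from(event.body, 'base64') is decoding something that was never base64 in the first place.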

Any input is appreciated.

Also, if anyone has any questions, I'll be glad to reply as well.

Edit: HTML Code

<!DOCTYPE html>
<html lang="en">
<head>
    <meta charset="UTF-8">
    <meta name="viewport" content="width=device-width, initial-scale=1.0">
    <title>File Processor</title>
    <link rel="stylesheet" href="styles/styles.css"> <!-- Link to external CSS file -->
</head>
<body>
    <div class="container">
        <h1>Process XLSX File</h1>
        <input type="text" id="txtInput" readonly placeholder="No file chosen">
        <input type="file" id="fileInput" style="display:none" accept=".xlsx">
        <button id="btnBrowse">Procurar arquivo</button>
        <button id="btnProcess">Processar</button>
        <div id="results"></div>
    </div>
    <script src="scripts/main.js"></script> <!-- Link to external JavaScript file -->
</body>
</html>

Js Code:

document.addEventListener('DOMContentLoaded', () => {
    const fileInput = document.getElementById('fileInput');
    const txtInput = document.getElementById('txtInput');
    const btnBrowse = document.getElementById('btnBrowse');
    const btnProcess = document.getElementById('btnProcess');
    const resultsDiv = document.getElementById('results');

    btnBrowse.addEventListener('click', () => {
        fileInput.click();
    });

    fileInput.addEventListener('change', () => {
        txtInput.value = fileInput.files[0] ? fileInput.files[0].name : 'No file chosen';
    });

    btnProcess.addEventListener('click', async () => {
        if (!fileInput.files.length) {
            alert('Please select a file first.');
            return;
        }

        const formData = new FormData();
        formData.append('file', fileInput.files[0]);

        try {
            const response = await fetch('https://5lvbhojaaf.execute-api.sa-east-1.amazonaws.com/conversion/readFile', {
                method: 'POST',
                body: formData
            });

            if (!response.ok) {
                throw new Error(`HTTP error! status: ${response.status}`);
            }

            const result = await response.json();

            // Format JSON output for better readability
            resultsDiv.innerHTML = `<pre>${JSON.stringify(result, null, 2)}</pre>`;
        } catch (error) {
            console.error('Error processing file:', error);
            resultsDiv.innerHTML = '<p>Error processing file. Check the console for details.</p>';
        }
    });
});

Postman endpoint: https://5lvbhojaaf.execute-api.sa-east-1.amazonaws.com/conversion/readFile

The weird characters appear after the function execution. In the CloudWatch logs I can see the data being received as base64, and after the function execution it becomes scrambled.

  • #1 Share the HTML/JS code in the client that is in charge of uploading the file and passing it to the backend endpoint. #2 Share the Postman or curl command of a successful attempt. #3 Do the martian chars in your screenshot appear before, during, or after the method processXlsxFile? Commented Sep 4, 2024 at 4:46
  • I know that when doing it through the browser, an error shows up in the console. But I have been ignoring it, since the data is received by the function. Also, the weird characters appear after the function execution. In the CloudWatch logs I can see the data being received as base64, and after the function execution it becomes scrambled. I added the HTML and JS code. Endpoint: 5lvbhojaaf.execute-api.sa-east-1.amazonaws.com/conversion/… Commented Sep 4, 2024 at 12:18
  • Change the question title to something like "Failed to upload file to AWS Lambda" and remove the S3 part. Isolate the error. Commented Sep 4, 2024 at 13:33

1 Answer


Summary

  • Your client (HTML/JS) is not encoding the file to base64 before the upload
  • Your Lambda is expecting a base64 body

Advice

  • Try it locally and debug (or print) line by line to verify that your file is being sent the way the backend expects
  • Fixing this directly on AWS will be a pain, so get it working locally before deploying
  • If you want to avoid multipart/form-data, convert the file to base64 at the client layer (HTML/JS) and send the content as JSON; that way it is easy to get the content in the Lambda handler
    • Warning: if your files tend to grow, converting to base64 becomes a problem (RAM, bandwidth, browser limits, etc.). That is why the multipart/form-data content type exists

#1 Base64

At the client (HTML/JS) you are not encoding the file as base64, so at the Lambda layer, this doesn't make sense:

const fileContent = Buffer.from(event.body, 'base64');

With Postman you may have been sending the file as base64, but in your HTML/JS you are not converting the file to base64; you are sending it as Content-Type: multipart/form-data.
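A sketch of the client-side base64 approach (hedged: the JSON field names and body shape are assumptions for illustration, and the Lambda would have to read event.body accordingly):

```javascript
// Strip the "data:<mime>;base64," prefix that FileReader.readAsDataURL adds.
function dataUrlToBase64(dataUrl) {
  return dataUrl.slice(dataUrl.indexOf(',') + 1);
}

// Browser-only sketch: read the chosen file as base64 and POST it as JSON
// instead of multipart/form-data (field names are made up for illustration).
async function uploadAsBase64(file, endpoint) {
  const base64 = await new Promise((resolve, reject) => {
    const reader = new FileReader();
    reader.onload = () => resolve(dataUrlToBase64(reader.result));
    reader.onerror = reject;
    reader.readAsDataURL(file);
  });
  return fetch(endpoint, {
    method: 'POST',
    headers: { 'Content-Type': 'application/json' },
    body: JSON.stringify({ filename: file.name, content: base64 }),
  });
}
```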

#2 Multipart Form Data

(1) Getting the content out of multipart/form-data is not easy compared to when the content type is Content-Type: application/json

(2) Usually the file name and file content (binary) are sent

(3) Usually this content type has sections: size, disposition, content, etc.


Almost nobody deals with this directly (in Node.js); that's why libraries in every language exist to help the developer. For example, the multer library makes our lives easy:

const multer = require('multer');
const upload = multer(); // in-memory storage: req.file.buffer holds the bytes

app.post('/upload', upload.single('file'), function(req, res) {
  const title = req.body.title;
  const file = req.file;
  // file is ready to use (pdf, xls, zip, images, etc.)
});


#3 Multipart Form Data with Aws Lambda

event.body doesn't have the file ready to use. That is another of your errors.

A quick search didn't turn up any "easy" ways to deal with multipart/form-data in AWS Lambdas.

Reviewing some libraries, I found the magic that gets a file out of the Lambda event variable:

module.exports.parse = (event, spotText) => {
    const boundary = getBoundary(event);
    const result = {};
    event.body
        .split(boundary)
        .forEach(item => {
            if (/filename=".+"/g.test(item)) {
                result[item.match(/name=".+";/g)[0].slice(6, -2)] = {
                    type: 'file',
                    filename: item.match(/filename=".+"/g)[0].slice(10, -1),
                    contentType: item.match(/Content-Type:\s.+/g)[0].slice(14),
                    content: spotText? Buffer.from(item.slice(item.search(/Content-Type:\s.+/g) + item.match(/Content-Type:\s.+/g)[0].length + 4, -4), 'binary'):
                        item.slice(item.search(/Content-Type:\s.+/g) + item.match(/Content-Type:\s.+/g)[0].length + 4, -4),
                };
            } else if (/name=".+"/g.test(item)){
                result[item.match(/name=".+"/g)[0].slice(6, -1)] = item.slice(item.search(/name=".+"/g) + item.match(/name=".+"/g)[0].length + 4, -4);
            }
        });
    return result;
};
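The getBoundary helper is not shown in the snippet above; here is a minimal sketch of what it does, assuming API Gateway passes the request headers through on the event (the sample event and boundary value are made up):

```javascript
// Simplified sketch: the boundary comes from the Content-Type request header,
// e.g. "multipart/form-data; boundary=XYZ" (header casing varies by client).
function getBoundary(event) {
  const contentType =
    event.headers['content-type'] || event.headers['Content-Type'] || '';
  return contentType.split('boundary=')[1];
}

const sampleEvent = {
  headers: { 'Content-Type': 'multipart/form-data; boundary=XYZ' },
  body: '--XYZ\r\nContent-Disposition: form-data; name="file"; filename="a.xlsx"\r\n\r\nDATA\r\n--XYZ--\r\n',
};

console.log(getBoundary(sampleEvent)); // → XYZ
// Splitting the body on the boundary yields the pieces the parser walks over:
console.log(sampleEvent.body.split(getBoundary(sampleEvent)).length); // → 3
```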

This is already implemented in these libraries:

Choose one and try it. I'm sure the file will then be ready to be used in your call:

const workbook = XLSX.read(file, { type: 'buffer' });

1 Comment

I've marked it as the answer, because it is. For now I just changed the client to convert the data to base64. Tested with a few spreadsheet sizes; it worked well. I am not going to have any spreadsheet larger than 1 MB, so it's fine. However, I still want to implement the other solution; I just need it working before trying to improve it. Also, I understood everything you told me about what was happening, and I completely agree. I plan to try aws-lambda-multipart-parser in the future; I think it will be a proper solution.
