
I need to parse some CSV data and store it in the database. The input file is about 15000 rows. It is written in the Laravel framework. The request to the DB takes about 0.2s, so it seems the problem is in the CSV parser. Can somebody tell me how to optimize this code in PHP? The code looks like this:

protected function csv2array(string $csv)
{
    try {
        $return = [
            'headers' => [],
            'rows' => [],
        ];
        $rows = explode(PHP_EOL, $csv);

        $headers = str_getcsv(strtolower(array_shift($rows)));  // Headers + strtolower()
        $return['headers'] = $headers;

        foreach ($rows as $row) {
            $items = str_getcsv($row);
            if ( count($items) !== count($headers) ) continue;

            $items[2] = new Carbon($items[2]);  // Third item is UTC datetime
            $items[3] = new Carbon($items[3]);  // Fourth item is UTC datetime

            $items = array_combine($headers, $items);

            $return['rows'][] = $items;
        }

        return $return;
    } catch (Exception $e) {
        Log::error($e->getMessage());
        throw $e;
    }
}

The parent code that calls csv2array() looks like this:

$csv = $request->getContent();
$csv = trim($csv);
$csvArray = $this->csv2array($csv);

$insertArray = $this->addDeviceIdToArray($csvArray['rows'], $device);  // This is fast 0.2s
  • Do you also know what part of csvToArray is slow? That would be your next check. Is one specific thing slow, or is it just the amount of rows to process? Commented Jan 16, 2024 at 12:12
  • $rows = explode(PHP_EOL, $csv); will make it keep the complete file content, in array form, in memory. Using fgetcsv, which reads line by line from a file, might perform better. Looks like you don't have a file here, but get the CSV content posted to your app directly? Then you could perhaps either use php://input to access it as a readable stream (depending on what exactly you are sending), or write the data to a temporary file first. Commented Jan 16, 2024 at 12:13
  • @CBroe yes, it is POST input. So you think the explode is the problem. I will try to save it to a file and read it via fgetcsv. Commented Jan 16, 2024 at 12:21
  • @CBroe you are alluding to the memory usage, while the question is evidently about CPU. Though the OP failed to explain the actual problem: what time the whole parsing takes and why it's a concern. Which makes this question ultimately off topic. Commented Jan 16, 2024 at 12:30
  • It seems the most time-consuming part is new Carbon. But still, for me it's the same 0.2s for the whole 15k lines. The problem is elsewhere. Commented Jan 16, 2024 at 13:02
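As the first comment suggests, the next step is to find out which stage is actually slow. A minimal timing sketch (the sample CSV and variable names are illustrative, not from the original code) could look like this:

```php
<?php
// Rough per-stage timing to locate the bottleneck (illustrative sketch).
// Builds a synthetic 15000-row CSV similar in shape to the one described.
$csv = "id,name,created,updated\n"
     . str_repeat("1,x,2024-01-01,2024-01-02\n", 15000);

$t = microtime(true);
$rows = explode(PHP_EOL, trim($csv));
printf("explode:    %.4fs (%d lines)\n", microtime(true) - $t, count($rows));

$t = microtime(true);
foreach ($rows as $row) {
    str_getcsv($row);
}
printf("str_getcsv: %.4fs\n", microtime(true) - $t);
```

Timing the Carbon constructor calls the same way would show whether date parsing, rather than CSV splitting, dominates.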

1 Answer


My guess would be that your current setup loads everything in one go. You have too many rows for that to be an effective route. You went with the 'easy, fast-to-write' solution, and those have limitations.

IMO the most likely issue is the amount of data. You load 15000 lines into $csv and then copy all of it into a huge array. PHP isn't particularly efficient with large in-memory arrays like that.

My suggestion is to use fgetcsv:

protected function csvToArray(): array
{
    $return = [
        'headers' => [],
        'rows' => [],
    ];

    $handle = fopen("test.csv", "r");

    $headers = fgetcsv($handle);            // First line holds the headers
    $return['headers'] = $headers;

    while (($items = fgetcsv($handle)) !== false) {
        if (count($items) !== count($headers)) continue;  // Skip malformed rows

        $items[2] = new Carbon($items[2]);  // Third item is UTC datetime
        $items[3] = new Carbon($items[3]);  // Fourth item is UTC datetime

        $return['rows'][] = array_combine($headers, $items);
    }
    fclose($handle);

    return $return;
}

file_put_contents("test.csv", $request->getContent()); // write to file

$csvArray = $this->csvToArray();

That way you can stream from a file, which is a lot less memory intensive and doesn't scale memory usage with file size.
(Please note that this is demo code; there is room for improvement.)
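If writing a temporary file is undesirable, the same fgetcsv() streaming approach can be applied directly to the posted body via a php://temp stream, as hinted at in the comments. A minimal sketch (the function name parseCsvStream and the sample CSV are illustrative; the Carbon conversion is omitted to keep it self-contained):

```php
<?php
// Sketch: parse a CSV string with fgetcsv() via a php://temp stream,
// avoiding a full explode() copy of every line. php://temp keeps small
// payloads in memory and transparently spills larger ones to a temp file.
function parseCsvStream(string $csv): array
{
    $handle = fopen('php://temp', 'r+');
    fwrite($handle, $csv);
    rewind($handle);

    $headers = array_map('strtolower', fgetcsv($handle)); // First line = headers
    $rows = [];

    while (($items = fgetcsv($handle)) !== false) {
        if (count($items) !== count($headers)) {
            continue; // Skip malformed rows
        }
        $rows[] = array_combine($headers, $items);
    }
    fclose($handle);

    return ['headers' => $headers, 'rows' => $rows];
}

$result = parseCsvStream("ID,Name\n1,Alpha\n2,Beta");
echo $result['rows'][0]['name']; // prints "Alpha"
```

In the Laravel controller this would be fed from $request->getContent(); the Carbon conversions from the original code can be re-added inside the loop.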


2 Comments

Guesswork and conjecture are not good answers for Stack Overflow. Your suggestion won't improve the overall performance of this code.
Why do you think fgetcsv() would not improve the performance?
