
I'm trying to speed up this script for importing a CSV file into MySQL. Loading 1,000 rows takes 130 seconds. When I tried 30,000 rows, the script timed out after 20 minutes, having loaded only 8,681 rows.

The CSV looks like this (there can be any number of columns, in any order):

email;param1;..;paramX
[email protected];something;..;value 

MySQL CREATE statement for "mail_queue":

CREATE TABLE IF NOT EXISTS `mail_queue` (
  `mail_queue_id` INT NOT NULL AUTO_INCREMENT,
  `mailer_batch_id` INT NOT NULL,
  `to` VARCHAR(100) NOT NULL,
  `priority` INT NOT NULL DEFAULT 0,
  `created` DATETIME NOT NULL DEFAULT NOW(),
  `mail_status_id` INT NOT NULL,
  PRIMARY KEY (`mail_queue_id`),
  INDEX `fk_mail_queue_mailer_batch1_idx` (`mailer_batch_id` ASC),
  INDEX `fk_mail_queue_mail_status1_idx` (`mail_status_id` ASC),
  CONSTRAINT `fk_mail_queue_mailer_batch1`
    FOREIGN KEY (`mailer_batch_id`)
    REFERENCES `mailer_batch` (`mailer_batch_id`)
    ON DELETE CASCADE
    ON UPDATE NO ACTION,
  CONSTRAINT `fk_mail_queue_mail_status1`
    FOREIGN KEY (`mail_status_id`)
    REFERENCES `mail_status` (`mail_status_id`)
    ON DELETE NO ACTION
    ON UPDATE NO ACTION)
ENGINE = InnoDB;

MySQL CREATE statement for "mail_param":

CREATE TABLE IF NOT EXISTS `mail_param` (
  `mail_param_id` INT NOT NULL AUTO_INCREMENT,
  `mail_queue_id` INT NOT NULL,
  `param_key` VARCHAR(45) NOT NULL,
  `param_value` VARCHAR(45) NOT NULL,
  PRIMARY KEY (`mail_param_id`),
  INDEX `fk_mail_param_mail_queue1_idx` (`mail_queue_id` ASC),
  CONSTRAINT `fk_mail_param_mail_queue1`
    FOREIGN KEY (`mail_queue_id`)
    REFERENCES `mail_queue` (`mail_queue_id`)
    ON DELETE CASCADE
    ON UPDATE NO ACTION)
ENGINE = InnoDB;

Code (Zend Framework). It works, but it is slow:

    if (($handle = fopen($this->filepath, 'r')) !== false)
    {
        // DB
        $mailQueueTable = new Application_Model_DbTable_MailQueue();
        $mailParamTable = new Application_Model_DbTable_MailParam();

        // Get header
        $header = \ForceUTF8\Encoding::toUTF8(fgetcsv($handle, 0, ';'));

        while(($data = fgetcsv($handle, 0, ';')) !== false)
        {
            // Save e-mail to e-mail queue 
            $mailQueueRow = $mailQueueTable->createRow();
            $mailQueueRow->mailer_batch_id = $mailerBatchId;
            $mailQueueRow->to = $data[$this->emailColumn];
            $mailQueueRow->priority = 0;
            $mailQueueRow->created = $created->toString('yyyy-MM-dd HH:mm:ss');
            $mailQueueRow->mail_status_id = 1;
            $mailQueueId = $mailQueueRow->save();

            // Save e-mail params
            foreach ($data as $key => $value) {
                $mailParamRow = $mailParamTable->createRow();
                $mailParamRow->mail_queue_id = $mailQueueId;
                $mailParamRow->param_key = $header[$key];
                $mailParamRow->param_value = \ForceUTF8\Encoding::toUTF8($value);
                $mailParamRow->save();
            }

            unset($data);
        }
        fclose($handle);

    }
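
A side note before the LOAD DATA approach: every save() above runs as its own autocommitted INSERT, so each CSV row costs multiple commits. Wrapping the whole loop in a single transaction usually helps a lot on InnoDB. A minimal sketch, assuming both tables use the same default adapter:

    $db = $mailQueueTable->getAdapter();
    $db->beginTransaction();
    try {
        while (($data = fgetcsv($handle, 0, ';')) !== false) {
            // ...same createRow()/save() calls per row as above...
        }
        $db->commit();
    } catch (Exception $e) {
        // Undo the partial import if anything fails mid-file
        $db->rollBack();
        throw $e;
    }

This alone typically cuts the per-row overhead considerably, though a bulk load is still faster.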

I experimented with LOAD DATA INFILE, but I couldn't use it end to end because of the mail_param table structure. This is how far I got:

1) Create temporary table (OK)

    $columns = "";
    foreach ($this->header as $item) {
        if ($columns == "") {
            $columns = "`" . $item . "` VARCHAR(45)";
        } else {
            $columns .= ", `" . $item . "` VARCHAR(45)";
        }
    }

    // An AUTO_INCREMENT column must be part of a key, hence PRIMARY KEY
    $query = 'CREATE TEMPORARY TABLE `tmp_csv_import` (
    `id` INT AUTO_INCREMENT PRIMARY KEY,
    ' . $columns . '
    ) ENGINE MyISAM;';
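
To make the result concrete: for the sample header above, the generated statement would come out roughly like this (column names taken from the example header):

    CREATE TEMPORARY TABLE `tmp_csv_import` (
    `id` INT AUTO_INCREMENT PRIMARY KEY,
    `email` VARCHAR(45), `param1` VARCHAR(45), `paramX` VARCHAR(45)
    ) ENGINE MyISAM;

One caveat: the LOAD DATA in step 2 has no column list, so CSV fields are matched to table columns positionally, starting with `id`; adding an explicit column list in step 2 keeps the extra `id` column from shifting the mapping.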

2) LOAD DATA INFILE (OK)

    $query = "LOAD DATA INFILE '" . $this->filepath . "'
    INTO TABLE `tmp_csv_import`
    FIELDS TERMINATED BY ';'
    ENCLOSED BY '\"'
    LINES TERMINATED BY '\n'
    IGNORE 1 ROWS;";
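
A hedged side note on this step: without LOCAL, LOAD DATA INFILE reads the file on the MySQL server host and requires the FILE privilege (and a path allowed by secure_file_priv). If the CSV lives on the web server instead, LOAD DATA LOCAL INFILE is needed, and with a PDO-based adapter the client option has to be enabled when the connection is created. A sketch with made-up connection parameters:

    // Assumed credentials; local-infile must also be allowed server-side
    $db = new Zend_Db_Adapter_Pdo_Mysql(array(
        'host'           => 'localhost',
        'username'       => 'user',
        'password'       => 'secret',
        'dbname'         => 'mydb',
        'driver_options' => array(PDO::MYSQL_ATTR_LOCAL_INFILE => true),
    ));
    $db->query($query);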

3) Insert into mail_queue (OK)

    // $mailerBatchId from last_insert_id()
    $query = "INSERT INTO `mail_queue` (`mailer_batch_id`, `to`, `priority`, `created`, `mail_status_id`)
    SELECT " . $mailerBatchId . ", `email`, 0, NOW(), 1 FROM `tmp_csv_import`";

4) Insert into mail_param (???)

I don't know what to write here. I need to insert a new row into mail_param for each column of tmp_csv_import, and I need the matching mail_queue_id, the foreign key for mail_param.

$query = "INSERT INTO mail_param (mail_queue_id, param_key, param_value) 
SELECT ??? FROM `tmp_csv_import`";

Is it possible to do this in MySQL, or should I take a different approach?
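
For reference, one way to express step 4 in plain SQL is to unpivot with one SELECT per header column, joining tmp_csv_import back to mail_queue on the address. This is only a sketch: it assumes addresses are unique within a batch, 123 stands in for $mailerBatchId, and since the original loop stores the e-mail column as a parameter too, it gets its own branch:

    INSERT INTO `mail_param` (`mail_queue_id`, `param_key`, `param_value`)
    SELECT q.`mail_queue_id`, 'email', t.`email`
    FROM `tmp_csv_import` t
    JOIN `mail_queue` q ON q.`to` = t.`email` AND q.`mailer_batch_id` = 123
    UNION ALL
    SELECT q.`mail_queue_id`, 'param1', t.`param1`
    FROM `tmp_csv_import` t
    JOIN `mail_queue` q ON q.`to` = t.`email` AND q.`mailer_batch_id` = 123;

The UNION ALL branches could be generated in PHP from $this->header, the same way the column list is built in step 1.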

  • I didn't understand where $mailerBatchId comes from, even with your comment // $mailerBatchId from last_insert_id(). Wouldn't it be easier to use the id from tmp_csv_import as mailer_batch_id and mail_queue_id? Would that be correct for your task? Commented Aug 18, 2014 at 14:14

1 Answer

I found a solution to my problem. There is no need for a temporary table.

1) Data is loaded into "mail_queue" with this code:

$query = "LOAD DATA INFILE '" . $this->filepath . "'
INTO TABLE `mail_queue`
FIELDS TERMINATED BY ';'
ENCLOSED BY '\"'
LINES TERMINATED BY '\n'
IGNORE 1 ROWS
(@dummy, @dummy, email, @dummy)
SET `mailer_batch_id` = " . (int) $mailerBatchId . ",
    `priority` = 0,
    `created` = NOW(),
    `mail_status_id` = 1;";

Only one CSV column (the e-mail address) is imported directly; this mapping line is generated from the header:

    (@dummy, @dummy, `to`, @dummy)

The remaining columns are filled in by the SET clause.

2) Then I select the "mail_queue_id" values of the rows just inserted and build an array in this form:

    array('to' => 'mail_queue_id');
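
A minimal sketch of that lookup, assuming addresses are unique within the batch (Zend_Db's fetchPairs keys the result set by the first selected column):

    $db = Zend_Db_Table::getDefaultAdapter();
    // Map each address to its freshly generated mail_queue_id
    $queueIds = $db->fetchPairs(
        'SELECT `to`, `mail_queue_id` FROM `mail_queue` WHERE `mailer_batch_id` = ?',
        array((int) $mailerBatchId)
    );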

3) Then I create a temporary CSV file from the source file, with this structure:

    mail_queue_id;key;value
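
A sketch of building that file, assuming the $queueIds map from step 2 and the $this->emailColumn index from the question; fputcsv's third argument sets the ';' delimiter, and its quoting matches the ENCLOSED BY '\"' clause below:

    $in  = fopen($this->filepath, 'r');
    $out = fopen($tmpFilepath, 'w');
    $header = fgetcsv($in, 0, ';'); // the header row supplies the param keys
    while (($data = fgetcsv($in, 0, ';')) !== false) {
        $mailQueueId = $queueIds[$data[$this->emailColumn]];
        foreach ($data as $key => $value) {
            // one "mail_queue_id;key;value" line per source column
            fputcsv($out, array($mailQueueId, $header[$key], $value), ';');
        }
    }
    fclose($in);
    fclose($out);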

4) The data is loaded into the "mail_param" table:

$query = "LOAD DATA INFILE " . $tmpFilepath . "
    INTO TABLE `mail_param`
    FIELDS TERMINATED BY ';'
    ENCLOSED BY '\"'
    LINES TERMINATED BY '\n'
    (`mail_queue_id`, `param_key`, `param_value`);";

5) Remove the temporary file:

    unlink($tmpFilepath);

6) Done. I tried loading a CSV with 30,000 rows and the import is significantly faster (under 1 second).
