7

I have an array containing the personal info (ID, name, email, etc.) of 100,000 users. I need to loop through each row of the array and insert a MySQL record into a table based on the row data. My problem is that I am running out of memory after about 70,000 rows.

My code:

if (!empty($users)) {
    $c = 0;
    foreach ($users as $user) {

        $message = // Some code to create custom email
        queue_mail_to_send($user->user_email, $subject, $message, $db_options, $mail_options, $mail_queue);
    }
}

Background:

I am building an email system which sends out an email to the users of my site. The code above loops through the array of users and executes the function 'queue_mail_to_send', which inserts a MySQL row into an email queue table. (I am using a PEAR library to stagger the email sending.)

Question:

I know that I am simply exhausting the memory here by trying to do too much in one execution. So does anybody know a better approach than trying to execute everything in one big loop?

Thanks

  • 1
    Sure. Execute everything in many smaller loops, say up to 1K users each time. Commented Apr 15, 2014 at 9:58
  • 1
    If you are loading the user details into the $users array/object, then why not do it directly in one SQL statement, like INSERT INTO table_name (COL1, COL2, ...) SELECT COL1, COL2 FROM other_table;? Commented Apr 15, 2014 at 10:02
  • As @Jon suggested, use smaller loops, e.g. LIMIT 0,1000 on the first pass, then store the number 1000 in a temp table, then on the next pass use LIMIT 1000,2000, and so on!! Commented Apr 15, 2014 at 10:02
  • I would have to create some kind of trigger system that would execute a new PHP script once the first 1000 had been processed, and send it some header vars to track where it's up to, when it will be finished, etc. Would this be the easiest/correct way to do it? Commented Apr 15, 2014 at 10:04
  • I can see elastic work allocation being useful here. Commented Apr 15, 2014 at 10:05

5 Answers

3

I think reducing the payload of the script will be cumbersome and will not give you a satisfying result. If you have any possibility to do so, I would advise you to log which rows you have already processed, and have a script run the next x rows. If you can use a cronjob, you can stage the mailing and let the cronjob add mails to the queue every 5 minutes, until all users are processed.

The easiest way would be to store, somewhere, the highest user ID you have processed. I would not advise you to store the number of users processed, because between batches a user can be added or removed, resulting in users not receiving the e-mail. But if you order by user ID (assuming you use an auto-incrementing column for the ID!), you can be sure every user gets processed.

So your user query would be something like:

SELECT * FROM users WHERE user_id > [highest_processed_user_id] ORDER BY user_id LIMIT 1000

Then process your loop, and store the last user id:

if(!empty($users)) {
    $last_processed_id = null;
    foreach($users as $user) {
        $message = // Message creation magic
        queue_mail_to_send( /** parameters **/ );
        $last_processed_id = $user->id;
    }

    // batch done! store processed user id
    $query = 'UPDATE mail_table SET last_processed_user_id = '. $last_processed_id; // please use parameterized statements here
    // execute the query
}

And on the next execution, do it again until all users have received the mail.
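
Put together, a single cron run could look roughly like the sketch below. It is only a sketch: it assumes a PDO connection in $db, the mail_table / last_processed_user_id bookkeeping described above, and the queue_mail_to_send() call (with $subject, $db_options, $mail_options, $mail_queue) from the question; the message-building line is just a placeholder.

// one cron run: pick up where the last run stopped, process up to 1000 users, save progress
$lastId = (int) $db->query('SELECT last_processed_user_id FROM mail_table')->fetchColumn();

$stmt = $db->prepare('SELECT * FROM users WHERE user_id > :last_id ORDER BY user_id LIMIT 1000');
$stmt->execute(array(':last_id' => $lastId));
$users = $stmt->fetchAll(PDO::FETCH_OBJ);

if (!empty($users)) {
    foreach ($users as $user) {
        $message = 'Hello ' . $user->name; // placeholder for the custom message building
        queue_mail_to_send($user->user_email, $subject, $message, $db_options, $mail_options, $mail_queue);
        $lastId = $user->user_id;
    }

    // remember how far we got so the next run continues after this user
    $update = $db->prepare('UPDATE mail_table SET last_processed_user_id = :last_id');
    $update->execute(array(':last_id' => $lastId));
}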


1 Comment

I thought there may be some method of garbage collection within the main loop that would enable me to do it all in one script, but it seems not and that this is the most solid approach. Thanks for taking the time to write this.
3

I had exactly the same problem. Anyway, the answer from @giorgio is the best solution.

But like Java or Python, we have "yield" in PHP (generators); see http://php.net/manual/en/language.generators.syntax.php

Here is my sample code; my case was 50,000 records, and I also tested it successfully with 370,000 records, but it takes time.

// yield has to live inside a function, so the loop is wrapped in a generator method
public static function findAllLazy()
{
    $items = CustomerService::findAll();
    foreach ($items as $item) {
        yield (new self())->loadFromResource($item);
    }
}
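
Consuming the generator is then just a normal foreach; a minimal sketch, assuming the generator method above sits on a hypothetical Customer class and reusing the queueing call from the question:

// iterate lazily: only one hydrated object exists at a time, instead of a 100,000-element array
foreach (Customer::findAllLazy() as $customer) {
    $message = 'Hello ' . $customer->name; // illustrative message building
    queue_mail_to_send($customer->user_email, $subject, $message, $db_options, $mail_options, $mail_queue);
}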

1 Comment

This should be the accepted answer.
0

You may split that operation into multiple operations, separated in time. For instance, only allow your routine to process 40 emails per minute, or use an array of arrays to create "pages" of records (use the SQL LIMIT clause). Set each page array to null and unset() it when you no longer need that information.
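
A rough sketch of that paging idea, assuming a PDO connection in $db and a users table (table and column names are illustrative):

// fetch users one "page" at a time rather than all 100,000 at once
$pageSize = 1000;
$offset   = 0;

do {
    $sql  = sprintf('SELECT * FROM users ORDER BY user_id LIMIT %d, %d', $offset, $pageSize);
    $page = $db->query($sql)->fetchAll(PDO::FETCH_OBJ);

    foreach ($page as $user) {
        // build the message and queue it, as in the original loop
    }

    $count   = count($page);
    $offset += $pageSize;

    // release the page before fetching the next one
    $page = null;
    unset($page);
} while ($count === $pageSize);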


-1

I think you can use a MySQL IN clause rather than doing a foreach for every user.

Like $user_ids = array(1, 2, 3, 4); // Do something WHERE user_id IN ($user_ids);

And for sending mails you can use the PHPMailer class, supplying comma-separated email addresses in $to.
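
If you do go the IN() route, the clause can be built with placeholders; a rough sketch, assuming a PDO connection in $db (table and column names are illustrative):

$user_ids = array(1, 2, 3, 4);

// one ? placeholder per id, e.g. "?,?,?,?"
$placeholders = implode(',', array_fill(0, count($user_ids), '?'));

$stmt = $db->prepare("SELECT user_email FROM users WHERE user_id IN ($placeholders)");
$stmt->execute($user_ids);
$emails = $stmt->fetchAll(PDO::FETCH_COLUMN);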

1 Comment

Thanks for the comment but I doubt this would work - as I'd still have to do the loop to build the custom $message and add the custom row with queue_mail_to_send :)
-1

Use just one query, like:

INSERT INTO table_name (COL1, Col2,...) SELECT COL1, COL2 FROM other_table;

6 Comments

How is this going to help him reduce the memory usage?
Again, I doubt this would work as I am inserting unique data into each row based on the retrieved row I am processing.
It will process unique data; check the MySQL documentation. And @giorgio, this will reduce the memory issue because everything will be done within MySQL, not in PHP.
Yes, but he needs unique data, based on calculations on rows retrieved from the user table, so MySQL alone doesn't suffice...
@giorgio Well, I asked him before, and yes, he can filter the user data within the SQL statement there. If I am missing something or it is not possible like that, I would like to know :)
