I have a PHP script that I use to make about 1 million requests every day to a specific web service.

The problem is that in a "normal" workflow the script runs almost the whole day to complete the job. I have therefore worked on an additional component: a script that calls the main script with a multi-curl GET request to generate a random tempid for each batch of 500 records, and then makes another multi-curl POST request with all the generated tempids. However, I don't feel this is the right way, so I would like some advice or solutions for adding multithreading capabilities to the main script without using additional or external applications (e.g. the curl script that I'm currently using). Here is the main script: http://pastebin.com/rUQ6pwGS

2 Answers

1

If you want to do it right, you should install a message queue. My preference is Redis, because it is a "data structure server since keys can contain strings, hashes, lists, sets and sorted sets". Redis is also extremely fast.

Use BLPOP to listen for new messages (work) and RPUSH to push new messages onto the queue. To process work concurrently, spawn a couple of worker processes with php <yourscript>. Spawning processes is relatively expensive, but with a message queue it only has to be done once, when each worker is started.
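As a rough illustration of the BLPOP/RPUSH pattern described above, here is a minimal sketch. It assumes `$redis` is an already connected phpredis client; the queue name `jobs`, the JSON encoding of jobs, and the function names `enqueueJob` and `runWorker` are hypothetical choices for this example, not part of the original script.

```php
<?php
// Sketch only: $redis is assumed to be a connected phpredis client
// (class Redis). Queue name and JSON job format are illustrative.

/** Producer: append one job to the tail of the list with RPUSH. */
function enqueueJob($redis, string $queue, array $job): void
{
    $redis->rPush($queue, json_encode($job));
}

/**
 * Worker: block on BLPOP until a job arrives, handle it, repeat.
 * Stops after $maxJobs jobs, or when the queue stays empty for
 * $timeout seconds. Returns the number of jobs handled.
 */
function runWorker($redis, string $queue, callable $handler, int $maxJobs = 1000, int $timeout = 5): int
{
    $processed = 0;
    while ($processed < $maxJobs) {
        // blPop yields [key, value]; an empty result means the wait timed out.
        $item = $redis->blPop([$queue], $timeout);
        if (empty($item)) {
            break;
        }
        $handler(json_decode($item[1], true));
        $processed++;
    }
    return $processed;
}
```

Each worker would be started once from the command line (for example several `php worker.php` processes), connect with something like `$redis = new Redis(); $redis->connect('127.0.0.1', 6379);`, and then call `runWorker()`. Capping the number of jobs per worker lets a process exit and be restarted periodically.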

I would go for phpredis if you can (you need to be able to compile and install the extension), because it is written in C and is therefore going to be a lot faster than the pure-PHP clients. Otherwise, Predis is also a pretty mature library you could use.

You could also use BRPOP/RPUSH as a sort of lock (if you need one). This is because:

Multiple clients can block for the same key. They are put into a queue, so the first to be served will be the one that started to wait earlier, in a first-BLPOP first-served fashion.

I would advise you to have a look at Simon's Redis tutorial to get an impression of the sheer power that Redis has to offer.

1

This is a background process, correct? In that case, you should not run it via a web server. Run it from the command line, either as a daemon or as a cron job.

My preference is a cron job, because you get automatic restart for free. Make sure you never have more instances of the program running than desired (you can achieve this by locking a file in the filesystem, doing something atomic in a database, etc.).
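The file-locking idea mentioned above could look roughly like this. It is a minimal sketch using PHP's built-in `flock()`; the function names and the lock-file path are hypothetical.

```php
<?php
/**
 * Try to take an exclusive, non-blocking lock on $lockFile.
 * Returns the open handle (keep it for the life of the process),
 * or null if the file cannot be opened or another instance holds the lock.
 */
function acquireInstanceLock(string $lockFile)
{
    $handle = fopen($lockFile, 'c'); // 'c' creates the file without truncating it
    if ($handle === false) {
        return null;
    }
    if (!flock($handle, LOCK_EX | LOCK_NB)) {
        fclose($handle);
        return null;
    }
    return $handle;
}

/** Release the lock explicitly; it is also released when the process exits. */
function releaseInstanceLock($handle): void
{
    flock($handle, LOCK_UN);
    fclose($handle);
}
```

A cron-started script would call `acquireInstanceLock('/tmp/myjob.lock')` (path chosen for illustration) first and exit immediately if it gets `null`, so overlapping cron runs do no harm.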

Then you just need to start the number of processes you want, and have them read work from a queue.

Normally, the pattern for doing this is to have a table with columns that record who is currently executing a given task:

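CREATE TABLE sometasks (
   id INT UNSIGNED NOT NULL AUTO_INCREMENT PRIMARY KEY, -- ID of some kind
   payload TEXT NOT NULL,               -- placeholder: other info required to do the task
   done TINYINT(1) NOT NULL DEFAULT 0,  -- placeholder: data telling us whether the task is due yet or complete
   locked_by_host VARCHAR(64) NULL,     -- host currently working on the task
   locked_by_pid INT NULL               -- pid of the worker on that host
);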

Then the process runs the following pseudo-query to lock a set of tasks (batch_size is how many per batch; it can be 1):

UPDATE sometasks SET locked_by_host=my_hostname, locked_by_pid=my_pid 
  WHERE not_done_already AND locked_by_host IS NULL ORDER BY ID LIMIT batch_size

Then select the rows back out using a select to find the current process's tasks. Then process the tasks, and update them as being "done" and clear out the lock.
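The claim, read-back and complete steps above can be sketched as follows. This assumes MySQL accessed through PDO with exceptions enabled, an integer `id` column, and a `done` flag standing in for the "not done already" condition; those column names and the function names are assumptions made for the example.

```php
<?php
// Sketch assuming MySQL via PDO (ERRMODE_EXCEPTION). The `id` and `done`
// columns are hypothetical stand-ins for the pseudo-schema's placeholders.

/** Lock up to $batchSize free tasks for this process and return them. */
function claimTasks(PDO $db, int $batchSize): array
{
    $host = gethostname();
    $pid  = getmypid();

    // Step 1: mark free tasks as ours in a single UPDATE statement.
    // The limit is cast to int and inlined rather than bound as a parameter.
    $claim = $db->prepare(
        'UPDATE sometasks SET locked_by_host = ?, locked_by_pid = ?
          WHERE done = 0 AND locked_by_host IS NULL
          ORDER BY id LIMIT ' . (int) $batchSize
    );
    $claim->execute([$host, $pid]);

    // Step 2: read back the rows this process now owns.
    $mine = $db->prepare(
        'SELECT * FROM sometasks
          WHERE done = 0 AND locked_by_host = ? AND locked_by_pid = ?'
    );
    $mine->execute([$host, $pid]);
    return $mine->fetchAll(PDO::FETCH_ASSOC);
}

/** Step 3: mark a task as done and clear its lock. */
function completeTask(PDO $db, int $id): void
{
    $done = $db->prepare(
        'UPDATE sometasks
            SET done = 1, locked_by_host = NULL, locked_by_pid = NULL
          WHERE id = ?'
    );
    $done->execute([$id]);
}
```

A worker would loop over `claimTasks($db, 500)`, process each returned row, and call `completeTask()` for it, stopping when an empty batch comes back.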

I'd opt for a cron job with a controller process which starts up N child processes and monitors them. The child processes could periodically die (remember PHP does not have good GC, so it can easily leak memory) and be respawned to prevent resource leaks.
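One possible shape for such a controller is sketched below. It assumes the pcntl extension is available (CLI on a Unix-like system); the function name `superviseWorkers` and the `$maxSpawns` cap, used here as a simple stand-in for "respawn until the work is done", are hypothetical.

```php
<?php
// Sketch only: requires the pcntl extension (CLI, Unix-like systems).

/**
 * Keep up to $workerCount children running, respawning a new one each time
 * a child exits, until $maxSpawns children have been started in total.
 * Returns the number of children that finished.
 */
function superviseWorkers(int $workerCount, callable $work, int $maxSpawns): int
{
    $running  = 0;
    $spawned  = 0;
    $finished = 0;

    while ($spawned < $maxSpawns || $running > 0) {
        // Top up to the desired number of children.
        while ($running < $workerCount && $spawned < $maxSpawns) {
            $pid = pcntl_fork();
            if ($pid === -1) {
                throw new RuntimeException('fork failed');
            }
            if ($pid === 0) {
                $work();  // child: do a bounded amount of work, then die
                exit(0);
            }
            $running++;
            $spawned++;
        }

        // Parent: block until any child exits.
        $exited = pcntl_wait($status);
        if ($exited <= 0) {
            break; // no children left to wait for
        }
        $running--;
        $finished++;
    }
    return $finished;
}
```

Because each child exits after a bounded amount of work, any memory it leaked is returned to the operating system, and the parent simply starts a fresh one.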

If the work is all done, the parent could quit, and wait to be respawned by cron (the next hour or something).

NB: locked_by_host can store the host name (pids aren't unique across hosts) to allow for distributed processing, but if you don't need that, you can omit it.

You can make this design more robust by adding a locked_time column and detecting when a task has been taking too long: you can alert, kill the process, and try again or something.
