4

I have 10 files containing anywhere from 50 to 250 rows each.

I need to be able to pull 1 or more random rows from different files.

I currently do this...

$lines = file($filePath);
if ($number == "1") {
    return $lines[array_rand($lines)];
}
else {
    shuffle($lines);
    return array_slice($lines, 0, $number);
}

However, I just read of a way to do the same thing using MySQL here:

https://stackoverflow.com/a/4329447/390480

Would I really have that much of a performance gain moving these 10 files to a MySQL database and doing that query instead?

Thanks!

2
  • what is the average row size/length? Commented Mar 19, 2013 at 12:38
  • 1
    Anywhere between about 3 to 40 characters. Commented Mar 19, 2013 at 20:02

2 Answers 2

2

From my experience, when you don't have too much information, it's quicker to retrieve info from a file than from a database. In my case I have files where I store arrays with types of houses for example, and it was faster to get it from files. The files were about 10kb only, and the performance was 10 times faster, maybe I'm wrong with the performance improvement but definetively it was fast enough to make me delete those tables containing those rows and use the text files xD.

The best way to ensure the performance is to make use of the function microtime() and do it both ways. Then you can see the performance through your own benchmark.

I use this a lot:

$start = microtime(true);

// my code;
// If it's not a very long script, you should always put it inside a loop, let's say thousands times or more, depending on the script, because sometimes just the resources used by the system could vary the benchmark, as suggested by @Colin Morelli .

echo microtime(true) - $start;

You get the output of the microseconds elapsed.

Sign up to request clarification or add additional context in comments.

3 Comments

Just to comment: when benchmarking you should almost always run a loop multiple times. There are too many variables that can happen during one check. For example, what if something on your computer just slams your CPU during that portion of the benchmark? It's generally best to run the loop at least a few hundred times (significantly more for simple tasks) and then take the average execution across all runs
Yes I know, actually it's on the comment that I usually use loops. The project I'm working on uses a lot of resources and takes up to 5 mins to run the script, that's why on the long ones I don't use the loops :P
I know, but you said that "sometimes if it's a very simple task." I just commented to say that you should really always loop your benchmarks. I have yet to see a benchmark-able scenario that takes anything more than a second. Remember, when you benchmark, you're not benchmarking the entire script (talk about variables). You're benchmarking one specific piece of it, to see how it performs.
0

Making it inside database should be faster than reading whole file and shuffling it. But you have to insert this data to database which will generate additional workload. You should check it manually, which option is faster.

BUT remember about one more thing. When you will have large files, then memory usage can be huge, so you should think about it too.

3 Comments

It's not true that it "should be faster". Databases read from disk too and there may be a lot more overhead to get the data. Also ORDER BY RAND() which is what is required here is slow. It may be faster, it may not.
Yes, that's true, but database is very good optimized. There is also RAND(), which generates temp table, but RAND on table is faster than shuffle on php array.
No, in the real world RAND is not faster than shuffle. PHP can shuffle and slice 30,000 ids in 0.003 seconds. Show me a realistically priced database server that can outperform that with RAND() and LIMIT. Read the comments on this answer. stackoverflow.com/questions/15435560/get-10-rows-from-a-query/…

Your Answer

By clicking “Post Your Answer”, you agree to our terms of service and acknowledge you have read our privacy policy.

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.