I'm building a script which requires counting the number of occurrences of each word in each file, out of about 2000 files, each around 500KB.
So that's 1GB of data, but memory usage climbs past 30GB (at which point PHP runs out of memory and the script dies).
I've tracked down the cause of this to my liberal use of associative arrays, which looks like this:
for ($runc = 0; $runc < $numwords; $runc++)
{
    $word = trim($content[$runc]);
    // Once this word's count in the current category reaches the
    // acceptance threshold, add it to the word list (only once ever).
    if ($words[$run][$word] == $wordacceptance && !$wordused[$word])
    {
        $wordlist[$onword] = $word;
        $onword++;
        $wordused[$word] = true;
    }
    $words[$run][$word]++; // +1 to the number of occurrences of this word in the current category
    $nwords[$run]++;       // running total of words in this category
}
$run is the current category.
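For context, $content is just the file contents split into individual words, roughly like this (the real tokenization may differ slightly):

$raw      = file_get_contents($filename);  // one ~500KB file; assumed setup
$content  = preg_split('/\s+/', $raw, -1, PREG_SPLIT_NO_EMPTY);
$numwords = count($content);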
You can see that to count the words I just increment the associative array entry $words[$run][$word], which grows with each occurrence of each word in each category of files.
Then $wordused[$word] is used to make sure that a word doesn't get added twice to the wordlist.
$wordlist is a simple numerically indexed array (0, 1, 2, 3, etc.) containing every distinct word used.
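To make that concrete, here is a toy example (invented data) of what the structures hold after processing the words "cat cat dog" in category 0 with $wordacceptance = 1:

// After "cat cat dog" in category 0, $wordacceptance = 1:
$words    = array(0 => array('cat' => 2, 'dog' => 1)); // per-category counts
$nwords   = array(0 => 3);                             // total words in category 0
$wordused = array('cat' => true);                      // 'cat' hit the threshold
$wordlist = array(0 => 'cat');                         // deduplicated word list
$onword   = 1;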
This eats up gigantic amounts of memory. Is there a more efficient way of doing this? I was considering using a MySQL MEMORY table, but I want to do the whole thing in PHP so it's fast and portable.
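Roughly what I had in mind for the MySQL variant, for reference (table and column names are just placeholders, not existing code):

$db = new PDO('mysql:host=localhost;dbname=words', 'user', 'pass');
$db->exec("CREATE TABLE IF NOT EXISTS word_counts (
               category INT NOT NULL,
               word     VARCHAR(64) NOT NULL,
               cnt      INT NOT NULL DEFAULT 0,
               PRIMARY KEY (category, word)
           ) ENGINE=MEMORY");
// One row per (category, word); increment the count on duplicates.
$stmt = $db->prepare("INSERT INTO word_counts (category, word, cnt)
                      VALUES (?, ?, 1)
                      ON DUPLICATE KEY UPDATE cnt = cnt + 1");
$stmt->execute(array($run, $word));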