There is always debate over which is faster, so I thought I'd run some tests using different methods.
Tests Run:
- strpos
- preg_match with foreach loop
- preg_match with regex or
- indexed search with string to explode
- indexed search as array (string already exploded)
Two sets of tests were run: one on a large text document (114,350 words) and one on a small text document (120 words). Within each set, every test was run 100 times and the results were averaged. Tests were case-sensitive; case-insensitive matching was not measured. The tests that search an index used a pre-built index. I wrote the indexing code myself, and I'm sure it could be more efficient, but indexing took 17.92 seconds for the large file and 0.001 seconds for the small file.
Terms searched for included: gazerbeam (NOT found in the document), legally (found in the document), and target (NOT found in the document).
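The averaging described above can be sketched as follows. This is a minimal sketch, not the harness actually used: the helper name average_seconds() and the stand-in text are assumptions made for illustration, and only the strpos variant is shown.

```php
<?php
// Hypothetical helper: average wall-clock seconds over $runs calls of $fn.
function average_seconds(callable $fn, int $runs = 100): float
{
    $total = 0.0;
    for ($i = 0; $i < $runs; $i++) {
        $t = microtime(true);
        $fn();
        $total += microtime(true) - $t;
    }
    return $total / $runs;
}

// Stand-in text; the real tests read the document from a file.
$str = 'it is legally binding';
$search = ['gazerbeam', 'legally', 'target'];

$avg = average_seconds(function () use ($str, $search) {
    foreach ($search as $word) {
        if (strpos($str, $word) !== false) {
            break;
        }
    }
});

printf("strpos: %.10f seconds per run\n", $avg);
```

Timing only the search, and not the file read, keeps the comparison between methods focused on the lookup itself.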
Results in seconds to complete a single test, sorted by speed:
Large File:
- 0.0000455808639526 (index without explode)
- 0.0009979915618897 (preg_match using regex or)
- 0.0011657214164734 (strpos)
- 0.0023632574081421 (preg_match using foreach loop)
- 0.0051533532142639 (index with explode)
Small File:
- 0.000003724098205566 (strpos)
- 0.000005958080291748 (preg_match using regex or)
- 0.000012607574462891 (preg_match using foreach loop)
- 0.000021204948425293 (index without explode)
- 0.000060625076293945 (index with explode)
Notice that strpos is faster than preg_match (using regex or) for small files but slower for large files. Other factors, such as the number of search terms, will of course affect this.
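Since the large-file index took 17.92 seconds to build, it only pays off after many searches. A rough sketch of that arithmetic, using the figures above and assuming the exploded index stays in memory between searches (which a typical per-request PHP script would not do):

```php
<?php
// Figures copied from the large-file results above.
$indexingSeconds = 17.92;              // one-off cost of building the index
$strposSeconds   = 0.0011657214164734; // average per test with strpos
$indexSeconds    = 0.0000455808639526; // average per test, index already in an array

// Time saved by each indexed test compared with strpos.
$savedPerTest = $strposSeconds - $indexSeconds;

// Number of tests after which the indexing cost has been recovered.
$breakEven = (int) ceil($indexingSeconds / $savedPerTest);

printf("Speed-up per test: %.1fx\n", $strposSeconds / $indexSeconds);
printf("Break-even after about %d tests\n", $breakEven);
```

With these numbers the break-even point is on the order of 16,000 tests. If the index has to be exploded on every search, the 0.00515-second figure is already slower than strpos, so there is no break-even at all.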
Algorithms Used:
// strpos
$str = file_get_contents('text.txt');
$t = microtime(true);
foreach ($search as $word) {
    // strpos() returns 0 for a match at the very start, so compare against false
    if (strpos($str, $word) !== false) break;
}
$strpos += microtime(true) - $t;
// preg_match with foreach loop
$str = file_get_contents('text.txt');
$t = microtime(true);
foreach ($search as $word) {
    // pass the delimiter so preg_quote() also escapes any '/' in the term
    if (preg_match('/' . preg_quote($word, '/') . '/', $str)) break;
}
$pregmatch += microtime(true) - $t;
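// preg_match (regex or)
$str = file_get_contents('text.txt');
// quote each term before joining: preg_quote() on the joined string would escape the '|' separators
$orstr = implode('|', array_map(function ($w) { return preg_quote($w, '/'); }, $search));
$t = microtime(true);
if (preg_match('/' . $orstr . '/', $str)) {}
$pregmatchor += microtime(true) - $t;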
// index with explode
$str = file_get_contents('textindex.txt');
$found = false;
$t = microtime(true);
$ar = explode(" ", $str);
foreach ($search as $word) {
    // binary search over the sorted word list
    $start = 0;
    $end = count($ar);
    // loop only while the range is non-empty, so $ar[$pos] is never read past the end
    while ($start < $end) {
        $pos = intdiv($start + $end, 2);
        $temp = $ar[$pos];
        if ($word < $temp) {
            $end = $pos;
        } elseif ($word > $temp) {
            $start = $pos + 1;
        } else {
            $found = true;
            break; // leaves the binary search only; the next term is still looked up
        }
    }
}
$indexwith += microtime(true) - $t;
// index without explode (already in array)
$str = file_get_contents('textindex.txt');
$found = false;
$ar = explode(" ", $str);
$t = microtime(true);
foreach ($search as $word) {
    // binary search over the sorted word list
    $start = 0;
    $end = count($ar);
    // loop only while the range is non-empty, so $ar[$pos] is never read past the end
    while ($start < $end) {
        $pos = intdiv($start + $end, 2);
        $temp = $ar[$pos];
        if ($word < $temp) {
            $end = $pos;
        } elseif ($word > $temp) {
            $start = $pos + 1;
        } else {
            $found = true;
            break; // leaves the binary search only; the next term is still looked up
        }
    }
}
$indexwithout += microtime(true) - $t;
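The indexing code itself is not shown. Judging from the explode(" ", ...) call and the binary search, the index appears to be a sorted list of unique words separated by single spaces. A minimal sketch under that assumption follows; build_index() and index_contains() are hypothetical names, not the code that produced the timings above, and punctuation is not stripped.

```php
<?php
// Hypothetical helper: build a sorted, de-duplicated word list from raw text.
function build_index(string $text): array
{
    // Split on any run of whitespace and drop empty pieces.
    $words = preg_split('/\s+/', $text, -1, PREG_SPLIT_NO_EMPTY);
    $words = array_unique($words);
    // SORT_STRING gives byte-wise ordering, matching strcmp() below.
    sort($words, SORT_STRING);
    return $words;
}

// Hypothetical helper: binary search over the sorted index.
function index_contains(array $index, string $word): bool
{
    $start = 0;
    $end = count($index);
    while ($start < $end) {
        $pos = intdiv($start + $end, 2);
        $cmp = strcmp($word, $index[$pos]);
        if ($cmp === 0) {
            return true;
        }
        if ($cmp < 0) {
            $end = $pos;
        } else {
            $start = $pos + 1;
        }
    }
    return false;
}

$index = build_index("it is legally binding it is");
// implode(' ', $index) would give the single-line format read by explode(" ", ...).

var_dump(index_contains($index, 'legally'));   // bool(true)
var_dump(index_contains($index, 'gazerbeam')); // bool(false)
```

Using strcmp() with SORT_STRING avoids PHP's numeric comparison of number-like strings, which the plain < and > operators would apply.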