4

Say I have a PHP array of strings called $foo with a few hundred entries, and a MySQL table 'people' with a field named 'name' and a few thousand entries. What is an efficient way to find out which strings in $foo aren't the 'name' of any entry in 'people', without submitting a query for every string in $foo?

So I want to find out what strings in $foo have not already been entered in 'people.'

Clearly, all of the data will have to be on one box at some point. The goal is to do this while minimizing both the number of queries and the amount of PHP processing.

7 Answers

1

I'd put your $foo data in another table and do a LEFT OUTER JOIN with your names table. Otherwise, there aren't a lot of great ways to do this that don't involve iteration at some point.

1 Comment

How would you use the LEFT JOIN to return the $foo data that is not in the names table?
1

The best I can come up with without using a temporary table is:

 $list = join(",", $foo);

// fetch all rows of the result of 
// "SELECT name FROM people WHERE name IN($list)" 
// into an array $result

$missing_names = array_diff($foo, $result);

Note that if $foo contains user input it would have to be escaped first.
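
One way to address the quoting and escaping concern is to let the database driver bind the values. The following is a minimal sketch, assuming a PDO connection is already available; the function name `findMissingNames` is hypothetical and not part of the original answer.

```php
<?php
// Hypothetical helper: returns the entries of $foo that have no matching people.name.
function findMissingNames(PDO $pdo, array $foo): array
{
    $foo = array_values(array_unique($foo));
    if ($foo === []) {
        return []; // "IN ()" is not valid SQL, so bail out early
    }

    // One "?" placeholder per value; the driver takes care of quoting and escaping.
    $placeholders = implode(',', array_fill(0, count($foo), '?'));
    $stmt = $pdo->prepare("SELECT name FROM people WHERE name IN ($placeholders)");
    $stmt->execute($foo);
    $existing = $stmt->fetchAll(PDO::FETCH_COLUMN);

    // Whatever the query did not return is missing from the table.
    return array_values(array_diff($foo, $existing));
}
```

This still costs a single query, and the only PHP-side work is building the placeholder list and one array_diff() call.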

2 Comments

Ack! No placeholders! Not quoted! No escapes!
Well, commented lines are obviously pseudo-code. And I mentioned the lack of escapes, did I not? Not quoting was an omission though, my bad.
1

What about the following:

  1. Get the list of names that are already in the db, using something like: SELECT name FROM people WHERE name IN (imploded list of names)
  2. Insert each item from the return of array_diff()

If you want to do it completely in SQL:

  1. Create a temp table with every name in the PHP array.
  2. Perform a query to populate a second temp table that will only include the new names.
  3. Do an INSERT ... SELECT from the second temp table into the people table.

Neither will be terribly fast, although the second option might be slightly faster.

0
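CREATE TEMPORARY TABLE PhpArray (name VARCHAR(50));

-- you can probably do this more efficiently;
-- each value must be quoted and escaped (or bound as a parameter)
INSERT INTO PhpArray VALUES ('$foo[0]'), ('$foo[1]'), ...;

-- names from the PHP array that have no matching row in people
SELECT PhpArray.name
FROM PhpArray
 LEFT OUTER JOIN people USING (name)
WHERE people.name IS NULL;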
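
For reference, here is a hedged sketch of how the temporary-table approach might be driven from PHP so that it returns the entries of $foo that are missing from people. It assumes an existing PDO connection; the function name `findMissingNamesViaTempTable` and the table aliases are made up for illustration.

```php
<?php
// Hypothetical helper: loads $foo into a temporary table and anti-joins it against people.
function findMissingNamesViaTempTable(PDO $pdo, array $foo): array
{
    if ($foo === []) {
        return []; // nothing to insert, nothing can be missing
    }
    $pdo->exec('CREATE TEMPORARY TABLE PhpArray (name VARCHAR(50))');

    // Multi-row INSERT with one "(?)" group per value, so the array goes over in one query.
    $groups = implode(',', array_fill(0, count($foo), '(?)'));
    $insert = $pdo->prepare("INSERT INTO PhpArray (name) VALUES $groups");
    $insert->execute(array_values($foo));

    // Rows of PhpArray with no partner in people are the missing names.
    $missing = $pdo->query(
        'SELECT t.name FROM PhpArray t
         LEFT OUTER JOIN people p ON p.name = t.name
         WHERE p.name IS NULL'
    )->fetchAll(PDO::FETCH_COLUMN);

    $pdo->exec('DROP TABLE PhpArray');
    return $missing;
}
```

That is three short statements plus the cleanup, regardless of how many entries $foo holds, and the comparison itself happens on the database side.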

0

For a few hundred entries, just use array_diff() or array_diff_assoc()
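
A small illustration of the in-PHP comparison, using made-up sample data in place of the names fetched from the table. One thing to watch, as an assumption about a typical setup: MySQL string comparisons are often case-insensitive depending on the column's collation, while array_diff() compares exactly, so array_udiff() with strcasecmp may match the database's behaviour more closely.

```php
<?php
$foo      = ['Alice', 'bob', 'Carol']; // sample input (made up)
$existing = ['alice', 'Bob'];          // stands in for the names fetched from people

// array_diff() compares strings exactly, so differently cased names count as missing.
$exact = array_diff($foo, $existing);

// array_udiff() with strcasecmp ignores ASCII case when comparing.
$ignoreCase = array_udiff($foo, $existing, 'strcasecmp');
```

Both functions preserve the keys of the first array, so wrap the result in array_values() if a re-indexed list is needed.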

0
$query = 'SELECT name FROM table WHERE name != '.implode(' OR name != ', $foo);

Yeesh, that doesn't look like it would scale well at all.

1 Comment

That should be "AND", not "OR".
0

I'm not sure there is a more efficient way to do this other than to submit all the strings to the database.

Basically there are two options: get a list of all the strings in MySQL and pull them into PHP and do the comparisons, or send the list of all the strings to the MySQL server and let it do the comparisons. MySQL is going to do the comparisons much faster than PHP, unless the list in the database is a great deal smaller than the list in PHP.

You can either create a temporary table or build one long SELECT statement, but either way you're pushing all the data to the database.

2 Comments

Can you give an example of what the long select statement might look like?
On further reflection, a long select probably will not work; you'll need the temp table idea that was accepted as the right answer.
