18

I'm storing some "unstructured" data (a keyed array) in one field of my table, and I'm currently using serialize() / unserialize() to "convert" back and forth between array and string.

Every now and then, however, I get errors when unserializing the data. I believe these errors happen because of Unicode data in the strings inside the array I'm serializing, although there are some records with Unicode data that work just fine. (The DB field is UTF-8.)

I'm wondering whether using json_encode instead of serialize will make a difference / make this more resilient. This is not trivial for me to test, since in my dev environment everything works well, but in production, every now and then (about 1% of records) I get an error.

Btw, I know I'm weaseling out of finding an actual explanation for the problem and just blindly trying something. I'm kind of hoping I can get rid of this without spending too much time on it.

Do you think using json_encode instead of serialize will make this more resilient to "serialization errors"? The data format does look more "forgiving" to me...

UPDATE: The actual error I'm getting is:

 Notice: unserialize(): Error at offset 401 of 569 bytes in C:\blah.php on line 20

Thanks! Daniel

16
  • Strikes me as quite an inefficient process to convert the string to/from an array/object on every database access. Commented Mar 18, 2011 at 12:09
  • If it's UTF-8 that causes the problem with unserialize(), that implies you probably didn't set PHP's internal encoding to UTF-8. I know this isn't a direct answer to your question (json_encode() vs. unserialize()), but have you tried calling mb_internal_encoding("UTF-8"); and then unserialize()? Commented Mar 18, 2011 at 12:12
  • The PHP serialize format is not immune to string length changes caused by multibyte encoding variations. That could be the problem here if there is a charset bug. With JSON you will likewise have to rely on a correct UTF-8 representation, so the resiliency advantage is mostly theoretical. -- Anyway, if this is a serious issue but not debuggable, then use a binary field or base64/hex marshalling for the whole blob. (This could be undone in the DB if there is a need.) Commented Mar 18, 2011 at 12:34
  • @Daniel - at php.net/unserialize people have left many useful comments about unserializing UTF-8 encoded data. You might want to try out their code before changing your approach. Commented Mar 18, 2011 at 12:35
  • Can't tell without a hexdump. But if you get a corrupt UTF-8 sequence, then the DB might return it stripped or replaced with U+DCxx (don't know exactly). Then the string length stored in the serialize format will be off, thus corrupting the whole blob. -- So JSON might work better, except that PHP's json_decode() just as readily refuses to operate when it encounters invalid UTF-8 or invalid JS string escape sequences. -- Regarding base64: there must certainly be stored procedures to decode it on the fly. Commented Mar 18, 2011 at 12:43
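The length mismatch described in these comments can be reproduced in isolation. The following is only a sketch: it assumes that somewhere between PHP and the table a lossy charset conversion turns a two-byte UTF-8 character into a single "?", which may or may not be what happens in the asker's production setup.

```php
<?php
// "ö" is two bytes in UTF-8 (0xC3 0xB6), so this string is 12 bytes long.
$data = ['name' => "Schr\xC3\xB6dinger"];

$blob = serialize($data);
echo $blob, "\n"; // the name is stored with its byte length: s:12:"...";

// Assumption: a lossy charset conversion replaces the two-byte "ö"
// with a single "?" byte while the blob is stored or fetched.
$damaged = str_replace("\xC3\xB6", '?', $blob);

// The declared length is still 12, but only 11 bytes remain,
// so unserialize() gives up and returns false.
$result = @unserialize($damaged);
var_dump($result === false);
```

The intact blob still round-trips; only the copy whose byte count changed is rejected, which would fit an error that affects some records but not others.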

7 Answers

16

JSON has one main advantage:

  • compatibility with languages other than PHP.

PHP's serialize has one main advantage:

  • it's specifically designed to store PHP-based data; most notably, it can store serialized objects (instances of classes) that will be re-instantiated as the right class when the string is unserialized.

(Yes, those advantages are the exact opposite of each other.)
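A small sketch of that difference, using a hypothetical Point class that is not part of the question:

```php
<?php
// Hypothetical class, used only for illustration.
class Point
{
    public $x;
    public $y;

    public function __construct($x, $y)
    {
        $this->x = $x;
        $this->y = $y;
    }
}

$p = new Point(1, 2);

// serialize() records the class name, so the copy is a Point again.
$fromSerialize = unserialize(serialize($p));

// JSON only carries the public properties; the class name is lost.
$fromJson = json_decode(json_encode($p));

var_dump($fromSerialize instanceof Point); // bool(true)
var_dump($fromJson instanceof Point);      // bool(false)
echo get_class($fromJson), "\n";           // stdClass
```

For a plain keyed array, as in the question, neither property matters much, since there is no class to restore.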


In your case, as you are storing data that's not really structured, both formats should work pretty well.

And the encoding problem you have should not be caused by serialize itself: as long as everything (DB, connection to the DB, PHP files, ...) is in UTF-8, serialization should work too.
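To check whether a failing record really contains damaged UTF-8, it may help to look at the bytes around the offset that unserialize() reports. This is a minimal sketch: describe_offset() is a hypothetical helper, $blob stands in for the raw column value, and mb_check_encoding() assumes the mbstring extension is loaded.

```php
<?php
// Hypothetical helper: describe the bytes around a reported error offset.
function describe_offset($blob, $offset, $radius = 16)
{
    $start = max(0, $offset - $radius);
    $slice = substr($blob, $start, 2 * $radius);

    return [
        'start'      => $start,
        'text'       => $slice,
        'hex'        => bin2hex($slice),
        // Requires the mbstring extension.
        'valid_utf8' => mb_check_encoding($blob, 'UTF-8'),
    ];
}

// Stand-in for a row read from the table (assumption: raw column value).
$blob = serialize(['name' => "Schr\xC3\xB6dinger"]);
print_r(describe_offset($blob, 20));
```

If valid_utf8 comes back false, or the hex dump shows a "?" (3f) where a multibyte character should be, the blob was probably altered before PHP received it.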


1 Comment

The connection to DB is UTF-8. The PHP file is too. I honestly don't know enough about how PHP handles UTF-8 to know where the problem could be. I don't even know whether it's related to UTF-8, but it's the main thing I can think of, since I understand that PHP's handling of it is not exactly stellar. Any ideas of what other problem I might be having? Thanks!
2

I think that, unless you absolutely need to preserve PHP-specific types, json_encode() is the way to go for storing structured data in a single field in MySQL. Here's why:

https://dev.mysql.com/doc/refman/5.7/en/json.html

As of MySQL 5.7.8, MySQL supports a native JSON data type defined by RFC 7159 that enables efficient access to data in JSON (JavaScript Object Notation) documents

If you are using a version of MySQL that supports the new JSON data type you can benefit from that feature.

Another important consideration is the ability to make changes to those encoded strings. Suppose you have a URL stored in encoded strings all over your database. WordPress users who've ever tried to migrate an existing database to a new domain name may sympathize here. If the data is serialized, a text replacement is going to break things. If it's JSON, you can simply run a query using REPLACE() and everything will be fine. Example:

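$arr = ['url' => 'http://example.com'];
$ser = serialize($arr);
// json_encode() escapes "/" as "\/" by default, which would hide 'http://' from str_replace()
$jsn = json_encode($arr, JSON_UNESCAPED_SLASHES);

// The replacement makes the URL one byte longer than the length recorded by serialize()
$ser = str_replace('http://', 'https://', $ser);
$jsn = str_replace('http://', 'https://', $jsn);

print_r(unserialize($ser));
// PHP Notice:  unserialize(): Error at offset 39 of 43 bytes in /root/sandbox/encoding.php on line 10

print_r(json_decode($jsn, true));
// Array ( [url] => https://example.com )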
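If the data has to stay serialized, one length-safe way to make such a change is to decode it, edit the values, and encode it again, so that PHP recomputes the string lengths. A minimal sketch, assuming the rows can be processed in PHP rather than with a single SQL query:

```php
<?php
$ser = serialize(['url' => 'http://example.com']);

// Decode first, so that PHP recomputes the string lengths on the way back.
$data = unserialize($ser);

// Apply the replacement to every string value, including nested ones.
array_walk_recursive($data, function (&$value) {
    if (is_string($value)) {
        $value = str_replace('http://', 'https://', $value);
    }
});

$ser = serialize($data);
print_r(unserialize($ser));
```

This is slower than a REPLACE() query because every row makes a round trip through PHP, which is part of the argument for JSON above.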


2

By default, json_encode() escapes non-ASCII characters as \uXXXX sequences (e.g., “Schrödinger” becomes “Schr\u00f6dinger”), but serialize() does not.

Source: https://www.toptal.com/php/10-most-common-mistakes-php-programmers-make#common-mistake-6--ignoring-unicodeutf-8-issues


To leave UTF-8 characters untouched, you can use the option JSON_UNESCAPED_UNICODE as of PHP 5.4.

Source: https://stackoverflow.com/a/804089/1438029
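A short sketch of both behaviours. The string is written with explicit byte escapes so the example does not depend on the encoding of the source file:

```php
<?php
$name = "Schr\xC3\xB6dinger"; // UTF-8 bytes for "Schrödinger"

$escaped   = json_encode($name);                         // ASCII-only output
$unescaped = json_encode($name, JSON_UNESCAPED_UNICODE); // keeps the UTF-8 bytes

echo $escaped, "\n";
echo $unescaped, "\n";

// Both forms decode to the same string.
var_dump(json_decode($escaped) === json_decode($unescaped));
```

Because the default output is pure ASCII, it should pass through a misconfigured connection charset unchanged, whereas the unescaped form depends on UTF-8 being handled correctly end to end.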


1

If the problem is (and I believe it is) in the UTF-8 encoding, there is no difference between json_encode and serialize. Both will leave the character encoding unchanged.

You should make sure your database/connection is properly set up to handle all UTF-8 characters, or encode the whole record into a supported encoding before inserting it into the DB.

Also please specify what "I get an error" means.
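One way to encode the whole record into something every charset can carry is to wrap the serialized string in base64. This is a sketch only; pack_field() and unpack_field() are hypothetical names.

```php
<?php
// Hypothetical helpers; the names are illustrative.
function pack_field(array $data)
{
    // base64 output is plain ASCII, so charset conversions cannot alter it.
    return base64_encode(serialize($data));
}

function unpack_field($stored)
{
    $raw = base64_decode($stored, true); // strict mode: false on invalid input
    return $raw === false ? false : unserialize($raw);
}

$data = ['name' => "Schr\xC3\xB6dinger", 'tags' => ['a', 'b']];
$stored = pack_field($data);

var_dump(unpack_field($stored) === $data);
```

The trade-off is that the stored value grows by about a third and can no longer be read or searched directly in SQL.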

1 Comment

@Col. Shrapnel: The OP had not provided enough information at the time I wrote this post, so believing was the only option :)
1

Found this in the PHP docs...

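function mb_unserialize($serial_str) {
    // The /e modifier was removed in PHP 7, so recompute each string's byte length in a callback.
    $out = preg_replace_callback('!s:(\d+):"(.*?)";!s', function ($m) {
        return 's:' . strlen($m[2]) . ':"' . $m[2] . '";';
    }, $serial_str);
    return unserialize($out);
}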

I don't quite understand it, but it worked to unserialize the data that I couldn't unserialize before. I've moved to JSON now; I'll report back in a couple of weeks on whether this solved the problem of some records randomly getting "corrupted".
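When moving to JSON, it is worth checking for failures explicitly, because json_encode() returns false on invalid UTF-8 and json_decode() returns null on malformed input. A hedged sketch with hypothetical wrapper names; json_last_error_msg() needs PHP 5.5 or later:

```php
<?php
// Hypothetical wrappers; the names are illustrative.
function encode_field(array $data)
{
    $json = json_encode($data);
    if ($json === false) {
        throw new RuntimeException('json_encode failed: ' . json_last_error_msg());
    }
    return $json;
}

function decode_field($json)
{
    $data = json_decode($json, true);
    // null is also a legal decoded value, so check the error code as well.
    if ($data === null && json_last_error() !== JSON_ERROR_NONE) {
        throw new RuntimeException('json_decode failed: ' . json_last_error_msg());
    }
    return $data;
}

try {
    encode_field(['name' => "Schr\xF6dinger"]); // Latin-1 byte, not valid UTF-8
} catch (RuntimeException $e) {
    echo $e->getMessage(), "\n";
}
```

Failing at write time like this means a bad record is noticed when it is saved, not weeks later when it is read back.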


1

As I'm going through this, I'll give my opinion: both serialize and json_encode are fine for storing data in a DB. For those looking for performance, I tested both, and json_encode is a few microseconds faster than serialize. I used this script to measure the time difference.

$bounced = array();
for ($i = count($bounced); $i < 9999; ++$i) {
    $bounced[$i] = $i;
}


$timeStart = microtime(true);
var_dump(serialize ($bounced));
unserialize(serialize ($bounced));
print timer_diff($timeStart) . " sec.\n";
$timeStart = microtime(true);
var_dump(json_encode ($bounced));
json_decode(json_encode ($bounced));
print timer_diff($timeStart) . " sec.\n";

function timer_diff($timeStart)
{
    return number_format(microtime(true) - $timeStart, 3);
}
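A single timed run is noisy, and the var_dump() calls inside the timed region add output cost to the measurement. A sketch that averages repeated round trips without output might look like this; time_round_trip() is a hypothetical helper, and the numbers will vary by machine and PHP version:

```php
<?php
// Hypothetical helper: average seconds per call over $rounds runs.
function time_round_trip(callable $fn, $rounds = 200)
{
    $start = microtime(true);
    for ($i = 0; $i < $rounds; ++$i) {
        $fn();
    }
    return (microtime(true) - $start) / $rounds;
}

// Same data as the script above: 9999 integers keyed 0..9998.
$bounced = range(0, 9998);

$serializeTime = time_round_trip(function () use ($bounced) {
    unserialize(serialize($bounced));
});
$jsonTime = time_round_trip(function () use ($bounced) {
    json_decode(json_encode($bounced), true);
});

printf("serialize: %.6f sec per round trip\n", $serializeTime);
printf("json:      %.6f sec per round trip\n", $jsonTime);
```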


0

As a design decision, I'd opt for storing JSON because it can only represent a data structure, whereas serialization is bound to a PHP data object signature.

The advantages I see are:

  • you are forced to separate the data storage from any logic layer on top.
  • you are independent of changes to the data object class (say, for example, that you want to add a field).
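A sketch of that separation, assuming a hypothetical Profile class whose tags field was added after some rows had already been stored:

```php
<?php
// Hypothetical value object; the class and field names are illustrative only.
final class Profile
{
    public $name;
    public $tags;

    public function __construct($name, array $tags = [])
    {
        $this->name = $name;
        $this->tags = $tags;
    }

    public function toJson()
    {
        // Only plain data is stored, never the class signature.
        return json_encode(['name' => $this->name, 'tags' => $this->tags]);
    }

    public static function fromJson($json)
    {
        $row = json_decode($json, true);
        // Fields missing from older rows fall back to defaults.
        return new self(
            isset($row['name']) ? $row['name'] : '',
            isset($row['tags']) ? $row['tags'] : []
        );
    }
}

$stored = (new Profile('Daniel', ['php']))->toJson();
$old    = '{"name":"Daniel"}'; // row written before "tags" existed

var_dump(Profile::fromJson($old)->tags);    // empty array
var_dump(Profile::fromJson($stored)->tags); // array with "php"
```

The mapping between stored data and the class lives in two small methods, so renaming or extending the class does not invalidate rows that are already in the table.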
