
File 1:

asdffdsa

File 2:

asdfjklfdsaHGUik

How do I read these binary files with PHP such that I can populate an array with the plaintext like:

$file1_output = ["asdf", "fdsa"];
$file2_output = ["asdfjkl", "fdsaHGUik"];
  • So you want only Latin-letter words? Commented Mar 4, 2010 at 15:17
  • Depending on the size of the files, and how frequently you're executing this, rather than reinvent the wheel, you might want to just use the existing strings utility: en.wikipedia.org/wiki/Strings_(Unix) Commented Mar 4, 2010 at 15:43

3 Answers

1

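This matches every run of word characters (0-9, a-z, A-Z and _). Note that preg_match_all() fills its third argument with a nested array, so the matched strings themselves are in element 0:

preg_match_all(
    "/[\x30-\x39\x5F\x41-\x5A\x61-\x7a]+/", /* regexp */
    file_get_contents('file1'),             /* file contents */
    $matches                                /* nested array of matches */
);
$file1_output = $matches[0];                /* flat list of matched strings */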

0

I'm not sure whether there is a better way, but you could read the file character by character and check whether each character's ASCII code (via the ord() function) falls in the range you're interested in. That would also do the trick.
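
A minimal sketch of that byte-by-byte idea follows. The function name extract_letter_runs and the sample bytes are made up for illustration, and it assumes that only ASCII letters count as plaintext, which matches the expected output in the question but may not match the real file format.

```php
<?php
// Hypothetical helper: collect runs of ASCII letters by scanning byte by byte.
function extract_letter_runs(string $data): array
{
    $runs = [];
    $current = '';
    $length = strlen($data);

    for ($i = 0; $i < $length; $i++) {
        $code = ord($data[$i]);
        $is_letter = ($code >= 0x41 && $code <= 0x5A)   // A-Z
                  || ($code >= 0x61 && $code <= 0x7A);  // a-z

        if ($is_letter) {
            // Still inside a run of letters: keep accumulating.
            $current .= $data[$i];
        } elseif ($current !== '') {
            // A non-letter byte ends the current run.
            $runs[] = $current;
            $current = '';
        }
    }

    // Flush a run that reaches the end of the data.
    if ($current !== '') {
        $runs[] = $current;
    }

    return $runs;
}

// Made-up sample bytes; the separators in the real files are unknown.
$sample = "asdf\x00\x01fdsa\x02";
print_r(extract_letter_runs($sample));
```

For a real file, the input could come from file_get_contents('file1'). For very large files, reading in chunks with fread() would avoid loading everything into memory, at the cost of handling runs that span chunk boundaries.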


0

To build on what @Frank Farmer said, I'd use strings:

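<?php

$strings_command = '/usr/bin/strings';

// Placeholder paths; point these at the real files.
$path_to_file1 = 'file1';
$path_to_file2 = 'file2';

$file1_output = array();
$file2_output = array();

// escapeshellarg() quotes each path so it is safe to pass to the shell.
exec($strings_command . ' ' . escapeshellarg($path_to_file1), $file1_output);
exec($strings_command . ' ' . escapeshellarg($path_to_file2), $file2_output);

?>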
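
If exec() is disabled or the strings binary is not installed, something similar can be approximated in pure PHP. The sketch below is only an approximation: the function name printable_strings is hypothetical, and it assumes that "printable" means ASCII 0x20 through 0x7E plus tab, with a default minimum run length of 4.

```php
<?php
// Hypothetical helper: return runs of printable ASCII at least $min_length bytes long.
function printable_strings(string $data, int $min_length = 4): array
{
    $min_length = max(1, $min_length);

    // \x20-\x7E covers space through tilde; tab is allowed as well.
    // Without the /u modifier, PCRE treats the subject as raw bytes.
    $pattern = '/[\t\x20-\x7E]{' . $min_length . ',}/';

    preg_match_all($pattern, $data, $matches);

    // Element 0 holds the full matches as a flat list.
    return $matches[0];
}

// Made-up sample bytes; "ab" is dropped because it is shorter than 4 bytes.
$sample = "\x00\x01asdf\x02fdsa\xFFab\x00";
print_r(printable_strings($sample));
```

Unlike the letters-only approaches, this also keeps digits, punctuation, and spaces, so the results may need further filtering depending on what counts as plaintext in the file.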

1 Comment

Thanks for the answers. I ended up just spending 5 or 6 hours and cracked the format of the binary. strings is a very useful program I had not known about, though.
