
I have a function that cleans user input. After the cleaned input is returned, it goes through json_decode($var, true). Currently I'm getting a malformed-string error, even though the printed string passes validation at http://jsonlint.com/. I've noticed that the string is 149 characters long after cleansing but only 85 before. To fix this, I also ran it through a regex to remove special characters, but I suspect that may undo what the previous function did. Does the "new" function undo what filter_var does? Is this the best way to clean input? Below is my code:

#index.php
$cleanInput = $cleanse->cleanInput($_POST);

#cleanse.php OLD
function cleanInput($input){
  foreach($input as $key => $value){
    $cleanInput[$key] = filter_var($value, FILTER_SANITIZE_STRING, FILTER_FLAG_STRIP_HIGH);
  }

   return($cleanInput); //Returns 149char long string, visually 85chars
}


#cleanse.php NEW
function cleanInput($input){
  foreach($input as $key => $value){
    $cleanInput[$key] = preg_replace("[^+A-Za-z0-9]", "", filter_var($value, FILTER_SANITIZE_STRING, FILTER_FLAG_STRIP_HIGH));
  }

   return($cleanInput); //Returns 85char long string, visually 85chars
}

#outputs
  #Before
    {"name":"Pete Johnson","address":"123 main street","email":"myemail@gmail.com","password":"PA$$word"}

  #After
    {"name":"Pete Johnson","address":"123 main street","email":"myemail@gmail.com","password":"PA$$word"}
  • Can we see the actual input at each step, please? Before cleaning, and after linting? Commented Mar 24, 2012 at 17:08
  • What are the original values in your POST array? On a side note, have you tried HTMLPurifier (htmlpurifier.org)? That's normally what I use instead of the built-in filter functions. Commented Mar 24, 2012 at 17:31
  • It's being passed via Ajax as a JSON string, so the value is the one shown under #Before. Commented Mar 24, 2012 at 17:45
  • Ok, thanks. That makes sense. Why don't you decode first and then filter? Commented Mar 24, 2012 at 17:47

1 Answer


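The call to filter_var($value, FILTER_SANITIZE_STRING, FILTER_FLAG_STRIP_HIGH) encodes every double quote as the HTML entity &#34;, so it produces output like this: {&#34;name&#34;:&#34;Pete Johnson&#34;,&#34;address&#34;:&#34;123 main street&#34;,&#34;email&#34;:&#34;myemail@gmail.com&#34;,&#34;password&#34;:&#34;PA$$word&#34;}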

That is why json_decode does not work.
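The length difference in the question fits this explanation: each double quote (1 character) becomes &#34; (5 characters), and 16 quotes account for the 64 extra characters (149 - 85). The following is a minimal sketch of that effect, not the original code. It uses str_replace as a stand-in for the filter, because FILTER_SANITIZE_STRING is deprecated as of PHP 8.1, and the email value is a placeholder.

```php
<?php
// Sample payload shaped like the one in the question (placeholder email).
$json = '{"name":"Pete Johnson","address":"123 main street","email":"user@example.com","password":"PA$$word"}';

// Stand-in for the sanitizing filter's effect on double quotes:
// every " is replaced by the five-character entity &#34;.
$encoded = str_replace('"', '&#34;', $json);

$quoteCount = substr_count($json, '"');      // number of double quotes
$growth = strlen($encoded) - strlen($json);  // 4 extra characters per quote

// The untouched string decodes; the entity-encoded one is no longer JSON.
$decodedOriginal = json_decode($json, true);
$decodedEncoded  = json_decode($encoded, true);
$lastError       = json_last_error();

var_dump(is_array($decodedOriginal));          // original decodes to an array
var_dump($decodedEncoded === null);            // encoded string fails to decode
var_dump($lastError === JSON_ERROR_SYNTAX);    // failure is a syntax error
```

A browser renders &#34; as a plain quote, which is why the sanitized string looks unchanged on screen and passes a linter when copied from the page.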

As I said in the comments, your best bet is to call json_decode on the input first and then run the individual elements through HTML_Purifier and/or Zend_Validator, or write your own checks for individual fields. For example, an email address has different validation requirements than a password.
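The decode-first approach might look like the sketch below. It relies only on PHP built-ins rather than HTML_Purifier or Zend_Validator, and the function name cleanJsonInput, the helper stringField, and the per-field rules are hypothetical choices for illustration, not a definitive implementation.

```php
<?php
/**
 * Hypothetical helper: return a field as a trimmed string, or '' when
 * the field is missing or is not a string.
 */
function stringField(array $data, string $key): string
{
    return isset($data[$key]) && is_string($data[$key]) ? trim($data[$key]) : '';
}

/**
 * Hypothetical helper: decode the raw JSON first, then treat each field
 * according to its own rules. Returns null when the JSON is invalid.
 */
function cleanJsonInput(string $raw): ?array
{
    $data = json_decode($raw, true);
    if (!is_array($data)) {
        return null; // malformed JSON or not an object/array
    }

    // Email: validate instead of stripping characters.
    $email = filter_var(stringField($data, 'email'), FILTER_VALIDATE_EMAIL);

    return [
        // Free text: keep as entered, escape with htmlspecialchars at output time.
        'name'     => stringField($data, 'name'),
        'address'  => stringField($data, 'address'),
        'email'    => $email === false ? null : $email,
        // Password: never alter it; hash it (e.g. password_hash) before storing.
        'password' => isset($data['password']) && is_string($data['password'])
            ? $data['password']
            : '',
    ];
}

$result = cleanJsonInput('{"name":" Pete Johnson ","address":"123 main street","email":"user@example.com","password":"PA$$word"}');
var_dump($result);
```

Because decoding happens before any filtering, the quotes that delimit the JSON are never touched, and a password such as PA$$word survives unchanged.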

EDIT:

I tried running the input through the new function, but I couldn't get it to work as is, so I made a few adjustments. I'm not sure that is what you intended for your regex. Here is what I got as output from this code:

$input = '{"name":"Pete Johnson","address":"123 main street","email":"myemail@gmail.com","password":"PA$$word"}';
$cleanedInput = preg_replace("/[^+A-Za-z0-9]/", "", filter_var($input, FILTER_SANITIZE_STRING, FILTER_FLAG_STRIP_HIGH));
echo $cleanedInput;

Output: 34name3434PeteJohnson3434address3434123mainstreet3434email3434myemailgmailcom3434password3434PAword34


2 Comments

How did you print the string with the HTML numbers?
I used "View Page Source" in Firefox to see whether the values were being encoded. It should work in Chrome and other browsers too. For some reason, Inspect Element in Firebug didn't show the HTML-encoded characters.
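As an alternative to View Page Source, the encoded characters can be made visible from PHP itself. This is a small sketch under the assumption that the value is echoed into an HTML page. The sample string is a stand-in for a sanitized value.

```php
<?php
// Stand-in for a sanitized value that contains encoded double quotes.
$sanitized = '{&#34;name&#34;:&#34;Pete Johnson&#34;}';

// Escaping the ampersands makes the browser show the entity text literally
// instead of rendering each entity as a quote character.
$visible = htmlspecialchars($sanitized, ENT_QUOTES, 'UTF-8');

echo $visible, PHP_EOL;

// strlen reports the real length, entities included, regardless of rendering.
$length = strlen($sanitized);
echo $length, PHP_EOL;
```

Logging the value with var_dump or error_log outside the browser shows the same raw characters without any extra escaping.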
