This must've been asked before, but I couldn't find it.

I want to allow my users to enter text into an HTML form, and later to display that text in the webpage, exactly as it was written, avoiding:

  1. XSS attacks
  2. Encoded punctuation being displayed (e.g. %2C instead of a comma, + instead of a space)
  3. Unexpected results due to < or > being used and the browser treating it as part of the HTML

The form enctype is the default application/x-www-form-urlencoded. I'm not sure if I really need this enctype, but for various reasons I'm sticking with it for now.

I've seen that I can partially fix (2) by using decodeURI or decodeURIComponent, although it doesn't convert + back to space.
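
One built-in worth trying for (2) is URLSearchParams, which parses application/x-www-form-urlencoded data and handles the + to space step itself. The sketch below is only an illustration: the field name "comment" and the sample body are made up, and it assumes the raw encoded string is what reaches the script.

```javascript
// Hypothetical raw data, as a browser would encode a form field named "comment".
const rawBody = "comment=Hello%2C+world%21+1%2B1%3D2";

// URLSearchParams understands application/x-www-form-urlencoded:
// it turns "+" into a space and percent-decodes each value.
const params = new URLSearchParams(rawBody);
const comment = params.get("comment");

console.log(comment); // Hello, world! 1+1=2
```

Note that the encoded plus sign (%2B) survives as a literal +, which is the case the manual replace has to be ordered carefully for. The decoded value is still untrusted and needs escaping before it goes into HTML.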

For the rest, isn't there also a built-in function I can use? The only libraries I found were server-side ones for .NET or Java; I didn't find anything for doing it client-side in JavaScript. I did find plenty of stern warnings that if you roll your own code, you'll probably make subtle mistakes.

For now I'm using the myDecode function below, which seems to work, but I can't believe it's the best way.

function myDecode(string) {
    // First convert + to space, since decodeURIComponent may introduce new + characters that were previously encoded
    // Then use decodeURIComponent to convert all other punctuation (it throws URIError on malformed % sequences)
    // Then escape HTML special characters
    return htmlEscape( decodeURIComponent( string.replace(/\+/g, " ") ) );
}

function htmlEscape(string) {
    return string.replace( /&/g, "&amp;" )  // remember to do & first, otherwise you'll mess up the subsequent escaping!
                 .replace( /</g, "&lt;" )
                 .replace( />/g, "&gt;" )
                 .replace( /"/g, "&quot;" )
                 .replace( /'/g, "&#39;" );  // the entity needs its terminating semicolon
}

My test is that the user can enter the below text and have it displayed as-written, without any changes and without running the script:

<script>alert( "Gotcha! + & + " );</script>

But I don't know if that's a strong enough test.
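
A slightly broader check might run several inputs through the escaping step and confirm that no HTML-significant character survives. This is only a sketch: the sample strings are made-up test inputs, htmlEscape is repeated here (with the &#39; entity terminated) so the snippet runs on its own, and it only covers text placed in element content or quoted attribute values, not inside script blocks, URLs or CSS.

```javascript
// Same escaping rules as htmlEscape in the question, repeated so this runs standalone.
function htmlEscape(string) {
    return string.replace(/&/g, "&amp;")
                 .replace(/</g, "&lt;")
                 .replace(/>/g, "&gt;")
                 .replace(/"/g, "&quot;")
                 .replace(/'/g, "&#39;");
}

// Hypothetical test inputs: a script tag, attribute break-outs, an event handler,
// and text that already looks escaped.
const samples = [
    '<script>alert( "Gotcha! + & + " );</script>',
    '" onmouseover="alert(1)',
    "' onfocus='alert(1)",
    "<img src=x onerror=alert(1)>",
    "&lt;already escaped&gt;",
];

// An escaped string passes if no < > " ' remain and every & begins
// one of the five entities that htmlEscape emits.
function looksEscaped(escaped) {
    return !/[<>"']/.test(escaped) &&
           !/&(?!(amp|lt|gt|quot|#39);)/.test(escaped);
}

const allSafe = samples.every((sample) => looksEscaped(htmlEscape(sample)));
console.log(allSafe ? "all samples escaped" : "some sample was not fully escaped");
```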

This is just a small hobby project with no sensitive data and very few users, so it doesn't have to be totally bulletproof. But it would be nice to know how to do things the right way.

  • Assign to .innerText rather than .innerHTML, then you don't need to use entities or worry about HTML being rendered. Commented Apr 16, 2020 at 16:41
  • @Barmar the user-entered string is placed inside some generated HTML, so that solution doesn't work. Do you suggest I do one pass setting innerHTML with trusted strings, and then another setting the innerText of some generated elements using untrusted strings? Commented Apr 27, 2020 at 16:24
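
One way to read the suggestion in these comments is to build the generated markup with DOM calls and put the untrusted string in through textContent, so it is never parsed as HTML. The sketch below is an assumption about how that might look, not code from the question: buildComment and the class names are hypothetical, and textContent is used in place of innerText, which treats the string as plain text in the same way.

```javascript
// Builds a comment element without ever parsing user text as HTML.
// `doc` is the page's Document (pass `document` in a browser);
// the "comment" / "comment-body" class names are made up for this sketch.
function buildComment(doc, userText) {
    const wrapper = doc.createElement("div");
    wrapper.className = "comment";

    const body = doc.createElement("p");
    body.className = "comment-body";
    // textContent stores the string as plain text, so "<script>" is displayed, not run.
    body.textContent = userText;

    wrapper.appendChild(body);
    return wrapper;
}

// In a browser:
// document.body.appendChild(buildComment(document, '<script>alert(1)</script>'));
```

With this approach no entity escaping is needed for display, because the browser never interprets the user's string as markup.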
