135

How does one encode query parameters for use in a URL in Java? I know this seems like an obvious, already-asked question.

There are two subtleties I'm not sure of:

  1. Should spaces be encoded in the URL as "+" or as "%20"? If I type "http://google.com/foo=?bar me" into Chrome, it encodes the space as %20.
  2. Is it necessary or correct to encode colons ":" as %3A? Chrome doesn't.

Notes:

  • java.net.URLEncoder.encode doesn't seem to fit: it appears to be meant for encoding form-submission data. For example, it encodes space as + instead of %20, and it encodes the colon, which isn't necessary.
  • java.net.URI doesn't encode query parameters
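To illustrate the second note, here is a minimal sketch of what the multi-argument java.net.URI constructor does with a query string. The class name UriConstructorDemo, the host example.com and the sample query are assumptions made up for this illustration.

```java
import java.net.URI;
import java.net.URISyntaxException;

final class UriConstructorDemo {

    // Assembles a URI from components. The constructor quotes characters that are
    // illegal in a URI (such as space) but leaves legal ones like '&', '=' and '+' alone.
    static String build(String query) {
        try {
            return new URI("http", "example.com", "/foo", query, null).toString();
        } catch (URISyntaxException e) {
            throw new IllegalArgumentException(e);
        }
    }

    public static void main(String[] args) {
        // prints http://example.com/foo?q=bar%20me&x=1+1
        System.out.println(build("q=bar me&x=1+1"));
    }
}
```

The space becomes %20, but a parameter value that itself contains & or = would pass through unchanged, so individual values still need their own encoding step.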
3

7 Answers

152

java.net.URLEncoder.encode(String s, String encoding) can help too. It follows the HTML form encoding application/x-www-form-urlencoded.

URLEncoder.encode(query, "UTF-8");

On the other hand, percent-encoding (also known as URL encoding) encodes space as %20. The colon is a reserved character, but it is permitted unencoded in the query component, so : can remain a literal colon there.
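A small sketch of the form-encoding behaviour described above. The class name FormEncodingDemo and the sample value are assumptions for illustration, and the Charset overload of URLEncoder.encode requires Java 10 or later.

```java
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;

final class FormEncodingDemo {

    // application/x-www-form-urlencoded: space becomes '+',
    // and most punctuation (including ':' and '/') is percent-encoded.
    static String formEncode(String value) {
        return URLEncoder.encode(value, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        // prints bar+me%3A1%2F2
        System.out.println(formEncode("bar me:1/2"));
    }
}
```

Note that URLEncoder does encode the colon, even though the query grammar would allow it to stay literal.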


13 Comments

I mentioned that I don't think it does URL encoding; instead, it encodes data to be submitted via a form. Comments?
I ended up using URLEncoder.encode and replacing "+" with "%20"
It encodes slashes as "%2F"; shouldn't it leave the URL slashes as they are?
@golimar No, it shouldn't. You are supposed to give it the parameter value only, not the whole URL. Consider the example http://example.com/?url=http://example.com/?q=c&sort=name. Should it encode &sort=name or not? There is no way to distinguish the value from the rest of the URL. That is exactly why you need value encoding in the first place.
But actually, slash is a legal character in querystring parameter values.
26

Unfortunately, URLEncoder.encode() does not produce valid percent-encoding (as specified in RFC 3986).

URLEncoder.encode() encodes everything just fine, except space is encoded to "+". All the Java URI encoders that I could find only expose public methods to encode the query, fragment, path parts etc. - but don't expose the "raw" encoding. This is unfortunate as fragment and query are allowed to encode space to +, so we don't want to use them. Path is encoded properly but is "normalized" first so we can't use it for 'generic' encoding either.

Best solution I could come up with:

return URLEncoder.encode(raw, StandardCharsets.UTF_8).replaceAll("\\+", "%20");

If replaceAll() is too slow for you, I guess the alternative is to roll your own encoder...
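As a rough sketch of what rolling your own encoder could look like, the following percent-encodes every UTF-8 byte that is not an RFC 3986 unreserved character. The class name PercentEncoder and its method are hypothetical, not part of any library. It is stricter than the one-liner above: URLEncoder leaves * alone and encodes ~, while this sketch does the opposite.

```java
import java.nio.charset.StandardCharsets;

final class PercentEncoder {

    // RFC 3986 "unreserved" characters: ALPHA / DIGIT / "-" / "." / "_" / "~"
    private static final String UNRESERVED =
            "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789-._~";
    private static final char[] HEX = "0123456789ABCDEF".toCharArray();

    static String encode(String raw) {
        StringBuilder sb = new StringBuilder(raw.length() * 3);
        for (byte b : raw.getBytes(StandardCharsets.UTF_8)) {
            int v = b & 0xFF;
            if (v < 0x80 && UNRESERVED.indexOf(v) >= 0) {
                sb.append((char) v);          // safe ASCII character, copied as-is
            } else {
                sb.append('%')                // everything else becomes %XX
                  .append(HEX[v >>> 4])
                  .append(HEX[v & 0x0F]);
            }
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        System.out.println(encode("bar me"));  // prints bar%20me
        System.out.println(encode("a+b"));     // prints a%2Bb
    }
}
```

Because it encodes all reserved characters, a sketch like this is only suitable for a single parameter name or value, never for a whole URL.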

5 Comments

+ is a perfectly valid encoding of a space.
@LawrenceDol it's true but sometimes + may be interpreted incorrectly - take a look at C# blogs.msdn.microsoft.com/yangxind/2006/11/08/…
This. I compared various alternatives against Javascript's encodeURIComponent method output, and this was the only exact match for the ones I tried (queries with spaces, Turkish and German special characters).
Ahmet+Mehmet Demir => Ahmet%2BMehmet+Demir. As I understand it, the only issue here is the MIME type application/x-www-form-urlencoded, under which a space is encoded as +, for instance when a web form submits a search for two terms via a GET request, as Google search does. The URI RFC allows + as a valid character, so it normally doesn't need to be escaped.
For better performance, use this instead: URLEncoder.encode(raw, StandardCharsets.UTF_8).replace("+", "%20")
16

EDIT: URIUtil is no longer available in more recent versions; see the better answer at "Java - encode URL", or the one by Mr. Sindi in this thread.


URIUtil from Apache HttpClient is really useful, although there are some alternatives.

URIUtil.encodeQuery(url);

Regarding "it encodes space as + instead of %20": both are perfectly valid in the right context. If you really prefer %20, you could apply a string replace afterwards.

7 Comments

I would have to agree. Use HttpClient, you will be much happier.
That looks promising; got a link by chance? I'm googling but finding many.
This method doesn't seem to be present in HttpClient 4.1? hc.apache.org/httpcomponents-client-ga/httpclient/apidocs/org/…
@Alex, hmm that's annoying, I've always used that routine with good results. One idea is to grab the source code from the 3 release since they now obviously didn't want to maintain it anymore.
URIUtil.encodeWithinQuery is what you would use to encode an individual query parameter, which is what the original question seemed to be asking.
11

It is not necessary to encode a colon as %3A in the query, although doing so is not illegal.

URI           = scheme ":" hier-part [ "?" query ] [ "#" fragment ]
query         = *( pchar / "/" / "?" )
pchar         = unreserved / pct-encoded / sub-delims / ":" / "@"
unreserved    = ALPHA / DIGIT / "-" / "." / "_" / "~"
pct-encoded   = "%" HEXDIG HEXDIG
sub-delims    = "!" / "$" / "&" / "'" / "(" / ")" / "*" / "+" / "," / ";" / "="

It also seems that only percent-encoded spaces are valid, as I doubt that space is an ALPHA or a DIGIT.

See the URI specification (RFC 3986) for more details.
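The grammar above can be turned into a small check, sketched here under the assumption that we only want to know whether a single character may appear literally in the query. The class and method names are invented for this example; "%" is deliberately excluded because it is only valid as the start of a pct-encoded triplet.

```java
final class QueryGrammar {

    private static final String UNRESERVED_PUNCT = "-._~";
    private static final String SUB_DELIMS = "!$&'()*+,;=";

    // Mirrors the ABNF: query = *( pchar / "/" / "?" ), ignoring pct-encoded.
    static boolean isLiteralQueryChar(char c) {
        boolean alpha = (c >= 'A' && c <= 'Z') || (c >= 'a' && c <= 'z');
        boolean digit = c >= '0' && c <= '9';
        boolean unreserved = alpha || digit || UNRESERVED_PUNCT.indexOf(c) >= 0;
        boolean pchar = unreserved || SUB_DELIMS.indexOf(c) >= 0 || c == ':' || c == '@';
        return pchar || c == '/' || c == '?';
    }

    public static void main(String[] args) {
        System.out.println(isLiteralQueryChar(':'));  // prints true
        System.out.println(isLiteralQueryChar(' '));  // prints false
    }
}
```

So a colon may stay as it is, while a space has to be percent-encoded to fit the grammar.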

4 Comments

But doing so can change the meaning of the URI, since the interpretation of the query string is up to the server. If you are producing an application/x-www-form-urlencoded query string, either is fine. If you are fixing up a URL that the user typed/pasted in, : should be left alone.
@tc. You are right, if colon is being used as a general delimiter (page 12 of the RFC); however, if it is not being used as a general delimiter, then both encodings should resolve identically.
You also have to be careful, as URLs are not really a subset of URIs: adamgent.com/post/25161273526/urls-are-not-a-subset-of-uris
A colon is %3A, not %3B (that's a semicolon), for anybody who is manually encoding
4

The built-in Java URLEncoder is doing what it's supposed to, and you should use it.

"+" and "%20" are both valid replacements for a space character in a URL. Either one will work.

A ":" should be encoded, as it's a separator character, e.g. in http://foo or ftp://bar. The fact that a particular browser can handle it when it's not encoded doesn't make it correct. You should encode them.

As a matter of good practice, be sure to use the method that takes a character encoding parameter. UTF-8 is generally used there, but you should supply it explicitly.

URLEncoder.encode(yourUrl, "UTF-8");
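As a sketch of how the encoder might be applied in practice, the following encodes each parameter name and value separately and then joins them, rather than passing a whole URL to the encoder. The class name QueryStringBuilder, the method toQuery and the sample parameters are assumptions for illustration; the Charset overload needs Java 10 or later.

```java
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.stream.Collectors;

final class QueryStringBuilder {

    private static String enc(String s) {
        return URLEncoder.encode(s, StandardCharsets.UTF_8);
    }

    // Encodes keys and values individually, so '&' and '=' inside a value
    // cannot be mistaken for query delimiters.
    static String toQuery(Map<String, String> params) {
        return params.entrySet().stream()
                .map(e -> enc(e.getKey()) + "=" + enc(e.getValue()))
                .collect(Collectors.joining("&"));
    }

    public static void main(String[] args) {
        Map<String, String> params = new LinkedHashMap<>();
        params.put("url", "http://example.com/?q=c");
        params.put("sort", "name");
        // prints http://example.com/?url=http%3A%2F%2Fexample.com%2F%3Fq%3Dc&sort=name
        System.out.println("http://example.com/?" + toQuery(params));
    }
}
```

The scheme, host and path of the outer URL are left untouched; only the pieces that go into the query are encoded.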

4 Comments

+ is only a representation of space in application/x-www-form-urlencoded; it is not guaranteed to work even when restricted to HTTP. Similarly, : is valid in a query string and should not be converted to %3A; a server can choose to interpret them differently.
This method also encodes the slashes and other characters that are part of the URL itself, e.g. http:// becomes http%3A%2F%2F, which is not correct.
@ToKra you are not supposed to encode the http:// part. The method is for query parameters and encoded form data. If, however, you wanted to pass the URL of another website as a query parameter, THEN you would want to encode it to avoid confusing the URL parser.
@tc My reading of w3.org/TR/html4/interact/forms.html#h-17.13.3.3 is that all GET form data is encoded as the application/x-www-form-urlencoded content type. Doesn't that mean it must work for HTTP?
3

I just want to add another way to resolve this problem.

If your project depends on Spring Web, you can use its utilities.

import org.springframework.web.util.UriUtils;

import java.nio.charset.StandardCharsets;

UriUtils.encode("vip:104534049:5", StandardCharsets.UTF_8);

Output:

vip%3A104534049%3A5

0

Here's an attempt to percent-encode as little as possible, but as much as necessary.

It's very much a work in progress, but may be of some use.

import java.net.URI;
import java.net.URISyntaxException;
import java.util.BitSet;
import java.util.HexFormat;
import java.util.Objects;
import java.util.stream.Collectors;
import java.util.stream.IntStream;
import java.util.stream.Stream;

/**
 * A routine to build URIs, percent-encoding only as necessary, as defined in
 * <a href=https://www.rfc-editor.org/rfc/rfc3986>RFC 3986</a>.<br>
 * <br>
 * The focus of this prototype is the Query & Fragment entities.<br>
 * Everything else is delegated to the...<br>
 * {@link  URI#URI(String, String, String, int, String, String, String)}<br>
 * ...constructor, passing null for both Query & Fragment.<br>
 * <br>
 * After correctly percent-encoding Query & Fragment,
 * they are appended to
 * {@link URI#toString()}.<br>
 * <br>
 * One exception was made to the RFC 3986 encoding:<br>
 * RFC 3986 specifies '&' and '=' are exempt from percent-encoding in Queries.<br>
 * But they are both used as delimiters when providing key/value pairs.<br>
 * If either key or value should contain these characters,
 * parsing the resultant Query could be tricky.<br>
 * This class provides a mechanism to percent-encode them in the keys & values,
 * whilst leaving them untouched when assembling the key/value pairs.
 */
public final class UriBuilder {

    public static void main(final String[] args) throws URISyntaxException {

        final var qALL      = toString(ENCODING_EXEMPT_4_PLAINTEXT_QUERY);

        final var host      = "stackoverflow.com";
        final var path      = "/questions/5330104/encoding-url-query-parameters-in-java";

        newBuilder().setScheme("https").setHost(host).setPath(path).build();
        newBuilder().setScheme("https").setHost(host).setPath(path).setQuery(                              ""        )                  .build();
        newBuilder().setScheme("https").setHost(host).setPath(path).setQuery(                              qALL      )                  .build();
        newBuilder().setScheme("https").setHost(host).setPath(path).setQuery(Query.of(QueryKeyValuePair.of(qALL, "")))                  .build();
        newBuilder().setScheme("https").setHost(host).setPath(path).setQuery(Query.of(QueryKeyValuePair.of("",   "")))                  .build();
        newBuilder().setScheme("https").setHost(host).setPath(path).setQuery(Query.of()                              ).setFragment(qALL).build();
    }

    public static record QueryKeyValuePair(String rawKey, String rawValue, String encoded) {

        public static QueryKeyValuePair of(final String key, final String rawValue) {

            final var     keyValueLength = key.length() + rawValue.length();
            final var     sb             = new StringBuilder(keyValueLength * 3);

            if (keyValueLength == 0) {
                return null;
            }
            percentEncode(sb, key,      ENCODING_EXEMPT_4_KEY_PAIR_QUERY);
            ;             sb.append('=');
            percentEncode(sb, rawValue, ENCODING_EXEMPT_4_KEY_PAIR_QUERY);

            return new QueryKeyValuePair(key, rawValue, sb.toString());
        }
    }

    public static record Query(QueryKeyValuePair[] pairs, String rawPlainTextQuery, String encoded) {

        public static Query of(final QueryKeyValuePair... pairs) {

            if (pairs.length == 0) {
                return null;
            }
            final var percentEncoded = Stream.of(pairs).filter(Objects :: nonNull).map(p -> p.encoded).collect(Collectors.joining("&"));

            if (percentEncoded.isEmpty()) {
                return null;
            } else {
                return new Query(pairs, null, percentEncoded);
            }
        }
        public static Query of(final String rawPlainTextQuery) {

            if (rawPlainTextQuery.isEmpty()) {
                return null;
            }
            final var     sb = new StringBuilder(rawPlainTextQuery.length() * 3);

            percentEncode(sb, rawPlainTextQuery, ENCODING_EXEMPT_4_PLAINTEXT_QUERY);

            return new Query(null, rawPlainTextQuery, sb.toString());
        }
    }

    public static record Fragment(String fragment, String encoded) {

        public static Fragment of(final String fragment) {

            if (fragment.isEmpty()) {
                return null;
            }
            final var     sb = new StringBuilder(fragment.length() * 3);

            percentEncode(sb, fragment, ENCODING_EXEMPT_4_FRAGMENT);

            return new Fragment(fragment, sb.toString());
        }
    }

    private static final HexFormat HEX_FORMAT_UPPER             = HexFormat.of().withUpperCase();

    private static final char      REPLACEMENT_CHARACTER_U_FFFD = '\uFFFD';
    private static final int       RFC_3986_BITSET_LENGTH       = 128;

    private static final BitSet    ENCODING_EXEMPT_4_KEY_PAIR_QUERY;
    private static final BitSet    ENCODING_EXEMPT_4_PLAINTEXT_QUERY;
    private static final BitSet    ENCODING_EXEMPT_4_FRAGMENT;
    ;       static {
        final var SUB_DELIMS_EXCEPT_AND_EQUALS = bitSetOf('!',  '$',  '\'',  '(',  ')',  '*',  '+',  ',',  ';',  ':',  '@');
        final var SUB_DELIMS                   = bitSetOr(SUB_DELIMS_EXCEPT_AND_EQUALS, bitSetOf('&',   '='));

        final var DIGIT                        = bitSetRangeInclusive('0', '9');
        final var ALPHA                        = bitSetOr(
                bitSetRangeInclusive('A', 'Z'),
                bitSetRangeInclusive('a', 'z'));

        final var UNRESERVED                   = bitSetOr(ALPHA, DIGIT, bitSetOf('-',  '.',  '_',  '~'));
        /*
         * Above we defined the ABNF syntax as defined in RFC 3986 Appendix A.
         * 
         * Now we can combine them to define the percent-encoding exemptions for the various entities...
         */
        ENCODING_EXEMPT_4_KEY_PAIR_QUERY       = bitSetOr(UNRESERVED, SUB_DELIMS_EXCEPT_AND_EQUALS, bitSetOf('/',  '?'));
        ENCODING_EXEMPT_4_PLAINTEXT_QUERY      = bitSetOr(UNRESERVED, SUB_DELIMS,                   bitSetOf('/',  '?'));

        ENCODING_EXEMPT_4_FRAGMENT             = ENCODING_EXEMPT_4_PLAINTEXT_QUERY;
    }

    private static void percentEncode(final StringBuilder sb, final String rawValue, final BitSet exemptFromPercentEncoding) {

        rawValue.codePoints().forEach(codePoint -> {
            /*
             * Surrogate Pairs will have both Surrogates in the Codepoint.
             * For orphan Surrogates, the Codepoint will contain only the orphan (d800:dfff).
             * 
             * java.net.URLEncoder percent-encodes orphan Surrogates as "%3F".
             * This is the Hex representation of '?' (Question Mark).
             * 
             * Question Mark may, however, be exempt from percent-encoding, so we use '?'.
             * Whether or not it is then percent-encoded depends on the exemptions parameter.
             * 
             * TODO You might like to consider using the standard Replacement Character instead.
             */
            if (codePoint >>> 11 == 0x1B) {               // 0xD8_00 <= codePoint <= 0xDF_FF
                // TODO Consider REPLACEMENT_CHARACTER_U_FFFD here instead of '?'.
                codePoint = '?';
            }
            if (exemptFromPercentEncoding.get            (codePoint)) {
                sb.append                         ((char) codePoint);
                return;
            }
            for (final var utfByte : encodeTo_UTF_8_bytes(codePoint)) {
                sb.append('%');
                sb.append(HEX_FORMAT_UPPER.toHexDigits(utfByte));
            }
        });
    }

    private static byte[] encodeTo_UTF_8_bytes(int codePoint) {
        /*
         * See sun.nio.cs.UTF_8 for Legal UTF-8 Byte Sequences.
         * 
         * Note:
         * Prior to November 2003, UTF-8 permitted Codepoints requiring one to six Bytes.
         * Now, RFC 3629 explicitly prohibits that, allowing for just one to four Bytes.
         * That makes UTF-8 & UTF-16 compatible.
         * The following logic can, however, handle both paradigms...
         */
        if (codePoint < 0x80) {
            return new byte[] {(byte) codePoint}; // 1-Byte Codepoints are simple & MUST be excluded here anyway.
        }
        final var bitCount            = Integer.SIZE - Integer.numberOfLeadingZeros(codePoint);
        final var utf8byteCount       = (bitCount + 3) / 5;        // Yields incorrect result for 1-Byte Codepoints (which we excluded, above)
        final var utf8firstBytePrefix = 0x3F_00 >>> utf8byteCount; // 2 to 6 1-bits right-shifted into Low-Order Byte, depending on Byte-Count.

        final var utf8bytes           = new byte[utf8byteCount];

        for (int i=utf8byteCount - 1; i >= 0; i--) { // (fill the Byte Array from right to left)

            if (i == 0) {
                utf8bytes[i] = (byte) (utf8firstBytePrefix | (0x3F  &  codePoint)); // First-Byte Prefix + trailing 6 bits
            } else {
                utf8bytes[i] = (byte) (0x80                | (0x3F  &  codePoint)); // Other-Byte Prefix + trailing 6 bits
            }
            codePoint >>>= 6;  // Shift right to ready the next 6 bits (or, for 1st byte, as many as remain)
        }
        return  utf8bytes;
    }

    public  static final int      NULL_PORT = -1;

    private              String   scheme    = null;
    private              String   userInfo  = null;
    private              String   host      = null;
    private              int      port      = NULL_PORT;
    private              String   path      = null;
    public               Query    query     = null;
    public               Fragment fragment  = null;

    public  UriBuilder setScheme  (final String scheme)   {this.scheme   =             scheme;    return this;}
    public  UriBuilder setUserInfo(final String userInfo) {this.userInfo =             userInfo;  return this;}
    public  UriBuilder setHost    (final String host)     {this.host     =             host;      return this;}
    public  UriBuilder setPort    (final int    port)     {this.port     =             port;      return this;}
    public  UriBuilder setPath    (final String path)     {this.path     =             path;      return this;}
    public  UriBuilder setQuery   (final Query  query)    {this.query    =             query;     return this;}
    public  UriBuilder setQuery   (final String rawQuery) {this.query    = Query   .of(rawQuery); return this;}
    public  UriBuilder setFragment(final String fragment) {this.fragment = Fragment.of(fragment); return this;}

    public  URI build() throws URISyntaxException {

        final var prefixURI = new URI(this.scheme, this.userInfo, this.host, this.port, this.path, /* Query  */ null, /* Fragment  */ null);

        final var sb        = new StringBuilder(prefixURI.toString());

        if (this.query    != null) {
            sb.append('?').append(this.query   .encoded);
        }
        if (this.fragment != null) {
            sb.append('#').append(this.fragment.encoded);
        }
        final var uri = new URI(sb.toString());

        System.out.println("Native.....: " + prefixURI);
        System.out.println("Generated..: " + uri);
        System.out.println();

        return    uri;
    }

    public  static  UriBuilder newBuilder() {
        return new UriBuilder();
    }

    private static BitSet bitSetOf(final int...    bitIndices) {
        return IntStream.of(bitIndices).collect(() -> new BitSet(RFC_3986_BITSET_LENGTH), BitSet :: set, BitSet :: or);
    }

    private static BitSet bitSetOr(final BitSet... bitSets) {
        return    Stream.of(bitSets)   .collect(() -> new BitSet(RFC_3986_BITSET_LENGTH), BitSet :: or,  BitSet :: or);
    }

    private static BitSet bitSetRangeInclusive(final int fromIndex, final int toIndex) {

        final var newBitSet =                         new BitSet(RFC_3986_BITSET_LENGTH);
        ;         newBitSet.set(fromIndex, toIndex + 1);
        return    newBitSet;
    }

    private static String toString(final BitSet bitSet) {
        return bitSet.stream().collect(StringBuilder :: new, (s, i) -> s.append((char) i), StringBuilder :: append).toString();
    }
}
