Encoding URL query parameters in Java [duplicate]

Question

How does one encode query parameters to go on a url in Java? I know, this seems like an obvious and already asked question.

There are two subtleties I'm not sure of:

Should spaces be encoded on the url as "+" or as "%20"? In Chrome, if I type in "http://google.com/foo=?bar me", Chrome changes it to be encoded with %20.
Is it necessary/correct to encode colons ":" as %3A? Chrome doesn't.

Notes:

java.net.URLEncoder.encode doesn't seem to work; it seems to be meant for encoding form-submission data. For example, it encodes a space as + instead of %20, and it encodes the colon, which isn't necessary.
java.net.URI doesn't encode query parameters
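
To make the second note concrete, here is a minimal sketch of what the multi-argument java.net.URI constructor does with a raw query. It quotes characters that are illegal in a URI (such as a space) but leaves & and = alone, so it cannot protect a parameter value that contains them. The class name, method name and sample query are made up for illustration.

```java
import java.net.URI;
import java.net.URISyntaxException;

public class UriQueryQuotingDemo {

    // Builds a URI from parts; the constructor quotes only characters that are illegal in a URI.
    static String build(String rawQuery) {
        try {
            return new URI("http", "google.com", "/foo", rawQuery, null).toString();
        } catch (URISyntaxException e) {
            throw new IllegalArgumentException(e);
        }
    }

    public static void main(String[] args) {
        // The space becomes %20, but '&' and '=' inside the intended value "a&b=c d" are untouched.
        System.out.println(build("q=a&b=c d")); // http://google.com/foo?q=a&b=c%20d
    }
}
```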

This question looks useful: stackoverflow.com/questions/444112/…
– waterlooalex, Commented Mar 16, 2011 at 19:14
The structure of the query part is server-dependent, though most servers expect application/x-www-form-urlencoded key/value pairs. See here for more: illegalargumentexception.blogspot.com/2009/12/…
– McDowell, Commented Mar 16, 2011 at 20:18
This question is similar to: How do I encode URI parameter values?. If you believe it’s different, please edit the question, make it clear how it’s different and/or how the answers on that question are not helpful for your problem.
– miken32, Commented May 29 at 14:14

Buhake Sindi · Accepted Answer · 2024-02-28 01:18:23Z

152

java.net.URLEncoder.encode(String s, String encoding) can help too. It follows the HTML form encoding application/x-www-form-urlencoded.

URLEncoder.encode(query, "UTF-8");

On the other hand, percent-encoding (also known as URL encoding) encodes a space as %20. A colon is allowed unencoded in the query component under RFC 3986, so a strict percent-encoder may leave : as a colon, whereas URLEncoder turns it into %3A.
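
A small sketch of what URLEncoder actually emits may help here. The class and method names are invented for illustration, and the Charset overload assumes Java 10 or later.

```java
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;

public class FormEncodingDemo {

    // application/x-www-form-urlencoded encoding of a single value.
    static String formEncode(String value) {
        return URLEncoder.encode(value, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        System.out.println(formEncode("bar me"));  // bar+me      (space becomes '+')
        System.out.println(formEncode("a:b"));     // a%3Ab       (colon is encoded)
        System.out.println(formEncode("a/b~c*d")); // a%2Fb%7Ec*d ('*' is kept, '~' is not)
    }
}
```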

edited Feb 28, 2024 at 1:18

answered Mar 16, 2011 at 18:49

Buhake Sindi

13 Comments

waterlooalex Over a year ago

I mentioned that I didn't think that does url encoding, instead it encodes data to be submitted via a form. comments?

waterlooalex Over a year ago

I ended up using URLEncoder.encode and replacing "+" with "%20"

golimar Over a year ago

It encodes slashes to "%2F", shouldn't it leave the URL slashes as they are?

Pijusn Over a year ago

@golimar No, it shouldn't. You are supposed to give it parameter value only and not the whole URL. Consider example http://example.com/?url=http://example.com/?q=c&sort=name. Should it encode &sort=name or not? There is no way to distinguish value from the URL. That is the exact reason why you need value encoding in the first place.

Stijn de Witt Over a year ago

But actually, slash is a legal character in querystring parameter values.

miken32 · Accepted Answer · 2025-05-29 14:16:07Z

26

Unfortunately, URLEncoder.encode() does not produce valid percent-encoding (as specified in RFC 3986).

URLEncoder.encode() encodes everything just fine, except space is encoded to "+". All the Java URI encoders that I could find only expose public methods to encode the query, fragment, path parts etc. - but don't expose the "raw" encoding. This is unfortunate as fragment and query are allowed to encode space to +, so we don't want to use them. Path is encoded properly but is "normalized" first so we can't use it for 'generic' encoding either.

Best solution I could come up with:

return URLEncoder.encode(raw, StandardCharsets.UTF_8).replaceAll("\\+", "%20");

If replaceAll() is too slow for you, I guess the alternative is to roll your own encoder...
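
As a rough sketch of how that one-liner might be used to assemble a whole query string, the following encodes each key and value separately and only then joins them with = and &. The helper names and sample parameters are hypothetical, and the Charset overload assumes Java 10 or later.

```java
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.stream.Collectors;

public class QueryStringSketch {

    // Encode one key or value; URLEncoder emits '+' for a space, so swap it for "%20".
    static String encodeComponent(String raw) {
        return URLEncoder.encode(raw, StandardCharsets.UTF_8).replace("+", "%20");
    }

    // Join already-encoded pairs; the delimiters '=' and '&' are added after encoding.
    static String toQueryString(Map<String, String> params) {
        return params.entrySet().stream()
                .map(e -> encodeComponent(e.getKey()) + "=" + encodeComponent(e.getValue()))
                .collect(Collectors.joining("&"));
    }

    public static void main(String[] args) {
        Map<String, String> params = new LinkedHashMap<>();
        params.put("q", "bar me");
        params.put("tag", "a+b&c");
        System.out.println(toQueryString(params)); // q=bar%20me&tag=a%2Bb%26c
    }
}
```

A literal + in the input is already %2B by the time replace runs, so the substitution only touches the + characters that stood for spaces.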

edited May 29 at 14:16

miken32

answered Jul 30, 2015 at 11:49

Kosta

5 Comments

Cornelius Dol Over a year ago

+ is a perfectly valid encoding of a space.

Ilya Serbis Over a year ago

@LawrenceDol it's true but sometimes + may be interpreted incorrectly - take a look at C# blogs.msdn.microsoft.com/yangxind/2006/11/08/…

Utku Özdemir Over a year ago

This. I compared various alternatives against Javascript's encodeURIComponent method output, and this was the only exact match for the ones I tried (queries with spaces, Turkish and German special characters).

Davut Gürbüz Over a year ago

Ahmet+Mehmet Demir => Ahmet%2BMehmet+Demir , According to my understanding the only problem here is MIME type application/x-www-form-urlencoded. In such cases space is encoded to + char, if the intention was searching two entries in a web form, like google search by a GET request. URI RFC allows + char as a valid char. So, it doesn't need to be escaped normally.

Cédric de Launois Nov 19 at 14:52

For better performance, use this instead: URLEncoder.encode(raw, StandardCharsets.UTF_8).replace("+", "%20")

Community · Accepted Answer · 2017-05-23 12:25:43Z

16

EDIT: URIUtil is no longer available in more recent versions, better answer at Java - encode URL or by Mr. Sindi in this thread.

URIUtil of Apache httpclient is really useful, although there are some alternatives

URIUtil.encodeQuery(url);

For example, it encodes space as "+" instead of "%20"

Both are perfectly valid in the right context. Although if you really preferred you could issue a string replace.

edited May 23, 2017 at 12:25

CommunityBot

answered Mar 16, 2011 at 18:41

Johan Sjöberg

7 Comments

DaShaun Over a year ago

I would have to agree. Use HttpClient, you will be much happier.

waterlooalex Over a year ago

That looks promising, got a link by chance? I'm googling but finding many.

waterlooalex Over a year ago

This method doesn't seem to be present in HttpClient 4.1? hc.apache.org/httpcomponents-client-ga/httpclient/apidocs/org/…

Johan Sjöberg Over a year ago

@Alex, hmm that's annoying, I've always used that routine with good results. One idea is to grab the source code from the 3 release since they now obviously didn't want to maintain it anymore.

Jesse Glick Over a year ago

URIUtil.encodeWithinQuery is what you would use to encode an individual query parameter, which is what the original question seemed to be asking.

Community · Accepted Answer · 2021-10-07 05:51:50Z

11

It is not necessary to encode a colon as %3A in the query, although doing so is not illegal.

URI         = scheme ":" hier-part [ "?" query ] [ "#" fragment ]
query       = *( pchar / "/" / "?" )
pchar         = unreserved / pct-encoded / sub-delims / ":" / "@"
unreserved    = ALPHA / DIGIT / "-" / "." / "_" / "~"
pct-encoded   = "%" HEXDIG HEXDIG
sub-delims    = "!" / "$" / "&" / "'" / "(" / ")" / "*" / "+" / "," / ";" / "="

It also seems that only percent-encoded spaces are valid, as I doubt that space is an ALPHA or a DIGIT

Look to the URI specification (RFC 3986) for more details.
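
The grammar above can be turned into a small membership check. This is only a sketch of the query production as quoted; the class, constant and method names are invented for illustration.

```java
public class QueryCharCheck {

    private static final String UNRESERVED_MARKS = "-._~";        // unreserved, besides ALPHA / DIGIT
    private static final String SUB_DELIMS       = "!$&'()*+,;=";
    private static final String PCHAR_EXTRA      = ":@";          // added by pchar
    private static final String QUERY_EXTRA      = "/?";          // added by query

    // True if c may appear literally in a query; '%' is excluded because it only introduces pct-encoded.
    static boolean isAllowedUnencodedInQuery(char c) {
        boolean alpha = (c >= 'A' && c <= 'Z') || (c >= 'a' && c <= 'z');
        boolean digit = c >= '0' && c <= '9';
        return alpha || digit
                || UNRESERVED_MARKS.indexOf(c) >= 0
                || SUB_DELIMS.indexOf(c) >= 0
                || PCHAR_EXTRA.indexOf(c) >= 0
                || QUERY_EXTRA.indexOf(c) >= 0;
    }

    public static void main(String[] args) {
        System.out.println(isAllowedUnencodedInQuery(':')); // true
        System.out.println(isAllowedUnencodedInQuery(' ')); // false
        System.out.println(isAllowedUnencodedInQuery('+')); // true
    }
}
```

Note that this only says which characters are syntactically legal. A legal character such as & or + may still need encoding inside a value if the server gives it a special meaning.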

edited Oct 7, 2021 at 5:51

CommunityBot

answered Mar 16, 2011 at 18:50

Edwin Buck

4 Comments

tc. Over a year ago

But doing so can change the meaning of the URI, since the interpretation of the query string is up to the server. If you are producing an application/x-www-form-urlencoded query string, either is fine. If you are fixing up a URL that the user typed/pasted in, : should be left alone.

Edwin Buck Over a year ago

@tc. You are right, if colon is being used as a general delimiter (page 12 of the RFC); however, if it is not being used as a general delimiter, then both encodings should resolve identically.

Adam Gent Over a year ago

You also have to be careful as URLs are not really a subset of URI: adamgent.com/post/25161273526/urls-are-not-a-subset-of-uris

Marcelino Lucero III Over a year ago

A colon is %3A, not %3B (that's a semicolon), for anybody who is manually encoding.

rfeak · Accepted Answer · 2011-03-16 19:41:42Z

4

The built in Java URLEncoder is doing what it's supposed to, and you should use it.

A "+" or "%20" are both valid replacements for a space character in a URL. Either one will work.

A ":" should be encoded, as it's a separator character. i.e. http://foo or ftp://bar. The fact that a particular browser can handle it when it's not encoded doesn't make it correct. You should encode them.

As a matter of good practice, be sure to use the method that takes a character encoding parameter. UTF-8 is generally used there, but you should supply it explicitly.

URLEncoder.encode(yourUrl, "UTF-8");
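
As the comments below point out, this call is meant for a parameter value, not for the URL as a whole. A minimal sketch, reusing the example URL from an earlier comment in this thread; the class and method names are invented for illustration.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLEncoder;

public class NestedUrlParamDemo {

    // Appends nestedUrl as the value of a "url" parameter, encoding only that value.
    static String withUrlParam(String base, String nestedUrl) {
        try {
            return base + "?url=" + URLEncoder.encode(nestedUrl, "UTF-8");
        } catch (UnsupportedEncodingException e) {
            // UTF-8 is a charset every Java platform must support.
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(withUrlParam("http://example.com/", "http://example.com/?q=c&sort=name"));
        // http://example.com/?url=http%3A%2F%2Fexample.com%2F%3Fq%3Dc%26sort%3Dname
    }
}
```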

answered Mar 16, 2011 at 19:41

rfeak

4 Comments

tc. Over a year ago

+ is only a representation of space in application/x-www-form-urlencoded; it is not guaranteed to work even when restricted to HTTP. Similarly, : is valid in a query string and should not be converted to %3A; a server can choose to interpret them differently.

To Kra Over a year ago

This method also encodes the slashes and other characters of the whole URL, e.g. http:// becomes http%3A%2F%2F, which is not correct.

beldaz Over a year ago

@ToKra you are not supposed to encode the http:// part. The method is for query parameters and encoded form data. If, however, you wanted to pass the URL of another website as a query parameter, THEN you would want to encode it to avoid confusing the URL parser.

beldaz Over a year ago

@tc My reading of w3.org/TR/html4/interact/forms.html#h-17.13.3.3 is that all GET form data is encoded as application/x-www-form-urlencoded content type. Doesn't that mean it must work for HTTP?

aristotll · Accepted Answer · 2021-10-13 02:51:31Z

3

I just want to add another way to resolve this problem.

If your project depends on spring web, you can use their utils.

import java.nio.charset.StandardCharsets;

import org.springframework.web.util.UriUtils;

UriUtils.encode("vip:104534049:5", StandardCharsets.UTF_8);

Output:

vip%3A104534049%3A5

answered Oct 13, 2021 at 2:51

aristotll

Dave The Dane · Accepted Answer · 2025-05-29 13:07:27Z

Here's an attempt to percent-encode as little as possible, but as much as necessary.

It's very much work-in-progress, but may be of some use?

import java.net.URI;
import java.net.URISyntaxException;
import java.util.BitSet;
import java.util.HexFormat;
import java.util.Objects;
import java.util.stream.Collectors;
import java.util.stream.IntStream;
import java.util.stream.Stream;

/**
 * A routine to build URIs, percent-encoding only as necessary, as defined in
 * <a href="https://www.rfc-editor.org/rfc/rfc3986">RFC 3986</a>.<br>
 * <br>
 * The focus of this prototype is the Query and Fragment components.<br>
 * Everything else is delegated to the...<br>
 * {@link  URI#URI(String, String, String, int, String, String, String)}<br>
 * ...constructor, passing null for both Query & Fragment.<br>
 * <br>
 * After correctly percent-encoding Query & Fragment,
 * they are appended to
 * {@link URI#toString()}.<br>
 * <br>
 * One exception was made to the RFC 3986 encoding:<br>
 * RFC 3986 specifies '&' and '=' are exempt from percent-encoding in Queries.<br>
 * But they are both used as delimiters when providing key/value pairs.<br>
 * If either key or value should contain these characters,
 * parsing the resultant Query could be tricky.<br>
 * This class provides a mechanism to percent-encode them in the keys & values,
 * whilst leaving them untouched when assembling the key/value pairs.
 */
public final class UriBuilder {

    public static void main(final String[] args) throws URISyntaxException {

        final var qALL      = toString(ENCODING_EXEMPT_4_PLAINTEXT_QUERY);

        final var host      = "stackoverflow.com";
        final var path      = "/questions/5330104/encoding-url-query-parameters-in-java";

        newBuilder().setScheme("https").setHost(host).setPath(path).build();
        newBuilder().setScheme("https").setHost(host).setPath(path).setQuery(                              ""        )                  .build();
        newBuilder().setScheme("https").setHost(host).setPath(path).setQuery(                              qALL      )                  .build();
        newBuilder().setScheme("https").setHost(host).setPath(path).setQuery(Query.of(QueryKeyValuePair.of(qALL, "")))                  .build();
        newBuilder().setScheme("https").setHost(host).setPath(path).setQuery(Query.of(QueryKeyValuePair.of("",   "")))                  .build();
        newBuilder().setScheme("https").setHost(host).setPath(path).setQuery(Query.of()                              ).setFragment(qALL).build();
    }

    public static record QueryKeyValuePair(String rawKey, String rawValue, String encoded) {

        public static QueryKeyValuePair of(final String key, final String rawValue) {

            final var     keyValueLength = key.length() + rawValue.length();
            final var     sb             = new StringBuilder(keyValueLength * 3);

            if (keyValueLength == 0) {
                return null;
            }
            percentEncode(sb, key,      ENCODING_EXEMPT_4_KEY_PAIR_QUERY);
            ;             sb.append('=');
            percentEncode(sb, rawValue, ENCODING_EXEMPT_4_KEY_PAIR_QUERY);

            return new QueryKeyValuePair(key, rawValue, sb.toString());
        }
    }

    public static record Query(QueryKeyValuePair[] pairs, String rawPlainTextQuery, String encoded) {

        public static Query of(final QueryKeyValuePair... pairs) {

            if (pairs.length == 0) {
                return null;
            }
            final var percentEncoded = Stream.of(pairs).filter(Objects :: nonNull).map(p -> p.encoded).collect(Collectors.joining("&"));

            if (percentEncoded.isEmpty()) {
                return null;
            } else {
                return new Query(pairs, null, percentEncoded);
            }
        }
        public static Query of(final String rawPlainTextQuery) {

            if (rawPlainTextQuery.isEmpty()) {
                return null;
            }
            final var     sb = new StringBuilder(rawPlainTextQuery.length() * 3);

            percentEncode(sb, rawPlainTextQuery, ENCODING_EXEMPT_4_PLAINTEXT_QUERY);

            return new Query(null, rawPlainTextQuery, sb.toString());
        }
    }

    public static record Fragment(String fragment, String encoded) {

        public static Fragment of(final String fragment) {

            if (fragment.isEmpty()) {
                return null;
            }
            final var     sb = new StringBuilder(fragment.length() * 3);

            percentEncode(sb, fragment, ENCODING_EXEMPT_4_FRAGMENT);

            return new Fragment(fragment, sb.toString());
        }
    }

    private static final HexFormat HEX_FORMAT_UPPER             = HexFormat.of().withUpperCase();

    private static final char      REPLACEMENT_CHARACTER_U_FFFD = '\uFFFD';
    private static final int       RFC_3986_BITSET_LENGTH       = 128;

    private static final BitSet    ENCODING_EXEMPT_4_KEY_PAIR_QUERY;
    private static final BitSet    ENCODING_EXEMPT_4_PLAINTEXT_QUERY;
    private static final BitSet    ENCODING_EXEMPT_4_FRAGMENT;
    ;       static {
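        // Note: ':' and '@' are not sub-delims in RFC 3986; they are the extra characters pchar allows, folded in here.
        final var SUB_DELIMS_EXCEPT_AND_EQUALS = bitSetOf('!',  '$',  '\'',  '(',  ')',  '*',  '+',  ',',  ';',  ':',  '@');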
        final var SUB_DELIMS                   = bitSetOr(SUB_DELIMS_EXCEPT_AND_EQUALS, bitSetOf('&',   '='));

        final var DIGIT                        = bitSetRangeInclusive('0', '9');
        final var ALPHA                        = bitSetOr(
                bitSetRangeInclusive('A', 'Z'),
                bitSetRangeInclusive('a', 'z'));

        final var UNRESERVED                   = bitSetOr(ALPHA, DIGIT, bitSetOf('-',  '.',  '_',  '~'));
        /*
         * Above we defined the ABNF syntax as defined in RFC 3986 Appendix A.
         * 
         * Now we can combine them to define the percent-encoding exemptions for the various entities...
         */
        ENCODING_EXEMPT_4_KEY_PAIR_QUERY       = bitSetOr(UNRESERVED, SUB_DELIMS_EXCEPT_AND_EQUALS, bitSetOf('/',  '?'));
        ENCODING_EXEMPT_4_PLAINTEXT_QUERY      = bitSetOr(UNRESERVED, SUB_DELIMS,                   bitSetOf('/',  '?'));

        ENCODING_EXEMPT_4_FRAGMENT             = ENCODING_EXEMPT_4_PLAINTEXT_QUERY;
    }

    private static void percentEncode(final StringBuilder sb, final String rawValue, final BitSet exemptFromPercentEncoding) {

        rawValue.codePoints().forEach(codePoint -> {
            /*
             * Surrogate Pairs will have both Surrogates in the Codepoint.
             * For orphan Surrogates, the Codepoint will contain only the orphan (d800:dfff).
             * 
             * java.net.URLEncoder percent-encodes orphan Surrogates as "%3F".
             * This is the Hex representation of '?' (Question Mark).
             * 
             * Question Mark may, however, be exempt from percent-encoding, so we use '?'.
             * Whether or not it is then percent-encoded depends on the exemptions parameter.
             * 
             * TODO You might like to consider using the standard Replacement Character instead.
             */
            if (codePoint >>> 11 == 0x1B) {               // 0xD8_00 <= codePoint <= 0xDF_FF
                codePoint = '?';                          // TODO consider REPLACEMENT_CHARACTER_U_FFFD instead
            }
            if (exemptFromPercentEncoding.get            (codePoint)) {
                sb.append                         ((char) codePoint);
                return;
            }
            for (final var utfByte : encodeTo_UTF_8_bytes(codePoint)) {
                sb.append('%');
                sb.append(HEX_FORMAT_UPPER.toHexDigits(utfByte));
            }
        });
    }

    private static byte[] encodeTo_UTF_8_bytes(int codePoint) {
        /*
         * See sun.nio.cs.UTF_8 for Legal UTF-8 Byte Sequences.
         * 
         * Note:
         * Prior to November 2003, UTF-8 permitted Codepoints requiring one to six Bytes.
         * Now, RFC 3629 explicitly prohibits that, allowing for just one to four Bytes.
         * That makes UTF-8 & UTF-16 compatible.
         * The following logic can, however, handle both paradigms...
         */
        if (codePoint < 0x80) {
            return new byte[] {(byte) codePoint}; // 1-Byte Codepoints are simple & MUST be excluded here anyway.
        }
        final var bitCount            = Integer.SIZE - Integer.numberOfLeadingZeros(codePoint);
        final var utf8byteCount       = (bitCount + 3) / 5;        // Yields incorrect result for 1-Byte Codepoints (which we excluded, above)
        final var utf8firstBytePrefix = 0x3F_00 >>> utf8byteCount; // 2 to 6 1-bits right-shifted into Low-Order Byte, depending on Byte-Count.

        final var utf8bytes           = new byte[utf8byteCount];

        for (int i=utf8byteCount - 1; i >= 0; i--) { // (fill the Byte Array from right to left)

            if (i == 0) {
                utf8bytes[i] = (byte) (utf8firstBytePrefix | (0x3F  &  codePoint)); // First-Byte Prefix + trailing 6 bits
            } else {
                utf8bytes[i] = (byte) (0x80                | (0x3F  &  codePoint)); // Other-Byte Prefix + trailing 6 bits
            }
            codePoint >>>= 6;  // Shift right to ready the next 6 bits (or, for 1st byte, as many as remain)
        }
        return  utf8bytes;
    }

    public  static final int      NULL_PORT = -1;

    private              String   scheme    = null;
    private              String   userInfo  = null;
    private              String   host      = null;
    private              int      port      = NULL_PORT;
    private              String   path      = null;
    public               Query    query     = null;
    public               Fragment fragment  = null;

    public  UriBuilder setScheme  (final String scheme)   {this.scheme   =             scheme;    return this;}
    public  UriBuilder setUserInfo(final String userInfo) {this.userInfo =             userInfo;  return this;}
    public  UriBuilder setHost    (final String host)     {this.host     =             host;      return this;}
    public  UriBuilder setPort    (final int    port)     {this.port     =             port;      return this;}
    public  UriBuilder setPath    (final String path)     {this.path     =             path;      return this;}
    public  UriBuilder setQuery   (final Query  query)    {this.query    =             query;     return this;}
    public  UriBuilder setQuery   (final String rawQuery) {this.query    = Query   .of(rawQuery); return this;}
    public  UriBuilder setFragment(final String fragment) {this.fragment = Fragment.of(fragment); return this;}

    public  URI build() throws URISyntaxException {

        final var prefixURI = new URI(this.scheme, this.userInfo, this.host, this.port, this.path, /* Query  */ null, /* Fragment  */ null);

        final var sb        = new StringBuilder(prefixURI.toString());

        if (this.query    != null) {
            sb.append('?').append(this.query   .encoded);
        }
        if (this.fragment != null) {
            sb.append('#').append(this.fragment.encoded);
        }
        final var uri = new URI(sb.toString());

        System.out.println("Native.....: " + prefixURI);
        System.out.println("Generated..: " + uri);
        System.out.println();

        return    uri;
    }

    public  static  UriBuilder newBuilder() {
        return new UriBuilder();
    }

    private static BitSet bitSetOf(final int...    bitIndices) {
        return IntStream.of(bitIndices).collect(() -> new BitSet(RFC_3986_BITSET_LENGTH), BitSet :: set, BitSet :: or);
    }

    private static BitSet bitSetOr(final BitSet... bitSets) {
        return    Stream.of(bitSets)   .collect(() -> new BitSet(RFC_3986_BITSET_LENGTH), BitSet :: or,  BitSet :: or);
    }

    private static BitSet bitSetRangeInclusive(final int fromIndex, final int toIndex) {

        final var newBitSet =                         new BitSet(RFC_3986_BITSET_LENGTH);
        ;         newBitSet.set(fromIndex, toIndex + 1);
        return    newBitSet;
    }

    private static String toString(final BitSet bitSet) {
        return bitSet.stream().collect(StringBuilder :: new, (s, i) -> s.append((char) i), StringBuilder :: append).toString();
    }
}
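
As a cross-check on the hand-rolled encodeTo_UTF_8_bytes above, the standard library can produce the same bytes for well-formed code points. This is a minimal sketch, not part of the answer's class; the class and method names are made up for illustration, and it percent-encodes every code point it is given without consulting any exemption set.

```java
import java.nio.charset.StandardCharsets;

public class Utf8PercentSketch {

    // Percent-encodes a single code point as the %XX form of each of its UTF-8 bytes.
    static String percentEncodeCodePoint(int codePoint) {
        byte[] utf8 = new String(Character.toChars(codePoint)).getBytes(StandardCharsets.UTF_8);
        StringBuilder sb = new StringBuilder(utf8.length * 3);
        for (byte b : utf8) {
            sb.append(String.format("%%%02X", b & 0xFF)); // mask to treat the byte as unsigned
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        System.out.println(percentEncodeCodePoint(' '));     // %20
        System.out.println(percentEncodeCodePoint(0xE9));    // %C3%A9
        System.out.println(percentEncodeCodePoint(0x20AC));  // %E2%82%AC
        System.out.println(percentEncodeCodePoint(0x1F600)); // %F0%9F%98%80
    }
}
```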
