What characters can go into a valid HTTP URL?
Section 5 of RFC 1738 – Uniform Resource Locators specifies the format of an HTTP URL:
    httpurl = "http://" hostport [ "/" hpath [ "?" search ]]
    [definition of hostport omitted]
    hpath = hsegment *[ "/" hsegment ]
    hsegment = *[ uchar | ";" | ":" | "@" | "&" | "=" ]
    search = *[ uchar | ";" | ":" | "@" | "&" | "=" ]
    uchar = unreserved | escape
    unreserved = alpha | digit | safe | extra
    alpha = lowalpha | hialpha
    lowalpha = "a" | "b" | "c" | "d" | "e" | "f" | "g" | "h" | "i" | "j" | "k" | "l" | "m" | "n" | "o" | "p" | "q" | "r" | "s" | "t" | "u" | "v" | "w" | "x" | "y" | "z"
    hialpha = "A" | "B" | "C" | "D" | "E" | "F" | "G" | "H" | "I" | "J" | "K" | "L" | "M" | "N" | "O" | "P" | "Q" | "R" | "S" | "T" | "U" | "V" | "W" | "X" | "Y" | "Z"
    digit = "0" | "1" | "2" | "3" | "4" | "5" | "6" | "7" | "8" | "9"
    safe = "$" | "-" | "_" | "." | "+"
    extra = "!" | "*" | "'" | "(" | ")" | ","
    escape = "%" hex hex
    hex = digit | "A" | "B" | "C" | "D" | "E" | "F" | "a" | "b" | "c" | "d" | "e" | "f"
The “path” and “query string” parts can contain letters, digits, and any of the characters $-_.+!*'(),;:@&= (the path also uses “/” to separate its segments).
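As a rough illustration, the grammar can be translated into a regular expression. This is only a sketch in Python: the names UNRESERVED, ESCAPE, SEGMENT, HPATH and is_valid_path_and_query are made up for this example, and it checks only the part of the URL that follows hostport.

```python
import re

# unreserved = alpha | digit | safe | extra (written as a regex character class)
UNRESERVED = r"A-Za-z0-9$\-_.+!*'(),"
# escape = "%" hex hex
ESCAPE = r"%[0-9A-Fa-f]{2}"
# hsegment and search share one alphabet: uchar plus ; : @ & =
SEGMENT = rf"(?:[{UNRESERVED};:@&=]|{ESCAPE})*"
# hpath = hsegment *[ "/" hsegment ]
HPATH = rf"{SEGMENT}(?:/{SEGMENT})*"
# [ "/" hpath [ "?" search ]] -- the whole tail is optional
PATH_AND_QUERY = re.compile(rf"(?:/{HPATH}(?:\?{SEGMENT})?)?")


def is_valid_path_and_query(text: str) -> bool:
    """Return True if text matches the path/query part of the quoted grammar."""
    return PATH_AND_QUERY.fullmatch(text) is not None
```

Under this reading of the grammar, "/a/b;c=1?x=1&y=2" and "/a%20b" are accepted, while "/a b" (raw space), "/~user" (tilde is not in the unreserved set) and "/a?b/c" (a slash inside the search part) are rejected.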
They can also contain other characters in escaped form: each byte is written as “%” (percent) followed by two hexadecimal digits, i.e. %[A-Fa-f0-9]{2}.
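The escaping rule can be sketched in a few lines of Python. The helper name escape_segment and the UTF-8 default are assumptions made for this example (RFC 1738 itself does not say which character encoding produces the bytes). The sketch also escapes the reserved characters ;:@&= and /, on the assumption that the input is data going into a single segment rather than URL syntax.

```python
# Bytes that may appear unescaped: the RFC 1738 "unreserved" set.
SAFE_BYTES = frozenset(
    b"abcdefghijklmnopqrstuvwxyz"
    b"ABCDEFGHIJKLMNOPQRSTUVWXYZ"
    b"0123456789"
    b"$-_.+!*'(),"
)


def escape_segment(text: str, encoding: str = "utf-8") -> str:
    """Percent-escape every byte of text that is not in the unreserved set."""
    out = []
    for byte in text.encode(encoding):
        if byte in SAFE_BYTES:
            out.append(chr(byte))
        else:
            # "%" followed by two hex digits, one escape per byte
            out.append(f"%{byte:02X}")
    return "".join(out)
```

For example, escape_segment("a b") gives "a%20b", and a non-ASCII character such as "é" becomes one escape per UTF-8 byte, "%C3%A9".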