Not all Base64 is created equal
TIL not all Base64 encoding is created equal
I have a fun, kind of hacky project where I wanted to embed a SQL query in a URL. Since I knew that % has a special meaning in a URL (it starts a percent-encoded escape), and that we’re very likely to see a % in our SQL queries in the form of a LIKE '%statement%' clause, I decided to Base64-encode the query string.
Python’s base64.b64encode and base64.b64decode to the rescue!
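As a minimal sketch of that first attempt (the query text and variable names are made up for illustration):

```python
import base64

# Hypothetical query; any SQL containing % would do.
query = "SELECT * FROM users WHERE name LIKE '%smith%'"

# b64encode works on bytes, so encode the text first.
token = base64.b64encode(query.encode("utf-8")).decode("ascii")

# Decoding the untouched token gives the original query back.
restored = base64.b64decode(token).decode("utf-8")
print(restored == query)  # True
```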
This works, until it doesn’t ☹️
The problem with this solution is that b64encode does not always output URL-safe bytes. The standard Base64 alphabet from RFC 4648 includes + and /, which are both reserved characters in URLs.
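A quick way to see both characters show up is to encode bytes picked to hit the last two symbols of the alphabet (the input here is contrived for the demonstration):

```python
import base64

# Two bytes chosen so the encoded form uses symbols 62 and 63 of the alphabet.
raw = b"\xfb\xff"
standard = base64.b64encode(raw)
print(standard)  # b'+/8='
```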
If you try to decode “base64 encoded” bytes from a URL using these methods, you might see an error like:
Invalid base64-encoded string: number of data characters (<number>) cannot be 1 more than a multiple of 4
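One plausible way to end up with an error like this, assuming the token is dropped into a query string without further escaping (the parameter name q is hypothetical), is that query-string parsing turns each + into a space before the decoder ever sees it:

```python
import base64
import binascii
from urllib.parse import parse_qs

# A single byte whose standard Base64 form starts with +.
token = base64.b64encode(b"\xfb").decode("ascii")  # "+w=="

# Hypothetical query string with the token pasted in unescaped.
query_string = "q=" + token

# parse_qs treats + as an encoded space, so the token is altered in transit.
received = parse_qs(query_string)["q"][0]  # " w=="

try:
    base64.b64decode(received)
    error = None
except binascii.Error as exc:
    # The space is ignored by the decoder, leaving too few data characters.
    error = exc

print(repr(received), error)
```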
Luckily, the next section of RFC 4648 describes Base64 encoding with a URL and filename safe alphabet, which replaces + with - and / with _.
Even luckier, Python’s base64 module also includes functions for this section of the RFC: urlsafe_b64encode and urlsafe_b64decode.
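Here is a small sketch of the URL-safe pair in use, with a made-up query and the same kind of contrived bytes that force the two special symbols:

```python
import base64

# Hypothetical query, encoded with the URL and filename safe alphabet.
query = "SELECT * FROM users WHERE name LIKE '%smith%'"
token = base64.urlsafe_b64encode(query.encode("utf-8")).decode("ascii")
restored = base64.urlsafe_b64decode(token).decode("utf-8")

# Bytes that map to the last two alphabet symbols show the difference.
raw = b"\xfb\xff"
standard = base64.b64encode(raw)       # b'+/8='
safe = base64.urlsafe_b64encode(raw)   # b'-_8='
print(standard, safe, restored == query)
```

Note that the output can still end in = padding, so it may be worth checking how your URL handling treats that character.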
If you have even more specific needs, you can check out the altchars= parameter of the base64.b64encode and base64.b64decode functions, which lets you pick the two characters used in place of + and /.
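For illustration, here is a sketch using an arbitrary pair of replacement characters (the choice of . and _ is just an example, not a recommendation):

```python
import base64

raw = b"\xfb\xff"

# altchars takes exactly two bytes: the stand-ins for + and /, in that order.
custom = base64.b64encode(raw, altchars=b"._")       # b'._8='
decoded = base64.b64decode(custom, altchars=b"._")   # the original bytes again

# Passing b"-_" reproduces what urlsafe_b64encode does.
same_as_urlsafe = base64.b64encode(raw, altchars=b"-_") == base64.urlsafe_b64encode(raw)
print(custom, decoded == raw, same_as_urlsafe)
```

The same altchars value has to be passed when decoding, otherwise the decoder will not recognise the substituted characters.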