Not all Base64 is created equal

#python #web

TIL not all Base64 encoding is created equal

I have a fun, kind of hacky project where I wanted to be able to embed a SQL query in a URL

Since I knew that % was a reserved character in a URL and that we’re very likely to see a % in our SQL queries in the form of a LIKE '%statement%', I decided to use Base64 encoding on the query string

Python base64.b64encode and base64.b64decode to the rescue!

This works, until it doesn’t ☹️

The problem with this solution is that b64encode does not output URL safe bytes, at least not all the time. The default RFC 4648 output includes + and / which are both URL reserved characters

If you try to decode “base64 encoded” bytes from a URL using these methods, you might see an error like:

Invalid base64-encoded string: number of data characters (<number>) cannot be 1 more than a multiple of 4

Luckliy the next section of RFC 4648 describes Base64 encoding with a URL and filename safe alphabet, which swaps + with - and / with _

Even luckier, the python base64 package also includes functions for this section of the RFC: urlsafe_b64encode and urlsafe_b64decode

If you have even more specific needs, you can check out the altchars= parameter of the base64.b64encode and base64.b64decode functions