How Base64 encoding works
Base64 converts arbitrary binary data into a string of printable ASCII characters by re-grouping bits and mapping small values to a 64-character alphabet. This page walks through the encoding algorithm step by step, shows the lookup table, explains padding, and covers the URL-safe variant.
| Property | Value |
|---|---|
| Input unit | 3 bytes (24 bits) |
| Output unit | 4 characters (6 bits each) |
| Size expansion | ~33% (4 chars per 3 bytes) |
| Alphabet size | 64 characters, plus = for padding |
| URL-safe substitution | + → - and / → _ (RFC 4648 §5) |
| Algorithm complexity | O(n) — one pass over input bytes |
| Decoding | Exact inverse of encoding — no key needed |
The core idea
A byte holds 8 bits, which means 256 possible values (0–255). Not all 256 values map to printable, safe ASCII characters — many are control codes or have special meaning in protocols. Base64 sidesteps this by using only 6 bits per output character, which gives 2⁶ = 64 possible values — exactly the 64 characters that make up the alphabet.
Because each output character carries 6 bits instead of 8, Base64 needs 4 characters to represent the same information as 3 bytes (4 × 6 = 24 bits = 3 × 8). This 4:3 expansion is where the ~33% size increase comes from.
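The 4:3 ratio can be checked with a few lines of Python. This is a minimal sketch; the helper name `encoded_length` is illustrative and assumes padded output, so the result is always a multiple of 4.

```python
import base64


def encoded_length(n_bytes: int) -> int:
    """Length of padded Base64 output for n_bytes of input."""
    # Every started 3-byte group produces exactly 4 output characters.
    return 4 * ((n_bytes + 2) // 3)


for n in (1, 2, 3, 300):
    actual = len(base64.b64encode(bytes(n)))
    print(n, "bytes ->", encoded_length(n), "characters (stdlib:", actual, ")")
```

For 300 input bytes this gives 400 characters, which is the ~33% increase described above.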
The encoding algorithm
Step 1 — Serialize to bytes
Base64 encodes bytes, not characters. If the input is text, it must first be serialized to a byte sequence. UTF-8 is the standard choice: it correctly encodes every Unicode codepoint including emoji, CJK characters, and accented letters.
h → 0x68 → 104 → 01101000
i → 0x69 → 105 → 01101001
! → 0x21 → 33 → 00100001
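The serialization step can be reproduced in Python as a rough sketch. The helper name `byte_rows` is an assumption made for this illustration; `str.encode` is the standard library call that performs the UTF-8 serialization.

```python
def byte_rows(text: str) -> list:
    """Return (hex, decimal, binary) for each UTF-8 byte of text."""
    return [(f"0x{b:02x}", b, f"{b:08b}") for b in text.encode("utf-8")]


for hex_value, decimal, bits in byte_rows("hi!"):
    print(hex_value, decimal, bits)

# A non-ASCII character serializes to more than one byte in UTF-8.
print(len("é".encode("utf-8")))  # 2
```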
Step 2 — Group into 3-byte blocks
The bytes are processed three at a time (24 bits). If the total byte count is not divisible by three, the last group is padded with zero bits to reach 24 bits.
01101000 01101001 00100001
Step 3 — Split into 6-bit groups
The 24-bit block is divided into four 6-bit values. Each value is a number between 0 and 63 that will index into the Base64 alphabet.
011010 | 000110 | 100100 | 100001
26 | 6 | 36 | 33
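One way to perform this split is with shifts and a 6-bit mask. The sketch below is illustrative only: `split_block` is a hypothetical helper that handles a single complete 3-byte block.

```python
def split_block(block: bytes) -> list:
    """Split one 3-byte block into four 6-bit values (0 to 63)."""
    if len(block) != 3:
        raise ValueError("expected exactly 3 bytes")
    # Treat the three bytes as one 24-bit big-endian integer.
    n = int.from_bytes(block, "big")
    # Take 6 bits at a time, starting from the most significant end.
    return [(n >> shift) & 0x3F for shift in (18, 12, 6, 0)]


print(split_block(b"hi!"))  # [26, 6, 36, 33]
```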
Step 4 — Look up in the alphabet
Each 6-bit value is mapped to its Base64 character using the lookup table below.
26 → a
6 → G
36 → k
33 → h
Output: aGkh
The Base64 alphabet
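The 64 characters fall into four groups: uppercase letters, lowercase letters, digits, and two symbols. The mapping (value → character):
| Value | Character |
|---|---|
| 0–25 | A–Z |
| 26–51 | a–z |
| 52–61 | 0–9 |
| 62 | + |
| 63 | / |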
The = character is a padding marker — it is not part of the 64-value alphabet, but is appended to the output to make its length a multiple of 4.
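As a sketch, the standard alphabet can be assembled from Python's `string` constants, after which the lookup is plain indexing. The name `ALPHABET` is chosen for this example.

```python
import string

# Uppercase (0-25), lowercase (26-51), digits (52-61), then + and /.
ALPHABET = string.ascii_uppercase + string.ascii_lowercase + string.digits + "+/"

values = [26, 6, 36, 33]  # the 6-bit values from the "hi!" walkthrough
print("".join(ALPHABET[v] for v in values))  # aGkh
```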
Padding with =
If the total number of input bytes is not a multiple of 3, one or two bytes of zero-padding are added before encoding, and the corresponding output characters are replaced by =:
| Input bytes | Remainder mod 3 | Padding chars | Example |
|---|---|---|---|
| Multiple of 3 | 0 | none | "yes" → eWVz |
| Remainder 1 | 1 | == | "a" → YQ== |
| Remainder 2 | 2 | = | "ab" → YWI= |
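Putting the steps and the padding rule together, a minimal encoder might look like the sketch below. It is written for readability rather than speed, `b64_encode` is a name invented for this example, and the standard library's `base64.b64encode` appears only as a cross-check.

```python
import base64

ALPHABET = "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/"


def b64_encode(data: bytes) -> str:
    """Encode bytes as padded standard Base64."""
    out = []
    for i in range(0, len(data), 3):
        chunk = data[i:i + 3]
        missing = 3 - len(chunk)  # 0, 1, or 2 bytes short of a full block
        # Fill the short final block with zero bytes to reach 24 bits.
        n = int.from_bytes(chunk + b"\x00" * missing, "big")
        chars = [ALPHABET[(n >> shift) & 0x3F] for shift in (18, 12, 6, 0)]
        if missing:
            # Characters that carry only filler bits become "=".
            chars[-missing:] = "=" * missing
        out.extend(chars)
    return "".join(out)


for text in ("yes", "a", "ab"):
    encoded = b64_encode(text.encode("utf-8"))
    expected = base64.b64encode(text.encode("utf-8")).decode("ascii")
    print(text, "->", encoded, "| matches stdlib:", encoded == expected)
```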
Decoding — running the algorithm in reverse
Decoding is the exact inverse of encoding:
- Strip any trailing = padding characters.
- Look up each character in the alphabet to get its 6-bit value.
- Concatenate groups of four 6-bit values to form 24-bit blocks, then split each block back into three bytes. A final group of two or three characters yields one or two bytes; the leftover zero bits are discarded.
- Decode the resulting byte sequence using the target charset (UTF-8 by default).
If any character in the input is not in the alphabet (and is not a padding =), the decoder should reject the input as invalid.
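The decoding steps can be sketched in Python as follows. `b64_decode` is a hypothetical helper, and it makes one assumption beyond the text: it requires correctly padded input (length a multiple of 4, at most two = characters), whereas some decoders are more lenient.

```python
ALPHABET = "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/"
VALUES = {ch: i for i, ch in enumerate(ALPHABET)}


def b64_decode(text: str) -> bytes:
    """Decode padded standard Base64, rejecting invalid input."""
    body = text.rstrip("=")
    pad = len(text) - len(body)
    # Assumption for this sketch: input must be correctly padded.
    if len(text) % 4 != 0 or pad > 2:
        raise ValueError("invalid length or padding")
    out = bytearray()
    for i in range(0, len(body), 4):
        group = body[i:i + 4]
        n = 0
        for ch in group:
            if ch not in VALUES:
                raise ValueError(f"invalid character: {ch!r}")
            n = (n << 6) | VALUES[ch]
        # Left-align a short final group within the 24-bit block.
        n <<= 6 * (4 - len(group))
        # 4 characters give 3 bytes, 3 give 2, 2 give 1.
        out += n.to_bytes(3, "big")[:len(group) - 1]
    return bytes(out)


print(b64_decode("aGkh").decode("utf-8"))  # hi!
```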
URL-safe Base64 (Base64URL)
Standard Base64 uses + (index 62) and / (index 63). Both characters have special meanings in URLs: + is decoded as a space in form-encoded query strings, and / is the path separator. Using them unescaped in a URL can cause parsing errors or silent corruption.
RFC 4648 §5 defines URL-safe Base64, known as Base64URL, which simply substitutes:
+ → - (index 62)
/ → _ (index 63)
All other characters remain the same. JSON Web Tokens (JWTs), OAuth tokens, and many authentication systems use Base64URL. The "URL-safe" toggle in base64tool applies this substitution automatically.
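The substitution itself is a two-character translation. The sketch below is illustrative and does not show how base64tool implements its toggle; `to_base64url` is a made-up name, and `base64.urlsafe_b64encode` from the Python standard library serves as a cross-check. The sample bytes are chosen so that every output character is + or /.

```python
import base64

TO_URL_SAFE = str.maketrans("+/", "-_")


def to_base64url(standard: str) -> str:
    """Convert standard Base64 text to the URL-safe alphabet."""
    return standard.translate(TO_URL_SAFE)


data = bytes([0xFB, 0xFF, 0xBF])
standard = base64.b64encode(data).decode("ascii")
print(standard)                # +/+/
print(to_base64url(standard))  # -_-_
```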
UTF-8, ASCII, and Latin-1
Base64 operates on bytes. The character set (charset) setting controls how the input string is serialized into bytes before encoding, and how bytes are interpreted after decoding:
- UTF-8: The default. Correctly encodes every Unicode character, including emoji (🚀), accented letters (é), and CJK characters. Always use UTF-8 unless you have a specific reason not to.
- ASCII: Characters above codepoint 127 are truncated to their low 7 bits. Suitable only for pure ASCII input.
- Latin-1: Maps characters to their ISO 8859-1 byte values. Useful for compatibility with legacy systems that produced Latin-1 encoded Base64.
If you are decoding Base64 that was produced by an older system and the output looks garbled, try switching the charset to Latin-1.
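The effect of the charset is easy to see with a single accented letter. The following is a rough Python sketch with hypothetical helper names (`encode_text`, `decode_text`). Note that Python's strict `ascii` codec raises an error for non-ASCII input rather than truncating to 7 bits, so only UTF-8 and Latin-1 are compared here.

```python
import base64


def encode_text(text: str, charset: str = "utf-8") -> str:
    """Serialize text with the given charset, then Base64-encode it."""
    return base64.b64encode(text.encode(charset)).decode("ascii")


def decode_text(encoded: str, charset: str = "utf-8") -> str:
    """Base64-decode, then interpret the bytes with the given charset."""
    return base64.b64decode(encoded).decode(charset)


print(encode_text("é", "utf-8"))    # w6k=  (two bytes: C3 A9)
print(encode_text("é", "latin-1"))  # 6Q==  (one byte: E9)

# Decoding with the wrong charset produces garbled text.
print(decode_text("w6k=", "latin-1"))  # Ã©
```

Decoding `6Q==` as UTF-8 fails outright because the lone byte E9 is not valid UTF-8, which is the kind of symptom that suggests trying Latin-1.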
Performance characteristics
Base64 encoding and decoding run in O(n) time in the number of input bytes: each byte is processed exactly once, with no look-back or lookahead. Modern browsers provide native btoa and atob implementations, which typically process hundreds of megabytes per second. base64tool adds a UTF-8 serialization step via TextEncoder and TextDecoder, which is also O(n) and implemented natively in all major browsers.