Base64 decode.
Base64 alphabet → original bytes. Auto-detects URL-safe input, repairs missing padding, and shows you the raw hex when the output isn't text. Local only: pasted input stays in the tab.
Sniffing payloads from the prefix.
You can usually identify a Base64-encoded payload by its first few characters before decoding. eyJ means the bytes start with {" — almost certainly JSON, often a JWT segment. iVBOR is the PNG signature. /9j/ is the JPEG SOI marker. JVBER begins %PDF-. UEsDB begins PK\x03\x04 — a ZIP archive (and therefore also DOCX, XLSX, JAR, APK, EPUB, since they're all ZIP containers). These prefixes are stable because the first three input bytes always map to the first four output characters.
| Base64 prefix | Decoded magic | File type |
|---|---|---|
| eyJ | {" | JSON / JWT segment |
| iVBORw0KGgo | \x89PNG\r\n\x1a\n | PNG image |
| /9j/ | \xff\xd8\xff | JPEG image |
| R0lGOD | GIF8 | GIF image |
| JVBER | %PDF- | PDF document |
| UEsDB | PK\x03\x04 | ZIP / DOCX / XLSX / JAR |
| f0VMR | \x7fELF | Linux executable |
| TVqQ | MZ\x90 | Windows PE executable |
This trick scales: any file format with a stable magic header has a stable Base64 prefix. File-identification tools like file(1) rely on the same magic numbers. When inspecting unknown payloads (log lines, captured packets, leaked dumps), read the first 8–12 characters before decoding to know what you're about to look at.
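The table above can be turned into a small lookup. This is a minimal sketch rather than what this tool does internally: the names `BASE64_MAGIC` and `sniffBase64` are illustrative, and the prefix list contains only the rows of the table.

```javascript
// Prefix table copied from the rows above; extend it for other formats.
const BASE64_MAGIC = [
  ['iVBORw0KGgo', 'PNG image'],
  ['/9j/', 'JPEG image'],
  ['R0lGOD', 'GIF image'],
  ['JVBER', 'PDF document'],
  ['UEsDB', 'ZIP / DOCX / XLSX / JAR'],
  ['f0VMR', 'Linux executable'],
  ['TVqQ', 'Windows PE executable'],
  ['eyJ', 'JSON / JWT segment'],
];

// Returns the label of the first matching prefix, or null when nothing matches.
function sniffBase64(text) {
  const head = text.trimStart();
  for (const [prefix, label] of BASE64_MAGIC) {
    if (head.startsWith(prefix)) return label;
  }
  return null;
}
```

A URL-safe payload would show `_9j_` instead of `/9j/`, so it is worth normalising the alphabet before sniffing.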
Three input bytes (24 bits) become four output characters (4 × 6 bits). So every group of 4 characters in the Base64 string corresponds to a 3-byte slice of the original. The = at the end means "the last group had only 1 or 2 real bytes; the rest is padding."
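To make the 24-bit grouping concrete, the sketch below splits three bytes into four 6-bit values by hand. The names `B64_CHARS` and `sixBitGroups` are illustrative only.

```javascript
const B64_CHARS = 'ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/';

// Packs three bytes into one 24-bit number, then reads it back six bits at a time.
function sixBitGroups(b0, b1, b2) {
  const word = (b0 << 16) | (b1 << 8) | b2;
  return [(word >> 18) & 63, (word >> 12) & 63, (word >> 6) & 63, word & 63];
}

// 'M' = 77, 'a' = 97, 'n' = 110
const groups = sixBitGroups(77, 97, 110);                // [19, 22, 5, 46]
const chars = groups.map((g) => B64_CHARS[g]).join('');  // 'TWFu'
```

Decoding runs the same arithmetic in reverse: four table lookups rebuild the 24-bit word, which is then cut into three bytes.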
Where decoders disagree.
RFC 4648 describes strict decoding: padding must be present where required, and the decoder must reject input containing characters outside the alphabet unless the referencing specification says otherwise. In practice, many production decoders are lenient: they strip whitespace, accept missing padding, and silently ignore stray characters. This is convenient but creates parser-differential bugs: the same Base64 string decodes to different bytes, or is rejected outright, depending on which library you use.
Parser-differential bugs in JOSE libraries and the services around them trace back to exactly this issue: a permissive decoder accepts an input that a strict decoder would reject, and the two components that handle the message interpret it differently. Authentication services that re-encode and compare against the original ("canonicalization") catch most of these, but only when the comparison is byte-exact and the re-encoder is the strict variant.
| Library / context | Default behaviour | Strict mode? |
|---|---|---|
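| Browser atob() | forgiving: strips ASCII whitespace and accepts missing padding; throws on other non-alphabet chars (including - and _) or length mod 4 = 1 | none |
| Node Buffer.from(s, 'base64') | lenient: ignores whitespace and bad chars, also accepts the URL-safe alphabet | none |
| Python base64.b64decode | discards non-alphabet chars by default but still requires correct padding; validate=True to reject them | parameter |
| Go encoding/base64 | rejects bad chars and bad padding but ignores \r and \n; .Strict() also rejects non-zero trailing bits | method |
| Java Base64.getDecoder() | rejects non-alphabet chars, padding optional; getMimeDecoder ignores them | variant |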
| OpenSSL EVP_DecodeBlock | strict | built-in |
If your application crosses a security boundary with Base64 — accepting tokens from a third party, validating signatures, deserialising configuration — pick the strict decoder. The convenience of accepting whitespace and missing padding is not worth the parser-differential risk. For ergonomic decoders inside a single trusted system (debug tools, log inspection), lenient is fine.
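Where the platform decoder is lenient, a strict shape check can run in front of it. The following is a sketch under stated assumptions: standard alphabet, padding required, no whitespace. The names `STRICT_BASE64` and `isStrictBase64` are illustrative.

```javascript
// Canonical RFC 4648 shape: full 4-character groups, then an optional padded tail.
const STRICT_BASE64 = /^(?:[A-Za-z0-9+\/]{4})*(?:[A-Za-z0-9+\/]{2}==|[A-Za-z0-9+\/]{3}=)?$/;

// True when the text has a valid alphabet, length and padding; it does not decode.
function isStrictBase64(text) {
  return STRICT_BASE64.test(text);
}
```

This checks shape only. It does not verify that the unused trailing bits of the last group are zero, so two different strings can still pass and decode to the same bytes; a re-encode comparison covers that case.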
How four characters recover three bytes.
A decoder is the encoder run backwards. It maps each Base64 character to its 6-bit value (a 256-entry lookup table is the standard implementation), accumulates four 6-bit values into a 24-bit register, and writes out three bytes. The trailing-padding handling is the only subtlety: == means the last group had one real byte, = means two real bytes, no padding means three.
```javascript
function decode(s) {
  // Build the inverse of the alphabet: char code -> 6-bit value, -1 if invalid
  const A = 'ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/';
  const inv = new Int8Array(256).fill(-1);
  for (let i = 0; i < 64; i++) inv[A.charCodeAt(i)] = i;
  // Strip at most two padding chars; output length follows from input length
  const stripped = s.replace(/={1,2}$/, '');
  // A lone leftover character carries only 6 bits and cannot form a byte
  if (stripped.length % 4 === 1) throw new Error('invalid length');
  const out = new Uint8Array(Math.floor(stripped.length * 6 / 8));
  let buf = 0, bits = 0, j = 0;
  for (let i = 0; i < stripped.length; i++) {
    const c = stripped.charCodeAt(i);
    // Codes above 255 fall outside the table, so treat them as invalid
    const v = c < 256 ? inv[c] : -1;
    if (v < 0) throw new Error('non-alphabet char');
    // Append six bits; the mask stops the accumulator from growing without bound
    buf = ((buf << 6) | v) & 0xffff;
    bits += 6;
    if (bits >= 8) { bits -= 8; out[j++] = (buf >> bits) & 0xff; }
  }
  return out;
}
```
Performance characteristics mirror the encoder. A naive byte-at-a-time loop runs at a few hundred MB/s; SIMD-accelerated decoders process a 32-byte chunk per loop iteration and can approach memory bandwidth. The hot path is the lookup table: a 256-byte table spans only four 64-byte cache lines and stays resident in L1, so the lookups themselves are cheap and throughput is normally bound by streaming the input and output through memory, not by arithmetic.
A common micro-optimisation: rather than rejecting non-alphabet characters with a branch per character, mark them in the table with a negative sentinel, OR the looked-up values of a chunk together, and test the sign of the result once. The compiler can vectorise this; the rejection check happens once per 32-byte block instead of once per character.
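The sentinel trick can be sketched in a few lines. This is an illustration of the idea in scalar JavaScript, not a SIMD implementation; `INVERSE` and `chunkIsValid` are hypothetical names, and the input is assumed to be a `Uint8Array` of character codes.

```javascript
const STD_ALPHABET = 'ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/';

// 256-entry inverse table: 0..63 for alphabet characters, -1 for everything else.
const INVERSE = new Int8Array(256).fill(-1);
for (let i = 0; i < 64; i++) INVERSE[STD_ALPHABET.charCodeAt(i)] = i;

// ORs all looked-up values together. Valid values are 0..63, so the result is
// negative only if at least one lookup returned the -1 sentinel.
function chunkIsValid(codes, start, end) {
  let acc = 0;
  for (let i = start; i < end; i++) acc |= INVERSE[codes[i]];
  return acc >= 0;
}
```

Padding characters also map to the sentinel here, so a caller would strip or handle the final padded group before running the check.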
Five ways a decode silently corrupts.
First, alphabet and padding mismatch. JWT segments are URL-safe Base64 without padding. Plain atob() in the browser tolerates the missing padding but throws InvalidCharacterError on any segment that contains - or _, and many other decoders reject unpadded input outright. The fix is the one this tool does for you: replace - _ with + /, then re-pad to a multiple of four with =.
Second, length-mod-4 = 1 inputs. With padding stripped, a Base64 string can have length mod 4 of 0, 2, or 3, never 1: a single leftover character carries only 6 bits, which is not enough for a byte. If the length mod 4 is 1, the input is corrupt. Lenient decoders may silently truncate; strict decoders must throw. Always check.
Third, embedded whitespace. RFC 2045 (MIME) Base64 wraps lines at no more than 76 characters; RFC 4648 emits one long line. Decoders that don't strip whitespace will fail on RFC 2045 input. The defensive move: strip whitespace (\r \n \t and spaces) before decoding.
Fourth, encoding confusion when the decoded bytes are then UTF-8 decoded. atob returns a "binary string" — one character per byte, with character codes 0–255 — not a string of UTF-8 characters. Calling atob(b64) on a JWT payload produces a string with the right bytes but wrong character semantics; you need new TextDecoder().decode(uint8Array) to get back proper UTF-8 text.
Fifth, BOM handling. UTF-8 byte-order marks (EF BB BF) at the start of decoded text confuse downstream parsers. Either strip the BOM after decoding, or rely on a UTF-8 decoder that drops it: new TextDecoder('utf-8') removes a leading BOM by default (ignoreBOM: false), while ignoreBOM: true keeps it in the output.
| Symptom | Likely cause | Fix |
|---|---|---|
| InvalidCharacterError in atob | URL-safe chars (- or _) in the input | swap chars + repad |
| Decoded text has 0xEF 0xBB 0xBF prefix | UTF-8 BOM passed through | strip 3 bytes if present |
| Decode silently shorter than expected | length mod 4 = 1 (corrupt input) | reject; do not silently accept |
| Wrong characters after JSON parse | atob result not run through TextDecoder | use Uint8Array → TextDecoder('utf-8') |
| Failure on multi-line input | decoder doesn't strip whitespace | strip /\s/g before decoding |
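The five pitfalls can be handled in one short function. This is a minimal sketch, assuming an environment with `atob` and `TextDecoder` (browsers, recent Node); the name `decodeBase64Text` is hypothetical and this is not necessarily how this tool implements it.

```javascript
// Tolerates URL-safe input, whitespace and missing padding, but rejects the
// impossible length and anything atob() itself refuses.
function decodeBase64Text(input) {
  // Third pitfall: drop MIME line wrapping and other whitespace
  let b64 = input.replace(/\s+/g, '');
  // First pitfall: map the URL-safe alphabet back to the standard one
  b64 = b64.replace(/-/g, '+').replace(/_/g, '/');
  // Second pitfall: with padding removed, length mod 4 can never be 1
  const core = b64.replace(/={1,2}$/, '');
  if (core.length % 4 === 1) throw new Error('corrupt Base64: length mod 4 is 1');
  // Re-pad to a multiple of four
  const padded = core + '='.repeat((4 - (core.length % 4)) % 4);
  // Fourth pitfall: atob() yields a binary string, so convert it to real bytes
  const bytes = Uint8Array.from(atob(padded), (ch) => ch.charCodeAt(0));
  // Fifth pitfall: TextDecoder drops a leading UTF-8 BOM by default;
  // fatal: true makes invalid UTF-8 throw instead of yielding U+FFFD
  return new TextDecoder('utf-8', { fatal: true }).decode(bytes);
}
```

For a security boundary this leniency is the wrong default; the sketch suits the debugging and inspection cases described earlier.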
Decoding gigabytes without copying them.
The convenient one-liner — Buffer.from(big, 'base64') in Node, base64.b64decode(big) in Python — allocates a buffer the size of the entire decoded output. For a 1 GB Base64 input, that's about 750 MB of allocated bytes. Fine for desktop scripts; not fine inside an HTTP request handler that gets called a thousand times a second.
The streaming idiom decodes a chunk at a time, holding only the current chunk and a few carried characters in memory. A custom Node stream.Transform, Go's base64.NewDecoder wrapping an io.Reader, and Python's base64.decode(input_stream, output_stream) all support this. The catch: chunk boundaries can fall mid-Base64-group, so the decoder needs a small carry buffer (up to 3 chars) for the next chunk. The standard-library decoders handle this correctly; if you write your own streaming decoder, the carry buffer is the only state you need.
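A carry buffer of this kind might look like the sketch below. The class name `Base64ChunkDecoder` and helper `groupToBytes` are hypothetical; the code assumes the standard alphabet, a global `atob`, and chunks that arrive as strings.

```javascript
// The carry string is the only state kept between calls.
class Base64ChunkDecoder {
  constructor() {
    this.carry = ''; // 0 to 3 leftover characters from the previous chunk
  }

  // Decodes as many complete 4-character groups as are available so far.
  push(chunk) {
    const text = this.carry + chunk.replace(/\s+/g, '');
    const usable = text.length - (text.length % 4);
    this.carry = text.slice(usable);
    return groupToBytes(text.slice(0, usable));
  }

  // Call once at end of input to flush an unpadded tail.
  finish() {
    const tail = this.carry;
    this.carry = '';
    if (tail.length === 1) throw new Error('truncated Base64 input');
    return groupToBytes(tail + '='.repeat((4 - tail.length) % 4));
  }
}

// Decodes a string whose length is a multiple of four into bytes.
function groupToBytes(b64) {
  return Uint8Array.from(atob(b64), (ch) => ch.charCodeAt(0));
}
```

Each `push` returns only the bytes that are fully determined, so memory use is bounded by the chunk size rather than by the total input.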
A useful diagnostic: when you see "decoded payload looks fine on tiny inputs but corrupt above 64 KB" in production, the likely cause is exactly chunk-boundary mishandling — somebody's homegrown decoder that decodes each WebSocket frame independently without carrying the partial group across frames.
Every 4 input characters decode to 3 output bytes, a 25% shrink. So a malicious 100 KB Base64 string decodes to only 75 KB. That's fine. But a 100 MB Base64 string decodes to 75 MB, which can be enough to OOM a small service. Cap the maximum size of accepted Base64 inputs, or stream-decode with a memory budget.
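The cap can be enforced before any decoding happens, because the decoded size is bounded by the input length alone. A minimal sketch follows; `MAX_DECODED_BYTES` is an assumed budget chosen for illustration, and both function names are hypothetical.

```javascript
// Upper bound on decoded size: every 4 characters yield at most 3 bytes.
function maxDecodedBytes(base64Length) {
  return Math.floor((base64Length * 3) / 4);
}

const MAX_DECODED_BYTES = 8 * 1024 * 1024; // assumed budget, tune per service

// Throws before decoding if the input could exceed the budget.
function assertWithinBudget(base64Text, limit = MAX_DECODED_BYTES) {
  const bound = maxDecodedBytes(base64Text.length);
  if (bound > limit) {
    throw new RangeError(`Base64 input may decode to ${bound} bytes, limit is ${limit}`);
  }
  return bound;
}
```

The bound slightly overestimates padded input, which is the safe direction for a limit check.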
If round-trip doesn't match, don't trust it.
The cheapest integrity check is a re-encode. Decode the input, encode it again with a strict canonical encoder, and compare to the original (after normalising padding and alphabet). If they don't match, the input was non-canonical: extra whitespace, alphabet-confusion, or something more deliberate. For tokens used in security decisions — JWTs, OAuth state, signed cookies — many libraries already do this on every verification. If yours doesn't, add it.
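The re-encode check fits in a few lines when the platform provides `atob` and `btoa`. This sketch assumes the standard alphabet with padding, so URL-safe tokens would need normalising first; `isCanonicalBase64` is a hypothetical name.

```javascript
// Returns true only when re-encoding the decoded bytes reproduces the input exactly.
function isCanonicalBase64(text) {
  let binary;
  try {
    binary = atob(text);
  } catch {
    return false; // not decodable at all
  }
  return btoa(binary) === text;
}
```

Because the comparison is an exact string match, it flags embedded whitespace, missing padding, and non-zero trailing bits alike.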
A worked example. A JWT signature is computed over the exact bytes of header.payload. If a service decodes header and payload, re-serialises them differently (changing key order, whitespace, or alphabet), and then verifies the signature against the new bytes, the signature won't match — even though it was technically valid for the original message. This is why JWT libraries operate on the original encoded segments, not their re-serialisations.
For non-security data — log inspection, debugging, exporting attachments — the round-trip check is overkill but cheap, so the convention in production tools is to do it anyway. The cost is a hundred microseconds; the benefit is detecting copy-paste corruption before it becomes a downstream mystery. This decoder doesn't run a round-trip check by default (the input panel updates as you type, so the cost would feel jittery), but a "verify canonical" toggle in a future revision would be easy to add.
| Use case | Strictness | Validation cost worth it? |
|---|---|---|
| JWT verification | strict canonical | yes (security) |
| OAuth state parameter | strict | yes (security) |
| Image data URI in HTML | lenient | no (browser tolerates whitespace) |
| Log payload inspection | lenient | no (debug tooling) |
| S/MIME signature verification | strict | yes (cryptographic) |
| Pasted clipboard inspection | lenient | no (manual workflow) |
Real payloads you'll decode this week.
JWTs are everywhere: many OAuth-protected API calls carry one. The header (alg, typ, kid) is human-readable JSON; the payload (sub, iss, aud, exp, iat, custom claims) is also JSON; the signature is opaque bytes. Decoding the first two segments is the fastest way to see why a token is being rejected: was it the audience? the expiry? a missing claim? Note that you don't need the secret to read a signed JWT, only to verify it. That asymmetry is the source of a lot of "but I thought JWTs were encrypted" confusion.
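Reading the first two segments takes only a Base64 decode and a JSON parse. The sketch below is for inspection only and performs no signature verification; `decodeJwtSegment` and `inspectJwt` are hypothetical names.

```javascript
// Decodes one URL-safe, unpadded segment into a parsed JSON value.
function decodeJwtSegment(segment) {
  const b64 = segment.replace(/-/g, '+').replace(/_/g, '/');
  const padded = b64 + '='.repeat((4 - (b64.length % 4)) % 4);
  const bytes = Uint8Array.from(atob(padded), (ch) => ch.charCodeAt(0));
  return JSON.parse(new TextDecoder('utf-8').decode(bytes));
}

// Reads the header and payload only; the signature is NOT verified here.
function inspectJwt(token) {
  const parts = token.split('.');
  if (parts.length !== 3) throw new Error('expected three dot-separated segments');
  return { header: decodeJwtSegment(parts[0]), payload: decodeJwtSegment(parts[1]) };
}
```

Comparing `payload.exp` against `Date.now() / 1000` is usually the first thing to check when a token is rejected.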
Data URIs in HTML and CSS look like data:image/png;base64,iVBORw0KGgo…. The MIME type sits before the comma, the Base64 payload after. Decoders that drop the MIME prefix and decode the rest recover the original file bytes, which is useful for extracting embedded SVGs or favicons from CSS. Browsers cap data URI sizes, and the limits differ by engine and version, but most legitimate use is well under 100 KB.
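Splitting a data URI at the first comma is enough for the common case. This is a minimal sketch that handles only Base64 payloads and ignores extra parameters such as charset; `decodeDataUri` is a hypothetical name.

```javascript
// Splits a data URI into its MIME type and decoded bytes (Base64 payloads only).
function decodeDataUri(uri) {
  const comma = uri.indexOf(',');
  if (!uri.startsWith('data:') || comma === -1) throw new Error('not a data URI');
  const meta = uri.slice(5, comma).split(';');
  const isBase64 = meta.length > 1 && meta[meta.length - 1].trim().toLowerCase() === 'base64';
  if (!isBase64) throw new Error('payload is not Base64-encoded');
  // An omitted media type defaults to text/plain
  const mimeType = meta[0] || 'text/plain';
  const bytes = Uint8Array.from(atob(uri.slice(comma + 1)), (ch) => ch.charCodeAt(0));
  return { mimeType, bytes };
}
```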
PEM-armoured certificates are Base64-encoded DER inside a header/footer envelope: -----BEGIN CERTIFICATE-----, then a wrapped Base64 block, then -----END CERTIFICATE-----. Strip the lines starting with -----, concatenate the rest, decode, and you have a DER-encoded X.509 — which then needs an ASN.1 parser to decode further. The same envelope wraps RSA keys, EC keys, certificate chains, CSRs, and CRLs. The header line tells you which.
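The envelope stripping described above can be sketched with one regular expression. The name `pemToDer` is hypothetical, and the code assumes a single well-formed block and a global `atob`.

```javascript
// Extracts the label and DER bytes from the first PEM block in the text.
function pemToDer(pem) {
  const match = /-----BEGIN ([A-Z0-9 ]+)-----([\s\S]*?)-----END \1-----/.exec(pem);
  if (!match) throw new Error('no PEM block found');
  // Join the wrapped Base64 lines before decoding
  const body = match[2].replace(/\s+/g, '');
  const der = Uint8Array.from(atob(body), (ch) => ch.charCodeAt(0));
  return { label: match[1], der };
}
```

The returned `label` (for example CERTIFICATE) tells you which ASN.1 structure to expect in `der`. Legacy encrypted keys carry extra header lines inside the envelope, which this sketch does not handle.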
S/MIME and PGP messages encode the entire ciphertext as Base64 in the message body so it survives email transit. AWS Signature V4's x-amz-content-sha256 headers carry hex-encoded hashes, but other AWS APIs use Base64 — the convention varies by service. Slack's files.upload takes raw binary; Discord's takes Base64 in JSON. When integrating with a new API, the only reliable check is to round-trip a known test vector and confirm byte-for-byte equality.
Hex (Base16) is twice the size of the raw bytes but trivially human-readable, which makes it the usual choice for hashes and short identifiers. Base32 (RFC 4648 §6) is case-insensitive and uses no symbols, which suits OTP secrets, magnet links, and Onion v3 addresses. Base58 (Bitcoin) drops 0 O I l + / for visual unambiguity. Base85 (Adobe) and z85 (ZeroMQ) have lower overhead than Base64 at the cost of trickier alphabets. Each picked a different point on the readability/density curve.