Base64 decode.

Base64 alphabet → original bytes. Auto-detects URL-safe input, repairs missing padding, and shows you the raw hex when the output isn't text. Local only: whatever you paste stays in the tab.

Sniffing payloads from the prefix.

You can usually identify a Base64-encoded payload by its first few characters before decoding. eyJ means the bytes start with {" — almost certainly JSON, often a JWT segment. iVBOR is the PNG signature. /9j/ is the JPEG SOI marker. JVBER begins %PDF-. UEsDB begins PK\x03\x04 — a ZIP archive (and therefore also DOCX, XLSX, JAR, APK, EPUB, since they're all ZIP containers). These prefixes are stable because the first three input bytes always map to the first four output characters.

Base64 prefix	Decoded magic	File type
eyJ	{"	JSON / JWT segment
iVBORw0KGgo	\x89PNG\r\n\x1a\n	PNG image
/9j/	\xff\xd8\xff	JPEG image
R0lGOD	GIF8	GIF image
JVBER	%PDF-	PDF document
UEsDB	PK\x03\x04	ZIP / DOCX / XLSX / JAR
f0VMR	\x7fELF	Linux executable
TVqQ	MZ	Windows PE executable

This trick scales: any file format with a stable magic header has a stable Base64 prefix. File-identification tools like file(1) rely on the same magic numbers. When inspecting unknown payloads (log lines, captured packets, leaked dumps), read the first 8–12 characters before decoding to know what you're about to look at.
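The prefix table above translates directly into a lookup. This is a minimal sketch; sniffBase64 and BASE64_MAGIC are hypothetical names chosen for illustration, and the list covers only the formats named in this section.

```javascript
// Prefixes and labels copied from the table above.
const BASE64_MAGIC = [
  ['iVBORw0KGgo', 'PNG image'],
  ['R0lGOD', 'GIF image'],
  ['JVBER', 'PDF document'],
  ['UEsDB', 'ZIP / DOCX / XLSX / JAR'],
  ['f0VMR', 'Linux executable'],
  ['TVqQ', 'Windows PE executable'],
  ['/9j/', 'JPEG image'],
  ['eyJ', 'JSON / JWT segment'],
];

// Hypothetical helper: guess the payload type from the leading characters
// without decoding anything.
function sniffBase64(s) {
  const head = s.trimStart();
  for (const [prefix, type] of BASE64_MAGIC) {
    if (head.startsWith(prefix)) return type;
  }
  return 'unknown';
}
```

A match is a hint, not proof: a short prefix such as eyJ only says the first two bytes are {" and the payload still needs real parsing.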

A quick mental model

Three input bytes (24 bits) become four output characters (4 × 6 bits). So every group of 4 characters in the Base64 string corresponds to a 3-byte slice of the original. The = at the end means "the last group had only 1 or 2 real bytes; the rest is padding."
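The same arithmetic gives the decoded size before any decoding happens. A small sketch, assuming whitespace-free input; decodedLength is a hypothetical helper name.

```javascript
// Each non-padding character carries 6 bits; only whole bytes are emitted.
function decodedLength(b64) {
  const pad = b64.endsWith('==') ? 2 : b64.endsWith('=') ? 1 : 0;
  const chars = b64.length - pad;
  // One leftover character holds 6 bits, which can never complete a byte.
  if (chars % 4 === 1) throw new RangeError('length mod 4 is 1: not valid Base64');
  return Math.floor((chars * 6) / 8);
}
```

For example, decodedLength('TWFu') is 3, decodedLength('TWE=') is 2, and decodedLength('TQ==') is 1.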

Where decoders disagree.

RFC 4648 describes strict decoding: every input character must be in the alphabet, padding must be present where the encoding requires it, and anything else, whitespace included, is rejected unless the referring specification says otherwise. In practice, many production decoders are lenient: they strip whitespace, accept missing padding, or silently skip stray characters. This is convenient but creates parser-differential bugs: the same Base64 string decodes to different bytes, or fails to decode at all, depending on which library you use.

Parser-differential bugs in token handling, JOSE libraries included, trace back to exactly this issue: a permissive decoder accepts an input the strict decoder would reject, and two components on the same request path interpret the message differently. Authentication services that re-encode and compare against the original ("canonicalization") catch most of these, but only when the comparison is byte-exact and the re-encoder is the strict variant.

Library / context	Default behaviour	Strict mode?
Browser atob()	forgiving: strips ASCII whitespace and accepts missing padding; throws on other bad chars or length mod 4 = 1	none
Node Buffer.from(s, 'base64')	lenient — strips whitespace and bad chars	none
Python base64.b64decode	lenient on characters by default (non-alphabet chars are discarded) but requires correct padding; validate=True to reject bad chars	parameter
Go encoding/base64	rejects bad chars and missing padding, ignores \r and \n; .Strict() also rejects non-zero trailing bits	method
Java Base64.getDecoder()	strict; getMimeDecoder is lenient	variant
OpenSSL EVP_DecodeBlock	strict	built-in

If your application crosses a security boundary with Base64 — accepting tokens from a third party, validating signatures, deserialising configuration — pick the strict decoder. The convenience of accepting whitespace and missing padding is not worth the parser-differential risk. For ergonomic decoders inside a single trusted system (debug tools, log inspection), lenient is fine.
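Where the platform decoder is lenient, a strict gate can sit in front of it. The sketch below is one way to do that, assuming the standard alphabet with mandatory padding; isStrictBase64 is a hypothetical name, and the check does not verify that the unused trailing bits are zero.

```javascript
// Whole groups of four, optionally followed by one padded final group.
// No whitespace, no URL-safe characters, no missing padding.
const STRICT_BASE64 =
  /^(?:[A-Za-z0-9+\/]{4})*(?:[A-Za-z0-9+\/]{2}==|[A-Za-z0-9+\/]{3}=)?$/;

// Hypothetical helper: true only for input a strict decoder would accept.
function isStrictBase64(s) {
  return STRICT_BASE64.test(s);
}
```

Calling isStrictBase64 before handing the string to a lenient decoder means the rejection happens in one place, regardless of what the decoder underneath tolerates.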

How four characters recover three bytes.

A decoder is the encoder run backwards. It maps each Base64 character to its 6-bit value (a 256-entry lookup table is the standard implementation), accumulates four 6-bit values into a 24-bit register, and writes out three bytes. The trailing-padding handling is the only subtlety: == means the last group had one real byte, = means two real bytes, no padding means three.

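function decode(s) {
  // Build the inverse of the alphabet: ASCII code -> 6-bit value, -1 if invalid
  const A = 'ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/';
  const inv = new Int8Array(128).fill(-1);
  for (let i = 0; i < 64; i++) inv[A.charCodeAt(i)] = i;
  // Strip padding; output length is computed from the remaining characters
  const stripped = s.replace(/=+$/, '');
  // A lone trailing character carries only 6 bits and can never complete a byte
  if (stripped.length % 4 === 1) throw new Error('invalid length');
  const out = new Uint8Array(Math.floor(stripped.length * 6 / 8));
  let buf = 0, bits = 0, j = 0;
  for (let i = 0; i < stripped.length; i++) {
    const c = stripped.charCodeAt(i);
    // Codes >= 128 fall outside the table, so reject them explicitly
    const v = c < 128 ? inv[c] : -1;
    if (v < 0) throw new Error('non-alphabet char');
    // Shift in 6 new bits; keep only the low 12 so the register never overflows
    buf = ((buf << 6) | v) & 0xfff;
    bits += 6;
    // Emit a byte as soon as 8 or more bits are pending
    if (bits >= 8) { bits -= 8; out[j++] = (buf >> bits) & 0xff; }
  }
  return out;
}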
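To make the 24-bit register concrete, here is a worked single group. This is an illustrative sketch only; decodeQuad is a hypothetical helper that handles exactly one unpadded group of four characters.

```javascript
const ALPHABET =
  'ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/';

// Pack four 6-bit values into a 24-bit register, then split it into 3 bytes.
function decodeQuad(quad) {
  if (quad.length !== 4) throw new RangeError('expected exactly four characters');
  let reg = 0;
  for (const ch of quad) {
    const v = ALPHABET.indexOf(ch);
    if (v < 0) throw new Error('non-alphabet char');
    reg = (reg << 6) | v;
  }
  return [(reg >> 16) & 0xff, (reg >> 8) & 0xff, reg & 0xff];
}

// 'TWFu': T=19, W=22, F=5, u=46
// 010011 010110 000101 101110  ->  01001101 01100001 01101110
console.log(decodeQuad('TWFu')); // [ 77, 97, 110 ], the bytes of "Man"
```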

Performance characteristics mirror the encoder. A naive byte-at-a-time loop runs at a few hundred MB/s; SIMD-accelerated decoders process a 32-byte chunk per loop iteration and can approach memory bandwidth. The hot path is the lookup table: a 256-byte table spans just four 64-byte cache lines and stays resident in L1, so on large inputs the decoder is normally bound by memory throughput, not arithmetic.

A common micro-optimisation: rather than rejecting non-alphabet characters with a per-character branch, map them to a negative sentinel, OR the looked-up values of a block together, and test the sign once per block. The compiler can vectorise this; the rejection check happens once per 32-byte block instead of once per character.
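The shape of that check, sketched in JavaScript. This only illustrates the idea: a JavaScript engine makes no promise to vectorise the loop, the names LUT and blockIsValid are hypothetical, and padding characters are assumed to be handled before this point.

```javascript
const B64_CHARS =
  'ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/';

// 256-entry table: 6-bit value for alphabet bytes, -1 (sign bit set) otherwise.
const LUT = new Int8Array(256).fill(-1);
for (let i = 0; i < 64; i++) LUT[B64_CHARS.charCodeAt(i)] = i;

// OR every lookup into one accumulator; a single -1 makes the result negative.
// The validity test runs once per block, not once per character.
function blockIsValid(bytes, start, len) {
  const end = Math.min(start + len, bytes.length);
  let acc = 0;
  for (let i = start; i < end; i++) acc |= LUT[bytes[i]];
  return acc >= 0;
}
```

The trade-off is that a failure only says the block contains a bad byte; locating it takes a second, slower pass over that block.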

Five ways a decode silently corrupts.

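First, alphabet and padding mismatch. JWT segments are URL-safe Base64 without padding. Plain atob() in the browser tolerates the missing padding but throws InvalidCharacterError as soon as a segment contains - or _, and decoders that insist on padding (Python's b64decode, Go's StdEncoding) reject the unpadded form outright. The fix is the one this tool does for you: replace - _ with + /, then re-pad to a multiple of four with =.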

Second, length-mod-4 = 1 inputs. A Base64 string can have length mod 4 of 0, 2, or 3 — never 1. If the length mod 4 is 1, the input is corrupt. Lenient decoders may silently truncate; strict decoders must throw. Always check.

Third, embedded whitespace. RFC 2045 (MIME) Base64 wraps lines at 76 characters; RFC 4648 output has no line breaks. Decoders that don't strip whitespace will fail on RFC 2045 input. The defensive move: strip all whitespace (\r, \n, \t, and spaces) before decoding.

Fourth, encoding confusion when the decoded bytes are UTF-8 text. atob returns a "binary string" (one character per byte, with character codes 0–255), not a string of decoded UTF-8 characters. Calling atob(b64) on a JWT payload produces a string with the right bytes but wrong character semantics: every non-ASCII character shows up as two to four garbage characters. Copy the character codes into a Uint8Array and pass it to new TextDecoder().decode(uint8Array) to get back proper UTF-8 text.

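Fifth, BOM handling. UTF-8 byte-order marks (EF BB BF) at the start of decoded text confuse downstream parsers. Either strip the three bytes after decoding, or let the text decoder do it: new TextDecoder('utf-8') removes a leading BOM by default. Its ignoreBOM option defaults to false, which, despite the name, means the BOM is consumed; setting ignoreBOM: true keeps it in the output as U+FEFF.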
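The fourth and fifth points fit in one helper. A minimal sketch, assuming standard-alphabet input and a runtime with atob and TextDecoder (browsers and recent Node); base64ToText and its keepBOM option are hypothetical names.

```javascript
// Decode Base64 to real UTF-8 text rather than a binary string.
function base64ToText(b64, { keepBOM = false } = {}) {
  const binary = atob(b64); // one character per byte, codes 0..255
  const bytes = Uint8Array.from(binary, (ch) => ch.charCodeAt(0));
  // fatal: true turns invalid UTF-8 into a TypeError instead of U+FFFD.
  // ignoreBOM: false (the default) strips a leading EF BB BF.
  const decoder = new TextDecoder('utf-8', { fatal: true, ignoreBOM: keepBOM });
  return decoder.decode(bytes);
}
```

With fatal: true, a payload that is not text at all fails loudly, which is a reasonable cue to fall back to a hex view.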

Symptom	Likely cause	Fix
InvalidCharacterError in atob	URL-safe characters (- or _) in the input	swap chars + repad
Decoded text has 0xEF 0xBB 0xBF prefix	UTF-8 BOM passed through	strip 3 bytes if present
Decode silently shorter than expected	length mod 4 = 1 (corrupt input)	reject; do not silently accept
Wrong characters after JSON parse	atob result not run through TextDecoder	use Uint8Array → TextDecoder('utf-8')
Failure on multi-line input	decoder doesn't strip whitespace	strip /\s/g before decoding
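Several rows of the table can be handled by one normalisation pass before decoding. This is a lenient sketch suited to tooling, not to security boundaries; normaliseBase64 is a hypothetical name, and it deliberately accepts a mix of both alphabets and any amount of trailing padding.

```javascript
// Strip whitespace, map URL-safe characters to the standard alphabet,
// reject corrupt lengths, and re-pad to a multiple of four.
function normaliseBase64(input) {
  const s = input
    .replace(/\s+/g, '')
    .replace(/-/g, '+')
    .replace(/_/g, '/')
    .replace(/=+$/, '');
  if (!/^[A-Za-z0-9+\/]*$/.test(s)) throw new Error('non-alphabet character');
  if (s.length % 4 === 1) throw new Error('length mod 4 is 1: input is corrupt');
  return s + '='.repeat((4 - (s.length % 4)) % 4);
}
```

The result is safe to pass to a padding-sensitive decoder, for example atob(normaliseBase64(segment)).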

Decoding gigabytes without copying them.

The convenient one-liner — Buffer.from(big, 'base64') in Node, base64.b64decode(big) in Python — allocates a buffer the size of the entire decoded output. For a 1 GB Base64 input, that's about 750 MB of allocated bytes. Fine for desktop scripts; not fine inside an HTTP request handler that gets called a thousand times a second.

The streaming idiom decodes a chunk at a time, holding only a four-character window in memory. Go's base64.NewDecoder wrapping an io.Reader and Python's base64.decode(input_stream, output_stream) support this directly; in Node you get the same effect with a small custom stream.Transform. The catch: chunk boundaries can fall mid-Base64-group, so the decoder needs a small carry buffer (up to 3 chars) for the next chunk. The standard-library streaming decoders handle this correctly; if you write your own streaming decoder, the carry buffer is the only state you need.

A useful diagnostic: when you see "decoded payload looks fine on tiny inputs but corrupt above 64 KB" in production, the likely cause is exactly chunk-boundary mishandling — somebody's homegrown decoder that decodes each WebSocket frame independently without carrying the partial group across frames.
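A carry buffer of this kind is only a few lines. The sketch below assumes whitespace-free, standard-alphabet chunks and a runtime with atob; createBase64ChunkDecoder is a hypothetical name, and wrapping it in a stream.Transform or a WebSocket handler is left out.

```javascript
// Chunk decoder that carries an incomplete group (0 to 3 chars) between calls.
function createBase64ChunkDecoder() {
  let carry = '';
  const toBytes = (b64) =>
    Uint8Array.from(atob(b64), (ch) => ch.charCodeAt(0));
  return {
    // Decode every complete 4-character group; keep the remainder for later.
    push(chunk) {
      const data = carry + chunk;
      const usable = data.length - (data.length % 4);
      carry = data.slice(usable);
      return toBytes(data.slice(0, usable));
    },
    // Flush at end of stream; a single leftover character means truncation.
    end() {
      const rest = carry;
      carry = '';
      if (rest.length === 1) throw new Error('truncated Base64 stream');
      return toBytes(rest + '='.repeat((4 - (rest.length % 4)) % 4));
    },
  };
}
```

Decoding each frame independently is the bug described above; routing every frame through one decoder instance and calling end() once is the fix.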

DoS via oversized Base64 input

Decoding 4 input characters produces 3 output bytes, so the output is 25% smaller than the input; Base64 never expands on decode. A malicious 100 KB Base64 string decodes to only 75 KB, which is harmless. But a 100 MB Base64 string decodes to 75 MB, and holding input and output at once can be enough to OOM a small service. Cap the maximum size of accepted Base64 inputs, or stream-decode with a memory budget.
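The cap can be enforced before any allocation, because the decoded size is bounded by the input length. A minimal sketch; decodeWithLimit and maxDecodedBytes are hypothetical names, and the limit itself is whatever your service can afford.

```javascript
// Refuse oversized input up front: 4 characters never yield more than 3 bytes.
function decodeWithLimit(b64, maxDecodedBytes) {
  const upperBound = Math.ceil(b64.length / 4) * 3;
  if (upperBound > maxDecodedBytes) {
    throw new RangeError(
      `Base64 input may decode to ${upperBound} bytes; limit is ${maxDecodedBytes}`
    );
  }
  return Uint8Array.from(atob(b64), (ch) => ch.charCodeAt(0));
}
```

In an HTTP handler the same bound is better applied to the request body size, so the oversized string is never buffered in the first place.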

If round-trip doesn't match, don't trust it.

The cheapest integrity check is a re-encode. Decode the input, encode it again with a strict canonical encoder, and compare to the original (after normalising padding and alphabet). If they don't match, the input was non-canonical: extra whitespace, alphabet-confusion, or something more deliberate. For tokens used in security decisions — JWTs, OAuth state, signed cookies — many libraries already do this on every verification. If yours doesn't, add it.
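A re-encode check can be as small as the sketch below. It assumes the canonical form is the standard alphabet with padding, so URL-safe input would need normalising first; isCanonicalBase64 is a hypothetical name, and atob/btoa stand in for whatever decoder and encoder you actually use.

```javascript
// Decode, re-encode, and compare byte-for-byte with the original string.
function isCanonicalBase64(b64) {
  let binary;
  try {
    binary = atob(b64);
  } catch {
    return false; // not decodable at all
  }
  return btoa(binary) === b64;
}
```

Missing padding, embedded whitespace, and non-zero trailing bits all decode under a forgiving decoder but fail this comparison, which is exactly the class of input the check is meant to flag.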

A worked example. A JWT signature is computed over the exact bytes of header.payload. If a service decodes header and payload, re-serialises them differently (changing key order, whitespace, or alphabet), and then verifies the signature against the new bytes, the signature won't match — even though it was technically valid for the original message. This is why JWT libraries operate on the original encoded segments, not their re-serialisations.

For non-security data — log inspection, debugging, exporting attachments — the round-trip check is overkill but cheap, so the convention in production tools is to do it anyway. The cost is a hundred microseconds; the benefit is detecting copy-paste corruption before it becomes a downstream mystery. This decoder doesn't run a round-trip check by default (the input panel updates as you type, so the cost would feel jittery), but a "verify canonical" toggle in a future revision would be easy to add.

Use case	Strictness	Validation cost worth it?
JWT verification	strict canonical	yes (security)
OAuth state parameter	strict	yes (security)
Image data URI in HTML	lenient	no (browser tolerates whitespace)
Log payload inspection	lenient	no (debug tooling)
S/MIME signature verification	strict	yes (cryptographic)
Pasted clipboard inspection	lenient	no (manual workflow)

Real payloads you'll decode this week.

JWTs are everywhere: many OAuth-protected API calls carry one. The header (alg, typ, kid) is human-readable JSON; the payload (sub, iss, aud, exp, iat, custom claims) is also JSON; the signature is opaque bytes. Decoding the first two segments is the fastest way to see why a token is being rejected: was it the audience? the expiry? a missing claim? Note that you don't need the secret to read a signed JWT, only to verify it. That asymmetry is the source of a lot of "but I thought JWTs were encrypted" confusion.
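Reading the first two segments takes no key material. The sketch below is for inspection only and performs no signature verification; readJwt is a hypothetical helper name, and it assumes the usual three-segment signed form.

```javascript
// Decode header and payload of a JWT for inspection. Verifies nothing.
function readJwt(token) {
  const parts = token.split('.');
  if (parts.length !== 3) throw new Error('expected header.payload.signature');
  const decodeSegment = (seg) => {
    // URL-safe alphabet -> standard alphabet, then restore padding.
    const b64 = seg.replace(/-/g, '+').replace(/_/g, '/');
    const padded = b64 + '='.repeat((4 - (b64.length % 4)) % 4);
    const bytes = Uint8Array.from(atob(padded), (ch) => ch.charCodeAt(0));
    return JSON.parse(new TextDecoder().decode(bytes));
  };
  return { header: decodeSegment(parts[0]), payload: decodeSegment(parts[1]) };
}
```

A typical use is comparing payload.exp (seconds since the epoch) against Date.now() / 1000, or payload.aud against the audience the API expects.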

Data URIs in HTML and CSS look like data:image/png;base64,iVBORw0KGgo…. The MIME type sits before the comma, the Base64 payload after. Decoders that drop the MIME prefix and decode the rest recover the original file bytes, which is useful for extracting embedded SVGs or favicons from CSS. Browsers cap data URI sizes, with limits that differ by engine and version, but most legitimate use is well under 100 KB.
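Splitting at the first comma is enough for the common case. A sketch under stated assumptions: parseDataUri is a hypothetical name, only the ;base64 form is handled, and percent-encoding inside the payload is not decoded.

```javascript
// Split a Base64 data URI into its MIME type and raw bytes.
function parseDataUri(uri) {
  if (!uri.startsWith('data:')) throw new Error('not a data URI');
  const comma = uri.indexOf(',');
  if (comma < 0) throw new Error('missing comma separator');
  const params = uri.slice(5, comma).split(';');
  if (params[params.length - 1].trim().toLowerCase() !== 'base64') {
    throw new Error('payload is not Base64-encoded');
  }
  const mimeType = params[0] || 'text/plain'; // default when the type is omitted
  const bytes = Uint8Array.from(atob(uri.slice(comma + 1)), (ch) => ch.charCodeAt(0));
  return { mimeType, bytes };
}
```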

PEM-armoured certificates are Base64-encoded DER inside a header/footer envelope: -----BEGIN CERTIFICATE-----, then a wrapped Base64 block, then -----END CERTIFICATE-----. Strip the lines starting with -----, concatenate the rest, decode, and you have a DER-encoded X.509 — which then needs an ASN.1 parser to decode further. The same envelope wraps RSA keys, EC keys, certificate chains, CSRs, and CRLs. The header line tells you which.
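The strip-concatenate-decode steps look like this in practice. A minimal sketch that assumes a single PEM block; pemToDer is a hypothetical name, and the result still needs an ASN.1 parser to be useful.

```javascript
// Drop the -----BEGIN/END----- lines, join the wrapped body, decode to DER.
function pemToDer(pem) {
  const body = pem
    .split(/\r?\n/)
    .filter((line) => !line.startsWith('-----'))
    .join('')
    .replace(/\s+/g, '');
  return Uint8Array.from(atob(body), (ch) => ch.charCodeAt(0));
}
```

For a real certificate the first byte of the output is 0x30, the ASN.1 SEQUENCE tag, which is a quick sanity check that the envelope was stripped correctly.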

S/MIME and PGP messages encode the entire ciphertext as Base64 in the message body so it survives email transit. AWS Signature V4's x-amz-content-sha256 headers carry hex-encoded hashes, but other AWS APIs use Base64 — the convention varies by service. Slack's files.upload takes raw binary; Discord's takes Base64 in JSON. When integrating with a new API, the only reliable check is to round-trip a known test vector and confirm byte-for-byte equality.

Adjacent encodings worth knowing

Hex (Base16) is twice the size but trivially human-readable — preferred for hashes and short identifiers. Base32 (RFC 4648 §6) is case-insensitive and uses no symbols — preferred for OTP secrets, magnet links, and Onion v3 addresses. Base58 (Bitcoin) drops 0 O I l + / for visual unambiguity. Base85 (Adobe) and z85 (ZeroMQ) have lower overhead than Base64 at the cost of trickier alphabets. Each picked a different point on the readability/density curve.
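The density side of that curve is simple arithmetic. The sketch below computes padded output sizes for the three RFC 4648 encodings only; encodedSizes is a hypothetical name, and Base58 and Base85 are left out because their lengths depend on the variant.

```javascript
// Encoded length in characters for n input bytes, padding included.
function encodedSizes(n) {
  return {
    hex: 2 * n,                    // 4 bits per character
    base32: 8 * Math.ceil(n / 5),  // 5 bytes -> 8 characters
    base64: 4 * Math.ceil(n / 3),  // 3 bytes -> 4 characters
  };
}

console.log(encodedSizes(32)); // { hex: 64, base32: 56, base64: 44 }
```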
