Tool

Base64 encode.

Pasted text → Base64 alphabet. UTF-8 safe; toggle URL-safe variant for use in URLs and JWTs. Everything runs locally; nothing leaves the tab.

Input bytes

Output bytes

Overhead

33%

Plain text

Samples

Base64 output

SGVsbG8sIHdvcmxkISDwn5GL

Encoding bytes for text channels.

Base64 was standardised in RFC 1421 (1993) as part of PEM — Privacy-Enhanced Mail. The motivating use case was clear: send binary attachments through email systems that only accepted printable ASCII. Mail relays of the era stripped or mangled the eighth bit of every byte; some converted CRLF to bare LF or wrapped lines at 80 columns; some interpreted certain bytes as control codes. Anything binary needed an envelope that survived a pipeline of paranoid hand-offs. Three decades later, the same concern is still alive: HTTP headers, JSON values, URL parameters, JWT signatures, data URIs, S/MIME attachments, X.509 certificates — every binary payload that crosses a text channel passes through Base64.

The encoding is mechanical. Take three bytes — 24 bits — and split them into four 6-bit groups. Each group is a number from 0 to 63, which indexes into a 64-character alphabet: A–Z a–z 0–9 + /. The output is exactly 4/3 the size of the input, rounded up to the nearest multiple of 4 (that's where the = padding comes from — to fill out the last group when the input length isn't a multiple of 3). The 33% overhead is the price of being able to fit any byte sequence into a 7-bit ASCII text stream.

Input length	Encoded length	Padding
3n bytes	4n chars	none
3n + 1 bytes	4n + 4 chars	==
3n + 2 bytes	4n + 4 chars	=

This is why a Base64 string's length is always a multiple of 4, and why decoders must understand padding to recover the original byte count. URL-safe Base64 drops the padding (it's recoverable from string length mod 4) and swaps + / for - _ so the encoded value can sit inside a URL or HTTP header without further percent-encoding.

Five flavours of the same idea.

"Base64" is a family. RFC 4648 — the modern reference, replacing RFC 1421/2045/3548 — formalises five variants. They differ in the last two alphabet characters and in how strict they are about padding. Picking the wrong one will round-trip cleanly within your own system but break when a counterparty uses a different decoder.

Variant	Last two chars	Padding	Used by
Standard (RFC 4648 §4)	+ /	required =	MIME, PEM, S/MIME, OpenSSL -base64
URL-safe (RFC 4648 §5)	- _	optional	JWT, JWS, JWE, JWK, OAuth state
Modified Base64 for filenames	+ ,	none	IMAP UTF-7 mailbox names (RFC 3501)
YEnc-adjacent	varies	varies	Usenet binaries (rare today)
Bcrypt's $-base64	. /	none	OpenBSD bcrypt password hashes

For everything you're likely to build today, the choice collapses to two: standard or URL-safe. Standard goes inside MIME bodies, PEM-armoured certificates, and SQL bytea dumps. URL-safe is mandatory for anything that travels in a URL or HTTP header — JWTs, OAuth state, signed S3 GET URLs, push-notification tokens. The split exists because + in a URL is interpreted as a literal space (a legacy of application/x-www-form-urlencoded) and / divides path segments.

Trivia: there's also Base32 and Base85

Base32 (RFC 4648 §6) is case-insensitive and uses only A–Z + 2–7 — useful when humans need to read the encoding aloud. Base85 (Adobe's Ascii85, used in PostScript and PDF) packs 4 bytes into 5 chars instead of 3-into-4, giving 25% overhead instead of 33%, but expanding the alphabet to 85 characters that don't all play nice with shells and headers.

How the bits actually re-pack.

A worked example. Encode the three ASCII bytes M a n — values 77, 97, 110 — which is also the canonical example from RFC 4648. As bits, that's 01001101 01100001 01101110. Regroup into 6-bit chunks: 010011 010110 000101 101110. Convert each chunk to its alphabet character: index 19 is T, 22 is W, 5 is F, 46 is u. Result: TWFu. Three bytes in, four chars out, no padding — the input length was a clean multiple of 3.

When the input isn't a multiple of 3, the encoder right-pads with zero bits to fill the last 6-bit group, then appends = characters to make the output a multiple of 4. Ma (two bytes) encodes to TWE=; M (one byte) encodes to TQ==. The = is not part of the alphabet — it's a visual marker telling the decoder how many trailing zero bits to ignore.

function encode(bytes) {
  const A = 'ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/';
  let out = '';
  for (let i = 0; i < bytes.length; i += 3) {
    const b1 = bytes[i], b2 = bytes[i + 1] ?? 0, b3 = bytes[i + 2] ?? 0;
    const n = (b1 << 16) | (b2 << 8) | b3;
    out += A[(n >> 18) & 63] + A[(n >> 12) & 63];
    out += i + 1 < bytes.length ? A[(n >> 6) & 63] : '=';
    out += i + 2 < bytes.length ? A[n & 63] : '=';
  }
  return out;
}

A naive implementation processes one input byte at a time and accumulates a bit buffer. A fast implementation works on 6-byte (48-bit) blocks because 48 is the LCM of 8 and 6 — eight bytes-in produce eight chars-out per loop, and CPUs can pull a 64-bit word with one load. Highly tuned encoders use SIMD to encode dozens of bytes per cycle: Intel's _mm256_shuffle_epi8-based encoders measure at over 6 GB/s on modern x86, faster than memcpy on many systems.

Five ways Base64 will bite you.

First, line wrapping. RFC 2045 (MIME) requires Base64 lines to wrap at 76 characters. RFC 4648 says no wrapping unless explicitly requested. Most modern uses (JWT, JSON, data URIs) follow 4648 and emit one long line. But OpenSSL's -base64, base64 coreutils, and Python's base64.encodestring still wrap by default. Mixing wrapping conventions across services means some decoders see embedded newlines and either accept, ignore, or reject them. Always strip whitespace before decoding.

Second, the padding-vs-no-padding mismatch. JWT headers and payloads are URL-safe Base64 without padding. Browser atob() requires padding. Mismatched expectations are the single most common JWT decode bug — the fix is always the same three lines: replace - _ with + /, then re-pad the string to a multiple of 4 with =.

Third, character-set confusion. Some pipelines apply HTML escaping after Base64 encoding; the encoded string contains + and /, which survive HTML escaping unchanged but break when re-decoded by a tolerant parser that interprets + as a space. The fix is consistency: always know which Base64 variant your data is in, and don't apply additional encodings on top of it without round-tripping a test vector.

Fourth, the size-limit surprise. A photo at 2 MB becomes 2.66 MB once Base64-encoded. Pasted as a data URI into HTML and gzipped over the wire, the gzip-compression ratio recovers most of the overhead. But Base64ed JSON inside Postgres text columns or in DynamoDB string attributes pays the full 33% storage cost forever, with no compression. For binary at scale, store as bytea / B-typed binary and encode only on retrieval.

Fifth, timing-safe comparison. Comparing two Base64-encoded strings byte-by-byte with === reveals timing information about which prefix matched. For HMAC tags, signatures, and tokens, decode both sides and compare the underlying bytes with a constant-time function (crypto.timingSafeEqual in Node, hmac.compare_digest in Python). String comparison short-circuits on the first mismatched character; constant-time comparison processes the entire input regardless.

A 33% tax that isn't always worth paying.

Base64 exists because most pre-2000 protocols couldn't carry arbitrary bytes. Today, that constraint is much rarer. HTTP/1.1 bodies, gRPC frames, WebSocket binary frames, S3 object content, and Postgres bytea columns are all 8-bit clean. If both ends of a channel speak binary, sending Base64 over them is paying overhead for a service nobody asked for.

The encoding earns its keep in three places: (1) crossing into text-only environments — JSON values, XML attributes, URL parameters, log lines, environment variables; (2) embedding in source-code artefacts — config files, git commits, CI templates, where the artefact is intended to be human-readable; (3) crypto data structures with text grammars — JWTs, JOSE, X.509 PEM. Outside these cases, prefer the binary path. A common anti-pattern: services Base64-encoding image uploads to put them in a JSON field on a queue, when the queue accepts binary natively. The Base64 round-trip costs CPU at every encoder/decoder hop, costs memory in object stores, and complicates debugging — none of which pay back unless something downstream actually requires text.

Situation	Base64 makes sense?	Better alternative if not
Embedding a small image in HTML / CSS	yes	—
JWT / OAuth / signed cookie payloads	yes (URL-safe)	—
Storing binary in JSON over HTTPS	maybe	multipart/form-data
Sending images via gRPC	no	bytes field directly
Storing photos in Postgres	no	bytea column or S3 with URL ref
Logging hashes in plain text	maybe	hex (1 char per 4 bits, easier to read)
Queue payloads with binary fields	no	protobuf / MessagePack / Avro

Base64 isn't encryption

A surprising amount of "encoded" data in the wild is plain Base64 mistaken for an obfuscation layer. Anyone with a decoder reads it instantly. Use Base64 for transport; use AES-GCM, ChaCha20-Poly1305, or libsodium's secretbox for confidentiality. The two layers stack — encrypt then encode — because most ciphertext also needs to traverse text channels.

Speed, memory, and side channels.

A naive Base64 encoder runs at maybe 200–400 MB/s on a modern CPU. The fastest published implementations — Wojciech Muła and Daniel Lemire's AVX2 routines used in Chromium and Node — exceed 6 GB/s for both encoding and decoding. The trick is processing a 32-byte block per loop iteration: load with one aligned read, do the bit-shuffling with two vpshufb instructions and a multiply-add, store with one aligned write. The control flow is branchless except for the final tail handling.

For the typical web service, none of this matters: the Base64 encoder is a rounding error next to the database query and the JSON serialiser. It does matter when you're processing image uploads at scale, when a single request encodes a multi-megabyte certificate chain, or when a streaming pipeline Base64s every event flowing through. In those cases the difference between a hand-rolled JavaScript loop (~50 MB/s in V8) and Node's built-in Buffer.from(x).toString('base64') (~2 GB/s, calling into native code) is two orders of magnitude.

Memory matters too. Encoding a 1 GB file with Buffer.toString('base64') allocates a 1.33 GB string in one go — fine until it isn't. For large inputs, stream through a Base64Encode transform that emits 76-character (or arbitrary) chunks as the source streams in, never materialising the whole encoded payload at once. Node's stream.Transform, Go's base64.NewEncoder wrapping an io.Writer, and Python's base64.encodebytes with a BytesIO sink all support this idiom.

Implementation	Throughput	Notes
JavaScript hand-rolled (V8)	~50 MB/s	byte-at-a-time loop, no SIMD
Node Buffer.toString	~2 GB/s	native libuv routine
Browser btoa()	~500 MB/s	bytes only via TextEncoder hop
Go encoding/base64	~1.5 GB/s	SSSE3 path on amd64
Muła–Lemire AVX2	6+ GB/s	32-byte block, 2× vpshufb

On the security side, three classes of bug show up regularly. Decoder-tolerance mismatches: a permissive decoder on the receiver accepts inputs the sender's strict encoder would never produce, opening parser-differential attacks (the "Smuggler" class of bugs in JOSE libraries). Constant-time problems: comparing decoded bytes with regular memcmp leaks timing information about the first byte mismatch. Length-extension confusion: attackers can append == padding to a base64 token in some implementations, making two distinct payloads decode to the same bytes. None of these are Base64's fault — they're all caused by libraries that round-trip without strict validation. The defence is to use a single canonical encoder/decoder library across your stack and to validate the canonical re-encoding matches the input on the strict path.

Found this useful?