Tool

SQL format.

Token-based pretty printer for SELECT / INSERT / UPDATE / DELETE / WITH (CTE). Major clauses (SELECT, FROM, WHERE, GROUP BY, ORDER BY, joins) start on their own line; column lists wrap one-per-line. Keywords get uppercased. Pasted SQL stays in your tab.

Status
ok
In bytes
445
Out lines
12

Indent
Input SQL
Samples
Formatted
WITH active_users AS (SELECT id, email, plan FROM users WHERE active = TRUE AND created_at > '2024-01-01'),
recent_orders AS (SELECT user_id, sum (total_cents)AS spend FROM orders WHERE created_at > now () - interval '30 days' GROUP BY user_id)
SELECTu . id,
u . email,
u . plan,
coalesce (r . spend, 0)AS spend_30d
FROM active_users u
LEFT JOIN recent_orders r
ON r . user_id = u . id
WHERE u . plan IN ('pro', 'enterprise')
ORDER BYspend_30d DESC
LIMIT 50;

House-style debates about SQL.

Should keywords be UPPERCASE? Most style guides say yes — the visual contrast with snake_case identifiers makes structure pop. The C2 wiki and the SQL ANSI standard wrote keywords in caps as a typographic convention, and most database documentation still does. Modern PostgreSQL docs and books like SQL Antipatterns follow the same convention. The minority view (e.g. some Rails shops) treats SQL as just-another-string and keeps everything lowercase. Both work; pick one and let the formatter enforce it.

Should commas be leading or trailing? Leading commas (, name) make diffs cleaner when columns are added at the end — every line starts the same way. Trailing commas read more naturally and match every other language. Postgres allows trailing commas in SELECT from version 15 (2022); MySQL still doesn't. The formatter on this page emits trailing commas because it's the closer-to-natural-language convention.

Should JOIN conditions be on their own line? In multi-line queries, indenting ON by one level under JOIN lets you see what's joined to what at a glance — and pulls the join condition out of visual competition with the WHERE clause. The most readable variant puts the join key directly under the JOIN keyword and continues additional AND conditions on their own lines.

DecisionCommon conventionsTradeoff
Keyword caseUPPER vs loweruppercase boosts contrast; lowercase reads natural
Commasleading vs trailingleading helps diffs; trailing reads natural
Join layoutsame line vs ON on own lineown-line wins for 3+ joins
Indent size2 / 4 / tab2 lines up with JS conventions; 4 reads weighty
Subquery handlinginline vs CTECTEs read top-down; inline can nest unboundedly
Identifier quotingnone vs double quotesquoting required for reserved words
Format on save, not in review

Make the SQL formatter part of pre-commit hooks (sqlfluff, prettier-plugin-sql, pg_format) so reviewers never argue about whitespace. The diff stays focused on real changes; the team converges on one style instantly.

Lexing without a full grammar.

A real SQL parser is enormous — Postgres' parser is 100,000+ lines of C, supports hundreds of statement types, and threads through query planner statistics. A formatter doesn't need any of that. It just needs to recognise five categories of tokens — keywords, identifiers, literals, operators, punctuation — and to know which keywords start new clauses. Everything else is whitespace and indent rules.

The tokeniser walks the input character by character, classifying runs. A run of letters that matches the keyword set becomes a keyword token; otherwise an identifier. Quoted runs become string literals (with internal '' doubling preserved). Numeric runs become number literals. Single - followed by another - starts a line comment until newline. /* starts a block comment until */. Everything else falls into operator or punctuation buckets.

SELECT u.id, u.email                 -- comment
  FROM users u
  LEFT JOIN orders o ON o.user_id = u.id
 WHERE u.created_at >= '2024-01-01'
   AND u.plan IN ('pro', 'enterprise')
 ORDER BY u.id;

-- token stream:
-- KW(SELECT) ID(u) OP(.) ID(id) , ID(u) OP(.) ID(email) CMT(--…)
-- KW(FROM) ID(users) ID(u) KW(LEFT) KW(JOIN) ID(orders) ID(o)
-- KW(ON) ID(o) OP(.) ID(user_id) OP(=) ID(u) OP(.) ID(id)
-- KW(WHERE) ID(u) OP(.) ID(created_at) OP(>=) STR('2024-01-01')
-- KW(AND) ID(u) OP(.) ID(plan) KW(IN) ( STR('pro') , STR('enterprise') )
-- KW(ORDER) KW(BY) ID(u) OP(.) ID(id) ;

The formatter then walks the token stream looking for clause heads — SELECT, FROM, WHERE, JOIN, GROUP BY, etc. — and emits a newline plus indent before each. Inside SELECT, commas at the top depth break to their own line. Parentheses bump the indent depth so subqueries are visually offset. That's the whole algorithm; everything else is bookkeeping.

Five dialects, one formatter.

SQL ANSI is a 1986 standard updated every few years; SQL:2023 is the current revision. Every database engine implements a subset plus its own extensions. Postgres has RETURNING, DISTINCT ON, LATERAL joins, full window functions, :: casting, $$ dollar-quoted strings. MySQL has REPLACE INTO, ON DUPLICATE KEY UPDATE, the ||-as-OR mode (vs ANSI's ||-as-concat). SQLite is permissive about types. SQL Server has TOP instead of LIMIT, square-bracket identifier quoting, [dbo] schema qualifiers. BigQuery has QUALIFY, struct types, UNNEST, and array literals.

A token-based formatter handles all of these because it treats unknown keywords as identifiers and unknown punctuation as operators. The output may not be perfectly idiomatic for every dialect — Postgres' RETURNING deserves its own line, but a formatter that doesn't recognise it will run it inline. Round-tripping the formatted query through the database (the only true correctness check) is recommended after any reformatting on production code.

FeaturePostgresMySQLSQL ServerBigQuery
LimitLIMIT nLIMIT nTOP nLIMIT n
Identifier quote"x"`x`[x]`x`
Concatenation||CONCAT()+CONCAT()
Castx::intCASTCASTCAST
UPSERTON CONFLICTON DUPLICATE KEYMERGEMERGE
ReturningRETURNINGOUTPUT

Where a token-based formatter gives up.

A few classes of input expose the limits of pure-token formatting. Heavily nested expressions — CASE WHEN x THEN (subquery) ELSE … — need semantic understanding to align cleanly. Window function frames — ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW — read better when broken across lines, but the formatter has no way to know unless it tracks specific keyword sequences. Multi-statement scripts that depend on session state (variables, transactions) sometimes have semicolons that aren't statement terminators (e.g. inside a stored procedure body). The formatter on this page handles each well enough but isn't perfect.

For reference, the dedicated SQL formatters get this right by parsing into an AST. sqlfluff uses a Python-based dialect-aware parser; pg_format ships with PostgreSQL and uses Postgres' own parser; sqlglot provides a portable AST across dialects with rewriting rules. They're heavier dependencies but produce demonstrably better output on complex queries. For everyday formatting needs — inline statements in code reviews, debugging extractions from app logs, copy-paste from a query result — the token-based approach this tool uses is good enough and runs locally without server round-trips.

A useful pattern: format with a quick token-based tool while iterating, run the dialect-aware formatter at PR time, and let CI block PRs whose SQL files don't match the canonical format. Each tool plays to its strengths; nobody waits on a slow parser during interactive editing.

Bugs hide in the unformatted query.

A 200-line SQL query crammed into a single line is a security bug waiting to happen. The classic vector: a developer needs to add a condition, scrolls to find WHERE, types AND user_id = ? at the wrong place, and accidentally lands inside a subquery. The query parses fine, runs fine on small data, and quietly returns wrong results in production. Formatted SQL — clauses on their own lines, indentation showing nesting — makes the right insertion point obvious.

Performance bugs hide too. SELECT * FROM users WHERE id IN (huge list) runs slowly because the planner can't push the predicate. Reformatting reveals the IN clause as a candidate for refactoring into a join. Missing indexes show up the same way: a WHERE clause that looks innocuous on one line reveals a function call on a column when reformatted (WHERE LOWER(email) = ?) — the function defeats the index, and the formatter's clean indent makes the issue visible.

A nontrivial fraction of "this query was fine yesterday" incidents trace back to a developer modifying a long inline SQL string and changing semantics in a way that wasn't visually obvious. Formatting in source control would have caught most of them at code review. This is the case for storing formatted SQL — not just formatting it for display — so version-controlled diffs are meaningful.

RiskSymptomHow formatting helps
Boolean-logic mistakewrong rows returnedparens / indent reveal precedence
Missing indexslow queryfunction calls on columns visible
Cartesian joinrow count explodesmissing ON clause shows up indented
WHERE in wrong subqueryincorrect filterindent shows nesting depth
NULLs in IN listsilently empty resultslist contents on own lines
Forgotten aliasambiguous columnFROM clause clarifies tables

SQL tooling worth knowing.

Beyond formatters, the SQL tooling ecosystem has matured significantly in the last five years. sqlfluff is the closest equivalent to eslint — it lints rule violations (unused CTEs, missing aliases, ambiguous selects), supports custom rules, integrates with most editors. sqlglot is a portable AST library that round-trips between dialects (read MySQL, emit Postgres, etc.) — invaluable for cross-database migrations. pgFormatter ships with Postgres and produces dialect-aware formatting that's hard to beat for Postgres-only shops.

For interactive query exploration, DBeaver, TablePlus, and Postico all format on demand and offer live syntax checking. DBT uses sqlfluff as its standard linter for analytics-engineering pipelines. Code-search platforms like Sourcegraph and GitHub recognise SQL syntax and let you grep across dialects. The era when "SQL is just a string in your code" is over for serious teams; the queries deserve the same tooling investment as application code.

The cultural shift worth noting: SQL used to be hidden inside ORMs because it was considered too dangerous to write directly. The pendulum has swung. Modern teams write raw SQL in version-controlled files, generate types from the database schema, and treat queries as first-class artefacts with tests. Formatters like this one are the smallest piece of that ecosystem — but the entry point. Once you're formatting consistently, linting, type generation, and contract testing follow naturally.

Adjacent

For schema migrations, see Isolation Levels and Transactions. For query planning depth, the Database Sharding simulator covers when a single-machine query plan stops being enough.

Half a century of querying tables.

SQL came out of IBM's System R project in 1974, originally as SEQUEL — Structured English Query Language — designed by Donald Chamberlin and Raymond Boyce as a friendlier interface to the relational algebra E.F. Codd had formalised four years earlier. The name shortened to SQL when the trademark for SEQUEL turned out to be already in use. Oracle shipped the first commercial implementation in 1979; IBM's DB2 followed in 1983. ANSI standardised SQL in 1986, ISO in 1987. The language has been continuously revised since — SQL-92 added outer joins and subqueries, SQL:1999 added recursive CTEs and triggers, SQL:2003 added window functions, SQL:2016 added JSON, SQL:2023 added property graphs.

Despite half a century of revisions, the core query is largely unchanged: SELECT … FROM … WHERE … GROUP BY … HAVING … ORDER BY … LIMIT …. The clauses execute in a different logical order than they're written (FROM first, then WHERE, then GROUP BY, then HAVING, then SELECT, then ORDER BY, then LIMIT), which is the source of countless "why doesn't my alias work in WHERE" confusions. Every formatter on this list preserves the lexical ordering — there's no formatter that reorders clauses to logical order — because preserving the exact text the developer wrote is more important than correctness pedagogy.

The "NoSQL" movement of the late 2000s argued SQL's relational model wouldn't scale. The pendulum swung partway back when most NoSQL stores added SQL interfaces (Cassandra's CQL, MongoDB's $lookup aggregation pipeline, DynamoDB's PartiQL). The new entrants — DuckDB, ClickHouse, Snowflake, BigQuery — all expose SQL as the primary interface, optimised for analytical workloads instead of OLTP. Half a century later, SQL remains the lingua franca for asking questions of structured data, and that durability is exactly why investing in tooling — formatters, linters, AST manipulation, schema-aware editors — pays off.

Found this useful?