Strings & runes

A Go string is an immutable byte sequence, conventionally UTF-8. A rune is a single Unicode code point, an alias for int32. The two confuse newcomers: indexing with s[i] returns a byte (uint8), not a character, whereas for i, r := range s iterates by runes. Once that distinction is reflex, a large share of Unicode bugs disappear.

1 · The intuition

Internally, a Go string is just a pointer + length. The bytes are immutable: s[0] = 'x' does not compile. A string can hold arbitrary bytes, but range loops, []rune conversions, and most of the standard library interpret those bytes as UTF-8.

A rune is a 32-bit Unicode code point. The character é is one rune but two bytes in UTF-8. The character "⌘" is one rune but three bytes. This is why indexing returns a byte and ranging returns runes.
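
A minimal sketch of both points above: since the bytes cannot be changed in place, an edit goes through a copied []byte, and utf8.RuneLen reports how many bytes a single rune occupies once encoded. The helper name replaceFirstByte is hypothetical, chosen only for this illustration.

```go
package main

import (
	"fmt"
	"unicode/utf8"
)

// replaceFirstByte returns a copy of s with its first byte set to b.
// Hypothetical helper, for illustration only.
func replaceFirstByte(s string, b byte) string {
	if s == "" {
		return s
	}
	buf := []byte(s) // copies the bytes; s itself is untouched
	buf[0] = b
	return string(buf)
}

func main() {
	s := "hello"
	t := replaceFirstByte(s, 'j')
	fmt.Println(s, t) // hello jello

	// Encoded width of one rune, in bytes.
	fmt.Println(utf8.RuneLen('e'), utf8.RuneLen('é'), utf8.RuneLen('⌘')) // 1 2 3
}
```

Note that replacing a single byte is only safe when that byte is a complete one-byte character; overwriting part of a multi-byte sequence produces invalid UTF-8.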

2 · Try it — bytes vs runes

go main.go · the difference exposed

package main

import "fmt"

func main() {
    s := "héllo"
    fmt.Println("len:", len(s))  // 6 — bytes, not runes

    // Indexing returns a byte
    fmt.Printf("s[1] = %d (%c)\n", s[1], s[1])  // 195 (Ã): only the first byte of é

    // Range iterates by runes — i is the BYTE index
    for i, r := range s {
        fmt.Printf("  i=%d r=%U (%c)\n", i, r, r)
    }

    // Counting runes
    fmt.Println("runes:", len([]rune(s)))  // 5
}

The key insight. i in the range loop is the byte offset, not the rune index. After é (2 bytes), the next index is 3, not 2. If you need rune positions, build them yourself or use utf8.RuneCountInString.
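
One way to "build them yourself" is to keep a separate counter next to the byte offset that range reports. The sketch below assumes a hypothetical runePos type and runePositions helper; neither is part of the standard library.

```go
package main

import "fmt"

// runePos pairs a rune with both of its positions in the source string.
// Hypothetical type, for illustration only.
type runePos struct {
	RuneIndex  int  // 0-based count of runes seen so far
	ByteOffset int  // the index that range reports as i
	R          rune
}

// runePositions walks s once and records both positions for every rune.
func runePositions(s string) []runePos {
	var out []runePos
	n := 0
	for i, r := range s {
		out = append(out, runePos{RuneIndex: n, ByteOffset: i, R: r})
		n++
	}
	return out
}

func main() {
	for _, p := range runePositions("héllo") {
		fmt.Printf("rune %d at byte %d: %c\n", p.RuneIndex, p.ByteOffset, p.R)
	}
}
```

For "héllo" the two columns agree up to é and then drift apart by one, because é occupies bytes 1 and 2.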

3 · The strings package — the toolkit

go main.go · stdlib strings

package main

import (
    "fmt"
    "strings"
)

func main() {
    s := "  Hello, Go!  "

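    fmt.Println(strings.TrimSpace(s))                   // Hello, Go! (surrounding spaces removed)
    fmt.Println(strings.ToLower(s))                     // hello, go! (surrounding spaces kept)
    fmt.Println(strings.Replace(s, "Go", "world", -1))  // -1 replaces every match; same as strings.ReplaceAll
    fmt.Println(strings.Contains(s, "Hello"))           // true
    fmt.Println(strings.Split("a,b,c,d", ","))          // [a b c d]
    fmt.Println(strings.Join([]string{"a", "b", "c"}, "-"))  // a-b-c
    fmt.Println(strings.HasPrefix(s, "  H"))            // true
    fmt.Println(strings.Index(s, "Go"))                 // 9 (a byte offset; -1 if absent)
    fmt.Println(strings.Repeat("ab", 3))                // ababab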
}

4 · strings.Builder — efficient concatenation

go main.go · Builder for hot paths

package main

import (
    "fmt"
    "strings"
)

func main() {
    // The wrong way — quadratic, allocates on every +=
    s := ""
    for i := 0; i < 5; i++ {
        s += "x"  // each iteration copies the whole string
    }
    fmt.Println(s)

    // The right way — linear with strings.Builder
    var b strings.Builder
    b.Grow(1024)  // pre-allocate if you know rough size
    for i := 0; i < 5; i++ {
        b.WriteString("x")
    }
    fmt.Println(b.String())

    // For just a few small strings, += is fine. The threshold
    // for Builder being worth it is roughly: 10+ concatenations
    // OR strings totaling more than ~1KB.
}

5 · fmt formatting — the verbs

Verb	For	Example
`%v`	default format	`fmt.Printf("%v", anything)`
`%+v`	structs with field names	`{Name:Alice Age:30}`
`%#v`	Go-syntax representation	`main.User{Name:"Alice"}`
`%T`	type	`*main.User`
`%d` / `%b` / `%x`	int, binary, hex	`255` / `11111111` / `ff`
`%f` / `%e` / `%g`	float (decimal, scientific, shortest)	`3.140000` / `3.140000e+00` / `3.14`
`%s` / `%q`	string, quoted	`hello` / `"hello"`
`%c` / `%U`	rune as char, Unicode codepoint	`⌘` / `U+2318`
`%p`	pointer	`0xc000010210`
`%w`	wrap error (in Errorf)	(see errors page)
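
A short sketch that exercises several verbs from the table. The User type and the describe helper are hypothetical, introduced only to have a struct to print.

```go
package main

import "fmt"

// User is a hypothetical type used only to show the struct verbs.
type User struct {
	Name string
	Age  int
}

// describe formats v with the three general-purpose verbs.
func describe(v interface{}) (plain, withFields, goSyntax string) {
	return fmt.Sprintf("%v", v), fmt.Sprintf("%+v", v), fmt.Sprintf("%#v", v)
}

func main() {
	u := User{Name: "Alice", Age: 30}
	plain, withFields, goSyntax := describe(u)
	fmt.Println(plain)      // {Alice 30}
	fmt.Println(withFields) // {Name:Alice Age:30}
	fmt.Println(goSyntax)   // main.User{Name:"Alice", Age:30}

	fmt.Printf("%T\n", &u)                  // *main.User
	fmt.Printf("%d %b %x\n", 255, 255, 255) // 255 11111111 ff
	fmt.Printf("%s %q\n", "hello", "hello") // hello "hello"
	fmt.Printf("%c %U\n", '⌘', '⌘')         // ⌘ U+2318
}
```

The package prefix in the %#v and %T output depends on where the type is declared; it reads main.User here only because the sketch lives in package main.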

6 · strconv — strings ↔ numbers

go main.go · strconv basics

package main

import (
    "fmt"
    "strconv"
)

func main() {
    // String to int
    n, err := strconv.Atoi("42")
    if err == nil {
        fmt.Println(n)
    }

    // Int to string
    s := strconv.Itoa(123)
    fmt.Println(s)

    // String to float
    f, _ := strconv.ParseFloat("3.14", 64)
    fmt.Println(f)

    // Int to string in a specific base
    fmt.Println(strconv.FormatInt(255, 16))  // ff

    // Quote a string as a Go literal (control characters are escaped)
    fmt.Println(strconv.Quote("hello\tworld"))  // "hello\tworld"
}
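
The example above only takes the happy path. A minimal sketch of handling bad input follows, assuming a hypothetical parsePort helper; strconv reports failures through errors that match strconv.ErrSyntax or strconv.ErrRange when tested with errors.Is.

```go
package main

import (
	"errors"
	"fmt"
	"strconv"
)

// parsePort parses s as a base-10 number that fits in 16 unsigned bits
// (0 to 65535). Hypothetical helper, for illustration only.
func parsePort(s string) (int, error) {
	n, err := strconv.ParseUint(s, 10, 16)
	if err != nil {
		return 0, fmt.Errorf("invalid port %q: %w", s, err)
	}
	return int(n), nil
}

// classify reduces the result of parsePort to a short label.
func classify(s string) string {
	_, err := parsePort(s)
	switch {
	case err == nil:
		return "ok"
	case errors.Is(err, strconv.ErrSyntax):
		return "not a number"
	case errors.Is(err, strconv.ErrRange):
		return "out of range"
	default:
		return "other error"
	}
}

func main() {
	for _, in := range []string{"8080", "80a", "70000"} {
		fmt.Println(in, "->", classify(in))
	}
}
```

Wrapping with %w keeps the underlying strconv error reachable, so callers can still distinguish malformed input from out-of-range values.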

7 · From the wild

go unicode/utf8 · the rune iteration primitive

// DecodeRuneInString unpacks the first UTF-8 encoding in s and returns
// the rune and its width in bytes.
func DecodeRuneInString(s string) (r rune, size int)

// Conceptually, this is what "range over string" does at each step. You
// rarely call it directly, but knowing it exists helps when you need to walk
// a string by runes manually (e.g. for cursor positions in a text editor).

// Related helpers
func RuneCountInString(s string) int       // count of code points
func ValidString(s string) bool            // is valid UTF-8?

From the wild: go standard library · BSD-3-Clause
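
As a sketch of the manual walk mentioned in the comments above, the loop below advances by the width that DecodeRuneInString returns and records where each rune starts. The helper name runeStarts is hypothetical.

```go
package main

import (
	"fmt"
	"unicode/utf8"
)

// runeStarts returns the byte offset at which each rune of s begins,
// e.g. the valid cursor positions inside a line of text.
// Hypothetical helper, for illustration only.
func runeStarts(s string) []int {
	var starts []int
	for i := 0; i < len(s); {
		// Invalid UTF-8 decodes as utf8.RuneError with size 1,
		// so the loop always advances.
		_, size := utf8.DecodeRuneInString(s[i:])
		starts = append(starts, i)
		i += size
	}
	return starts
}

func main() {
	fmt.Println(runeStarts("héllo")) // [0 1 3 4 5]
	fmt.Println(runeStarts("日本語"))   // [0 3 6]
}
```

A plain range loop yields the same offsets; the manual form is mainly useful when decoding has to start or stop at an arbitrary position.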

8 · Coming from another language?

Language	String model
Python 3	str = sequence of Unicode code points. Go is sequence of bytes. `len()` differs.
Java	String = UTF-16 char array. Code points may use surrogate pairs. Go avoids the surrogate complexity — UTF-8 is variable but single-stream.
JavaScript	Similar to Java — UTF-16. `"é"` has length 1; in Go, length 2.
Rust	Strings are guaranteed valid UTF-8 and are sliced by byte ranges (there is no `s[i]` indexing). Closest match to Go, although Go does not enforce validity. Rust's `chars()` and Go's `range` both iterate by code point.
C	Null-terminated byte arrays. Go's strings know their length — no buffer overrun bugs.

9 · Common mistakes

Indexing assuming character = byte. s[0] is the first byte, which may be a partial UTF-8 sequence.
Using len(s) for character count. It's the byte count. For runes: utf8.RuneCountInString(s) or len([]rune(s)).
Concatenating in a loop without Builder. Quadratic. Reach for strings.Builder, or collect the pieces in a slice and call strings.Join, which allocates the result once.
Assuming string comparison ignores case. == is case-sensitive; strings.EqualFold is the case-insensitive helper.
Treating a string as a slice of runes interchangeably. The conversion []rune(s) allocates. Use range when iterating once.
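
A compact sketch of the fixes for the first, second, and fourth mistakes in the list above. The helper names firstRune, charCount, and sameWord are hypothetical wrappers around standard-library calls.

```go
package main

import (
	"fmt"
	"strings"
	"unicode/utf8"
)

// firstRune returns the first rune of s rather than its first byte.
// For an empty string it returns utf8.RuneError.
func firstRune(s string) rune {
	r, _ := utf8.DecodeRuneInString(s)
	return r
}

// charCount counts code points without allocating a []rune.
func charCount(s string) int {
	return utf8.RuneCountInString(s)
}

// sameWord compares two strings ignoring case (Unicode case folding).
func sameWord(a, b string) bool {
	return strings.EqualFold(a, b)
}

func main() {
	s := "étude"
	fmt.Println(s[0], string(firstRune(s)))      // 195 é
	fmt.Println(len(s), charCount(s))            // 6 5
	fmt.Println("Go" == "GO", sameWord("Go", "GO")) // false true
}
```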

10 · Exercises (~10 min)

Rune count. Write a function that takes a string and returns (bytes, runes). Test on "héllo", "日本語", "hello".
Reverse a string by runes. Convert to []rune, swap in place, convert back. Test on "héllo".
Builder vs +=. Time both for 10000 concatenations. Report the ratio.
Word count. Read input, split on whitespace, print top-3 most common words. Use strings.Fields and a map.

11 · When it clicks

You never write for i := 0; i < len(s); i++ when characters matter.
You reach for strings.Builder as soon as you see concatenation in a loop.
You distinguish %v, %+v, %#v by reflex.
You can predict the byte index after a multi-byte rune in a range loop.
