Day 6 · Concept 15

Strings & runes

A Go string is an immutable byte sequence, typically UTF-8. A rune is a single Unicode code point — an alias for int32. The two confuse newcomers: s[i] returns a byte (uint8), not a character. for i, r := range s iterates by runes. Once that distinction is reflex, half of Unicode bugs disappear.


1 · The intuition

Internally, a Go string is just a pointer + length. The bytes are immutable: s[0] = 'x' does not compile. The encoding is "whatever you put in", but Go source literals are UTF-8, and range, the strings package, and unicode/utf8 all treat string contents as UTF-8.

A rune is a 32-bit Unicode code point. The character é is one rune but two bytes in UTF-8. The character "⌘" is one rune but three bytes. This is why indexing returns a byte and ranging returns runes.
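The sizes quoted above can be checked directly. The following is a minimal sketch, assuming é is the precomposed code point U+00E9; the helper names sizes and replaceFirstByte are made up for illustration and are not part of any library.

```go
package main

import (
	"fmt"
	"unicode/utf8"
)

// sizes is a hypothetical helper: it reports how many bytes and how many
// runes a string holds.
func sizes(s string) (byteCount, runeCount int) {
	return len(s), utf8.RuneCountInString(s)
}

// replaceFirstByte returns a copy of s with its first byte replaced.
// Writing s[0] = c would not compile, so the change goes through a []byte copy.
func replaceFirstByte(s string, c byte) string {
	if s == "" {
		return s
	}
	b := []byte(s) // copies the bytes; s itself is untouched
	b[0] = c
	return string(b)
}

func main() {
	for _, s := range []string{"e", "é", "⌘"} {
		b, r := sizes(s)
		fmt.Printf("%q: %d byte(s), %d rune(s)\n", s, b, r)
	}
	fmt.Println(replaceFirstByte("hello", 'j')) // jello
}
```

Replacing a single byte is only safe when that byte is a complete character, which is why the example uses ASCII text for the modification.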

2 · Try it — bytes vs runes

go main.go · the difference exposed
package main

import "fmt"

func main() {
    s := "héllo"
    fmt.Println("len:", len(s))  // 6 — bytes, not runes

    // Indexing returns a byte
    fmt.Printf("s[1] = %d (%c)\n", s[1], s[1])  // 195 (Ã): only the first byte of é

    // Range iterates by runes — i is the BYTE index
    for i, r := range s {
        fmt.Printf("  i=%d r=%U (%c)\n", i, r, r)
    }

    // Counting runes
    fmt.Println("runes:", len([]rune(s)))  // 5
}
The key insight. i in the range loop is the byte offset, not the rune index. In "héllo", é starts at byte 1 and takes 2 bytes, so the next index is 3, not 2. If you need rune positions, keep your own counter inside the loop, or call utf8.RuneCountInString(s[:i]) to count the runes before offset i.
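One way to keep rune positions alongside byte offsets is a plain counter in the loop. This is a small sketch rather than the only approach; runeOffsets is a hypothetical helper name.

```go
package main

import "fmt"

// runeOffsets is a hypothetical helper: element k of the result is the
// byte offset at which the k-th rune of s starts.
func runeOffsets(s string) []int {
	var offsets []int
	for i := range s { // i is the byte offset of each rune
		offsets = append(offsets, i)
	}
	return offsets
}

func main() {
	s := "héllo"
	runeIdx := 0 // manual rune counter, kept next to the byte offset
	for byteIdx, r := range s {
		fmt.Printf("rune %d at byte %d: %c\n", runeIdx, byteIdx, r)
		runeIdx++
	}
	fmt.Println(runeOffsets(s)) // [0 1 3 4 5]
}
```

The gap between 1 and 3 in the result is the second byte of é.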

3 · The strings package — the toolkit

go main.go · stdlib strings
package main

import (
    "fmt"
    "strings"
)

func main() {
    s := "  Hello, Go!  "

    fmt.Println(strings.TrimSpace(s))                      // "Hello, Go!" (no surrounding spaces)
    fmt.Println(strings.ToLower(s))                        // "  hello, go!  "
    fmt.Println(strings.ReplaceAll(s, "Go", "world"))      // "  Hello, world!  "
    fmt.Println(strings.Contains(s, "Hello"))              // true
    fmt.Println(strings.Split("a,b,c,d", ","))             // [a b c d]
    fmt.Println(strings.Join([]string{"a", "b", "c"}, "-")) // a-b-c
    fmt.Println(strings.HasPrefix(s, "  H"))               // true
    fmt.Println(strings.Index(s, "Go"))                    // 9 (a byte offset)
    fmt.Println(strings.Repeat("ab", 3))                   // ababab
}

4 · strings.Builder — efficient concatenation

go main.go · Builder for hot paths
package main

import (
    "fmt"
    "strings"
)

func main() {
    // The wrong way — quadratic, allocates on every +=
    s := ""
    for i := 0; i < 5; i++ {
        s += "x"  // each iteration copies the whole string
    }
    fmt.Println(s)

    // The right way — linear with strings.Builder
    var b strings.Builder
    b.Grow(1024)  // pre-allocate if you know rough size
    for i := 0; i < 5; i++ {
        b.WriteString("x")
    }
    fmt.Println(b.String())

    // For just a few small strings, += is fine. As a rough rule of
    // thumb (not a measured threshold), Builder starts to pay off at
    // around 10+ concatenations or ~1KB of total output. Benchmark
    // your own hot path to be sure.
}

5 · fmt formatting — the verbs

Verb | For | Example
%v | default format | fmt.Printf("%v", anything)
%+v | structs with field names | {Name:Alice Age:30}
%#v | Go-syntax representation | main.User{Name:"Alice"}
%T | type | *main.User
%d / %b / %x | int, binary, hex | 255 / 11111111 / ff
%f / %e / %g | float (decimal, scientific, shortest) | 3.140000
%s / %q | string, quoted | hello / "hello"
%c / %U | rune as char, Unicode code point | ⌘ / U+2318
%p | pointer | 0xc000010210
%w | wrap error (in Errorf) | (see errors page)
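The verbs are easier to remember after seeing them side by side on the same values. The sketch below uses a made-up User type and a hypothetical describe helper; neither comes from the lesson's own code.

```go
package main

import "fmt"

// User is a made-up type used only to show the struct verbs.
type User struct {
	Name string
	Age  int
}

// describe is a hypothetical helper: it formats v with %v, %+v and %#v,
// in that order.
func describe(v any) []string {
	return []string{
		fmt.Sprintf("%v", v),
		fmt.Sprintf("%+v", v),
		fmt.Sprintf("%#v", v),
	}
}

func main() {
	u := User{Name: "Alice", Age: 30}
	for _, line := range describe(u) {
		fmt.Println(line)
	}
	fmt.Printf("%T\n", u)                       // the value's type
	fmt.Printf("%d / %b / %x\n", 255, 255, 255) // 255 / 11111111 / ff
	fmt.Printf("%s / %q\n", "hello", "hello")   // hello / "hello"
	fmt.Printf("%c / %U\n", '⌘', '⌘')           // ⌘ / U+2318
}
```

The first two lines printed are {Alice 30} and {Name:Alice Age:30}; the %#v line adds the package-qualified type name and quotes the string field.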

6 · strconv — strings ↔ numbers

go main.go · strconv basics
package main

import (
    "fmt"
    "strconv"
)

func main() {
    // String to int; always check the error before using n
    n, err := strconv.Atoi("42")
    if err != nil {
        fmt.Println("not a number:", err)
        return
    }
    fmt.Println(n)

    // Int to string
    s := strconv.Itoa(123)
    fmt.Println(s)

    // String to float
    f, _ := strconv.ParseFloat("3.14", 64)
    fmt.Println(f)

    // Int to string in a specific base
    fmt.Println(strconv.FormatInt(255, 16))  // ff

    // Quote / unquote a string (for safe printing)
    fmt.Println(strconv.Quote("hello\tworld"))  // "hello\tworld"
}
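The snippet above only exercises the happy path. As a hedged sketch of the failure cases, the hypothetical parsePort helper below distinguishes malformed input from out-of-range input using the strconv.ErrSyntax and strconv.ErrRange sentinel errors. The port scenario and the error messages are assumptions made for illustration.

```go
package main

import (
	"errors"
	"fmt"
	"strconv"
)

// parsePort is a hypothetical helper: it parses s as a base-10 integer
// that must fit in 16 unsigned bits, and labels the two failure modes.
func parsePort(s string) (uint16, error) {
	n, err := strconv.ParseUint(s, 10, 16)
	switch {
	case errors.Is(err, strconv.ErrSyntax):
		return 0, fmt.Errorf("port %q is not a number: %w", s, err)
	case errors.Is(err, strconv.ErrRange):
		return 0, fmt.Errorf("port %q is out of range: %w", s, err)
	case err != nil:
		return 0, err
	}
	return uint16(n), nil
}

func main() {
	for _, in := range []string{"8080", "80a", "70000"} {
		port, err := parsePort(in)
		if err != nil {
			fmt.Println("error:", err)
			continue
		}
		fmt.Println("port:", port)
	}
}
```

Passing the bit size (16 here) to ParseUint lets strconv do the range check, so the conversion to uint16 cannot silently truncate.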

7 · From the wild

go unicode/utf8 · the rune iteration primitive
// DecodeRuneInString unpacks the first UTF-8 encoding in s and returns
// the rune and its width in bytes.
func DecodeRuneInString(s string) (r rune, size int)

// This performs the same decoding step as "range over string". You almost never
// call it directly, but knowing it exists helps when you need to walk a
// string by runes manually (e.g. for cursor positions in a text editor).

// Related helpers: counting and validation
func RuneCountInString(s string) int       // count of code points
func ValidString(s string) bool            // is valid UTF-8?
From the wild: go standard library · BSD-3-Clause
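For the manual walk mentioned in the excerpt, a minimal sketch might look like this. nextCursor is a hypothetical helper for the text-editor case, and it assumes pos is non-negative and sits at the start of a rune.

```go
package main

import (
	"fmt"
	"unicode/utf8"
)

// nextCursor is a hypothetical helper: given a byte offset pos at the
// start of a rune, it returns the offset of the following rune, or
// len(s) when pos is already at or past the end.
func nextCursor(s string, pos int) int {
	if pos >= len(s) {
		return len(s)
	}
	_, size := utf8.DecodeRuneInString(s[pos:])
	return pos + size
}

func main() {
	s := "héllo"
	// Walk the string by hand: decode one rune, advance by its width.
	for pos := 0; pos < len(s); {
		r, size := utf8.DecodeRuneInString(s[pos:])
		fmt.Printf("byte %d: %c (%d bytes)\n", pos, r, size)
		pos += size
	}
	fmt.Println(nextCursor(s, 1)) // 3: é occupies bytes 1 and 2
}
```

On invalid UTF-8, DecodeRuneInString returns utf8.RuneError with a width of 1, so the loop still advances.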

8 · Coming from another language?

Language | String model
Python 3 | str is a sequence of Unicode code points; a Go string is a sequence of bytes, so len() differs for non-ASCII text.
Java | String is a sequence of UTF-16 code units; code points outside the Basic Multilingual Plane use surrogate pairs. Go avoids surrogates: UTF-8 is variable-width but a single byte stream.
JavaScript | Similar to Java: UTF-16. "é".length is 1; in Go, len("é") is 2.
Rust | Closest match to Go: UTF-8, sliced by byte offsets. Rust guarantees its strings are valid UTF-8, while a Go string may hold arbitrary bytes. Rust's chars() and Go's range both iterate by code point.
C | Null-terminated byte arrays. Go strings carry their length, and an out-of-range index panics instead of reading past the end.

9 · Common mistakes

  • Indexing assuming character = byte. s[0] is the first byte, which may be a partial UTF-8 sequence.
  • Using len(s) for character count. It's the byte count. For runes: utf8.RuneCountInString(s) or len([]rune(s)).
  • Concatenating in a loop without Builder. Each += copies the whole string, so the loop is quadratic. Reach for strings.Builder, or collect the pieces in a slice and call strings.Join once.
  • Forgetting that == is case-sensitive. Use strings.EqualFold for case-insensitive comparison.
  • Converting to []rune just to iterate. []rune(s) allocates and copies the string; use range when you only need a single pass.
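Several of these mistakes can be reproduced in a few lines. The sketch below is illustrative only; firstRune is a hypothetical helper, and the sample string assumes é is the precomposed U+00E9.

```go
package main

import (
	"fmt"
	"strings"
	"unicode/utf8"
)

// firstRune is a hypothetical helper: it returns the first full character
// of s as a string, instead of the single byte that s[:1] would give.
func firstRune(s string) string {
	_, size := utf8.DecodeRuneInString(s)
	return s[:size]
}

func main() {
	s := "état"

	// Slicing one byte cuts é in half and leaves invalid UTF-8.
	fmt.Println(utf8.ValidString(s[:1])) // false
	fmt.Println(firstRune(s))            // é

	// len counts bytes; RuneCountInString counts code points.
	fmt.Println(len(s), utf8.RuneCountInString(s)) // 5 4

	// == is case-sensitive; EqualFold is not.
	fmt.Println("Go" == "GO", strings.EqualFold("Go", "GO")) // false true
}
```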

10 · Exercises (~10 min)

  1. Rune count. Write a function that takes a string and returns (bytes, runes). Test on "héllo", "日本語", "hello".
  2. Reverse a string by runes. Convert to []rune, swap in place, convert back. Test on "héllo".
  3. Builder vs +=. Time both for 10000 concatenations. Report the ratio.
  4. Word count. Read input, split on whitespace, print top-3 most common words. Use strings.Fields and a map.

11 · When it clicks

  • You never write for i := 0; i < len(s); i++ when characters matter.
  • You reach for strings.Builder as soon as you see concatenation in a loop.
  • You distinguish %v, %+v, %#v by reflex.
  • You can predict the byte index after a multi-byte rune in a range loop.