Day 6 · Concept 15
Strings & runes
A Go string is an immutable byte sequence, typically UTF-8. A rune is a single
Unicode code point — an alias for int32. The two confuse newcomers:
s[i] returns a byte (uint8), not a character. for i, r := range s
iterates by runes. Once that distinction is reflex, half of Unicode bugs disappear.
1 · The intuition
Internally, a Go string is just a pointer + length. The bytes are immutable:
s[0] = 'x' does not compile. A string can hold arbitrary bytes, but string
literals in Go source are UTF-8, and range, []rune(s), and the strings package
all interpret the bytes as UTF-8.
A rune is a 32-bit Unicode code point. The character é is one rune
but two bytes in UTF-8. The character "⌘" is one rune but three
bytes. This is why indexing returns a byte and ranging returns runes.
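A minimal sketch of the byte-versus-rune counts described above. The helper name describe is hypothetical, and the example assumes é is typed as the single precomposed code point U+00E9 rather than e followed by a combining accent.

```go
package main

import (
	"fmt"
	"unicode/utf8"
)

// describe is a hypothetical helper: it reports how many runes and how many
// bytes the string s contains.
func describe(s string) (runes, bytes int) {
	return utf8.RuneCountInString(s), len(s)
}

func main() {
	for _, s := range []string{"e", "é", "⌘"} {
		r, b := describe(s)
		// "% x" prints the raw bytes of the string in hex, space-separated.
		fmt.Printf("%q: %d rune(s), %d byte(s), bytes=% x\n", s, r, b, s)
	}
}
```

Each of the three strings is one rune, but they occupy one, two, and three bytes respectively, which is what len reports.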
2 · Try it — bytes vs runes
package main

import "fmt"

func main() {
	s := "héllo"
	fmt.Println("len:", len(s)) // 6 — bytes, not runes

	// Indexing returns a byte
	fmt.Printf("s[1] = %d (%c)\n", s[1], s[1]) // 195 (Ã): the first byte of é, not the whole character

	// Range iterates by runes — i is the BYTE index
	for i, r := range s {
		fmt.Printf(" i=%d r=%U (%c)\n", i, r, r)
	}

	// Counting runes
	fmt.Println("runes:", len([]rune(s))) // 5
}
The key insight.
i in the range loop is the byte offset, not the rune index. After é (2 bytes), the
next index is 3, not 2. If you need rune positions, count them yourself inside the
loop or call utf8.RuneCountInString(s[:i]).
3 · The strings package — the toolkit
package main

import (
	"fmt"
	"strings"
)

func main() {
	s := " Hello, Go! "
	fmt.Println(strings.TrimSpace(s))
	fmt.Println(strings.ToLower(s))
	fmt.Println(strings.Replace(s, "Go", "world", -1))
	fmt.Println(strings.Contains(s, "Hello"))
	fmt.Println(strings.Split("a,b,c,d", ","))
	fmt.Println(strings.Join([]string{"a", "b", "c"}, "-"))
	fmt.Println(strings.HasPrefix(s, " H"))
	fmt.Println(strings.Index(s, "Go"))
	fmt.Println(strings.Repeat("ab", 3))
}
4 · strings.Builder — efficient concatenation
package main

import (
	"fmt"
	"strings"
)

func main() {
	// The wrong way — quadratic, allocates on every +=
	s := ""
	for i := 0; i < 5; i++ {
		s += "x" // each iteration copies the whole string
	}
	fmt.Println(s)

	// The right way — linear with strings.Builder
	var b strings.Builder
	b.Grow(1024) // pre-allocate if you know rough size
	for i := 0; i < 5; i++ {
		b.WriteString("x")
	}
	fmt.Println(b.String())

	// For just a few small strings, += is fine. The threshold
	// for Builder being worth it is roughly: 10+ concatenations
	// OR strings totaling more than ~1KB.
}
5 · fmt formatting — the verbs
| Verb | For | Example |
|---|---|---|
| %v | default format | fmt.Printf("%v", anything) |
| %+v | structs with field names | {Name:Alice Age:30} |
| %#v | Go-syntax representation | main.User{Name:"Alice"} |
| %T | type | *main.User |
| %d / %b / %x | integer in decimal, binary, hex | 255 / 11111111 / ff |
| %f / %e / %g | float (decimal, scientific, shortest) | 3.140000 / 3.140000e+00 / 3.14 |
| %s / %q | string, quoted | hello / "hello" |
| %c / %U | rune as char, Unicode code point | ⌘ / U+2318 |
| %p | pointer | 0xc000010210 |
| %w | wrap error (in Errorf) | (see errors page) |
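To see several of these verbs side by side, here is a small sketch. The User type and the render helper are hypothetical, introduced only for illustration; the outputs in the comments assume the code runs in package main.

```go
package main

import "fmt"

// User is a hypothetical type used only to illustrate the struct verbs.
type User struct {
	Name string
	Age  int
}

// render is a hypothetical helper: it formats v with a single verb,
// for example render("%+v", u).
func render(verb string, v any) string {
	return fmt.Sprintf(verb, v)
}

func main() {
	u := User{Name: "Alice", Age: 30}
	fmt.Println(render("%v", u))       // {Alice 30}
	fmt.Println(render("%+v", u))      // {Name:Alice Age:30}
	fmt.Println(render("%#v", u))      // main.User{Name:"Alice", Age:30}
	fmt.Println(render("%T", &u))      // *main.User
	fmt.Println(render("%x", 255))     // ff
	fmt.Println(render("%q", "hello")) // "hello"
	fmt.Println(render("%U", '⌘'))     // U+2318
}
```

In real code the format string is normally a literal passed straight to fmt.Printf; the helper exists here only so each verb can be tried in isolation.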
6 · strconv — strings ↔ numbers
package main

import (
	"fmt"
	"strconv"
)

func main() {
	// String to int
	n, err := strconv.Atoi("42")
	if err == nil {
		fmt.Println(n)
	}
	// Int to string
	s := strconv.Itoa(123)
	fmt.Println(s)
	// String to float
	f, _ := strconv.ParseFloat("3.14", 64)
	fmt.Println(f)
	// Int to string in a specific base
	fmt.Println(strconv.FormatInt(255, 16)) // ff
	// Quote a string (for safe printing); strconv.Unquote reverses it
	fmt.Println(strconv.Quote("hello world")) // "hello world"
}
7 · From the wild
// DecodeRuneInString unpacks the first UTF-8 encoding in s and returns
// the rune and its width in bytes.
func DecodeRuneInString(s string) (r rune, size int)
// This is the building block under "range over string". You almost never
// call it directly — but knowing it exists helps when you need to walk a
// string by runes manually (e.g. for cursor positions in a text editor).
// Counting and validating (same package, unicode/utf8)
func RuneCountInString(s string) int // count of code points
func ValidString(s string) bool // is valid UTF-8?
From the wild: go standard library · BSD-3-Clause
8 · Coming from another language?
| Language | String model |
|---|---|
| Python 3 | str = sequence of Unicode code points. Go is sequence of bytes. len() differs. |
| Java | String = UTF-16 char array. Code points may use surrogate pairs. Go avoids the surrogate complexity — UTF-8 is variable but single-stream. |
| JavaScript | Similar to Java — UTF-16. "é" has length 1; in Go, length 2. |
| Rust | Strings are guaranteed valid UTF-8 and are sliced by byte offsets. Closest match to Go, except that Go does not enforce valid UTF-8. Rust's chars() and Go's range both iterate by code point. |
| C | Null-terminated byte arrays. Go's strings know their length — no buffer overrun bugs. |
9 · Common mistakes
- Indexing assuming character = byte. s[0] is the first byte, which may be only part of a multi-byte UTF-8 sequence.
- Using len(s) for character count. It's the byte count. For runes: utf8.RuneCountInString(s) or len([]rune(s)).
- Concatenating in a loop without Builder. Quadratic. Reach for strings.Builder, or for strings.Join, which sizes its result up front, when the pieces are already in a slice.
- Forgetting that == is case-sensitive. strings.EqualFold is the case-insensitive helper.
- Treating a string as a slice of runes interchangeably. The conversion []rune(s) allocates. Use range when iterating once.
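A short sketch of the fixes for the first, second, and fourth mistakes above. The helpers firstChar and sameWord are hypothetical names introduced for illustration; the example assumes é is the precomposed code point U+00E9.

```go
package main

import (
	"fmt"
	"strings"
	"unicode/utf8"
)

// firstChar is a hypothetical helper: it returns the first full rune of s as
// a string, instead of s[0], which is only the first byte.
func firstChar(s string) string {
	_, size := utf8.DecodeRuneInString(s) // size is 0 for an empty string
	return s[:size]
}

// sameWord is a hypothetical helper: it compares two strings ignoring case.
func sameWord(a, b string) bool {
	return strings.EqualFold(a, b)
}

func main() {
	s := "école"
	fmt.Println(len(s), utf8.RuneCountInString(s)) // 6 5
	fmt.Println(firstChar(s))                      // é
	fmt.Println("Go" == "GO", sameWord("Go", "GO")) // false true
}
```

Slicing with s[:size] does not allocate a new backing array, so firstChar avoids the []rune(s) conversion that the last bullet warns about.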
10 · Exercises (~10 min)
- Rune count. Write a function that takes a string and returns (bytes, runes). Test on "héllo", "日本語", "hello".
- Reverse a string by runes. Convert to []rune, swap in place, convert back. Test on "héllo".
- Builder vs +=. Time both for 10000 concatenations. Report the ratio.
- Word count. Read input, split on whitespace, print top-3 most common words. Use strings.Fields and a map.
11 · When it clicks
- You never write for i := 0; i < len(s); i++ when characters matter.
- You reach for strings.Builder as soon as you see concatenation in a loop.
- You distinguish %v, %+v, %#v by reflex.
- You can predict the byte index after a multi-byte rune in a range loop.