Strings, runes, and bytes

A Go string is a read-only sequence of bytes. Note the word "bytes", not "characters". Most of the time the difference won't matter, but as soon as you introduce an accented letter or an emoji, the two come apart.

Here is a minimal demonstration:

plain := "hello"
fancy := "héllo"

fmt.Println(len(plain))   // 5
fmt.Println(len(fancy))   // 6

Both strings look five characters long to a human. But len(plain) prints 5 and len(fancy) prints 6. The é in fancy takes two bytes, not one, in the UTF-8 encoding Go uses for string literals, and len counts bytes.

This is not a bug. Counting bytes is cheap (a string already knows its byte length), while counting user-visible characters is expensive and ambiguous in Unicode. If you need a rune count, ask for it explicitly with the unicode/utf8 package:

import "unicode/utf8"

fmt.Println(utf8.RuneCountInString(plain))   // 5
fmt.Println(utf8.RuneCountInString(fancy))   // 5

Now both report 5, because RuneCountInString counts runes, not bytes. That is often closer to what a human expects, but it is not exactly the same as counting user-visible characters in every Unicode case. We talk more about that distinction next. For now the takeaway is: a string is a byte sequence, and the number of bytes is not always the same as the number of runes.
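To see the two counts side by side, here is a minimal, self-contained sketch. The helper name counts is made up for this illustration and is not part of the starter. The third string spells é as a plain e followed by the combining accent U+0301, which is one case where the rune count still differs from what a reader sees.

```go
package main

import (
	"fmt"
	"unicode/utf8"
)

// counts is a hypothetical helper for this example: it returns the
// byte length and the rune count of s.
func counts(s string) (nBytes, nRunes int) {
	return len(s), utf8.RuneCountInString(s)
}

func main() {
	// "hello":       5 bytes, 5 runes
	// "héllo":       6 bytes, 5 runes (é is one rune, two bytes)
	// "he\u0301llo": 7 bytes, 6 runes (e plus a combining accent)
	for _, s := range []string{"hello", "héllo", "he\u0301llo"} {
		b, r := counts(s)
		fmt.Printf("%q: %d bytes, %d runes\n", s, b, r)
	}
}
```

The last two strings look the same on screen, yet neither their byte counts nor their rune counts agree.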

Byte, rune, and the relationship between them

Two types sit behind every string:

A byte is a value from 0 to 255. The type byte is an alias for uint8. One byte holds one character only for ASCII text; anything beyond ASCII spans two, three, or four bytes.
A rune is a Unicode code point. The type rune is an alias for int32. In many everyday cases one rune lines up with one character a human sees, such as é, 漢, or 🚀, but Unicode has edge cases where one visible character uses multiple runes. A single rune occupies one to four bytes when it is written out in UTF-8.
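The one-to-four-byte range can be checked directly. This is a small sketch that uses utf8.RuneLen from the standard library; the wrapper name encodedLen is an assumption made for the example.

```go
package main

import (
	"fmt"
	"unicode/utf8"
)

// encodedLen is a hypothetical wrapper around utf8.RuneLen: it reports
// how many bytes r occupies when written out in UTF-8.
func encodedLen(r rune) int {
	return utf8.RuneLen(r)
}

func main() {
	// Expected byte counts: A = 1, é = 2, 漢 = 3, 🚀 = 4.
	for _, r := range []rune{'A', 'é', '漢', '🚀'} {
		fmt.Printf("%c  U+%04X  %d byte(s)\n", r, r, encodedLen(r))
	}
}
```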

Strings store their characters as UTF-8 bytes. When you want to work with the characters, you treat them as runes instead.

Here is what each looks like in code. Single quotes produce a rune literal, a numeric constant equal to the Unicode code point of the character. You can assign it directly to a rune variable, or to a byte when the value fits in the byte range (0 to 255):

var b byte = 'A'    // 65,  the ASCII code for 'A'
var r rune = 'é'    // 233, the Unicode code point for 'é'

fmt.Println(b)   // 65
fmt.Println(r)   // 233

Double quotes always build a string, so "A" and 'A' are different things: the first is a one-character string, the second is an integer value. Mixing them up is one of the most common mistakes when you are coming from a language where 'A' and "A" are interchangeable.
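A quick way to convince yourself that the two quote styles produce different types is to print them with the %T verb. The helper describe below is hypothetical and exists only for this sketch.

```go
package main

import "fmt"

// describe is a hypothetical helper: it returns the Go type of v as text.
func describe(v interface{}) string {
	return fmt.Sprintf("%T", v)
}

func main() {
	fmt.Println(describe('A')) // int32 (rune is an alias for int32)
	fmt.Println(describe("A")) // string

	r := 'A'
	fmt.Println(r)         // 65, the code point as a number
	fmt.Println(string(r)) // A, the rune converted to a one-character string
}
```

Converting with string(r) is how you get from the integer back to text when you need it.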

byte or rune?

Reach for byte when you are working with raw binary data or ASCII-only text where every character is guaranteed to fit in one byte. Reach for rune when you are working with human text that may contain non-ASCII characters (accents, non-Latin scripts, emoji). When in doubt about user-facing text, rune is the safer default.
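Reversing a string is one place where the choice shows up clearly. The sketch below is only an illustration: both function names are made up, and it relies on the []byte(s) and []rune(s) conversions. Reversing by byte is fine for ASCII but scrambles the two bytes of é; reversing by rune keeps each code point intact.

```go
package main

import "fmt"

// reverseBytes reverses s one byte at a time. Safe for ASCII-only text.
func reverseBytes(s string) string {
	bs := []byte(s)
	for i, j := 0, len(bs)-1; i < j; i, j = i+1, j-1 {
		bs[i], bs[j] = bs[j], bs[i]
	}
	return string(bs)
}

// reverseRunes reverses s one code point at a time.
func reverseRunes(s string) string {
	rs := []rune(s)
	for i, j := 0, len(rs)-1; i < j; i, j = i+1, j-1 {
		rs[i], rs[j] = rs[j], rs[i]
	}
	return string(rs)
}

func main() {
	fmt.Println(reverseBytes("hello")) // olleh
	fmt.Println(reverseRunes("héllo")) // olléh

	// The two bytes of é come out swapped, which is no longer valid UTF-8,
	// so %q shows them as escaped bytes instead of a letter.
	fmt.Printf("%q\n", reverseBytes("héllo"))
}
```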

Indexing a string gives you a byte

Because a string is a byte sequence, indexing with s[i] returns a single byte:

fmt.Println(fancy[0])   // 104  (byte value of 'h')
fmt.Println(fancy[1])   // 195  (first byte of the é encoding)
fmt.Println(fancy[2])   // 169  (second byte of the é encoding)

fancy[0] is 104, the ASCII code for h. That is what you would expect. But fancy[1] is 195, the first of two bytes that together encode é in UTF-8, and fancy[2] is 169, the second. Indexing does not get you the letter é; it gets you one raw byte of its encoding.

This is why you rarely want to loop over a string with a classic counter. The range keyword, introduced in the control-flow chapter, iterates over runes and is the usual safe way to walk a string one code point at a time.
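As a quick sketch of what range does with a string: the index it yields is the byte position where each rune starts, so it skips a number after a multi-byte rune. The helper runeOffsets is hypothetical and only collects those positions.

```go
package main

import "fmt"

// runeOffsets is a hypothetical helper: it returns the byte index at
// which each rune of s begins.
func runeOffsets(s string) []int {
	var offsets []int
	for i := range s {
		offsets = append(offsets, i)
	}
	return offsets
}

func main() {
	fancy := "héllo"

	// r is a whole rune on every iteration, never half of é.
	for i, r := range fancy {
		fmt.Printf("byte index %d: %c\n", i, r)
	}

	fmt.Println(runeOffsets(fancy)) // [0 1 3 4 5]
}
```

Index 2 never appears because it is the second byte of é, not the start of a rune.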

If you genuinely need the i-th rune (not the i-th byte), the usual trick is to convert the string to a slice of runes first and index that: []rune(s)[i]. The conversion walks the whole string to decode it, so save this for the odd place you really need rune-by-index access. For plain iteration, range handles the decoding without allocating a whole rune slice.
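Here is a minimal sketch of that trick wrapped in a function. The name runeAt and its bounds check are assumptions made for the example, not something from the standard library.

```go
package main

import "fmt"

// runeAt is a hypothetical helper: it returns the i-th rune of s.
// It converts the whole string to []rune first, so it costs time and
// memory proportional to the length of s. ok is false when i is out of range.
func runeAt(s string, i int) (r rune, ok bool) {
	rs := []rune(s)
	if i < 0 || i >= len(rs) {
		return 0, false
	}
	return rs[i], true
}

func main() {
	fancy := "héllo"

	fmt.Println(fancy[1]) // 195, one raw byte

	if r, ok := runeAt(fancy, 1); ok {
		fmt.Println(string(r)) // é, the whole rune
	}
}
```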

Strings are immutable

You can read the bytes of a string, but you cannot change them:

s := "hello"
s[0] = 'H'   // compile error: cannot assign to s[0]

To "modify" a string you build a new one, usually by concatenating pieces together with +. The original string stays the same; only your variable is reassigned to point at the new value:

s := "hello"
s = "H" + s[1:]
fmt.Println(s)   // Hello

s[1:] is a slice expression that produces a new string from byte index 1 to the end (covered properly in the slices chapter). The + glues "H" and "ello" into a brand-new string, and s is reassigned to it. The original "hello" is untouched; s simply no longer refers to it.
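To watch the original survive, keep a second variable that still refers to it. This is a sketch only: capitalizeASCII is a made-up helper, and it deliberately handles just a leading lowercase ASCII letter.

```go
package main

import "fmt"

// capitalizeASCII is a hypothetical helper: it returns a new string with
// the first byte uppercased when that byte is a lowercase ASCII letter.
// Any other input is returned unchanged.
func capitalizeASCII(s string) string {
	if s == "" || s[0] < 'a' || s[0] > 'z' {
		return s
	}
	// Build a new string; the bytes of s are never written to.
	return string(s[0]-'a'+'A') + s[1:]
}

func main() {
	original := "hello"
	modified := capitalizeASCII(original)

	fmt.Println(original) // hello
	fmt.Println(modified) // Hello
}
```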

The strings package has helpers for more involved transformations (replacing substrings, uppercasing, trimming whitespace). It is covered in the Standard library chapter.

Raw strings with backticks

A double-quoted string interprets backslashes as the start of an escape sequence: \n becomes a newline, \t becomes a tab, \\ becomes a single backslash, and so on. That is convenient when you want control characters in your text, but it causes friction whenever your actual data contains a backslash. Try writing a Windows file path:

path := "C:\Users\kamran\data.txt"   // compile error

Go rejects this at compile time. The backslash starts an escape sequence, and \U must be followed by exactly eight hexadecimal digits, which sers is not, so the compiler stops with an error. (\k and \d are not valid escapes either.) To use this path in a double-quoted string you have to double each backslash so the compiler treats them as data, not escape prefixes:

path := "C:\\Users\\kamran\\data.txt"
fmt.Println(path)   // C:\Users\kamran\data.txt

This is readable once you know the rule, but tedious once you have a long regex or a nested path, and very easy to get wrong.

Go's second string form sidesteps the whole problem. Enclose the text in backticks instead of double quotes and you get a raw string: escape sequences are not interpreted, and every character in the source appears in the final value verbatim.

path := `C:\Users\kamran\data.txt`
fmt.Println(path)   // C:\Users\kamran\data.txt

Same final value, no doubled backslashes. Raw strings can also span multiple lines naturally, which a double-quoted string cannot do:

poem := `Roses are red,
Violets are blue.`

Gotcha

Inside a raw string, \n is not a newline. It is the two literal characters \ and n. If you need a real newline in a raw string, break the line in the source: the line break itself becomes the newline. If you need the literal text \n (for example, regex metacharacters or documentation examples), that is exactly what backticks give you without any effort.
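The difference is easy to measure with len. In this small sketch the constant names are made up for the illustration; the path is the one used earlier in this section.

```go
package main

import "fmt"

const (
	interpreted = "\n" // one byte: a newline
	raw         = `\n` // two bytes: a backslash and the letter n

	quotedPath = "C:\\Users\\kamran\\data.txt"
	rawPath    = `C:\Users\kamran\data.txt`
)

func main() {
	fmt.Println(len(interpreted)) // 1
	fmt.Println(len(raw))         // 2

	// Both spellings of the path produce the same string value.
	fmt.Println(quotedPath == rawPath) // true
}
```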

Raw strings are the right choice for:

Windows file paths (no doubled backslashes)
Regular-expression patterns (no double-escaping)
Multi-line text such as SQL queries, HTML snippets, or embedded templates
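As a sketch of the regular-expression case, here is a pattern written as a raw string and compiled with the standard regexp package. The pattern, the variable name versionRE, and the sample input are all invented for this example.

```go
package main

import (
	"fmt"
	"regexp"
)

// versionRE is an illustrative pattern: digits, a literal dot, digits.
// In a double-quoted string it would have to be written "\\d+\\.\\d+".
var versionRE = regexp.MustCompile(`\d+\.\d+`)

func main() {
	// Made-up sample input for the demonstration.
	fmt.Println(versionRE.FindString("go version go1.22 linux/amd64")) // 1.22

	// The raw and the double-escaped spellings are the same string.
	fmt.Println(`\d+\.\d+` == "\\d+\\.\\d+") // true
}
```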

More string manipulation later

Splitting, joining, uppercasing, searching, replacing, and everything else you actually do with strings lives in the strings package, which is covered in depth in the Standard library chapter.

Task

Extend the starter so it also prints the first two byte values of fancy:

Add fmt.Println(fancy[0]). You should see 104.
Add fmt.Println(fancy[1]). You should see 195.

Leave the two existing len lines in place.

Expected output
