Efficient String Concatenation in Go

In this article I investigate the computational performance of various string concatenation methods in the Go programming language.

To evaluate the options, I prepared some typical Go benchmarks using the Go testing package. A benchmark looks something like this:

func BenchmarkBufferString(b *testing.B, numConcat int) {
    var ns string
    for i := 0; i < b.N; i++ {
        next := nextString()
        buffer := bytes.NewBufferString("")
        for u := 0; u < numConcat; u++ {
            buffer.WriteString(next())
        }
        ns = buffer.String()
    }
    global = ns
}

For the purposes of these benchmarks, I imagined having a process that returns string segments one by one, and these segments need to be concatenated to form one string. To represent such a process, I created a simple iterator that returns the string version of its current index on each call:

// nextString is an iterator we use to represent a process
// that returns strings that we want to concatenate in order.
func nextString() func() string {
    n := 0
    // closure captures variable n
    return func() string {
        n += 1
        return strconv.Itoa(n)
    }
}

Every method we evaluate will call this function in a loop so that the fully concatenated string looks something like 0123456789101112131415.... There is function call overhead to take into account in this, but since all the benchmarks will have the same overhead, the comparison is still fair. Now that we have the basic setup for the benchmark, let’s see what the methods are we’ll be evaluating.

Method 1: Naive Appending with +=

This is the most obvious approach to the problem. Use the concatenate operator (+=) to append each segment to the string. The thing to know about this is that strings are immutable in Go - each time a string is assigned to a variable a new address is assigned in memory to represent the new value. This is different from languages like C++ and BASIC, where a string variable can be modified in place. In Go, each time you append to the end of a string, it must create a new string and copy the contents of both the existing string and the appended string into it. As the strings you are manipulating become large this process becomes increasingly slow. In fact, this method of naive appending is O(N2).

Method 2: strings.Join

The Go strings standard library package provides a Join function that takes a slice of strings and concatenates them. In our case, this requires first building a slice of strings and then calling the function on the slice. Building a slice takes time if you don’t already have one, and we therefore count this as part of the time taken to perform the concatenation in our benchmarks. The final concatenation will be O(N), but building the slice is O(N) (average) and uses O(N) memory.

Here is an example of using strings.Join:

s := []string{}
for i := 0; i < 1000; i++ {
    s = append(s, "a")
}
fmt.Println(strings.Join(s, ""))

See full code on the Go Playground

Method 3: bytes.Buffer

The last method we’ll be evaluating is using the Buffer type from the bytes package. bytes.Buffer implements io.Writer, and can concatenate strings in O(N) time without requiring us to build a new slice.

Here is an example of using bytes.Buffer:

var buffer bytes.Buffer
for i := 0; i < 1000; i++ {
    buffer.WriteString("a")
}
fmt.Println(buffer.String())

See full code on the Go Playground

Now that we’ve defined the methods we want to test, let’s proceed to run the benchmarks. I’d like to skip right to the end and start with the winning approach. Then we’ll work back to see how and when this method is best.

And the Winner Is…

Method #3 (bytes.Buffer) is the winner! But not by far, and not in all cases. Here is a graph of how the methods compare in the case with a 10000 strings with a total size of 37kB (small strings of <= 5 characters):

Comparison of String Concat Methods in Go

It shows how bytes.Buffer is the winner by quite a large margin in this case. Let’s see how the methods compare when the strings are a bit longer, at a total size of 244kB (25 characters each):

String Concat Methods in Go (244kB)

With longer strings strings.Join doesn’t perform so badly, but it’s still not as good as using bytes.Buffer.

I also ran benchmarks on small numbers of concatenations, and in that case naive concatenation gets the job done slightly faster than the other methods, so it’s still fine to use for small numbers of concatenations and small segments.

Lastly, how big of a factor does building a string slice play in Method #2? We can exclude slice building from the strings.Join benchmark, but make it still call nextString on every iteration so that the tests remain fair. In this case, the results look like this:

String Concat Methods, Not Counting Slice-building

When we don’t count the time it takes to build a string slice, strings.Join edges out bytes.Buffer and rules supreme.

In Summary

In summary, for very few concatenations of short strings (fewer than hundred, length shorter than 10), using Method #1, naive string appending, is just fine and can actually perform slightly better. For more heavy-duty cases where efficiency is important, Method #3, bytes.Buffer is the best choice out of the methods evaluated. That said, strings.Join is a good choice when you already have a string slice that just needs to be concatenated into one string.

Get the Code

Below are the raw results of running go test -bench. for the different methods and different number of concatenations on my machine (Go 1.2 on Ubuntu 64bit, Intel® Core™ i7-3667U CPU @ 2.00GHz):

BenchmarkNaiveConcat10    500000          6403 ns/op
BenchmarkNaiveConcat100    20000         77001 ns/op
BenchmarkNaiveConcat1000         500       3105361 ns/op
BenchmarkNaiveConcat10000          5     236872134 ns/op
BenchmarkJoin10   200000          8129 ns/op
BenchmarkJoin100       50000         51891 ns/op
BenchmarkJoin1000       5000        465179 ns/op
BenchmarkJoin10000       500       5911317 ns/op
BenchmarkBufferString10   200000          7688 ns/op
BenchmarkBufferString100       50000         45027 ns/op
BenchmarkBufferString1000       5000        398311 ns/op
BenchmarkBufferString10000       500       3702906 ns/op

You can play around with the benchmarks, run them yourself and check my implementation on Github. A typical benchmark looks like this:

var global string

// benchmarkNaiveConcat provides a benchmark for basic built-in
// Go string concatenation. Because strings are immutable in Go,
// it performs the worst of the tested methods. The time taken to
// set up the array that is appended is not counted towards the
// time for naive concatenation.
func benchmarkNaiveConcat(b *testing.B, numConcat int) {
    var ns string
    for i := 0; i < b.N; i++ {
        next := nextString()
        ns = ""
        for u := 0; u < numConcat; u++ {
            ns += next()
        }
    }
    // we assign to a global variable to make sure compiler
    // or runtime optimizations don't skip over the operations
    // we were benchmarking. This might be unnecessary, but it's
    // safer.
    global = ns
}

I’m a software engineer at Gengo in Tokyo. I like writing about Go, software engineering and languages. If you liked this article, feel free to reach out on Twitter; over there I’m known as @ironzeb.

 
100
Kudos
 
100
Kudos

Read this next

The Popularity of Go

When you look at the Google Trends graph above, it would seem like the Go programming language, also known as Golang, is on its way to big things. But let’s not get ahead of ourselves, Go fans. Let’s first have a cup of java... Continue →