# Efficient Bit Manipulation in Go 1.9

Go 1.9, released in August 2017, introduced the `maths/bits`

package. This package provides optimized functions for bit counting and manipulation of unsigned integers. These are useful in low-level environments where the individual bits stored in an integer are of importance. Let’s take a look at some of the functions.

First up, `Len`

returns the minimum number of bits required to represent a value `x`

.

```
package main
import (
"fmt"
"math/bits"
)
func main() {
var a uint = 31
fmt.Printf("bits.Len(%d) = %d\n", a, bits.Len(a))
a++
fmt.Printf("bits.Len(%d) = %d\n", a, bits.Len(a))
}
```

Which outputs:

```
bits.Len(31) = 5
bits.Len(32) = 6
```

This is because 31 is `11111`

in binary, and 32 is `100000`

. The first significant bit is in each case is in position 5 and 6, respectively, when counting positions from the right.

This also allows us to easily illustrate the `OnesCount`

function, which returns the number of bits set to 1 in a given number:

```
package main
import (
"fmt"
"math/bits"
)
func main() {
var a uint = 31
fmt.Printf("bits.OnesCount(%d) = %d\n", a, bits.OnesCount(a))
a++
fmt.Printf("bits.OnesCount(%d) = %d\n", a, bits.OnesCount(a))
}
```

which outputs:

```
bits.OnesCount(31) = 5
bits.OnesCount(32) = 1
```

We also have `LeadingZeros`

, `TrailingZeros`

, `Reverse`

and `ReverseBytes`

. Let’s use all of these in a single program to demonstrate how they work. In the next example, we use the `uint16`

version of these functions to keep the output short and readable. We could just as easily use the 32-bit, 64-bit or unspecified versions.

```
package main
import (
"fmt"
"math/bits"
"os"
"text/tabwriter"
)
func main() {
// read an unsigned integer from stdin
var i uint16
_, err := fmt.Scanf("%d", &i)
if err != nil {
i = 100
fmt.Println("Defaulting to 100")
}
// create a tabwriter for formatting the output
w := new(tabwriter.Writer)
// Format in tab-separated columns with a tab stop of 8.
w.Init(os.Stdout, 0, 8, 0, '\t', 0)
fmt.Fprintf(w, "Binary:\t%0.16bb\n", i)
fmt.Fprintf(w, "Len:\t%16d\n", bits.Len16(i))
fmt.Fprintf(w, "Ones count:\t%16d\n", bits.OnesCount16(i))
fmt.Fprintf(w, "Leading zeros:\t%16d\n", bits.LeadingZeros16(i))
fmt.Fprintf(w, "Trailing zeros:\t%16d\n", bits.TrailingZeros16(i))
fmt.Fprintf(w, "Reversed:\t%0.16bb\n", bits.Reverse16(i))
fmt.Fprintf(w, "Reversed Bytes:\t%0.16bb\n", bits.ReverseBytes16(i))
w.Flush()
}
```

An example run with 123 as the input gives us the following:

```
$ go run bits.go
123
Binary: 0000000001111011b
Len: 7
Ones count: 6
Leading zeros: 9
Trailing zeros: 0
Reversed: 1101111000000000b
Reversed Bytes: 0111101100000000b
```

It’s not difficult to see what these functions do: `LeadingZeros`

returns the number of zeros to the left of the most significant set bit. `TrailingZeros`

returns the number of zeros to the right of the least significant set bit. `Reverse`

returns the number with all bits in reverse order, and `ReverseBytes`

reverses the number byte-by-byte, that is, in 8-bit blocks.

## How does it perform?

This is all good and well, and it can be useful at times. But some of these functions are relatively simple to implement manually. How much more efficient are the library implementations, really?

Let’s compare the performance of `OnesCount64`

with that of a naive algorithm for counting the number of set bits. Below is a simple benchmark that compares a naive implementation of `OnesCount64`

, which uses bit shifting to count the number of set bits, to the built-in `bits.OnesCount64`

:

```
package main
import (
"math/bits"
"testing"
)
func NaiveOnesCount64(x uint64) int {
var c, i uint64
for i = 0; i < 64; i++ {
c += (x >> i) & 1
}
return int(c)
}
var numbers = []uint64{
0, 1, 2, 3, 4, 5, 6, 7, 8, 9,
100, 200, 300, 400, 500, 600, 700, 800, 900,
11111, 22222, 33333, 44444, 55555, 66666, 77777, 88888, 99999,
190239012390, 123901312, 6549654, 45904059,
}
// Output is a global sync variable so that test outputs
// do not get elided by the compiler.
var Output int
func TestNaiveOnesCount64(t *testing.T) {
m := len(numbers)
for n := 0; n < m; n++ {
x := numbers[n%m]
got := NaiveOnesCount64(x)
want := bits.OnesCount64(x)
if got != want {
t.Errorf("NaiveOnesCount64(%d) = %d, want %d", x, got, want)
}
}
}
func BenchmarkNaiveOnesCount64(b *testing.B) {
// run NaiveOnesCount b.N times
var s int
for n := 0; n < b.N; n++ {
// use &31 to avoid modulo operation
s += NaiveOnesCount64(numbers[n&31])
}
Output = s
}
func BenchmarkBitsOnesCount64(b *testing.B) {
// run NaiveOnesCount b.N times
var s int
for n := 0; n < b.N; n++ {
// use &31 to avoid modulo operation
s += bits.OnesCount64(numbers[n&31])
}
Output = s
}
```

Running the comparison on a 64-bit Linux machine, we get:

```
$ go test -bench=Count
goos: linux
goarch: amd64
BenchmarkNaiveOnesCount64-4 20000000 71.6 ns/op
BenchmarkBitsOnesCount64-4 2000000000 1.15 ns/op
PASS
ok _/home/code/bits 3.946s
```

The built-in `OnesCount`

is orders of magnitude faster! This is because the built-in function uses an algorithm called *Parallel summing of adjacent bits*. A full description of this is given in Chapter 5 of Hacker’s Delight, Counting Bits. (**Correction**: on x86-64 systems, it actually uses POPCNT instructions, see note at bottom of this post).

Below is another benchmark comparison, this time for `TrailingZeros64`

:

```
package main
import (
"math/bits"
"testing"
)
// NaiveTrailingZeros64 returns the number of trailing zero bits in x;
// the result is 64 for x == 0.
func NaiveTrailingZeros64(x uint64) int {
var c, i uint64
for i = 0; i < 64; i++ {
if (x>>i)&1 == 1 {
break
}
c++
}
return int(c)
}
var nums = []uint64{
0, 1, 2, 3, 4, 5, 6, 7, 8, 9,
100, 200, 300, 400, 500, 600, 700, 800, 900,
11111, 22222, 33333, 44444, 55555, 66666, 77777, 88888, 99999,
190239012390, 123901312, 6549654, 45904059,
}
// Output is a global sync variable so that test outputs
// do not get elided by the compiler.
var Output int
func TestNaiveTrailingZeros64(t *testing.T) {
m := len(nums)
for n := 0; n < m; n++ {
x := nums[n%m]
got := NaiveTrailingZeros64(x)
want := bits.TrailingZeros64(x)
if got != want {
t.Errorf("NaiveTrailingZeros64(%d) = %d, want %d", x, got, want)
}
}
}
func BenchmarkNaiveTrailingZeros64(b *testing.B) {
// run NaiveOnesCount b.N times
s := 0
for n := 0; n < b.N; n++ {
s += NaiveTrailingZeros64(nums[n&31])
}
Output = s
}
func BenchmarkBitsTrailingZeros64(b *testing.B) {
// run NaiveOnesCount b.N times
s := 0
for n := 0; n < b.N; n++ {
s += bits.TrailingZeros64(nums[n&31])
}
Output = s
}
```

Running this we get:

```
$ go test -bench=Trailing
goos: linux
goarch: amd64
BenchmarkNaiveTrailingZeros64-4 300000000 5.86 ns/op
BenchmarkBitsTrailingZeros64-4 2000000000 1.00 ns/op
PASS
ok _/home/code/bits 4.460s
```

Again, we see that `TrailingZeros64`

is significantly more efficient than the naive solution, though not by as much in this case. This time the reason is that it implements an algorithm first described by Charles E. Leiserson, Harald Prokop and Keith H. Randallin in 1998, using what is called De Bruijn numbers. It’s a neat trick, and I recommend reading the original paper. (**Correction**: Again, it was pointed out that this is not true on most 64-bit systems, see note at the bottom of post)

## Example Use Case

Now that we have seen that the built-in functions are much faster, you may still be asking “why should I care?” Bit manipulation isn’t all that common of a thing to need, and we still have to shift left and right with the `<<`

and `>>`

operators when we do need it, so this is a fair question.

One example where this has already made an impact is in Will Fitzgerald’s bitset package. A blog post by Daniel Lemire shows how changing to use the new bits package made the `Count`

function run twice as fast, and the previous implementation relied on hand-tuned assembly code!

```
$ go test -run=XXX -bench Count >old.txt
$ git checkout go19
$ go get golang.org/x/tools/cmd/benchcmp
$ go test -run=XXX -bench Count >new.txt
$ benchcmp old.txt new.txt
benchmark old ns/op new ns/op delta
BenchmarkCount-4 2676 1352 -49.48%
BenchmarkLemireCount-4 2569228 1401137 -45.46%
```

Bitsets are commonly used in all sorts of applications that are memory or space-aware, from databases (e.g. bloom filters) to priority queues and examining stream data. The use of the new `bits`

package in bitsets provides an example of the positive impact this will have on the performance of many Go applications.

## Conclusion

The new bit manipulation package in Go 1.9 wasn’t celebrated all that much, but it is pretty exciting if you do low-level programming in Go. It can provide quick efficiency gains if you are already using some of these common functions in a hot path in your code. Kudos to Robert Griesemer, Roman Budnikov and Lynn Boger, who worked on getting this into the standard library.

**Update**: some great feedback from giovannibajo and klauspost on r/golang - apparently a lot of the efficiency of the OnesCount and TrailingZeros on 64-bit systems comes from calling POPCNT and BSF instructions when the CPU supports it. The algorithms mentioned in the article are a fallback if the system does not support these methods.

I’m a London-based software engineer who likes writing (about) Go. If you like this article, feel free to reach out on Twitter; over there I’m known as @ironzeb. I’m currently writing Production Go, a book that walks through building a modern production-ready application Go.