Reduce allocations and comparison performance with the new unique package in Go 1.23
Posted on Friday, 2nd August 2024
Having been playing around with the upcoming Go 1.23 release (which at the time of writing is still in draft but due for release this month) I was interested to understand the new unique
package and the problems it aims to solve. Below is a write-up of that investigation which I’m hoping will prove useful to others interested in learning more.
Interning and Go
Interning, a term originally introduced by Lisp is the process of storing only one copy of a value in memory and sharing a unique reference to it instead of allocating multiple copies and wasting memory. For instance the Go compiler already performs interning of string constants at compile time, so instead of allocating multiple identical strings it will allocate a single instance of the string and share a reference to it, as demonstrated in the snippet below. (Go Playground here)
package main
import (
"fmt"
"reflect"
"unsafe"
)
const greet1 = "hello world"
const greet2 = "hello world"
func main() {
a := greet1
p := (*reflect.StringHeader)(unsafe.Pointer(&a))
fmt.Println("a address:", *&p.Data)
b := greet2
p2 := (*reflect.StringHeader)(unsafe.Pointer(&b))
fmt.Println("b address:", *&p2.Data)
}
$ go run main.go
a address: 4310983661
b address: 4310983661
Before Go 1.23, interning runtime values has only been available via third-party packages. However, as of Go 1.23 interning is now included in the standard library via the new unique
package.
The unique
Package in Go 1.23
Interning (or ‘canonicalizing values’) can now be performed using the unique
package via its Handle
type that acts as a globally unique identity for any provided (comparable) value, meaning two handles compare equal exactly if the two values used to create the handles would have also compared equal.
type Handle[T comparable] struct {}
func (h Handle[T]) Value() T {}
func Make[T comparable](value T) Handle[T] {}
Internally the Handle
type is backed by a concurrent-safe map (well, an adaptive radix tree to be precise due to its need to shrink and grow the tree more efficiently than a map) which acts as a read-through cache, storing unique values when not detected in the cache and returning a Handle[T]
which is designed to be passed around as values/fields instead of the underlying value, saving you additional allocations and also resulting in cheaper comparisons as you’ll only incur a reference comparison as opposed to a value comparison.
Couldn’t you achieve the same behaviour with a map of unique values?
Sure, the same interning behaviour could be achieved using a custom map to reduce duplicate allocations, however handling garbage collecting efficiently isn’t possible. The unique
package has a notion of “weak references” that means the garbage collector can clean up in a single cycle made possible by access to runtime internals, something a custom rolled solution wouldn’t be able to perform.
How to use the unique
package
Let’s take a look at an trivial example, inspired by the net/netip
package which actually uses the unique
package for efficient handling of IPv6 zone names.
package main
import (
"unique"
)
type addrDetail struct {
isV6 bool
zoneV6 string
}
func main() {
h1 := unique.Make(addrDetail{isV6: true, zoneV6: "2001:0db8:0001:0000:0000:0ab9:C0A8:0102"})
// this addrDetail won't be allocated as it already exists in the underlying map
h2 := unique.Make(addrDetail{isV6: true, zoneV6: "2001:0db8:0001:0000:0000:0ab9:C0A8:0102"})
if h1 == h2 {
println("addresses are equal")
}
// Value() returns a copy of the underlying value (ie, different memory address)
println(h1.Value().zoneV6)
}
Handle Comparison Performance
Earlier I mentioned that along with a reduction of unnecessary allocations via deduplication of values, the unique
package can also reduce the cost of object comparisons, which can be especially evident when dealing with comparing large strings, structs with string fields or arrays, where the comparison can be reduced to a simple pointer comparison. Let’s take a look.
Below we have a benchmark that repeats a comma-separated string of IPv6 zones and then performs a string comparison on the two identical copies, one benchmark without wrapping the string in a Handle
type, the other with.
package main
import (
"strings"
"testing"
"unique"
)
func BenchmarkStringCompareSmall(b *testing.B) { benchStringComparison(b, 10) }
func BenchmarkStringCompareMedium(b *testing.B) { benchStringComparison(b, 100) }
func BenchmarkStringCompareLarge(b *testing.B) { benchStringComparison(b, 1000000) }
func BenchmarkCanonicalisingSmall(b *testing.B) { benchCanonicalising(b, 10) }
func BenchmarkCanonicalisingMedium(b *testing.B) { benchCanonicalising(b, 100) }
func BenchmarkCanonicalisingLarge(b *testing.B) { benchCanonicalising(b, 1000000) }
func benchStringComparison(b *testing.B, count int) {
s1 := strings.Repeat("2001:0db8:0001:0000:0000:0ab9:C0A8:0102,", count)
s2 := strings.Repeat("2001:0db8:0001:0000:0000:0ab9:C0A8:0102,", count)
b.ResetTimer()
for n := 0; n < b.N; n++ {
if s1 != s2 {
b.Fatal()
}
}
b.ReportAllocs()
}
func benchCanonicalising(b *testing.B, count int) {
s1 := unique.Make(strings.Repeat("2001:0db8:0001:0000:0000:0ab9:C0A8:0102,", count))
s2 := unique.Make(strings.Repeat("2001:0db8:0001:0000:0000:0ab9:C0A8:0102,", count))
b.ResetTimer()
for n := 0; n < b.N; n++ {
if s1 != s2 {
b.Fatal()
}
}
b.ReportAllocs()
}
Benchmark Results
Let’s run the benchmark and check out the results:
$ go test -run='^$' -bench=.
goos: darwin
goarch: arm64
pkg: go-experiment
cpu: Apple M1
BenchmarkStringCompareSmall-8 116581837 9.392 ns/op 0 B/op 0 allocs/op
BenchmarkStringCompareMedium-8 14944300 80.15 ns/op 0 B/op 0 allocs/op
BenchmarkStringCompareLarge-8 903 1296028 ns/op 0 B/op 0 allocs/op
BenchmarkCanonicalisingSmall-8 1000000000 0.3132 ns/op 0 B/op 0 allocs/op
BenchmarkCanonicalisingMedium-8 1000000000 0.3140 ns/op 0 B/op 0 allocs/op
BenchmarkCanonicalisingLarge-8 1000000000 0.3128 ns/op 0 B/op 0 allocs/op
PASS
ok go-experiment 5.596s
Running the benchmarks we can see that regardless of the size of the string, the nanoseconds per operation (ns/op) duration is consistently low regardless of whether performing the comparison over a string 10 copies long or 1,000,000, whereas the non-canonicalised version grows relative to the size of the string.
Conclusion
Hopefully, this post has given you enough of an understanding of what the unique
package is and the problems it aims to solve. It’s another great package that’s now included as part of Go’s already solid standard library and hopefully in the future if you run into the potential use cases as those discussed above then this post will give you enough of an understanding to reach for interning.
Enjoy this post? Don't be a stranger!
Follow me on Twitter at @_josephwoodward and say Hi! I love to learn in the open, meet others in the community and talk Go, software engineering and distributed systems related topics.