Reduce allocations and comparison performance with the new unique package in Go 1.23

Posted on Friday, 2nd August 2024

Having been playing around with the upcoming Go 1.23 release (which at the time of writing is still in draft but due for release this month) I was interested to understand the new unique package and the problems it aims to solve. Below is a write-up of that investigation which I’m hoping will prove useful to others interested in learning more.

Interning and Go

Interning, a term originally introduced by Lisp is the process of storing only one copy of a value in memory and sharing a unique reference to it instead of allocating multiple copies and wasting memory. For instance the Go compiler already performs interning of string constants at compile time, so instead of allocating multiple identical strings it will allocate a single instance of the string and share a reference to it, as demonstrated in the snippet below. (Go Playground here)

package main

import (
	"fmt"
	"reflect"
	"unsafe"
)

const greet1 = "hello world"
const greet2 = "hello world"

func main() {
	a := greet1
	p := (*reflect.StringHeader)(unsafe.Pointer(&a))
	fmt.Println("a address:", *&p.Data)

	b := greet2
	p2 := (*reflect.StringHeader)(unsafe.Pointer(&b))
	fmt.Println("b address:", *&p2.Data)
}
$ go run main.go

a address: 4310983661
b address: 4310983661

Before Go 1.23, interning runtime values has only been available via third-party packages. However, as of Go 1.23 interning is now included in the standard library via the new unique package.

The unique Package in Go 1.23

Interning (or ‘canonicalizing values’) can now be performed using the unique package via its Handle type that acts as a globally unique identity for any provided (comparable) value, meaning two handles compare equal exactly if the two values used to create the handles would have also compared equal.

type Handle[T comparable] struct {}
func (h Handle[T]) Value() T {}
func Make[T comparable](value T) Handle[T] {}

Internally the Handle type is backed by a concurrent-safe map (well, an adaptive radix tree to be precise due to its need to shrink and grow the tree more efficiently than a map) which acts as a read-through cache, storing unique values when not detected in the cache and returning a Handle[T] which is designed to be passed around as values/fields instead of the underlying value, saving you additional allocations and also resulting in cheaper comparisons as you’ll only incur a reference comparison as opposed to a value comparison.

Couldn’t you achieve the same behaviour with a map of unique values?

Sure, the same interning behaviour could be achieved using a custom map to reduce duplicate allocations, however handling garbage collecting efficiently isn’t possible. The unique package has a notion of “weak references” that means the garbage collector can clean up in a single cycle made possible by access to runtime internals, something a custom rolled solution wouldn’t be able to perform.

How to use the unique package

Let’s take a look at an trivial example, inspired by the net/netip package which actually uses the unique package for efficient handling of IPv6 zone names.

package main

import (
	"unique"
)

type addrDetail struct {
	isV6   bool
	zoneV6 string
}

func main() {
	h1 := unique.Make(addrDetail{isV6: true, zoneV6: "2001:0db8:0001:0000:0000:0ab9:C0A8:0102"})

	// this addrDetail won't be allocated as it already exists in the underlying map
	h2 := unique.Make(addrDetail{isV6: true, zoneV6: "2001:0db8:0001:0000:0000:0ab9:C0A8:0102"})

	if h1 == h2 {
		println("addresses are equal")
	}
	
	// Value() returns a copy of the underlying value (ie, different memory address)
	println(h1.Value().zoneV6) 
}

Handle Comparison Performance

Earlier I mentioned that along with a reduction of unnecessary allocations via deduplication of values, the unique package can also reduce the cost of object comparisons, which can be especially evident when dealing with comparing large strings, structs with string fields or arrays, where the comparison can be reduced to a simple pointer comparison. Let’s take a look.

Below we have a benchmark that repeats a comma-separated string of IPv6 zones and then performs a string comparison on the two identical copies, one benchmark without wrapping the string in a Handle type, the other with.

package main

import (
	"strings"
	"testing"
	"unique"
)

func BenchmarkStringCompareSmall(b *testing.B)  { benchStringComparison(b, 10) }
func BenchmarkStringCompareMedium(b *testing.B) { benchStringComparison(b, 100) }
func BenchmarkStringCompareLarge(b *testing.B)  { benchStringComparison(b, 1000000) }

func BenchmarkCanonicalisingSmall(b *testing.B)  { benchCanonicalising(b, 10) }
func BenchmarkCanonicalisingMedium(b *testing.B) { benchCanonicalising(b, 100) }
func BenchmarkCanonicalisingLarge(b *testing.B)  { benchCanonicalising(b, 1000000) }

func benchStringComparison(b *testing.B, count int) {
	s1 := strings.Repeat("2001:0db8:0001:0000:0000:0ab9:C0A8:0102,", count)
	s2 := strings.Repeat("2001:0db8:0001:0000:0000:0ab9:C0A8:0102,", count)
	b.ResetTimer()
	for n := 0; n < b.N; n++ {
		if s1 != s2 {
			b.Fatal()
		}
	}
	b.ReportAllocs()
}

func benchCanonicalising(b *testing.B, count int) {
	s1 := unique.Make(strings.Repeat("2001:0db8:0001:0000:0000:0ab9:C0A8:0102,", count))
	s2 := unique.Make(strings.Repeat("2001:0db8:0001:0000:0000:0ab9:C0A8:0102,", count))
	b.ResetTimer()
	for n := 0; n < b.N; n++ {
		if s1 != s2 {
			b.Fatal()
		}
	}
	b.ReportAllocs()
}

Benchmark Results

Let’s run the benchmark and check out the results:

$ go test -run='^$' -bench=.

goos: darwin
goarch: arm64
pkg: go-experiment
cpu: Apple M1
BenchmarkStringCompareSmall-8     	116581837	        9.392 ns/op	      0 B/op	      0 allocs/op
BenchmarkStringCompareMedium-8    	14944300	       80.15 ns/op	      0 B/op	      0 allocs/op
BenchmarkStringCompareLarge-8     	903	               1296028 ns/op	      0 B/op	      0 allocs/op

BenchmarkCanonicalisingSmall-8    	1000000000	        0.3132 ns/op	      0 B/op	      0 allocs/op
BenchmarkCanonicalisingMedium-8   	1000000000	        0.3140 ns/op	      0 B/op	      0 allocs/op
BenchmarkCanonicalisingLarge-8    	1000000000	        0.3128 ns/op	      0 B/op	      0 allocs/op

PASS
ok  	go-experiment	5.596s

Running the benchmarks we can see that regardless of the size of the string, the nanoseconds per operation (ns/op) duration is consistently low regardless of whether performing the comparison over a string 10 copies long or 1,000,000, whereas the non-canonicalised version grows relative to the size of the string.

Conclusion

Hopefully, this post has given you enough of an understanding of what the unique package is and the problems it aims to solve. It’s another great package that’s now included as part of Go’s already solid standard library and hopefully in the future if you run into the potential use cases as those discussed above then this post will give you enough of an understanding to reach for interning.