Go has a high overhead for each C call, so batch
all openssl operations in the new C function chacha20poly1305_seal.
Benchmark results:
internal/speed$ go test -bench BenchmarkStupidXchacha -count 10 > old.txt
internal/speed$ go test -bench BenchmarkStupidXchacha -count 10 > new.txt
internal/speed$ benchstat old.txt new.txt
name old time/op new time/op delta
StupidXchacha-4 8.79µs ± 1% 7.25µs ± 1% -17.54% (p=0.000 n=10+10)
name old speed new speed delta
StupidXchacha-4 466MB/s ± 1% 565MB/s ± 1% +21.27% (p=0.000 n=10+10)