Previously we ran through the decryption steps even for an empty
ciphertext slice. The functions handle it correctly, but returning
early skips all the extra calls.
Speeds up the tar extract benchmark by about 4%.
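Roughly, the change amounts to a guard like the one below; the function name and signature are made up for illustration and are not the actual gocryptfs API:

    // Illustrative sketch, not the real gocryptfs signature.
    func decryptBlocks(ciphertext []byte) ([]byte, error) {
        if len(ciphertext) == 0 {
            // Nothing to decrypt: skip buffer handling and the per-block calls.
            return nil, nil
        }
        plaintext := make([]byte, 0, len(ciphertext))
        // ... per-block decryption appends to plaintext here ...
        return plaintext, nil
    }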
go-fuse caps MaxWrite at MAX_KERNEL_WRITE anyway, and we
actually depend on this behavior now as the byte pools
are sized according to MAX_KERNEL_WRITE.
So let's use MAX_KERNEL_WRITE explicitly.
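For illustration, the mount setup then looks something like this (a sketch assuming the go-fuse v1 package github.com/hanwen/go-fuse/fuse, which exports both MountOptions and MAX_KERNEL_WRITE):

    // Sketch: use the exported constant instead of hardcoding 128*1024.
    var opts = fuse.MountOptions{
        MaxWrite: fuse.MAX_KERNEL_WRITE, // 128KiB, the same size the byte pools use
    }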
We use two levels of buffers:
1) 4kiB+overhead for each ciphertext block
2) 128kiB+overhead for each FUSE write (32 ciphertext blocks)
This commit adds a sync.Pool for both levels.
Memory efficiency for small writes could be improved,
as we now always use a 128kiB buffer.
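A minimal sketch of such a two-level pool setup; the sizes follow the description above, but the names and the exact overhead value are illustrative (import "sync" assumed):

    const (
        blockSize      = 4096 // one plaintext block
        overhead       = 32   // illustrative per-block ciphertext overhead
        blocksPerWrite = 32   // 32 blocks = one 128kiB FUSE write
    )

    // Level 1: buffers for a single ciphertext block (4kiB + overhead).
    var blockPool = sync.Pool{
        New: func() interface{} { return make([]byte, blockSize+overhead) },
    }

    // Level 2: buffers for a full FUSE write (32 blocks, 128kiB + overhead).
    var writePool = sync.Pool{
        New: func() interface{} {
            return make([]byte, blocksPerWrite*(blockSize+overhead))
        },
    }

Callers Get() a buffer for one request and Put() it back when done, so the next write reuses it instead of allocating.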
128kiB = 32 x 4kiB pages is the maximum we get from the kernel. Splitting
up smaller writes is probably not worth it.
Parallelism is limited to two for now.
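As a sketch of what "limited to two" means: split the blocks of one write in half and run one half in a goroutine while the other half stays on the calling goroutine (illustrative helper assuming import "sync", not the actual gocryptfs code):

    // encryptBlock is a stand-in for the real per-block encryption.
    func encryptAll(blocks [][]byte, encryptBlock func([]byte)) {
        half := len(blocks) / 2
        var wg sync.WaitGroup
        wg.Add(1)
        go func() {
            defer wg.Done()
            for _, b := range blocks[:half] {
                encryptBlock(b)
            }
        }()
        // The second half runs on the calling goroutine, so parallelism is two.
        for _, b := range blocks[half:] {
            encryptBlock(b)
        }
        wg.Wait()
    }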
Spawn a worker goroutine that reads the next 512-byte block
while the current one is being drained.
This should help reduce waiting times when /dev/urandom is very
slow (like on Linux 3.16 kernels).
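A sketch of the idea, assuming crypto/rand and made-up names; the real gocryptfs code differs in detail:

    const prefetchBlockSize = 512

    type randPrefetcher struct {
        refill chan []byte
    }

    func newRandPrefetcher() *randPrefetcher {
        r := &randPrefetcher{refill: make(chan []byte)}
        go func() {
            for {
                b := make([]byte, prefetchBlockSize)
                if _, err := rand.Read(b); err != nil {
                    panic(err)
                }
                // The unbuffered channel makes the worker block here, holding
                // the next 512 bytes ready while the consumer drains the
                // current block.
                r.refill <- b
            }
        }()
        return r
    }

The consumer receives from refill once its current block is used up and slices 16-byte nonces out of it.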
On my machine, reading 512-byte blocks from /dev/urandom
(the same holds when reading via the getentropy syscall) is a lot
faster in terms of throughput:
Blocksize Throughput
16 28.18 MB/s
512 83.75 MB/s
For a single-threaded streaming write, this drops the CPU usage of
nonceGenerator.Get to almost 1/3:
          flat   flat%    sum%    cum   cum%
Before       0      0%  95.08%  0.35s  2.92%  github.com/rfjakob/gocryptfs/internal/cryptocore.(*nonceGenerator).Get
After    0.01s  0.092%  92.34%  0.13s  1.20%  github.com/rfjakob/gocryptfs/internal/cryptocore.(*nonceGenerator).Get
This change makes the nonce reading single-threaded, which may
hurt massively-parallel writes.
This check would need locking to be safe for multithreaded use.
But as it sits in the fast path, just remove it.
rand.Read() already guarantees that the value is random.
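For illustration only (a reconstruction, not the original code): a check of this kind does an unsynchronized read-compare-write on shared state, which races as soon as the generator is used from two goroutines (assumes imports bytes and crypto/rand):

    var lastNonce []byte // shared state: would need a mutex to be safe

    func getNonce(n int) []byte {
        nonce := make([]byte, n)
        if _, err := rand.Read(nonce); err != nil {
            panic(err)
        }
        // Racy with concurrent callers, and redundant because rand.Read
        // already returns cryptographically secure random data.
        if bytes.Equal(nonce, lastNonce) {
            panic("nonce collision")
        }
        lastNonce = nonce
        return nonce
    }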
We got another warning for force_owner:
cli_args.go:26:45: don't use underscores in Go names; struct field force_owner should be forceOwner
Use a broader grep.
Before Go 1.5, GOMAXPROCS defaulted to 1, hence it made
sense to unconditionally increase it to 4.
But since Go 1.5, GOMAXPROCS defaults to the number of cores,
so only raise it to 4 when the default is lower, and don't cap a higher default.
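In code, that amounts to something like the following (sketch; the function name is made up):

    func setGOMAXPROCS() {
        // GOMAXPROCS(0) returns the current value without changing it.
        // Only raise it to 4 if it is lower; never lower a higher default.
        if runtime.GOMAXPROCS(0) < 4 {
            runtime.GOMAXPROCS(4)
        }
    }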
Also, update the performance numbers.
Previously, MaxWrite was at the go-fuse default of 64KiB. Getting
bigger writes should increase throughput somewhat.
Testing on tmpfs shows an improvement from 112MiB/s to 120MiB/s.
Travis failed on Go 1.6.3 with this error:
internal/pathiv/pathiv_test.go:20: no args in Error call
This change should solve the problem and provide a better error
message on (real) test failures.
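The message refers to a bare t.Error() call without arguments; the fix follows this pattern (got and want are placeholders, not the actual test values):

    func TestExample(t *testing.T) {
        got, want := 1+1, 2 // placeholders for the real computation
        if got != want {
            // Before, this was a bare t.Error(), which Go 1.6.3 rejects
            // with "no args in Error call". Passing a message fixes it
            // and describes the actual failure:
            t.Errorf("got %v, want %v", got, want)
        }
    }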