mbuffer reads data from an input and writes it to one or more outputs. Its more important feature, though, is the buffering: you can tell it to use a given amount of memory as a buffer, the default usually being 2 MBytes. Typically the input is stdin and the output stdout, so it works well in a pipe like zfs send | mbuffer | zfs receive, but files or TCP connections are possible as well, so it can replace cat and netcat. Tape drives and autoloaders have some support too, but I have no experience with that.
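
For example, as a netcat replacement for a one-off file transfer, something like this should work, if I read the options right (-I listens on a TCP port, -O connects to one; host name, port and file name are made up). On the receiving side:

$ mbuffer -I 8000 > backup.img

And on the sending side:

$ mbuffer -O receiver.example:8000 < backup.img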

Pipes in shell scripts or directly on the command line are so common that you barely think about them, but they have a serious limitation: they block almost immediately if the other end of the pipe is not ready to receive data. This is good and bad at the same time. Good: the sender immediately notices that the receiver has failed, for example. Bad: unless both ends of the pipe can send and receive data at the same time, you lose throughput.

Imagine a sender that averages 1 MByte/s of output using a chunk size of 1 MByte: it tries to shove 1 MByte into the pipe as fast as possible, then has some work to do for nearly a second (e.g. waiting on a hard disk), then tries to send the next 1 MByte chunk. Now assume the receiving end does not use the same chunk size and instead has to do about a tenth of a second of work after every 100 KBytes. The receiver can thus also accept about 1 MByte/s on average, but reality looks completely different: the sender generates 1 MByte of data and pushes the first 100 KBytes through the pipe, then locks up waiting on the receiver for about 0.1 seconds, sends the next 100 KBytes, waits again... repeat ten times. In total, generating that 1 MByte and getting it through the pipe takes roughly 2 seconds, so the data rate has just been halved: the generation only takes one second, while the second second is wasted shoving the data into the pipe.
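
If you want to see the effect yourself, here's a toy version of that scenario (a rough sketch with illustrative numbers; iflag=fullblock is GNU dd): the sender emits ten 1 MByte chunks with a second of simulated work in between, the receiver reads 100 KByte chunks with 0.1 seconds of simulated work after each:

$ time ( for i in $(seq 10); do dd if=/dev/zero bs=1M count=1 2>/dev/null; sleep 1; done | while [ "$(dd bs=100k count=1 iflag=fullblock 2>/dev/null | wc -c)" -gt 0 ]; do sleep 0.1; done )

The pipe's own small kernel buffer softens the effect a bit, but the slowdown should be clearly visible.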

If you insert mbuffer between those two programs, the picture changes completely. As long as its internal buffer isn't full, mbuffer will keep accepting new data from the sender. At the same time, whenever the receiver is ready to accept data, mbuffer can feed it from the buffer, unless it's empty. With mbuffer's default buffer size of 2 MBytes, the above example should run at roughly the full 1 MByte/s, though with a data chunk size of 1 MByte I'd probably increase the buffer to 4 MBytes, just to avoid possible choke points.
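
The buffer size is set with the -m option, so in the toy pipeline from above you'd splice mbuffer in like this and compare the wall-clock times:

$ time ( for i in $(seq 10); do dd if=/dev/zero bs=1M count=1 2>/dev/null; sleep 1; done | mbuffer -m 4M | while [ "$(dd bs=100k count=1 iflag=fullblock 2>/dev/null | wc -c)" -gt 0 ]; do sleep 0.1; done )

This version should finish noticeably faster, since both sides now do their waiting in parallel.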

The downside is obvious: all that data has to be copied through memory an extra time, increasing CPU usage. Another downside is that the sending process may think the receiver got all the data when it's actually still sitting in mbuffer. So mbuffer mainly shines in scenarios where both sides have possible choke points, e.g. because they read and write to relatively slow disks, or because there's a somewhat unreliable network connection like WiFi in the middle that can choke on retransmissions.

Another nice bonus: mbuffer displays a running status of the amount of data that went through the pipe:

$ mbuffer < /dev/zero > /dev/null
in @ 9529 MiB/s, out @ 9511 MiB/s, 18.8 GiB total, buffer   1% full ^C
mbuffer: error: error closing input: Bad file descriptor
mbuffer: warning: error during output to <stdout>: canceled
summary: 18.8 GiByte in  1.8sec - average of 10.4 GiB/s

Reading from /dev/zero and writing to /dev/null is obviously only useful as a benchmark, but it makes a nice example.

Homepage: http://www.maier-komor.de/mbuffer.html. The first version is from 2001 and it is still actively developed, though the basic feature set has been stable for a long time now. It's packaged in at least Debian stable and NixOS.

#YetAnotherTool

direnv manages shell environments.

direnv is one of these tools that you basically set up once and then forget is there. You only notice it when it does the job you set it up for, and you're happy it saves you a lot of hassle.

Enter a directory that you've configured with direnv and it will import things into your environment. That works well for e.g. programming languages where you need specific tools in your PATH, or where you just need an environment variable pointing to a specific file, like the ANSIBLE_INVENTORY variable. Got two ansible environments in ~/ansible-test and ~/ansible-prod? Drop the following file as .envrc into each one:

export ANSIBLE_INVENTORY="$(expand_path hosts)"

You can now cd to ~/ansible-test/roles/sshkeys, and when you run ansible there it will use ~/ansible-test/hosts as its inventory file.
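
If you haven't seen direnv in action, entering the directory prints something along these lines:

$ cd ~/ansible-test
direnv: loading ~/ansible-test/.envrc
direnv: export +ANSIBLE_INVENTORY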

Security is handled well: direnv only executes files that you have authorized by running direnv allow in that directory. And if the file changes, you need to authorize it again, so nobody can sneak in bad commands.
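
For an unauthorized (or changed) .envrc that looks roughly like this:

$ cd ~/ansible-test
direnv: error ~/ansible-test/.envrc is blocked. Run `direnv allow` to approve its content
$ direnv allow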

direnv also allows importing the environments of other tools like rbenv, Nix, Guix, rvm and node. With the Nix package manager it's even possible to install programs on demand: add the line use nix -p ansible to the above .envrc and direnv will ensure that ansible is installed when you enter that directory. Leave the directory and ansible is gone again (assuming you don't have it installed system-wide or in your user profile, that is).
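
The complete .envrc for the ansible-test example would then look like this:

use nix -p ansible
export ANSIBLE_INVENTORY="$(expand_path hosts)"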

Another way to use direnv comes from @tercean@chaos.social; as he puts it: "direnv + git env vars = simple way to manage identities per customer".
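
I read that as an .envrc per customer directory along these lines (the variable names are standard git, the values are of course made up):

export GIT_AUTHOR_NAME="Jane Doe"
export GIT_AUTHOR_EMAIL="jane@customer.example"
export GIT_COMMITTER_NAME="Jane Doe"
export GIT_COMMITTER_EMAIL="jane@customer.example"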

direnv really helps avoid cluttering your regular shell environment with single-use environment variables, and you won't have to remember the names of the files to source to set up a specific environment anymore.

#YetAnotherTool