Random texts

Anything that I feel I have to write down and that I'm not embarrassed enough to hide. RSS and ActivityPub (@tokudan@blog.tokudan.de).

This sets and exports all variables defined in a .env file, the same kind of file that systemd can also use to set up environment variables.

# Read and export the variables from .env
source "$HOME/.env"
# sed strips everything after '=' so only the variable names are left
while read -r var; do
        export "${var?}"
done < <(sed -e 's_=.*$__' "$HOME/.env")

Update: Apparently set -a / set +a before and after the sourcing makes it a lot easier. Thanks @bekopharm@social.tchncs.de for this hint.
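A minimal sketch of that simpler variant (assuming the .env file only contains plain VAR=value lines):

# export everything that gets defined while sourcing
set -a
source "$HOME/.env"
set +a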

#bash #systemd

A selection of #ffmpeg command lines that I either found online and adapted to my needs or built myself.

Rotate iPhone videos according to their metadata; this requires re-encoding the video:

ffmpeg -i "$v" -movflags use_metadata_tags -c copy -c:v h264 -profile:v high -b:v 16000k "${v%.*}.new.mov"

Convert animation videos from 10-bit to 8-bit encoding so that a Raspberry Pi can play them:

ffmpeg -i abc.mkv -map 0 -c copy -c:v libx264 -profile baseline -tune animation -crf 18 'abc-recode.mkv'

Cut and crop:

ffmpeg -i stream.mpg -ss 3138 -t 4.0 -y -map v:0 -map a:0 -filter:v 'crop=245:203:924:305' -c:a ac3 -b:a 151k -c:v h264 -b:v 3500k -r 30 boso-3rd-assistant.mkv

Cut a file into multiple segments of a fixed length (here one hour each):

ffmpeg -i input.mp4 -c copy -map 0 -segment_time 01:00:00 -f segment -reset_timestamps 1 output-%03d.mp4

Ansible has issues with “run_once” and “when” in the same task. If the “when” only evaluates to true for some hosts, it's basically undefined whether the task will run or be skipped: it depends on whichever host happens to be evaluated first. If that first host is skipped, the task won't run even once, even if it would evaluate to true on all the other hosts.

Example:

- name: Create temporary directory to download agent on ansible host
  when: package.pkgversion != agent_installed_version
  register: tempdir_ansiblehost
  run_once: true
  delegate_to: localhost
  check_mode: no
  tempfile:
    state: directory

#ansible

with_items: "{{ old_mounts | sort | reverse }}"

results in...

item=<list_reverseiterator object at 0x7fdc252dad60>

The “fix” is to use the list filter:

with_items: "{{ old_mounts | sort | reverse | list }}"

#ansible

I had to figure out a way to remove the entries of specific hosts that had generated new host keys from the SSH known_hosts file on the AWX system. What I came up with is the following playbook:

- hosts: all
  gather_facts: false
  tasks:
  - name: Remove host key from known_hosts
    command:
      cmd: ssh-keygen -R {{ inventory_hostname }}
    delegate_to: "localhost"

I just run this playbook with the limit set to the host or hosts I want to clear and have set up a template that just asks me for that limit.

I know that there is a known_hosts module, but it has a shortcoming in my opinion: it points to ~/.ssh/known_hosts by default instead of parsing the ~/.ssh/config file to determine the default location.

#ansible #awx

#PHP #Webapplications in #NixOS are a bit special, as they commonly violate the split between configuration, data and application. Sometimes it's all in the same directory, but more commonly a subdirectory contains the data. Packaging the sources can be easy or complicated, depending on whether there is some build process. For Shaarli I just use their full.tar.gz and don't have to worry about that.

The package expression is very basic:

{ lib, stdenv, fetchurl, config ? null, dataDir ? "/var/lib/shaarli" }:

stdenv.mkDerivation rec {
  name = "shaarli-${version}";
  version = "0.11.1";
  preferLocalBuild = true;

  src = fetchurl {
    url = "https://github.com/shaarli/Shaarli/releases/download/v${version}/shaarli-v${version}-full.tar.gz";
    sha256 = "1psijcmi24hk0gxh1zdsm299xj11i7find2045nnx3r96cgnwjpn";
  };

  phases = [ "installPhase" ];
  installPhase = ''
    mkdir $out
    tar xzf $src
    cp -ra Shaarli/. $out/
    find $out -type d -exec chmod 0755 {} \;
    find $out -type f -exec chmod 0644 {} \;
    for a in cache data pagecache tmp; do
      mv $out/$a $out/$a.orig
      ln -s "${dataDir}/$a" $out/$a
    done
  '';

  meta = with stdenv.lib; {
    description = "";
    # License is complicated...
    #license = licenses.agpl3;
    homepage = "https://github.com/shaarli/Shaarli";
    platforms = platforms.all;
    maintainers = with stdenv.lib.maintainers; [ tokudan ];
  };
}

What's uncommon is that I have two optional arguments: config and dataDir. config is not used in my Shaarli derivation and is just part of the boilerplate I use for PHP apps. I use it to feed in a config.php if that makes sense for the PHP app; my roundcube config uses it, for example. dataDir on the other hand is used in the installPhase. I move some directories away to $a.orig so the install service can set up the data directory if it doesn't exist yet. It's not perfect, but it works for now. The directories are then replaced with symlinks to /var/lib/shaarli, or whatever was specified in dataDir. This derivation gives me a package that is specific to one instance of Shaarli. If I run a second instance, I need to specify a different dataDir, leading to another build of the derivation.

The second part of the equation is the system configuration: how do I include the above derivation in my system? I use nginx and PHP-FPM pools with a dedicated user for each PHP app. Here is the part of my system configuration that uses the package:

{ config, lib, pkgs, ... }:

let
  phppoolName = "shaarli_pool";
  dataDir = "/var/lib/shaarli";
  vhost = "shaarli.example.com";

  shaarli = pkgs.callPackage ./pkg-shaarli.nix {
    inherit dataDir;
  };
in
{
  services.nginx.virtualHosts."${vhost}" = {
    forceSSL = true;
    enableACME = true;
    root = "${shaarli}";
    extraConfig = ''
      index index.php;
      etag off;
      add_header etag "\"${builtins.substring 11 32 shaarli}\"";
      '';
    locations."/robots.txt" = {
      extraConfig = ''
        add_header Content-Type text/plain;
        return 200 "User-agent: *\nDisallow: /\n";
        '';
    };
    locations."/" = {
      extraConfig = ''
        try_files $uri $uri/ index.php;
        '';
    };
    locations."~ (index)\.php$" = {
      extraConfig = ''
        fastcgi_split_path_info ^(.+\.php)(/.*)$;
        if (!-f $document_root$fastcgi_script_name) {
        return 404;
        }

        fastcgi_pass unix:${config.services.phpfpm.pools."${vhost}".socket};
        fastcgi_index index.php;

        fastcgi_param   QUERY_STRING            $query_string;
        fastcgi_param   REQUEST_METHOD          $request_method;
        fastcgi_param   CONTENT_TYPE            $content_type;
        fastcgi_param   CONTENT_LENGTH          $content_length;

        fastcgi_param   SCRIPT_FILENAME         $document_root$fastcgi_script_name;
        fastcgi_param   SCRIPT_NAME             $fastcgi_script_name;
        fastcgi_param   PATH_INFO               $fastcgi_path_info;
        fastcgi_param   PATH_TRANSLATED         $document_root$fastcgi_path_info;
        fastcgi_param   REQUEST_URI             $request_uri;
        fastcgi_param   DOCUMENT_URI            $document_uri;
        fastcgi_param   DOCUMENT_ROOT           $document_root;
        fastcgi_param   SERVER_PROTOCOL         $server_protocol;

        fastcgi_param   GATEWAY_INTERFACE       CGI/1.1;
        fastcgi_param   SERVER_SOFTWARE         nginx/$nginx_version;

        fastcgi_param   REMOTE_ADDR             $remote_addr;
        fastcgi_param   REMOTE_PORT             $remote_port;
        fastcgi_param   SERVER_ADDR             $server_addr;
        fastcgi_param   SERVER_PORT             $server_port;
        fastcgi_param   SERVER_NAME             $server_name;

        fastcgi_param   HTTPS                   $https;
        fastcgi_param   HTTP_PROXY              "";
        '';
    };
    locations."~ \.php$" = {
      extraConfig = ''
        deny all;
        '';
    };
  };
  services.phpfpm.pools."${vhost}" = {
    user = "shaarli";
    group = "shaarli";
    settings = {
      "listen.owner" = "nginx";
      "listen.group" = "nginx";
      "user" = "shaarli";
      "pm" = "dynamic";
      "pm.max_children" = "75";
      "pm.min_spare_servers" = "5";
      "pm.max_spare_servers" = "20";
      "pm.max_requests" = "10";
      "catch_workers_output" = "1";
    };
  };
  users.extraUsers.shaarli = { group = "shaarli"; };
  users.extraGroups.shaarli = { };
  systemd.services.shaarli-install = {
    serviceConfig.Type = "oneshot";
    wantedBy = [ "multi-user.target" ];
    script = ''
      if [ ! -d "${dataDir}" ]; then
        mkdir -p ${dataDir}/{cache,data,pagecache,tmp}
        cp -R ${shaarli}/data.orig/.htaccess ${dataDir}/cache/
        cp -R ${shaarli}/data.orig/.htaccess ${dataDir}/data/
        cp -R ${shaarli}/data.orig/.htaccess ${dataDir}/pagecache/
        cp -R ${shaarli}/data.orig/.htaccess ${dataDir}/tmp/
      fi
      chown -Rc shaarli:shaarli ${dataDir}
      find ${dataDir} -type d ! -perm 0700 -exec chmod 0700 {} \; -exec chmod g-s {} \;
      find ${dataDir} -type f ! -perm 0600 -exec chmod 0600 {} \;
    '';
  };
}

The let block just defines some variables to be used by the expression, but there are a couple of important options below it: the nginx extraConfig contains

      etag off;
      add_header etag "\"${builtins.substring 11 32 shaarli}\"";

This is both nice and bad at the same time: it leaks some information to the outside world by publishing part of the hash of my Shaarli derivation. On the other hand it ensures that browsers will refresh their caches as needed if I switch to another derivation: they use that part of the hash to check whether the file on the server has changed and do not rely on the file modification time, which is always the Unix epoch in the Nix store.
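To illustrate what that substring grabs (the store path below is invented, only its shape matters): a store path starts with /nix/store/ (11 characters) followed by a 32-character hash, so builtins.substring 11 32 extracts exactly that hash. The shell equivalent:

# hypothetical store path, the hash is made up
p=/nix/store/0123456789abcdfghijklmnpqrsvwxyz-shaarli-0.11.1
# same as builtins.substring 11 32: skip 11 characters, take 32
echo "${p:11:32}"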

At the bottom you can see systemd.services.shaarli-install, which is the service that sets up the data directory when the configuration is activated. Note that with its current implementation it cannot detect if the Shaarli version changed and run any update scripts, but that's hopefully not necessary for Shaarli.

This type of packaging seems to work for most php webapps. It's certainly not perfect and has a lot of redundancies, but for me it gets the job done.

Got a message from a #freifunk colleague that users are unable to change their password on our mail server. They just get bounced back into the login form of our PostfixAdmin after submitting it. Quick check: yes, I have the same problem. Even the admin login is broken. No idea when it broke.

#NixOS allows me to quickly activate an old configuration and software by executing a script (/nix/var/nix/profiles/system-476-link/bin/switch-to-configuration test), so I went back 15 days. That old generation worked. First success. Switching only takes a couple of seconds unless you care about the kernel etc., which would require a reboot. So finding the exact generation where it broke only took me about 5 minutes.

But what causes it? I already had a guess, as I saw which services changed, but I wanted to be sure: nix-store -qR /nix/var/nix/profiles/system-476-link | sort -t- -k2 gives me the complete list of all files and software included in that configuration. So I dumped the known-good and known-bad lists and diff'ed them: /nix/store/...-dovecot-2.3.10.1 vs. /nix/store/...-dovecot-2.3.11.3 and a couple of unrelated libraries. PostfixAdmin and PHP did not change. But PostfixAdmin uses Dovecot to check passwords, e.g. during login. PostfixAdmin uses a simple command defined in the configuration file, so it should be easy to verify. Of course it works as root, but what about as the user that PostfixAdmin is actually running as:

[pfa@mail:~]$ /nix/store/...-dovecot-2.3.10.1/bin/doveadm pw -r 12
doveadm(pfadmin): Error: net_connect_unix(/var/run/dovecot/stats-writer) failed: Permission denied
Enter new password:
Retype new password:
{CRYPT}$2y$12$...

[pfa@mail:~]$ /nix/store/...-dovecot-2.3.11.3/bin/doveadm pw -r 12
doveconf: Fatal: Error in configuration file /etc/dovecot/dovecot.conf line 7: ssl_cert: Can't open file /var/lib/acme/mail.example.org/fullchain.pem: Permission denied

There's our culprit: the new Dovecot version breaks because it's unable to read the TLS certificate file, which it doesn't even need for its current job. Apparently it's a known issue in Dovecot, as it was reported on the Dovecot mailing list about a week ago: https://dovecot.org/pipermail/dovecot/2020-August/119642.html There's even a workaround: instead of specifying the SSL certificate directly in the config file, you move that part into a new config file that's only readable by root and use !include_try to include that file. Easy, right? Well, NixOS requires all config files to be world-readable (for users on that system). So I modified the dovecot service to create that root-only config file before starting, and PostfixAdmin is happy again and lets users log in and change their password.
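As a rough sketch of what that workaround looks like on a plain (non-NixOS) system (the snippet file name is made up and the key path is assumed to sit next to the fullchain):

# create a root-only snippet containing the certificate and key paths
install -m 0600 -o root -g root /dev/null /etc/dovecot/conf.d/ssl-keys.conf
cat > /etc/dovecot/conf.d/ssl-keys.conf <<'EOF'
ssl_cert = </var/lib/acme/mail.example.org/fullchain.pem
ssl_key = </var/lib/acme/mail.example.org/key.pem
EOF
# in dovecot.conf, replace the ssl_cert/ssl_key lines with:
#   !include_try conf.d/ssl-keys.conf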

mbuffer reads data from an input and writes it to one or more outputs. The more important thing though is the buffering: you can just tell it to use X amount of memory as a buffer, the default is usually 2 MBytes. Typically the input is stdin and the output stdout, so it works well in a pipe like zfs send | mbuffer | zfs receive, but files or TCP connections are possible as well, so it can replace cat and netcat. Tape drives and autoloaders have some support as well, but I have no experience with that.
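A sketch of both variants (the dataset names, port and buffer sizes are just examples):

# local pipe with a 4 MByte buffer
zfs send tank/data@snap | mbuffer -m 4M | zfs receive backup/data

# over the network, replacing netcat: the receiver listens on a TCP port...
mbuffer -I 9090 -m 1G | zfs receive backup/data
# ...and the sender connects to it
zfs send tank/data@snap | mbuffer -m 1G -O backuphost:9090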

Pipes in shell scripts or directly on the command line are so common that you barely think about them, but they have a serious limitation: they block almost instantly if the other end of the pipe is not ready to receive the data. This is good and bad at the same time. Good: the sender immediately notices that the receiver has failed, for example. Bad: unless both ends of the pipe can send and receive data at the same time, you lose throughput.

Imagine a sender that averages 1 MByte/s of output while using a chunk size of 1 MByte: it tries to shove 1 MByte into the pipe as fast as possible, then has some work to do for nearly a second (e.g. waiting on a hard disk), then tries to send the next 1 MByte chunk. Now assume the receiving end does not use the same chunk size and has to do work for about a tenth of a second after every 100 KBytes. The receiver can also accept about 1 MByte/s on average, but reality will look completely different: the sender generates 1 MByte of data and sends the first 100 KBytes through the pipe, then it locks up, waiting on the receiver for about 0.1 seconds, sends the next 100 KBytes, waits again... repeated ten times. In total, generating this 1 MByte and pushing it through the pipe takes roughly 2 seconds, meaning the data rate has just been halved: the generation only takes one second, while the second one is wasted trying to shove the data into the pipe.

If you insert mbuffer between those two programs, it looks completely different. As long as its internal buffer isn't full, mbuffer will keep accepting new data from the sender. At the same time, whenever the receiver is ready to accept data, mbuffer can send data from its internal buffer, unless it's empty. With the 2 MByte default buffer size, the above example should run at roughly the full speed of 1 MByte/s, though with a data chunk size of 1 MByte I'd probably increase the buffer size to 4 MBytes, just to avoid possible choke points.

The downside is obvious: all that data has to be copied around in memory twice as often, increasing CPU usage. Another downside is that the sending process may think the receiver got all the data while it is actually still sitting in mbuffer. This means that mbuffer mainly shines in scenarios where both sides have possible choke points, e.g. because they read and write to relatively slow disks, or where a somewhat unreliable network connection like WiFi in the middle can choke because of retransmissions.

Another nice bonus: mbuffer displays a running status of the amount of data that went through the pipe:

$ mbuffer < /dev/zero > /dev/null
in @ 9529 MiB/s, out @ 9511 MiB/s, 18.8 GiB total, buffer   1% full ^C
mbuffer: error: error closing input: Bad file descriptor
mbuffer: warning: error during output to <stdout>: canceled
summary: 18.8 GiByte in  1.8sec - average of 10.4 GiB/s

Reading from /dev/zero and sending into /dev/null obviously is only useful as a benchmark, but it works as a nice example.

Homepage: http://www.maier-komor.de/mbuffer.html. The first version is from 2001 and it is still actively developed, though the basic feature set has been stable for a long time now. It's available on at least Debian stable and NixOS.

#YetAnotherTool

direnv manages shell environments.

direnv is one of those tools that you basically set up once and then forget is there. You only notice it when it does the job you set it up for, and you're happy it saves you a lot of hassle.

Enter a directory that you've configured with direnv and it will import things into your environment. That works well for e.g. programming languages where you need specific tools in your PATH or you just need an environment variable to point to a specific file in that environment, like the ANSIBLE_INVENTORY variable. Got two ansible environments in ~/ansible-test and ~/ansible-prod? Drop the following file as .envrc into each one:

export ANSIBLE_INVENTORY="$(expand_path hosts)"

You can now cd ~/ansible-test/roles/sshkeys and when running ansible it will use ~/ansible-test/hosts as its inventory file.

Security is handled well: direnv only executes files that you have authorized by running direnv allow in that directory. And if the file changes, you need to authorize it again, so nobody can sneak in bad commands.
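A quick sketch of that workflow:

cd ~/ansible-test     # direnv refuses to load the not-yet-authorized .envrc
direnv allow          # approve the current content of .envrc
vi .envrc             # after any change, the file is blocked again...
direnv allow          # ...until it is re-approved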

direnv also allows importing the environments of other tools like rbenv, Nix, Guix, rvm and node. With the Nix package manager it's even possible to install programs on demand: add the line use nix -p ansible to the above .envrc and direnv will ensure that ansible is available when you enter that directory. Leave the directory and ansible is gone again (assuming you don't have it installed system-wide or in your user profile).
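Putting both together, a hypothetical ~/ansible-test/.envrc could look like this:

# make ansible available on demand via Nix
use nix -p ansible
# point ansible at this environment's inventory file
export ANSIBLE_INVENTORY="$(expand_path hosts)"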

Another way to use direnv comes from @tercean@chaos.social, as he puts it: “direnv + git env vars = simple way to manage identities per customer”

direnv really helps keep single-use environment variables from cluttering your regular shell environment, and you won't have to remember the names of the files to source to set up a specific environment anymore.

#YetAnotherTool

I kind of wonder why there's no decent user interface for pulseaudio – or, if it exists, why it's unknown. Pulseaudio is pretty powerful, but the usability is bad. A simple graph application that lets you connect the dots would go a long way. I remember something like that on Windows about 15 years ago: you could throw in various inputs, outputs and filters and just connect them by dragging lines. Not sure if it still exists or is usable.

Here's what I want to do:

1. Play a game
2. Talk on the microphone
3. Listen to other people on Jitsi

So I have three inputs: game, Jitsi and microphone. The game sound has to go to both the headset and the Jitsi recording. The Jitsi output only goes to my headset. The microphone only goes to the Jitsi recording.

Pulseaudio can create the missing points in between very easily. You need two things here:

1. The sink name of your headset output; for me that's alsa_output.usb-Logitech_PRO_X_000000000000-00.analog-stereo.
2. The source name of your headset microphone; for me that's alsa_input.usb-Logitech_PRO_X_000000000000-00.mono-fallback.

You can use the following two commands to find them:

# List current sinks:
pactl list short sinks
# List current sources:
pactl list short sources

Then you create the missing points. First the game sink, this will be the output that the game will use:

pactl load-module module-null-sink sink_name=game sink_properties=device.description=game

Then you create the sink that Jitsi can record:

pactl load-module module-null-sink sink_name=streamout sink_properties=device.description=streamout

Finally you tell pulseaudio what audio needs to be sent where:

# Loop the microphone into streamout
pactl load-module module-loopback source=alsa_input.usb-Logitech_PRO_X_000000000000-00.mono-fallback sink=streamout
# Loop the game into streamout
pactl load-module module-loopback source=game.monitor sink=streamout
# Loop the game into headset
pactl load-module module-loopback source=game.monitor sink=alsa_output.usb-Logitech_PRO_X_000000000000-00.analog-stereo

Sadly at this point pulseaudio will already start to generate CPU load for copying around and resampling silence...

Now start your game (or any other application) and have it play some sound. Then start pavucontrol. On the first tab named “Playback”, you can find the currently playing applications. There should be a button on the right that lists the current output the game is using. Click it and select “game”. You should now hear it on your headset or whatever output you have decided to use. Next start Jitsi and have it record the “Monitor of streamout”. Finally verify that your microphone is working and you're done.
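If you prefer the command line over pavucontrol, moving a playback stream should also work with pactl (the index 42 below is a placeholder for whatever index your application's stream has):

# list the currently playing streams and note the index of the game
pactl list short sink-inputs
# move that stream to the "game" sink
pactl move-sink-input 42 game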