Anonymize logs produced by docker

tl;dr: Meeting the GDPR requirements for anonymizing log files produced by Docker is not a trivial task. syslog-ng helps here: it can replace syslog / rsyslogd and offers various configuration options.

Last weekend I discovered how hard it is to anonymize log files generated by Docker. While there is plenty of documentation for the larger web servers (e.g. Nginx or Apache), the number of people who try to anonymize Docker logs seems to be small.

Docker lets you configure which logging adapter is used. By default, all logs are written to JSON files (adapter: json-file), and you don’t get a chance to modify them in the process. The journald/systemd community seems to ignore this topic completely (even though GDPR is quite a thing…) [1].

I ended up with syslog-ng, which is a drop-in replacement for syslog or rsyslogd and provides good support for both custom filters and rewrite operations. A good introduction to anonymizing logs in syslog-ng can be found on moblog [2].

Setup

To separate all Docker logs from other system logs, I opted for a dedicated local TCP socket that Docker uses to publish log events. Each event is then rewritten using a regex that replaces the last octet of any IPv4 address with a zero.

First, install syslog-ng, then create a file in /etc/syslog-ng/conf.d (e.g. docker.conf) that contains the definition:

# Match and rewrite IP

rewrite r_ip {
  # Keep the first three octets and replace the last one with 0
  subst('\b(1?[0-9]{1,2}|2[0-4][0-9]|25[0-5])\.(1?[0-9]{1,2}|2[0-4][0-9]|25[0-5])\.(1?[0-9]{1,2}|2[0-4][0-9]|25[0-5])\.(1?[0-9]{1,2}|2[0-4][0-9]|25[0-5])\b',
  "$1.$2.$3.0", value("MESSAGE"), type("pcre"), flags("global"));
};

# Open local port 1000 for docker logs
source s_net { tcp(ip(127.0.0.1) port(1000)); };
destination d_docker { file("/var/log/docker.log"); };

# Apply chain
log { source(s_net);  rewrite(r_ip); destination(d_docker); };
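Before deploying the rule, it can help to check what the pattern matches. The following is a minimal Python sketch, not part of the syslog-ng setup: it applies the same octet alternation with Python's `re` module (whose syntax agrees with PCRE for this pattern) to a made-up log line. The function name `mask_ipv4` and the sample line are assumptions for illustration only.

```python
import re

# One octet (0-255), mirroring the alternation used in the syslog-ng rule.
OCTET = r"(1?[0-9]{1,2}|2[0-4][0-9]|25[0-5])"
# Four octets separated by dots, delimited by word boundaries.
IPV4 = re.compile(r"\b" + r"\.".join([OCTET] * 4) + r"\b")


def mask_ipv4(message: str) -> str:
    """Replace the last octet of every IPv4 address in the message with 0."""
    return IPV4.sub(r"\1.\2.\3.0", message)


if __name__ == "__main__":
    # Made-up access-log line; the addresses come from documentation ranges.
    line = '203.0.113.57 - - "GET / HTTP/1.1" 200 via 10.1.2.3'
    print(mask_ipv4(line))
```

Note that the pattern only covers IPv4 addresses. IPv6 addresses would pass through unchanged and need a separate rule.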

Now you can make this logging adapter the default in /etc/docker/daemon.json:

{
  "log-driver": "syslog",
  "log-opts": {
    "syslog-address": "tcp://127.0.0.1:1000",
    "tag": "{{.ImageName}}/{{.Name}}/{{.ID}}"
  }
}

Note: the tag is optional, but you should configure it, as otherwise you’ll only get the ID of the Docker container in your logs. Other possible tags are documented [3].

As soon as you restart both dockerd and syslog-ng, the new log file is created and all logs are written there.
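To check the pipeline without starting a container, you could send a hand-made message to the local port. This is a rough Python sketch under some assumptions: syslog-ng listens on 127.0.0.1:1000 as configured above, and its TCP source accepts a minimal BSD-syslog style line without timestamp and host (it usually fills in missing fields, but that may depend on your version and source options). The helper names and the `smoke-test` tag are made up for illustration.

```python
import socket


def build_syslog_line(tag: str, message: str, pri: int = 14) -> bytes:
    """Build a minimal syslog line; pri 14 means facility user, severity info."""
    return f"<{pri}>{tag}: {message}\n".encode("utf-8")


def send_test_message(message: str, host: str = "127.0.0.1", port: int = 1000) -> None:
    """Send one newline-terminated test line to the syslog-ng TCP source."""
    with socket.create_connection((host, port), timeout=5) as conn:
        conn.sendall(build_syslog_line("smoke-test", message))
```

Calling `send_test_message("request from 198.51.100.23")` and then looking at /var/log/docker.log should show the address with its last octet replaced by 0, provided the rewrite rule is active.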

Note: if you start a Docker container manually, you’ll be notified that no output is displayed due to the chosen log adapter. You can override this manually both in docker [4] and docker-compose [5].

Quick excursus: Filtering

As a bonus, you can also use the filter(...) operation to drop logs that you are not interested in. Filters are applied to fields of the log entry. Some of the available fields are:

  • host(…)
  • message(…)
  • program(…)
# Filter
filter f_foo { not message(".*Foobar.*") };

# Apply chain
log { source(s_net); filter(f_foo); rewrite(r_ip); destination(d_docker); };
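The order of the chain can be pictured with a small Python sketch: messages matching the filter pattern are dropped first, and only the remaining ones are rewritten. This is only a simulation of the log path above, assuming that the regular expression in message() is matched case-sensitively anywhere in the message. The function name `process` and the sample messages are made up.

```python
import re
from typing import Iterable, Iterator

# Same pattern as in f_foo; the log path keeps only messages that do NOT match.
DROP = re.compile(r".*Foobar.*")

# Same IPv4 pattern as in r_ip.
IPV4 = re.compile(
    r"\b(1?[0-9]{1,2}|2[0-4][0-9]|25[0-5])"
    r"\.(1?[0-9]{1,2}|2[0-4][0-9]|25[0-5])"
    r"\.(1?[0-9]{1,2}|2[0-4][0-9]|25[0-5])"
    r"\.(1?[0-9]{1,2}|2[0-4][0-9]|25[0-5])\b"
)


def process(messages: Iterable[str]) -> Iterator[str]:
    """Simulate the chain: filter(f_foo), then rewrite(r_ip)."""
    for message in messages:
        if DROP.search(message):
            continue  # dropped by the filter, never reaches the destination
        yield IPV4.sub(r"\1.\2.\3.0", message)


if __name__ == "__main__":
    samples = [
        "GET /a from 192.0.2.99",
        "Foobar heartbeat from 192.0.2.1",
    ]
    for line in process(samples):
        print(line)
```

Filtering before rewriting means the regex substitution only runs on messages that are actually stored.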

Footnotes
