0ad, Firewall and Fedora33

By default, the firewall prevents connections to your 0ad server. To fix that, you need to open port 20595 (UDP). These three lines create a firewalld service called 0ad, attach it to the default zone, and reload the firewall:

$ sudo firewall-cmd --permanent --new-service=0ad --set-description="0ad, A free, open-source game of ancient warfare" --add-port=20595/udp
$ sudo firewall-cmd --zone=FedoraWorkstation --add-service=0ad --permanent
$ sudo firewall-cmd --reload

eGPU, Wayland, Gnome3 and Fedora 33

I’ve just got a Razer Core X that I use with a Radeon graphics card. By default Wayland, well Mutter actually, continues to use the Intel card of my T580.

To force it to use the second card, I had to add a udev rule and reboot. And that’s all!

$ cat /etc/udev/rules.d/61-mutter-primary-gpu.rules
ENV{DEVNAME}=="/dev/dri/card1", TAG+="mutter-device-preferred-primary"

note: You need Gnome 3.38.2 for this to work properly. See: https://gitlab.gnome.org/GNOME/mutter/-/merge_requests/1562

How to waste a Friday…

Yesterday morning I got frustrated by the really slow download speed of some files. What should have taken seconds with my 400 Mb/s connection actually took more than 16 minutes. In addition, I was able to reproduce the problem on my router.

Here the curl statistics show an estimated total time of 26:52 at about 415 kB/s:

$ curl -o /dev/null https://s3.us-east-2.amazonaws.com/zuul-images/fedora-open-vm-tools-livecd.iso
  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
  0  627M    0 2684k    0     0   398k      0  0:26:52  0:00:06  0:26:46  415k^C
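
To put these numbers in perspective, here is a small back-of-the-envelope calculation. It is only a sketch: it assumes that curl's "M" and "k" suffixes are 1024-based, that the 400 Mb/s of the subscription is the usable line rate, and it ignores any protocol overhead.

```python
def transfer_seconds(size_mib: float, rate_bytes_per_s: float) -> float:
    """Time needed to move size_mib mebibytes at the given byte rate."""
    return size_mib * 1024 * 1024 / rate_bytes_per_s


def fmt(seconds: float) -> str:
    """Format a duration as h:mm:ss, the way curl displays it."""
    s = int(seconds)
    return f"{s // 3600}:{s % 3600 // 60:02d}:{s % 60:02d}"


SIZE_MIB = 627                  # "627M" in the curl output
LINE_RATE = 400_000_000 / 8     # 400 Mb/s, converted to bytes per second
OBSERVED = 398 * 1024           # "398k" average speed reported by curl

expected = transfer_seconds(SIZE_MIB, LINE_RATE)
observed = transfer_seconds(SIZE_MIB, OBSERVED)

print(f"at line rate: {expected:.1f} s")
print(f"as observed:  {fmt(observed)}")
print(f"slowdown:     x{observed / expected:.0f}")
```

In other words, the file should arrive in a handful of seconds, and the observed rate is slower by roughly two orders of magnitude.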

Just to be sure, I retry with a lower MTU. The issue remains. Two friends in Montréal also confirm everything works well for them.

My laptop uses a wired connection that I’ve been using for years now. A large part of the population works from home and the Internet is under constant pressure. So I suspected an ISP bandwidth limitation to preserve the Quality of Service. I contacted them and… complained… a lot.

At some point, the technician manages to get my attention and asks me to connect the modem directly to my laptop.

This is an obviously pointless request since the downloads are also slow from the router. But well, if I want them to act, I also need to be cooperative. I give it a try and… I immediately feel embarrassed. It is damn fast! The download speed is actually even above the 400 Mb/s ceiling. WTF.

So I start to reconsider my whole life in a deep introspection. How can this be real? My router runs OpenBSD 6.7 with an absolutely basic pf configuration. Its hardware is decent (Intel(R) Core(TM) i3-2120 CPU @ 3.30GHz). The router is connected to the modem with a very common Realtek RTL8125 and the LAN is connected to an Intel I350. I can download large files between my laptop and the router at full speed.

All this rules out my laptop, the wire and the router’s I350. The last potential culprit is the Realtek NIC. I replace it with another Intel I350 and retry. Same problem. The downloads are still slow on my laptop and the router. The Realtek NIC is innocent.

So I start thinking. Why is the rest of the family not really affected by the poor performance of the Internet connection? And so, I try to download the same file from another computer. And it’s fast… Damn, what’s going on? I swap some ports on the switch just to be sure. The problem is now somewhere between my laptop and the wire.

My laptop uses a bond of the Wi-Fi and the wired connection. This gives me the ability to move around with the laptop without losing all my open connections. For instance, I can unplug my wire in the middle of a meeting and the laptop will use the Wi-Fi seamlessly. But well, I digress. I remove this special configuration. And… the problem remains.

But during this last check, I also saw a large number of RX errors associated with the laptop NIC. Interesting, let’s try another NIC. I plug in a USB3 Realtek RTL8156 NIC and this time… it works.

The wire is a CAT6 cable and is not that long (20m), but it sounds like the NIC (Intel I219-LM) of my T580 is a bit picky about the quality of the signal. It could also be a problem with the e1000e driver of the new 5.10 kernel. The cable is good, I’ve just tested it. Anyway, I’ve put a switch just before my laptop NIC and everything works great now.

I’m still not sure why my downloads are still slow on OpenBSD. But this is an adventure for another day. The slow downloads were all from HTTPS sites (S3 and a Caddy website), and the DF flag was on (IP Don’t Fragment), which exacerbates the impact of the transmission errors.

I found the whole situation to be interesting. It’s a series of wrong assumptions and the solution is really far from what I would have imagined.

Also, thank you Teksavvy for your great support.

update: this would partially explain the OpenBSD S3 download problem.

update2: I’m now running Linux 5.10.11 and… I don’t see any RX errors anymore! The S3 download is just as fast as it should be. So this was indeed a problem with the e1000e driver.

My GitHub workflow

The Ansible community uses GitHub to develop ansible-core and most of the Ansible Collections. The only exception I know of is OpenStack’s ansible-collection-openstack, which uses Gerrit (ansible-collections-openstack).

So, as an Ansible developer, my normal day-to-day activities involve a lot of GitHub interactions. I review Pull Requests (PRs) and prepare new PRs all the time.

Before joining Ansible, I was working with Gerrit which is a nice alternative solution to collaborate on a stream of patches.

In Gerrit, each patch from a branch is a PR. Every time we update a patch, its commit hash changes, so Gerrit tracks it with a dedicated ID called Change-Id. It appears as an extra line in the body of the commit message, e.g.:

     Change-Id: Ic8aaa0728a43936cd4c6e1ed590e01ba8f0fbf5b

Gerrit provides a tool called git-review to pull and push the patches. When a contributor pushes a series of patches, each patch is correctly tracked by Gerrit and updates the right existing PR. This allows the contributor to reorganize the patches, change the order of series or import a patch from another branch.

With GitHub, a branch is a PR and most of the time, the projects prefer to use the branch to trace the iteration of the PR:

  • my fancy feature
  • fix: correct the test-suite
  • fix: fix the fix
  • fox: typo in previous commit
  • bla

And this is fine, because most of the time, the branch will ultimately be squashed (one branch -> one Git commit) during the final merge.

The GitHub workflow is certainly more friendly for newcomers, but it tends to be a source of complexity when you want to work on several PRs at the same time. For instance, I work on a new feature, but I also want to cherry-pick an experimental commit from a contributor. In this case I must remove this commit before I push my branch back to GitHub, or the extra commit will end up in my feature branch.

Another example: if I’m working on a feature branch and find an issue with something unrelated, I need to switch to another branch to commit my fix and push it. This is cumbersome, and people often just prefer to merge the fix in their feature branch, which leads to confusion and questions during the code review.

To simplify, Gerrit allows better code modularity but also requires a better understanding of Git, which is annoying when we try to attract new contributors. This is the reason why we use the current workflow.

To address the problem I wrote a script called push-patch (https://github.com/goneri/push-patch). I use it to push just my commits. For instance, I work on this branch:

  • 1: doc: explain how to do something
  • 2: typo: adjust a little details
  • 3: a workaround for issue #19 that should not be merged

The first two commits are not directly related to the feature I’m implementing, and I would like to submit them immediately.

push-patch allows me to push only changes 1 and 2, as two dedicated PRs. Both branches will be based on main and can be merged independently.

$ push-patch 1
$ push-patch 2

Now, and that’s the cool part 😋! Let’s imagine I want to push another revision of my first patch. I can use “git rebase -i” to adjust this commit and run push-patch again to push the updated patch.

$ vim foo
$ git add foo
$ git rebase --continue
$ ./push-patch 1

Internally push-patch uses git-notes to trace the remote branch of the patch. The Public-Branch field holds the name of the branch in my remote clone of the project and PR-Url is the URL of the PR in the upstream project, e.g.:

commit 1198db8807ebf9f4099598bcd41df25d465cbcae (HEAD -> main)
Author: Gonéri Le Bouder <goneri@lebouder.net>
Date:   Thu Jan 7 11:31:41 2021 -0500

   elb_application_lb: enable the functional test
   Remove the `unsupported` aliases for the `elb_application_lb` test.
   Use HTTP instead of HTTPS to avoid the dependency on
   `iam:ListServerCertificates` and the other Certificate related operations.

   Public-Branch: elb_application_lb-enable-the-functional-test_24328
   PR-Url: https://github.com/ansible-collections/community.aws/pull/348

This means that even if the patch content evolves, push-patch will still be able to continue to update the right PR.

In a nutshell, for each patch it will:

  1. clone the project and switch to the main branch
  2. read the patch notes
    1. if a branch name already exists it will use it, otherwise it will create a new one
  3. switch to the branch
  4. cherry-pick the patch
  5. push the branch

push-patch expects just the hash of the commit to push. It also accepts a list of hashes. This is the reason why I often type things like this:

push-patch $(git log -2 --pretty=tformat:%H)

The command passes the hashes of the last two commits to push-patch. It will push them to the two associated branches upstream. And at the end, I can use git log, or better tig, to get the URL of the GitHub review.

Right now, the command is a shell script and depends on the hub command. I would like to rewrite it with a better programming language.

What about you? Do you also use some special tools to handle your PRs?

Ansible: How we prepare the vSphere instances of the VMware CI

As briefly explained in CI of the Ansible modules for VMware: a retrospective, the Ansible CI uses OpenStack to spawn ephemeral vSphere labs. Our CI tests are run against them.

A full vSphere deployment is a long process that requires quite a lot of resources. In addition to that, vSphere is rather picky regarding its execution environment.

The CI of the VMware modules for Ansible runs on OpenStack. Our OpenStack providers use KVM-based hypervisors. They expect images in the qcow2 format.

In this blog post, we will explain how we prepare a cloud image of vSphere (also called golden image).

a full lab running on libvirt

First thing, get a large ESXi instance

The vSphere (VCSA) installation process depends on an ESXi. In our case we use a script and Virt-Lightning to prepare and run an ESXi image on Libvirt. But you can use your own ESXi node as long as it meets the following minimal constraints:

  • 12GB of memory
  • 50GB of disk space
  • 2 vCPUs

Deploy the vSphere VM (VCSA)

For this, I use my own role called goneri.ansible-role-vcenter-instance. It delegates the deployment to the vcsa-deploy command. As a result, you don’t need any human interaction during the full process. This is handy if you want to deploy your vSphere in a CI environment.

At the end of the process, you’ve got a large VM running on your ESXi node.

In my case, all these steps are handled by the following playbook: https://github.com/virt-lightning/vcsa_to_qcow2/blob/master/install_vcsa.yml

Tune-up the instance

Before you shut down the freshly created VM, you will want to make some adjustments.
I use the following playbook for this: prepare_vm.yml

During this step, I ensure that:

  • Cloud-Init is installed,
  • the root account is enabled with a real shell,
  • the virtio drivers are available

Cloud-Init is the de facto tool that handles all the post-configuration tasks that we can expect from a cloud image: inject the user SSH key, resize the filesystem, create a user account, etc.

By default, the vSphere VCSA comes with a gazillion disks. This is a problem in a cloud environment, where an instance is associated with a single disk image.
So I also move the content of the different partitions into the root filesystem and adjust /etc/fstab to remove all the references to the other disks. This way I only have to maintain one qcow2 image.

All these steps are handled by the following playbook: prepare_vm.yml

Prepare the final Qcow2 image

At this stage, the VM is still running, so I shut it down.
Once this is done, I extract the raw image of the disk using the curl command:

curl -v -k --user 'root:!234AaAa56' -o vCenterServerAppliance.raw '
  • root:!234AaAa56 is my login and password
  • vCenterServerAppliance.raw is the name of the local file
  • is the IP address of my ESXi
  • vCenter-Server-Appliance is the name of the vSphere instance
  • vCenter-Server-Appliance-flat.vmdk is the associated raw disk

The local .raw file is large (50GB), ensure you’ve got enough free space.

You can finally convert the raw file to a qcow2 file. You can use Qemu’s qemu-img for that, it will work fine BUT the image will be monstrously large. I instead use virt-sparsify from the libGuestFS project. This command will reduce the size of the image to the bare minimum.

virt-sparsify --tmp tmp --compress --convert qcow2 vCenterServerAppliance.raw vSphere.qcow2


You can upload the image to your OpenStack project with the following command:

openstack image create --disk-format qcow2 --file vSphere.qcow2 --property hw_qemu_guest_agent=no vSphere

If your OpenStack provider uses Ceph, you will probably want to reconvert the image to a flat raw file before the upload. With vSphere 6.7U3 and before, you need to force the use of an e1000 NIC. For that, add --property hw_vif_model=e1000 to the command above.

I’ve just done the whole process with vSphere 7.0.0U1 in 1h30 (Lenovo T580 laptop). I use the ./run.sh script from https://github.com/virt-lightning/vcsa_to_qcow2, which automates everything.

The final result is certainly not supported by VMware, but we’ve already run hundreds of successful CI jobs with this kind of vSphere instances. The CI prepares a fresh CI lab in around 10 minutes.

Ansible and k8s: How to get the K8S_AUTH_API_KEY value?

The community.kubernetes collection accepts an api_key parameter whose name may sound a bit confusing. It’s actually the token of a serviceaccount: an OAuth 2.0 (Bearer) token associated with a user and a secret key. It’s rather similar to what we can do with a login and a password.

In this example, we want to run our playbook as the k8sadmin user. We need to find the token associated with the user. What we are actually looking for is a secret. You can list the secrets this way:

[root@kind-vm ~]# kubectl -n kube-system get secret
NAME                                             TYPE                                  DATA   AGE
foobar                                           Opaque                                0      5h3m
foobar-token-w8lmt                               kubernetes.io/service-account-token   3      5h15m
foobar2-token-hpd6f                              kubernetes.io/service-account-token   3      5h9m
generic-garbage-collector-token-l7hvk            kubernetes.io/service-account-token   3      25h
horizontal-pod-autoscaler-token-sssg5            kubernetes.io/service-account-token   3      25h
job-controller-token-dnfds                       kubernetes.io/service-account-token   3      25h
k8sadmin-token-bklpd                             kubernetes.io/service-account-token   3      5h40m

We use the -n parameter to specify the kube-system namespace. Our system account is in the list, it’s k8sadmin-token-bklpd. We can see the content of the token with this command:

[root@kind-vm ~]# kubectl -n kube-system describe secret k8sadmin-token-bklpd
Name:         k8sadmin-token-bklpd
Namespace:    kube-system
Labels:       <none>
Annotations:  kubernetes.io/service-account.name: k8sadmin
             kubernetes.io/service-account.uid: 412bf773-ca8e-4afa-a778-dac0f11b7807

Type:  kubernetes.io/service-account-token

namespace:  11 bytes
token:      eyJhbGciO(...)2A
ca.crt:     1066 bytes

Here, you're done. The token is in the command output. You need now to pass its content to Ansible. Just keep in mind the token needs to remain secret. So it's a good idea to encrypt it with Ansible Vault.
You can use the K8S_AUTH_API_KEY environment variable to pass the token to the k8s_* modules:

$ K8S_AUTH_API_KEY=eyJhbGciO(…)2A ansible-playbook my_playbook.yaml
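
If you prefer to script the extraction, here is a minimal sketch. It relies on the fact that kubectl get secret -o json returns the token base64-encoded in .data.token, whereas kubectl describe prints it already decoded. The function name is made up for the example.

```python
import base64
import json


def token_from_secret(secret_json: str) -> str:
    """Return the decoded bearer token of a service-account secret."""
    secret = json.loads(secret_json)
    if secret.get("type") != "kubernetes.io/service-account-token":
        raise ValueError("not a service-account token secret")
    # The API stores every value of .data base64-encoded.
    return base64.b64decode(secret["data"]["token"]).decode("ascii")
```

Fed with the output of kubectl -n kube-system get secret k8sadmin-token-bklpd -o json, the function returns the value to pass as K8S_AUTH_API_KEY.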

Ansible: Performance Impact of the Python version

Until recently, I was not really paying attention to the version of Python I was using with Ansible, as long as it was Python 3. The default version was always good enough for Ansible.

During the last few weeks, I spent the majority of my time working on the performance of the community.kubernetes collection. The modules of this collection depend on a large library (OpenShift SDK) and Python needs to reload it before every task execution. The goal was to benefit from what is already in place with vmware.vmware_rest. See my AnsibleFest presentation.

And while working on this, I realized that my metrics were not consistent: I was not able to reproduce some test cases that I ran 2 months ago. After a quick investigation, it turned out that the Python version matters much more than expected.

To compare the different Python versions, I decided to run some tests.

The target host is a t2.medium instance (2 vCPUS, 4GiB) running on AWS. And the Operating system is Fedora 33, which is really handy for this because it ships all the Python versions from 3.6 to 3.10!

I use the last stable version of Ansible (2.10.3) that I install with pip in a Python virtual environment. The list of the dependencies present in the virtualenvs.

Finally, I deploy Kubernetes on Podman with Kubernetes Kind.

For the first test, I use a Python one-liner to evaluate the time Python takes to load the OpenShift SDK. This is one of the operations that I want to optimize for my work and so it matters a lot to me.

# ${python} holds the Python version of the virtualenv under test
for i in $(seq 100); do
    echo ${i}
    venv-${python}/bin/python -c 'from openshift.dynamic import DynamicClient' >> /dev/null 2>&1
done

Here the loading is done 100 times in a row.
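
The same measurement can also be driven from Python, which makes it easier to collect the timings. This is only a sketch of the idea, not the script I used: the function name is made up, and the virtualenv path in the usage note is an assumption based on the venv-${python} naming.

```python
import subprocess
import sys
import time


def time_imports(python: str, statement: str, runs: int = 100) -> float:
    """Run `python -c statement` `runs` times and return the total seconds."""
    start = time.perf_counter()
    for _ in range(runs):
        # A fresh interpreter each time, so nothing stays cached in memory.
        subprocess.run(
            [python, "-c", statement],
            stdout=subprocess.DEVNULL,
            stderr=subprocess.DEVNULL,
            check=False,
        )
    return time.perf_counter() - start
```

For instance, time_imports("venv-3.9/bin/python", "from openshift.dynamic import DynamicClient") would give the total for one interpreter.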

The result shows a steady improvement of the performance since Python 3.6.

Python version   3.6      3.7      3.8      3.9      3.10
time (sec)       48.401   45.088   41.751   40.924   40.385

With this test, the loading of the SDK is 16.5% faster with Python 3.10.

The next test does the same thing, but this time through Ansible. My test uses the following playbook:

- hosts: localhost
  gather_facts: false
  tasks:
    - k8s_info:
        kind: Pod
      with_sequence: count=100

It runs the k8s_info module 100 times in a row. In addition, I also use an ansible.cfg with the following content. This way, ansible-playbook returns a nice output of the task execution duration:

[defaults]
callback_whitelist = ansible.posix.profile_tasks

Python version   3.6    3.7    3.8     3.9     3.10
time (sec)       85.5   80.5   75.35   75.05   71.19

It’s a 16.76% boost between Python 3.6 and Python 3.10. I was not expecting such a tight correlation between the two tests.
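
For reference, this is how such a percentage can be derived from the timings. The sketch assumes the two series above read as five values each, one per Python version from 3.6 to 3.10. Since the published timings are rounded, the result may differ from the quoted figures in the last digit.

```python
def speedup_percent(baseline: float, candidate: float) -> float:
    """Share of the baseline duration saved by the candidate, in percent."""
    return (baseline - candidate) / baseline * 100


VERSIONS = ["3.6", "3.7", "3.8", "3.9", "3.10"]
SDK_LOAD = [48.401, 45.088, 41.751, 40.924, 40.385]   # seconds, first test
K8S_INFO = [85.5, 80.5, 75.35, 75.05, 71.19]          # seconds, second test

for label, series in (("SDK load", SDK_LOAD), ("k8s_info", K8S_INFO)):
    gain = speedup_percent(series[0], series[-1])
    print(f"{label}: {gain:.1f}% less time with Python {VERSIONS[-1]} than with {VERSIONS[0]}")
```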

While Python is obviously not the fastest technology out there, it’s great to see how its performance is getting better release after release. Python 3.10 is not even released yet and looks promising.

If your playbooks use some modules that depend on a large Python library, it may be worth giving the latest Python versions a try.

And for those who are still running Python 2.7, I get a 49.2% performance boost between 2.7 and 3.10.

How to start minishift on Fedora-33

update: minishift is basically dead and won’t support OpenShift 4. You probably want to use crc instead.

I just spent too much time trying to start minishift with my user account. After 3h fighting with permission issues between libvirt and the ~/.minishift directory, I’ve finally decided to stay sane and use sudo… This is how I start minishift on my Fedora-33.

Install and start libvirt

sudo dnf install -y libvirt qemu-kvm
sudo systemctl start libvirtd

Fetch and install minishift and the kvm driver

sudo dnf install -y origin-clients
curl -L https://github.com/minishift/minishift/releases/download/v1.34.3/minishift-1.34.3-linux-amd64.tgz|sudo tar -C /usr/local/bin -xvz minishift-1.34.3-linux-amd64/minishift  --strip-com
sudo curl -L https://github.com/dhiltgen/docker-machine-kvm/releases/download/v0.10.0/docker-machine-driver-kvm-centos7 -o /usr/local/bin/docker-machine-driver-kvm
sudo chmod +x /usr/local/bin/docker-machine-driver-kvm

Start minishift (with sudo…)

sudo minishift

And configure the local user

sudo cp -r /root/.kube /home/goneri
sudo chown -R goneri:goneri /home/goneri/.kube

How to speed up your (API client) modules

The slide deck of my presentation for AnsibleFest 2020. It focuses on the modules designed to interact with a remote service (REST, SOAP, etc). In general these modules just wrap an SDK library, and the presentation explains how to improve their performance. I actually use this strategy (ansible_turbo.module) with the vmware.vmware_rest collection to speed up the modules.

How we use auto-generated content in the documentation of our Ansible Collection


Most of the content of the vmware.vmware_rest collection is auto-generated. This article focuses on the documentation and explains how we build it.

Auto-generated example blocks

This collection comes with an exhaustive series of functional tests. Technically speaking, these tests are just some Ansible playbooks that we run with ansible-playbook. They should run all the modules and ideally, in all the potential scenarios (e.g: create, modify, delete). If the playbooks execution is fine, the test is successful and we assume the modules are in a consistent state.

We can hardly generate the content of the documentation, but these playbooks are an interesting source of inspiration since they actually cover, and go beyond, all the use cases that we want to document.

Our strategy is to record all the tasks and their results in a directory. Our documentation then just points to this content. This provides two interesting benefits:

  • We know our examples work fine because it’s actually the output of the CI.
  • When the format of a result changes, our documentation will take it into account automatically.

We import these files into our git repository, and git-diff shows us the differences with the previous version. It’s an opportunity to spot a regression.

Cooking the collection

How do we collect the tasks and the results?

For this, we use a callback plugin ( https://github.com/goneri/ansible-collection-goneri.utils ). The configuration is done using three environment variables:

  • ANSIBLE_CALLBACK_WHITELIST=goneri.utils.collect_task_outputs: Ask Ansible to load the callback plugin.
  • COLLECT_TASK_OUTPUTS_COLLECTION=vmware.vmware_rest: Specify the name of the collection.
  • COLLECT_TASK_OUTPUTS_TARGET_DIR=/somewhere: Target directory where to write the results.

When we finally call the ansible-playbook command, the callback plugin is loaded, records all the interactions of the vmware.vmware_rest modules and stores the results in the target directory.

The final script looks like this:

#!/usr/bin/env bash
set -eux

export ANSIBLE_CALLBACK_WHITELIST=goneri.utils.collect_task_outputs
export COLLECT_TASK_OUTPUTS_COLLECTION=vmware.vmware_rest
export COLLECT_TASK_OUTPUTS_TARGET_DIR=$(realpath ../../../../docs/source/vmware_rest_scenarios/task_outputs/)
export INVENTORY_PATH=/tmp/inventory-vmware_rest
source ../init.sh
exec ansible-playbook -i ${INVENTORY_PATH} playbook.yaml

The documentation

Like a lot of Python projects, Ansible uses reStructuredText for its documentation. To include our samples we use the literalinclude directive. The result looks like this, with one include for the task and one for its result:

Here we use ``vcenter_datastore_info`` to get a list of all the datastores:

.. literalinclude:: task_outputs/Retrieve_a_list_of_all_the_datastores.task.yaml


.. literalinclude:: task_outputs/Retrieve_a_list_of_all_the_datastores.result.json

This is what the final result looks like:

And the RETURN blocks?

Each Ansible module is supposed to come with a RETURN block ( https://docs.ansible.com/ansible/latest/dev_guide/developing_modules_documenting.html#documentation-block ) that describes the output of the module. Each key of the module output is documented in this JSON structure.
The RETURN section and the task result above should be consistent. We can actually reformat the result and generate a JSON structure that matches the RETURN block expectation.
Once this is done, we just need to inject the content in the module file.
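
To give an idea of the reformatting step, here is a simplified sketch. It is not the code of inject_RETURN.py: the function names, the list of ignored keys and the generated descriptions are assumptions made for the example. Only the field names (description, returned, type, sample) come from the Ansible RETURN format.

```python
def doc_type_name(value) -> str:
    """Map a JSON value to a type name accepted in RETURN documentation."""
    if value is None:
        return "raw"
    if isinstance(value, bool):  # must come before int: bool is a subclass
        return "bool"
    if isinstance(value, int):
        return "int"
    if isinstance(value, float):
        return "float"
    if isinstance(value, list):
        return "list"
    if isinstance(value, dict):
        return "dict"
    return "str"


def return_block(result: dict, ignored=("changed", "failed", "invocation")) -> dict:
    """Build a RETURN-like structure from one recorded task result."""
    block = {}
    for key, value in sorted(result.items()):
        if key in ignored:
            continue
        block[key] = {
            "description": key.replace("_", " ").capitalize(),
            "returned": "On success",
            "type": doc_type_name(value),
            "sample": value,  # the recorded output doubles as the sample
        }
    return block
```

The sample field is what makes the approach attractive: it always reflects what the module really returned during the last run of the functional tests.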

We reuse the task results in our modules with the following command:

./scripts/inject_RETURN.py ~/.ansible/collections/ansible_collections/vmware/vmware_rest/docs/source/vmware_rest_scenarios/task_outputs/ ~/git_repos/ansible-collections/vmware_rest/ --config-file config/inject_RETURN.yaml