performance: expanduser with pathlib or os.path

Python 3 provides a fancy new library, pathlib, to manage pretty much all the path-related operations. This is a really welcome improvement, since before that we had to use a long list of unrelated modules.

I recently had to choose between pathlib and os.path to expand a string in the ~/path format to the absolute path. Since performance was important, I took the time to benchmark the two options:

#!/usr/bin/env python3

import timeit

setup = '''
from pathlib import PosixPath
'''
with_pathlib = timeit.timeit("abs_remote_tmp = str(PosixPath('~/.ansible/tmp').expanduser())", setup=setup)

setup = '''
from os.path import expanduser
'''

with_os_path = timeit.timeit("abs_remote_tmp = expanduser('~/.ansible/tmp')", setup=setup)

print(f"with pathlib: {with_pathlib}\nwith os.path: {with_os_path}")

os.path is about 4 times faster for this very specific case (timeit runs each statement 1,000,000 times by default). The fact that we need to instantiate a PosixPath object has an impact. Also, once again we observe a nice performance boost with Python 3.8 onwards.
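Before trusting the timings, it is worth checking that both calls return the same string. The snippet below is a minimal sketch of that check, with a best-of-several-runs measurement on top. The helper names and the iteration counts are arbitrary choices for illustration, not the settings used for the numbers above.

```python
import timeit
from os.path import expanduser
from pathlib import PosixPath

PATH = "~/.ansible/tmp"


def with_pathlib(path=PATH):
    # Build a PosixPath object, expand it, then convert back to a string
    return str(PosixPath(path).expanduser())


def with_os_path(path=PATH):
    # Plain string in, plain string out
    return expanduser(path)


def best_of(func, number=10_000, repeat=3):
    # The minimum of several runs is less sensitive to background noise
    return min(timeit.repeat(func, number=number, repeat=repeat))


if __name__ == "__main__":
    assert with_pathlib() == with_os_path()
    print(f"with pathlib: {best_of(with_pathlib):.4f}s")
    print(f"with os.path: {best_of(with_os_path):.4f}s")
```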

Ansible collections and venv

I work on a large number of collections and, in order to test them properly, I have to switch between Python versions and the associated PyPI dependencies. Nothing special here, this is pretty much the life of all of us who work on the Ansible collections.

Initially, I was maintaining a set of clean Python virtual environments, basically one per version of Python, and I was juggling between them. Sadly, it’s easy to lose track of what’s going on. Every time I switched to a different collection, I had to pull a new set of dependencies and the order was never the same.

I ended up being really frustrated by the time wasted looking at the pip freeze output to understand some oddity. It’s so easy to mess up the whole cathedral. A good example is that I use pip install -e git/something a lot to install a local copy of a library. As a result, any change there can potentially nuke the fragile little creature.

So now, I use another approach. I’ve got a script that spawns a virtual environment on the fly, pulls the right dependencies and initializes the shell. It may sound like a trivial thing, but I actually use it several times every day and I don’t call pip freeze that much anymore.

For instance if I need to work with Ansible 2.10 and Python 3.10, I just need to do:

$ cd .ansible/collections/ansible_collections/vmware/vmware_rest
$ source ~/bin/ansible-venv.fish 3.10 stable-2.10

and I’m ready to run ansible-playbook or ansible-test in my clean environment. And when I want to reinitialize the venv, I just have to remove the venv directory.

The script is here and depends on FishShell, my favorite Shell.
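The idea behind the script can be sketched in Python. This is not the Fish script itself, only an illustration of the logic: derive a directory name from the Python version and the Ansible branch, and populate it only if it does not exist yet. The directory naming scheme, the function names and the use of the GitHub archive tarball to install Ansible are assumptions made for this example.

```python
import subprocess
from pathlib import Path


def plan_venv(python_version, ansible_branch, base_dir="."):
    """Return the venv directory and the commands needed to populate it."""
    # Hypothetical naming scheme: one directory per (Python, Ansible) pair
    venv_dir = Path(base_dir) / f"venv-py{python_version}-{ansible_branch}"
    pip = str(venv_dir / "bin" / "pip")
    commands = [
        # Create the environment with the requested interpreter
        [f"python{python_version}", "-m", "venv", str(venv_dir)],
        # Install Ansible from the matching upstream branch (assumed source)
        [pip, "install",
         f"https://github.com/ansible/ansible/archive/{ansible_branch}.tar.gz"],
    ]
    return venv_dir, commands


def ensure_venv(python_version, ansible_branch, base_dir="."):
    venv_dir, commands = plan_venv(python_version, ansible_branch, base_dir)
    if not venv_dir.exists():  # removing the directory forces a rebuild
        for command in commands:
            subprocess.run(command, check=True)
    return venv_dir


if __name__ == "__main__":
    directory, steps = plan_venv("3.10", "stable-2.10")
    print(directory)
    for step in steps:
        print(" ".join(step))
```

Keeping the planning separate from the execution makes it easy to print the commands first and see what would be installed.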

Permanent Residence application rejected :-(

Our application for Canadian Permanent Residence has been rejected by the federal government. Here is the email we sent to my provincial representative, with my federal Member of Parliament in copy. I know several people in similar situations, some of them even much worse. I decided to share our story in the hope that things eventually change, and to free our minds. So here is the email, with a few slight modifications.

Dear Madam,

We are a French family and we have lived in Verdun since 2015. My wife is a teacher at the Centre de services scolaire de Montréal. I am a software engineer. We have 3 children.
Our income is above the Québec average and we are happy to take part in the local economy and to pay our taxes accordingly.

We love Québec and we want to live here. This is why we started the process to become Permanent Residents in 2018.

  • 2018-12: we apply for the CSQ (PEQ).
  • 2019-02: we receive our CSQ.
  • 2019-08: we send our application to the federal government.
  • 2020-03: we have had NO communication from the federal government, and we apply for a work permit in a hurry to be able to keep working.
  • 2021-02: a message tells us that the proof of payment that goes with our application is missing. I re-attach the proof of the $960 payment that we made in 2019, in accordance with the fee schedule of the time. I am convinced that this document was ALREADY in the initial application.
  • 2021-02: after 2 years, our CSQ expires.
  • 2021-03: our application is rejected with the following message:

WE HAVE TO START EVERYTHING OVER FROM SCRATCH: THE CSQ, THE WORK PERMIT AND THE PR.

We lost 3 years for this. At no point did the ICSS contact us to tell us that some money was missing. In fact, we still have no idea how much is missing. After 3 years of procedure, we would obviously have paid as quickly as possible to be able to continue.

From a personal point of view, we are devastated. My wife is waiting for the PR to go to university and complete the Québec equivalence of her French teaching degree. These immigration procedures are long and painful, and starting everything over from scratch is mentally exhausting. Being treated with contempt by the administration is humiliating. This file may not be important to them, but it is at the centre of our life, and the topic comes up regularly when we talk about our future in Québec. We have been sleeping badly for 3 days, and I am on leave because I cannot manage to work.

On top of that come the processing times that we observe. At the rate things are going, this pushes our PR back by 3 or 4 years. Our eldest daughter will have reached CEGEP and we will still have a precarious status! It is hard to plan for the future in these conditions.

My questions:

  • The government is trying to recruit foreign teachers. How can it make sure that the immigration applications are processed quickly?
  • Why are the PEQ applications not given priority at the federal level?
  • Québec’s immigration service is clearly better organized. Why is there no appeal system for this kind of abusive situation at the federal level?
  • Why does the Québec government not put pressure on the federal government to improve the situation?
  • Would it be possible for the Québec government to extend the validity of the CSQs already issued from 2 to 4 years, to avoid a situation like ours, where we end up redoing the whole process even though our situation with Québec has not changed?
  • Why does Québec not take part in the Express Entry program?

Thank you for your time.
--
    Gonéri Le Bouder

0ad, Firewall and Fedora 33

By default, the firewall will prevent connections to your 0ad server. To adjust that, you need to open port 20595 (UDP). These three lines create a firewalld service called 0ad, attach it to the default zone and reload the firewall:

$ sudo firewall-cmd --permanent --new-service=0ad --set-description="0ad, A free, open-source game of ancient warfare" --add-port=20595/udp
$ sudo firewall-cmd --zone=FedoraWorkstation --add-service=0ad --permanent
$ sudo firewall-cmd --reload

eGPU, Wayland, Gnome3 and Fedora 33

I’ve just got a Razer Core X that I use with a Radeon graphics card. By default Wayland, well Mutter actually, continues to use the Intel card of my T580.

To force it to use the second card, I had to add a udev rule and reboot. And that’s all!

$ cat /etc/udev/rules.d/61-mutter-primary-gpu.rules
ENV{DEVNAME}=="/dev/dri/card1", TAG+="mutter-device-preferred-primary"

note: You need Gnome 3.38.2 for this to work properly. See: https://gitlab.gnome.org/GNOME/mutter/-/merge_requests/1562
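The rule above targets /dev/dri/card1, but the numbering of the cards depends on the machine. As a rough sketch, the snippet below reads sysfs to show which kernel driver is bound to each card, which helps to pick the right device name. The function name is made up for this example, and it assumes the usual Linux layout under /sys/class/drm.

```python
import os
import re


def drm_card_drivers(sysfs_root="/sys/class/drm"):
    """Map each /dev/dri/cardN node to the kernel driver bound to it."""
    drivers = {}
    for entry in sorted(os.listdir(sysfs_root)):
        # Skip connector entries such as card0-eDP-1 and the render nodes
        if not re.fullmatch(r"card\d+", entry):
            continue
        link = os.path.join(sysfs_root, entry, "device", "driver")
        if os.path.islink(link):
            # The symlink target ends with the driver name (i915, amdgpu...)
            drivers[f"/dev/dri/{entry}"] = os.path.basename(os.readlink(link))
    return drivers


if __name__ == "__main__":
    if os.path.isdir("/sys/class/drm"):
        for node, driver in drm_card_drivers().items():
            print(f"{node}: {driver}")
```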

How to waste a Friday…

Yesterday morning I got frustrated by the really slow download speed of some files. What should have taken seconds with my 400 Mb/s connection actually took more than 16 minutes. In addition, I was able to reproduce the problem on my router.

Here the curl statistics show a 26:52 minute download at about 415 kB/s:

$ curl -o /dev/null https://s3.us-east-2.amazonaws.com/zuul-images/fedora-open-vm-tools-livecd.iso
  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
  0  627M    0 2684k    0     0   398k      0  0:26:52  0:00:06  0:26:46  415k^C
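To put these numbers in perspective, here is the arithmetic as a small Python sketch. It assumes that curl’s M and k suffixes are 1024-based and that the 400 Mb/s line rate is fully available, which is optimistic. Under those assumptions the file should download in about 13 seconds, while the observed average speed leads to roughly 27 minutes, in line with curl’s own estimate.

```python
MIB = 1024 * 1024


def transfer_seconds(size_bytes, rate_bytes_per_s):
    # Time needed to move size_bytes at a constant rate
    return size_bytes / rate_bytes_per_s


size = 627 * MIB               # "627M" in the curl output
line_rate = 400_000_000 / 8    # 400 Mb/s expressed in bytes per second
observed = 398 * 1024          # "398k", the average download speed

expected_s = transfer_seconds(size, line_rate)
observed_s = transfer_seconds(size, observed)

print(f"at line rate:      {expected_s:.0f} s")
print(f"at observed speed: {observed_s / 60:.0f} min")
print(f"slowdown factor:   {observed_s / expected_s:.0f}x")
```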

Just to be sure, I retry with a lower MTU. The issue remains. Two friends in Montréal also confirm everything works well for them.

My laptop uses a wired connection that I’ve been using for years now. A large part of the population works from home and the Internet is under constant pressure. So I suspected an ISP bandwidth limitation to preserve the Quality of Service. I contacted them and… complained… a lot.

At some point, the technician manages to get my attention and asks me to connect the modem directly to my laptop.

This is an obviously pointless request since the downloads are also slow from the router. But well, if I want them to act, I also need to be cooperative. I give it a try and… I immediately feel embarrassed. It is damn fast! The download speed is actually even above the 400 Mb/s ceiling. WTF.

So I start to reconsider my whole life in a deep introspection. How can this be real? My router runs OpenBSD 6.7 with an absolutely basic pf configuration. Its hardware is decent (Intel(R) Core(TM) i3-2120 CPU @ 3.30GHz). The router is connected to the modem with a very common Realtek RTL8125 and the LAN is connected to an Intel I350. I can download large files between my laptop and the router at full speed.

All this takes my laptop, the wire and the router’s I350 out of the equation. The last potential culprit is the Realtek NIC. I replace it with another Intel I350 and retry. Same problem. The downloads are still slow on my laptop and the router. The Realtek NIC is innocent.

So I start thinking. Why is the rest of the family not really affected by the poor performance of the Internet connection? And so, I try to download the same file from another computer. And it’s fast… Damn, what’s going on? I swap some ports on the switch just to be sure. The problem is now somewhere between my laptop and the wire.

My laptop uses a bonding of the Wi-Fi and the wired connection. This gives me the ability to move around with the laptop without losing all my open connections. For instance, I can unplug my wire in the middle of a meeting and the laptop will use the Wi-Fi seamlessly. But well, I digress. I remove this special configuration. And… the problem remains.

But during this last check, I also saw a large number of RX errors associated with the laptop NIC. Interesting, let’s try another NIC. I plug in a USB3 Realtek RTL8156 NIC and this time… it works.
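The RX error counter mentioned above is exposed by the Linux kernel under /sys/class/net. As a minimal sketch, the snippet below reads it and reports how many new errors show up during a test window. The interface name eth0 and the function names are placeholders for this example.

```python
import time
from pathlib import Path


def read_counter(iface, counter="rx_errors", sysfs_root="/sys/class/net"):
    """Read one statistics counter of a network interface."""
    path = Path(sysfs_root) / iface / "statistics" / counter
    return int(path.read_text().strip())


def errors_during(iface, seconds, sysfs_root="/sys/class/net"):
    """Return how many RX errors were recorded during the time window."""
    before = read_counter(iface, sysfs_root=sysfs_root)
    time.sleep(seconds)
    after = read_counter(iface, sysfs_root=sysfs_root)
    return after - before


if __name__ == "__main__":
    iface = "eth0"  # placeholder, use the name of the NIC under test
    if (Path("/sys/class/net") / iface).exists():
        print(f"{iface}: {errors_during(iface, 10)} new RX errors in 10s")
```

Running a download during the window and comparing the result between two NICs gives a quick hint about which one is struggling.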

The wire is a CAT6 cable and is not that long (20m), but it sounds like the NIC (Intel I219-LM) of my T580 is a bit picky about the quality of the signal. It can also be a problem with the e1000e driver of the new 5.10 kernel. The cable is good, I’ve just tested it. Anyway, I’ve put a switch just before my laptop NIC and everything works great now.

I’m still not sure why my downloads are still slow on OpenBSD. But this is an adventure for another day. The slow downloads were all with HTTPS sites (S3 and a Caddy website), and the DF (Don’t Fragment) flag was set, which exacerbates the impact of the transmission errors.

I found the whole situation to be interesting. It’s a series of wrong assumptions and the solution is really far from what I would have imagined.

Also, thank you Teksavvy for your great support.

update: this would partially explain the OpenBSD S3 download problem.

update2: I’m now running Linux 5.10.11 and… I don’t see any RX errors anymore! The S3 download is just as fast as it should be. So this was indeed a problem with the e1000e driver.

update3: The problem is back. I’m not sure if it’s a hardware limitation of the NIC itself. I now use the Realtek NIC all the time.

My GitHub workflow

The Ansible community uses GitHub to develop ansible-core and most of the Ansible Collections. The only exception I know of is OpenStack’s ansible-collections-openstack, which uses Gerrit.

So, as an Ansible developer, my normal day-to-day activities involve a lot of GitHub interactions. I review Pull Requests (PRs) and prepare new PRs all the time.

Before joining Ansible, I was working with Gerrit which is a nice alternative solution to collaborate on a stream of patches.

In Gerrit, each patch from a branch is a PR. Every time we update a patch, its SHA-1 changes, and so Gerrit tracks the patches with a dedicated ID called Change-Id. It looks like an extra line in the body of the commit message, e.g.:

     Change-Id: Ic8aaa0728a43936cd4c6e1ed590e01ba8f0fbf5b

Gerrit provides a tool called git-review to pull and push the patches. When a contributor pushes a series of patches, each patch is correctly tracked by Gerrit and updates the right existing PR. This allows the contributor to reorganize the patches, change the order of series or import a patch from another branch.

With GitHub, a branch is a PR and most of the time, the projects prefer to use the branch to trace the iteration of the PR:

  • my fancy feature
  • fix: correct the test-suite
  • fix: fix the fix
  • fox: typo in previous commit
  • bla

And this is fine, because most of the time, the branch will ultimately be squashed (one branch -> one Git commit) during the final merge.

The GitHub workflow is certainly friendlier for newcomers, but it tends to be a source of complexity when you want to work on several PRs at the same time. For instance, I work on a new feature, but I also want to cherry-pick an experimental commit from a contributor. In this case I must remove this commit before I push my branch back to GitHub, or the extra commit will end up in my feature branch.

Another example: if I’m working on a feature branch and find an issue with something unrelated, I need to switch to another branch to commit my fix and push it. This is cumbersome, and often people just prefer to include the fix in their feature branch, which leads to confusion and questions during the code review.

To simplify, Gerrit allows better code modularity but also requires a better understanding of Git, which is annoying when we try to attract new contributors. This is the reason why we use the current workflow.

To address the problem I wrote a script called push-patch (https://github.com/goneri/push-patch). I use it to push just my commits. For instance, I work on this branch:

  • 1: doc: explain how to do something
  • 2: typo: adjust a little details
  • 3: a workaround for issue #19 that should not be merged

The first two commits are not directly related to the feature I’m implementing, and I would like to submit them immediately.

push-patch will allow me to push only changes 1 and 2, in two dedicated PRs. Both branches will be based on main and can be merged independently.

$ push-patch 1
$ push-patch 2

Now, and that’s the cool part 😋! Let’s imagine I want to push another revision of my first patch. I can use “git rebase -i” to adjust this commit and run push-patch again to push the updated patch.

$ vim foo
$ git add foo
$ git rebase --continue
$ ./push-patch 1

Internally, push-patch uses git-notes to trace the remote branch of the patch. The Public-Branch field traces the name of the branch in my remote clone of the project and PR-Url is the URL of the PR in the upstream project, e.g.:

commit 1198db8807ebf9f4099598bcd41df25d465cbcae (HEAD -> main)
Author: Gonéri Le Bouder <goneri@lebouder.net>
Date:   Thu Jan 7 11:31:41 2021 -0500

   elb_application_lb: enable the functional test
    
   Remove the `unsupported` aliases for the `elb_application_lb` test.
    
   Use HTTP instead of HTTPS to avoid the dependency on
   `iam:ListServerCertificates` and the other Certificate related operations.

Notes:
   Public-Branch: elb_application_lb-enable-the-functional-test_24328
    
   PR-Url: https://github.com/ansible-collections/community.aws/pull/348

This means that even if the patch content evolves, push-patch will still be able to continue to update the right PR.

In a nutshell, for each patch it will:

  1. clone the project and switch on the main branch
  2. read the patch notes
    1. if a branch name already exists it will use it, otherwise it will create a new one
  3. switch to the branch
  4. cherry-pick the patch
  5. push the branch
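The steps above can be sketched in Python. This is not the push-patch script itself, which is a shell script, only an illustration of the logic. The function names, the slug rule used to build a new branch name, the numeric suffix and the origin remote are assumptions made for this example. It also skips the clone step and only returns the Git commands instead of running them.

```python
import re


def parse_notes(notes_text):
    """Parse the 'Key: value' lines found in the Git notes of a commit."""
    fields = {}
    for line in notes_text.splitlines():
        key, sep, value = line.strip().partition(":")
        if sep and value.strip():
            fields[key.strip()] = value.strip()
    return fields


def branch_name_for(subject, suffix):
    # Hypothetical rule: slug of the commit subject plus a numeric suffix
    slug = re.sub(r"[^A-Za-z0-9_]+", "-", subject).strip("-")
    return f"{slug}_{suffix}"


def plan_push(sha, subject, notes_text, suffix, remote="origin", base="main"):
    """Return the branch name and the Git commands for one patch."""
    notes = parse_notes(notes_text)
    # Reuse the recorded branch if there is one, otherwise create a new name
    branch = notes.get("Public-Branch") or branch_name_for(subject, suffix)
    commands = [
        ["git", "checkout", "-B", branch, f"{remote}/{base}"],
        ["git", "cherry-pick", sha],
        ["git", "push", "--force", remote, branch],
    ]
    return branch, commands


if __name__ == "__main__":
    branch, commands = plan_push(
        "1198db8807ebf9f4099598bcd41df25d465cbcae",
        "elb_application_lb: enable the functional test",
        notes_text="",
        suffix=24328,
    )
    print(branch)
    for command in commands:
        print(" ".join(command))
```

Because the branch name is read back from the notes, the same PR keeps being updated even when the commit subject or content changes.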

push-patch expects just the SHA-1 of the commit to push. It also accepts a list of SHA-1s. This is the reason why I often type things like this:

push-patch $(git log -2 --pretty=tformat:%H)

The command passes the SHA-1 of the last two commits to push-patch. It will push them to the two associated branches upstream. And at the end, I can use git log, or better tig, to get the URL of the GitHub review.

Right now, the command is a shell script and depends on the hub command. I would like to rewrite it with a better programming language.

What about you? Do you also use some special tools to handle your PRs?

Ansible: How we prepare the vSphere instances of the VMware CI

As explained briefly in CI of the Ansible modules for VMware: a retrospective, the Ansible CI uses OpenStack to spawn ephemeral vSphere labs. Our CI tests are run against them.

A full vSphere deployment is a long process that requires quite a lot of resources. In addition to that, vSphere is rather picky regarding its execution environment.

The CI of the VMware modules for Ansible runs on OpenStack. Our OpenStack providers use KVM-based hypervisors. They expect images in the qcow2 format.

In this blog post, we will explain how we prepare a cloud image of vSphere (also called golden image).

a full lab running on libvirt

First thing, get a large ESXi instance

The vSphere (VCSA) installation process depends on an ESXi. In our case we use a script and Virt-Lightning to prepare and run an ESXi image on Libvirt. But you can use your own ESXi node as long as it meets the following minimal constraints:

  • 12GB of memory
  • 50GB of disk space
  • 2 vCPUs

Deploy the vSphere VM (VCSA)

For this, I use my own role called goneri.ansible-role-vcenter-instance. It delegates the deployment to the vcsa-deploy command. As a result, you don’t need any human interaction during the full process. This is handy if you want to deploy your vSphere in a CI environment.

At the end of the process, you’ve got a large VM running on your ESXi node.

In my case, all these steps are handled by the following playbook: https://github.com/virt-lightning/vcsa_to_qcow2/blob/master/install_vcsa.yml

Tune-up the instance

Before you shut down the freshly created VM, you will want to make some adjustments.
I use the following playbook for this: prepare_vm.yml

During this step, I ensure that:

  • Cloud-Init is installed,
  • the root account is enabled with a real shell,
  • the virtio drivers are available

Cloud-Init is the de facto tool that handles all the post-configuration tasks that we can expect from a Cloud image: inject the user SSH key, resize the filesystem, create a user account, etc.

By default, the vSphere VCSA comes with a gazillion disks. This is a problem in a cloud environment, where an instance is associated with a single disk image.
So I also move the content of the different partitions into the root filesystem and adjust /etc/fstab to remove all the references to the other disks. This way I only have to maintain one qcow2 image.

All these steps are handled by the following playbook: prepare_vm.yml
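To give an idea of the /etc/fstab adjustment, here is a rough Python sketch. It is not what prepare_vm.yml does line for line; it only shows the filtering logic: the entries that point to a block device other than the root filesystem are dropped, everything else is kept. The function name and the sample entries in the usage example are made up.

```python
BLOCK_PREFIXES = ("/dev/", "UUID=", "LABEL=")


def single_disk_fstab(fstab_text, keep=("/",)):
    """Drop block-device entries whose mount point is not listed in keep."""
    kept = []
    for line in fstab_text.splitlines():
        fields = line.split()
        is_entry = len(fields) >= 2 and not line.lstrip().startswith("#")
        if is_entry and fields[0].startswith(BLOCK_PREFIXES) and fields[1] not in keep:
            continue  # this content now lives inside the root filesystem
        kept.append(line)
    return "\n".join(kept) + "\n"


if __name__ == "__main__":
    # Made-up sample, the real appliance layout may differ
    sample = (
        "/dev/sda3 / ext4 defaults 1 1\n"
        "/dev/mapper/log_vg-log /storage/log ext4 defaults 0 2\n"
        "proc /proc proc defaults 0 0\n"
    )
    print(single_disk_fstab(sample), end="")
```

The keep parameter can be extended, for instance with the swap entry, if some block devices have to stay.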

Prepare the final Qcow2 image

At this stage, the VM is still running, so I shut it down.
Once this is done, I extract the raw image of the disk using the curl command:

curl -v -k --user 'root:!234AaAa56' -o vCenterServerAppliance.raw 'https://192.168.123.5/folder/vCenter-Server-Appliance/vCenter-Server-Appliance-flat.vmdk?dcPath=ha%252ddatacenter&dsName=local'

  • root:!234AaAa56 is my login and password
  • vCenterServerAppliance.raw is the name of the local file
  • 192.168.123.5 is the IP address of my ESXi
  • vCenter-Server-Appliance is the name of the vSphere instance
  • vCenter-Server-Appliance-flat.vmdk is the associated raw disk

The local .raw file is large (50GB), ensure you’ve got enough free space.
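A quick way to check the free space before starting the download, as a minimal Python sketch. The 50GB figure comes from the text above; the function name and the choice of the current directory as the destination are assumptions.

```python
import shutil

RAW_IMAGE_SIZE = 50 * 1024 ** 3  # about 50GB, the size of the raw disk


def has_room_for(path, needed_bytes):
    """Tell whether the filesystem holding path has enough free space."""
    return shutil.disk_usage(path).free >= needed_bytes


if __name__ == "__main__":
    if has_room_for(".", RAW_IMAGE_SIZE):
        print("enough free space for the raw image")
    else:
        print("not enough free space, pick another directory")
```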

You can finally convert the raw file to a qcow2 file. You can use QEMU’s qemu-img for that; it will work fine BUT the image will be monstrously large. I use virt-sparsify from the libguestfs project instead. This command will reduce the size of the image to the bare minimum.

virt-sparsify --tmp tmp --compress --convert qcow2 vCenterServerAppliance.raw vSphere.qcow2

Conclusion

You can upload the image to your OpenStack project with the following command:

openstack image create --disk-format qcow2 --file vSphere.qcow2 --property hw_qemu_guest_agent=no vSphere

If your OpenStack provider uses Ceph, you will probably want to reconvert the image to a flat raw file before the upload. With vSphere 6.7U3 and before, you need to force the use of an e1000 NIC. For that, add --property hw_vif_model=e1000 to the command above.
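If you script the upload, the optional e1000 property can be handled with a small helper. The sketch below only builds the command line from the options given above; the function name and the legacy_nic parameter are made up for this example.

```python
import shlex


def image_create_command(image_file, name, legacy_nic=False):
    """Build the openstack CLI call that uploads the qcow2 image."""
    cmd = [
        "openstack", "image", "create",
        "--disk-format", "qcow2",
        "--file", image_file,
        "--property", "hw_qemu_guest_agent=no",
    ]
    if legacy_nic:
        # Needed with vSphere 6.7U3 and before
        cmd += ["--property", "hw_vif_model=e1000"]
    cmd.append(name)
    return cmd


if __name__ == "__main__":
    print(shlex.join(image_create_command("vSphere.qcow2", "vSphere")))
    print(shlex.join(image_create_command("vSphere.qcow2", "vSphere", legacy_nic=True)))
```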

I’ve just done the whole process with vSphere 7.0.0U1 in 1h30 (Lenovo T580 laptop). I use the ./run.sh script from https://github.com/virt-lightning/vcsa_to_qcow2, which automates everything.

The final result is certainly not supported by VMware, but we’ve already run hundreds of successful CI jobs with this kind of vSphere instance. The CI prepares a fresh CI lab in around 10 minutes.