Until recently, I was not really paying attention to the version of Python I was using with Ansible, this as soon as it was Python3. The default version was always good enough for Ansible.
During the last weeks, I spent the majority of my time working on the performance the community.kubernetes collection. The modules of these collection depend on a large library (OpenShift SDK) and Python needs to reload it before every task execution. The goal was to benefit from what is already in place with vmware.vmware_rest: See: my AnsibleFest presentation.
And while working on this, I realized that my metrics were not consistent, I was not able to reproduce some test-cases that I did 2 months ago. After a quick investigation, the Python version matters much more than expected.
To compare the different Python versions, I decided to run some tests.
The target host is a t2.medium instance (2 vCPUS, 4GiB) running on AWS. And the Operating system is Fedora 33, which is really handy for this because it ships all the Python versions from 3.6 to 3.10!
For the first test, I use a Python one-liner to evaluate the time Python takes to load the OpenShift SDK. This is one of the operations that I want to optimize for my work and so it matters a lot to me.
The slide deck of my presentation for AnsibleFest 2020. It focus on the modules designed to interact with a remote service (REST, SOAP, etc). In general these modules just wrap a SDK library, the presentation explains how to improve the performance. I actually use this strategy ( ansible_turbo.module ) with the vmware.vmware_rest collection to speed up the modules.
This post compares the start-up duration of the most popular Cloud images.By start-up, I mean the time until we’ve got an operational SSH server.
For this test, I use a pet project called Virt-Lightning ( https://virt-lightning.org/ ). This tool allow any Linux user to start standard Cloud image locally. It will prepare the meta-data and start a VM in your local libvirt. It’s very handy for people like be, who work on Linux and spend the day starting new VM. The image are in the QCow2 format, and it uses the OpenStack meta-data format. Technically speaking, the performance should match what you get with OpenStack.
Actually, OpenStack is often slightly slower because it does some extra operations. It may need to create a volume on Ceph, or prepare extra network configuration.
The 2.0.0 release of Virt-Lighnint exposes a public API. My test scenario is built on top of that. It uses Python to pull the different images, and creates a VM from it 10 times in a row.
All the images are public, Virt-Lightning can fetch them with the vl pull foo command:
vl pull centos-6
During the boot process, the VM will set-up a static network configuration, resize the filesystem, create an user, and inject a SSH key.
By default, Virt-Lightning uses a static network configuration because it’s faster, and it gives better performance when we start a large number of VM at the same time. I choose to stick with this.
I did my tests on my Lenovo T580, which comes with a NVMe storage, and 32GB of memory. I would be curious to see the results with the same scenario, but on regular spinning disk.
The target images
For this test, I compare the following Linux distributions: CentOS, Debian, Fedora, Ubuntu and OpenSUSE. As far as I know, there is no public Cloud image available for the other common distributions. If you think I’m wrong, please post a comment below.
I also included the last FreeBSD, NetBSD and OpenBSD releases. They don’t provide official Cloud Images. This is the reason why, I reuse the unofficial ones from https://bsd-cloud-image.org/.
The lack of pre-existing Windows image is the reason why this OS is not included.
Debian 10 is by far the fastest image with an impressive 15s on average. Basically, 5s less than any other Cloud Image.
Regarding the BSD, FreeBSD is the only system able to resize the root filesystem without a reboot. Consequently, OpenBSD and NetBSD need to start two times in a row. This explains to big difference. The NetBSD kernel hardware probe is rather slow, for instance it takes 5s to initialize the ATA bus of the CDROM. This is the reason why the results look rather bad.
About Ubuntu, I was surprised by the boot duration of Ubuntu 18.04. It is about two times longer than for 16.04. 20.04 is bit better but still, we are far from the 15s of 14.04. I would be curious to know the origin of this. Maybe AppArmor?
CentOS 6 results are not really consiste. They vary between 17.9s and 25.21s. This is the largest delta if you compare with the other distribution. This being said, CentOS 6 is rather old, and won’t be supported anymore at the end of the year.
All the recent Linux images are based on systemd. It would be great to extract the metrics from systemd-analyze to understand what impact the performance the most.
Most of the time, when I deploy a test VM, the very first thing I do is the installation of some import packages. This scenario may be covered later in another blog post.