Part 2, presenting a pattern for a CI/CD pipeline implementation using Gitlab CI for Continuous Integration and Continuous Deployment of Python projects distributed as wheel packages using PyPI.

As a general rule in my .gitlab-ci.yml implementations I try to limit bash code to glue logic; just creating a bridge between the Gitlab pipeline “session” and some utility that forms the core of the job. As a rule of thumb I would say to limit bash code to no more than 5 lines; once you’re over that then it’s time to rethink and start implementing the scripting in a more structured and maintainable language such as Python.
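To make the glue-logic idea concrete, a hypothetical job (the job and script names here are illustrative, not from the real pipeline) ideally reduces to a single call into a script:

    some-check:
      stage: test
      image: python:3.6
      script:
        # all of the real logic lives in the Python script
        - python scripts/run_some_check.py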


There are some people out there who have created unit test frameworks for bash, but the fundamental language limitations of bash remain and in my opinion these limitations create numerous long term problems for maintainability that are best avoided by the use of a more capable language. My personal preference at this time is Python, but there are other choices that may be more suitable for your environment and projects. For example if your projects are based in golang then that may also be a suitable choice for your tooling, or similarly for other languages provided they have plenty of useful libraries for building pipeline related utilities. Just keep in mind that you want as few different languages for developers to learn as possible; although this should not be considered justification for writing your pipeline tooling in C - that, I think, would just be masochism.

One other rule for pipelines is that developers should be able to run the same code as Gitlab CI to test the pipeline jobs for themselves. This is another reason to minimize the lines of code in the job definitions themselves (and thus the bash code). In a Makefile based project it’s easy to set up the pipeline jobs so they just call a make target, which is very easy for developers to replicate, although care must be taken with pipeline context in the form of environment variables. In this Python project, instead of introducing another Docker image dependency to acquire make, I’ve just used Python script files. Another consideration for developers running pipeline jobs, which I have not explored much, is that Gitlab (via the gitlab-runner binary) has the capability to run pipeline jobs locally.
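For example (this is the part I haven’t explored much, so treat it as a pointer rather than a recipe), the gitlab-runner binary can execute a single job from .gitlab-ci.yml on a developer machine using the Docker executor:

    # run the named job from .gitlab-ci.yml on the local machine
    gitlab-runner exec docker <job-name>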

Project structure

It’s probably useful to have a quick outline of the project folder structure so that you have an idea of the file system context the pipeline is operating in.

The dist folder is where flit deposits the wheel packages it builds. The scripts folder is for pipeline related scripts. The tests folder is where unit and integration tests are located; separate from the main project source code. Having the tests folder separate from the main project source code is a practice I’ve only recently adopted but found it to be quite useful; because the tests are separate, the import syntax naturally references the project source paths explicitly so you usually end up testing the “public” or “exported” interface of modules in the project. In a more complex project it also becomes easier to distinguish and separate unit tests from integration tests from business tests. Finally, packaging becomes easier; when tests are included in the main source tree in respective tests folders then flit automatically packages them in the wheel. The tests are not very useful in the wheel package and make it a little larger (maybe lots larger if your tests have data files as well).

Pipeline stages

In the download_3gpp Gitlab CI pipeline I’ve defined four stages: pre-package, package, test and publish. In Gitlab CI stages are executed sequentially in the defined order and, if necessary, dependency artifacts can be passed from a preceding stage to a subsequent stage.
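In .gitlab-ci.yml the ordering is simply the order of the stages list:

    stages:
      - pre-package
      - package
      - test
      - publish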

The pre-package stage is basically “stuff that needs to be done before packaging” because the packaging (or subsequent stages) depends on it. The test stage must come after the package stage because the test stage consumes the wheel package generated by the package stage. Once all the tests have passed then the publish stage can be executed.

Package release identity management

For Python package release identification I use a small code snippet in the main __init__.py file that derives the package’s __version__ from a VERSION file committed alongside the source.


The VERSION file is committed to source control with the content “0.0.0”. This means that by default a generated package will not have a formal release identity, which is important for reducing ambiguity around which packages are actually formal releases. Developers can generate their own packages manually if needed and the package will acquire the default “0.0.0” release id which clearly communicates that this is not a formal release.

In principle, for any release there must be a single, unique package that identifies itself as a release package using the semantic version for that release. Per PEP-491, for Python wheel packages the package name takes the form <package name>-<PEP-440 release id>-<language tag>-<abi tag>-<platform tag>; for a pure Python project that means something like download_3gpp-1.2.3-py3-none-any.whl.

The update_version job changes the VERSION file content slightly depending on whether the pipeline is a tagged release, a “non-release” branch (master or feature branch), or a special “dryrun” release tag. The modified VERSION file is then propagated throughout the pipeline as a dependency in subsequent stages.
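As a sketch (the Docker image and the exact script invocation are assumptions; the artifacts mechanism is the important part), the job looks something like:

    update_version:
      stage: pre-package
      image: python:3.6
      script:
        - python scripts/update_version.py
      artifacts:
        # the modified VERSION file is passed to jobs in later stages
        paths:
          - VERSION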

The dryrun tag takes the form <semantic version>.<pipeline number>-dryrun<int>. The intention of the dryrun tag is to enable testing as much of the release pipeline code as possible without actually doing the release. In this case the generated release id acquires the semantic version part of the tag with the pipeline number appended as well, to indicate that this is not a formal release. In addition we will see later that publishing the package does not occur for a dryrun tag. The artifacts generated by a dryrun pipeline are still available for download and review via the artifacts capability of Gitlab.

For any pipeline that is not a tagged release the default “0.0.0” semantic version is used along with the pipeline number. This clearly indicates that the generated package is not a formal release, but is an artifact resulting from a pipeline.

Here is a link to update_version.py for your review. I’ve tried to keep this script extremely simple so that it is easy to maintain, and contrary to my normal practice it does not have any tests. If it evolves to be any more complicated than it is now then it will need unit tests and will probably have to be migrated to its own repo with packaging.

Package generation

The download_3gpp project presently only generates the wheel package for installation, but the packaging stage could also include documentation and other artifacts related to the complete delivery of the project. The download_3gpp project only has the README.rst documentation for the moment.
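The packaging job itself is little more than a flit build with the dist folder preserved as an artifact; roughly (the job name and image are placeholders of mine):

    build-package:
      stage: package
      image: python:3.6
      script:
        - pip install flit
        - flit build
      artifacts:
        # the built wheel is consumed by the test and publish stages
        paths:
          - dist/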

Run tests

In the pipeline, after the package stage is the test stage. In my experience this tends to be the busiest stage for Python projects as various different types of tests are run against the package.


Test package install

The first test is to try installing the generated package to ensure that flit is doing its job correctly with your pyproject.toml configuration.
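A minimal sketch of such a job, assuming the wheel is the only file in dist/ and the import name matches the distribution name:

    test-package-install:
      stage: test
      image: python:3.6
      script:
        - pip install dist/*.whl
        # a trivial smoke test that the installed package imports
        - python -c "import download_3gpp"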

It’s a fairly nominal test, but it is a valid test. Considering how many times now I’ve seen or heard about C/C++ code being distributed that literally doesn’t compile, I think it’s important to include even the most nominal tests in your pipeline; after all, just compiling your code, even once, would be a pretty nominal test.

Run unit tests (and integration, business tests)

This project doesn’t have integration or business tests so there is only one job. If there were integration and business tests then I would create separate jobs for them so that the tests all run in parallel in the pipeline.
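Assuming pytest (the test runner and dependencies here are assumptions), the job installs the freshly built wheel and runs the tests folder against it:

    unit-tests:
      stage: test
      image: python:3.6
      script:
        - pip install dist/*.whl
        - pip install pytest
        - pytest tests/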

Test coverage analysis

I prefer to run the various coverage analyses in separate jobs, although they could arguably be consolidated into a single job. If I recall correctly, the run-coverage job cannot be consolidated because the stdout that is captured for the Gitlab coverage reporting is suppressed when outputting to html or xml, so having all separate jobs seems cleaner to me.

The run-coverage-threshold-test job fails the pipeline if the test coverage is below a defined threshold. My rule of thumb for Python projects is that test coverage must be above 90% and preferably above 95%. I usually aim for 95% but depending on the project sometimes that needs to be relaxed a little to 90%.
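Assuming pytest-cov (the exact invocations are a sketch, not the project’s verbatim jobs), the two jobs boil down to something like this; the coverage regex feeds Gitlab’s coverage reporting from stdout and --cov-fail-under enforces the threshold:

    run-coverage:
      stage: test
      image: python:3.6
      script:
        - pip install dist/*.whl pytest pytest-cov
        - pytest --cov=download_3gpp tests/
      # scrape the TOTAL percentage from the terminal report
      coverage: '/TOTAL.*\s+(\d+%)$/'

    run-coverage-threshold-test:
      stage: test
      image: python:3.6
      script:
        - pip install dist/*.whl pytest pytest-cov
        # fail the job (and the pipeline) below the threshold
        - pytest --cov=download_3gpp --cov-fail-under=90 tests/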

Test code style formatting

Here’s where I’ve run into some problems. The implementations of black and isort sometimes have competing interpretations of code style formatting. If you’re lucky, your code won’t trigger the interaction, but most likely it eventually will.
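For context, the two formatting jobs are essentially check-mode invocations of each tool; something like (my own sketch of the pattern):

    check-black:
      stage: test
      image: python:3.6
      script:
        - pip install black
        - black --check .

    check-isort:
      stage: test
      image: python:3.6
      script:
        - pip install isort
        # isort 5+ recurses directories by default; older versions need --recursive
        - isort --check-only .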

Adding the isort configuration to pyproject.toml does help but doesn’t solve the problem completely in my experience. I’ve ended up having to accept failure in the isort formatting job, which further complicates things because care must also be taken when running black and isort manually to acquire the formatting before commit; the black formatting must be run last to ensure that its formatting takes precedence over isort.

A bit frustrating right now, but hopefully it will be resolved in the not too distant future.

Publish to PyPI

In the past I’ve had the publish job only run on a release tag using the only property in the job. The problem with this is that the publish code is only run on release, so if you make changes to the release job in a feature branch then you won’t really know if the changes are broken until you actually release. Too many times we’ve had the release meeting, everyone agrees that we’re ready to release, the release is tagged, and then the build fails. This creates a considerable sense of urgency at the eleventh hour (as it should) when everyone is now expecting the release artifacts to just emerge. This can be avoided by ensuring that as much as possible of the release code is run on the feature branch without actually publishing a release.

In the case of this pipeline, the only difference in the publishing job between a release pipeline and non-release pipeline is the use of the flit publish command. In my experience it’s pretty easy to get this right; it’s much more likely that you’ve forgotten to update job dependencies or some other “pipeline machinery” which will now be exercised in a feature branch before you get to a release.
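One way to express that single difference is to make flit publish conditional on a release tag so that everything else in the job runs on every pipeline; a sketch (the tag regex and the FLIT_* credential variables are assumptions):

    publish:
      stage: publish
      image: python:3.6
      script:
        - pip install flit
        # FLIT_USERNAME / FLIT_PASSWORD are expected as protected CI variables
        # dryrun tags do not match the bare semantic version pattern, so they
        # fall through to flit build instead of publishing
        - >
          if [[ "${CI_COMMIT_TAG}" =~ ^[0-9]+\.[0-9]+\.[0-9]+$ ]];
          then flit publish;
          else flit build;
          fi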

Conclusions

I’ve described a set of Gitlab-CI job patterns for a simple Python project. Hope it’s useful.

For a more complex project that includes docstring documentation and Sphinx documentation, other considerations would include:

  • testing docstring style and running docstring test code
  • publishing documentation to readthedocs

I joined iib in January and since the beginning it was clear that I’d have a lot of fun automating a few manual processes. The development team is made up of three people: the CTO, another developer and me. The team is small but step by step we’re going towards the technical excellence we want. :)


One of the things that needed automation was the release of an internal library. The process to install it was basically git pull + python setup.py install. Since we have many other things to do, this problem had to wait a little longer to be solved. But things became ugly when I had to set up a test pipeline for Airflow (I must say that now I understand why the most famous blog post on this topic is called Data’s Inferno: 7 Circles of Data Testing Hell with Airflow - but this discussion I’ll leave for another day).

At iib we use a self-hosted Gitlab instance, recently updated to version 13. Gitlab offers a private Package Registry out of the box. The package registry was there, we had our project using poetry and a test pipeline. Publishing the package would be a piece of cake, right?

Where is this Package Registry after all?

I’m going to start with the most obvious thing, which took me a few minutes to figure out: you have to enable the Package Registry in your project in order to use it. Go to General > Visibility, project features, permissions > Repository > Packages.

Forbidden

As you can imagine, you’ll need the right permissions to upload the package to your own PyPI registry. This can be done using CI_JOB_TOKEN, a variable provided by Gitlab during a pipeline run. Below you can see how we did it:
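What follows is a simplified sketch of that job rather than a verbatim copy; the registry endpoint is Gitlab’s project-level PyPI API, and the image and extra options are illustrative:

    publish:
      stage: publish
      image: python:3.8
      rules:
        # release on tags only
        - if: $CI_COMMIT_TAG
      script:
        - pip install poetry twine
        # bump the version according to the rule set in the CI variables
        - poetry version ${BUMP_RULE}
        - poetry build
        # upload to the project's own package registry, authenticating with the job token
        - >
          TWINE_USERNAME=gitlab-ci-token
          TWINE_PASSWORD=${CI_JOB_TOKEN}
          python -m twine upload
          --repository-url https://gitlab.example.com/api/v4/projects/${CI_PROJECT_ID}/packages/pypi
          dist/*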

The username for the token CI_JOB_TOKEN is gitlab-ci-token. Another important variable provided during the pipeline execution is CI_PROJECT_ID (you can see the number in the repository page too).

Automation FTW

The thing that got me really excited was the automation flow for our releases. With the job shown previously, we only needed to create a new tag and then a new version would be released. I added BUMP_RULE as an environment variable because we can control the bump rule just by changing the repository environment variable. Poetry provides different options (following the SemVer style).

Poetry publish #fail

Note that we used Twine instead of Poetry. Unfortunately it wasn’t possible to use poetry publish due to different errors:

  • RuntimeError: Repository gitlab is not defined (when it is configured in pyproject.toml)
  • UploadError: HTTP Error 404: Not Found (when configuring the repository via CLI)

There are open issues for it, so I hope it will be possible in the future. For now, Twine is doing the trick just fine.

pip install

Since the package is published in the private package registry, we’ll need to adapt the command a little bit.

A trick here: you’ll need the index URL for the Gitlab registry and the extra index URL for the dependencies that are coming from PyPI. In the end the command will look like this:
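With placeholders for the host, credentials and project id (the /simple suffix on the registry URL matters for pip), it ends up along these lines:

    pip install my-package \
      --index-url https://<username>:<personal-access-token>@gitlab.example.com/api/v4/projects/<project-id>/packages/pypi/simple \
      --extra-index-url https://pypi.org/simple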

Without the --extra-index-url you’ll get errors like ERROR: Could not find a version that satisfies the requirement.

Unfortunately, Gitlab’s documentation is missing some details, but you can find their step-by-step guide here.

Hope it helps! See ya next time.



