Navigating the Waters of CI/CD: A Journey Through Continuous Integration in GitLab

August 21, 2024

In the fast-paced world of software development, Continuous Integration (CI) has become a lifeline. It’s the bridge that connects code changes to deployment, ensuring that software remains robust and reliable. This article dives into the evolution of CI processes within a monolithic Python repository, exploring the challenges faced and the solutions implemented along the way.

When I joined EXANTE as a Software Development Engineer in Test (SDET), the CI landscape was a bit like a ship caught in a storm. The pipeline had a single stage for tests, where each service's tests were run as separate jobs. The time taken for these jobs varied wildly—from five minutes to an hour. This inconsistency was a ticking time bomb, waiting to disrupt the workflow.

Our team structure was another layer of complexity. Small groups of testers, often just one to four people, were assigned specific services. This division sometimes left less experienced testers struggling, while seasoned colleagues were too swamped to address architectural concerns. It was clear that we needed a more streamlined approach.

**Docker: The Lifeboat**

To tackle compatibility issues across different operating systems, we decided to encapsulate our testing environment within Docker. This decision was akin to building a lifeboat—offering safety and stability amid the turbulent waters of software dependencies.

Creating a Dockerfile was straightforward, though the resulting 1.3 GB image was a concern; a leaner image would be ideal, but for the moment it sufficed. The Dockerfile bundled the essential components, Python 3.9 and the required libraries, ensuring that our tests ran in a consistent, controlled environment.

We also introduced a mechanism to rebuild the Docker image whenever changes were made to configuration files. This automation ensured that our testing environment was always up-to-date, reducing the manual overhead and minimizing the risk of errors.
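As a rough sketch, a rebuild job along these lines reacts only to changes in the relevant files; the stage name, image tag, and file list (`Dockerfile`, `requirements.txt`) are illustrative rather than our exact setup:

```yaml
build-test-image:
  stage: build
  image: docker:24
  services:
    - docker:24-dind
  rules:
    # Rebuild only when the image inputs change
    - changes:
        - Dockerfile
        - requirements.txt
  script:
    - docker login -u "$CI_REGISTRY_USER" -p "$CI_REGISTRY_PASSWORD" "$CI_REGISTRY"
    - docker build -t "$CI_REGISTRY_IMAGE/tests:latest" .
    - docker push "$CI_REGISTRY_IMAGE/tests:latest"
```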

**Allure TestOps: The Compass**

Next, we integrated Allure TestOps, a platform for managing and analyzing test results. This tool became our compass, guiding us through the fog of test data. It allowed us to visualize results and track metrics effectively.

To send data to Allure, we had to authenticate and initiate a job run. This process required careful orchestration within our CI pipeline. We created a preparatory stage to handle the necessary setup, ensuring that all data was collected and passed seamlessly to the testing jobs.
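In broad strokes, the preparatory job can publish run metadata through a dotenv artifact that every test job picks up; the allurectl usage and the `ALLURE_*` variable names below follow the usual Allure TestOps documentation pattern and are assumptions about the setup, not a verbatim copy of ours:

```yaml
prepare-allure:
  stage: prepare
  image: "$CI_REGISTRY_IMAGE/tests:latest"
  script:
    # Compute launch metadata once and hand it to the test jobs
    - echo "ALLURE_LAUNCH_NAME=${CI_PROJECT_NAME}-${CI_PIPELINE_ID}" > allure.env
    - echo "ALLURE_LAUNCH_TAGS=${CI_COMMIT_REF_NAME}" >> allure.env
  artifacts:
    reports:
      dotenv: allure.env

service-a-tests:
  stage: tests
  needs: ["prepare-allure"]
  image: "$CI_REGISTRY_IMAGE/tests:latest"
  script:
    # allurectl wraps pytest and streams results to Allure TestOps;
    # ALLURE_ENDPOINT, ALLURE_TOKEN and ALLURE_PROJECT_ID come from CI/CD variables
    - allurectl watch -- pytest tests/service_a --alluredir=allure-results
```

The dotenv report is a standard GitLab mechanism for passing variables between jobs, which keeps the authentication and launch setup in one place.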

**Collect Stage: The Safety Net**

One of the critical issues we faced was ensuring that all tests were operational before merging code changes. Too often, code that worked for one service would break another. To address this, we introduced a "collect" stage, which used pytest’s `--collect-only` option. This stage gathered a list of tests without executing them, acting as a safety net to catch potential issues early.
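A minimal sketch of that stage, assuming the Docker image described above and an illustrative `tests/` directory; `--collect-only -q` forces pytest to import every test module and list the tests without running them, so broken imports, missing fixtures, and syntax errors surface immediately:

```yaml
collect:
  stage: collect
  image: "$CI_REGISTRY_IMAGE/tests:latest"
  script:
    # Enumerate all tests without executing them; any import error fails the job
    - pytest --collect-only -q tests/
```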

By implementing this stage, we significantly reduced the time spent fixing trivial errors. It provided immediate feedback, allowing developers to address issues before they escalated.

**Static Code Analysis: The Quality Inspector**

To maintain code quality, we introduced a static analysis stage using linters. This step was crucial in standardizing our coding style and catching common errors. Initially, we set the linter to allow failures, giving the team time to adjust. Over time, we tightened the rules, ensuring that only code meeting our standards could pass through.
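Concretely, a lint job of this shape (with illustrative linters, here flake8 and black) can be marked non-blocking at first and tightened later by dropping `allow_failure`:

```yaml
lint:
  stage: static-analysis
  image: "$CI_REGISTRY_IMAGE/tests:latest"
  # Report failures without blocking merges during the transition period
  allow_failure: true
  script:
    - flake8 tests/
    - black --check tests/
```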

The introduction of pre-commit hooks further enhanced our process. These hooks automatically ran linters before code was committed, ensuring that only clean code made it into the repository.
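For illustration, a `.pre-commit-config.yaml` along these lines wires the same linters into every local commit; the pinned versions are placeholders:

```yaml
# .pre-commit-config.yaml: run the linters locally before each commit
repos:
  - repo: https://github.com/psf/black
    rev: 24.4.2
    hooks:
      - id: black
  - repo: https://github.com/pycqa/flake8
    rev: 7.0.0
    hooks:
      - id: flake8
```

Each developer runs `pre-commit install` once, after which the hooks fire automatically on `git commit`.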

**Smoke Tests: The Early Warning System**

As we refined our CI process, we recognized the need for smoke tests—quick checks to ensure that critical functionalities were intact. Identifying which tests qualified as smoke tests was challenging, given the interconnected nature of our services. However, we managed to tag essential tests and run them in a dedicated stage.
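In outline, the tagged tests can be selected with a pytest marker and run in their own stage ahead of the full suite; the `smoke` marker name and stage layout here are illustrative:

```yaml
smoke:
  stage: smoke
  image: "$CI_REGISTRY_IMAGE/tests:latest"
  script:
    # Run only the tests tagged with the smoke marker;
    # because stages run sequentially, a failure here stops the full test stage
    - pytest -m smoke tests/
```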

This early warning system allowed us to catch critical failures quickly, significantly speeding up our feedback loop. If a smoke test failed, we knew to halt further testing, saving time and resources.

**Health Checks: The Lifeguard**

With the migration of services to Kubernetes, we implemented health checks to monitor service availability. These checks became our lifeguard, ensuring that we didn’t attempt to run tests against services that were down. By placing health checks before the main testing phase, we could quickly identify whether the testing environment was ready.
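A minimal sketch of such a gate, with hypothetical in-cluster URLs exposing a `/health` endpoint; `curl --fail` exits non-zero on any HTTP error, failing the job before a single test is run:

```yaml
healthcheck:
  stage: healthcheck
  image: curlimages/curl:latest
  script:
    # Abort the pipeline early if any dependency of the tests is unavailable
    - curl --fail --max-time 10 http://service-a.staging.svc/health
    - curl --fail --max-time 10 http://service-b.staging.svc/health
```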

This proactive approach reduced the number of false negatives in our testing results, allowing the team to focus on genuine issues rather than infrastructure problems.

**Conclusion: A Steady Course Ahead**

The journey through CI/CD is ongoing. Each step we took—from Docker integration to implementing health checks—has strengthened our process. We’ve transformed a chaotic pipeline into a well-oiled machine, capable of delivering reliable software.

As we continue to navigate these waters, we remain committed to refining our practices. The lessons learned will guide us as we face new challenges, ensuring that our CI/CD process remains robust and responsive to the ever-changing landscape of software development. In this voyage, every wave of change is an opportunity to improve, and every storm is a chance to emerge stronger.