The German Federal Employment Agency is accelerating the digitization of its service offerings to provide even better services to customers in a changing world of work. The agency relies on SUSE solutions for both the high-availability operation of its existing IT applications and the agile deployment of new container-based services. This strategy paid off, especially during the COVID-19 crisis.
Five thousand systems with SUSE Linux Enterprise Server
As an internal service provider, the agency’s IT System House operates one of the largest IT infrastructures in Germany. More than 10,000 servers and over 120 specialized procedures run the agency’s diverse tasks across three data centers designed for redundancy. The majority of these applications were developed in-house — there is simply no off-the-shelf software for the agency’s complex and very specific processes. The agency’s IT infrastructure also includes more than 170,000 networked end devices for employees and visitors at advice and information centers.
“Over the past 10 years, our IT landscape has changed and evolved incredibly,” reports Frank Bayer of the agency’s IT System House. “We set the course early on to digitize even more workflows and offer better service for our customers. Open source technology played a central role in this from the very beginning.”
An important step in this IT modernization process was the migration of mission-critical applications from Solaris, HP-UX and Windows to SUSE Linux Enterprise Server (SLES). In 2013, the agency’s IT System House migrated the first Oracle databases and application servers to the SUSE platform. This enabled the IT organization to replace costly proprietary server architectures with standardized x86 systems, significantly reducing the cost of purchasing and maintaining server hardware. The cost of running database servers alone was reduced by more than 80%.
Of the agency’s 10,000 servers, about 5,000 are now running SLES. “The platform has performed extremely well over the past few years,” says Bayer. “SLES is a strategic operating system for us and provides us with a solid foundation for delivering highly available IT services at a low cost of ownership. Our good experiences have also strengthened our resolve to continue to focus specifically on open source technology and to combine this with professional vendor support.”
Introducing the Federal Employment Agency
With skill shortages, demographic changes and digitization, the world of work in Germany is undergoing fundamental change. The Bundesagentur für Arbeit (Germany’s Federal Employment Agency) plays a central role in meeting these challenges. Headquartered in Nuremberg, the agency and its 95,000 employees accompany people through all stages of their careers, helping them find the right jobs and training. At the same time, the agency provides employers with comprehensive support in training and recruiting skilled workers.
Every day, employees at the agency’s 800 offices and job centers conduct around 14,000 consultations and make 55,000 placement proposals. In addition, there are 95,000 customer telephone calls, and discussions take place with around 15,000 visitors in career information centers.
Even in normal labor market conditions, the agency does an enormous amount of work for its customers. However, what the agency’s employees faced in the spring of 2020 dwarfed anything that had come before. After the outbreak of the COVID-19 pandemic, the agency was struggling with the biggest crisis for the German labor market since World War II. All efforts were now directed at preserving as many jobs as possible and securing the livelihoods of people and companies.
The most important labor market policy instrument in this regard was short-time work. While, in April 2019, just over 2,400 companies received short-time working benefits for around 50,000 employees, this number exploded a year later as a result of the COVID-19 crisis. In April 2020, more than 610,000 companies in Germany applied for short-time benefits for over 6 million employees. Processing hundreds of thousands of applications in just a few weeks and making all the payments on time was an incredible challenge for the agency.
To complicate things further, the agency was also affected by the pandemic. More than half of the 95,000 employees started working from home at short notice, while still having to be available for customers and companies. At peak times, the agency recorded more than 1 million call attempts on a single day, and many millions of email inquiries had to be answered as quickly as possible during this time, too. Being able to continue working under these conditions was only possible because the agency has an IT infrastructure in the background that employees can rely on at all times.
“Especially in the exceptional situation of the pandemic, it became apparent that we had made many correct decisions regarding our IT in the past years,” says Frank Bayer, senior architect for operating systems and container services at the agency’s IT System House. “On the one hand, we were able to ensure stable operations despite the enormous rush — and on the other hand, we were also able to respond quickly to new requirements.”
More agility with container technology
While the agency’s main focus until 2016 was on standardizing its IT infrastructure and making operations more efficient, a new goal subsequently came into focus: increasing IT agility. “We wanted to shorten project runtimes, accelerate innovations and respond more quickly to new requirements from our customers,” reports Bayer.
To increase the pace of development and deployment of new digital services, the IT System House relied on agile methods and container technology. Monolithic applications were to be replaced by flexibly deployable microservices. The new application architecture was also intended to drastically shorten update cycles: new releases and features were to be published every 14 days rather than three to four times a year.
To get started with agile software development, those responsible chose a key project right away — the relaunch of the agency’s website and the digital services integrated there. “Our website is now the first point of contact for many of our customers,” emphasizes Bayer. “Younger people in particular don’t want to visit their local employment agency to make an application but instead want to complete as many formalities as possible online. The new Online Access Act will also require us to make a large part of our services available digitally on the web in the future.”
As part of the web relaunch, some 65 digital services were redeveloped as containerized applications. The current website brings these together on a unified and intuitive interface. No matter what topic a visitor is interested in, it takes no more than three clicks to land on a desired service.
To deploy the containers and manage the cluster infrastructure, the agency’s IT System House initially used Apache Mesos and Mesosphere DC/OS. “In 2016, this was a state-of-the-art solution that could meet our requirements,” says Bayer. As a result, the software developers also began modernizing business applications for internal processes and deploying them on the platform. This included, for example, the agency’s placement software, which is used by tens of thousands of users every day. Here, too, the goal was to achieve faster time-to-market for new features and to be able to scale as easily as possible as the load increased.
From Mesos to Kubernetes and Rancher Prime
In 2019, however, the agency’s IT System House undertook a reassessment of its container platform. The market had evolved, and Kubernetes was now established as the de facto standard for container orchestration and management. “There were now also higher requirements in the security area that we could not cover with Mesos by default — for example, in micro segmentation,” explains Bayer.
The agency is responsible for processing highly sensitive social data and has therefore had its information security certified in accordance with ISO 27001 / IT-Grundschutz. As a critical infrastructure operator (CRITIS), the agency also now undergoes a CRITIS audit every year. The IT System House also had to implement the stricter security requirements when operating the containerized applications and comprehensively protect the services from uncontrolled access.
Against this background, the team decided to make a strategic switch to Kubernetes and a suitable management platform. Together with the development department and security officers, the operations team conducted a market survey and ultimately tested six products in detail in proof-of-concept installations. In addition to functionality, cost and market relevance, the team evaluated data protection, security and complexity.
The final choice was Rancher Prime. “From our point of view, Rancher Prime is clearly the most advanced and comprehensive management tool for managing multiple Kubernetes clusters, especially in an environment with high security requirements,” summarizes Bayer. “In addition, the solution was able to convince us economically. The subscription fees for Rancher Prime are about 60% lower than the costs incurred for the previous solution with Mesos.”
During the conceptual design, implementation and internal knowledge transfer, B1 Systems and Fujitsu supported the agency — this ensured a smooth project flow right from the start.
Deploying the Rancher Prime solution on SLES was very easy, and the first clusters were up and running in a very short time. “It was really impressive how quickly we were able to deploy the solution,” says Bayer. “But the low barrier to entry should not obscure the fact that Rancher Prime is a very powerful tool that provides us with a complete technology stack for Kubernetes management.”
Managing a multicluster architecture securely and efficiently
In the agency’s large and complex container landscape, Rancher Prime was able to play to its strengths right from the start. The solution opened up entirely new possibilities for the IT System House in setting up and managing a multicluster architecture.
An important aspect is the stronger separation of the clusters. Instead of setting up a few large clusters with several thousand nodes, the individual specialist domains are now each provided with their own clusters for their processes and the underlying microservices. If problems occur on one cluster, only a few procedures are affected; all other procedures simply continue to run.
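The effect of this separation can be illustrated with a minimal Python sketch. The cluster and procedure names below are hypothetical and are not taken from the agency's environment; the sketch only shows how a per-domain placement limits which procedures a single cluster failure can touch.

```python
# Hypothetical placement: each procedure runs on the cluster of its specialist domain.
PROCEDURE_TO_CLUSTER = {
    "placement-search": "cluster-placement",
    "placement-matching": "cluster-placement",
    "benefits-application": "cluster-benefits",
    "website-chatbot": "cluster-web",
}


def affected_procedures(failed_cluster, placement=PROCEDURE_TO_CLUSTER):
    """Return the procedures that run on the failed cluster, sorted by name."""
    return sorted(p for p, c in placement.items() if c == failed_cluster)


def unaffected_procedures(failed_cluster, placement=PROCEDURE_TO_CLUSTER):
    """Return the procedures that keep running when one cluster fails."""
    return sorted(p for p, c in placement.items() if c != failed_cluster)


if __name__ == "__main__":
    print("affected:", affected_procedures("cluster-placement"))
    print("still running:", unaffected_procedures("cluster-placement"))
```

With one shared cluster, every entry in the mapping would point to the same cluster name, and a single failure would affect all procedures at once.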
Via Rancher Prime, the agency’s IT System House can centrally monitor and manage all clusters. This reduces the operational effort considerably, by up to 70% according to Bayer: “For example, we can apply patches automatically to all clusters in our environment. Previously, we would have had to touch and update each cluster individually to do this.”
With Rancher Prime’s integrated monitoring, the operations team always has an eye on all clusters. When problems occur, they can use Grafana dashboards to immediately pinpoint the error and begin troubleshooting. Is it the node? Or the application? Or was the load too high in the meantime? Rancher Prime provides the operations team with all the relevant information, greatly simplifying troubleshooting.
User authentication and access control can also be managed centrally across all clusters with Rancher Prime. “We used to have to implement separate user management rules for each cluster,” reports Bayer. “Today, we can very easily apply consistent user access policies to all clusters. Rancher Prime accesses our Active Directory and existing user roles directly for this purpose.”
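The idea of one central policy applied to every cluster can be sketched in a few lines of Python. This is an illustration only, not Rancher Prime's API: the directory group names, role names and cluster names are all assumptions made for the example.

```python
# Hypothetical directory groups mapped to hypothetical roles (one central policy).
GROUP_TO_ROLE = {
    "ad-platform-ops": "cluster-admin",
    "ad-developers": "project-member",
    "ad-auditors": "read-only",
}

# Hypothetical cluster names.
CLUSTERS = ["cluster-placement", "cluster-benefits", "cluster-web"]


def build_bindings(group_to_role, clusters):
    """Expand the central policy into one (cluster, group, role) binding per cluster."""
    return [
        (cluster, group, role)
        for cluster in clusters
        for group, role in sorted(group_to_role.items())
    ]


def roles_for(user_groups, cluster, bindings):
    """Return the sorted roles a user holds on a cluster via group membership."""
    return sorted(
        {role for c, group, role in bindings if c == cluster and group in user_groups}
    )


if __name__ == "__main__":
    bindings = build_bindings(GROUP_TO_ROLE, CLUSTERS)
    print(roles_for({"ad-developers"}, "cluster-web", bindings))
```

Because the bindings are generated from a single mapping, changing a role in one place changes it consistently on every cluster, which is the property the quote describes.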
Finally, it is crucial that the new environment fully meets the agency’s increased security requirements. The Istio-based service mesh integrated in Rancher Prime plays a key role here. Developers can use it to specify at a granular level that services from different clients may only communicate with each other after approval. In addition, the service mesh provides further options for controlling requests to microservices. For example, specific browsers or even specific users can be directed to specific containers based on rules.
“This helps us, for example, to implement pilot projects for selected departments or locations,” says Bayer. “We can test innovative services very easily, without taking risks with security and stability.”
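The rule-based routing described above can be sketched as follows. This is a simplified Python model of the matching logic, not Istio configuration and not the agency's actual rules: the `RouteRule` type, the `X-User` header, the user ID and the target names `pilot`, `canary` and `stable` are all hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class RouteRule:
    """Hypothetical rule: route to `target` when every header condition matches."""
    target: str
    header_equals: dict = field(default_factory=dict)    # header -> exact value
    header_contains: dict = field(default_factory=dict)  # header -> required substring


def matches(rule, headers):
    """Check all conditions of a rule against case-insensitive header names."""
    normalized = {name.lower(): value for name, value in headers.items()}
    for name, expected in rule.header_equals.items():
        if normalized.get(name.lower()) != expected:
            return False
    for name, fragment in rule.header_contains.items():
        if fragment not in normalized.get(name.lower(), ""):
            return False
    return True


def route(headers, rules, default="stable"):
    """Return the target of the first matching rule, or the default target."""
    for rule in rules:
        if matches(rule, headers):
            return rule.target
    return default


# Rules are evaluated in order: a named pilot user first, then a browser match.
RULES = [
    RouteRule(target="pilot", header_equals={"X-User": "pilot-user-01"}),
    RouteRule(target="canary", header_contains={"User-Agent": "Firefox"}),
]

if __name__ == "__main__":
    print(route({"X-User": "pilot-user-01"}, RULES))
    print(route({"User-Agent": "Mozilla/5.0 Firefox/120.0"}, RULES))
    print(route({"User-Agent": "Mozilla/5.0 Chrome/120.0"}, RULES))
```

Requests that match no rule fall through to the default target, so a pilot for selected users or browsers leaves everyone else on the regular version of the service.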
Implementing innovations even faster
The agency completed migrating containerized applications to the new platform at the end of 2021. Around 600 services with 24,000 containers are now running on Rancher Prime and Kubernetes. This includes newly developed applications, such as chatbot tools that help the agency’s website users search for information and apply for benefits.
“Even though the migration has only just been completed, it is already clear that we have chosen the right platform for the future,” summarizes Bayer. “Rancher Prime and SLES are the perfect combination for our requirements. From the operating system to container management, we now get support for the entire technology stack from a single source. SUSE puts us in an excellent position to take the next steps on our digital transformation journey.”