  1. Hybrid
  2. București
  3. Individual Contributor
  4. Procurement & Supply Chain Tech

Site Reliability Engineer - M.Store

Job description

Company Description

Passion for food. Hunger for tech. We make METRO digital.   

Today, technology is driving the world. And at METRO.digital, we are driving the technology for METRO, one of the leading international wholesalers specializing in food. From e-commerce to checkout to delivery software, we work on a wide range of products to make each day a success for our customers and colleagues. With passion and ownership, we build the future of wholesale.

How will you make an impact?   

We are seeking a Site Reliability Engineer with strong experience in building and maintaining scalable, resilient systems. The ideal candidate will have hands-on expertise in cloud-native technologies, infrastructure as code, observability, and automation, with a focus on Google Cloud Platform (GCP).

Please note that this role is also open to candidates in Brașov and Cluj.


Job Description

Your Responsibilities:   

  • Ensure the stability and reliability of cloud-native applications deployed on GCP, containerized with Docker and orchestrated via Kubernetes.
  • Define, implement, and monitor SLOs, SLAs, and SLIs to measure system performance and user experience.
  • Automate infrastructure provisioning using Terraform and manage Kubernetes configurations with Kustomize and Helm.
  • Develop and maintain monitoring and alerting systems using Datadog and GCP-native tools.
  • Conduct incident analysis and postmortems to drive continuous improvement.
  • Collaborate with development teams to integrate reliability practices into CI/CD pipelines using GitHub Actions.
  • Manage and troubleshoot database systems, particularly PostgreSQL and Cassandra.
  • Apply networking knowledge and Linux system administration skills to troubleshoot and optimize system connectivity and performance.

Qualifications

  • Educational background in Computer Science, Software Engineering, or equivalent practical experience.
  • 5+ years of experience in Site Reliability Engineering.
  • Proven experience designing and operating elastic, resilient systems in cloud environments.
  • Strong understanding of GCP, Kubernetes, and container orchestration.
  • Proficiency in infrastructure as code and configuration management tools (Terraform, Helm, Kustomize).
  • Experience with monitoring and observability tools (Datadog, GCP Monitoring).
  • Solid scripting skills in Bash and familiarity with automation frameworks.
  • Experience with CI/CD pipelines, especially using GitHub Actions.
  • Familiarity with networking fundamentals and troubleshooting.
  • Strong coding skills and ability to develop reliability-focused tooling.
  • Strong problem-solving skills and a process-oriented mindset.
  • Ability to work independently and collaboratively in a fast-paced environment.
  • Passion for clean code, automation, and continuous improvement.
  • Experience working within Agile/Scrum development teams.
  • Fluent English (written and spoken).
  • Nice to have: availability for on-call duty.

Additional Information

Does this resonate with you? Apply now!

What do we offer at METRO.digital?

  • Flexible and remote work: create your own schedule!   

Flexibility defines the way we work and interact with each other. At METRO.digital, you can work remotely and adapt your working hours in a very flexible way.

  • People development: when you grow, so do we!

We want you to become the best version of yourself, with individual and company-wide programs and trainings for people development. They focus on areas such as professional development, leadership, and appreciation. It's time to upskill your career.

  • Support with individual solutions: we care about people!

Life is full of surprises and challenges, and we want to support you whenever YOU need it, at an individual level and during every stage of your life.

Want to know more about all our benefits? Discover more here.  

Let's connect soon. Apply for the role now!

Position grade within our career framework: Site Reliability Engineer T4 (Md8).

  1. Full Time
  2. Hybrid
  3. București
  4. Individual Contributor
  5. Procurement & Supply Chain Tech
