Site Reliability Engineer Job at The Judge Group, Minneapolis, MN


Job Description

Job Title: Site Reliability Engineer

Duration: Direct hire

Location: Hybrid Role - must be able to commit to 3 days/week in our Bloomington office

What you’ll be doing:

  • Collaborate with development and operations teams to design, implement, and maintain observability frameworks that provide deep insights into system performance, particularly for data and ML pipelines.
  • Lead the establishment of Service Level Objectives (SLOs) and Service Level Indicators (SLIs), ensuring they align with business goals and drive continuous performance improvements.
  • Partner with stakeholders to understand system performance requirements and translate them into actionable performance engineering strategies.
  • Proactively identify performance bottlenecks and collaborate with teams to implement solutions that enhance system scalability and reliability.
  • Design and execute performance regression test suites, focusing on data-intensive and ML workloads, to ensure continuous performance optimization.
  • Own the reliability and performance metrics of our systems, driving a culture of performance excellence and proactive issue resolution.
  • Collaborate with subject matter experts to gain a deep understanding of domain-specific performance challenges, particularly in data and ML pipelines.
  • Use tools such as Datadog, Jira, and GitHub to monitor system performance, manage projects, and track issues, with a strong emphasis on performance-related metrics.
  • Define and monitor success metrics, ensuring our systems consistently meet or exceed performance and reliability targets.
  • Actively contribute to the continuous improvement of performance engineering practices across the team, fostering a culture of excellence in observability and system performance.
  • Perform other duties as assigned.

What you’ll bring to us:

  • Bachelor’s degree in Computer Science, Engineering, or a related field.
  • Five years of experience in a site-reliability-focused role responsible for establishing reliability standards in a cloud-native environment.
  • Strong expertise in establishing SLOs/SLIs and building observability frameworks for complex systems.
  • Proficiency with cloud services, particularly AWS, and experience in designing scalable and reliable architectures.
  • Hands-on experience with performance monitoring and observability tools like Datadog.
  • Proficiency in version control systems like Git/GitHub and infrastructure as code tools like Terraform.
