Site Reliability Engineer Job at Altimetrik, Mountain View, CA

  • Altimetrik
  • Mountain View, CA

Job Description

Position: SRE Support Engineer (L2/L3)

Experience: 5+ Years

About the Role:

We are seeking a talented and experienced SRE Support Engineer (L2/L3) to join our dynamic team. This role involves providing operational support, troubleshooting, and ensuring the smooth functioning of our systems. The ideal candidate will have strong expertise in Java, Python, DevOps tools, Groovy scripting, AWS Lambda, and AIOps, with a focus on automation and operational excellence.

Key Responsibilities:

L2/L3 Support

  • Provide advanced troubleshooting for production systems and applications.
  • Resolve complex technical issues escalated from L1 support teams.
  • Perform root cause analysis and implement permanent fixes.

Site Reliability Engineering (SRE)

  • Monitor system performance and proactively address potential issues.
  • Enhance system reliability, availability, and scalability through automation.
  • Design and implement robust incident management processes.

Development & Scripting

  • Write and maintain Java and Python code and scripts to support operations.
  • Develop Groovy scripts for CI/CD pipelines and automation.
  • Build tools and scripts for system performance optimization.

Cloud & Infrastructure

  • Design and maintain solutions using AWS Lambda and other AWS services.
  • Troubleshoot cloud-based environments and applications.
  • Optimize cloud infrastructure for cost and performance.

AIOps (Nice to Have)

  • Leverage AIOps tools to predict and prevent operational issues.
  • Implement machine learning models to automate routine tasks and identify anomalies.

Required Skills & Experience:

  • Programming Languages: Strong knowledge of Java and Python.
  • Scripting: Hands-on experience with Groovy for automation and CI/CD.
  • DevOps: Familiarity with tools such as Jenkins, Docker, Kubernetes, Git, and Terraform.
  • AWS Expertise: Strong experience with AWS Lambda, EC2, S3, CloudWatch, and IAM.
  • Troubleshooting: Proficient in diagnosing and resolving complex system issues.
  • AIOps: Experience with AIOps tools (e.g., Dynatrace, AppDynamics, Splunk) is a plus.
  • Soft Skills: Strong problem-solving, communication, and collaboration skills.

Preferred Qualifications:

  • Bachelor’s or Master’s degree in Computer Science, Information Technology, or a related field.
  • Experience in incident management and on-call rotations.
  • Familiarity with Agile and DevOps methodologies.
  • Certifications in AWS, DevOps, or relevant technologies are a plus.

Job Tags

Permanent employment
