Site Reliability Engineer (with Kubernetes and Terraform experience)

Doxa7 Solutions, Inc.

29 days ago

Exp: 0-3 Years

Full time

Taguig, Philippines

Job Description

ROLE SUMMARY

Our client, a rapidly growing company, is looking for a Site Reliability Engineer to support multiple SaaS applications. You will be responsible for the cloud infrastructure, availability, reliability, performance, and security of production applications and systems.

SCHEDULE: 9:00 AM to 6:00 PM Pacific Daylight Time (12:00 AM to 9:00 AM Philippine Standard Time), follows Philippine holidays

POSITION TYPE: Full Time

WORK ARRANGEMENT: Remote

ESSENTIAL FUNCTIONS:

Create, deploy, and maintain production infrastructure within AWS accounts using Infrastructure as Code (IaC) with Terraform
Utilize various AWS services, including EC2, EKS, RDS, Redshift, S3, and IAM
Create, implement, and maintain automated application releases using Bitbucket Pipelines

Create, implement, and maintain application and infrastructure performance monitoring using Datadog or Prometheus/Loki/Grafana

Create, implement, and maintain application and infrastructure availability monitoring using Datadog or Prometheus/Loki/Grafana
Apply security practices and policies to identify and remediate security vulnerabilities
Oversee incident response procedures, including analysis and documentation of incidents to prevent future occurrences

QUALIFICATIONS:

A 4-year college degree in a technical or quantitative science is preferred, or equivalent work experience with evidence of proficiency and achievement in virtual infrastructure management

3+ years of experience in cloud computing and Infrastructure as Code (IaC) (e.g., Terraform) or a related field

Experience with cloud-native tooling (Helm Charts, ArgoCD, HashiCorp Vault, Harbor, Reloader, Grafana, Prometheus, and Loki) is a plus
Experience with cloud-native analytics tools (Elasticsearch, MongoDB, Redshift/Snowflake, and Looker)
Any AWS certification is a big plus

Proficient in Linux system administration and security

Proficient with containerization technologies, especially Kubernetes
Proficient with code versioning tools (e.g., Git, Bitbucket)
Proficient with CI/CD tools (e.g., Bitbucket Pipelines)

Proficient in scripting languages such as Bash and Python

Exposure to OpenTelemetry and distributed tracing
Awareness of recent industry trends related to observability and monitoring
Strong troubleshooting and problem-solving skills, with the ability to quickly diagnose and resolve complex issues