Functions:
- We are seeking a motivated and knowledgeable Cloud Operations Engineer (L1/L2 Support) with a strong foundation in AWS services and Kubernetes management. The ideal candidate will have hands-on experience with AWS components such as EKS, EC2, RDS, IAM, and CloudWatch, along with basic Linux administration skills.
- This role is critical to maintaining the health, performance, and security of our cloud infrastructure, providing first- and second-level support, and ensuring seamless operations.
Key Responsibilities:
AWS Infrastructure Support:
- Monitor and manage AWS services, including EKS, ECS, EC2, RDS, IAM, VPC, IPsec VPN, and CloudWatch, among others.
- Respond to and resolve incidents, service requests, and alerts in a timely manner.
Kubernetes Management:
- Assist in managing EKS clusters, ensuring their availability and performance.
- Perform basic troubleshooting and maintenance tasks on Kubernetes clusters.
- Knowledge of Rancher is an added advantage.
System Monitoring and Incident Response:
- Use monitoring tools to track the health and performance of cloud infrastructure.
- Analyze and respond to system alerts, resolving issues to minimize downtime.
Linux Administration:
- Perform basic Linux system administration tasks, including user management, file permissions, and system updates.
- Troubleshoot and resolve basic Linux-related issues that impact cloud operations.
Support and Maintenance:
- Provide L1/L2 support, addressing tickets and escalating complex issues to higher-level support when necessary.
- Perform regular system maintenance, updates, and patching to ensure security and compliance.
Infrastructure Automation:
- Automate cloud operations using infrastructure-as-code tools such as CloudFormation and Terraform.
- Develop and maintain CI/CD pipelines to streamline deployment processes and enhance efficiency.
Documentation and Reporting:
- Maintain detailed documentation of incidents, troubleshooting steps, and resolutions.
- Generate and present regular reports on system performance, incidents, and resolution timelines.
Collaboration and Communication:
- Work closely with other IT and development teams to coordinate and implement cloud solutions.
- Communicate effectively with stakeholders regarding the status of issues, incidents, and system health.
Qualifications:
- Kubernetes and Terraform experience is mandatory.
- Proficiency in AWS services, including EKS, EC2, RDS, IAM, and CloudWatch.
- Basic knowledge of Kubernetes and experience managing EKS clusters.
- Fundamental understanding of Linux administration and basic troubleshooting.
- 1-3 years of experience in a cloud operations or similar support role.
- Experience with monitoring and incident management tools.
- AWS Certified Cloud Practitioner or AWS Certified Solutions Architect - Associate certification preferred.
- Strong analytical and problem-solving skills.
- Excellent verbal and written communication skills.
- Ability to work in a fast-paced environment and manage multiple tasks simultaneously.
- Willingness to work rotating shifts and take part in an on-call rotation.