Responsible for working closely with all other teams across the Platform to lead incident management in the Production, Certification, and Development environments and to optimize the performance and security of our infrastructure. Responsible for both maintaining site reliability and deploying services for Business products. Facilitate service monitoring, application upgrades, infrastructure enhancements, and the management of ongoing tasks.
Create processes designed to measure system effectiveness and identify areas for improvement. Stay abreast of new technologies in the field and provide recommendations to organizational management on new solutions. Oversee the selection of orchestration tooling, as well as compliance audits and reporting. May be responsible for identifying, correcting, and enhancing important software tools. Seek ways to enhance systems operations, with a focus on automation and minimizing cost.
24/7/365 Site Operations supporting New Lexis Platform (NLP) Applications (Cloud and On-Premises), including upcoming Services and Projects that will be migrated to NLP.
Primary on call for any New Lexis Platform related incidents (PROD, CERT and DEV Environments)
Service Restoration
Real-time and proactive monitoring of logs and application performance
System Administration and Operations in Production
Incident and Change Management, Service Catalog / Service Task Fulfillment
Responsible for Problem Management: problem identification, prediction, prevention, and detection to help improve the availability and reliability of the application
Status Reporting
Build / Bake / Deployments and Releases in CERT and PROD Environments for New Lexis Platform Applications and Services, covering Minor, Major, and Emergency Releases
Adaptable to a fast-paced, changing environment and to new technologies.
Responsible for installation, maintenance, security, performance and tuning of New Lexis Platform and related software and services.
Ensures close technical interchanges on Platform related issues with SRE, Application Developers, Shared Services and Operations personnel as necessary.
Recommends and aids in the definition of New Lexis Platform strategies, policies, standards, and procedures which are consistent with the Company mission.
Actively participates in, and often leads, team meetings and activities.
Responsible for research, risk assessment, design and validation of emerging and/or improved infrastructure technologies and services related to New Lexis Platform management.
Follow security guidelines for the proper delegation of accounts and privileges.
Participate in Continuous Improvement initiatives using Lean Six Sigma methodologies
Build solid, positive relationships with development teams, peers, colleagues, and vendors.
3Rs - Respond, React, Resolve
Other duties as assigned.
Qualifications
A bachelor's degree (Information Systems, Computer Science/Engineering) or equivalent experience.
Minimum of 2-3 years of experience in the IT industry or a related field
Must have experience supporting large-scale cloud infrastructure (numerous cloud-hosted applications and servers)
Knowledgeable in Cloud concepts and technologies (AWS, Kubernetes and Azure).
Familiar with Cloud application performance monitoring (Splunk, New Relic, Datadog, and the like)
Knowledgeable in Continuous Integration / Continuous Delivery (CI/CD) tools (Jenkins, Artifactory, and the like)
Basic knowledge and understanding of Unix/Linux and Windows Server operating systems
Understanding of basic Java middleware and web server concepts and technologies.
Knowledgeable in network concepts, configuration, and routing.
Knowledge and understanding of database concepts, including basic troubleshooting, high availability, clustering, and disaster recovery, as well as NoSQL databases.
Knowledgeable in the management of virtual machines hosted on VMware ESX.
Knowledgeable in using IT Service Management systems (ServiceNow, Freshservice, and the like)
Basic Unix/Linux, Windows, database, and middleware troubleshooting and analysis skills required.
Solid interpersonal, teamwork, communication (verbal and written), and follow-up skills, together with a proactive approach, required for working with different levels of the hierarchy.
Ability to monitor, define, analyze and resolve issues both effectively and efficiently in a high-pressure production environment.
Candidates located close to the REPH office or satellite offices are preferred, but candidates with a backup power supply (able to power a laptop for 8-10 hours) and an internet connection (20-30 Mbps) will also be considered.