Thursday, November 30, 2023

Principal Site Reliability Engineer (Platforms and Systems Specialist)

 

We are looking for a Principal Site Reliability Engineer (Platforms and Systems Specialist).

For further details, please email your updated profile to aravinthu@tsmspl.com

Location: Remote/Hybrid


General Summary: The Cloud Operations team provides 24x7x365 support for all Company SaaS & Hosting customers globally. This business unit is responsible for the day-to-day management and support of the cloud operations environment, including the uptime, performance, and high availability of all customer-supporting systems inside the SaaS & Hosted environments. The SaaS & Hosted ecosystem comprises multi-tiered applications, microservice architectures, containers, and virtual servers, as well as large and complex multi-terabyte SQL database systems. The SRE Platforms & Systems Specialist will focus on the orchestration and automation of infrastructure and deployments that support the lifecycle management processes for Company hosted environments in both public and private cloud settings. The ultimate objective is to minimize toil through automated solutions for day-to-day maintenance, upkeep, and operations control tasks. This frees the engineer to focus on improving the performance of the existing platform and to contribute to the design and architecture of new infrastructure and software inside the SaaS ecosystem. The candidate should have the skill sets of both a principal systems engineer and a junior to mid-level software developer, including a deep understanding of Windows, Linux, AWS, and other cloud platforms.

 

Key Responsibilities

• Develop orchestration and automation for Active Directory, task management systems, secure file transfer systems and other common cloud operations platforms in support of the Company cloud operations SaaS & Hosted environments.

• Co-develop the automation and orchestration framework from the ground up with other SREs, including establishing design patterns for the CMDB, config management, password management, and other key integrations.

• Implement and maintain CI/CD pipelines for the automation and orchestration of the SaaS & Hosted cloud operations environments.

• Create automation and orchestration for core datacenter cloud operations services.

• Continuously develop self-healing systems automation to reduce toil.

• Participate in a rotation providing incident and request handling support, identifying opportunities where automation or rearchitecting of solutions can improve overall outcomes and reduce toil.

• Serve as technical lead for Active Directory and other central platform services on major projects inside hybrid cloud environments.

• Train SRE team members, project engineers, technical support staff, and application development staff to make better use of AD and other managed platforms.

Professional Skills & Abilities

• Desire and ability to thrive in a fast-paced, highly demanding, dynamic business and cloud operations environment.

• Analytical acumen and a solution-oriented mindset: the ability to probe for understanding and make appropriate decisions that address the nuances of technical and business challenges and achieve the targeted outcome.

• Strong customer service orientation

• Excellent communication skills and experience driving cross-department initiatives to achieve organizational objectives and meet customer needs

• Strong communication, presentation, business, and technical writing skills

• The ability to provide excellent customer service as well as manage and build strong relationships both internally and externally

• Strong interest in further developing and integrating operations with technology in ways that create business value

• Awareness of emerging issues, including regulations, industry practices and technology

• Experience with Kubernetes and container administration is a plus.

 

Technical Skills & Experience

• 15+ years of experience in job specific skills.

• 8+ years of experience writing automation scripts in Bash, Python, or PowerShell to solve technical and business problems in IT operations.

• 8+ years of experience with Active Directory, DNS, secure file transfer, OS patching, and various other platform services.

• 4+ years of experience orchestrating automation for cloud operations or managed services environments, building runbooks that align with work streams and value streams.

• 3+ years of direct involvement with datacenter buildouts and/or disaster recovery of core platform systems such as Active Directory and the other services mentioned above.

• 1+ year of experience automating against AWS APIs for system builds, backups, and other system management interactions.

• Degree in Computer Science or equivalent experience.
