Senior Site Reliability Engineer

at T. Rowe Price  
Asset Management Services
About the job
London, England
HYBRID  
Open to new applications
Full-Time ~ Permanent

5 job requirements

1 year Ansible experience, used daily (Must Have)
1 year Terraform experience, used daily (Must Have)
3 years Grafana experience, used daily (Must Have)
3 years New Relic experience, used daily (Must Have)
3 years Prometheus experience, used daily (Must Have)

There is a place for you at T. Rowe Price to grow, contribute, learn, and make a difference. We are a premier asset manager focused on delivering global investment management excellence and retirement services that investors can rely on today and in the future. The work we do matters. We invite you to explore the opportunity to join us and grow your career with us.

Senior Site Reliability Engineer

Overview

The Technology Engineering team is looking for an experienced Site Reliability Engineer to join us as we reimagine production application and infrastructure management. The team is responsible for engineering scalable and resilient hybrid cloud solutions (both AWS and on-premises). You will be responsible for creating tooling and software that monitors and improves the reliability of our systems. In this role, you will research problems, evaluate modern technologies, create prototypes, develop observability tooling (integrated processes, automation, and defined standards), and provide SRE consulting on complex projects.

  • Requires specialized, in-depth knowledge and expertise in your own job discipline and in the Amazon Web Services (AWS) platform and/or other cloud-based platforms, plus deep experience in integrating related disciplinary knowledge
  • Works independently, receives minimal guidance
  • Accountable for work of yourself and others; sets standards around which others will operate
  • Proactively identifies problems and can present and implement solutions to these problems

Role summary and job responsibilities

  • Design and implement highly automated systems/services that ensure the availability, reliability, and scalability of infrastructure and applications.
  • Build and maintain monitoring and alerting to provide timely feedback on the performance and health of systems, network, and applications.
  • Design and implement automation tools to reduce manual toil, streamline repetitive tasks, and enhance overall operational efficiency.
  • Design and build Service Level Indicators (SLIs) and related reliability measures, including but not limited to Service Level Objectives (SLOs), error budgets, and burn-rate alerts.
  • Work closely with development teams to embed reliability best practices into the software development process. Provide mentorship and training to cross-functional teams on SRE principles, encouraging a shared responsibility for the reliability of our services.
  • Collaborate with our support, operations, and engineering teams to investigate and troubleshoot complex problems.
  • Observe and monitor systems to maintain insight into their performance, health, availability, and internal state.
  • Understand what to monitor based on the system(s) you are managing, how the monitoring data is stored, and how to use the data to make determinations about future actions.
  • Participate in continuous improvement efforts that span multiple functional domains and inform the generation of new standards.
  • Be a part of an on-call rotation, continuously enhance automation & documentation, and mentor others on the standard methodologies of infrastructure automation to encourage adoption.
  • Overcome differences of opinion and drive team alignment around a specific goal or solution.
  • Hold associates and teams accountable for adhering to practices and policies.

Business knowledge

  • Demonstrates deep knowledge of products/flows within supported businesses
  • Decomposes the most complex problems into discrete work units.
  • Identifies non-obvious relationships and anomalies often overlooked by others.
  • Balances strategic and pragmatic concerns when solving problems.
  • Makes sound decisions with limited facts or resources.
  • Makes decisions that are cognizant of the firm’s broader business strategy.
  • Articulates broader business concerns and/or the regulatory landscape, including key risks and controls (e.g., GDPR, MiFID, SOX).

Requirements

  • Strong experience with monitoring and alerting tools such as Prometheus, Grafana, and New Relic
  • Experience with container orchestration solutions in AWS, such as ECS and Fargate
  • Docker container development experience
  • Experience with scripting languages such as Python, Groovy, PowerShell, Bash, or Perl
  • Skilled in building and maintaining dashboards using tools like Grafana, Prometheus, and StatsD to provide critical insights
  • Experience working with a Service Reliability Engineering team to design SLIs and SLOs for the respective applications
  • Strong experience with AWS cloud infrastructure and container orchestration operating in a GitOps framework
  • A solid core foundation in infrastructure and systems engineering including Unix/Linux compute, networking, storage, and monitoring stacks.
  • Experience using automation tools such as Terraform and Ansible
  • Excellent written and oral communication skills
  • Strong interpersonal skills, adaptable and able to learn quickly
  • Availability for off-hours implementations is required
  • Ability to build positive working relationships with the business contacts, within our IT team, and other IT departments
  • Ability to identify tasks and help develop project plans for medium and large-scale projects

Preferred

  • College degree in computer science or related technical field with 7+ years of systems design, programming, implementation, and integration experience
  • 3+ years of experience within the Amazon Web Services platform
  • AWS and Kubernetes certifications

Diversity, Equity, and Inclusion:

We strive for equity, equality, and opportunity for all associates. When we embrace the power of diversity and create an environment where people can bring their authentic and best selves to work, our firm is stronger, and we create greater value for our clients. Our commitment and inclusive programming aim to lift the experience for each associate and build allies for our global associate community. We know that a sense of belonging is key not only to your success at the firm, but also to your ability to bring your best each day.
