Job description
Senior DevOps/SRE
We are looking for a talented Site Reliability Engineer (SRE) with a deep interest in distributed systems, cloud computing and the architecture of large-scale systems. The SRE will ensure our log management services have the reliability and uptime our users need. You will consult and collaborate with the different software development teams to understand their requirements, and you will design, build and deploy the infrastructure to support them.
About the Team
Our Boston Engineering group provides log management services, such as search, alerting and data visualization, to security professionals. Our systems ingest large amounts of data and must be highly available and performant at all times.
Technologies we use include:
Java, Python, Terraform, Jenkins, Artifactory, Chef, Puppet, Ansible, ZooKeeper, Docker, AWS (EC2, S3, CloudFormation, etc.), Cassandra, PostgreSQL, Kafka, Datadog, PagerDuty
About the Role
You will work closely with engineering, architecture, infrastructure and product teams to improve the lifecycle of the Log Management services, from inception and design through deployment, operations, monitoring, security, upgrades and maintenance.
In this role, you will:
Support services before they go live through activities such as design, deployment, migration strategy, monitoring, and playbook reviews
Maintain services once they are live by measuring and monitoring availability, latency, and overall system health
Scale systems through automation and by driving service and infrastructure improvements
Troubleshoot production issues and liaise with the relevant Engineering or Infrastructure team to reach a resolution
Participate in on-call support and incident response follow-ups such as post-mortems
The skills you’ll bring include:
5+ years of experience in developing, scaling, deploying and troubleshooting large-scale high-performance systems and infrastructure
Excellent knowledge of AWS services, including EC2, RDS, VPC, networking, S3, MSK, etc.
Experience implementing SRE best practices, including SLIs/SLOs, automation, observability, etc.
Experience with object-oriented programming languages such as Java
Understanding of Unix/Linux operating systems
Excellent communication & influencing skills
We know that the best ideas and solutions come from multi-dimensional teams that reflect a variety of backgrounds and professional experiences. If you are excited about this role and feel your experience can make an impact, please don’t be shy: apply today.
About Rapid7
Rapid7 (NASDAQ: RPD) helps organizations across the globe protect what matters most so innovation can thrive in an increasingly connected world. Our comprehensive technology, services, and community-focused research simplify the complex for security teams, helping them reduce vulnerabilities, monitor for malicious behavior, be in 10 places at once, and shut down attacks. We’re on a mission to make security solutions easier to use and access so we can bring safety and resilience to more people.
With more than 10,000 customers across 140+ countries, Rapid7 is a leader in cybersecurity that has earned numerous industry accolades and recognition for our technology and culture.
All qualified applicants will receive consideration for employment without regard to race, color, religion, sex, sexual orientation, gender identity, national origin, disability or protected veteran status.