Job description
Senior Site Reliability Engineer
We are looking for a talented Site Reliability Engineer (SRE) with a deep interest in distributed systems, cloud computing, and the architecture of large-scale systems. The SRE lead will ensure our InsightIDR services have the ultra-high reliability and uptime necessary to meet our customers’ needs. As an SRE, you will work closely with our engineering team and partner teams throughout Rapid7 to help solve extremely challenging problems at a massive scale.
About the Team
InsightIDR helps identify and address key cybersecurity risks to our customers. We apply machine learning, threat intelligence, and business intelligence to event sources, including desktops, servers, network switches, firewalls, cloud services, directory servers, DHCP servers, and SIEMs, in order to distill hundreds or thousands of daily events per customer into the few real, high-priority threats that need attention. Our systems ingest large amounts of data and need to be highly available and performant at all times.
Some of the technologies we use include: Java, Python, Cassandra, Dynamo, MySQL/RDS, Redis, Elasticsearch, AWS (EC2, S3, CloudFormation, etc.), Terraform, and Jenkins.
At Rapid7, we value intellectual curiosity, problem-solving ability, initiative, and team spirit.
About the Role
In this role, you will:
Establish a new Site Reliability Engineering function within Engineering
Work closely with Engineering, Architecture, Infrastructure, and Product teams to improve the lifecycle of the InsightIDR services, from inception through design, deployment, operations, monitoring, security, upgrades, and maintenance
Support services before they go live through activities such as design, deployment, migration strategy, monitoring, and playbook reviews
Maintain services once they are live by measuring and monitoring availability, latency, and overall system health
Scale systems through automation and driving service and infrastructure improvements
Troubleshoot production issues and liaise with relevant Engineering, product deployment, and platform teams for a resolution
Manage and participate in on-call support, and incident response follow-ups such as post-mortems
Mentor and coach team members
The skills you’ll bring include:
Previous experience in a lead engineering role
5+ years of experience scaling SaaS services and infrastructure
Expert knowledge of developing, scaling, automating, and troubleshooting large-scale systems
Expert knowledge of deployment and monitoring frameworks
Ability to debug and optimize code and automate routine tasks
Advanced understanding of system performance and tuning
Strong knowledge of NoSQL and SQL concepts
Strong knowledge of OOP languages such as Java
Experience with scripting languages such as Shell and Python
Extensive experience with database operation and optimization
Strong knowledge of RESTful architectures
Understanding of Unix/Linux operating systems
Proficiency in AWS services, including EC2, RDS, S3, and streaming data
Systematic problem-solving approach
Excellent communication & influencing skills
Strong technical writing skills
We know that the best ideas and solutions come from multi-dimensional teams that reflect a variety of backgrounds and professional experiences. If you are excited about this role and feel your experience can make an impact, please don’t be shy; apply today.
About Rapid7
Rapid7 is creating a more secure digital future for all by helping organizations strengthen their security programs in the face of accelerating digital transformation. Our portfolio of best-in-class solutions empowers security professionals to manage risk and eliminate threats across the entire threat landscape, from apps to the cloud to traditional infrastructure to the dark web. We foster open source communities and cutting-edge research, using these insights to optimize our products and arm the global security community with the latest in attacker methods. Trusted by more than 10,000 customers worldwide, our industry-leading solutions and services help businesses stay ahead of attackers, ahead of the competition, and future-ready for what’s next.
All qualified applicants will receive consideration for employment without regard to race, color, religion, sex, sexual orientation, gender identity, national origin, disability or protected veteran status.