Infinite Blue is a leading global provider of extendable apps for organizational resiliency and low-code development platforms for enterprises and independent software vendors. We are in search of a Site Reliability Engineer (SRE). An SRE is a member of the Infinite Blue DevOps team who is responsible for enabling stable infrastructure for Infinite Blue’s commercial solutions.
The DevOps team is responsible for the uninterrupted functioning of client production environments as well as various lower environments for development, testing, and staging. The SRE will fulfill the critical role of ensuring our systems are healthy, monitored, and designed to scale. The primary responsibility of the SRE will be to work with our product development teams, which are building highly scalable, microservice-based solutions, to containerize, deploy, configure, and maintain those solutions in the various environments. This involves designing and building system infrastructure (infrastructure as code), implementing security policies and using best-in-class security tools, deploying products, and ensuring uninterrupted service in a client’s production environment. This role has a strong focus on automating application infrastructure deployment, operations, and related security, and follows the philosophy that infrastructure is software and should be treated as a software product with its own SDLC. The successful candidate ideally has experience with the infrastructure aspects of software as a service (SaaS). As a member of the SRE team, the candidate will work with development teams to help create the automated pipelines and solutions required for continuous delivery in an Agile plus DevOps culture.
This position requires some weekend and out-of-hours availability for on-call rotation, disaster recovery tests, Change Control, and project work.
This position is located in Collegeville, PA.
Essential Job Functions and Responsibilities
- Install, maintain, upgrade, and improve application and platform development, testing and production systems
- Develop and maintain automation for building infrastructure and running products
- Build automation to ensure continuous deployment
- Build and maintain production environments with scalability, reliability, disaster recovery planning, monitoring, security, and high performance
- Create and maintain thorough technical and procedural documentation and adhere to the change control process to maintain SOC 2 audit compliance
- Design and assist in the setup and maintenance of application monitoring and alerting, and analyze and mitigate security-related issues and threats
- Engage with development teams to ensure best practices are implemented – improve predictability and reliability of software releases, workflows and operating software
- Reduce mean time to recovery (MTTR) by helping to troubleshoot, monitor, alert, and automate recovery
- Work with and support team members across the organization, including engineering, support, and, as needed, other business areas such as sales, marketing, and strategic accounts on infrastructure and security RFPs
- Bachelor’s degree, preferably in computer science, or equivalent, with 5+ years of experience in IT infrastructure, networking, and security
- Solid experience deploying, maintaining, and supporting scalable cloud-based solutions in a production environment
- Strong problem solving and troubleshooting skills
- Ability to work on an on-call basis and provide coverage during non-standard business hours including public/market holidays
- Understand security controls
- Experience in analyzing and mitigating security-related issues and threats
- Experience with Linux, AWS
- Experience scripting in at least one of the following: PowerShell, Bash, Python
- Experience with DevOps environments and containerization (Docker, Kubernetes)
- Experience with Infrastructure as Code
- Experience with WAF, Alert Logic, LDAP
- Experience with cyber security; understanding of security concepts with hands-on experience in implementing security controls and compliance requirements
- Experience with CI/CD processes and tools (TeamCity, GitLab CI)
- Experience with infrastructure monitoring tools (AWS CloudWatch, Datadog)
- Proficient with SSH tunneling and multi-hop configurations
- Understanding of networking and network topologies
- Understanding of microservice architecture
- Familiarity with chaos engineering
- Familiarity with GitOps
- Excellent communication skills; collaborative and personable
- Excellent documentation skills for incidents, architecture diagrams, and runbooks/checklists
- Desire to work in a fast-paced, evolving, growing, dynamic environment
Infinite Blue has a strong orientation towards these five core values. Successful employees will demonstrate these capabilities:
- Grit – courage and resolve to achieve our goals
- Agile – ability to reassess and adapt quickly
- Trust – confidence in our services and each other
- One Team – strong alignment and collaboration across the company
- Respect – all team members add value
- Generous Vacation Package
- Employee benefits offered to full-time employees, including medical, dental, 401(k), etc.
Infinite Blue is an Equal Opportunity Employer.
Infinite Blue provides a comprehensive low-code development platform and enterprise applications for the business continuity and disaster recovery industry. Infinite Blue is trusted by independent software vendors and enterprises across the globe. The Infinite Blue Platform is at the heart of countless business applications running in a wide variety of industries worldwide. The company was started in 2013, has grown by more than 250% over the past three years, and was recently named to the Inc. 5000 list of America’s fastest-growing companies.