Site Reliability Engineer

Infinite Blue is a leading global provider of extensible apps for organizational resiliency and low-code development platforms for enterprises and independent software vendors. We are looking for a Site Reliability Engineer (SRE). The SRE is a member of the Infinite Blue DevOps team and is responsible for maintaining and supporting Infinite Blue’s Business Continuity solution (BC in the Cloud) and low-code platform (Infinite Blue Platform) in a multi-cloud environment.

The DevOps team is responsible for the uninterrupted operation of client production environments, as well as various lower environments for development, testing, and staging. The SRE will fulfill the critical role of ensuring that our systems are healthy, monitored, and designed to scale. The SRE’s primary responsibility will be to work with our product development teams, who are building highly scalable, microservice-based solutions, to containerize, deploy, configure, and maintain those solutions across environments. This involves designing and building system infrastructure (infrastructure as code), implementing security policies, deploying products, and ensuring uninterrupted service in client production environments. The role has a strong focus on automating application infrastructure deployment and operations, following the philosophy that infrastructure is software and should be treated as a software product with its own SDLC. Ideally, the successful candidate has experience with the infrastructure side of software as a service (SaaS). As a member of the SRE team, the candidate will work with development teams to build the automated pipelines and solutions required for continuous delivery in an Agile and DevOps culture.

This position requires some weekend and after-hours availability for on-call rotation, disaster recovery tests, change control, and project work.

This position is located in Collegeville, PA.

Responsibilities
  • Install, maintain, upgrade, and improve application and platform development, testing, and production systems
  • Develop and maintain automation for building infrastructure and running products
  • Build automation to ensure continuous deployment
  • Build and maintain production environments with a focus on scalability, reliability, disaster recovery planning, monitoring, security, and high performance
  • Create and maintain thorough technical and procedural documentation, and adhere to the change control process to maintain SOC 2 compliance
  • Design and assist in the setup and maintenance of application monitoring and alerting
  • Engage with development teams to ensure best practices are implemented, improving the predictability and reliability of software releases, workflows, and software in operation

Requirements
  • Bachelor’s degree, preferably in computer science, or equivalent, with 5+ years’ experience in IT infrastructure, networking, and security
  • Solid experience deploying, maintaining, and supporting scalable cloud-based solutions in a production environment
  • Strong problem solving and troubleshooting skills
  • Ability to work on an on-call basis and provide coverage outside standard business hours, including public/market holidays

Skills
  • Experience with Linux
  • Experience with AWS and Azure; stronger expertise in AWS preferred
  • Experience scripting in at least one of the following: PowerShell, Bash, Python
  • Experience with DevOps environments and containerization (Docker, Kubernetes)
  • Experience with Infrastructure as Code
  • Experience with CI/CD processes and tools (e.g., TeamCity, GitLab CI)
  • Experience with infrastructure monitoring tools (e.g., AWS CloudWatch)
  • Proficient with SSH tunneling and multi-hop configurations
  • Understanding of networking and network topologies
  • Understanding of microservice architecture
  • Familiarity with chaos engineering
  • Familiarity with GitOps
  • Excellent communication skills; collaborative and personable
  • Excellent documentation skills for incidents, architecture diagrams, and runbooks/checklists
  • Understanding of general InfoSec principles (a plus, but not required)
  • Linux server administration experience (a plus, but not required)
  • Desire to work in a fast-paced, evolving, growing, dynamic environment

Core Values

Infinite Blue has a strong orientation toward these five core values. Successful employees will demonstrate these capabilities:

  • Grit – courage and resolve to achieve our goals
  • Agile – ability to reassess and adapt quickly
  • Trust – confidence in our services and each other
  • One Team – strong alignment and collaboration across the company
  • Respect – all team members add value

Company Perks
  • Generous vacation package
  • Employee benefits for full-time employees, including medical, dental, 401(k), and more

Infinite Blue is an Equal Opportunity Employer. 

About Company

Infinite Blue provides a comprehensive low-code development platform and enterprise applications for the business continuity and disaster recovery industry. Infinite Blue is trusted by independent software vendors and enterprises across the globe, and Infinite Blue Platform is at the heart of countless business applications running in a wide variety of industries worldwide. The company was founded in 2013, has grown more than 250% over the past three years, and was recently named to the Inc. 5000 list of America’s fastest-growing companies.