Direct Supply is building the future of senior living technology, helping connect the spectrum of healthcare in order to improve the lives of millions of seniors!

As the Manager of the Site Reliability Engineering (SRE) group, you will keep an eye on systems capacity and performance as we strive to build and run large-scale, massively distributed, fault-tolerant systems. You’ll use your technical experience and expertise to ensure Direct Supply systems are reliable enough to meet customer expectations, and you’ll continually work to improve resiliency and reduce risk. You’ll lead a team and be responsible for products, providing technical leadership to key projects and empowering and developing teams to do the same.

SRE’s culture of diversity, intellectual curiosity, problem-solving, and openness is key to its success. Our organization brings together people with a wide variety of backgrounds, experiences, and perspectives. We encourage them to collaborate, think big, and take risks in a blame-free environment. We promote self-direction to work on meaningful projects, while also striving to create an environment that provides the support and mentorship needed to learn and grow.

What You’ll Do and Impact:

  • Lead a team of Software/Systems Engineers on projects for users and foster a culture of accountability and ownership for improved uptime, disaster recovery, and key performance metrics.
  • Manage end-to-end availability and performance of key services and build automation to prevent problem recurrence. Automate response to all non-exceptional service conditions.
  • Design, write and deliver software to improve the availability, scalability, latency, and efficiency of Direct Supply services.
  • Work closely with engineering leaders, architects, and product managers across the business to proactively expose risks and influence reliability improvements.
  • Lead incident response, diagnosis, and follow-up on system alerts and outages across Direct Supply’s production environment.
  • Manage on-call rotations to ensure 24x7x365 coverage.

What You’ll Need:

  • Bachelor’s degree in Computer Science, Software Engineering, Computer Engineering, similar technical field of study, or equivalent practical experience.
  • Software development experience with one or more of the following languages or platforms: .NET C#, C++, Java, Windows, Linux
  • Prior hands-on experience with software or DevOps engineering (within the last 3 years preferred)
  • Aptitude for automation and streamlining of tasks
  • Experience managing an engineering team on projects with technical deep-dives into code, networking, operating systems, and/or storage.
  • Ability to lead and develop technology talent
  • Demonstrated ability to execute technical assignments while meeting scope, schedule, cost, and quality goals
  • Strong oral and written communication skills
  • Sense of urgency and ownership

Additional Items of Interest:

  • Strong emphasis on SRE as an engineering discipline with a focus on automation
  • Experience with distributed (multi-tiered) systems, algorithms, and database technologies
  • Experience supporting infrastructure and services in *nix and public cloud environments (AWS, Azure, etc.)
  • Experience supporting containerized application technologies such as Docker and Kubernetes
  • Proficiency working with algorithms, data structures, and production troubleshooting.
  • Expertise in problem-solving and analyzing large-scale distributed systems.


