HashiCorp
6 days ago
About the team
HashiCorp Boundary aims to provide customers with seamless, just-in-time remote access to their infrastructure and other web applications, without having to worry about passwords, certificates, or other credentials. Boundary is offered as a cloud platform, and this role will be part of the Boundary Enterprise Enablement team, whose primary focus is scale and reliability to enable hypergrowth among medium and large enterprises.
What you’ll do (responsibilities)
As an engineer on the Boundary Product Reliability team, you will:
- Develop a deep understanding of how customers use Boundary Cloud and enhance their experience through reliability
- Drive service reliability by developing tooling that enables metric visibility using SLIs, SLOs, and SLAs
- Champion incident management processes that directly impact customer experience
- Reduce the operational overhead of the HashiCorp Boundary product and leverage data to understand the largest sources of reliability risk
- Deploy, manage, and monitor Boundary Cloud at large scale
- Predict our future failures and work proactively to mitigate them
- Have a passion for developer productivity to make other engineers' lives better
- Empower engineers to troubleshoot their own issues by developing tools, frameworks, and guardrails for safety
- Advocate and implement reliable design patterns (circuit breakers, graceful degradation, zero-downtime upgrades, etc.)
- Partner with the broader HashiCorp organization to learn from incidents through a blameless postmortem process
- Collaborate across teams to improve our tools based on experience gained from running our own software in production
- Participate in a 24/7 on-call rotation that supports our production services
What you’ll need (basic qualifications)
- 5+ years of experience running production applications at scale: backend applications written in Go, databases, observability, and AWS primitives
- A commitment to quality through maintainable code and comprehensive testing from development to deployment
- Clear communication skills while remaining empathetic and kind
- An eagerness to learn through humility and reflection
- Experience debugging performance bottlenecks for live services and systems
What’s nice to have (preferred qualifications)
- Working knowledge of industry best practices related to information security
- Working knowledge of AWS Aurora or PostgreSQL, Nomad or other orchestration platforms, and Traefik or other load balancing technologies
- Experience conceiving, documenting, and advocating for best practices, or a willingness to do so
#LI-Remote
Individual pay within the range will be determined based on job-related factors such as skills, experience, and education or training.