Senior Software Engineer, Site Reliability Tooling

Filled
December 31, 2025

Job Description

Who is Recruiting from Scratch: Recruiting from Scratch is a specialized talent firm dedicated to helping companies build exceptional teams. We partner closely with our clients to deeply understand their needs, then connect them with top-tier candidates who are not only highly skilled but also the right fit for the company’s culture and vision. Our mission is simple: place the best people in the right roles to drive long-term success for both clients and candidates.
https://www.recruitingfromscratch.com/
Title of Role: Senior Software Engineer, Site Reliability Tooling
Location: United States (Remote, with quarterly onsite sessions in San Mateo, Columbus, or Austin)
Company Stage of Funding: Public / Late-Stage
Office Type: Digital-First (Remote with Quarterly Onsites)
Salary: $163,600 – $226,400 + Bonus + Equity

Company Description

Our client is a leading AI-driven lending marketplace transforming how banks and credit unions evaluate and approve borrowers. Their platform delivers higher approval rates, lower loss rates, and a seamless digital-first experience—enabling more than 80% of applicants to be automatically approved without document uploads.

They operate as a digital-first company with hubs across the U.S., and employees join because they’re motivated by the mission: increasing access to fair, effortless credit by leveraging modern AI and real-time data.

What You Will Do

As a Senior Software Engineer focused on Site Reliability Tooling, you will play a key role in the reliability, resilience, and observability of large-scale production systems. You’ll design and build tools that empower engineering teams to maintain uptime, deploy safely, and understand system performance across complex microservice architectures.

You will:
• Champion SRE principles across engineering and promote a strong culture of service ownership and reliability.
• Build internal tooling from scratch to improve observability, monitoring, alerting, and operational workflows.
• Implement standards to monitor microservices, web apps, mobile apps, machine learning systems, databases, and Kubernetes clusters.
• Improve incident response processes, including on-call workflows, retrospectives, and reliability reporting.
• Automate toil through infrastructure tooling, scripts, and scalable platform services.
• Help define the long-term strategy for reliability, disaster preparedness, and operational risk mitigation.
• Collaborate across multiple engineering groups to deliver enterprise-wide reliability initiatives.

Ideal Background

Minimum Requirements
• 6+ years combined experience in Software Engineering, Site Reliability Engineering, and/or DevOps.
• Strong proficiency in Python, Go, and/or JavaScript/TypeScript.
• Hands-on experience with Infrastructure-as-Code (Terraform, CDK, CloudFormation).
• Proven background building internal tooling and applying strong software engineering fundamentals (architecture, testing, TDD).
• Strong grounding in data structures and algorithms.
• Experience with on-call, incident response, and incident management workflows.
• Experience with modern observability tools such as Datadog, Prometheus, Grafana, or CloudWatch.
• Experience supporting high-scale SaaS systems in microservice cloud environments.
• Ability to work cross-functionally to drive large engineering initiatives.
• Data-driven mindset focused on metrics, reliability, and continuous improvement.

Preferred Qualifications
• Experience with service mesh technologies.
• Full-stack engineering capabilities.
• Background building tooling for observability or monitoring platforms.
• Experience leveraging LLMs / GenAI to improve SRE workflows (ChatOps, auto-remediation, alert summarization, etc.).

Compensation and Benefits
• Base Salary: $163,600 – $226,400
• Bonus: Target bonus included
• Equity: Included
• Comprehensive medical, dental, and vision coverage with HSA contributions
• 401(k) with 100% match up to $4,500 (immediate vesting)
• Employee Stock Purchase Plan
• Life and disability insurance
• Flexible vacation, holidays, sick leave, and safety leave
• Parental, family care, and military leave
• Annual wellness, technology, and ergonomic reimbursements
• Team events, ERGs, volunteer groups
• When onsite: catered lunches, snacks, and drinks
• Quarterly team onsite sessions (travel covered)

Salary Range: $142,000-$196,000 base.

Employment Type: Full-time