Founding Site Reliability Engineer

5+ Years • $150,000 - $300,000 per year
DevOps

Job Description

Reducto helps AI teams ingest real-world enterprise data with state-of-the-art accuracy. As the first dedicated SRE, you will architect and scale resilient systems for AI and ML workloads, automate cloud infrastructure, and implement monitoring and incident response. This role demands technical leadership, hands-on systems engineering, and collaboration with founders and product teams to ensure reliability and rapid product delivery.
Good To Have:
  • Prior experience founding a company or building products/infrastructure in early-stage environments.
  • Excited about automating incident management with LLMs/AI.
  • Driven, ambitious, and invested in technical excellence and collaboration.
  • Keep up with the latest trends in cloud, observability, and SRE best practices.
  • Passionate about open source, with contributions to reliability communities.
  • Built or optimized monitoring, incident response, or high-performance computing for AI/ML, fintech, or enterprise clients.
Must Have:
  • Design, build, and maintain highly available, scalable infrastructure for AI/ML workloads.
  • Implement robust monitoring, alerting, and observability systems.
  • Debug, optimize, and automate infrastructure for fast iteration and deployment.
  • Proactively identify, investigate, and resolve incidents.
  • Collaborate with engineers, ML specialists, and founders on strategy.
  • Have 5+ years of experience in production-grade infrastructure and reliability.
  • Be comfortable with Python or similar languages.
  • Be exceptional at cloud platforms, container orchestration (Kubernetes), networking, and storage.
  • Build tools to diagnose and address reliability problems.
  • Bring a quantitative, hands-on approach to system operations and automation.
Perks:
  • Unlimited PTO
  • Free daily lunch at the office
  • Reimbursed transportation costs
  • Generous health insurance (medical, dental, vision)
  • Health and Wellness Budget ($150/month reimbursement)
  • Parental Leave

About Reducto

Reducto helps AI teams ingest real-world enterprise data with state-of-the-art accuracy.

The vast majority of enterprise data — from financial statements to health records — is locked in unstructured file formats like PDFs and spreadsheets. We train vision models to read those documents the way a human would, and make it possible to build products, train models, and automate processes at scale.

We’ve grown quickly, increasing revenue 7x year over year, and now work with hundreds of companies, from leading AI teams (Harvey, Vanta, Scale) to enterprises (FAANG, a top-3 trading firm).

We've raised over $100M from world-class investors like a16z, Benchmark, and First Round Capital, and are hiring a founding Site Reliability Engineer.

The Opportunity

As the first dedicated SRE at Reducto, you will influence every aspect of our infrastructure from the ground up. You can expect to architect and scale resilient systems for AI and ML workloads, automate cloud infrastructure, and implement monitoring and incident response practices that set the standard for reliability. This role demands technical leadership, hands-on systems engineering, and strong collaboration with our founders and product teams as we build a company around reliability, rapid iteration, and high-impact product delivery.

The core work will include:

  • Designing, building, and maintaining highly available, scalable infrastructure to support intensive AI/ML workloads and real-time model deployments.
  • Implementing robust monitoring, alerting, and observability systems to ensure system health, performance, and uptime across cloud and on-prem environments.
  • Debugging, optimizing, and automating infrastructure for fast iteration and rapid deployment cycles, focusing on both reliability and developer velocity.
  • Proactively identifying, investigating, and resolving incidents to minimize downtime and maintain world-class service levels for enterprise customers.
  • Collaborating closely with engineers, ML specialists, and founders to shape product, infrastructure, and security strategies.

We would love to meet you if you:

  • Are your own worst critic—have an extremely high bar for quality and always aim for robust solutions rather than quick fixes.
  • Have 5+ years of hands-on experience in building or supporting production-grade infrastructure and reliability processes for high-throughput systems.
  • Are comfortable with Python or similar languages, and exceptional at working across cloud platforms, container orchestration (e.g., Kubernetes), networking, and storage technologies.
  • Build your own tools on the fly to diagnose, experiment, and address reliability problems—whether it's an internal dashboard or an automated remediation workflow.
  • Bring a quantitative, hands-on approach to system operations, automation, and continuous improvement.

Bonus points if you:

  • Have prior experience founding a company or building products/infrastructure in early-stage, high-growth environments.
  • Are excited about automating incident management processes with LLMs/AI.
  • Are driven, ambitious, and deeply care about both technical excellence and collaborative problem-solving.
  • Keep up with the latest trends in cloud, observability, and SRE best practices.
  • Are passionate about open-source and have contributed tools or automation to reliability communities.
  • Have built or optimized monitoring, incident response, or high-performance computing systems for demanding AI/ML, fintech, or enterprise clients.

This is an in-person role at our office in SF. We’re an early-stage company, which means that the role requires working hard and moving quickly. Please only apply if that excites you.

About Reducto

Nearly 80% of enterprise data is in unstructured formats like PDFs

PDFs are the status quo for enterprise knowledge in nearly every industry. Insurance claims, financial statements, invoices, and health records are all stored in a structure that’s simply impractical for use in digital workflows. This isn’t an inconvenience—it’s a critical bottleneck that leads to dozens of wasted hours every week.

Traditional approaches fail at reliably extracting information from complex PDFs

OCR and even more sophisticated ML approaches work for simple text documents but are unreliable for anything more complex. Text from different columns is jumbled together, figures are ignored, and tables are a nightmare to get right. Overcoming this usually requires a large engineering effort dedicated to building specialized pipelines for every document type you work with.

Reducto breaks document layouts into subsections and then contextually parses each depending on the type of content. This is made possible by a combination of vision models, LLMs, and a suite of heuristics we built over time. Put simply, we can help you:

  • Accurately extract text and tables even with nonstandard layouts
  • Automatically convert graphs to tabular data and summarize images in documents
  • Extract important fields from complex forms with simple, natural language instructions
  • Build powerful retrieval pipelines using Reducto’s document metadata
  • Intelligently chunk information using the document’s layout data
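
To make the idea of parsing each subsection by content type and chunking along layout boundaries concrete, here is a minimal sketch in Python. It is an illustration under stated assumptions, not Reducto's implementation or API: the `Block` type, the `kind` labels, and the functions `parse_block` and `chunk_by_layout` are all hypothetical names, and the simple string handling stands in for the vision models, LLMs, and heuristics mentioned above.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Block:
    """One layout subsection of a page (hypothetical structure)."""
    kind: str     # assumed labels: "text", "table", or "figure"
    content: str  # raw extracted content for this subsection


def parse_block(block: Block) -> str:
    """Parse a block according to its content type (stand-in logic only)."""
    if block.kind == "table":
        # Keep rows intact: one row per line, cells joined with " | ".
        rows = [line.split(",") for line in block.content.splitlines()]
        return "\n".join(" | ".join(cell.strip() for cell in row) for row in rows)
    if block.kind == "figure":
        # A real system might summarize the image; here we only tag it.
        return f"[figure: {block.content.strip()}]"
    # Plain text: collapse whitespace and line breaks.
    return " ".join(block.content.split())


def chunk_by_layout(blocks: List[Block], max_chars: int = 200) -> List[str]:
    """Group parsed blocks into chunks, never splitting a block across chunks."""
    chunks: List[str] = []
    current = ""
    for block in blocks:
        parsed = parse_block(block)
        # Start a new chunk when adding this block would exceed the limit.
        if current and len(current) + len(parsed) + 1 > max_chars:
            chunks.append(current)
            current = parsed
        else:
            current = f"{current}\n{parsed}" if current else parsed
    if current:
        chunks.append(current)
    return chunks


blocks = [
    Block("text", "Revenue   grew\nin Q3."),
    Block("table", "Quarter, Revenue\nQ3, 10"),
    Block("figure", " Revenue chart "),
]
chunks = chunk_by_layout(blocks, max_chars=50)
```

With these inputs the text and table share one chunk and the figure falls into a second, since a chunk boundary is only ever placed between blocks. That is the point of using layout data for chunking: a table or paragraph is never cut in half.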

Benefits at Reducto

At Reducto, we’re invested in the well-being and growth of our team. Here’s what we currently offer:

  • Unlimited PTO: We believe great work requires recharging.
  • Lunch: Receive a free lunch to eat with your teammates daily at the office.
  • Reimbursed Transportation: Provide us with your receipts and we’ll take care of the costs.
  • Insurance: Generous health insurance covering medical, dental, and vision.
  • Health and Wellness Budget: We provide up to $150/mo reimbursement for health and wellness spending, such as gym memberships, fitness classes, or similar.
  • Parental Leave: Work with us to build a leave schedule that works for you and your family.

Reducto is an Equal Opportunity Employer committed to diversity and inclusion in the workplace. All qualified applicants will receive consideration for employment without regard to sex, race, color, age, national origin, religion, physical and mental disability, genetic information, marital status, sexual orientation, gender identity/assignment, citizenship, pregnancy or maternity, protected veteran status, or any other status prohibited by applicable national, federal, state or local law.

Compensation Range: $150K - $300K
