Caw Networks

Software Developer in Test (Backend)

  • Posted 7 hours ago

Job Description

We are seeking a skilled Senior QA Engineer with 2–5 years of experience and a strong foundation in backend API testing, AI system evaluation, and production-quality test automation. The ideal candidate will have at least 2 years of hands-on backend/API testing experience, strong coding skills in Python, TypeScript, or Java, and a passion for building eval infrastructure for AI systems.

Key Responsibilities

  • Design, develop, and execute eval datasets and regression harnesses for production AI systems - voice agents and enterprise chat platforms.
  • Collaborate with AI engineering teams to embed quality gates into PR workflows - eval scores before merge, not after.
  • Build and own LLM-as-judge harnesses, golden datasets, and prompt regression suites.
  • Write and maintain automated test frameworks using Pytest, REST Assured, or equivalent coded frameworks.
  • Perform API and backend testing across microservices and async LLM pipelines.
  • Design observability dashboards so anyone can answer "Did the AI get worse this week?" with a chart, not gut feel.
  • Partner with engineering on red-teaming - adversarial datasets covering PII, jailbreaks, and prompt injection.
  • Continuously research and recommend new eval tooling and testing strategies to improve AI system quality.

Key Requirements

  • 4–7 years of experience in QA / SDET / Quality Engineering.
  • At least 1.5–2 years in backend / API / systems testing.
  • 2+ years of strong coding in Python, TypeScript, or Java.
  • 2+ years with modern test frameworks - Pytest / REST Assured / JUnit / Vitest / Jest.
  • Hands-on with microservices, async pipelines, and event-driven architecture.
  • Experience with CI/CD integration and test infrastructure.
  • Ability to build automation frameworks from scratch - not just use existing tools.
  • Exposure to AI/LLM eval tooling: Langfuse, LangSmith, RAGAS, DeepEval, or equivalent (preferred).

Preferred Qualifications

  • Strong systems thinking - reasons about contracts, retries, latency, and failure modes, not just UI surfaces.
  • Experience with observability tooling - OpenTelemetry, Datadog, or Honeycomb.
  • Familiarity with voice/telephony testing, ASR/TTS evaluation, or regulated-domain QA (PII, audit trails, compliance).
  • Excellent communication and collaboration skills.
  • Ability to work independently and take full ownership of quality engineering.

Expectations rise with seniority, bringing more responsibility and more opportunities for growth.

Why Join Us

  • Greenfield eval infrastructure - build quality systems for production AI, not maintain legacy test suites.
  • Real stakes: regulated industries, real customers, real money flows. Hallucinations are not allowed.
  • Embedded in design from day one - eval scores in PR descriptions before merge, not a downstream gate.
  • Work alongside modern AI coding tools (Claude Code, Codex) as part of normal development.
  • Collaborative team with a strong emphasis on engineering rigor and continuous improvement.

Skills: Automated testing, Python, pytest, unit testing, API testing, REST Assured, microservices, object-oriented programming (OOP), RESTful APIs, Robot Framework, and test automation (QA)

Job ID: 149261175