Software Developer in Test (Backend)

Caw Networks

Hyderabad, India

4-7 Years

Posted 7 hours ago

Job Description

We are seeking a skilled Senior QA Engineer with 4–7 years of experience and a strong foundation in backend API testing, AI system evaluation, and production-quality test automation. The ideal candidate will have at least 2 years of hands-on backend/API testing, strong coding skills in Python, TypeScript, or Java, and a passion for building eval infrastructure for AI systems.

Key Responsibilities

Design, develop, and execute eval datasets and regression harnesses for production AI systems - voice agents and enterprise chat platforms.
Collaborate with AI engineering teams to embed quality gates into PR workflows - eval scores before merge, not after.
Build and own LLM-as-judge harnesses, golden datasets, and prompt regression suites.
Write and maintain automated test frameworks using Pytest, REST Assured, or equivalent coded frameworks.
Perform API and backend testing across microservices and async LLM pipelines.
Design observability dashboards so anyone can answer "Did the AI get worse this week?" with a chart, not gut feel.
Partner with engineering on red-teaming - adversarial datasets covering PII, jailbreaks, and prompt injection.
Continuously research and recommend new eval tooling and testing strategies to improve AI system quality.

Key Requirements

4–7 years of experience in QA / SDET / Quality Engineering.
At least 1.5–2 years in backend / API / systems testing.
2+ years of strong coding in Python, TypeScript, or Java.
2+ years with modern test frameworks - Pytest / REST Assured / JUnit / Vitest / Jest.
Hands-on with microservices, async pipelines, and event-driven architecture.
Experience with CI/CD integration and test infrastructure.
Has built automation frameworks from scratch, rather than only using existing tools.
Exposure to AI/LLM eval tooling: Langfuse, LangSmith, RAGAS, DeepEval, or equivalent (preferred).

Preferred Qualifications

Strong systems thinking - reasons about contracts, retries, latency, and failure modes, not just UI surfaces.
Experience with observability tooling - OpenTelemetry, Datadog, or Honeycomb.
Familiarity with voice/telephony testing, ASR/TTS evaluation, or regulated-domain QA (PII, audit trails, compliance).
Excellent communication and collaboration skills.
Ability to work independently and take full ownership of quality engineering.

Expectations rise with seniority, bringing more responsibility and more opportunities for growth.

Why Join Us

Greenfield eval infrastructure - build quality systems for production AI, not maintain legacy test suites.
Real stakes: regulated industries, real customers, real money flows. Hallucinations are not allowed.
Embedded in design from day one - eval scores in PR descriptions before merge, not a downstream gate.
Work alongside modern AI coding tools (Claude Code, Codex) as part of normal development.
Collaborative team with a strong emphasis on engineering rigor and continuous improvement.

Skills: Automated testing, Python, pytest, unit testing, API testing, REST Assured, microservices, object-oriented programming (OOP), RESTful APIs, Robot Framework, and test automation (QA)