Knowledge of vector databases (e.g., Milvus, Qdrant, Pinecone) in RAG pipelines.
Responsibilities
Design and deploy GenAI workloads at scale, including LLMs and multimodal models, using orchestration and serving platforms such as Kubernetes, Ray, or KServe.
Own the development of the AIOps pipeline, including data ingestion, feature engineering, model training, validation, deployment, and monitoring.
Implement LLMOps practices, including prompt management, routing, guardrails, and fallback logic.
Size and allocate GPU/TPU resources for use cases such as RAG, chatbots, image/video generation, and LLM fine-tuning or inference.
Collaborate with research teams to automate LLM pipelines from training to deployment using MLflow, and manage large-scale training jobs with schedulers such as Kubeflow or Slurm.
Engineer highly available and scalable infrastructure across cloud and hybrid platforms (AWS, GCP, Azure).