Machine Learning Engineer (LLM Architect - Target ID)

Insilico Medicine

Abu Dhabi, United Arab Emirates

3-5 Years

Job Description

About Insilico

Insilico Medicine is an end-to-end, artificial intelligence (AI)-driven pharma-biotechnology company with a mission to accelerate drug discovery and development by leveraging our rapidly evolving, proprietary platform across biology, chemistry, and clinical development.

For more information, visit our website: https://insilico.com

About Role

We are currently seeking an exceptional and mathematically grounded Machine Learning Engineer to join our Target ID team. Your primary focus will be the architectural design and training of next-generation Large Language Models (LLMs) specifically tailored for biological discovery. Unlike standard implementations, you will be expected to propose and engineer novel architectural components and optimization strategies based on deep mathematical principles. You will bridge the gap between theoretical deep learning and practical biotechnology, creating models capable of reasoning over complex biological data to identify novel therapeutic targets. A strong background in mathematics and the ability to write custom model implementations from scratch are essential for this role.

**You must be based in Abu Dhabi for this role. If you are currently in Dubai, you must be willing to relocate, as commuting between the two cities will not be feasible for this position.**

Place of work

Level 6, Unit 08, Block A, IRENA HQ Building, Masdar City, Abu Dhabi, United Arab Emirates

Reports to

Team Lead

Responsibilities

Design, propose, and implement novel neural network architectures beyond standard Transformers (e.g., modifying attention mechanisms or positional encodings).

Derive and implement custom loss functions and optimization algorithms based on mathematical first principles to improve model convergence on biological data.

Lead the end-to-end training lifecycle of domain-specific LLMs, including pre-training, Supervised Fine-Tuning (SFT), and Reinforcement Learning from Human Feedback (RLHF).

Collaborate with the Target ID team to translate complex biological questions into precise mathematical formulations solvable by AI.

Develop strategies to ground LLM generation in factual data, mitigating hallucinations in high-stakes drug discovery contexts.

Optimize training pipelines for high-performance computing clusters using distributed training techniques.

Stay up-to-date with the latest advancements in Generative AI, Information Theory, and Computational Biology.

Work collaboratively with biologists and bioinformatics specialists to validate model hypotheses.

General Requirements:

I. Education

Master's or Ph.D. degree in Mathematics, Computer Science, Physics, Machine Learning, or a related quantitative field.

II. Experience and Skills

3-4 years of experience in Machine Learning.

Deep theoretical understanding of Linear Algebra, Calculus, Probability Theory, and Information Theory.

Expert proficiency in Python and deep learning frameworks and libraries (PyTorch, Hugging Face Transformers), with the ability to implement custom layers and training loops from scratch.

Proven experience in training and architecting Transformer-based models and LLMs.

Ability to read and implement methods from the latest AI research papers.

Familiarity with biological data types (sequences, structures, pathways) is a significant advantage, but not mandatory if the mathematical foundation is strong.

Strong problem-solving skills and a passion for algorithmic innovation.

Excellent written and oral communication skills for explaining complex mathematical concepts to cross-functional teams.