Job Description
We’re seeking a highly skilled, hands-on Data Scientist with 4-10 years of experience in applied AI/ML to join our fast-paced team. This role requires deep expertise in transformer architectures and strong fundamentals in model training, fine-tuning, and optimisation. You’ll work across modalities (text, audio, video), with the flexibility to specialise in one domain but the adaptability to experiment across others.
The ideal candidate thrives in a startup-style, high-velocity R&D environment, is execution-focused, and demonstrates ownership from architecture to deployment. You’ll run rapid experiments, iterate on state-of-the-art models, and push the boundaries of generative AI in lip-sync, character consistency, audio realism, and video quality with a research-first, problem-solving mindset.
Responsibilities:
• Model Development and Fine-tuning: Run end-to-end experiments with transformer-based architectures (LLMs, Whisper, diffusion models, multimodal models) and training techniques (LoRA, RLHF/SFT).
• Audio: Lip-sync, emotional delivery (shouting, whispering, crying), regional language support.
• Video: Scene/character consistency, with quality benchmarks comparable to Veo 3/Sora.
• Text: Extend LLMs to handle regional languages and domain-specific adaptation.
• Evaluation and Optimisation: Design automated evaluation frameworks for objective quality scoring (images, video frames, audio clips). Balance trade-offs in speed, quality, and efficiency.
• Cross-Modality Integration: Experiment with audio-video synchronisation, background score integration, and text-to-video alignment.
• Research and Experimentation: Stay ahead of rapidly evolving models and tools, testing architectural variations and scaling solutions for production use.
• Ownership and Execution: Drive initiatives independently with strong problem-solving, accountability, and first-principles thinking.
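To give a flavour of the fine-tuning work above: LoRA replaces a full weight update with a trainable low-rank factorisation added to a frozen pretrained matrix. The sketch below illustrates the core idea with hypothetical layer sizes in plain NumPy; it is not the codebase's actual implementation, and in practice this would be done with PyTorch and a library such as Hugging Face PEFT.

```python
import numpy as np

# Minimal LoRA sketch (hypothetical shapes, not any specific model):
# instead of updating a full d_out x d_in weight matrix W, learn a
# low-rank update B @ A with rank r << min(d_in, d_out), scaled by alpha / r.
rng = np.random.default_rng(0)
d_in, d_out, r, alpha = 512, 512, 8, 16

W = rng.standard_normal((d_out, d_in))      # frozen pretrained weight
A = rng.standard_normal((r, d_in)) * 0.01   # trainable down-projection
B = np.zeros((d_out, r))                    # trainable up-projection (zero-init)

def lora_forward(x):
    # frozen base path plus scaled low-rank adapter path
    return W @ x + (alpha / r) * (B @ (A @ x))

x = rng.standard_normal(d_in)
y = lora_forward(x)

full_params = W.size              # what a full fine-tune would train
lora_params = A.size + B.size     # what LoRA trains instead
print(f"trainable params: {lora_params} vs full fine-tune: {full_params}")

# Because B starts at zero, the adapter is a no-op before training begins,
# so fine-tuning starts exactly from the pretrained behaviour.
assert np.allclose(y, W @ x)
```

With these illustrative sizes the adapter trains roughly 3% of the parameters a full fine-tune would, which is why LoRA is attractive for rapid experimentation across modalities.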
Requirements:
• Experience: 4-10 years in applied Data Science/ML with a strong focus on generative AI.
• Core Fundamentals: Solid grasp of transformer architectures, LLMs, training dynamics, and optimisation techniques.
• Modality Depth: Expertise in at least one modality (text, audio, or video), with demonstrable end-to-end project experience.
• Hands-On Skills: Strong coding and debugging ability in Python and deep learning frameworks (PyTorch, TensorFlow).
• Deployment Knowledge: Experience building ML inference and deployment pipelines (FastAPI or similar serving frameworks).
• Evaluation Metrics: Proven ability to design/implement automated evaluation methods for generative outputs.
• Adaptability: Ability to experiment quickly with new tools, libraries, and models in a dynamic environment.
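As an illustration of the automated-evaluation requirement above, a quality-scoring framework typically starts from objective per-sample metrics. The sketch below computes PSNR between a reference and a generated video frame; the frames here are synthetic arrays, not outputs of any real model, and a production framework would combine several such metrics (PSNR, SSIM, FID, audio-specific scores) rather than rely on one.

```python
import numpy as np

def psnr(reference, generated, max_val=255.0):
    """Peak signal-to-noise ratio in dB; higher means closer to the reference."""
    diff = reference.astype(np.float64) - generated.astype(np.float64)
    mse = np.mean(diff ** 2)
    if mse == 0:
        return float("inf")  # identical frames
    return 10.0 * np.log10(max_val ** 2 / mse)

# Synthetic stand-ins for a reference frame and a slightly degraded output.
rng = np.random.default_rng(42)
ref = rng.integers(0, 256, size=(64, 64, 3))
noisy = np.clip(ref + rng.normal(0, 5, ref.shape), 0, 255)

print(f"identical: {psnr(ref, ref)} dB, degraded: {psnr(ref, noisy):.1f} dB")
```

Scoring every generated frame (or audio clip) this way turns subjective "quality" into numbers that can gate experiments automatically, which is the trade-off balancing the role describes.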