As a Research Engineer Intern – Vision-Language Models for E2E Autonomous Driving, you’ll explore the potential of vision-language models to enhance reasoning, scene understanding, and interpretability in end-to-end autonomous driving. You’ll have the opportunity to work toward a publication at a top-tier venue by contributing to key areas of model development, including curating both real-world and synthetic training data, fine-tuning foundational vision-language models, and designing robust evaluation frameworks.
Responsibilities:
- Lead model development efforts using vision-language models for end-to-end autonomous driving systems
- Curate high-quality training datasets from both real-world trips and synthetic sources
- Optimize model architectures and fine-tune pre-trained foundational models to enhance performance and adapt them to specific challenges
- Design and implement evaluation frameworks to rigorously assess model performance in real-world driving environments

Required Skills:
- Pursuing an MS or PhD in CS, EE, mathematics, statistics, or a related field
- Thorough understanding of deep learning principles and familiarity with vision-language models
- 2-3 years of experience implementing and training deep learning models in at least one deep learning framework (PyTorch, TensorFlow, JAX)

Preferred Skills:
- Past experience in projects involving the design, training, or fine-tuning of vision-language models, and familiarity with knowledge distillation, quantization, and vLLM
- Past experience in deep learning projects related to autonomous driving
- Publication record at relevant venues (CVPR, ICLR, ICCV, ECCV, NeurIPS, AAAI, SIGGRAPH)