As a Research Engineer Intern – Vision-Language Models for E2E Autonomous Driving, you’ll explore the potential of vision-language models to enhance reasoning, scene understanding, and interpretability in end-to-end autonomous driving. You’ll have the opportunity to work toward a publication at a top-tier venue by contributing to key areas of model development, including curating both real-world and synthetic training data, fine-tuning foundational vision-language models, and designing robust evaluation frameworks.
Responsibilities:
- Lead model development efforts using vision-language models for end-to-end autonomous driving systems
- Curate high-quality training datasets from both real-world trips and synthetic sources
- Optimize model architectures and fine-tune pre-trained foundational models to enhance performance and adapt them to specific challenges
- Design and implement evaluation frameworks to rigorously assess model performance in real-world driving environments

Required Skills:
- Pursuing an MS or PhD in CS, EE, mathematics, statistics, or a related field
- Thorough understanding of deep learning principles and familiarity with vision-language models
- 2-3 years of experience implementing and training deep learning models in at least one deep learning framework (PyTorch, TensorFlow, JAX)

Preferred Skills:
- Past experience in projects involving the design, training, or fine-tuning of vision-language models, and familiarity with knowledge distillation, quantization, and vLLM
- Past experience in deep learning projects related to autonomous driving
- Publication record at relevant venues (CVPR, ICLR, ICCV, ECCV, NeurIPS, AAAI, SIGGRAPH)