Machine Learning Performance Software Engineer, Autopilot AI (Dojo)

Tesla

The Role:

As a member of the Dojo machine learning team, you will be responsible for enabling Tesla’s neural networks to train efficiently on our upcoming in-house custom-silicon supercomputer systems. Join a small team of experienced developers in optimizing and scaling the deployment of our PyTorch-derived neural networks on Tesla’s custom massively parallel Dojo accelerators. Work with many of the same great engineers who delivered Tesla’s custom FSD computer. The ideal candidate has experience writing software for large distributed systems.

 

Responsibilities:

  • Understand and model the end-to-end training performance of the Autopilot SW team’s PyTorch-derived neural networks on the Dojo system.
  • Develop software that scales and improves training performance based on your analysis of bottlenecks.
  • Collaborate with the Dojo HW team to understand the current HW architecture and propose future improvements.

Requirements:

  • BS in relevant field (Computer Science, Computer Engineering, etc.) or relevant work experience 
  • Comfortable with C++ and Python
  • Capable of delivering results with minimal oversight
  • Good communication skills

Nice to Have:

  • Experience scaling neural network training systems or other large distributed systems
  • Familiarity with the internals of PyTorch
  • Performance analysis experience
  • Experience coding parallel programs
  • Able to work from Deer Creek (Palo Alto) office
  • Ready to start soon

 
