The Role:
As a member of the Dojo machine learning team, you will be responsible for enabling Tesla’s neural networks to train efficiently on our upcoming in-house custom-silicon supercomputer systems. Join a small team of experienced developers in optimizing and scaling the deployment of our PyTorch-derived neural networks on Tesla’s custom massively-parallel Dojo accelerators. Work with many of the same great engineers who delivered Tesla’s custom FSD computer. The ideal candidate has experience writing software for large distributed systems.
Responsibilities:
- Understand and model the end-to-end training performance of the Autopilot SW team’s PyTorch-derived neural networks on the Dojo system
- Develop software that scales training and improves its performance, based on your analysis of bottlenecks
- Collaborate with the Dojo HW team to understand the current HW architecture and propose future improvements
Requirements:
- BS in a relevant field (Computer Science, Computer Engineering, etc.) or relevant work experience
- Comfortable with C++ and Python
- Capable of delivering results with minimal oversight
- Good communication skills
Nice to Have:
- Experience scaling neural network training systems or other large distributed systems
- Familiarity with the internals of PyTorch
- Performance analysis experience
- Experience writing parallel programs
- Able to work from Deer Creek (Palo Alto) office
- Ready to start soon