Senior AI Software Development Engineer, TensorRT-LLM

NVIDIA
🇮🇱 Israel
Contract: Full Time
Experience Level: Senior (5+ years)

We are now looking for a TensorRT-LLM Software Development Engineer!

NVIDIA is hiring software engineers for its TensorRT-LLM team. Academic and commercial groups around the world are using GPUs to power a revolution in deep learning-powered AI, enabling breakthroughs in areas like LLMs, ChatGPT, and Generative AI that have brought DL to the “iPhone moment” for AI. Join the team building the inference software that is foundational to product lines within NVIDIA and across the industry! The ability to work on a fast-paced, delivery-focused team is required, and excellent interpersonal skills are a must.

What you'll be doing:

  • Craft and develop robust inference software that can be scaled to multiple platforms for functionality and performance
  • Analyze, optimize, and tune performance for Large Language Models (LLMs)
  • Conduct unit tests and performance tests for different stages of the inference pipeline
  • Closely follow academic developments in the field of artificial intelligence and update TensorRT-LLM with new features
  • Write safe, scalable, modular, and high-quality C++/Python code for our core backend software for LLM inference
  • Collaborate across the company to guide the direction of deep learning inference, working with software, research and product teams

What we need to see:

  • Bachelor's, Master's, or higher degree in Computer Engineering, Computer Science, Applied Mathematics, or a related computing-focused field (or equivalent experience)
  • 5+ years of relevant software development experience
  • Excellent Python programming, software design, and software engineering skills
  • Awareness of the latest developments in LLM architectures and LLM inference techniques
  • Experience working with deep learning frameworks like PyTorch and HuggingFace
  • Proactive and able to work without supervision
  • Excellent written and oral communication skills in English

Ways to stand out from the crowd:

  • Prior experience with an LLM inference framework (TensorRT-LLM, SGLang, vLLM, etc.) or a DL compiler in inference, deployment, algorithms, or implementation
  • Prior experience with performance modeling, profiling, debugging, and code optimization of a DL/HPC/high-performance application
  • Excellent C/C++ programming and software design skills, including debugging, performance analysis, and test design
  • Architectural knowledge of CPU and GPU
  • GPU programming experience (CUDA or OpenCL)