Senior HPC Engineer

Onsite, New Delhi

About the Role

We’re building compute infrastructure for high-throughput, low-latency workloads that push modern hardware to its limits. As a Senior HPC Engineer, you will design, optimize, and operate large-scale compute systems powering AI inference, simulation, and data-intensive workloads.

This role is for engineers who excel at parallel computing, performance engineering, and systems-level optimization across heterogeneous hardware.


What You’ll Do

  • Design and optimize HPC systems and clusters for performance, scalability, and reliability
  • Develop and tune parallel applications using MPI, OpenMP, CUDA, or hybrid models
  • Optimize workloads across CPU, GPU, and accelerator-based architectures
  • Profile and analyze performance bottlenecks (compute, memory, I/O, interconnect)
  • Architect high-performance networking (InfiniBand, RDMA) and storage pipelines
  • Improve job scheduling, resource utilization, and queue efficiency
  • Lead benchmarking, capacity planning, and performance regression testing
  • Build automation for cluster provisioning, monitoring, and fault recovery
  • Mentor engineers and establish best practices for performance engineering
  • Collaborate with ML, inference, and infra teams on cross-domain optimization

Required Skills

  • Strong experience in C/C++, plus working proficiency in Python for orchestration and tooling
  • Deep knowledge of parallel computing models (MPI, OpenMP, CUDA)
  • Strong understanding of CPU/GPU architecture, memory hierarchy, and NUMA
  • Hands-on experience with HPC clusters and workload managers (Slurm, PBS)
  • Experience with high-performance networking and collective communication (InfiniBand, RDMA, NCCL)
  • Strong Linux systems knowledge and performance debugging skills
  • Proven experience operating or optimizing production-scale HPC environments

Nice to Have

  • Experience with AI/ML workloads on HPC systems
  • Familiarity with GPUDirect Storage, NVMe-oF, or parallel file systems (Lustre, GPFS)
  • Knowledge of power, thermal, and performance tuning at rack or node level
  • Experience with containerized HPC (Apptainer/Singularity, Docker)
  • Contributions to HPC tooling, benchmarks, or open-source projects

What You’ll Gain

  • Ownership of high-performance compute infrastructure at scale
  • Direct impact on throughput, efficiency, and infrastructure cost
  • Opportunity to influence hardware selection and system architecture
  • Work on some of the most demanding compute workloads in production