Senior HPC Engineer

Onsite

[

New Delhi

]

About the Role

We’re building compute infrastructure for high-throughput, low-latency workloads that push modern hardware to its limits. As a Senior HPC Engineer, you will design, optimize, and operate large-scale compute systems powering AI inference, simulation, and data-intensive workloads.

This role is for engineers who excel in parallel computing, performance engineering, and systems-level optimization across heterogeneous hardware.

What You’ll Do

Design and optimize HPC systems and clusters for performance, scalability, and reliability
Develop and tune parallel applications using MPI, OpenMP, CUDA, or hybrid models
Optimize workloads across CPU, GPU, and accelerator-based architectures
Profile and analyze performance bottlenecks (compute, memory, I/O, interconnect)
Architect high-performance networking (InfiniBand, RDMA) and storage pipelines
Improve job scheduling, resource utilization, and queue efficiency
Lead benchmarking, capacity planning, and performance regression testing
Build automation for cluster provisioning, monitoring, and fault recovery
Mentor engineers and establish best practices for performance engineering
Collaborate with ML, inference, and infra teams on cross-domain optimization

Required Skills

Strong experience in C/C++ (required); Python for orchestration and tooling
Deep knowledge of parallel computing models (MPI, OpenMP, CUDA)
Strong understanding of CPU/GPU architecture, memory hierarchy, and NUMA
Hands-on experience with HPC clusters and workload managers (Slurm, PBS)
Experience with high-performance networking (InfiniBand, RDMA, NCCL)
Strong Linux systems knowledge and performance debugging skills
Proven experience operating or optimizing production-scale HPC environments

Nice to Have

Experience with AI/ML workloads on HPC systems
Familiarity with GPU-direct storage, NVMe-oF, or Lustre/GPFS
Knowledge of power, thermal, and performance tuning at rack or node level
Experience with containerized HPC (Apptainer/Singularity, Docker)
Contributions to HPC tooling, benchmarks, or open-source projects

What You’ll Gain

Ownership of high-performance compute infrastructure at scale
Direct impact on throughput, efficiency, and infrastructure cost
Opportunity to influence hardware selection and system architecture
Work on some of the most demanding compute workloads in production