GPU Infra Management

IT

Location: Noida, Uttar Pradesh
Job Type: Temporary


About Company:

Largest product development center, with software development ownership of all major mobile models launched across the globe, i.e. EUR/CIS, Middle East, South West Asia, and North America.

An Ideal Candidate:

Role: GPU Infra Management Expert

Payroll: Adecco

Duration: 12+ Months (Extendable)

Work Mode: WFO – 5 Days

Location: Sector 126 – Noida

Interview: F2F (face-to-face)

Key Competencies:


Role Title: GPU Infra Management Expert

Skills Required:

- GPU System Engineer who can manage high-performance GPU systems and develop software solutions to optimize GPU resource utilization for scheduling-based applications.

- Must be well versed in legacy and current NVIDIA GPU systems (RTX, V100, A100, etc.), along with MLOps and orchestration using Kubernetes, Slurm, or similar systems.

Roles and Responsibilities:

1. GPU System Management:

- Deploy, configure, and maintain GPU hardware and systems.

- Monitor GPU resource usage, health, and performance for various workloads.

- Manage GPU clusters to ensure optimal availability, stability, and scalability.

- Set up and manage GPU infrastructure for MLOps.



2. Software Development:

- Design and develop GPU management software to automate scheduling and resource allocation.

- Build tools and scripts for GPU workload optimization, resource monitoring, and job scheduling.

- Develop effective GPU scheduling algorithms for multi-application environments.

- Integrate scheduling tools with container orchestration and workload management systems such as Kubernetes, Slurm, or similar.



3. Software Deployment:

- Deploy GPU orchestration software.

- Set up and update configurations.



4. Performance Optimization:

- Analyze GPU performance metrics and fine-tune systems for improved efficiency.

- Optimize GPU utilization for AI/ML, rendering, and high-performance computing workloads.



5. Troubleshooting and Support:

- Diagnose and resolve GPU hardware/software issues.



6. Documentation and Reporting:

- Document GPU architecture, processes, and management tools.

- Provide detailed reports on GPU utilization, performance trends, and improvements.

 

 


Ref: JN-012026-933107