HPC System Engineer - France

Mistral AI

Remote / Hybrid, France
About Mistral 
- At Mistral AI, we are a tight-knit, nimble team dedicated to bringing our cutting-edge AI technology to the world. Our mission is to make AI ubiquitous and open. 
- We are creative, low-ego, team-spirited, and have been passionate about AI for years.
- We hire people who thrive in competitive environments, because they find them more fun to work in.
- We hire passionate women and men from all over the world.
- Our teams are distributed between France, the UK and the USA.

Role Summary 
- We work at the cutting edge of science and technology, combining modern cloud environments with High-Performance Computing (HPC) standards. Our clusters use the latest available GPUs at large scales.
- As an HPC System Administrator, you will be responsible for managing our clusters and ensuring their smooth operation for all users across all sites.
- You will be the interface between our research and production users and our various cloud providers, addressing node issues, capacity requests, image upgrades and tooling needs.
- Location: France.

Key Responsibilities 
- Oversee the strategic design, system performance, resource allocation, configuration management and operational support for both our hardware and software systems.
- Solve and troubleshoot complex technical problems with a proactive approach to system optimization and issue resolution.
- Ensure compliance with standard HPC security practices.
- Maintain comprehensive documentation for infrastructure, configurations and procedures.
- Collaborate effectively across teams of engineers and researchers.

Qualifications & profile

We’re looking for a blend of experience with:
- High-performance networking
- GPUs in large-scale distributed networks
- Large-scale distributed storage file systems and providers
- Linux Kernel and OS
- Virtualization and container architecture in cloud environments (Docker, Kubernetes, OpenStack…)
- SLURM
- Software, hardware and network failure troubleshooting

Ideally, you would also have experience with:
- LLM training
- AI/ML frameworks
- GPU programming (CUDA)

We’re also looking for people who are:
- Passionate
- Self-directed 
- Low-ego 
- Team players

Benefits

- Daily lunch vouchers 
- Contribution to a Gympass subscription 
- Monthly contribution to a mobility pass 
- Full health insurance for you and your family 
- Generous parental leave policy