High-performance computing & infrastructure engineer

From the GPU kernel
to the cluster in production.

I build high-performance computing tools, and I run the infrastructure that ships them: GPU energy measurement for Kokkos at Oak Ridge, HPC for nuclear simulation at EDF, and a five-node cluster running about 20 services in production, operated end to end from Docker to CI/CD.

ORNL · EDF · 5-node production cluster

GitHub LinkedIn ORCID

/01 Selected work

Things I built, and what they cost

GPU energy tooling for Kokkos at Oak Ridge, merged upstream and shown as a poster at SMC25. Three years of HPC for nuclear simulation at EDF. A five-node production cluster I run end to end. Case studies below, the flagship first.

Flagship

Measuring where the energy goes on the GPU

Energy-measurement tooling for Kokkos, the US Department of Energy's performance-portability framework. The periodic-sampling daemon is merged into Kokkos Tools, the NVML and Variorum connectors are in review, and an analysis dashboard sits on top.

The problem

Kokkos lets one C++ source run across NVIDIA, AMD, and Intel GPUs, which is exactly why energy is hard to reason about: the same kernel draws different power on every backend, and application teams had no portable way to see it. On DOE machines, where power is now a first-class constraint, that blind spot matters.

What I built

A set of Kokkos Tools connectors that sample power while kernels run and attribute the integrated energy to the Kokkos regions that caused it: an NVML backend for NVIDIA GPUs, a Variorum backend for node-level power, a background daemon sampling on a fixed interval, and CSV export. On top, a Python dashboard turns that output into per-kernel energy analysis. It hooks the Kokkos profiling interface, so application code is untouched.
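
A minimal sketch of the attribution idea, not the merged connector code: given timestamped power samples and the start and end times of each region, integrate power over each region with the trapezoidal rule and sum per region name. The function names, the (time, watts) sample layout, and the (name, start, end) region layout are assumptions made for illustration.

```python
from bisect import bisect_left, bisect_right


def power_at(samples, t):
    """Linearly interpolate power (W) at time t from sorted (time_s, watts) samples.

    Outside the sampled range, the nearest sample is held constant.
    """
    times = [s[0] for s in samples]
    i = bisect_left(times, t)
    if i == 0:
        return samples[0][1]
    if i == len(samples):
        return samples[-1][1]
    (t0, p0), (t1, p1) = samples[i - 1], samples[i]
    if t1 == t0:
        return p1
    return p0 + (p1 - p0) * (t - t0) / (t1 - t0)


def region_energy(samples, start, end):
    """Energy in joules over [start, end], by trapezoidal integration."""
    times = [s[0] for s in samples]
    lo = bisect_right(times, start)  # first sample strictly after start
    hi = bisect_left(times, end)     # first sample at or after end
    points = (
        [(start, power_at(samples, start))]
        + list(samples[lo:hi])
        + [(end, power_at(samples, end))]
    )
    return sum(
        (tb - ta) * (pa + pb) / 2.0
        for (ta, pa), (tb, pb) in zip(points, points[1:])
    )


def attribute(samples, regions):
    """Sum energy per region name; regions are (name, start_s, end_s) tuples."""
    totals = {}
    for name, start, end in regions:
        totals[name] = totals.get(name, 0.0) + region_energy(samples, start, end)
    return totals
```

Interpolating at the region boundaries matters when kernels are shorter than the sampling interval: without it, a region that falls between two samples would be credited with no energy at all.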

Where it stands

The periodic-sampling daemon is merged into kokkos-tools, written up in an ORNL report, and presented as a poster, 'Understanding GPU Energy Dynamics in HPC Applications', at the 2025 Smoky Mountains Conference. The NVML and Variorum connectors are in review, with ROCm SMI sketched for AMD.

HPC for nuclear simulation

A three-year apprenticeship building C++ performance tooling for COCAGNE, EDF's reactor-core simulation platform: a scientific codebase of more than 500,000 lines.

The context

EDF's ASICS group develops the scientific computing that nuclear simulation depends on. Alongside my engineering degree, I spent three years on COCAGNE, a reactor-core simulation platform of more than 500,000 lines of C++, on the performance and tooling that keep a codebase that size measurable.

What I built

Two internal C++ performance-analysis tools: a memory-profiling library that intercepts allocation through LD_PRELOAD, and a hierarchical CPU-timing tool with Python bindings via PyBind11. I took part in refactoring the neutronic solvers toward a Ports and Components model, and built the Debian packaging pipeline on GitLab CI/CD and Jenkins.
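
To make "hierarchical CPU timing" concrete, here is a small Python sketch of the general technique, unrelated to the internal EDF tools: nested scopes push their name onto a stack, and elapsed CPU time is accumulated per full path. The class name, method names, and report format are all hypothetical.

```python
import time
from contextlib import contextmanager


class HierarchicalTimer:
    """Accumulates call counts and elapsed time per nested scope path."""

    def __init__(self, clock=time.process_time):
        self._clock = clock   # CPU-time clock by default; injectable for tests
        self._stack = []      # names of the currently open scopes
        self.totals = {}      # path tuple -> [calls, seconds]

    @contextmanager
    def scope(self, name):
        self._stack.append(name)
        path = tuple(self._stack)
        start = self._clock()
        try:
            yield
        finally:
            elapsed = self._clock() - start
            entry = self.totals.setdefault(path, [0, 0.0])
            entry[0] += 1
            entry[1] += elapsed
            self._stack.pop()

    def report(self):
        """Return one indented line per path, parents before children."""
        lines = []
        for path in sorted(self.totals):
            calls, seconds = self.totals[path]
            indent = "  " * (len(path) - 1)
            lines.append(f"{indent}{path[-1]}: {calls} call(s), {seconds:.6f} s")
        return lines
```

Usage is `with timer.scope("solve"):` around a phase, with further `scope` calls nested inside it. A parent's time includes its children's, so the report reads as a tree of inclusive times.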

A three-year industrial apprenticeship. The work above is cleared for public mention; the rest stays under confidentiality.

Running my own production

A five-node Proxmox cluster, sentinel, hosting around 20 publicly reachable services on hardware I run and automate myself.

The setup

Five Proxmox nodes (cerberus, echelon, mikoshi, cynosure, ultron) behind a VyOS edge over a WireGuard uplink. One Traefik terminates Let's Encrypt TLS for around 20 services under kerboul.me: a Gitea forge, a Coolify PaaS, Nextcloud, a media stack, and the apps I deploy, including this site. The cluster's runbooks and automation are themselves a repo.

Why it's here

It is the DevOps and SRE half of the profile, and it is real: uptime, backups, certificate renewal, monitoring, and the unglamorous failure modes you only learn by being on call for your own infrastructure. The site you're reading ships to it through a CI/CD pipeline that builds a versioned image, scans it for vulnerabilities, and rolls back automatically on a failed health check.
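
The rollback step can be sketched as follows. This is an illustration of the logic only, not the pipeline's actual script: `start` and `is_healthy` are hypothetical callables standing in for whatever starts a tagged image and probes its health endpoint, and the retry count and delay are placeholder values.

```python
import time


def deploy_with_rollback(new_tag, previous_tag, start, is_healthy,
                         retries=5, delay_s=2.0, sleep=time.sleep):
    """Start new_tag; if it never reports healthy, restart previous_tag.

    start(tag)    -- hypothetical callable that (re)starts the service on an image tag
    is_healthy()  -- hypothetical callable returning True once the service responds
    Returns the tag that is running when the function exits.
    """
    start(new_tag)
    for attempt in range(retries):
        if is_healthy():
            return new_tag
        if attempt < retries - 1:
            sleep(delay_s)  # give the service time to come up before re-probing
    # Health check never passed: fall back to the last known-good image.
    start(previous_tag)
    return previous_tag
```

Keeping the previous tag explicit is the reason for building versioned images: a rollback is then just another deployment of a tag that is already known to work.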

sentinel, live

Live from the cluster's own Proxmox API. The infrastructure on this page is online as you read it.

Delivering an event for 120+ players

Opération Endgame, a DCS World operation I've run yearly since 2020: planning, real-time coordination, and logistics for 120+ simultaneous players, 150+ registered this edition.

The other kind of systems

Opération Endgame is a large multiplayer DCS World operation I have designed and delivered yearly since 2020: a fixed-deadline four-hour event with planning, briefing, and real-time coordination of 120+ simultaneous players (150+ registered this edition), split into pilot, JTAC, AWACS/GCI, and logistics cells. It is project management under load: scoping the objective, holding a deadline, and aligning a large team in real time. That is proof of delivery and leadership an employer can read directly.

Mapping the French DCS scene

A live directory of French-speaking DCS World communities I built and host, with stats and infographics on the scene.

What it is

Commus indexes 57 French-speaking DCS World communities, with filtering, comparison, and a set of infographics: a periodic table of modules, a timeline, an activity pulse. It is a Vue front end I host, kept current by a small updater service, and it is the data-and-interface counterpart to the leadership side of Opération Endgame.

/02 Expertise

By domain, each tied to proof

Five areas, each tied to the project that demonstrates it.

GPU & HPC

Proven by ORNL × Kokkos

Writing for the GPU and reasoning about what it costs, in time and now in energy.

CUDA
OpenMP & MPI
Kokkos & performance portability
GPU power & energy telemetry

Performance engineering & tooling

Proven by EDF · ASICS

Internal C++ tools that keep a large scientific codebase measurable, and the build pipeline around them.

Memory profiling (LD_PRELOAD)
CPU timing & instrumentation (PyBind11)
C++ build systems (CMake)
Debian packaging & CI (GitLab CI/CD, Jenkins)

Infrastructure & DevOps

Proven by sentinel cluster

The full path from a commit to a request served, and the reliability work behind it, on hardware I'm accountable for.

Proxmox VE clustering
Kubernetes / K3s
Traefik, TLS & reverse proxy
Docker & Gitea CI/CD

Full-stack & real-time

Proven by commus

Interfaces and live systems, including the one rendering this page.

Vue 3 / Nuxt 3
TypeScript
Self-hosting & deployment
Astro

Security

Proven by sentinel cluster

The defensive basics a self-hosted, internet-facing cluster forces you to get right.

TLS & PKI (Let's Encrypt, ACME)
Network segmentation (VLAN, WireGuard)
Edge & reverse-proxy hardening
Secrets & access hygiene

/03 Writing

Notes from the bench

Write-ups on HPC, GPU computing, infrastructure, and the projects behind them.

Read all posts

/04 Trajectory

Polytech → EDF → Oak Ridge

Ethan Puyaubreau a.k.a. Kerboul · DaKerboul · Paris, France

It all comes from one place: a cluster I built at home, Sentinel. Ethan measures GPU energy and runs the production on it; Kerboul hosts his films and his community on it. Both ride the same hardware, which I run alone, from on-call to certificates. This site and its live readout come off it.

I work two tracks at once. One is high-performance computing: the GPU and performance work that makes scientific code fast. The other is the infrastructure that puts software into production and keeps it there: containers, pipelines, reverse proxies, and the cluster underneath. For a team, the rarer and more useful thing is someone credible at both, who can tune a GPU kernel and stay on call for the cluster that runs it.

At Oak Ridge National Laboratory I built GPU energy-measurement tooling for Kokkos, the US Department of Energy's performance-portability framework. The periodic-sampling daemon is merged upstream into Kokkos Tools, and the work became a poster at the 2025 Smoky Mountains Conference. It is what I would point a reviewer to first.

Alongside that I spent three years as an apprentice on HPC for nuclear simulation at EDF, and I run a five-node production cluster of my own: around twenty services behind Traefik and TLS, deployed with Docker and CI/CD, with image scanning and automatic rollback. I am on call for its uptime, backups, and certificates. I have shipped HPC tooling and operated real infrastructure, not just studied them.

I finish my engineering degree at Polytech Paris-Saclay in September 2026 and am open to roles from January 2027. HPC labs are a natural fit, whether in the Bay Area (Berkeley Lab, LLNL) or in Paris (the CEA), but I am just as interested in infrastructure, DevOps, SRE, and platform engineering, on-prem or in the cloud: I would rather a role use both halves of this page than only one. Off the clock I have flown Kerbal Space Program since 2011, and I care about self-hosting and owning my data, which is why this site and the cluster behind it run on hardware I keep myself. If your team needs someone full-time who can make a GPU kernel fast and keep the production cluster that runs it healthy, write to me: my contact details are at the bottom of the page.

/05 Contact

For recruiters, in one screen

Open to HPC, infrastructure & DevOps roles from January 2027

For HPC labs or infrastructure and platform teams, in the Bay Area or Paris.

The fastest way to reach me

Get in touch ethan.puyaubreau@gmail.com
