GPU Engineer

Paris, France | Remote | data | Today

We are hiring a GPU Engineer to work on the fastest LLM inference engine on standard datacenter GPUs.
You would own low-level kernel work in CUDA/PTX or HIP/CDNA ISA, the monokernel pipeline and the profiling infrastructure inside it, scaling to the frontier MoE models that run in production, and building our own agents that optimize kernels and inference autonomously.
We generate 3,000 tokens/s per request on 8x AMD MI300X and 2,100 on 8x NVIDIA H200, at batch size 1, FP16, no speculative decoding.
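At batch size 1, decode is dominated by GEMV (matrix-vector products), so it is memory-bandwidth bound, and MBU (model bandwidth utilization: achieved memory bandwidth divided by the hardware peak) is what counts.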
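To make the bandwidth argument concrete, here is a minimal Python sketch of the arithmetic, not Kog's own tooling. It assumes every decoded token streams a fixed number of bytes (weights plus KV cache) from HBM once. The `bytes_per_token` value is a hypothetical placeholder, since the posting does not state the model size, and the 5.3 TB/s figure is AMD's published peak HBM bandwidth for one MI300X.

```python
def mbu(bytes_per_token: float, tokens_per_s: float, peak_bytes_per_s: float) -> float:
    """Fraction of peak memory bandwidth used when decoding at tokens_per_s."""
    achieved_bytes_per_s = bytes_per_token * tokens_per_s
    return achieved_bytes_per_s / peak_bytes_per_s


def max_tokens_per_s(bytes_per_token: float, peak_bytes_per_s: float) -> float:
    """Upper bound on decode speed if memory bandwidth were fully used (MBU = 1)."""
    return peak_bytes_per_s / bytes_per_token


# Assumption: vendor-published peak HBM bandwidth, summed over 8 GPUs.
MI300X_PEAK_BYTES_PER_S = 5.3e12
NODE_PEAK_BYTES_PER_S = 8 * MI300X_PEAK_BYTES_PER_S

# Hypothetical: bytes read from HBM per decoded token. Not from the posting.
HYPOTHETICAL_BYTES_PER_TOKEN = 8e9

if __name__ == "__main__":
    u = mbu(HYPOTHETICAL_BYTES_PER_TOKEN, 3000, NODE_PEAK_BYTES_PER_S)
    ceiling = max_tokens_per_s(HYPOTHETICAL_BYTES_PER_TOKEN, NODE_PEAK_BYTES_PER_S)
    print(f"MBU at 3,000 tokens/s: {u:.1%}")
    print(f"Bandwidth ceiling: {ceiling:,.0f} tokens/s")
```

With the byte count fixed, tokens/s and MBU are proportional, which is why a bandwidth-bound decode is judged by how close it gets to the ceiling rather than by FLOPs.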
We rewrote the whole hot path ourselves, from the assembly on the chip up to the Transformer we designed around it, with the full decode running as a single persistent GPU kernel.
Try it at https://playground.kog.ai
Showing your code is part of the process.
If you are outside a Europe-compatible timezone, relocation to one is required.
Apply: https://jobs.ashbyhq.com/kog/e3950334-a2a6-43cc-a744-df6c386...

About the company

Kog

Listed via

Findwork

findwork.dev
