GPU Engineer
Kog
You would own low-level kernel work in CUDA/PTX or HIP/CDNA ISA, the monokernel pipeline, profiling infrastructure inside it, scaling to the frontier MoE models that run in production, and building our own agents that optimize kernels and inference autonomously.
We generate 3,000 tokens/s per request on 8x AMD MI300X and 2,100 on 8x NVIDIA H200, at batch size 1, FP16, no speculative decoding.
At batch size 1, decode is GEMV (matrix-vector products), so it is memory-bandwidth bound, and MBU is what counts.
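As a rough illustration of why GEMV is bandwidth bound and how an MBU figure can be computed, here is a minimal sketch. It reads MBU as achieved memory bandwidth divided by peak memory bandwidth, which is an assumption about the term, not a definition taken from this posting. The function names, the matrix shape, and every number in the example are hypothetical placeholders; none of them are Kog's model sizes, hardware figures, or measurements.

```python
def gemv_arithmetic_intensity(rows: int, cols: int, bytes_per_element: int = 2) -> float:
    """FLOPs per byte moved for y = W @ x, with W of shape (rows, cols).

    bytes_per_element=2 corresponds to FP16 storage.
    """
    flops = 2 * rows * cols  # one multiply and one add per weight
    # Every weight is read once; x is read once and y is written once.
    bytes_moved = bytes_per_element * (rows * cols + cols + rows)
    return flops / bytes_moved


def bandwidth_utilization(bytes_read_per_token: float,
                          tokens_per_second: float,
                          peak_bytes_per_second: float) -> float:
    """Achieved memory traffic divided by peak memory bandwidth (assumed meaning of MBU)."""
    achieved_bytes_per_second = bytes_read_per_token * tokens_per_second
    return achieved_bytes_per_second / peak_bytes_per_second


# Hypothetical 4096 x 4096 FP16 weight matrix: roughly 1 FLOP per byte,
# far below what a modern GPU needs to become compute bound.
intensity = gemv_arithmetic_intensity(4096, 4096)

# Hypothetical placeholders: 1 GB read per token, 500 tokens/s, 1 TB/s peak.
utilization = bandwidth_utilization(1e9, 500, 1e12)

print(f"arithmetic intensity: {intensity:.3f} FLOP/byte")
print(f"bandwidth utilization: {utilization:.2f}")
```

With roughly one FLOP per byte, the time per token is set by how fast the weights can be streamed from memory, which is why the utilization ratio is the quantity to track. A real measurement would also have to count KV-cache reads and any inter-GPU traffic, which this sketch leaves out.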
We rewrote the whole hot path ourselves, from the assembly on the chip up to the Transformer we designed around it, with the full decode running as a single persistent GPU kernel.
Try it at https://playground.kog.ai
Showing your code is part of the process.
If you are outside a Europe-compatible timezone, relocation to one is required.
Apply: https://jobs.ashbyhq.com/kog/e3950334-a2a6-43cc-a744-df6c386...