

Eivind Liland
Parallel Accelerator Specialist
Hardware & Software Consultant
I work across the entire stack: from RTL design and verification to low-level software and parallel algorithms. Whether architecting out-of-order SIMT cores and parallel memory subsystems for AI and compute, or writing highly optimized GPU compute shaders and bare-metal rendering engines, I love the challenge of finding trade-offs that maximize throughput with minimal power.
What I Do
Parallel Architectures
Compute architecture for GPUs, accelerators, and AI processors. The core problems in modern AI hardware (memory bandwidth walls, dataflow scheduling, keeping arithmetic units fed) are problems I've been working on since before they had an AI label. My experience ranges from single embedded processors to systems with thousands of parallel execution units. Available for hands-on design or high-level advisory.
Software Development
I've built software at every level of abstraction, from bare-metal C/C++ and assembly optimized under extreme constraints to GPU-accelerated computing and algorithms in Python. Whether the problem is wringing cycles out of constrained hardware or writing procedural generators for massively parallel systems, I can help.
RTL Design
Digital design in SystemVerilog. I write RTL with verification in mind from the start, with constrained-random testbenches alongside each module, so what I hand over is already substantially validated.
Verification
UVM-based verification, from module-level constrained-random testing to system-level integration. If you're building custom silicon or FPGA systems and need verification methodology or execution, I can step in at any level.
Things I've Built
GPU Architecture
- Designed out-of-order warp scheduling logic for a mobile GPU core
- Built a custom memory hierarchy for a bandwidth-constrained AI accelerator
- Prototyped a novel register file design that reduces area by 30%
RTL & Silicon
- Full RTL implementation of a RISC-V vector extension subset
- UVM testbench infrastructure for a multi-million-gate SoC
- FPGA prototype of custom matrix multiply unit
Software & Algorithms
- Procedural real-time rendering algorithms under extreme memory/size constraints
- GPU-accelerated computational fluid dynamics solver
- Custom shader compiler backend for proprietary GPU ISA
- Real-time signal processing pipeline on embedded DSP
Systems & Integration
- End-to-end verification environment for PCIe Gen4 controller
- Driver stack for custom AI inference accelerator
- Performance modeling framework for early-stage architecture exploration
These are placeholders — replace with your actual projects and accomplishments.
Where I've Been
ARM Mali GPU
Early employee at Falanx Microsystems, a startup in Norway that built the Mali GPU from scratch. Contributed across the stack, from RTL design and verification of the GPU to bare-metal software and pre-silicon tech demos for early FPGA prototypes. ARM acquired us in 2006, and Mali powers billions of devices today.
Swarm64
Co-founded Swarm64. We built FPGA-based hardware that accelerated database computation — massively parallel, deployed in the cloud. Partnered with Intel and Xilinx. Acquired by ServiceNow.
Orbital Machines
Founded Orbital Machines, a sociocratic newspace startup that contributed to a number of space industry vehicles. Wrote Python software for designing and optimizing 3D propellant pump geometries for rocket engines.
How I Work
I partner with teams on a contract basis — whether that means short-term consulting, extended project work, or strategic advisory.
I'm available for remote work globally, or on-site in the Berlin area. I am always happy to start with a quick conversation to find the engagement model that best suits your needs.
Ready to talk?
From high-level architectural guidance to hands-on implementation and verification, let's discuss how I can help accelerate your next project.
Get in Touch