Eivind Liland

Parallel Accelerator Specialist

Silicon & Software Architecture

I work on both sides of parallel compute, whether that means designing and verifying out-of-order SIMT cores and coherent memory subsystems or translating theoretical parallel algorithms into viable microarchitecture. I bridge the gap between physical silicon constraints and high-performance software.

Early engineer at Falanx — the original Mali GPU (acquired by ARM, 2006). Co-founder of Swarm64 — FPGA database accelerator (acquired by ServiceNow, 2021).

Get in touch

Global Remote

On-site Berlin

What I Do

Parallel Architectures

Compute architecture for GPUs, accelerators, and AI processors. Memory bandwidth walls, dataflow scheduling, keeping arithmetic units fed: I've been solving these problems on everything from single-core to massively parallel systems since before they carried an AI label.

Hardware-Aware Software

I write software that exploits the underlying architecture, from graphics and compute shaders in Vulkan to parallel algorithms designed for specific systems. I specialize in wringing cycles out of constrained hardware.

RTL Design

Digital design in SystemVerilog. I write RTL with verification in mind from the start, with testbenches alongside each module, so what I hand over is already substantially validated.

Modeling & Verification

I build models for design exploration and pre-silicon software development, and as golden references for RTL verification. I write constrained-random testbenches and coverage points, and hunt down the bugs hiding in deep pipelines and coherent caches.

Things I've Built

Architecture

Massively multicore FPGA processing platform (2012–2016)
Chief architect, Swarm64. A fabric of many 4-core clusters, each sharing an L1 cache and an ALU pool. Each core ran barrel-threaded, multi-lane SIMT warps with scoreboarded out-of-order completion. Parallelism at every level (barrel threading, scoreboarding, caches, interconnect) sustained thousands of in-flight transactions across the fabric. Built on Xilinx FPGAs.

2D interconnect for manycore processor (2012–2016)
Lead architect, Swarm64. High-bandwidth topology chosen for FPGA routing efficiency, sustaining thousands of simultaneous out-of-order requests across the multicore fabric.

Hierarchical tiler for mobile GPU (2007–2009)
Co-developed the architecture with the new ARM Cambridge team, spending half a year in Cambridge to help establish the team post-acquisition. The tiler is a key stage in tile-based deferred rendering: it sorts and bins geometry into screen-space tiles before rasterization.

Five patents in graphics and parallel accelerators (2006–2018)
Co-inventor, Falanx/ARM and Swarm64. Patents covering graphics processing systems, hierarchical tiling, graphics rendering, and graphics pipeline innovations.

FPGA-based GPU for Game Boy Advance, not completed (~2006)
Sole developer, personal project. A programmable GPU on a low-cost FPGA inside a GBA cartridge, intended as a powerful upgrade: the console has no graphics hardware beyond a simple 2D sprite/tile engine.
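
For readers unfamiliar with the binning step mentioned in the tiler entry above, here is a minimal sketch of the general idea. It is not the Mali design: it uses a flat, conservative bounding-box test rather than a hierarchical one, and the names `bin_triangles` and `TILE` and the 16-pixel tile size are assumptions made for illustration.

```python
import math
from collections import defaultdict

TILE = 16  # tile edge in pixels (assumed value, for illustration only)


def bin_triangles(triangles, width, height, tile=TILE):
    """Map each tile (tx, ty) to the indices of triangles whose
    bounding box overlaps it. Conservative: a triangle may be listed
    in a tile it does not actually cover."""
    tiles_x = (width + tile - 1) // tile
    tiles_y = (height + tile - 1) // tile
    bins = defaultdict(list)
    for idx, tri in enumerate(triangles):
        xs = [p[0] for p in tri]
        ys = [p[1] for p in tri]
        # Skip triangles that lie entirely off screen.
        if max(xs) < 0 or max(ys) < 0 or min(xs) >= width or min(ys) >= height:
            continue
        # Clamp the bounding box to the tile grid.
        x0 = max(math.floor(min(xs)) // tile, 0)
        x1 = min(math.floor(max(xs)) // tile, tiles_x - 1)
        y0 = max(math.floor(min(ys)) // tile, 0)
        y1 = min(math.floor(max(ys)) // tile, tiles_y - 1)
        for ty in range(y0, y1 + 1):
            for tx in range(x0, x1 + 1):
                bins[(tx, ty)].append(idx)
    return dict(bins)


triangles = [
    ((2, 2), (10, 2), (2, 10)),        # fits inside tile (0, 0)
    ((10, 10), (40, 12), (12, 40)),    # bounding box spans a 3x3 block of tiles
]
bins = bin_triangles(triangles, width=64, height=64)
```

After binning, each tile can be rasterized independently from its own triangle list, which is what lets a tile-based GPU keep the working set in on-chip memory.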

Software

Pump geometry tools for rocket engines and zero-emission aircraft (2019–2022)
Co-developer, Orbital Machines. Python tools for parametric generation and optimization of centrifugal pump impeller and volute geometry.

Vulkan drivers for ARM Mali (2017–2018)
Team contributor, ARM. Low-level driver development for the Mali GPU series.

Texas: real-time 3D in under 4,096 bytes. 1st place NVScene, Scene.org Award (2008)
Coder, Keyboarders. Built on the then-new GeForce 8, NVIDIA's first SIMT architecture with a unified shader pipeline. DX10 geometry shaders, screen-space ambient occlusion, procedural geometry, audio remixed from a Vista system sample. Pouët all-time #42.

Five Finger Discount: 3D engine for Game Boy Advance. Scene.org Award nominations (2005–2006)
3D engine programmer (two-person coding team), Shitfaced Clowns. Software rendering, fixed-point math, everything from scratch: no OS, no libraries. 2nd place at Breakpoint 2006. Nominated for Scene.org Awards in best demo and best effects.

RTL Design

Barrel-threaded OoO SIMT processor (2012–2016)
Lead designer, Swarm64. RISC SIMT core in SystemVerilog supporting thousands of threads, with deeply barrel-threaded 16-lane warps and scoreboarded out-of-order completion.

Fixed- and floating-point arithmetic units (2012–2016)
Lead designer, Swarm64. Fixed- and floating-point ALU pool with LUT units for special functions, shared across the four cores in each cluster.

2D interconnect and coherent caches (2012–2016)
Lead designer on interconnect, contributor on caches, Swarm64. SystemVerilog implementation of the mesh interconnect; design and validation support on the L1/L2 hierarchy, which supported thousands of outstanding transfers to mask latency to SSD.

Hierarchical triangle rasterizer (~2005)
Personal project. High-density, high-bandwidth design for a hierarchical, sub-pixel-accurate, perspective-correct, UV-mapped triangle rasterizer for the GBA FPGA GPU prototype.
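
As background on the barrel threading mentioned in the SIMT processor entry above, the toy model below shows why rotating the issue slot across many threads hides latency. It is a sketch under simplifying assumptions, not the Swarm64 core: every instruction has the same fixed latency, each thread has at most one instruction in flight (a crude stand-in for a scoreboard), and the function name `run_barrel` is hypothetical.

```python
def run_barrel(num_threads, instrs_per_thread, latency):
    """Simulate a strict round-robin barrel core.

    Each cycle the issue slot belongs to thread (cycle % num_threads).
    The thread issues only if it has work left and its previous
    instruction has completed; otherwise the slot is a bubble.
    Returns (total_cycles, issue_slot_utilization).
    """
    remaining = [instrs_per_thread] * num_threads
    ready_at = [0] * num_threads  # cycle at which each thread may issue again
    total = num_threads * instrs_per_thread
    cycle = 0
    issued = 0
    while issued < total:
        t = cycle % num_threads
        if remaining[t] > 0 and ready_at[t] <= cycle:
            remaining[t] -= 1
            ready_at[t] = cycle + latency
            issued += 1
        cycle += 1
    return cycle, issued / cycle


# Enough threads to cover the latency: every issue slot is used.
full_cycles, full_util = run_barrel(num_threads=8, instrs_per_thread=4, latency=8)

# Too few threads: most slots are bubbles while threads wait.
few_cycles, few_util = run_barrel(num_threads=2, instrs_per_thread=4, latency=8)
```

In this model the pipeline stays full once the thread count reaches the instruction latency, which is the basic argument for deep barrel threading when memory or arithmetic latency is long.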

Verification

Swarm64 SIMT cores, ALUs, caches and interconnect (2012–2016)
Verification engineer, Swarm64. Constrained-random verification of the parallel processor fabric I helped design.

Swarm64 CI pipeline for hardware regression and validation (2012–2016)
Swarm64. Automated coverage tracking, regression, build, and validation flows built on open-source tooling (Verilator-based), inspired by software CI methods, which at the time were further along than the typical UVM methodology.

Mali GPU shader engine, texture mapper, tile buffer and caches (2003–2009)
Verification engineer, Falanx/ARM. Verification, debugging, and bug-fixing across the Mali GPU pipeline: VLIW shader processor, texture mapper, tile buffer with resolver, and caches supporting large numbers of outstanding transfers. Live-lock prevention in the shader engine and texture mapper was a particular focus, with complex interactions across multiple clock domains and deep pipelines. UVM-based testbenches at the module level, with system-level validation on FPGA prototypes.

Mali GPU tech demos for marketing and validation (2003–2006)
Lead developer, Falanx. Non-interactive game-style 3D demos in OpenGL ES running on FPGA prototypes at a fraction of final silicon speed, delivering visual quality beyond what audiences expected from shipping GPUs at the time.

Where I've Been

ARM Mali GPU

In 2003, I took a job alongside my studies at the Norwegian start-up Falanx Microsystems, developing pre-silicon 3D tech demos for early prototypes of the Mali GPU running on FPGA. Over the following years, I contributed across software, RTL design, verification and hardware architecture. ARM acquired us in 2006, and Mali powers billions of devices today.

Swarm64

Co-founded Swarm64. We built FPGA-based hardware that accelerated database computation — massively parallel, deployed in the cloud. Partnered with Intel and Xilinx. Acquired by ServiceNow in 2021.

Orbital Machines

Founded Orbital Machines, a sociocratic newspace startup that scaled to 15 employees. Worked on propellant pump designs for several customers in the launcher, lander, and zero-emission aviation space. Contributed to the Python tooling for parametric pump geometry design and optimization.

Flux & Flow AS (present)

Independent consulting through my own firm since 2023 — continuing in aerospace with propellant systems and pump design before returning to parallel architecture, RTL, verification and programming for GPUs.

How I Work

I partner with teams on a contract basis, billed hourly. I typically structure my work around 2-to-6-month dedicated phases. This provides the runway to deeply integrate with your architecture, solve fundamental structural bottlenecks, and execute a clean handover.

I'm available for remote work globally, or on-site in the Berlin area. Happy to start with a short call to see if it's a fit.

Ready to talk?

If you're designing hardware — or developing software for a system with massively out-of-order parallel compute and coherent multi-level memory hierarchies — let's talk.

Get in Touch