# webgpu-dawn

High-level, type-safe Haskell bindings to Google's Dawn WebGPU implementation.

This library enables portable GPU computing through a production-ready DSL designed for high-throughput inference (e.g., LLMs), targeting 300 tokens per second (TPS).
## ⚡ Core Design Principles
To achieve high performance and type safety, this library adheres to the following strict patterns:

- **Type-Safe Monadic DSL**: No raw shader strings. We use `ShaderM` for composability and type safety.
- **Natural Math & HOAS**: Standard operators (`+`, `*`) and Higher-Order Abstract Syntax (HOAS) for loops (`loop ... $ \i -> ...`).
- **Profile-Driven**: Performance tuning is based on Roofline analysis.
- **Async Execution**: Prefer `AsyncPipeline` to hide CPU latency and maximize GPU occupancy.
- **Hardware Acceleration**: Mandatory use of subgroup operations and F16 precision for heavy compute (MatMul/reduction).
## 📊 Profiling Workflow

We utilize a Profile-Driven Development (PDD) workflow to maximize throughput.
### 1. Standard Benchmarks & Roofline Analysis

Run the optimized benchmark to measure TFLOPS and check the Roofline classification (compute- vs. memory-bound).

```bash
# Run 2D block-tiling MatMul benchmark (FP32)
cabal run bench-optimized-matmul -- --size 4096 --iters 50
```
Output example:

```
[Compute] 137.4 GFLOPs
[Memory]  201.3 MB
[Status]  COMPUTE BOUND (limited by GPU FLOPs)
[Hint]    Use F16 and Subgroup Operations to break the roofline.
```
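The classification above follows the standard roofline model: a kernel is compute-bound when its arithmetic intensity (FLOPs per byte of memory traffic) exceeds the machine balance point, i.e. peak FLOP/s divided by peak bandwidth. A minimal sketch of that logic in plain Haskell (the GPU figures are hypothetical, and this is not the actual implementation in `WGSL.Analyze`):

```haskell
-- Roofline classification sketch (illustrative; not WGSL.Analyze itself).
data Bound = ComputeBound | MemoryBound deriving (Show, Eq)

classify
  :: Double  -- peak compute, FLOP/s
  -> Double  -- peak memory bandwidth, bytes/s
  -> Double  -- kernel FLOPs
  -> Double  -- kernel bytes moved
  -> Bound
classify peakFlops peakBw flops bytes
  | flops / bytes >= peakFlops / peakBw = ComputeBound
  | otherwise                           = MemoryBound

main :: IO ()
main = do
  -- 4096^3 MatMul: 2*N^3 FLOPs; 3*N^2*4 bytes for FP32, ignoring cache reuse
  let n     = 4096 :: Double
      flops = 2 * n ** 3
      bytes = 3 * n ** 2 * 4
  -- Hypothetical GPU: 10 TFLOP/s peak, 400 GB/s bandwidth
  print (classify 10e12 400e9 flops bytes)  -- prints ComputeBound
```

At these sizes the arithmetic intensity (~683 FLOP/byte) is far above the balance point (25 FLOP/byte), which is why large MatMuls are compute-bound and benefit from F16 and subgroup operations rather than memory optimizations.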
### 2. Visual Profiling (Chrome Tracing)

Generate a trace file to visualize CPU/GPU overlap and kernel duration.

```bash
cabal run bench-optimized-matmul -- --size 4096 --trace
```

- **Load**: Open `chrome://tracing` or `ui.perfetto.dev`.
- **Analyze**: Import `trace.json` to identify gaps between kernel executions (CPU overhead).
### 3. Debugging

Use the GPU printf-style debug buffer to inspect values inside kernels.

```haskell
-- In the DSL:
debugPrintF "intermediate_val" val
```
## 🚀 Quick Start

### 1. High-Level API (Data Parallelism)

Zero boilerplate; ideal for simple map/reduce tasks.

```haskell
import WGSL.API
import qualified Data.Vector.Storable as V

main :: IO ()
main = withContext $ \ctx -> do
  input  <- toGPU ctx (V.fromList [1..100] :: V.Vector Float)
  result <- gpuMap (\x -> x * 2.0 + 1.0) input
  out    <- fromGPU' result
  print out
```
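Because `gpuMap` applies an ordinary Haskell function element-wise, a CPU reference computed with `V.map` is a convenient way to validate GPU output (a hedged sketch using only the `vector` package, independent of the GPU path):

```haskell
import qualified Data.Vector.Storable as V

-- CPU reference for the gpuMap example above: applying the same function
-- with V.map should produce the values the GPU returns.
expected :: V.Vector Float
expected = V.map (\x -> x * 2.0 + 1.0) (V.fromList [1..100])

main :: IO ()
main = print (V.toList (V.take 3 expected))  -- [3.0,5.0,7.0]
```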
### 2. Core DSL (Explicit Control)

Required for tuning shared memory, subgroups, and F16.

```haskell
import WGSL.DSL

shader :: ShaderM ()
shader = do
  input  <- declareInputBuffer  "in"  (TArray 1024 TF16)
  output <- declareOutputBuffer "out" (TArray 1024 TF16)
  -- HOAS loop: use the lambda argument 'i', NOT the string "i"
  loop 0 1024 1 $ \i -> do
    val <- readBuffer input i
    -- f16 literals for 2x throughput
    let res = val * litF16 2.0 + litF16 1.0
    writeBuffer output i res
```
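The reason the loop body takes a lambda argument rather than a variable name as a string is the essence of HOAS: the Haskell binder stands in for the generated WGSL index variable, so scoping mistakes become type errors. A miniature model of the idea (names and types here are illustrative, not the real `WGSL.DSL` internals):

```haskell
{-# LANGUAGE GeneralizedNewtypeDeriving #-}
import Control.Monad.Writer

-- Toy expression and shader monad: the shader is a writer of WGSL lines.
newtype Exp = Exp String

newtype ShaderM a = ShaderM (Writer [String] a)
  deriving (Functor, Applicative, Monad)

emit :: String -> ShaderM ()
emit s = ShaderM (tell [s])

-- HOAS loop: the body receives an *expression* for the index, so the
-- Haskell lambda binder replaces string-based variable references.
loopDemo :: Int -> Int -> (Exp -> ShaderM ()) -> ShaderM ()
loopDemo start end body = do
  emit ("for (var i = " ++ show start ++ "u; i < " ++ show end ++ "u; i++) {")
  body (Exp "i")  -- hand the bound index to the body
  emit "}"

render :: ShaderM () -> String
render (ShaderM w) = unlines (execWriter w)

main :: IO ()
main = putStr . render $
  loopDemo 0 1024 $ \(Exp i) ->
    emit ("out[" ++ i ++ "] = in[" ++ i ++ "] * 2.0;")
```

Running this prints a well-formed WGSL `for` loop; any attempt to use an index outside its lambda simply fails to type-check in Haskell.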
## 📚 DSL Syntax Cheatsheet

### Types & Literals

| Haskell Type | WGSL Type | Literal Constructor | Note |
| --- | --- | --- | --- |
| `Exp F32` | `f32` | `litF32 1.0` or `1.0` | Standard float |
| `Exp F16` | `f16` | `litF16 1.0` | Half precision (fast!) |
| `Exp I32` | `i32` | `litI32 1` or `1` | Signed int |
| `Exp U32` | `u32` | `litU32 1` | Unsigned int |
| `Exp Bool_` | `bool` | `litBool True` | Boolean |

Casting helpers: `i32(e)`, `u32(e)`, `f32(e)`, `f16(e)`
### Control Flow (HOAS)

```haskell
-- For loop
loop start end step $ \i -> do ...

-- If statement
if_ (val > 10.0)
  (do ... {- then block -} ...)
  (do ... {- else block -} ...)

-- Barrier
barrier  -- workgroupBarrier()
```
## 🧩 Kernel Fusion

For maximum performance, fuse multiple operations (Load -> Calc -> Store) into a single kernel to reduce global memory traffic.

```haskell
import WGSL.Kernel

-- Fuse: Load -> Process -> Store
let pipeline = loadK inBuf >>> mapK (* 2.0) >>> mapK relu >>> storeK outBuf

-- Execute inside a shader
unKernel pipeline i
```
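On the CPU, the same idea can be modeled with plain function composition: fusing two maps into one traversal avoids materializing the intermediate array, which is exactly the global-memory traffic a fused kernel saves. A minimal sketch (names are illustrative; this is not the library's `Kernel` type):

```haskell
-- CPU analogue of kernel fusion: composing the per-element functions
-- before traversing walks the data once, with no intermediate buffer.
relu :: Float -> Float
relu = max 0

unfused :: [Float] -> [Float]
unfused = map relu . map (* 2.0)  -- two conceptual passes over memory

fused :: [Float] -> [Float]
fused = map (relu . (* 2.0))      -- one pass, identical results

main :: IO ()
main = print (fused [-1.0, 0.5, 2.0])  -- [0.0,1.0,4.0]
```

On the GPU the payoff is larger than on the CPU: each eliminated pass removes a full round trip through global memory, which is usually the bottleneck for memory-bound kernels.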
## 📚 Architecture & Modules

### Execution Model (Latency Hiding)

To maximize GPU occupancy, encoding is separated from submission.

- `WGSL.Async.Pipeline`: Use for main loops. Allows the CPU to encode token N+1 while the GPU processes token N.
- `WGSL.Execute`: Low-level synchronous execution (primarily for debugging).
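The overlap can be sketched with ordinary Haskell concurrency (a CPU-only model of double buffering; the names and structure here are illustrative, not the actual `WGSL.Async.Pipeline` API):

```haskell
import Control.Concurrent (forkIO)
import Control.Concurrent.MVar
import Data.IORef

-- A one-slot MVar plays the role of the submission queue, so the encoder
-- can run at most one token ahead of the "GPU" thread: double buffering.
runPipelined :: [Int] -> IO [Int]
runPipelined tokens = do
  queue   <- newEmptyMVar
  done    <- newEmptyMVar
  results <- newIORef []
  -- "GPU" thread: drains submitted work in order; Nothing means shutdown
  _ <- forkIO $
    let drain = do
          mjob <- takeMVar queue
          case mjob of
            Nothing -> putMVar done ()
            Just t  -> do
              modifyIORef' results (++ [t * t])  -- "execute" token t
              drain
    in drain
  -- "CPU" side: submitting token n+1 only blocks while the slot is full,
  -- so encoding overlaps the previous token's execution
  mapM_ (putMVar queue . Just) tokens
  putMVar queue Nothing
  takeMVar done
  readIORef results

main :: IO ()
main = runPipelined [1, 2, 3] >>= print  -- [1,4,9]
```

The design choice this models is the key one: submission order (and thus result order) is preserved, while the producer and consumer run concurrently.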
### Module Guide

| Feature | Module | Description |
| --- | --- | --- |
| Subgroup Ops | `WGSL.DSL` | `subgroupMatrixLoad`, `mma`, `subgroupMatrixStore` |
| F16 Math | `WGSL.DSL` | `litF16`, `vec4<f16>` for 2x throughput |
| Structs | `WGSL.Struct` | Generic derivation for std430 layout compliance |
| Analysis | `WGSL.Analyze` | Roofline analysis logic |
## 📦 Installation

Pre-built Dawn binaries are downloaded automatically during installation.

```bash
cabal install webgpu-dawn
```
## License

MIT License - see the LICENSE file for details.

## Acknowledgments

- Dawn (Google): Core WebGPU runtime.
- gpu.cpp (Answer.AI): Inspiration for the high-level API design.
- GLFW: Window management.

Maintainer: Junji Hashimoto <junji.hashimoto@gmail.com>