Compute Shaders#
WebGPU allows running compute shaders on the GPU for general-purpose computation (currently single precision only). This is useful for parallel algorithms that don’t need to render anything.
Vector addition#
A minimal example: add two float arrays on the GPU and read the result back.
[1]:
import numpy as np
import webgpu.jupyter # sets up browser connection for GPU access
from webgpu.utils import (
get_device, buffer_from_array, uniform_from_array,
BufferBinding, UniformBinding, BufferUsage,
run_compute_shader, read_buffer,
)
device = get_device()
a = np.array([1, 2, 3], dtype=np.float32)
b = np.array([4, 5, 6], dtype=np.float32)
N = a.size
mem_size = a.size * a.itemsize
a_gpu = buffer_from_array(a)
b_gpu = buffer_from_array(b)
res_gpu = device.createBuffer(mem_size, BufferUsage.STORAGE | BufferUsage.COPY_SRC)
uniform_N = uniform_from_array(np.array([N], dtype=np.uint32))
bindings = [
BufferBinding(101, a_gpu),
BufferBinding(102, b_gpu),
BufferBinding(103, res_gpu, read_only=False),
UniformBinding(104, uniform_N),
]
shader_code = """
@group(0) @binding(101) var<storage> vec_a : array<f32>;
@group(0) @binding(102) var<storage> vec_b : array<f32>;
@group(0) @binding(103) var<storage, read_write> vec_res : array<f32>;
@group(0) @binding(104) var<uniform> N : u32;
@compute @workgroup_size(256, 1, 1)
fn main(@builtin(global_invocation_id) gid: vec3<u32>) {
let tid = gid.x;
if (tid < N) {
vec_res[tid] = vec_a[tid] + vec_b[tid];
}
}
"""
run_compute_shader(shader_code, bindings, n_workgroups=((N + 255) // 256, 1, 1))
result = read_buffer(res_gpu, np.float32)
print(f"a = {a}")
print(f"b = {b}")
print(f"a + b = {result}")
a = [1. 2. 3.]
b = [4. 5. 6.]
a + b = [5. 7. 9.]
Key concepts#
Workgroups and threads#
@workgroup_size(256, 1, 1)defines 256 threads per workgroup.n_workgroupssets how many workgroups to dispatch — hereceil(N / 256).Each thread gets a unique
global_invocation_id(gid.x= workgroup index × 256 + local index).The
if (tid < N)guard prevents out-of-bounds access when N isn’t a multiple of 256.
Buffer types#
Function |
Usage |
Description |
|---|---|---|
|
|
Read-only storage buffer initialized from a numpy array |
|
manual |
Raw buffer — use |
|
|
Small constant data (e.g. problem size N) |
Bindings#
BufferBinding(slot, buf)— read-onlyvar<storage>by default.BufferBinding(slot, buf, read_only=False)— read-writevar<storage, read_write>.UniformBinding(slot, buf)—var<uniform>.
Reading results#
read_buffer(gpu_buf, dtype) copies data back from GPU to CPU as a numpy array. The buffer must have COPY_SRC usage for this to work.