{ "cells": [ { "cell_type": "markdown", "id": "a7000001", "metadata": {}, "source": [ "# Compute Shaders\n", "\n", "WebGPU allows running **compute shaders** on the GPU for general-purpose computation\n", "(currently single precision only). This is useful for parallel algorithms that\n", "don't need to render anything." ] }, { "cell_type": "markdown", "id": "a7000002", "metadata": {}, "source": [ "## Vector addition\n", "\n", "A minimal example: add two float arrays on the GPU and read the result back." ] }, { "cell_type": "code", "execution_count": null, "id": "a7000003", "metadata": {}, "outputs": [], "source": [ "import numpy as np\n", "import webgpu.jupyter  # sets up browser connection for GPU access\n", "from webgpu.utils import (\n", "    get_device, buffer_from_array, uniform_from_array,\n", "    BufferBinding, UniformBinding, BufferUsage,\n", "    run_compute_shader, read_buffer,\n", ")\n", "\n", "device = get_device()\n", "\n", "a = np.array([1, 2, 3], dtype=np.float32)\n", "b = np.array([4, 5, 6], dtype=np.float32)\n", "\n", "N = a.size\n", "mem_size = a.size * a.itemsize\n", "\n", "a_gpu = buffer_from_array(a)\n", "b_gpu = buffer_from_array(b)\n", "res_gpu = device.createBuffer(mem_size, BufferUsage.STORAGE | BufferUsage.COPY_SRC)\n", "uniform_N = uniform_from_array(np.array([N], dtype=np.uint32))\n", "\n", "bindings = [\n", "    BufferBinding(101, a_gpu),\n", "    BufferBinding(102, b_gpu),\n", "    BufferBinding(103, res_gpu, read_only=False),\n", "    UniformBinding(104, uniform_N),\n", "]\n", "\n", "shader_code = \"\"\"\n", "@group(0) @binding(101) var<storage, read> vec_a : array<f32>;\n", "@group(0) @binding(102) var<storage, read> vec_b : array<f32>;\n", "@group(0) @binding(103) var<storage, read_write> vec_res : array<f32>;\n", "@group(0) @binding(104) var<uniform> N : u32;\n", "\n", "@compute @workgroup_size(256, 1, 1)\n", "fn main(@builtin(global_invocation_id) gid: vec3<u32>) {\n", "    let tid = gid.x;\n", "    if (tid < N) {\n", "        vec_res[tid] = vec_a[tid] + vec_b[tid];\n", "    }\n", "}\n", "\"\"\"\n", "\n", "run_compute_shader(shader_code, 
bindings, n_workgroups=((N + 255) // 256, 1, 1))\n", "result = read_buffer(res_gpu, np.float32)\n", "print(f\"a = {a}\")\n", "print(f\"b = {b}\")\n", "print(f\"a + b = {result}\")" ] }, { "cell_type": "markdown", "id": "a7000004", "metadata": {}, "source": [ "## Key concepts\n", "\n", "### Workgroups and threads\n", "\n", "- `@workgroup_size(256, 1, 1)` defines 256 threads per workgroup.\n", "- `n_workgroups` sets how many workgroups to dispatch, here `ceil(N / 256)`.\n", "- Each thread gets a unique `global_invocation_id` (`gid.x` = workgroup index × 256 + local index).\n", "- The `if (tid < N)` guard prevents out-of-bounds access when `N` isn't a multiple of 256.\n", "\n", "### Buffer types\n", "\n", "| Function | Usage | Description |\n", "|----------|-------|-------------|\n", "| `buffer_from_array(arr)` | `STORAGE \\| COPY_SRC` | Read-only storage buffer initialized from a NumPy array |\n", "| `device.createBuffer(size, usage)` | manual | Raw buffer; use `STORAGE \\| COPY_SRC` for read-write output |\n", "| `uniform_from_array(arr)` | `UNIFORM` | Small constant data (e.g. problem size `N`) |\n", "\n", "### Bindings\n", "\n", "- `BufferBinding(slot, buf)`: read-only `var<storage, read>` by default.\n", "- `BufferBinding(slot, buf, read_only=False)`: read-write `var<storage, read_write>`.\n", "- `UniformBinding(slot, buf)`: `var<uniform>`.\n", "\n", "### Reading results\n", "\n", "`read_buffer(gpu_buf, dtype)` copies data back from GPU to CPU as a NumPy array.\n", "The buffer must have `COPY_SRC` usage for this to work." ] } ], "metadata": { "kernelspec": { "display_name": "Python 3 (ipykernel)", "language": "python", "name": "python3" }, "language_info": { "name": "python", "version": "3.13.11" } }, "nbformat": 4, "nbformat_minor": 5 }