{
 "cells": [
  {
   "cell_type": "markdown",
   "id": "a7000001",
   "metadata": {},
   "source": [
    "# Compute Shaders\n",
    "\n",
    "WebGPU can run **compute shaders** on the GPU for general-purpose computation\n",
    "(currently in single precision only; WGSL has no 64-bit float type). This is useful for\n",
    "data-parallel algorithms that don't need to render anything."
   ]
  },
  {
   "cell_type": "markdown",
   "id": "a7000002",
   "metadata": {},
   "source": [
    "## Vector addition\n",
    "\n",
    "A minimal example: add two float arrays on the GPU and read the result back."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "a7000003",
   "metadata": {},
   "outputs": [],
   "source": [
    "import numpy as np\n",
    "import webgpu.jupyter  # sets up browser connection for GPU access\n",
    "from webgpu.utils import (\n",
    "    get_device, buffer_from_array, uniform_from_array,\n",
    "    BufferBinding, UniformBinding, BufferUsage,\n",
    "    run_compute_shader, read_buffer,\n",
    ")\n",
    "\n",
    "device = get_device()\n",
    "\n",
    "a = np.array([1, 2, 3], dtype=np.float32)\n",
    "b = np.array([4, 5, 6], dtype=np.float32)\n",
    "\n",
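    "N = a.size\n",
    "mem_size = a.nbytes  # output size in bytes (N * 4 for float32)\n",
    "\n",
    "# Input buffers, created and filled from the numpy arrays\n",
    "a_gpu = buffer_from_array(a)\n",
    "b_gpu = buffer_from_array(b)\n",
    "# Output buffer: STORAGE so the shader can write it, COPY_SRC so it can be read back\n",
    "res_gpu = device.createBuffer(mem_size, BufferUsage.STORAGE | BufferUsage.COPY_SRC)\n",
    "# Problem size N, passed as a uniform for the bounds check in the shader\n",
    "uniform_N = uniform_from_array(np.array([N], dtype=np.uint32))\n",
    "\n",
    "# Slot numbers must match the @binding(...) indices in the WGSL code below\n",
    "bindings = [\n",
    "    BufferBinding(101, a_gpu),\n",
    "    BufferBinding(102, b_gpu),\n",
    "    BufferBinding(103, res_gpu, read_only=False),\n",
    "    UniformBinding(104, uniform_N),\n",
    "]\n",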
    "\n",
    "shader_code = \"\"\"\n",
    "@group(0) @binding(101) var<storage> vec_a : array<f32>;\n",
    "@group(0) @binding(102) var<storage> vec_b : array<f32>;\n",
    "@group(0) @binding(103) var<storage, read_write> vec_res : array<f32>;\n",
    "@group(0) @binding(104) var<uniform> N : u32;\n",
    "\n",
    "@compute @workgroup_size(256, 1, 1)\n",
    "fn main(@builtin(global_invocation_id) gid: vec3<u32>) {\n",
    "    let tid = gid.x;\n",
    "    if (tid < N) {\n",
    "        vec_res[tid] = vec_a[tid] + vec_b[tid];\n",
    "    }\n",
    "}\n",
    "\"\"\"\n",
    "\n",
    "# Dispatch ceil(N / 256) workgroups so that every element gets one thread\n",
    "run_compute_shader(shader_code, bindings, n_workgroups=((N + 255) // 256, 1, 1))\n",
    "result = read_buffer(res_gpu, np.float32)\n",
    "print(f\"a = {a}\")\n",
    "print(f\"b = {b}\")\n",
    "print(f\"a + b = {result}\")"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "a7000004",
   "metadata": {},
   "source": [
    "## Key concepts\n",
    "\n",
    "### Workgroups and threads\n",
    "\n",
    "- `@workgroup_size(256, 1, 1)` defines 256 threads per workgroup.\n",
    "- `n_workgroups` sets how many workgroups to dispatch; here `ceil(N / 256)`, computed as `(N + 255) // 256` in integer arithmetic.\n",
    "- Each thread gets a unique `global_invocation_id` (`gid.x` = workgroup index × 256 + local index).\n",
    "- The `if (tid < N)` guard prevents out-of-bounds access when N isn't a multiple of 256.\n",
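    "\n",
    "The index arithmetic can be modelled on the CPU. The sketch below is a sequential NumPy emulation of the dispatch that assumes nothing beyond the formulas above; `n_workgroups_for` and `emulate_vector_add` are hypothetical helpers, not part of `webgpu.utils`, and a real GPU runs the invocations in parallel and in no guaranteed order.\n",
    "\n",
    "```python\n",
    "import numpy as np\n",
    "\n",
    "WORKGROUP_SIZE = 256  # matches @workgroup_size(256, 1, 1)\n",
    "\n",
    "def n_workgroups_for(n, wg_size=WORKGROUP_SIZE):\n",
    "    # Integer form of ceil(n / wg_size), as in the dispatch call\n",
    "    return (n + wg_size - 1) // wg_size\n",
    "\n",
    "def emulate_vector_add(a, b, wg_size=WORKGROUP_SIZE):\n",
    "    n = a.size\n",
    "    res = np.zeros_like(a)\n",
    "    for workgroup_id in range(n_workgroups_for(n, wg_size)):\n",
    "        for local_id in range(wg_size):\n",
    "            gid_x = workgroup_id * wg_size + local_id  # global_invocation_id.x\n",
    "            if gid_x < n:  # same guard as `if (tid < N)` in the shader\n",
    "                res[gid_x] = a[gid_x] + b[gid_x]\n",
    "    return res\n",
    "\n",
    "a = np.array([1, 2, 3], dtype=np.float32)\n",
    "b = np.array([4, 5, 6], dtype=np.float32)\n",
    "print(n_workgroups_for(a.size))  # 1: one workgroup, 253 of its threads do nothing\n",
    "print(n_workgroups_for(1000))    # 4: 1024 threads for 1000 elements\n",
    "print(emulate_vector_add(a, b))  # [5. 7. 9.]\n",
    "```\n",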
    "\n",
    "### Buffer types\n",
    "\n",
    "| Function | Usage | Description |\n",
    "|----------|-------|-------------|\n",
    "| `buffer_from_array(arr)` | `STORAGE \\| COPY_SRC` | Storage buffer initialized from a numpy array (typically bound read-only, as an input) |\n",
    "| `device.createBuffer(size, usage)` | manual | Empty buffer of `size` bytes; use `STORAGE \\| COPY_SRC` for an output that the shader writes and the CPU reads back |\n",
    "| `uniform_from_array(arr)` | `UNIFORM` | Small constant data (e.g. problem size N) |\n",
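    "\n",
    "Usage flags are bit flags combined with `|`. As a sketch, the class below mirrors a subset of the `GPUBufferUsage` bit values from the WebGPU specification; it is an assumption that `webgpu.utils.BufferUsage` uses the same values, and `is_valid_usage` is a hypothetical helper. It encodes the spec rule behind the `COPY_SRC` requirement for read-back: a buffer with `MAP_READ` may otherwise only have `COPY_DST`, so a storage buffer can never be mapped for reading directly.\n",
    "\n",
    "```python\n",
    "from enum import IntFlag\n",
    "\n",
    "class GPUBufferUsage(IntFlag):\n",
    "    # Subset of the bit values defined by the WebGPU specification\n",
    "    MAP_READ = 0x0001\n",
    "    MAP_WRITE = 0x0002\n",
    "    COPY_SRC = 0x0004\n",
    "    COPY_DST = 0x0008\n",
    "    UNIFORM = 0x0040\n",
    "    STORAGE = 0x0080\n",
    "\n",
    "READ_OK = int(GPUBufferUsage.MAP_READ | GPUBufferUsage.COPY_DST)\n",
    "WRITE_OK = int(GPUBufferUsage.MAP_WRITE | GPUBufferUsage.COPY_SRC)\n",
    "\n",
    "def is_valid_usage(usage):\n",
    "    u = int(usage)\n",
    "    if u == 0:\n",
    "        return False  # a buffer needs at least one usage flag\n",
    "    if u & GPUBufferUsage.MAP_READ and u & ~READ_OK:\n",
    "        return False  # MAP_READ may only be combined with COPY_DST\n",
    "    if u & GPUBufferUsage.MAP_WRITE and u & ~WRITE_OK:\n",
    "        return False  # MAP_WRITE may only be combined with COPY_SRC\n",
    "    return True\n",
    "\n",
    "output_usage = GPUBufferUsage.STORAGE | GPUBufferUsage.COPY_SRC    # result buffer\n",
    "staging_usage = GPUBufferUsage.MAP_READ | GPUBufferUsage.COPY_DST  # mappable copy for read-back\n",
    "print(hex(output_usage))                # 0x84\n",
    "print(is_valid_usage(output_usage))     # True\n",
    "print(is_valid_usage(staging_usage))    # True\n",
    "print(is_valid_usage(GPUBufferUsage.STORAGE | GPUBufferUsage.MAP_READ))  # False\n",
    "```\n",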
    "\n",
    "### Bindings\n",
    "\n",
    "- `BufferBinding(slot, buf)` — read-only `var<storage>` by default.\n",
    "- `BufferBinding(slot, buf, read_only=False)` — read-write `var<storage, read_write>`.\n",
    "- `UniformBinding(slot, buf)` — `var<uniform>`.\n",
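    "\n",
    "The correspondence in the list above can be written out as a small hypothetical helper, `wgsl_binding_decl`, which is illustrative only and not part of `webgpu.utils`. It renders the WGSL declaration that pairs with each binding kind; for the four bindings of the vector-addition example it reproduces the declarations at the top of that shader.\n",
    "\n",
    "```python\n",
    "def wgsl_binding_decl(slot, name, kind, read_only=True, wgsl_type='array<f32>', group=0):\n",
    "    # Storage bindings default to read-only access, as var<storage> does in WGSL\n",
    "    if kind == 'storage':\n",
    "        space = 'storage' if read_only else 'storage, read_write'\n",
    "    elif kind == 'uniform':\n",
    "        space = 'uniform'  # uniforms are always read-only in the shader\n",
    "    else:\n",
    "        raise ValueError(f'unknown binding kind: {kind}')\n",
    "    return f'@group({group}) @binding({slot}) var<{space}> {name} : {wgsl_type};'\n",
    "\n",
    "decls = [\n",
    "    wgsl_binding_decl(101, 'vec_a', 'storage'),\n",
    "    wgsl_binding_decl(102, 'vec_b', 'storage'),\n",
    "    wgsl_binding_decl(103, 'vec_res', 'storage', read_only=False),\n",
    "    wgsl_binding_decl(104, 'N', 'uniform', wgsl_type='u32'),\n",
    "]\n",
    "for decl in decls:\n",
    "    print(decl)\n",
    "```\n",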
    "\n",
    "### Reading results\n",
    "\n",
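    "`read_buffer(gpu_buf, dtype)` copies data from the GPU back to the CPU and returns it as a numpy array.\n",
    "The buffer must have `COPY_SRC` usage: WebGPU does not allow a storage buffer to be mapped for reading directly, so the data has to be copied into a separate mappable buffer first, and that copy requires `COPY_SRC` on the source."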
   ]
  }
 ],
 "metadata": {
  "kernelspec": {
   "display_name": "Python 3 (ipykernel)",
   "language": "python",
   "name": "python3"
  },
  "language_info": {
   "name": "python",
   "version": "3.13.11"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 5
}