{ "cells": [ { "cell_type": "markdown", "metadata": {}, "source": [ "# Compute Shader\n", "\n", "In this tutorial, we learn how to run simple compute shaders.\n", "\n", "We start by importing `slangpy` and `numpy`." ] }, { "cell_type": "code", "execution_count": 1, "metadata": {}, "outputs": [], "source": [ "import slangpy as spy\n", "import numpy as np" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Next, we create a `Device` instance.\n", "This object is used for creating and managing resources on the GPU." ] }, { "cell_type": "code", "execution_count": 2, "metadata": {}, "outputs": [], "source": [ "device = spy.Device()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Most objects in `slangpy` display useful information when printed:" ] }, { "cell_type": "code", "execution_count": 3, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Device(\n", " type = d3d12,\n", " adapter_name = \"NVIDIA GeForce RTX 4090\",\n", " adapter_luid = 00000000000000000000000000000000,\n", " enable_debug_layers = false,\n", " supported_shader_model = sm_6_7,\n", " shader_cache_enabled = false,\n", " shader_cache_path = \"\"\n", ")\n" ] } ], "source": [ "print(device)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "At a glance, we can see which underlying graphics API is being used, whether debug layers are enabled, the default shader model, and so on.\n", "\n", "Next, we write a simple slang compute kernel that adds two floating-point arrays. We mark our shader entry point using the `[shader(\"compute\")]` attribute. This allows the slang compiler to find the entry point by name.\n", "\n", "[comment]: <> (embed compute_shader.slang)\n", "```C#\n", "// compute_shader.slang\n", "\n", "[shader(\"compute\")]\n", "[numthreads(32, 1, 1)]\n", "void main(\n", " uint tid: SV_DispatchThreadID,\n", " uniform uint N,\n", " StructuredBuffer<float> a,\n", " StructuredBuffer<float> b,\n", " RWStructuredBuffer<float> c\n", ")\n", "{\n", " if (tid < N)\n", " c[tid] = a[tid] + b[tid];\n", "}\n", "```\n" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "We can load the shader program using `load_program`, passing in the shader module name and the entry point name.\n", "Once the program is loaded, we can create a new compute kernel from it." ] }, { "cell_type": "code", "execution_count": 4, "metadata": {}, "outputs": [], "source": [ "program = device.load_program(module_name=\"compute_shader.slang\", entry_point_names=[\"main\"])\n", "kernel = device.create_compute_kernel(program=program)" ] },
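{ "cell_type": "markdown", "metadata": {}, "source": [ "Before creating the buffers, it can be helpful to take a quick look at the kernel's reflection data, which we use in the next step to derive the element type of each buffer. The following cell is just an optional sketch that prints two of the reflected entry point parameters; the exact textual output depends on the slangpy version in use." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "# Optional: peek at the reflected entry point parameters that we use below\n", "# to create matching buffers. The printed representation may vary between\n", "# slangpy versions.\n", "print(kernel.reflection.main.a)\n", "print(kernel.reflection.main.c)" ] },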
{ "cell_type": "markdown", "metadata": {}, "source": [ "Next, we create the buffers to pass to our compute shader. Buffers `a` and `b` will be used as inputs only, while buffer `c` will be used as an output.\n", "We create all three buffers as _structured buffers_, using the kernel's reflection data to determine the size of each element in the buffer.\n", "Buffers `a` and `b` are initialized with linear sequences using `numpy.linspace`.\n", "Buffer `c` is not initialized, but we have to set its `usage` to `spy.BufferUsage.unordered_access` in order to allow GPU-side writes.\n" ] }, { "cell_type": "code", "execution_count": 5, "metadata": {}, "outputs": [], "source": [ "buffer_a = device.create_buffer(\n", " element_count=1024,\n", " struct_type=kernel.reflection.main.a,\n", " usage=spy.BufferUsage.shader_resource,\n", " data=np.linspace(0, 1, 1024, dtype=np.float32),\n", ")\n", "buffer_b = device.create_buffer(\n", " element_count=1024,\n", " struct_type=kernel.reflection.main.b,\n", " usage=spy.BufferUsage.shader_resource,\n", " data=np.linspace(1, 0, 1024, dtype=np.float32),\n", ")\n", "buffer_c = device.create_buffer(\n", " element_count=1024,\n", " struct_type=kernel.reflection.main.c,\n", " usage=spy.BufferUsage.unordered_access,\n", ")" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "We can now dispatch the compute kernel. We first specify the number of threads to run using `thread_count=[1024, 1, 1]`. This is automatically converted to a number of thread groups based on the thread group size specified in the shader (`[numthreads(32, 1, 1)]`). We pass the entry point parameters using additional `kwargs`." ] }, { "cell_type": "code", "execution_count": 6, "metadata": {}, "outputs": [], "source": [ "kernel.dispatch(thread_count=[1024, 1, 1], N=1024, a=buffer_a, b=buffer_b, c=buffer_c)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "After the dispatch, we can read back the contents of the `c` buffer into a numpy array and print it. Because `a` runs linearly from 0 to 1 and `b` from 1 to 0, every element of the sum is expected to be 1." ] }, { "cell_type": "code", "execution_count": 7, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "[1. 1. 1. ... 1. 1. 1.]\n" ] } ], "source": [ "data = buffer_c.to_numpy().view(np.float32)\n", "print(data)\n", "assert np.all(data == 1.0)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### See also\n", "\n", "- [slangpy.Device][1]\n", "- [slangpy.SlangModule][2]\n", "- [slangpy.ComputeKernel][3]\n", "\n", "[1]: ../api_reference.html#slangpy.Device\n", "[2]: ../api_reference.html#slangpy.SlangModule\n", "[3]: ../api_reference.html#slangpy.ComputeKernel" ] } ], "metadata": { "kernelspec": { "display_name": "sgl", "language": "python", "name": "python3" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.12.2" } }, "nbformat": 4, "nbformat_minor": 2 }