Changelog

SlangPy uses a semantic versioning policy for its API.

Version 0.42.0 (May 28, 2026)

Add enable_cuda_launch_from_gfx and enable_ray_tracing device options for Vulkan devices. These allow applications to opt out of CUDA launch and ray-tracing extensions that can interfere with concurrent cuDNN usage on some driver/GPU pairs.
Add the free-standing device API for creating and managing resources without going through the legacy object-oriented Device wrappers. (PR #960)
Add external ImGui draw data rendering, imgui_bundle integration, docking support, and UI/window improvements. (PR #858, PR #944, PR #949, PR #957)
Add more input event bindings for tests and scripted event creation. (PR #952)
Add hit_group_names to ray_tracing() and fix ray-tracing pipeline entry point setup. (PR #975, PR #974)
Add support for preludes in generated kernels. (PR #980)
Automatically marshal Python objects that expose get_this(). (PR #961)
Support structured numpy dtypes in NDBuffer and Tensor. (PR #636, PR #907)
Add generic bool reflection support. (PR #978)
Expose function user attributes in reflection. (PR #983)
Update slang-rhi and expose callback and feature-list APIs, CUDA fp8 reporting, Vulkan native texture handle support, barrier fixes, acceleration-structure fixes, task-pool improvements, and optional Vulkan extension gating. (PR #988, PR #973, PR #951)
Add cursor reinterpretation support and allow reading pointer values as uint. (PR #789, PR #817)
Add ASTC format support to the format enumeration and info table. (PR #953)
Improve SHA1 performance. (PR #971)
Add build metrics tooling and precompiled-header build optimizations. (PR #969)
Fix SlangSession::load_module_from_source. (PR #966)
Fix handling of layout exceptions, overloads that return None from reflection, generic parameter forward references, and generated Slang address-of syntax. (PR #956, PR #976, PR #945, PR #955)
Fix flip_y producing non-contiguous arrays in load_buffer_data_from_image. (PR #937)
Tweak the torch fallback path. (PR #946)
Fix macOS and Windows compiler warnings and compatibility issues. (PR #911, PR #947)
Fix Linux aarch64 wheel builds under QEMU. (PR #942)
Update GitHub Actions, add remote formatting and pre-commit acknowledgement workflows, and configure CodeRabbit review behavior. (PR #964, PR #981, PR #982, PR #984, PR #985, PR #972)
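The flip_y fix listed above touches on a general NumPy behaviour: flipping an array along an axis returns a negatively strided view rather than a copy, so the result is not C-contiguous. The following is a minimal sketch using plain NumPy only. It does not call load_buffer_data_from_image, and the array shape is an arbitrary stand-in for image data.

```python
import numpy as np

# A small stand-in for image data: height x width x channels.
image = np.arange(2 * 3 * 4, dtype=np.float32).reshape(2, 3, 4)

# Flipping vertically returns a view with a negative stride on axis 0,
# so the result is no longer C-contiguous.
flipped = np.flipud(image)
print(flipped.flags["C_CONTIGUOUS"])  # False

# Copying into a fresh buffer restores a contiguous layout with the same values.
contiguous = np.ascontiguousarray(flipped)
print(contiguous.flags["C_CONTIGUOUS"])  # True
```

Code that hands raw memory to a GPU upload path generally needs the contiguous copy, since a strided view does not describe a single packed block of bytes.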

Version 0.41.0 (April 15, 2026)

Rewrite Tensors and remove NDBuffer in favour of a unified Tensor type. (PR #697)
Kernel generation overhaul: Rewrote kernel generation with direct binding, entry point arguments, and removal of trampoline functions for cleaner and more efficient generated shaders. (PR #863, PR #870, PR #876, PR #879)
Move cached function call path from Python to C++ for significantly reduced per-call overhead. (PR #869)
Native PyTorch autograd integration: Full native torch autograd support with retain_graph, proper VRAM lifecycle management, and torch.nn.parameter.Parameter compatibility. (PR #816, PR #781, PR #921, PR #891)
CUDA performance optimization: Reduced CUDA context management overhead by ~20× by removing per-call context push/pop from slang-rhi. When using PyTorch interop, the shared primary context is already set by PyTorch, so no user action is typically required. For edge cases, new APIs are exposed:
- device.set_cuda_context_current() - Set context for this thread (multi-GPU, multi-threading)
- device.cuda_context_scope() - Context manager for temporary context switching
(PR #774)
Dispatch hot path optimizations: Eliminate heap allocations from cached dispatch, pack/unpack optimization, optimized value types, explicit shader object binding, block allocator, cached device addresses, short vector for shader object refs, and optimized uniform setting of tensors. (PR #872, PR #815, PR #814, PR #812, PR #707, PR #709, PR #708, PR #741, PR #712)
Add DiffTensorView<T> and TensorView<T> support in slangpy with _threadcount and float<N> support. (PR #775, PR #818)
Add loadOnce / loadUniform to DiffTensor for optimized backward pass memory access. (PR #910)
Support reinterpreting torch.Tensor as Tensor<StructType, N> for structured GPU data. (PR #906)
Add torch.bool support for TensorView. (PR #898)
PyTorch interop optimizations including faster numpy array detection and optimized tensor marshalling. (PR #759, PR #802)
Add SlangSession::compose_modules API for programmatic module composition. (PR #894)
Add Bitmap::resample() functions and reconstruction filters. (PR #926)
Add sample function to Tensor. (PR #809)
Support for combined texture/sampler descriptor handles. (PR #765)
Add TextureLoader::load_texture overloads for multiple options and format_callback for texture conversion. (PR #767, PR #737)
Add support for specifying sampler when creating textures and texture views (CUDA). (PR #748)
Add enable_experimental_features option to SlangCompilerOptions. (PR #771)
Support generic entrypoints in the functional API. (PR #670)
Cooperative Vector improvements. (PR #699)
Complete slangpy matrix multiplication support. (PR #674)
Extend Window properties for resizing and positioning. (PR #698)
Add DescriptorHandle default constructor. (PR #897)
Add write function for binding. (PR #893)
Add [Differentiable] to getters so they satisfy differentiability constraints for interface requirements. (PR #895)
Add spaceship operator (<=>) for quaternion, matrix, and vector types. (PR #927)
Add std::hash specializations for vector, matrix, and quaternion types. (PR #889, PR #888)
Add comparator to TypeConformances. (PR #871)
Add SGL_ENUM_FLAGS_INFO for improved enum flag introspection. (PR #932)
Expose debug options in device constructor. (PR #710)
Configure the SPIRV_DIS downstream compiler path. (PR #701)
Add Aftermath flag for GPU crash debugging on supported platforms. (PR #785)
Improve static_vector and short_vector containers. (PR #752)
Crashpad integration for automated crash reporting. (PR #726, PR #729)
Initialize logger on first use to avoid initialization order issues. (PR #931)
Reduce logging output for cleaner runtime experience. (PR #890)
Filter unicode in source files for broader platform compatibility. (PR #930)
Wrap remaining Slang API calls with SGL_CATCH_INTERNAL_SLANG_ERROR for consistent error handling. (PR #857)
Fix scalar DiffPair backward pass codegen. (PR #917)
Fix slangpy.Tensor backward pass through DiffTensorView. (PR #920)
Fix crash and incorrect exception with null gradients. (PR #882)
Fix zero-size dispatch causing CUDA SIGABRT. (PR #905)
Fix array-of-vector return types for numpy and torch. (PR #873)
Fix array-type returns. (PR #676)
Fix type resolution for arrays of StructuredBuffer parameters. (PR #792)
Fix Texture3D parameters failing with “invalid dimensionality 1”. (PR #754)
Fix float3 alignment bug on Metal for gradient accumulation. (PR #713)
Fix Blitter and module name issues in the presence of multiple Blitters. (PR #877, PR #878)
Fix blit function to use destination texture size for dispatch size calculations. (PR #669)
Fix KeyCode WORLD_1 / WORLD_2 in Python bindings. (PR #677)
Fix LMDBCache eviction and related cache issues. (PR #739, PR #743)
Fix ShaderCursor::set to be const. (PR #764)
Fix like functions on Tensor to correctly copy usage and other properties. (PR #880)
Fix torch bridge copy to/from buffer functions in fallback mode. (PR #794)
Accept tensors with null data pointers. (PR #675)
Fix handling crashpad reports on POSIX. (PR #734)
Update Slang to version 2026.5.2. (PR #903, PR #856, PR #813, PR #796, PR #772, PR #745)
Update slang-rhi submodule with PyTorch-style caching allocator and other improvements. (PR #887, PR #798, PR #747, PR #705)
Update nanobind. (PR #700)
Add wheels-dev workflow for dev/release wheel publishing to internal Artifactory. (PR #861, PR #849)
Fix Linux wheel builds for aarch64 and missing build dependencies. (PR #852, PR #851, PR #850)
Support cross-repo CI testing from Slang PRs. (PR #780)

This version carries some breaking changes; please see the migration guide for details.

Version 0.40.1 (January 7, 2026)

Rebuild of 0.40.0 due to failed PyPI push.

Version 0.40.0 (January 7, 2026)

Update to Slang version 2025.24.3 with latest shader compilation improvements and bug fixes. (PR #678, PR #673)

Update slang-rhi submodule to latest version with improved stability and performance. (PR #682, PR #662, PR #659, PR #647)

Add Windows ARM64 platform support for improved cross-platform compatibility. (PR #567)

Introduce SGL_SLANG_VERSION CMake cache variable for better build configuration management. (PR #680)

Add float8 data type support for enhanced precision options in GPU computations. (PR #649)

Add rhi.slang module for improved hardware abstraction layer access. (PR #653)

Significant refactor of type inference system for better handling of generics and complex types. (PR #652)

Refactor cooperative vector API for improved performance and usability. (PR #645)

Add support for assigning objects with to_cursor to cursor objects for enhanced data manipulation. (PR #651)

Fix Buffer::get_element() method for proper buffer element access. (PR #661)

Fix module linking to preserve module order when making links unique. (PR #657)

Fix mouse position inclusion in button events for improved UI interaction. (PR #660)

Sort EXR channels when writing via tinyexr for consistent image output format. (PR #531)

Move vcpkg buildtrees to build directory for cleaner project organization. (PR #650)

Disable compiler warnings for cleaner build output. (PR #656)

Fix incorrect Tensor constructor API documentation in autodiff examples. (PR #628)

Version 0.39.0 (November 17, 2025)

Update to Slang version 2025.22.1 with latest shader compilation improvements and bug fixes. (PR #642)
Add scalar and vector select intrinsic functions for conditional value selection. (PR #641)
Add support for precompiled modules to enable faster shader loading and compilation. (PR #637)
Update to Slang version 2025.22 with CUDA 12.2 support and improved platform compatibility. (PR #640)
Add separate module cache from shader cache for improved caching and compilation performance. (PR #635)
Add test for extension cache update issue to ensure proper module extension handling. (PR #631)
Add Texture::descriptor_handle getters based on default texture views for improved bindless texture support. (PR #627)
Update RayTracingPipelineFlags with new flag values for enhanced ray tracing configuration. (PR #634)
Update slang-rhi submodule to latest version with improved stability. (PR #633)
Add GitHub release upload capability to wheels workflow for automated release artifact distribution. (PR #618)

Version 0.38.1 (November 10, 2025)

Update to Slang version 2025.21.2 with latest shader compilation improvements and bug fixes.
Optimize PyTorch tensor marshalling to significantly reduce CPU overhead and kernel launch latency when using PyTorch tensors with SlangPy. (PR #625)
Fix AccelerationStructureBuildDescConverter for improved ray tracing acceleration structure handling. (PR #626)
Fix asmjit usage on older x86_64 processors by improving detection and fallback paths for instruction generation. (PR #624)
Verify wheel builds before upload to PyPI to improve package quality and reliability. (PR #623)
Sign versioned .so files for improved security and deployment. (PR #621)
Update to Slang version 2025.21.1 with additional improvements. (PR #620)
Update slang-rhi submodule to latest version with improved stability. (PR #619)
Update to Slang version 2025.21 with latest shader compilation improvements and bug fixes. (PR #615)
Update slang-rhi submodule to latest version with improved stability and performance. (PR #612, PR #596, PR #592, PR #579)
Add support for new acceleration structure types for improved ray tracing capabilities. (PR #607)
Implement initial capability support system for better hardware feature detection. (PR #598)
Add bindless configuration support for more flexible resource binding. (PR #597)
Add labels to SlangPy generated kernels for improved debugging and profiling. (PR #584)
Refactor UI API for better usability and consistency. (PR #591)
Add support for macOS file dialogs in UI components. (PR #568)
Replace BS::thread_pool with nanothread for improved threading performance. (PR #564)
Add ability to control per-thread printing for better debugging in multi-threaded scenarios. (PR #587)
Add handling of YA bitmaps (found in vMaterials) by extending support to RGBA format. (PR #588)
Update SlangPy for library rename and versioning improvements. (PR #606)
Fix texture subresource handling when pitches are not provided. (PR #586)
Fix blit functionality and improve reliability. (PR #593, PR #583)
Remove obsolete Slang math code for cleaner codebase. (PR #602)
Add setuptools to requirements for improved build compatibility. (PR #601)
Enable Linux aarch64 pip packaging support. (PR #549)
Improve test infrastructure with performance labels and PyTorch version locking. (PR #613, PR #611, PR #605)
Fix Slang compiler DLL copying for improved deployment. (PR #609)
Cleanup pathtracer example and improve code formatting standards. (PR #590, PR #589)

Version 0.38.0 (November 3, 2025)

Yanked due to twine check failures.

Version 0.37.0 (October 15, 2025)

Update to Slang version 2025.19 with latest shader compilation improvements and bug fixes. (PR #572, PR #560)
Update slang-rhi submodule to latest version with improved stability and bug fixes. (PR #569, PR #550, PR #541)
Add persistent shader cache implementation based on LMDB for improved compilation performance and caching across sessions. (PR #561, PR #555)
Implement string printing support in shaders for improved debugging capabilities. (PR #566)
Add support for calling interface parameters with implementing types. (PR #562)
Add nanothread library and improve threading support. (PR #563)
Fix import determinism to ensure consistent code generation for shader cache compatibility. (PR #565)
Fix texture loader for CUDA and improve platform compatibility. (PR #545, PR #552)
Fix compute blit functionality and various bug fixes. (PR #503, PR #546, PR #554, PR #553)

Version 0.36.0 (September 30, 2025)

Update to Slang version 2025.18 with latest shader compilation improvements and bug fixes.
Update slang-rhi submodule to latest version with improved dependency handling. (PR #533)

Version 0.35.0 (September 18, 2025)

Add initial support for ray tracing pipelines, enabling hardware-accelerated ray tracing workflows. (PR #502)
Update to latest Slang version (2025.17) with improved shader compilation and platform support. (PR #507)
Add helper function to create homogeneous 4x4 transformation matrices from 3x4 matrices. (PR #506)
Add new load_from_file and load_from_numpy functions for improved data loading workflows. (PR #513)
Fix hot reload functionality for built-in reflection data to ensure proper shader recompilation. (PR #514)
Fix memory stream loading issues and improve data loading reliability. (PR #513)
Rename getter methods throughout the API to follow consistent coding conventions. (PR #505)

Version 0.34.0 (September 9, 2025)

Add Device.report_heaps() method to query internal memory heap status and allocation information.
Update to latest Slang version (2025.16.0) with improved CUDA and Metal support. (PR #493)
Add GPU clock locking for consistent benchmark results and implement trimmed mean calculation for more accurate performance measurements. (PR #484, PR #480, PR #472)
Support passing call data as entry point parameters on CUDA for improved performance. (PR #481)
Fix multiple memory leaks related to Python object references and improve resource cleanup. (PR #488)
Add benchmark comparison and delta reporting functionality with GPU information in reports. (PR #471, PR #456)
Rename command_buffer to command_encoder for API consistency. (PR #487)
Add PassEncoder::write_timestamp and timestamp support in ComputeKernel::dispatch. (PR #473)
Optimize write_from_numpy performance with faster copy options. (PR #455)
Fix PyTorch examples and improve integration. (PR #459)
Add support for platform-specific test isolation via environment variables. (PR #478)
Fix module linking for layout when using link modules. (PR #449)
Add string conversion functions for slangpy types and improve debugging capabilities. (PR #463, PR #464)
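The trimmed mean mentioned above for benchmark measurements can be illustrated generically: sort the samples, drop a fraction from each end, and average the rest, so that a few outlier timings do not dominate the result. The sketch below shows the general technique only and is not SlangPy's benchmark code. The function name trimmed_mean, the 10% trim fraction, and the sample timings are all hypothetical.

```python
from statistics import fmean


def trimmed_mean(samples, trim_fraction=0.1):
    """Average samples after dropping trim_fraction of the values from each end."""
    if not 0 <= trim_fraction < 0.5:
        raise ValueError("trim_fraction must be in [0, 0.5)")
    ordered = sorted(samples)
    if not ordered:
        raise ValueError("samples must not be empty")
    k = int(len(ordered) * trim_fraction)  # number of values dropped per side
    return fmean(ordered[k:len(ordered) - k])


# Hypothetical timings in milliseconds, with one slow outlier (9.0).
timings_ms = [1.0, 1.1, 0.9, 1.0, 1.2, 1.0, 0.8, 1.1, 1.0, 9.0]
print(trimmed_mean(timings_ms))
```

With ten samples and a 10% trim, the smallest and largest values are discarded before averaging, so the 9.0 ms outlier no longer skews the reported figure.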

Version 0.33.1 (August 25, 2025)

Include the missing Slang binary file into the package. (PR #445)
Introduce benchmark plugin and testing infrastructure with MongoDB integration for automated performance tracking. (PR #452)
Add support for bindless storage buffers in GPU abstraction layer. (PR #421).
Fix copy_from_torch() for CUDA devices and resolve PyTorch integration issues. (PR #391).
Introduce unified slangpy.testing module consolidating all testing utilities and pytest plugin system. (PR #448).
Force release all slang-rhi resources during shutdown to prevent memory leaks and segfaults on Linux. (PR #426).
Rename DeviceResource to DeviceChild for consistency with slang-rhi. (PR #425).
Enable more tests across platforms: Linux, CUDA, and Metal support improvements. (PR #429).
Fix race condition in hot reload test and improve shader change detection. (PR #433).
Force unroll small fixed size loops and globally disable warning 30856 for better compilation. (PR #437).

Version 0.33.0 (August 12, 2025)

Update to slang version 2025.14.3. (PR #409).
Fix tensor alignment issue when copying data to GPU tensors with vector element types. Metal platform now handles vector alignment correctly to match other platforms. (PR #418).
Update samples. (PR #413).

Version 0.32.0 (August 8, 2025)

Update to slang version 2025.14.
Improve CUDA support.
Improve Metal support.
Improve PyTorch support. (PR #362).
Add support for pointers. (PR #323, PR #326).
Add SGL_SLANG_DEBUG_INFO cmake variable to enable downloading Slang debug info (enabled by default). (PR #296).
Add sgl::CommandEncoder::generate_mips() (slangpy.CommandEncoder.generate_mips()) to generate mipmaps for textures. (PR #293).
Add optional _append_to argument to slangpy call functions to append commands to an existing command encoder. (PR #287).
Allow creating Bitmap from non-contiguous arrays. (PR #282).

Version 0.31.0 (June 5, 2025)

Update to slang version 2025.10.1.
Add support for vectorizing against Python lists.
Make NDBuffer and Tensor empty / zeros APIs consistent.
Add load_from_image for NDBuffer and Tensor.
Fix typings for float2x3, float3x2, float4x2 and float4x3.

Version 0.30.0 (May 27, 2025)

Update slang-rhi to latest version. Improve CUDA error reporting. Improve debug marker support and add WinPixEventRuntime. Fix resource lifetime tracking for entry point arguments. (PR #236).

Version 0.29.0 (May 22, 2025)

Update slang-rhi to latest version. Make enum infos constexpr. (PR #234).
Fix sgl::Feature (slangpy.Feature) to include missing value. (PR #233).
Fix registered matrix types in PYTHON_TYPES. (PR #232).

Version 0.28.0 (May 21, 2025)

Load PyTorch module lazily to avoid overhead when PyTorch is not used. (PR #184).
Improve warning when tev image viewer is not running. (PR #216).
Report correct LUID in sgl::DeviceInfo::adapter_luid (slangpy.DeviceInfo.adapter_luid). (PR #215).

Version 0.27.0 (May 9, 2025)

Package and distribute pytest tests. Fix deploying .pyi files in wheels + other minor fixes. (PR #197).
Introduce basic support for bindless textures and samplers. Currently only supported on D3D12. Add sgl::Feature::bindless (slangpy.Feature.bindless) to detect bindless support. Add sgl::DescriptorHandle (slangpy.DescriptorHandle) to represent bindless descriptor handles. Add sgl::Sampler::descriptor_handle() (slangpy.Sampler.descriptor_handle) to get the descriptor handle for a sampler. Add sgl::Texture::descriptor_handle_ro() (slangpy.Texture.descriptor_handle_ro) to get the read-only descriptor handle for a texture. Add sgl::Texture::descriptor_handle_rw() (slangpy.Texture.descriptor_handle_rw) to get the read-write descriptor handle for a texture. (PR #196).
Rename sgl::Struct to sgl::DataStruct to match slangpy.DataStruct. Rename sgl::StructConverter to sgl::DataStructConverter and slangpy.StructConverter to slangpy.DataStructConverter. (PR #185).

Version 0.26.0

Port samples to use new combined SlangPy/SGL API
CUDA and Metal fixes
Initial deployment of wheels for macOS

Version 0.25.0

Fix deploying slangpy shader files

Version 0.24.0

Merge SGL (https://github.com/shader-slang/sgl) into SlangPy.

Version 0.23.0

Require SGL v0.15.0
Refactor of NDBuffer and Tensor to share some underlying type
NDBuffer and Tensor support indexing

Version 0.22.0

Require new SGL v0.14.0 with switch to Slang-RHI

Version 0.21.1

Fix to numpy version requirement
Fixes to examples
Add neural network example
Require SGL v0.13.1

Version 0.21.0

Full Jupyter notebook support
Lots of fixes for edge-case hot reload crashes
Significantly more robust wang hash and rand float generators
Direct return of structs from scalar calls
Add diff splatting sample
Fix for rare issue involving lookup order of generic functions vs generic types
Require SGL v0.13.0

Version 0.20.1

Fix scalar wang-hash arg types

Version 0.20.0

Add SDF example
Transpose vector coordinates

Version 0.19.5

Documentation for generators
Extra fixes for grid

Version 0.19.4

Fix grid issue

Version 0.19.3

Update SGL -> 0.12.4
Significant improvements to generator types
Support textures as output type

Version 0.19.2

Update SGL -> 0.12.3
Better error messages during generation
Fix corrupt error tables
Restore detailed error information during dispatch

Version 0.19.1

Update SGL -> 0.12.2
Fix major issue with texture transposes

Version 0.19.0

Add experimental grid type

Version 0.18.2

Update SGL -> 0.12.1
Rename from_numpy to buffer_from_numpy

Version 0.18.1

Fix Python 3.9 typing

Version 0.18.0

Long file temp filenames fix
Temp fix for resolution of types that involve generics in multiple files
Support passing 1D NDBuffer to structured buffer
Fix native buffer not being passed to bindings
Missing slang field check
Avoid synthesizing store methods for non-written nested types

Version 0.17.0

Update to latest nv-sgl with CoopVec support
Native tensor implementation
Linux crash fix

Version 0.16.0

Native texture and structured buffer implementations
Native function dispatches
Lots of bug fixes

Version 0.15.2

Correctly package slang files in wheel

Version 0.15.0

Native buffer takes full reflection layout
Add uniforms + cursor api to native buffer
Update required version of nv-sgl to 0.9.0

Version 0.14.0

Update required version of nv-sgl to 0.8.0
Substantial native + python optimizations

Version 0.13.0

Update required version of nv-sgl to 0.7.0
Native SlangPy backend re-enabled
Conversion of NDBuffer to native code
PyTorch integration refactor

Version 0.12.0

Update required version of nv-sgl to 0.6.2
Re-enable broken Vulkan tests

Version 0.12.0

Update required version of nv-sgl to 0.6.1

Version 0.10.0

Initial test release