Changelog

SlangPy uses a semantic versioning policy for its API.
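
Strictly, SemVer 2.0.0 says anything may change in 0.y.z releases. In practice, a minor bump such as 0.40.1 to 0.41.0 is where breaking changes land. The sketch below flags an upgrade as potentially breaking under that convention, which is assumed here (it matches how tools such as Cargo treat 0.y releases). The function names are hypothetical and not part of SlangPy.

```python
def parse_version(text: str) -> tuple[int, int, int]:
    """Parse a 'MAJOR.MINOR.PATCH' string such as '0.41.0' into integers."""
    major, minor, patch = (int(part) for part in text.split("."))
    return major, minor, patch


def may_break(old: str, new: str) -> bool:
    """Return True if upgrading from `old` to `new` may include breaking changes.

    Assumption: while the major version is 0, a MINOR bump is treated as
    potentially breaking; from 1.0.0 onward, only a MAJOR bump is.
    """
    o, n = parse_version(old), parse_version(new)
    if n <= o:
        raise ValueError("new version must be newer than old version")
    if o[0] == 0 and n[0] == 0:
        return n[1] != o[1]
    return n[0] != o[0]


print(may_break("0.40.1", "0.41.0"))  # minor bump in 0.y.z -> True
print(may_break("0.40.0", "0.40.1"))  # patch bump -> False
```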

Version 0.41.0 (April 15, 2026)

  • Rewrite of Tensors and removal of NDBuffer in favor of a unified Tensor type. (PR #697)

  • Kernel generation overhaul: Rewrite kernel generation to use direct binding and entry point arguments, and remove trampoline functions, producing cleaner and more efficient generated shaders. (PR #863, PR #870, PR #876, PR #879)

  • Move cached function call path from Python to C++ for significantly reduced per-call overhead. (PR #869)

  • Native PyTorch autograd integration: Full native torch autograd support with retain_graph, proper VRAM lifecycle management, and torch.nn.parameter.Parameter compatibility. (PR #816, PR #781, PR #921, PR #891)

  • CUDA performance optimization: Reduced CUDA context management overhead by ~20× by removing per-call context push/pop from slang-rhi. When using PyTorch interop, the shared primary context is already set by PyTorch, so no user action is typically required. For edge cases, new APIs are exposed:

    • device.set_cuda_context_current() - Set context for this thread (multi-GPU, multi-threading)

    • device.cuda_context_scope() - Context manager for temporary context switching

    (PR #774)

  • Dispatch hot path optimizations: Eliminate heap allocations from cached dispatch, pack/unpack optimization, optimized value types, explicit shader object binding, block allocator, cached device addresses, short vector for shader object refs, and optimized uniform setting of tensors. (PR #872, PR #815, PR #814, PR #812, PR #707, PR #709, PR #708, PR #741, PR #712)

  • Add support for DiffTensorView<T> and TensorView<T> in slangpy, including _threadcount and float<N>. (PR #775, PR #818)

  • Add loadOnce / loadUniform to DiffTensor for optimized backward pass memory access. (PR #910)

  • Support reinterpreting torch.Tensor as Tensor<StructType, N> for structured GPU data. (PR #906)

  • Add torch.bool support for TensorView. (PR #898)

  • PyTorch interop optimizations including faster numpy array detection and optimized tensor marshalling. (PR #759, PR #802)

  • Add SlangSession::compose_modules API for programmatic module composition. (PR #894)

  • Add Bitmap::resample() functions and reconstruction filters. (PR #926)

  • Add sample function to Tensor. (PR #809)

  • Support for combined texture/sampler descriptor handles. (PR #765)

  • Add TextureLoader::load_texture overloads for multiple options and format_callback for texture conversion. (PR #767, PR #737)

  • Add support for specifying sampler when creating textures and texture views (CUDA). (PR #748)

  • Add enable_experimental_features option to SlangCompilerOptions. (PR #771)

  • Support generic entrypoints in the functional API. (PR #670)

  • Cooperative Vector improvements. (PR #699)

  • Complete slangpy matrix multiplication support. (PR #674)

  • Extend Window properties for resizing and positioning. (PR #698)

  • Add DescriptorHandle default constructor. (PR #897)

  • Add write function for binding. (PR #893)

  • Add [Differentiable] to getters so they satisfy differentiability constraints for interface requirements. (PR #895)

  • Add spaceship operator (<=>) for quaternion, matrix, and vector types. (PR #927)

  • Add std::hash specializations for vector, matrix, and quaternion types. (PR #889, PR #888)

  • Add comparator to TypeConformances. (PR #871)

  • Add SGL_ENUM_FLAGS_INFO for improved enum flag introspection. (PR #932)

  • Expose debug options in device constructor. (PR #710)

  • Configure the SPIRV_DIS downstream compiler path. (PR #701)

  • Add Aftermath flag for GPU crash debugging on supported platforms. (PR #785)

  • Improve static_vector and short_vector containers. (PR #752)

  • Crashpad integration for automated crash reporting. (PR #726, PR #729)

  • Initialize logger on first use to avoid initialization order issues. (PR #931)

  • Reduce logging output for cleaner runtime experience. (PR #890)

  • Filter unicode in source files for broader platform compatibility. (PR #930)

  • Wrap remaining Slang API calls with SGL_CATCH_INTERNAL_SLANG_ERROR for consistent error handling. (PR #857)

  • Fix scalar DiffPair backward pass codegen. (PR #917)

  • Fix slangpy.Tensor backward pass through DiffTensorView. (PR #920)

  • Fix crash and incorrect exception with null gradients. (PR #882)

  • Fix zero-size dispatch causing CUDA SIGABRT. (PR #905)

  • Fix array-of-vector return types for numpy and torch. (PR #873)

  • Fix array-type returns. (PR #676)

  • Fix type resolution for arrays of StructuredBuffer parameters. (PR #792)

  • Fix Texture3D parameters failing with “invalid dimensionality 1”. (PR #754)

  • Fix float3 alignment bug on Metal for gradient accumulation. (PR #713)

  • Fix Blitter and module name issues in the presence of multiple Blitters. (PR #877, PR #878)

  • Fix blit function to use destination texture size for dispatch size calculations. (PR #669)

  • Fix KeyCode WORLD_1 / WORLD_2 in Python bindings. (PR #677)

  • Fix LMDBCache eviction and related cache issues. (PR #739, PR #743)

  • Fix ShaderCursor::set to be const. (PR #764)

  • Fix like functions on Tensor to correctly copy usage and other properties. (PR #880)

  • Fix torch bridge copy to/from buffer functions in fallback mode. (PR #794)

  • Accept tensors with null data pointers. (PR #675)

  • Fix handling of Crashpad reports on POSIX. (PR #734)

  • Update Slang to version 2026.5.2. (PR #903, PR #856, PR #813, PR #796, PR #772, PR #745)

  • Update slang-rhi submodule with PyTorch-style caching allocator and other improvements. (PR #887, PR #798, PR #747, PR #705)

  • Update nanobind. (PR #700)

  • Add wheels-dev workflow for dev/release wheel publishing to internal Artifactory. (PR #861, PR #849)

  • Fix Linux wheel builds for aarch64 and missing build dependencies. (PR #852, PR #851, PR #850)

  • Support cross-repo CI testing from Slang PRs. (PR #780)
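
Conceptually, reinterpreting a torch.Tensor as Tensor<StructType, N> (PR #906) views the same bytes as an array of structs instead of copying them. The CPU-only analogy below shows that idea with Python's ctypes. It is not SlangPy's API, and the Particle layout is hypothetical. Real GPU layouts must also respect the target's alignment rules (compare the Metal float3 alignment fix in PR #713).

```python
import ctypes


class Particle(ctypes.Structure):
    # Hypothetical struct: a float3 position followed by a float weight (16 bytes).
    _fields_ = [("pos", ctypes.c_float * 3), ("w", ctypes.c_float)]


# A flat buffer of 8 floats, standing in for a float32 tensor of shape (2, 4).
flat = (ctypes.c_float * 8)(*[float(i) for i in range(8)])

# Reinterpret the same memory as 2 Particle records; no data is copied.
particles = (Particle * 2).from_buffer(flat)

print(ctypes.sizeof(Particle))                 # 16
print(list(particles[1].pos), particles[1].w)  # [4.0, 5.0, 6.0] 7.0

# Writes through the flat view are visible through the struct view.
flat[7] = 42.0
print(particles[1].w)                          # 42.0
```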

This version introduces breaking changes; please see the migration guide for details.
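
The two CUDA context APIs from PR #774 follow a familiar pattern. set_cuda_context_current() changes the calling thread's current context. cuda_context_scope() switches it temporarily and restores the previous one on exit. The pure-Python model below illustrates those save/restore semantics with a thread-local variable. FakeDevice and the string context handles are hypothetical stand-ins, not SlangPy's implementation, and no CUDA calls are made.

```python
import contextlib
import threading

# Per-thread "current context", mimicking how CUDA tracks a current context per CPU thread.
_state = threading.local()


def current_context():
    return getattr(_state, "ctx", None)


class FakeDevice:
    """Hypothetical stand-in for a device that owns one CUDA context."""

    def __init__(self, ctx_name: str):
        self.ctx = ctx_name

    def set_cuda_context_current(self):
        # Make this device's context current for the calling thread only.
        _state.ctx = self.ctx

    @contextlib.contextmanager
    def cuda_context_scope(self):
        # Save the previous context, switch, and always restore it on exit.
        previous = current_context()
        _state.ctx = self.ctx
        try:
            yield self
        finally:
            _state.ctx = previous


gpu0, gpu1 = FakeDevice("ctx0"), FakeDevice("ctx1")
gpu0.set_cuda_context_current()
with gpu1.cuda_context_scope():
    print(current_context())  # ctx1
print(current_context())      # ctx0

# Another thread starts without a current context of its own.
seen = []
t = threading.Thread(target=lambda: seen.append(current_context()))
t.start()
t.join()
print(seen)                   # [None]
```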

Version 0.40.1 (January 7, 2026)

  • Rebuild of 0.40.0 due to failed PyPI push.

Version 0.40.0 (January 7, 2026)

  • Update to Slang version 2025.24.3 with latest shader compilation improvements and bug fixes. (PR #678, PR #673)

  • Update slang-rhi submodule to latest version with improved stability and performance. (PR #682, PR #662, PR #659, PR #647)

  • Add Windows ARM64 platform support for improved cross-platform compatibility. (PR #567)

  • Introduce SGL_SLANG_VERSION CMake cache variable for better build configuration management. (PR #680)

  • Add float8 data type support for enhanced precision options in GPU computations. (PR #649)

  • Add rhi.slang module for improved hardware abstraction layer access. (PR #653)

  • Significant refactor of type inference system for better handling of generics and complex types. (PR #652)

  • Refactor cooperative vector API for improved performance and usability. (PR #645)

  • Add support for assigning objects with to_cursor to cursor objects for enhanced data manipulation. (PR #651)

  • Fix Buffer::get_element() method for proper buffer element access. (PR #661)

  • Fix module linking to preserve module order when making links unique. (PR #657)

  • Fix mouse position inclusion in button events for improved UI interaction. (PR #660)

  • Sort EXR channels when writing via tinyexr for consistent image output format. (PR #531)

  • Move vcpkg buildtrees to build directory for cleaner project organization. (PR #650)

  • Disable compiler warnings for cleaner build output. (PR #656)

  • Fix incorrect Tensor constructor API documentation in autodiff examples. (PR #628)
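
Making a list of linked modules unique without disturbing their order (PR #657) is the classic order-preserving deduplication problem. The minimal sketch below assumes modules are identified by hashable names. It is illustrative rather than SGL's code, and the module names are hypothetical.

```python
def unique_in_order(modules):
    """Drop duplicates while keeping the first occurrence of each module in place."""
    # dict preserves insertion order (Python 3.7+), so keys come out in first-seen order.
    return list(dict.fromkeys(modules))


links = ["utils", "core", "utils", "brdf"]
print(unique_in_order(links))  # ['utils', 'core', 'brdf']
# A set-based approach loses the original order, e.g. sorted(set(links))
# gives ['brdf', 'core', 'utils'].
```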

Version 0.39.0 (November 17, 2025)

  • Update to Slang version 2025.22.1 with latest shader compilation improvements and bug fixes. (PR #642)

  • Add scalar and vector select intrinsic functions for conditional value selection. (PR #641)

  • Add support for precompiled modules to enable faster shader loading and compilation. (PR #637)

  • Update to Slang version 2025.22 with CUDA 12.2 support and improved platform compatibility. (PR #640)

  • Add separate module cache from shader cache for improved caching and compilation performance. (PR #635)

  • Add test for extension cache update issue to ensure proper module extension handling. (PR #631)

  • Add Texture::descriptor_handle getters based on default texture views for improved bindless texture support. (PR #627)

  • Update RayTracingPipelineFlags with new flag values for enhanced ray tracing configuration. (PR #634)

  • Update slang-rhi submodule to latest version with improved stability. (PR #633)

  • Add GitHub release upload capability to wheels workflow for automated release artifact distribution. (PR #618)

Version 0.38.1 (November 10, 2025)

  • Update to Slang version 2025.21.2 with latest shader compilation improvements and bug fixes.

  • Optimize PyTorch tensor marshalling to significantly reduce CPU overhead and kernel launch latency when using PyTorch tensors with SlangPy. (PR #625)

  • Fix AccelerationStructureBuildDescConverter for improved ray tracing acceleration structure handling. (PR #626)

  • Fix asmjit usage on older x86_64 processors by improving detection and fallback paths for instruction generation. (PR #624)

  • Verify wheel builds before upload to PyPI to improve package quality and reliability. (PR #623)

  • Sign versioned .so files for improved security and deployment. (PR #621)

  • Update to Slang version 2025.21.1 with additional improvements. (PR #620)

  • Update slang-rhi submodule to latest version with improved stability. (PR #619)

  • Update to Slang version 2025.21 with latest shader compilation improvements and bug fixes. (PR #615)

  • Update slang-rhi submodule to latest version with improved stability and performance. (PR #612, PR #596, PR #592, PR #579)

  • Add support for new acceleration structure types for improved ray tracing capabilities. (PR #607)

  • Implement initial capability support system for better hardware feature detection. (PR #598)

  • Add bindless configuration support for more flexible resource binding. (PR #597)

  • Add labels to SlangPy generated kernels for improved debugging and profiling. (PR #584)

  • Refactor UI API for better usability and consistency. (PR #591)

  • Add support for macOS file dialogs in UI components. (PR #568)

  • Replace BS::thread_pool with nanothread for improved threading performance. (PR #564)

  • Add ability to control per-thread printing for better debugging in multi-threaded scenarios. (PR #587)

  • Add handling of YA bitmaps (found in vMaterials) by extending support to RGBA format. (PR #588)

  • Update SlangPy for library rename and versioning improvements. (PR #606)

  • Fix texture subresource handling when pitches are not provided. (PR #586)

  • Fix blit functionality and improve reliability. (PR #593, PR #583)

  • Remove obsolete Slang math code for cleaner codebase. (PR #602)

  • Add setuptools to requirements for improved build compatibility. (PR #601)

  • Enable Linux aarch64 pip packaging support. (PR #549)

  • Improve test infrastructure with performance labels and PyTorch version locking. (PR #613, PR #611, PR #605)

  • Fix Slang compiler DLL copying for improved deployment. (PR #609)

  • Cleanup pathtracer example and improve code formatting standards. (PR #590, PR #589)

Version 0.38.0 (November 3, 2025)

  • Yanked due to twine check failures.

Version 0.37.0 (October 15, 2025)

  • Update to Slang version 2025.19 with latest shader compilation improvements and bug fixes. (PR #572, PR #560)

  • Update slang-rhi submodule to latest version with improved stability and bug fixes. (PR #569, PR #550, PR #541)

  • Add persistent shader cache implementation based on LMDB for improved compilation performance and caching across sessions. (PR #561, PR #555)

  • Implement string printing support in shaders for improved debugging capabilities. (PR #566)

  • Add support for calling interface parameters with implementing types. (PR #562)

  • Add nanothread library and improve threading support. (PR #563)

  • Fix import determinism to ensure consistent code generation for shader cache compatibility. (PR #565)

  • Fix texture loader for CUDA and improve platform compatibility. (PR #545, PR #552)

  • Fix compute blit functionality and various other bugs. (PR #503, PR #546, PR #554, PR #553)
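
Import determinism (PR #565) matters for a persistent cache such as the LMDB-based one (PR #561, PR #555). If a cache key is derived from generated code, any change in import order changes the key and forces a recompile. The sketch below shows that effect with a hypothetical key scheme: SHA-256 over the import list and the source. Sorting the imports is just one assumed way to canonicalize them. The actual cache key format and fix are not described here.

```python
import hashlib


def cache_key(source: str, imports: list[str], canonical: bool = True) -> str:
    """Hypothetical cache key: hash of the import list plus the shader source.

    With canonical=True the imports are sorted first, so the key no longer
    depends on the order in which imports happened to be discovered.
    """
    ordered = sorted(imports) if canonical else list(imports)
    h = hashlib.sha256()
    for name in ordered:
        h.update(name.encode("utf-8") + b"\n")
    h.update(source.encode("utf-8"))
    return h.hexdigest()


src = "float3 shade(float3 n) { return n; }"
a, b = ["lighting", "brdf"], ["brdf", "lighting"]
print(cache_key(src, a, canonical=False) == cache_key(src, b, canonical=False))  # False
print(cache_key(src, a) == cache_key(src, b))                                    # True
```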

Version 0.36.0 (September 30, 2025)

  • Update to Slang version 2025.18 with latest shader compilation improvements and bug fixes.

  • Update slang-rhi submodule to latest version with improved dependency handling. (PR #533)

Version 0.35.0 (September 18, 2025)

  • Add initial support for ray tracing pipelines, enabling hardware-accelerated ray tracing workflows. (PR #502)

  • Update to latest Slang version (2025.17) with improved shader compilation and platform support. (PR #507)

  • Add helper function to create homogeneous 4x4 transformation matrices from 3x4 matrices. (PR #506)

  • Add new load_from_file and load_from_numpy functions for improved data loading workflows. (PR #513)

  • Fix hot reload functionality for built-in reflection data to ensure proper shader recompilation. (PR #514)

  • Fix memory stream loading issues and improve data loading reliability. (PR #513)

  • Rename getter methods throughout the API to follow consistent coding conventions. (PR #505)
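
The homogeneous 4x4 helper from PR #506 extends a 3x4 transform with a bottom row of [0, 0, 0, 1]. The sketch below assumes the common affine layout, with the linear part in the left 3x3 block and the translation in the last column. The function names and the row-major list-of-lists representation are also assumptions, not the signature of the SGL helper.

```python
def to_homogeneous(m3x4):
    """Hypothetical helper: append the row [0, 0, 0, 1] to a 3x4 affine matrix."""
    if len(m3x4) != 3 or any(len(row) != 4 for row in m3x4):
        raise ValueError("expected 3 rows of 4 values")
    return [list(row) for row in m3x4] + [[0.0, 0.0, 0.0, 1.0]]


def transform_point(m4x4, p):
    """Apply a 4x4 matrix to point p = (x, y, z) using w = 1."""
    v = [p[0], p[1], p[2], 1.0]
    out = [sum(m4x4[r][c] * v[c] for c in range(4)) for r in range(4)]
    return out[:3]


# Identity rotation with translation (1, 2, 3).
m = to_homogeneous([[1, 0, 0, 1], [0, 1, 0, 2], [0, 0, 1, 3]])
print(transform_point(m, (10, 20, 30)))  # [11.0, 22.0, 33.0]
```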

Version 0.34.0 (September 9, 2025)

Version 0.33.1 (August 25, 2025)

  • Include the missing Slang binary file in the package. (PR #445)

  • Introduce benchmark plugin and testing infrastructure with MongoDB integration for automated performance tracking. (PR #452)

  • Add support for bindless storage buffers in the GPU abstraction layer. (PR #421).

  • Fix copy_from_torch() for CUDA devices and resolve PyTorch integration issues. (PR #391).

  • Introduce unified slangpy.testing module consolidating all testing utilities and pytest plugin system. (PR #448).

  • Force release all slang-rhi resources during shutdown to prevent memory leaks and segfaults on Linux. (PR #426).

  • Rename DeviceResource to DeviceChild for consistency with slang-rhi. (PR #425).

  • Enable more tests across platforms: Linux, CUDA, and Metal support improvements. (PR #429).

  • Fix race condition in hot reload test and improve shader change detection. (PR #433).

  • Force-unroll small fixed-size loops and globally disable warning 30856 for better compilation. (PR #437).

Version 0.33.0 (August 12, 2025)

  • Update to Slang version 2025.14.3. (PR #409).

  • Fix tensor alignment issue when copying data to GPU tensors with vector element types. Metal platform now handles vector alignment correctly to match other platforms. (PR #418).

  • Update samples. (PR #413).

Version 0.32.0 (August 8, 2025)

  • Update to Slang version 2025.14.

  • Improve CUDA support.

  • Improve Metal support.

  • Improve PyTorch support. (PR #362).

  • Add support for pointers. (PR #323, PR #326).

  • Add SGL_SLANG_DEBUG_INFO CMake variable to enable downloading Slang debug info (enabled by default). (PR #296).

  • Add sgl::CommandEncoder::generate_mips() (slangpy.CommandEncoder.generate_mips()) to generate mipmaps for textures. (PR #293).

  • Add optional _append_to argument to slangpy call functions to append commands to an existing command encoder. (PR #287).

  • Allow creating Bitmap from non-contiguous arrays. (PR #282).

Version 0.31.0 (June 5, 2025)

  • Update to Slang version 2025.10.1.

  • Add support for vectorizing against Python lists.

  • Make NDBuffer and Tensor empty / zeros APIs consistent.

  • Add load_from_image for NDBuffer and Tensor.

  • Fix typings for float2x3, float3x2, float4x2 and float4x3.

Version 0.30.0 (May 27, 2025)

  • Update slang-rhi to latest version. Improve CUDA error reporting. Improve debug marker support and add WinPixEventRuntime. Fix resource lifetime tracking for entry point arguments. (PR #236).

Version 0.29.0 (May 22, 2025)

  • Update slang-rhi to latest version. Make enum infos constexpr. (PR #234).

  • Fix sgl::Feature (slangpy.Feature) to include missing value. (PR #233).

  • Fix registered matrix types in PYTHON_TYPES. (PR #232).

Version 0.28.0 (May 21, 2025)

  • Load PyTorch module lazily to avoid overhead when PyTorch is not used. (PR #184).

  • Improve warning when tev image viewer is not running. (PR #216).

  • Report correct LUID in sgl::DeviceInfo::adapter_luid (slangpy.DeviceInfo.adapter_luid). (PR #215).

Version 0.27.0 (May 9, 2025)

  • Package and distribute pytest tests. Fix deploying .pyi files in wheels + other minor fixes. (PR #197).

  • Introduce basic support for bindless textures and samplers. Currently only supported on D3D12.

    • Add sgl::Feature::bindless (slangpy.Feature.bindless) to detect bindless support.

    • Add sgl::DescriptorHandle (slangpy.DescriptorHandle) to represent bindless descriptor handles.

    • Add sgl::Sampler::descriptor_handle() (slangpy.Sampler.descriptor_handle) to get the descriptor handle for a sampler.

    • Add sgl::Texture::descriptor_handle_ro() (slangpy.Texture.descriptor_handle_ro) to get the read-only descriptor handle for a texture.

    • Add sgl::Texture::descriptor_handle_rw() (slangpy.Texture.descriptor_handle_rw) to get the read-write descriptor handle for a texture.

    (PR #196).

  • Rename sgl::Struct to sgl::DataStruct to match slangpy.DataStruct. Rename sgl::StructConverter to sgl::DataStructConverter and slangpy.StructConverter to slangpy.DataStructConverter. (PR #185).

Version 0.26.0

  • Port samples to use new combined SlangPy/SGL API

  • CUDA and Metal fixes

  • Initial deployment of wheels for macOS

Version 0.25.0

  • Fix deploying slangpy shader files

Version 0.24.0

Version 0.23.0

  • Require SGL v0.15.0

  • Refactor of NDBuffer and Tensor to share a common underlying type

  • NDBuffer and Tensor support indexing

Version 0.22.0

  • Require new SGL v0.14.0 with switch to Slang-RHI

Version 0.21.1

  • Fix to numpy version requirement

  • Fixes to examples

  • Add neural network example

  • Require SGL v0.13.1

Version 0.21.0

  • Full Jupyter notebook support

  • Lots of fixes for edge-case hot reload crashes

  • Significantly more robust Wang hash and rand float generators

  • Direct return of structs from scalar calls

  • Add diff splatting sample

  • Fix for rare issue involving lookup order of generic functions vs generic types

  • Require SGL v0.13.0
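
For reference, the sketch below shows the classic Thomas Wang 32-bit integer hash, as commonly used for GPU random numbers, with a simple mapping to a float in [0, 1). This is the textbook variant, not necessarily the exact code SlangPy ships after the 0.21.0 robustness improvements. The function names are illustrative.

```python
MASK32 = 0xFFFFFFFF


def wang_hash(seed: int) -> int:
    """Thomas Wang's 32-bit integer hash; Python ints are masked to emulate uint32."""
    seed &= MASK32
    seed = (seed ^ 61) ^ (seed >> 16)
    seed = (seed * 9) & MASK32
    seed ^= seed >> 4
    seed = (seed * 0x27D4EB2D) & MASK32
    seed ^= seed >> 15
    return seed


def rand_float(seed: int) -> float:
    """Map a hashed value to [0, 1) using its top 24 bits (exactly representable in float32)."""
    return (wang_hash(seed) >> 8) / float(1 << 24)


print([rand_float(i) for i in range(4)])
```

Every step (xor-shift, xor with a constant, multiplication by an odd constant modulo 2^32) is invertible, so distinct seeds always produce distinct hashes.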

Version 0.20.1

  • Fix scalar Wang hash argument types

Version 0.20.0

  • Add SDF example

  • Transpose vector coordinates

Version 0.19.5

  • Documentation for generators

  • Extra fixes for grid

Version 0.19.4

  • Fix grid issue

Version 0.19.3

  • Update SGL -> 0.12.4

  • Significant improvements to generator types

  • Support textures as output type

Version 0.19.2

  • Update SGL -> 0.12.3

  • Better error messages during generation

  • Fix corrupt error tables

  • Restore detailed error information during dispatch

Version 0.19.1

  • Update SGL -> 0.12.2

  • Fix major issue with texture transposes

Version 0.19.0

  • Add experimental grid type

Version 0.18.2

  • Update SGL -> 0.12.1

  • Rename from_numpy to buffer_from_numpy

Version 0.18.1

  • Fix Python 3.9 typing

Version 0.18.0

  • Fix for long temporary filenames

  • Temp fix for resolution of types that involve generics in multiple files

  • Support passing 1D NDBuffer to structured buffer

  • Fix native buffer not being passed to bindings

  • Add check for missing Slang fields

  • Avoid synthesizing store methods for non-written nested types

Version 0.17.0

  • Update to latest nv-sgl with CoopVec support

  • Native tensor implementation

  • Linux crash fix

Version 0.16.0

  • Native texture and structured buffer implementations

  • Native function dispatches

  • Lots of bug fixes

Version 0.15.2

  • Correctly package slang files in wheel

Version 0.15.0

  • Native buffer takes full reflection layout

  • Add uniforms + cursor api to native buffer

  • Update required version of nv-sgl to 0.9.0

Version 0.14.0

  • Update required version of nv-sgl to 0.8.0

  • Substantial native + Python optimizations

Version 0.13.0

  • Update required version of nv-sgl to 0.7.0

  • Native SlangPy backend re-enabled

  • Conversion of NDBuffer to native code

  • PyTorch integration refactor

Version 0.12.0

  • Update required version of nv-sgl to 0.6.2

  • Re-enable broken Vulkan tests

Version 0.12.0

  • Update required version of nv-sgl to 0.6.1

Version 0.10.0

  • Initial test release