Changelog¶
SlangPy uses a semantic versioning policy for its API.
Version 0.41.0 (April 15, 2026)¶
Rewrite of Tensors and removal of NDBuffer in favour of unified Tensor type. (PR #697)
Kernel generation overhaul: Rewrote kernel generation with direct binding, entry point arguments, and removal of trampoline functions for cleaner and more efficient generated shaders. (PR #863, PR #870, PR #876, PR #879)
Move cached function call path from Python to C++ for significantly reduced per-call overhead. (PR #869)
Native PyTorch autograd integration: Full native torch autograd support with `retain_graph`, proper VRAM lifecycle management, and `torch.nn.parameter.Parameter` compatibility. (PR #816, PR #781, PR #921, PR #891)
CUDA performance optimization: Reduced CUDA context management overhead by ~20× by removing per-call context push/pop from slang-rhi. When using PyTorch interop, the shared primary context is already set by PyTorch, so no user action is typically required. For edge cases, new APIs are exposed:
`device.set_cuda_context_current()` - Set context for this thread (multi-GPU, multi-threading)
`device.cuda_context_scope()` - Context manager for temporary context switching
(PR #774)
Dispatch hot path optimizations: Eliminate heap allocations from cached dispatch, pack/unpack optimization, optimized value types, explicit shader object binding, block allocator, cached device addresses, short vector for shader object refs, and optimised uniform setting of tensors. (PR #872, PR #815, PR #814, PR #812, PR #707, PR #709, PR #708, PR #741, PR #712)
Add `DiffTensorView<T>` and `TensorView<T>` support in slangpy with `_threadcount` and `float<N>` support. (PR #775, PR #818)
Add `loadOnce`/`loadUniform` to `DiffTensor` for optimized backward pass memory access. (PR #910)
Support reinterpreting `torch.Tensor` as `Tensor<StructType, N>` for structured GPU data. (PR #906)
Add `torch.bool` support for `TensorView`. (PR #898)
PyTorch interop optimizations including faster numpy array detection and optimized tensor marshalling. (PR #759, PR #802)
Add `SlangSession::compose_modules` API for programmatic module composition. (PR #894)
Add `Bitmap::resample()` functions and reconstruction filters. (PR #926)
Add `sample` function to `Tensor`. (PR #809)
Support for combined texture/sampler descriptor handles. (PR #765)
Add `TextureLoader::load_texture` overloads for multiple options and `format_callback` for texture conversion. (PR #767, PR #737)
Add support for specifying sampler when creating textures and texture views (CUDA). (PR #748)
Add `enable_experimental_features` option to `SlangCompilerOptions`. (PR #771)
Support generic entrypoints in the functional API. (PR #670)
Cooperative Vector improvements. (PR #699)
Complete slangpy matrix multiplication support. (PR #674)
Extend `Window` properties for resizing and positioning. (PR #698)
Add `DescriptorHandle` default constructor. (PR #897)
Add write function for binding. (PR #893)
Add `[Differentiable]` to getters so they satisfy differentiability constraints for interface requirements. (PR #895)
Add spaceship operator (`<=>`) for quaternion, matrix, and vector types. (PR #927)
Add `std::hash` specializations for vector, matrix, and quaternion types. (PR #889, PR #888)
Add comparator to `TypeConformances`. (PR #871)
Add `SGL_ENUM_FLAGS_INFO` for improved enum flag introspection. (PR #932)
Expose debug options in device constructor. (PR #710)
Configure the `SPIRV_DIS` downstream compiler path. (PR #701)
Add Aftermath flag for GPU crash debugging on supported platforms. (PR #785)
Improve `static_vector` and `short_vector` containers. (PR #752)
Crashpad integration for automated crash reporting. (PR #726, PR #729)
Initialize logger on first use to avoid initialization order issues. (PR #931)
Reduce logging output for cleaner runtime experience. (PR #890)
Filter unicode in source files for broader platform compatibility. (PR #930)
Wrap remaining Slang API calls with `SGL_CATCH_INTERNAL_SLANG_ERROR` for consistent error handling. (PR #857)
Fix scalar `DiffPair` backward pass codegen. (PR #917)
Fix `slangpy.Tensor` backward pass through `DiffTensorView`. (PR #920)
Fix crash and incorrect exception with null gradients. (PR #882)
Fix zero-size dispatch causing CUDA SIGABRT. (PR #905)
Fix array-of-vector return types for numpy and torch. (PR #873)
Fix array-type returns. (PR #676)
Fix type resolution for arrays of `StructuredBuffer` parameters. (PR #792)
Fix `Texture3D` parameters failing with “invalid dimensionality 1”. (PR #754)
Fix `float3` alignment bug on Metal for gradient accumulation. (PR #713)
Fix Blitter and module name issues in the presence of multiple Blitters. (PR #877, PR #878)
Fix blit function to use destination texture size for dispatch size calculations. (PR #669)
Fix `KeyCode` `WORLD_1`/`WORLD_2` in Python bindings. (PR #677)
Fix `LMDBCache` eviction and related cache issues. (PR #739, PR #743)
Fix `ShaderCursor::set` to be const. (PR #764)
Fix `like` functions on `Tensor` to correctly copy usage and other properties. (PR #880)
Fix torch bridge copy to/from buffer functions in fallback mode. (PR #794)
Accept tensors with null data pointers. (PR #675)
Fix handling crashpad reports on POSIX. (PR #734)
Update Slang to version 2026.5.2. (PR #903, PR #856, PR #813, PR #796, PR #772, PR #745)
Update slang-rhi submodule with PyTorch-style caching allocator and other improvements. (PR #887, PR #798, PR #747, PR #705)
Update nanobind. (PR #700)
Add wheels-dev workflow for dev/release wheel publishing to internal Artifactory. (PR #861, PR #849)
Fix Linux wheel builds for aarch64 and missing build dependencies. (PR #852, PR #851, PR #850)
Support cross-repo CI testing from Slang PRs. (PR #780)
This version carries some breaking changes; please see the migration guide for details.
Version 0.40.1 (January 7, 2026)¶
Rebuild of 0.40.0 due to failed PyPI push.
Version 0.40.0 (January 7, 2026)¶
Update to Slang version 2025.24.3 with latest shader compilation improvements and bug fixes. (PR #678, PR #673)
Update slang-rhi submodule to latest version with improved stability and performance. (PR #682, PR #662, PR #659, PR #647)
Add Windows ARM64 platform support for improved cross-platform compatibility. (PR #567)
Introduce SGL_SLANG_VERSION CMake cache variable for better build configuration management. (PR #680)
Add float8 data type support for enhanced precision options in GPU computations. (PR #649)
Add rhi.slang module for improved hardware abstraction layer access. (PR #653)
Significant refactor of type inference system for better handling of generics and complex types. (PR #652)
Refactor cooperative vector API for improved performance and usability. (PR #645)
Add support for assigning objects with to_cursor to cursor objects for enhanced data manipulation. (PR #651)
Fix Buffer::get_element() method for proper buffer element access. (PR #661)
Fix module linking to preserve module order when making links unique. (PR #657)
Fix mouse position inclusion in button events for improved UI interaction. (PR #660)
Sort EXR channels when writing via tinyexr for consistent image output format. (PR #531)
Move vcpkg buildtrees to build directory for cleaner project organization. (PR #650)
Disable compiler warnings for cleaner build output. (PR #656)
Fix incorrect Tensor constructor API documentation in autodiff examples. (PR #628)
Version 0.39.0 (November 17, 2025)¶
Update to Slang version 2025.22.1 with latest shader compilation improvements and bug fixes. (PR #642)
Add scalar and vector `select` intrinsic functions for conditional value selection. (PR #641)
Add support for precompiled modules to enable faster shader loading and compilation. (PR #637)
Update to Slang version 2025.22 with CUDA 12.2 support and improved platform compatibility. (PR #640)
Add separate module cache from shader cache for improved caching and compilation performance. (PR #635)
Add test for extension cache update issue to ensure proper module extension handling. (PR #631)
Add `Texture::descriptor_handle` getters based on default texture views for improved bindless texture support. (PR #627)
Update `RayTracingPipelineFlags` with new flag values for enhanced ray tracing configuration. (PR #634)
Update slang-rhi submodule to latest version with improved stability. (PR #633)
Add GitHub release upload capability to wheels workflow for automated release artifact distribution. (PR #618)
Version 0.38.1 (November 10, 2025)¶
Update to Slang version 2025.21.2 with latest shader compilation improvements and bug fixes.
Optimize PyTorch tensor marshalling to significantly reduce CPU overhead and kernel launch latency when using PyTorch tensors with SlangPy. (PR #625)
Fix AccelerationStructureBuildDescConverter for improved ray tracing acceleration structure handling. (PR #626)
Fix asmjit usage on older x86_64 processors by improving detection and fallback paths for instruction generation. (PR #624)
Verify wheel builds before upload to PyPI to improve package quality and reliability. (PR #623)
Sign versioned .so files for improved security and deployment. (PR #621)
Update to Slang version 2025.21.1 with additional improvements. (PR #620)
Update slang-rhi submodule to latest version with improved stability. (PR #619)
Update to Slang version 2025.21 with latest shader compilation improvements and bug fixes. (PR #615)
Update slang-rhi submodule to latest version with improved stability and performance. (PR #612, PR #596, PR #592, PR #579)
Add support for new acceleration structure types for improved ray tracing capabilities. (PR #607)
Implement initial capability support system for better hardware feature detection. (PR #598)
Add bindless configuration support for more flexible resource binding. (PR #597)
Add labels to SlangPy generated kernels for improved debugging and profiling. (PR #584)
Refactor UI API for better usability and consistency. (PR #591)
Add support for macOS file dialogs in UI components. (PR #568)
Replace BS::thread_pool with nanothread for improved threading performance. (PR #564)
Add ability to control per-thread printing for better debugging in multi-threaded scenarios. (PR #587)
Add handling of YA bitmaps (found in vMaterials) by extending support to RGBA format. (PR #588)
Update SlangPy for library rename and versioning improvements. (PR #606)
Fix texture subresource handling when pitches are not provided. (PR #586)
Fix blit functionality and improve reliability. (PR #593, PR #583)
Remove obsolete Slang math code for cleaner codebase. (PR #602)
Add setuptools to requirements for improved build compatibility. (PR #601)
Enable Linux aarch64 pip packaging support. (PR #549)
Improve test infrastructure with performance labels and PyTorch version locking. (PR #613, PR #611, PR #605)
Fix Slang compiler DLL copying for improved deployment. (PR #609)
Cleanup pathtracer example and improve code formatting standards. (PR #590, PR #589)
Version 0.38.0 (November 3, 2025)¶
Yanked due to twine check failures.
Version 0.37.0 (October 15, 2025)¶
Update to Slang version 2025.19 with latest shader compilation improvements and bug fixes. (PR #572, PR #560)
Update slang-rhi submodule to latest version with improved stability and bug fixes. (PR #569, PR #550, PR #541)
Add persistent shader cache implementation based on LMDB for improved compilation performance and caching across sessions. (PR #561, PR #555)
Implement string printing support in shaders for improved debugging capabilities. (PR #566)
Add support for calling interface parameters with implementing types. (PR #562)
Add nanothread library and improve threading support. (PR #563)
Fix import determinism to ensure consistent code generation for shader cache compatibility. (PR #565)
Fix texture loader for CUDA and improve platform compatibility. (PR #545, PR #552)
Fix compute blit functionality and various bug fixes. (PR #503, PR #546, PR #554, PR #553)
Version 0.36.0 (September 30, 2025)¶
Update to Slang version 2025.18 with latest shader compilation improvements and bug fixes.
Update slang-rhi submodule to latest version with improved dependency handling. (PR #533)
Version 0.35.0 (September 18, 2025)¶
Add initial support for ray tracing pipelines, enabling hardware-accelerated ray tracing workflows. (PR #502)
Update to latest Slang version (2025.17) with improved shader compilation and platform support. (PR #507)
Add helper function to create homogeneous 4x4 transformation matrices from 3x4 matrices. (PR #506)
Add new `load_from_file` and `load_from_numpy` functions for improved data loading workflows. (PR #513)
Fix hot reload functionality for built-in reflection data to ensure proper shader recompilation. (PR #514)
Fix memory stream loading issues and improve data loading reliability. (PR #513)
Rename getter methods throughout the API to follow consistent coding conventions. (PR #505)
Version 0.34.0 (September 9, 2025)¶
Add `Device.report_heaps()` method to query internal memory heap status and allocation information.
Update to latest Slang version (2025.16.0) with improved CUDA and Metal support. (PR #493)
Add GPU clock locking for consistent benchmark results and implement trimmed mean calculation for more accurate performance measurements. (PR #484, PR #480, PR #472)
Support passing call data as entry point parameters on CUDA for improved performance. (PR #481)
Fix multiple memory leaks related to Python object references and improve resource cleanup. (PR #488)
Add benchmark comparison and delta reporting functionality with GPU information in reports. (PR #471, PR #456)
Rename `command_buffer` to `command_encoder` for API consistency. (PR #487)
Add `PassEncoder::write_timestamp` and timestamp support in `ComputeKernel::dispatch`. (PR #473)
Optimize `write_from_numpy` performance with faster copy options. (PR #455)
Fix PyTorch examples and improve integration. (PR #459)
Add support for platform-specific test isolation via environment variables. (PR #478)
Fix module linking for layout when using `link` modules. (PR #449)
Add string conversion functions for slangpy types and improve debugging capabilities. (PR #463, PR #464)
Version 0.33.1 (August 25, 2025)¶
Include the missing Slang binary file into the package. (PR #445)
Introduce benchmark plugin and testing infrastructure with MongoDB integration for automated performance tracking. (PR #452)
Add support for bindless storage buffers in GPU abstraction layer. (PR #421).
Fix `copy_from_torch()` for CUDA devices and resolve PyTorch integration issues. (PR #391).
Introduce unified `slangpy.testing` module consolidating all testing utilities and pytest plugin system. (PR #448).
Force release all slang-rhi resources during shutdown to prevent memory leaks and segfaults on Linux. (PR #426).
Rename `DeviceResource` to `DeviceChild` for consistency with slang-rhi. (PR #425).
Enable more tests across platforms: Linux, CUDA, and Metal support improvements. (PR #429).
Fix race condition in hot reload test and improve shader change detection. (PR #433).
Force unroll small fixed size loops and globally disable warning 30856 for better compilation. (PR #437).
Version 0.33.0 (August 12, 2025)¶
Version 0.32.0 (August 8, 2025)¶
Update to slang version 2025.14.
Improve CUDA support.
Improve Metal support.
Improve PyTorch support. (PR #362).
Add `SGL_SLANG_DEBUG_INFO` cmake variable to enable downloading Slang debug info (enabled by default). (PR #296).
Add `sgl::CommandEncoder::generate_mips()` (`slangpy.CommandEncoder.generate_mips()`) to generate mipmaps for textures. (PR #293).
Add optional `_append_to` argument to slangpy call functions to append commands to an existing command encoder. (PR #287).
Allow creating `Bitmap` from non-contiguous arrays. (PR #282).
Version 0.31.0 (June 5, 2025)¶
Update to slang version 2025.10.1.
Add support for vectorizing against Python lists.
Make `NDBuffer` and `Tensor` `empty`/`zeros` APIs consistent.
Added `load_from_image` for `NDBuffer` and `Tensor`.
Fix typings for `float2x3`, `float3x2`, `float4x2` and `float4x3`.
Version 0.30.0 (May 27, 2025)¶
Update slang-rhi to latest version. Improve CUDA error reporting. Improve debug marker support and add WinPixEventRuntime. Fix resource lifetime tracking for entry point arguments. (PR #236).
Version 0.29.0 (May 22, 2025)¶
Version 0.28.0 (May 21, 2025)¶
Version 0.27.0 (May 9, 2025)¶
Package and distribute pytest tests. Fix deploying `.pyi` files in wheels + other minor fixes. (PR #197).
Introduce basic support for bindless textures and samplers. Currently only supported on D3D12. Add `sgl::Feature::bindless` (`slangpy.Feature.bindless`) to detect bindless support. Add `sgl::DescriptorHandle` (`slangpy.DescriptorHandle`) to represent bindless descriptor handles. Add `sgl::Sampler::descriptor_handle()` (`slangpy.Sampler.descriptor_handle`) to get the descriptor handle for a sampler. Add `sgl::Texture::descriptor_handle_ro()` (`slangpy.Texture.descriptor_handle_ro`) to get the read-only descriptor handle for a texture. Add `sgl::Texture::descriptor_handle_rw()` (`slangpy.Texture.descriptor_handle_rw`) to get the read-write descriptor handle for a texture. (PR #196).
Rename `sgl::Struct` to `sgl::DataStruct` to match `slangpy.DataStruct`. Rename `sgl::StructConverter` to `sgl::DataStructConverter` and `slangpy.StructConverter` to `slangpy.DataStructConverter`. (PR #185).
Version 0.26.0¶
Port samples to use new combined SlangPy/SGL API
CUDA and Metal fixes
Initial deployment of wheels for macOS
Version 0.25.0¶
Fix deploying slangpy shader files
Version 0.24.0¶
Merge SGL (https://github.com/shader-slang/sgl) into SlangPy.
Version 0.23.0¶
Require SGL v0.15.0
Refactor of NDBuffer and Tensor to share some underlying type
NDBuffer and Tensor support indexing
Version 0.22.0¶
Require new SGL v0.14.0 with switch to Slang-RHI
Version 0.21.1¶
Fix to numpy version requirement
Fixes to examples
Add neural network example
Require SGL v0.13.1
Version 0.21.0¶
Full Jupyter notebook support
Lots of fixes for edge-case hot reload crashes
Significantly more robust wang hash and rand float generators
Direct return of structs from scalar calls
Add diff splatting sample
Fix for rare issue involving lookup order of generic functions vs generic types
Require SGL v0.13.0
Version 0.20.1¶
Fix scalar wang-hash arg types
Version 0.20.0¶
Add SDF example
Transpose vector coordinates
Version 0.19.5¶
Documentation for generators
Extra fixes for grid
Version 0.19.4¶
Fix grid issue
Version 0.19.3¶
Update SGL -> 0.12.4
Significant improvements to generator types
Support textures as output type
Version 0.19.2¶
Update SGL -> 0.12.3
Better error messages during generation
Fix corrupt error tables
Restore detailed error information during dispatch
Version 0.19.1¶
Update SGL -> 0.12.2
Fix major issue with texture transposes
Version 0.19.0¶
Add experimental grid type
Version 0.18.2¶
Update SGL -> 0.12.1
Rename from_numpy to buffer_from_numpy
Version 0.18.1¶
Fix Python 3.9 typing
Version 0.18.0¶
Long file temp filenames fix
Temp fix for resolution of types that involve generics in multiple files
Support passing 1D NDBuffer to structured buffer
Fix native buffer not being passed to bindings
Missing slang field check
Avoid synthesizing store methods for non-written nested types
Version 0.17.0¶
Update to latest nv-sgl with CoopVec support
Native tensor implementation
Linux crash fix
Version 0.16.0¶
Native texture and structured buffer implementations
Native function dispatches
Lots of bug fixes
Version 0.15.2¶
Correctly package slang files in wheel
Version 0.15.0¶
Native buffer takes full reflection layout
Add uniforms + cursor api to native buffer
Update required version of nv-sgl to 0.9.0
Version 0.14.0¶
Update required version of nv-sgl to 0.8.0
Substantial native + python optimizations
Version 0.13.0¶
Update required version of nv-sgl to 0.7.0
Native SlangPy backend re-enabled
Conversion of NDBuffer to native code
PyTorch integration refactor
Version 0.12.0¶
Update required version of nv-sgl to 0.6.2
Re-enable broken Vulkan tests
Version 0.12.0¶
Update required version of nv-sgl to 0.6.1
Version 0.10.0¶
Initial test release