PyTorch

Building on the previous auto-diff example, the switch to PyTorch and its auto-grad capabilities is straightforward.

Initialization

To use SlangPy with PyTorch, you first need to create a device configured for PyTorch integration:

import slangpy as spy
import torch

# Create a device configured for PyTorch integration
# CUDA backend is recommended for best performance
device = spy.create_torch_device(type=spy.DeviceType.cuda)

# Load module using the standard Module type
module = spy.Module.load_from_file(device, "example.slang")

SlangPy automatically detects when PyTorch tensors are used and integrates them into PyTorch’s auto-grad graph. No special module types are needed - you can use the standard spy.Module type as documented in First Functions.

Using the slangpy-torch extension

SlangPy integrates with PyTorch out of the box, but it is not compiled against libtorch, so by default every interaction with a PyTorch tensor goes through the Python API. To substantially improve performance, install the slangpy-torch pip package, which gives SlangPy native torch integration. With the extension, tensor metadata access from native code takes ~28ns, compared to ~350ns when going through the Python API.

The package is built locally against your installed version of PyTorch to ensure ABI compatibility, so C++ build tools are required.
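To put the quoted access times in perspective, here is a back-of-the-envelope sketch of the cumulative metadata overhead. The per-access figures (~28ns and ~350ns) are taken from the text above; the workload size, the helper name metadata_overhead_ms, and the assumption of one metadata access per tensor argument per call are hypothetical and chosen only for illustration.

```python
# Per-access tensor metadata costs quoted above, in nanoseconds.
NATIVE_NS = 28    # with the slangpy-torch extension
PYTHON_NS = 350   # going through the Python API


def metadata_overhead_ms(calls: int, tensors_per_call: int, ns_per_access: float) -> float:
    """Estimate total metadata-access overhead in milliseconds.

    Assumes one metadata access per tensor argument per call, which is a
    simplification made for this illustration.
    """
    return calls * tensors_per_call * ns_per_access / 1e6


# Hypothetical workload: 100,000 kernel calls, each taking 4 tensor arguments.
calls, tensors = 100_000, 4
native_ms = metadata_overhead_ms(calls, tensors, NATIVE_NS)
python_ms = metadata_overhead_ms(calls, tensors, PYTHON_NS)

print(f"native: {native_ms:.1f} ms, python: {python_ms:.1f} ms")
print(f"ratio: {PYTHON_NS / NATIVE_NS:.1f}x")
```

The estimate only covers metadata access. Kernel execution time is not modelled, so the real end-to-end difference depends on how heavy each call is.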

Prerequisites

  • Python 3.9+

  • PyTorch installed

Windows

Install Visual Studio 2019 or 2022 with the “Desktop development with C++” workload. This provides the MSVC compiler required to build the extension.

Linux

Install the build-essential package:

# Ubuntu/Debian
sudo apt-get install build-essential

Installation

The extension must be installed with --no-build-isolation to ensure ABI compatibility with your installed PyTorch version:

pip install wheel
pip install slangpy-torch --no-build-isolation

Note

The --no-build-isolation flag is critical. Without it, pip may use a different PyTorch version during the build process, leading to ABI incompatibilities and crashes.

Verifying Installation

To verify the extension is installed correctly:

import torch  # Must import torch first
import slangpy_torch
print(slangpy_torch.get_api_ptr())  # Should print a non-zero integer

If you see a non-zero integer, the extension is working correctly and SlangPy will automatically use it for improved PyTorch tensor performance.

Troubleshooting

  • “torch not found”: Ensure PyTorch is installed first with pip install torch

  • “torch not found” even though PyTorch is installed: Ensure you are running pip with --no-build-isolation

  • buildwheels not found: Ensure you have the wheel package installed with pip install wheel

  • Windows cannot find compiler / ninja: Ensure Visual Studio (or the Build Tools) is installed, and that you are either running from a Visual Studio Developer Command Prompt or have the compiler on your PATH.
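Several of the issues above can be spotted before attempting a build. The following is a minimal pre-flight sketch using only the Python standard library. The helper name preflight is hypothetical, and the compiler executable names it searches for (cl on Windows; c++, g++, or clang++ elsewhere) are assumptions about a typical toolchain rather than a list of what the build actually requires.

```python
import importlib.util
import shutil
import sys


def preflight() -> dict:
    """Report which prerequisites can be located, without importing them."""
    # Assumed compiler executable names; adjust for your toolchain.
    compilers = ["cl"] if sys.platform == "win32" else ["c++", "g++", "clang++"]
    return {
        "python_ok": sys.version_info >= (3, 9),
        "torch": importlib.util.find_spec("torch") is not None,
        "slangpy_torch": importlib.util.find_spec("slangpy_torch") is not None,
        "compiler": any(shutil.which(name) is not None for name in compilers),
    }


for name, found in preflight().items():
    print(f"{name}: {'ok' if found else 'MISSING'}")
```

A MISSING entry for slangpy_torch is expected before installation. A MISSING entry for torch or compiler points at the corresponding troubleshooting item.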

Creating a tensor

Now, rather than use a SlangPy Tensor, we create a torch.Tensor to store the inputs:

# Create a tensor
x = torch.tensor([1, 2, 3, 4], dtype=torch.float32, device='cuda', requires_grad=True)

Note:

  • We set requires_grad=True to tell PyTorch to track the gradients of this tensor.

  • We set device='cuda' to ensure the tensor is on the GPU and matches our device configuration.

Running the kernel

Calling the function is unchanged from the standard SlangPy API, but calculation of gradients is now done via PyTorch:

# Evaluate the polynomial. Result will automatically be a torch tensor.
# Expecting result = 2x^2 + 8x - 1
result = module.polynomial(a=2, b=8, c=-1, x=x)
print(result)

# Run backward pass on result, using result grad == 1
# to get the gradient with respect to x
result.backward(torch.ones_like(result))
print(x.grad)

This works because SlangPy automatically detects PyTorch tensors and wraps the call to polynomial in a custom autograd function. As a result, the call to result.backward automatically invokes module.polynomial.bwds to compute gradients.
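As a sanity check on the printed values, the expected result and gradient can be worked out by hand. The sketch below is a pure-Python stand-in, not the Slang implementation. It assumes the polynomial function computes a*x^2 + b*x + c, as the comment in the snippet above suggests, so its derivative with respect to x is 2*a*x + b. The helper names are hypothetical.

```python
def polynomial(a: float, b: float, c: float, x: float) -> float:
    """Pure-Python stand-in for the Slang function: a*x^2 + b*x + c."""
    return a * x * x + b * x + c


def polynomial_grad_x(a: float, b: float, x: float) -> float:
    """Analytical derivative with respect to x: 2*a*x + b."""
    return 2 * a * x + b


xs = [1.0, 2.0, 3.0, 4.0]
expected_result = [polynomial(2, 8, -1, x) for x in xs]
expected_grad = [polynomial_grad_x(2, 8, x) for x in xs]

print(expected_result)  # compare with print(result):  [9.0, 23.0, 41.0, 63.0]
print(expected_grad)    # compare with print(x.grad):  [12.0, 16.0, 20.0, 24.0]
```

Because the backward pass is seeded with a gradient of 1 for every element, x.grad should match the analytical derivative directly.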

Device Backend Selection

SlangPy supports multiple backend types for PyTorch integration:

CUDA Backend (Recommended)

The CUDA backend provides the best performance by directly sharing the CUDA context with PyTorch:

device = spy.create_torch_device(type=spy.DeviceType.cuda)

This approach avoids expensive context switching and memory copies, making it ideal for performance-critical applications.

Graphics Backends (D3D12, Vulkan)

For applications that need access to graphics features (such as rasterization), you can use D3D12 or Vulkan backends:

# D3D12 backend (Windows only)
device = spy.create_torch_device(type=spy.DeviceType.d3d12)

# Vulkan backend (Cross-platform)
device = spy.create_torch_device(type=spy.DeviceType.vulkan)

These backends use CUDA interop with shared memory and semaphores to synchronize between SlangPy and PyTorch. While functional, this approach has higher overhead due to hardware context switching and memory copies.

A word on performance

The choice of backend significantly impacts performance:

  • CUDA Backend: Provides the best performance for compute-focused workloads. Very simple operations may still be faster in pure PyTorch, but as functions become more complex, the benefits of SlangPy’s vectorization and GPU optimization become apparent.

  • Graphics Backends (D3D12/Vulkan): Useful when graphics features are required, but expect substantially worse performance due to context switching overhead. Consider whether the graphics features are truly necessary for your use case.
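The guidance above can be condensed into a small decision helper. This is only a sketch: pick_backend is a hypothetical function, and preferring D3D12 over Vulkan on Windows is an assumption made here for illustration, not a recommendation from SlangPy. It returns the name of a spy.DeviceType member so that it can run without SlangPy installed.

```python
import sys


def pick_backend(needs_graphics: bool, platform: str = sys.platform) -> str:
    """Return the name of a spy.DeviceType member following the guidance above."""
    if not needs_graphics:
        return "cuda"   # recommended: shares the CUDA context with PyTorch
    if platform == "win32":
        return "d3d12"  # Windows only (assumed preference on Windows)
    return "vulkan"     # cross-platform graphics backend


print(pick_backend(needs_graphics=False))
print(pick_backend(needs_graphics=True, platform="linux"))
```

With SlangPy available, the returned name could then be resolved with getattr(spy.DeviceType, name) and passed to create_torch_device as the type argument.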

TensorView Compatibility (slangtorch)

SlangPy provides PyTorch tensor support via its own Tensor* types (Tensor<T,N>, RWTensor<T,N>, DiffTensor<T,N>, etc.). However, slangtorch uses TensorView<T> and DiffTensorView<T> for tensor interop.

SlangPy supports backward compatibility for code originally written for slangtorch, allowing torch.Tensor arguments to bind to TensorView<T> parameters in Slang functions.

Warning

TensorView is not recommended for new code. This feature exists only for backward compatibility with existing slangtorch code. For new projects, use SlangPy’s native Tensor* types which offer better cross-platform support and integration.

Note

TensorView<T> is CUDA-only. It will not work with Vulkan or D3D12 backends.

Example usage with existing slangtorch-style Slang code:

// slangtorch-style function using TensorView
void copy_tensor(TensorView<float> input, TensorView<float> output)
{
    for (uint i = 0; i < input.size(0); i++)
        output.store(i, input.load(i));
}

import slangpy as spy
import torch

device = spy.create_torch_device(type=spy.DeviceType.cuda)
module = spy.Module.load_from_file(device, "example.slang")

# torch.Tensor arguments bind directly to TensorView<T> parameters
input_tensor = torch.tensor([1.0, 2.0, 3.0], device="cuda", dtype=torch.float32)
output_tensor = torch.zeros(3, device="cuda", dtype=torch.float32)

module.copy_tensor(input_tensor, output_tensor)

Key differences between SlangPy’s Tensor<T,N> and TensorView<T>:

  • Tensor<T,N> has compile-time dimensions (N); TensorView<T> has runtime dimensions

  • Tensor<T,N> works on all backends; TensorView<T> is CUDA-only

  • For new code, prefer SlangPy’s native Tensor* types for better cross-platform support

Summary

PyTorch integration with SlangPy is seamless and automatic. This example covered:

  • Device creation using create_torch_device with support for CUDA, D3D12, and Vulkan backends

  • Automatic detection of PyTorch tensors - no special module types required

  • Use of PyTorch’s auto-grad graph and .backward() to backpropagate gradients through SlangPy function calls

  • Performance considerations when choosing between CUDA and graphics backends

  • TensorView compatibility for code migrating from slangtorch

The CUDA backend is recommended for best performance, while graphics backends provide access to additional GPU features at the cost of substantially higher overhead from context switching and memory copies.