Basic Auto-diff

One of Slang’s most powerful features is its auto-diff capability, which is documented in detail in the Slang documentation. SlangPy carries this feature over to Python, allowing you to easily calculate the derivative of a function.

A differentiable function

Let’s start with a simple polynomial function:

[Differentiable]
float polynomial(float a, float b, float c, float x) {
    return a * x * x + b * x + c;
}

Note that it has the [Differentiable] attribute, which tells Slang to generate the backward propagation function.
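
To call this from Python, we first need a device and a module containing the function. A minimal setup sketch (the argument-free create_device call and the file name example.slang are illustrative placeholders; adapt them to however your project already creates its device and loads its Slang modules):

import numpy as np
import slangpy as spy

# Create a device and load the Slang module containing 'polynomial'
device = spy.create_device()
module = spy.Module.load_from_file(device, "example.slang")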

The Tensor type

To store simple differentiable data, SlangPy utilizes the Tensor type. Here we’ll initialize one from the data in a numpy array and use it to evaluate a polynomial.

# Create a tensor with attached grads from a numpy array
# Note: We pass zero=True to initialize the grads to zero on allocation
x = spy.Tensor.numpy(device, np.array([1, 2, 3, 4], dtype=np.float32)).with_grads(zero=True)

# Evaluate the polynomial and ask for a tensor back
# Expecting result = 2x^2 + 8x - 1
result: spy.Tensor = module.polynomial(a=2, b=8, c=-1, x=x, _result='tensor')
print(result.to_numpy())

By specifying _result='tensor', we ask SlangPy to return the result as a Tensor. Equally, we could have pre-allocated a tensor for SlangPy to fill in:

result = spy.Tensor(device, element_type=module.float, shape=(4,))
module.polynomial(a=2, b=8, c=-1, x=x, _result=result)

Or we could have used the return_type modifier:

result: spy.Tensor = module.polynomial.return_type(spy.Tensor)(a=2, b=8, c=-1, x=x)

In all cases, we end up with a result tensor that contains the evaluated polynomial.
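
As a quick sanity check (not part of the example itself), the same polynomial can be evaluated on the host with numpy and compared against the device result:

# Host-side reference: 2x^2 + 8x - 1 for x = [1, 2, 3, 4]
x_np = np.array([1, 2, 3, 4], dtype=np.float32)
expected = 2 * x_np ** 2 + 8 * x_np - 1   # [9, 23, 41, 63]
assert np.allclose(result.to_numpy(), expected)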

Backward pass

Now we’ll attach gradients to the result and set them to 1, then run back propagation:

# Attach gradients to the result, and set them to 1 for the backward pass
result = result.with_grads()
result.grad.storage.copy_from_numpy(np.array([1, 1, 1, 1], dtype=np.float32))

# Call the backwards version of module.polynomial
# This will read the grads from _result, and write the grads to x
# Expecting x.grad = 4x + 8 (i.e. 2*a*x + b)
module.polynomial.bwds(a=2, b=8, c=-1, x=x, _result=result)
print(x.grad.to_numpy())

That’s the lot! The call to bwds generates a kernel that calls bwd_diff(polynomial) in Slang, and automatically deals with passing the correct data in and out.
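
Since result = 2x^2 + 8x - 1, the gradient written back to x should be d(result)/dx = 4x + 8. A quick host-side check, along the same lines as the forward check above:

# Gradient reference: 2*a*x + b = 4x + 8 for x = [1, 2, 3, 4]
x_np = np.array([1, 2, 3, 4], dtype=np.float32)
assert np.allclose(x.grad.to_numpy(), 4 * x_np + 8)   # [12, 16, 20, 24]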

It is worth noting that SlangPy currently always accumulates gradients, so you will need to ensure gradient buffers are zeroed. In the demo above, we used zero=True when creating the tensor to do so.
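
If you run the backward pass more than once, the gradients attached to x also need to be re-zeroed between passes. One way to do that is to reuse the storage.copy_from_numpy call shown above (a sketch; any zeroing utility your code already has will do):

# Reset x's accumulated gradients to zero before the next backward pass
x.grad.storage.copy_from_numpy(np.zeros(4, dtype=np.float32))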

If you’re familiar with ML frameworks such as PyTorch, the big difference is that SlangPy is (by design) not a host side auto-grad system. It does not record an auto-grad graph, and instead requires you to explicitly call the backward function, providing the primals used in the forward call. However, SlangPy provides strong integration with PyTorch and all its auto-grad features.

Summary

Use of auto-diff in SlangPy requires:

- Marking your function as differentiable
- Using the Tensor type to store differentiable data
- Calling the bwds function to calculate gradients

SlangPy’s tensor type currently only supports basic types for gradient accumulation, because gradients must be accumulated atomically. However, we intend to expand this to all struct types in the future.