Runs code with anomaly detection enabled for the autograd engine. This does two things, both described under Details:

Usage

with_detect_anomaly(code)

Arguments

code

Code that will be executed in the detect anomaly context.

Details

  • Running the forward pass with detection enabled will allow the backward pass to print the traceback of the forward operation that created the failing backward function.

  • Any backward computation that generates a "nan" value will raise an error.
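
The second point can be illustrated with a minimal sketch. It assumes the torch package and its backend are installed, and it uses an illustrative expression (sqrt evaluated at 0, then multiplied by 0) that is not part of the original example. The forward value is finite, but the gradient of sqrt at 0 is infinite, and multiplying it by an upstream gradient of 0 gives nan. Without anomaly detection the nan would end up silently in `x$grad`; with detection enabled, the backward pass is expected to raise an error instead.

```r
library(torch)

x <- torch_tensor(0, requires_grad = TRUE)

# Capture the error (if any) instead of stopping the script.
result <- tryCatch(
  with_detect_anomaly({
    # Run the forward pass inside the context so that its traceback
    # is available when the backward pass fails.
    y <- (torch_sqrt(x) * 0)$sum()
    # Backward through sqrt at 0 computes 0 / 0, which is nan.
    y$backward()
    "no error"
  }),
  error = function(e) e
)

# TRUE when the nan check raised an error during backward.
inherits(result, "error")
```

Calling `conditionMessage(result)` shows the message of the captured error, which should name the backward function that produced the nan value.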

Warning

Enable this mode only for debugging: the extra checks it performs slow down program execution.
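
One way to follow this advice is to gate the context behind a debug switch, so that regular runs skip the extra checks. The sketch below is an assumption-laden illustration, not part of the package: the environment variable `TORCH_DEBUG_ANOMALY` and the helper `run_step()` are hypothetical names introduced here.

```r
library(torch)

# Hypothetical switch: the environment variable name is an assumption
# made for this sketch, not something torch reads itself.
debug_anomaly <- identical(Sys.getenv("TORCH_DEBUG_ANOMALY"), "1")

# Hypothetical helper: runs `step` (a zero-argument function) inside the
# anomaly detection context only when debugging is requested.
run_step <- function(step, debug = debug_anomaly) {
  if (debug) {
    with_detect_anomaly(step())
  } else {
    step()
  }
}

x <- torch_randn(3, requires_grad = TRUE)

# Forward and backward pass in one step; the gradient of sum(x^2) is 2 * x.
run_step(function() {
  loss <- (x^2)$sum()
  loss$backward()
})

x$grad
```

Keeping the forward pass inside `step` matters: the traceback of the forward operation is only recorded when the forward pass itself runs with detection enabled.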

Examples

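if (torch_is_installed()) {
  x <- torch_randn(2, requires_grad = TRUE)
  y <- torch_randn(1)
  b <- (x^y)$sum()
  # Modify y in place after it was saved for the backward pass of x^y.
  y$add_(1)

  try({
    # Fails: y changed since the forward pass. try() stops at this error,
    # so the output below comes from this call.
    b$backward()

    # Same call with anomaly detection enabled.
    with_detect_anomaly({
      b$backward()
    })
  })
}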
#> Error in (function (self, inputs, gradient, retain_graph, create_graph)  : 
#>   one of the variables needed for gradient computation has been modified by an inplace operation: [CPUFloatType [1]] is at version 1; expected version 0 instead. Hint: enable anomaly detection to find the operation that failed to compute its gradient, with torch.autograd.set_detect_anomaly(True).
#> Exception raised from unpack at /Users/runner/work/pytorch/pytorch/pytorch/torch/csrc/autograd/saved_variable.cpp:194 (most recent call first):
#> frame #0: c10::Error::Error(c10::SourceLocation, std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char> >) + 81 (0x10f6b0ca1 in libc10.dylib)
#> frame #1: c10::detail::torchCheckFail(char const*, char const*, unsigned int, std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char> > const&) + 98 (0x10f6af342 in libc10.dylib)
#> frame #2: torch::autograd::SavedVariable::unpack(std::__1::shared_ptr<torch::autograd::Node>) const + 2815 (0x130270d0f in libtorch_cpu.dylib)
#> frame #3: torch::autograd::generated::PowBackward1::apply(std::__1::vector<at::Tensor, std::__1::allocator<at::Tensor> >&&) + 168 (0x12f3c3488 in libtorch_cpu.dylib)
#> frame #4: torch::autograd::Node::operator()(std::__1::vector<at::Tensor, std::__1::allocator<at::Tensor> >&&) + 99 (0x1302416c3 in libtorch_cpu.dylib)
#> frame #5: torch::autograd::Engine::evaluate_function(std::__1::shared_ptr<torch::autograd::GraphTask>&, torch::autograd::Node*, torch::autograd::InputBuffer&, std::__1::shared_ptr<torch::autograd::ReadyQueue> const&) + 3401 (0x130237a89 in libtorch_cpu.dylib)
#> frame #6: torch::autograd::Engine::thread_main(std::__1::shared_ptr<torch::autograd::GraphTask> const&) + 954 (0x1302366da in libtorch_cpu.dylib)
#> frame #7: torch::autograd::Engine::execute_with_graph_task(std::__1::shared_ptr<torch::autograd::GraphTask> const&, std::__1::shared_ptr<torch::autograd::Node>, torch::autograd::InputBuffer&&) + 374 (0x130240646 in libtorch_cpu.dylib)
#> frame #8: torch::autograd::Engine::execute(std::__1::vector<torch::autograd::Edge, std::__1::allocator<torch::autograd::Edge> > const&, std::__1::vector<at::Tensor, std::__1::allocator<at::Tensor> > const&, bool, bool, bool, std::__1::vector<torch::autograd::Edge, std::__1::allocator<torch::autograd::Edge> > const&) + 2605 (0x13023e68d in libtorch_cpu.dylib)
#> frame #9: torch::autograd::run_backward(std::__1::vector<at::Tensor, std::__1::allocator<at::Tensor> > const&, std::__1::vector<at::Tensor, std::__1::allocator<at::Tensor> > const&, bool, bool, std::__1::vector<at::Tensor, std::__1::allocator<at::Tensor> > const&, bool, bool) + 2129 (0x130224df1 in libtorch_cpu.dylib)
#> frame #10: torch::autograd::backward(std::__1::vector<at::Tensor, std::__1::allocator<at::Tensor> > const&, std::__1::vector<at::Tensor, std::__1::allocator<at::Tensor> > const&, c10::optional<bool>, bool, std::__1::vector<at::Tensor, std::__1::allocator<at::Tensor> > const&) + 104 (0x130225408 in libtorch_cpu.dylib)
#> frame #11: torch::autograd::VariableHooks::_backward(at::Tensor const&, c10::ArrayRef<at::Tensor>, c10::optional<at::Tensor> const&, c10::optional<bool>, bool) const + 435 (0x1302769f3 in libtorch_cpu.dylib)
#> frame #12: at::Tensor::_backward(c10::ArrayRef<at::Tensor>, c10::optional<at::Tensor> const&, c10::optional<bool>, bool) const + 75 (0x12cbb7eab in libtorch_cpu.dylib)
#> frame #13: _lantern_Tensor__backward_tensor_tensorlist_tensor_bool_bool + 442 (0x1125870aa in liblantern.dylib)
#> frame #14: std::__1::__function::__func<cpp_torch_method__backward_self_Tensor_inputs_TensorList(XPtrTorchTensor, XPtrTorchTensorList, XPtrTorchOptionalTensor, XPtrTorchoptional_bool, XPtrTorchbool)::$_1, std::__1::allocator<cpp_torch_method__backward_self_Tensor_inputs_TensorList(XPtrTorchTensor, XPtrTorchTensorList, XPtrTorchOptionalTensor, XPtrTorchoptional_bool, XPtrTorchbool)::$_1>, void ()>::operator()() + 53 (0x110b0a8b5 in torchpkg.so)
#> frame #15: std::__1::packaged_task<void ()>::operator()() + 72 (0x110b08b88 in torchpkg.so)
#> frame #16: EventLoop<void>::run() + 377 (0x110b08979 in torchpkg.so)
#> frame #17: void* std::__1::__thread_proxy<std::__1::tuple<std::__1::unique_ptr<std::__1::__thread_struct, std::__1::default_delete<std::__1::__thread_struct> >, ThreadPool<void>::ThreadPool(int)::'lambda'()> >(void*) + 45 (0x110b0876d in torchpkg.so)
#> frame #18: _pthread_start + 125 (0x7ff8125a34e1 in libsystem_pthread.dylib)
#> frame #19: thread_start + 15 (0x7ff81259ef6b in libsystem_pthread.dylib)
#>