PyTorch is a widely used, open source deep learning platform for writing neural network layers in Python, enabling a seamless workflow from research to production.

Here are the latest updates and bug fixes.

Serious

  • Fix higher order gradients for CPU Convolutions (regressed in 1.0.0 under the MKL-DNN setting) (#15686)
  • Correct gradients for non-contiguous weights in CPU Convolutions (#16301)
  • Fix ReLU on CPU Integer Tensors by fixing vec256 inversions (#15634)
  • Fix bincount for non-contiguous Tensors (#15109)
  • Fix torch.norm on CPU for large Tensors (#15602)
  • Fix eq_ to do equality on GPU (was doing greater-equal due to a typo) (#15475)
  • Work around a CuDNN bug that gave wrong results in certain strided convolution gradient setups
    • Blacklist FFT algorithms for strided dgrad (#16626)
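
Several of the fixes above can be spot-checked from Python. The following is a minimal sketch, assuming a PyTorch build that includes these fixes; the tensor values and variable names are illustrative and are not taken from the linked pull requests. It runs on the CPU only, so the eq_ check shows the intended semantics rather than reproducing the original GPU bug.

```python
import torch

# ReLU on an integer CPU tensor: negative entries should clamp to zero.
ints = torch.tensor([-3, 0, 4], dtype=torch.int32)
relu_out = torch.relu(ints)

# bincount on a non-contiguous view (every other element of a 1-D tensor).
base = torch.tensor([0, 1, 1, 2, 2, 2, 3, 3])
strided = base[::2]                 # holds the values 0, 1, 2, 3 with stride 2
counts = torch.bincount(strided)    # each value occurs exactly once

# In-place equality: eq_ should test a == b, not a >= b.
a = torch.tensor([1.0, 2.0, 3.0])
b = torch.tensor([1.0, 0.0, 5.0])
a.eq_(b)                            # only the first position is equal

print(relu_out, counts, a)
```

To exercise the GPU path that #15475 concerns, the same eq_ call could be repeated on tensors moved to a CUDA device, if one is available.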

Correctness

  • Fix CUDA native loss_ctc for varying input lengths (#15798)
    • This avoids NaNs in variable-length settings
  • C++ Frontend: Fix serialization (#15033)
    • Fixes a bug in (de)serializing a hierarchy of submodules in which one submodule has no parameters of its own, but its submodules do
  • Fix derivative for mvlgamma (#15049)
  • Fix numerical stability in log_prob for Gumbel distribution (#15878)
  • multinomial: fix detection and drawing of zero probability events (#16075)
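
To illustrate the kind of numerical issue a log_prob can run into, here is a plain-Python sketch for the Gumbel distribution. It is not the change made in #15878; the function names are hypothetical, and it only assumes the standard Gumbel density with z = (x - loc) / scale. Evaluating the log-density directly avoids taking the log of a density that has underflowed to zero.

```python
import math

def gumbel_log_prob(x, loc=0.0, scale=1.0):
    """Log-density evaluated directly: -(z + exp(-z)) - log(scale)."""
    z = (x - loc) / scale
    return -(z + math.exp(-z)) - math.log(scale)

def gumbel_log_prob_naive(x, loc=0.0, scale=1.0):
    """Compute the density first, then take its log (fragile in the tails)."""
    z = (x - loc) / scale
    density = math.exp(-(z + math.exp(-z))) / scale
    return math.log(density)  # ValueError if density underflowed to 0.0

# Near the mode both forms agree.
at_mode = gumbel_log_prob(0.0)

# In the left tail the density underflows, but the direct form stays finite.
in_tail = gumbel_log_prob(-10.0)
try:
    gumbel_log_prob_naive(-10.0)
    naive_failed = False
except ValueError:
    naive_failed = True
```

The direct form is still limited by the range of exp: for extremely negative z, math.exp(-z) itself overflows, so this sketch only covers moderate tails.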

Crashes

  • PyTorch binaries were crashing on AWS Lambda and a few other niche systems because CPUInfo handled certain warnings as errors. CPUInfo has been updated with the relevant fixes.
  • MKL-DNN is now statically built, to avoid conflicts with system versions
  • Allow ReadyQueue to handle empty tasks (#15791)
    • Fixes a segfault with a DataParallel + Checkpoint neural network setting
  • Avoid integer divide by zero error in index_put_ (#14984)
  • Fix for model inference crash on Win10 (#15919) (#16092)
  • Use CUDAGuard when serializing Tensors:
    • Before this change, torch.save and torch.load would initialize the CUDA context on GPU 0 if it hadn’t been initialized already, even if the serialized tensors were only on GPU 1.
  • Fix error with handling scalars and __rpow__, for example 1 ** x, where x is a PyTorch scalar (#16687)
  • Switch to CUDA implementation instead of CuDNN if batch size >= 65536 for affine_grid (#16403)
    • CuDNN crashes when batch size >= 65536
  • [Distributed] TCP init method race condition fix (#15684)
  • [Distributed] Fix a memory leak in Gloo’s CPU backend
  • [C++ Frontend] Fix LBFGS issue around using inplace ops (#16167)
  • [Hub] Fix GitHub branch prefix v (#15552)
  • [Hub] Fix URL download for URLs served without a Content-Length header
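
For readers unfamiliar with the reflected power operator mentioned in the scalar fix above, here is a minimal plain-Python sketch of how Python dispatches it. The Scalar class is a hypothetical stand-in and is not a PyTorch type; it only shows that an expression with a plain number on the left of the power operator ends up in the right operand's __rpow__.

```python
class Scalar:
    """Hypothetical scalar wrapper used only to show operator dispatch."""

    def __init__(self, value):
        self.value = value

    def __pow__(self, exponent):
        # Handles Scalar ** number.
        return Scalar(self.value ** exponent)

    def __rpow__(self, base):
        # Handles number ** Scalar: Python tries int.__pow__ first, gets
        # NotImplemented, and then falls back to this reflected method.
        return Scalar(base ** self.value)

x = Scalar(3.0)
reflected = 2 ** x      # dispatches to x.__rpow__(2)
forward = x ** 2        # dispatches to x.__pow__(2)
```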

Performance

  • LibTorch binaries now ship with CuDNN enabled. Without this change, many users saw significant performance differences between LibTorch and PyTorch; this should now be fixed. (#14976)
  • Make btriunpack work for high-dimensional batches and run faster than before (#15286)
  • Improve performance of unique with inverse indices (#16145)
  • Re-enable OpenMP in binaries (it had been disabled by a CMake refactor)
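
As a reminder of what "unique with inverse indices" refers to, here is a small sketch using torch.unique with return_inverse=True. The input values are illustrative; the example shows the semantics of the call, not the performance change itself.

```python
import torch

x = torch.tensor([3, 1, 3, 2])

# values holds the sorted distinct elements; inverse maps each element of x
# to its position in values.
values, inverse = torch.unique(x, sorted=True, return_inverse=True)

# Indexing values with inverse reconstructs the original tensor.
rebuilt = values[inverse]
```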

Other

  • Create type hint stub files for the torch module (#16089)
    • This restores auto-complete functionality in PyCharm, VS Code, etc.
  • Fix sum_to behavior with zero dimensions (#15796)
  • Match NumPy by considering NaNs to be larger than any number when sorting (#15886)
  • Fix various error messages / settings in dynamic weight GRUs / LSTMs (#15766)
  • C++ Frontend: Make call operator on module holder call forward (#15831)
  • C++ Frontend: Add the normalize transform to the core library (#15891)
  • Fix bug in torch::load and unpack torch::optim::detail namespace (#15926)
  • Implement batched upper triangular and lower triangular (#15257)
  • Add torch.roll to documentation (#14880)
  • Add backend checks for batch norm, for better errors (#15955)
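
The NaN ordering convention mentioned above (NaNs treated as larger than any number when sorting, as NumPy does) can be sketched in plain Python. This is only an illustration of the ordering, not PyTorch's sort implementation; nan_last_key is a hypothetical helper name.

```python
import math

def nan_last_key(value):
    """Sort key that places NaN after every number, including +inf."""
    if math.isnan(value):
        return (1, 0.0)
    return (0, value)

data = [2.0, float("nan"), -1.0, float("inf"), 0.5]
ordered = sorted(data, key=nan_last_key)
# The finite values and +inf come first in ascending order; NaN comes last.
```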

JIT

  • Add better support for bools in the graph fuser (#15057)
  • Allow tracing with fork/wait (#15184)
  • Improve script/no script save error (#15321)
  • Add self to Python printer reserved words (#15318)
  • Better error when torch.load-ing a JIT model (#15578)
  • Fix select after chunk op (#15672)
  • Add script standard library documentation + cleanup (#14912)