PyTorch is an open source machine learning library based on the Torch library, used for applications such as computer vision and natural language processing.

Here’s what’s new in PyTorch v1.2.0.

[JIT] New TorchScript API for PyTorch

Version 1.2 includes a new, easier-to-use API for converting nn.Modules into ScriptModules. A sample usage is:

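import torch

class MyModule(torch.nn.Module):
    def __init__(self):
        super().__init__()
        self.linear = torch.nn.Linear(4, 2)  # illustrative submodule
    def forward(self, x):
        return self.linear(x)

# Construct an nn.Module instance
module = MyModule()

# Pass it to `torch.jit.script` to compile it into a ScriptModule.
my_torchscript_module = torch.jit.script(module)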

torch.jit.script() will attempt to recursively compile the given nn.Module, including any submodules or methods called from forward(). See the migration guide for more info on what’s changed and how to migrate.

[JIT] Improved TorchScript Python language coverage for PyTorch

In 1.2, TorchScript has significantly improved its support for Python language constructs and Python’s standard library. Highlights include:

  • Early returns, breaks and continues.
  • Iterator-based constructs, like loops, zip(), and enumerate().
  • NamedTuples.
  • math and string library support.
  • Support for most Python builtin functions.

See the detailed notes below for more information.
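The sketch below shows several of these constructs compiling under torch.jit.script: an early return, zip(), enumerate(), break, continue, and the math library. It assumes PyTorch 1.2 or later. The function names and inputs are made up for the example.

```python
import math
from typing import List

import torch


@torch.jit.script
def weighted_sum_until_negative(values: List[float], weights: List[float]) -> float:
    if len(values) == 0:
        return 0.0  # early return
    total = 0.0
    for v, w in zip(values, weights):  # zip() over two lists
        if v < 0.0:
            break  # stop at the first negative value
        total += v * w
    return total


@torch.jit.script
def sqrt_of_even_position_sum(values: List[float]) -> float:
    total = 0.0
    for i, v in enumerate(values):  # enumerate() yields (index, item)
        if i % 2 == 1:
            continue  # skip odd positions
        total += v
    return math.sqrt(total)  # math library call inside TorchScript


print(weighted_sum_until_negative([1.0, 2.0, -1.0, 5.0], [1.0, 1.0, 1.0, 1.0]))  # 3.0
print(sqrt_of_even_position_sum([4.0, 100.0, 5.0]))  # 3.0
```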

PyTorch Expanded ONNX Export

In PyTorch 1.2, working with Microsoft, the team added full support for exporting ONNX Opset versions 7 (v1.2), 8 (v1.3), 9 (v1.4) and 10 (v1.5). The constant folding pass was also enhanced to support Opset 10, the latest available version of ONNX. Additionally, users can now register their own symbolic functions to export custom ops, and specify the dynamic dimensions of inputs during export. Here is a summary of all the major improvements:

  • Support for multiple Opsets including the ability to export dropout, slice, flip and interpolate in Opset 10.
  • Improvements to ScriptModule including support for multiple outputs, tensor factories and tuples as inputs and outputs.
  • More than a dozen additional PyTorch operators supported including the ability to export a custom operator.

Updated docs can be found here and also a refreshed tutorial using ONNXRuntime can be found here.

TensorBoard Is No Longer Considered Experimental for PyTorch

Read the documentation or simply type from torch.utils.tensorboard import SummaryWriter to get started!

PyTorch nn.Transformer

PyTorch 1.2 includes a standard nn.Transformer module, based on the paper “Attention is All You Need”. The nn.Transformer module relies entirely on an attention mechanism to draw global dependencies between input and output. The individual components of the nn.Transformer module are designed so they can be adopted independently. For example, nn.TransformerEncoder can be used by itself, without the larger nn.Transformer. New APIs include:

  • nn.Transformer
  • nn.TransformerEncoder and nn.TransformerEncoderLayer
  • nn.TransformerDecoder and nn.TransformerDecoderLayer

See the Transformer Layers documentation for more info.
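The sketch below uses an encoder on its own, with no decoder or full nn.Transformer, as the paragraph above describes. The sizes (d_model=16, nhead=4, dim_feedforward=32, two layers) are arbitrary values chosen only for the illustration.

```python
import torch
import torch.nn as nn

d_model, nhead = 16, 4
# One encoder layer definition, stacked twice by nn.TransformerEncoder.
layer = nn.TransformerEncoderLayer(d_model=d_model, nhead=nhead, dim_feedforward=32)
encoder = nn.TransformerEncoder(layer, num_layers=2)

# The default input layout is (sequence length, batch size, d_model).
src = torch.rand(10, 3, d_model)
out = encoder(src)
print(out.shape)  # torch.Size([10, 3, 16])
```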

Breaking Changes for PyTorch

Comparison operations (lt (<), le (<=), gt (>), ge (>=), eq (==), ne (!=)): the return dtype has changed from torch.uint8 to torch.bool (21113)

Version 1.1:

>>> torch.tensor([1, 2, 3]) < torch.tensor([3, 1, 2])
tensor([1, 0, 0], dtype=torch.uint8)

Version 1.2:

>>> torch.tensor([1, 2, 3]) < torch.tensor([3, 1, 2])
tensor([True, False, False])

For most programs, no changes should be needed as a result of this change. There are a couple of possible exceptions, listed below.

PyTorch Mask Inversion

In prior versions of PyTorch, the idiomatic way to invert a mask was to call 1 - mask. This behavior is no longer supported; use the ~ or bitwise_not() operator instead.

Version 1.1:

>>> 1 - (torch.tensor([1, 2, 3]) < torch.tensor([3, 1, 2]))
tensor([0, 1, 1], dtype=torch.uint8)

Version 1.2:

>>> 1 - (torch.tensor([1, 2, 3]) < torch.tensor([3, 1, 2]))
RuntimeError: Subtraction, the `-` operator, with a bool tensor is not supported.
If you are trying to invert a mask, use the `~` or `bitwise_not()` operator instead.

>>> ~(torch.tensor([1, 2, 3]) < torch.tensor([3, 1, 2]))
tensor([False,  True,  True])
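The following is a minimal sketch of the recommended idiom. It builds a bool mask, inverts it with ~ or torch.bitwise_not instead of 1 - mask, and then uses it for indexing. The tensor values are taken from the examples above.

```python
import torch

a = torch.tensor([1, 2, 3])
b = torch.tensor([3, 1, 2])
mask = a < b                          # comparisons now return torch.bool
inverted = ~mask                      # preferred over 1 - mask
also_inverted = torch.bitwise_not(mask)

print(mask.dtype)                     # torch.bool
print(inverted.tolist())              # [False, True, True]
print(a[inverted].tolist())           # [2, 3]
```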

sum(Tensor) (python built-in) does not upcast dtype like torch.sum

Python’s built-in sum returns results in the same dtype as the tensor itself, so it will not return the expected result if the value of the sum cannot be represented in the dtype of the tensor.

Version 1.1:

# value can be represented in result dtype
>>> sum(torch.tensor([1, 2, 3, 4, 5]) > 2)
tensor(3, dtype=torch.uint8)

# value can NOT be represented in result dtype
>>> sum(torch.ones((300,)) > 0)
tensor(44, dtype=torch.uint8)

# torch.sum properly upcasts result dtype
>>> torch.sum(torch.ones((300,)) > 0)
tensor(300)

Version 1.2:

# value cannot be represented in result dtype (now torch.bool)
>>> sum(torch.tensor([1, 2, 3, 4, 5]) > 2)
tensor(True)

# value cannot be represented in result dtype
>>> sum(torch.ones((300,)) > 0)
tensor(True)

# torch.sum properly upcasts result dtype
>>> torch.sum(torch.ones((300,)) > 0)
tensor(300)

TLDR: use torch.sum instead of the built-in sum. Note that the built-in sum() behavior will more closely resemble torch.sum in the next release.
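Here is a short sketch of counting the True entries of a bool mask with torch.sum, which upcasts the result to torch.int64.

```python
import torch

mask = torch.ones(300) > 0            # 300 True values, dtype torch.bool
count = torch.sum(mask)               # torch.sum upcasts bool to int64
print(count.item(), count.dtype)      # 300 torch.int64

# .item() gives a plain Python int when a scalar count is all you need.
n = mask.sum().item()
```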

Note also that masking via torch.uint8 Tensors is now deprecated, see the Deprecations section for more information.

__invert__ / ~: now calls torch.bitwise_not instead of 1 - tensor and is supported for all integral+Boolean dtypes instead of only torch.uint8. (22326)

Version 1.1:

>>> ~torch.arange(8, dtype=torch.uint8)
tensor([ 1, 0, 255, 254, 253, 252, 251, 250], dtype=torch.uint8)

Version 1.2:

>>> ~torch.arange(8, dtype=torch.uint8)
tensor([255, 254, 253, 252, 251, 250, 249, 248], dtype=torch.uint8)
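Because ~ now means bitwise NOT, it also applies to signed integer dtypes, not only torch.uint8. A small sketch:

```python
import torch

x = torch.tensor([0, 1, -1], dtype=torch.int64)
print((~x).tolist())                          # [-1, -2, 0]

u = torch.arange(4, dtype=torch.uint8)
print(torch.bitwise_not(u).tolist())          # [255, 254, 253, 252]
```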

torch.tensor(bool) and torch.as_tensor(bool) now infer the torch.bool dtype instead of torch.uint8. (19097)

Version 1.1:

>>> torch.tensor([True, False])
tensor([1, 0], dtype=torch.uint8)

Version 1.2:

>>> torch.tensor([True, False])
tensor([ True, False])

nn.BatchNorm{1,2,3}D: gamma (weight) is now initialized to all 1s rather than randomly initialized from U(0, 1). (13774)

Version 1.1:

>>> torch.nn.BatchNorm2d(5).weight
Parameter containing:
tensor([0.1635, 0.7512, 0.4130, 0.6875, 0.5496], requires_grad=True)

Version 1.2:

>>> torch.nn.BatchNorm2d(5).weight
Parameter containing:
tensor([1., 1., 1., 1., 1.], requires_grad=True)
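A model might have depended on the old random gamma. One option is to re-initialize the weight explicitly with nn.init.uniform_. The sketch below shows this, but treat it as an assumption about what your model needs rather than an official migration path.

```python
import torch
import torch.nn as nn

bn = nn.BatchNorm2d(5)
print(bn.weight.data.tolist())  # [1.0, 1.0, 1.0, 1.0, 1.0]  (gamma)
print(bn.bias.data.tolist())    # [0.0, 0.0, 0.0, 0.0, 0.0]  (beta)

# Approximate the pre-1.2 behavior by drawing gamma from U(0, 1) yourself.
nn.init.uniform_(bn.weight, 0.0, 1.0)
```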

A number of deprecated Linear Algebra operators have been removed (22841)

Removed               Use Instead
btrifact              lu
btrifact_with_info    lu with get_infos=True
btrisolve             lu_solve
btriunpack            lu_unpack
gesv                  solve
pstrf                 cholesky
potrf                 cholesky
potri                 cholesky_inverse
potrs                 cholesky_solve
trtrs                 triangular_solve

Sparse Tensors: Changing the sparsity of a Tensor through .data is no longer supported. (17072)

>>> x = torch.randn(2,3)
>>> x.data = torch.sparse_coo_tensor((2, 3))
RuntimeError: Attempted to call `variable.set_data(tensor)`,
but `variable` and `tensor` have incompatible tensor type.

Sparse Tensors: in-place shape modifications of Dense Tensor Constructor Arguments will no longer modify the Sparse Tensor itself (20614)

Version 1.1:

>>> i = torch.tensor([[0, 1]])
>>> v = torch.ones(2)
>>> s = torch.sparse_coo_tensor(i, v)
>>> i.resize_(1, 1)
>>> v.resize_(1)

>>> s.coalesce().indices().shape
torch.Size([1, 1])

>>> s.coalesce().values().shape
torch.Size([1])

Notice indices() and values() reflect the resized tensor shapes.

Version 1.2:

>>> i = torch.tensor([[0, 1]])
>>> v = torch.ones(2)
>>> s = torch.sparse_coo_tensor(i, v)
>>> i.resize_(1, 1)
>>> v.resize_(1)

>>> s.coalesce().indices().shape
torch.Size([1, 2])

>>> s.coalesce().values().shape
torch.Size([2])

Notice indices() and values() reflect the original tensor shapes.

Sparse Tensors: Accumulating dense gradients into a sparse .grad will no longer retain Python object identity. (17072)

Version 1.1:

>>> m = torch.nn.Embedding(10, 3, sparse=True)
>>> m(torch.tensor([[1,2,4,5],[4,3,2,9]])).sum().backward()
>>> assert m.weight.grad.layout == torch.sparse_coo
>>> m_weight_grad_saved = m.weight.grad

# accumulate dense gradient into sparse .grad, change sparsity
>>> m.weight.sum().backward()
>>> assert m.weight.grad.layout == torch.strided
# m_weight_grad_saved still refers to the .grad of m's weight
# even though the sparsity has changed
>>> assert id(m_weight_grad_saved) == id (m.weight.grad)

Version 1.2:

>>> m = torch.nn.Embedding(10, 3, sparse=True)
>>> m(torch.tensor([[1,2,4,5],[4,3,2,9]])).sum().backward()
>>> assert m.weight.grad.layout == torch.sparse_coo
>>> m_weight_grad_saved = m.weight.grad

# accumulate dense gradient into sparse .grad, change sparsity
>>> m.weight.sum().backward()
>>> assert m.weight.grad.layout == torch.strided
# m_weight_grad_saved NO LONGER refers to the .grad of m's weight
>>> assert id(m_weight_grad_saved) != id(m.weight.grad)

nn.utils.convert_sync_batchnorm has been replaced with nn.SyncBatchNorm.convert_sync_batchnorm (18787)

Example of new usage:

>>> # Network with nn.BatchNorm layer
>>> module = torch.nn.Sequential(
...     torch.nn.Linear(20, 100),
...     torch.nn.BatchNorm1d(100)
... ).cuda()
>>> # creating process group (optional); process_ids is a list of the ranks to include
>>> process_group = torch.distributed.new_group(process_ids)
>>> sync_bn_module = torch.nn.SyncBatchNorm.convert_sync_batchnorm(module, process_group)
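The conversion step works without CUDA or an initialized process group, because it only swaps the layer types. Running the converted layers in training mode still needs a real distributed GPU setup. This sketch, which assumes the default process_group=None, only checks what the conversion produced:

```python
import torch
import torch.nn as nn

module = nn.Sequential(nn.Linear(20, 100), nn.BatchNorm1d(100))
# process_group=None (the default) targets the default (world) process group.
sync_module = nn.SyncBatchNorm.convert_sync_batchnorm(module)
print(type(sync_module[1]).__name__)   # SyncBatchNorm
```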

Error Checking: torch.addcmul and torch.lerp operators enforce stronger shape requirements on the output tensor (out= keyword argument) and do not allow output tensor to be resized if it is also used as one of the inputs.

Version 1.1:

>>> x=torch.zeros(1)
>>> torch.addcmul(x, x, torch.zeros(2,3), out=x)
tensor([[0., 0., 0.],
        [0., 0., 0.]])

Version 1.2:

>>> x=torch.zeros(1)
>>> torch.addcmul(x, x, torch.zeros(2,3), out=x)
RuntimeError: output with shape [1] doesn't match the broadcast shape [2, 3]

If you run into this error, please ensure the out parameter is of the correct output shape (post-broadcasting).
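The sketch below computes the post-broadcasting shape first, then allocates an out tensor with that shape that is separate from the inputs. The input values are arbitrary.

```python
import torch

x = torch.ones(1)
t1 = torch.full((1,), 2.0)
t2 = torch.arange(6.0).reshape(2, 3)

# Allocate `out` with the post-broadcasting shape, separate from the inputs.
out_shape = torch.broadcast_tensors(x, t1, t2)[0].shape
out = torch.empty(out_shape)
torch.addcmul(x, t1, t2, out=out)      # out = x + t1 * t2 (value defaults to 1)
print(out.tolist())                    # [[1.0, 3.0, 5.0], [7.0, 9.0, 11.0]]
```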

Error Checking: Improved Variable version tracking (20391, 22821, 21865)

PyTorch’s autograd system uses a version tracking mechanism to ensure that Tensors that are saved for backwards computations retain their correct values when the backward pass is computed (i.e. that they haven’t been updated in-place since they were saved). See In Place Correctness Checks in the docs for more information.

PyTorch 1.2 enhanced the version tracking in a number of cases, which may flag issues that were not caught previously. There is now additional tracking through the Variable() constructor, the nn.Parameter() constructor, after setting .data, and via nn.Module._apply (internal API).

Track changes through Variable constructor:

>>> x = torch.ones(1, requires_grad=True)+1
>>> y = x*x

# do an in-place update through Variable constructor
>>> torch.autograd.Variable(x).add_(1)
>>> y.backward()
RuntimeError: one of the variables needed for gradient computation has been modified
by an inplace operation: [torch.FloatTensor [1]] is at version 1; expected version 0 

Track changes on an nn.Parameter:

>>> x = torch.ones(1)
>>> p = torch.nn.Parameter(x)
>>> y = p * p

# do an in-place update on a saved Parameter
>>> x.add_(1)
>>> y.sum().backward()
RuntimeError: one of the variables needed for gradient computation has been modified
by an inplace operation: [torch.FloatTensor [1]] is at version 1; expected version 0 

Track changes after setting .data:

>>> x = torch.zeros(1, requires_grad=True)+1
>>> y = x * x
>>> x.data = torch.zeros(1, requires_grad=True)+1

>>> x.add_(1)
>>> y.backward()
RuntimeError: one of the variables needed for gradient computation has been modified
by an inplace operation: [torch.FloatTensor [1]], which is output 0 of AddBackward0,
is at version 1; expected version 0 instead.
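The next sketch first triggers this version check on purpose and catches the error. It then shows one way to avoid it: compute a new tensor out of place instead of mutating a tensor that autograd saved.

```python
import torch

a = torch.ones(1, requires_grad=True)
x = a + 1
y = x * x            # autograd saves x for the backward pass
x.add_(1)            # the in-place update bumps x's version counter
try:
    y.sum().backward()
    error_raised = False
except RuntimeError:
    error_raised = True  # "...modified by an inplace operation..."

# Fix: leave the saved tensor alone and produce a new tensor instead.
a.grad = None
x = a + 1
y = x * x
x_next = x + 1       # out-of-place; x keeps the version autograd expects
y.sum().backward()
print(a.grad)        # tensor([4.]) since dy/da = 2 * (a + 1)
```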

[JIT] Python called from scripted modules must be @ignored

torch.jit.script now recursively compiles everything it finds in the original function. If your scripted function or module calls Python functions, you must now explicitly mark them with @torch.jit.ignore. See the new API guide for more details.

Version 1.1

def my_unscriptable_python_fn():
    # weird stuff
    ...

@torch.jit.script
def fn():
    # This gets inserted as a Python call, and only errors on `save()`.
    my_unscriptable_python_fn()

Version 1.2

@torch.jit.ignore  # this needs to be added ...
def my_unscriptable_python_fn():
    ...

@torch.jit.script
def fn():
    # ... or else recursive compilation will attempt to compile this call
    my_unscriptable_python_fn()

NOTE: This is also a change to the behavior of the @torch.jit.ignore decorator. In version 1.1, @ignore tells the compiler to omit compiling a function entirely, to mark Python functions that you know will not be called after export. In version 1.2, @ignore tells the compiler to insert a call back to the Python interpreter instead of trying to compile the function.

To get the old behavior, use @torch.jit.ignore(drop_on_export=True) (@torch.jit.ignore with no arguments is equivalent to @torch.jit.ignore(drop_on_export=False)).
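Here is a minimal sketch of the 1.2 behavior. The scripted function calls a helper marked with @torch.jit.ignore, and the helper runs as ordinary Python through a call back into the interpreter. Both function names are hypothetical.

```python
import torch


@torch.jit.ignore
def python_only_helper(x: torch.Tensor) -> torch.Tensor:
    # Not compiled; TorchScript inserts a call back into the Python interpreter.
    return x + 1


@torch.jit.script
def fn(x: torch.Tensor) -> torch.Tensor:
    return python_only_helper(x) * 2


print(fn(torch.tensor([1.0])))  # tensor([4.])
```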


[JIT] optimize for ScriptModules is now a context manager

Whether optimization passes are run is now a thread-local flag. This better reflects how optimization actually happens in the JIT (i.e. it is decided at runtime, not compilation time).

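Version 1.1

def fn(inputs):
    ...

# optimization was chosen when the function was compiled
fn = torch.jit.script(fn, optimize=False)
fn(inputs)

Version 1.2

@torch.jit.script
def fn(inputs):
    ...

# optimization is now a thread-local setting decided at runtime
with torch.jit.optimized_execution(False):
    fn(inputs)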

[jit] script::Module is now a reference type

To better align with the PyTorch C++ API philosophy, script::Module and script::Method are now reference types. APIs have been updated to use script::Module instead of std::shared_ptr<script::Module>.

Version 1.1

using torch::jit::script::Module;

std::shared_ptr<Module> m = torch::jit::load("");

Version 1.2

using torch::jit::script::Module;

Module m = torch::jit::load("");

[C++ only] mean() / sum() / prod() APIs have changed slightly (21088)

Version 1.1 API:

Tensor sum(IntArrayRef dim, bool keepdim=false) const;    
Tensor sum(IntArrayRef dim, ScalarType dtype) const;

Version 1.2 API:

Tensor sum(IntArrayRef dim, bool keepdim=false,
           c10::optional<ScalarType> dtype=c10::nullopt) const;

That is, to override dtype, keepdim must now be provided.

Binary distribution and nightly changes

PyTorch has streamlined its conda and wheel binary distributions, so that it is easier than ever to install the version of PyTorch appropriate for your needs. The install instructions have been updated. If you have tooling to download and install PyTorch, here is a detailed description of the changes made:

Wheels now have local version identifiers. Wheels that are for non-default CUDA configurations (the default CUDA version for this release is 10.0) now have local version identifiers like +cpu and +cu92. This means that, when installing, it is no longer necessary to specify a full wheel URL—just specify an appropriate version constraint like torch==1.2.0+cu92.

Version 1.1 (for Python 3.7 on Linux only):

pip install numpy
pip install

Version 1.2 (works for all versions of Python, and both Linux and Mac):

pip install torch==1.2.0+cpu -f

CPU-only binaries on conda can be selected with the cpuonly feature. The pytorch-cpu conda package has been eliminated; instead, the CPU-only conda package is enabled by installing the cpuonly metapackage. Similarly, there are no longer separate torchvision and torchvision-cpu packages; the cpuonly feature ensures that the CPU version of torchvision is selected.

Version 1.1:

conda install -c pytorch pytorch-cpu

Version 1.2:

conda install -c pytorch pytorch cpuonly

Conda nightlies now live in the pytorch-nightly channel and no longer have “-nightly” in their name. All nightlies (pytorch, torchvision, torchaudio, etc.) are now uploaded to this new dedicated channel under the same names as their corresponding stable versions. Previously there were separate pytorch-nightly, torchvision-nightly, etc. packages. This makes it more difficult to accidentally install both a nightly and a stable copy at the same time.

Version 1.1:

conda install -c pytorch pytorch-nightly

Version 1.2:

conda install -c pytorch-nightly pytorch

Wheel nightlies no longer have -nightly in their name. As with the Conda changes, wheel nightlies are no longer suffixed with “-nightly”. This makes it harder to accidentally install both a nightly and a stable copy at the same time.

Version 1.1:

pip install --pre torch_nightly -f

Version 1.2:

pip install --pre torch -f

New Features

Tensor Type Support

  • torch.bool: added support for many operators (masking, comparison, arithmetic operators) to achieve feature parity with torch.uint8. See the Breaking Changes section for details about how this could affect existing programs. (21032, etc.)
  • torch.sparse.HalfTensor: Added support for torch.float16 sparse Tensors on both CPU and CUDA. (19695)
  • torch.bfloat16: Added basic creation and serialization support for Brain Floating Point Tensors. (21522, 21523, 21860, 22852)

NN Package

  • nn.Transformer: added implementation of Transformer from Attention is All You Need. (20170, 22588)
  • nn.Embedding: support float16 embeddings on CUDA. (19695)
  • nn.Flatten: added a Module that performs torch.flatten. (22245)
  • nn.functional.gelu: Added support for Gaussian Error Linear Units. (20665, 21237)
  • nn.Module hooks: add ability to replace input/output via forward_pre_hook and forward_hook. (22285)
  • nn.Module: add requires_grad_() method for turning on/off requires_grad for Module parameters. (22576)
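Several of these additions fit into one small sketch: nn.Flatten inside a Sequential, freezing every parameter with requires_grad_(False), a forward hook whose return value replaces the module output, and F.gelu. The layer sizes and the hook name double_output are illustrative only.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

model = nn.Sequential(
    nn.Conv2d(1, 2, kernel_size=3),   # (N, 1, 4, 4) -> (N, 2, 2, 2)
    nn.Flatten(),                     # (N, 2, 2, 2) -> (N, 8)
    nn.Linear(8, 4),
)
model.requires_grad_(False)           # freeze every parameter in one call

x = torch.randn(1, 1, 4, 4)
plain = model(x)


def double_output(module, inputs, output):
    # Returning a value from a forward hook replaces the module's output.
    return output * 2


handle = model.register_forward_hook(double_output)
hooked = model(x)
handle.remove()

print(plain.shape)                           # torch.Size([1, 4])
print(torch.equal(hooked, plain * 2))        # True
print(F.gelu(torch.tensor([0.0])).item())    # 0.0
```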


  • Tensor.to_sparse: now supports autograd. (20458)
  • Tensor.fill_diagonal_: operator to fill the main diagonal of a Tensor. (21892)
  • torch.qr: supports autograd. (21274)
  • torch.bitwise_not: add operator for boolean/integer types. The Python ~ operator now uses it as well. (22283, 22320)
  • torch.trapz: integrate using the trapezoid rule; equivalent to numpy.trapz. (21610)
  • torch.var_mean / torch.std_mean: compute variance and mean at the same time.(18731)
  • torch.utils.ThroughputBenchmark: benchmark utility for measuring the throughput of PyTorch operators. (20766).
  • Logging: lightweight at-most-once logging to record operators that are used (c10::Logging). (20745)
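This is a quick sketch of a few of the new operators on small, hand-checkable inputs.

```python
import torch

t = torch.zeros(3, 3)
t.fill_diagonal_(5.0)                        # in-place; fills the main diagonal
print(t.diagonal().tolist())                 # [5.0, 5.0, 5.0]

y = torch.tensor([1.0, 2.0, 3.0])
print(torch.trapz(y).item())                 # 4.0 (unit spacing: 1.5 + 2.5)

var, mean = torch.var_mean(torch.tensor([1.0, 2.0, 3.0, 4.0]))
print(mean.item())                           # 2.5
print(var.item())                            # about 1.6667 (unbiased variance, 5 / 3)

print(torch.bitwise_not(torch.tensor([True, False])).tolist())  # [False, True]
```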

Optim Package

Distributed Package

  • DistributedDataParallel: support CPU modules. (20236)
  • DistributedDataParallel: support sparse tensors. (19146)
  • DistributedDataParallel: support local gradient accumulation. (21736)


  • IterableDataset: introduces a new type of Dataset designed for data read from a stream. (19228)
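The following is a minimal sketch of an IterableDataset that produces items lazily, as a stream reader would. CountingStream is a hypothetical stand-in for a real data source. With the default single-process loading, batching groups consecutive items, and batch_size=None turns automatic batching off.

```python
import torch
from torch.utils.data import DataLoader, IterableDataset


class CountingStream(IterableDataset):  # hypothetical stream source
    def __init__(self, n):
        super().__init__()
        self.n = n

    def __iter__(self):
        # Items are produced one at a time, like records read from a stream.
        for i in range(self.n):
            yield torch.tensor(i)


batches = [b.tolist() for b in DataLoader(CountingStream(5), batch_size=2)]
print(batches)  # [[0, 1], [2, 3], [4]]

# batch_size=None disables automatic batching: items come through one by one.
unbatched = list(DataLoader(CountingStream(3), batch_size=None))
```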

Tensorboard Package

  • TensorBoard support in PyTorch has improved and is no longer experimental!
  • SummaryWriter.flush: now supported. (20607)
  • SummaryWriter.add_mesh: add support for 3D point clouds. (20413)

JIT Features

  • Improved support for iterator infrastructure. TorchScript now supports looping through a List, Tuple, Dict, Tensor and String, and you can also use zip() and enumerate(). (21801, 22006, 21990, 21985)
  • Support for membership checks using in. (21527)
  • Improved support for strings and the string libraries. (20826, 20188, 20761, 21656, 20617)
  • Improved math support. (20979, 19707, 21151, 21131, 21129, 21130, 21512, 21126, 21127, 21128)
  • Support for various other Python builtin functions. (21451)
  • Support for NamedTuple. (21428)
  • All the rest of the dict methods. (21979)
  • sorted() keyword for lists and dicts. (23274)
  • Add support for breaks and continues. (21692)
  • Improved custom operator API with several bugfixes and new features. It now allows more primitive types, supports torch::List, torch::Dict and torch::Optional, and supports dispatch (i.e. registering a different function for CPU and CUDA for the same operator).
  • Support nn.GRU in script. (23266)
  • Support pack_padded_sequence and pad_packed_sequence. (23249)
  • Support torch._C._get_tracing_state in TorchScript. (23248)
  • Support torch.as_tensor in TorchScript. (23247)
  • Add support for recursive compilation on Modules. (20708)
  • Add the all() builtin. (20521)
  • Add Final[T] annotated members to __constants__. (21603)
  • Add save() to scripted Functions. (20386)
  • Support for serializing class attributes. (22953)
  • Support for class annotations. (21379)
  • support Python 3.8 Constant node. (22007)
  • Support for type annotations instead of torch.jit.annotate(). (21390)
  • Support operator overloading for user-defined classes. (20033)
  • Support recursive ModuleList / Sequential. (21306)
  • Trace multiple methods in a single Module. (19905)
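The sketch below combines several of these features: a NamedTuple return type, sorted() on a list, in membership checks on a dict, and a Python type annotation used in place of torch.jit.annotate(). It assumes PyTorch 1.2 or later. MinMax, min_max and keys_present are hypothetical names.

```python
from typing import Dict, List, NamedTuple

import torch


class MinMax(NamedTuple):  # hypothetical result type
    lo: int
    hi: int


@torch.jit.script
def min_max(values: List[int]) -> MinMax:
    ordered = sorted(values)                # sorted() on a list
    return MinMax(ordered[0], ordered[-1])


@torch.jit.script
def keys_present(counts: Dict[str, int], wanted: List[str]) -> List[str]:
    found: List[str] = []                   # annotation instead of torch.jit.annotate
    for key in wanted:
        if key in counts:                   # membership check on a dict
            found.append(key)
    return found


print(tuple(min_max([5, 1, 9, 3])))                     # (1, 9)
print(keys_present({"a": 1, "b": 2}, ["b", "c", "a"]))  # ['b', 'a']
```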


  • Tensor.pin_memory(): only ask for context on current device. (22229)
  • Tensor.view(): suggest using reshape() instead of contiguous() when the input is non-contiguous. (20968)
  • Tensor.numpy(): throw TypeError instead of ValueError if the type isn’t supported. (21608)
  • torch.norm: add support for p="nuc" with dim specified. (21022)
  • torch.qr: support batching of input matrices. (20689)
  • torch.qr: support some parameter akin to NumPy’s mode option. (20689)
  • torch.det / torch.logdet / torch.slogdet: added batching support. (22909)
  • torch.cdist: support batching. (20934)
  • torch.symeig: support batching. (21858)
  • torch._dirichlet_grad: support CUDA. (21191)
  • torch.randperm: support torch.float16. (22102)
  • torch.Size is now pickle-able in Python2. (20952)
  • torch.tensor / torch.as_tensor: infer device if input supports Numba’s __cuda_array_interface__. (20584)
  • torch.isinf / torch.isfinite: throw TypeError instead of ValueError when a non-tensor is passed in. (20817)
  • nn.MultiheadAttention: add functional support. (20415)
  • nn.MultiheadAttention: added support for key/value to have different number of features. (21288)
  • nn.MultiheadAttention: allow static key/values. (21288)
  • nn.Conv{1,2,3}D: support torch.int64 dtype in forward. (20730, 22594)
  • nn.AvgPool{1,2,3}D: support torch.int64 dtype in forward. (22433)
  • nn.Module: make _save_to_state_dict overrideable. (21933)
  • autograd: Checkpointing of modules inside large fanout networks no longer hits a recursion error. (22397)
  • autograd: Track in-place changes of Tensors through Module._apply (internal API). (21865)
  • autograd.profiler: Add shape aggregation support. (20035)
  • autograd.profiler: Profile custom c10 ops. (20175)
  • DataLoader: support setting batch_size=None to disable automatic batching (collation) in DataLoader for easier bulk loading. (19228)
  • DataLoader: add multiprocessing_context parameter. (22990)
  • DataLoader: added error detection for worker_init_fn. (20150)
  • DataLoader: Retry on EINTR. (21723)
  • torch.cuda.set_rng_state / torch.cuda.get_rng_state: accept string as device parameter. (23448)
  • CUDA: add warning when using Turing GPUs with CUDA 9.0 or earlier (CUDA_VERSION <= 9000). (21468)
  • CUDA: warn on conditions that can trigger a cuBLAS 9.0 bug. (22034)
  • CPU: Improve CPUAllocator OOM message. (20618)
  • [memory_format]: added support for torch.empty, torch.empty_like, Tensor.contiguous() and Tensor.is_contiguous() to specify / check the order in which dimensions are laid out in memory. (20455, 20558)
  • distributions.MultivariateNormal: fix precision matrix instability. (21366)
  • distributions.transforms.SigmoidTransform: fix numerical instability. (19802)
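Here is a small sketch of the new batching support in torch.det and torch.cdist. The inputs are chosen so the results can be checked by hand.

```python
import torch

# A batch of four 3x3 matrices, each 2 * identity, so every determinant is 2**3 = 8.
batch = 2 * torch.eye(3).expand(4, 3, 3)
print(torch.det(batch).tolist())          # [8.0, 8.0, 8.0, 8.0]

# Batched pairwise distances: (B, P, M) and (B, R, M) -> (B, P, R)
x1 = torch.zeros(2, 1, 2)
x2 = torch.tensor([[[3.0, 4.0]], [[6.0, 8.0]]])
dists = torch.cdist(x1, x2)
print(dists.shape)                        # torch.Size([2, 1, 1])
print(dists.flatten().tolist())           # [5.0, 10.0]
```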

Distributed Improvements

  • DistributedDataParallel: Support DDP forward/backward calls even if no module parameter is used. (19821)
  • DistributedDataParallel: Only call into reducer if grad is enabled. (19897)
  • DistributedDataParallel: only require finalizing the DDP backward pass when gradients were actually computed. This allows applications to completely discard DDP outputs and move on to the next iteration. (19901)
  • DistributedDataParallel: Improve DDP backward reduction error messages. (20586)
  • DistributedDataParallel: make DDP failure recoverable. (21591)
  • DistributedDataParallel: Delay reduction of unused parameters until first autograd hook is called. (22219)
  • c10d: support tensors shared across processes. (21449)
  • c10d: ProcessGroupMPI Add device guard around MPI operations. (22446)
  • Make shuffling optional. (22479)

Tensorboard Improvements

  • Usage of kwarg-only arguments has been removed. (21786)

Numpy Compatibility Improvements

  • Tensor.T: added numpy-like support for reversing dimensions. (20598)
  • Tensor.ndim: NumPy equivalent property for the number of dimensions. (20565)
  • Tensor.nonzero: added as_tuple argument (default False) that when True, will return a tuple of Tensors, which matches the behavior of numpy.nonzero. (20293)
  • torch.dtype: support passing in NumPy dtypes as arguments. (21215)
  • torch.normal: add size parameter when called with two floats. (20545)
  • torch.where: add one-argument overload that is an alias for Numpy-like nonzero. (21986)
  • support a number of argument name overrides, e.g. axis instead of dim. (20451)
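The sketch below tries out these NumPy-style conveniences on a small integer matrix. The axis keyword in place of dim is one of the argument name overrides listed above.

```python
import torch

x = torch.tensor([[1, 0, 3],
                  [0, 2, 0]])
print(x.ndim)                                  # 2
print(x.T.shape)                               # torch.Size([3, 2])

rows, cols = x.nonzero(as_tuple=True)          # tuple of index tensors, like numpy.nonzero
print(rows.tolist(), cols.tolist())            # [0, 0, 1] [0, 2, 1]
print([t.tolist() for t in torch.where(x)])    # [[0, 0, 1], [0, 2, 1]]

print(torch.sum(x, axis=0).tolist())           # [1, 2, 3]
print(torch.normal(0.0, 1.0, size=(2, 3)).shape)  # torch.Size([2, 3])
```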

JIT Improvements

  • The original source code debug information is now saved with the model. If a model is saved and then loaded into another process, the loaded process can now print out error messages that point to the original source code. (22177, 22178, 22179, 22180)
  • Error message source range highlighting now includes filename, line number, and column number. (21157)
  • Better Constant Propagation through Tuples. (22561)
  • Add start and step parameters for range in TorchScript. (20795)
  • Support for threading options for TorchScript inference (doc)
  • Add max_pool2d to symbolic derivatives. (19661)
  • Optimize matmul memory usage for certain cases. (23433)
  • Avoid kernel launches for zero-sized tensor inputs. (22790)
  • Add support for steps (strides) in tensor slices. (20929)
  • Added error for classes that don’t have an __init__ function. (21880)
  • Allow classes to be used in their own methods. (20106)
  • Better error message when a variable is conditionally defined. (20911)
  • Consider contained types in alias analysis. (21431)
  • Convenience APIs for script objects. (20226)
  • Don’t print backtrace for interpreter errors. (20925)
  • Improve error msg for missing attribute. (20779)
  • Improve error msg on inferred type. (21058)
  • Improve error msg on recursive class defs. (21842)
  • Include module names in recursive error stacks. (22921)
  • Improve recursive scripting error message. (21841)
  • Index into a tuple with non constant integer. (20081)
  • ScriptModule buffer attributes can now also be cast to a different device/type. (19700)
  • Lower batchmm to non-diff optimization. (19987)
  • Make an attribute instead of a parameter. (21078)
  • Make strtod_c compatible with different gcc abi. (21293)
  • make magic methods work with casts too. (20654)
  • Improve performance of alias analysis. (20899)
  • Print a warning if a type annotation prefix is invalid according to mypy. (20884)
  • schema_matching.cpp: improve error messages. (21141)
  • Resolve with closed over variables instead of stack frame. (22270)
  • Report errors through call stack. (22280)
  • Reduce number of stack manipulation instructions in interpreter. (21240)

C++ API Improvements

  • nn::PoissonNLLLoss: Added support. (19316)
  • nn::Module: added replace_module API to overwrite submodules in C++ Frontend. (22546)
  • nn::Module::register_module / register_parameter / register_buffer: make public. (23196)
  • data::datasets::ChunkDataReader: fix include headers and a vector issue. (19485)
  • data::datasets::ChunkDataset: add new get_batch method. (21797)
  • data::datasets::ChunkDataset: add checkpoint support. (21889)
  • data::datasets::ChunkDataset: add support for cross-chunk shuffling. (22347)
  • data::datasets::ChunkDataset: add sorting policy. (23053)

MKLDNN Tensor Improvements

Add support for a number of operators on MKLDNN Tensors including:

  • Tensor.is_mkldnn: (22386)
  • Tensor.transpose(): (21943)
  • Tensor.zero_(): (20573)
  • torch.empty: (21184)
  • torch.mul: (20575)
  • nn.AdaptiveAvgPool{1,2,3}D: (19818)
  • nn.Sigmoid: (20820)
  • nn.Softmax: (21516)
  • nn.Module: support saving/loading MKLDNN modules. (20799)
  • nn.MaxPool{1,2,3}D: support ceil_mode. (21310)

PyTorch Bug Fixes

  • Indexing: fix advanced indexing where there are more than (2^31)-1 bytes in the output. (20919)
  • Indexing: fix indexing when there are more than 65535 elements in a non-indexing first dimension on CUDA. (23123)
  • Indexing: fix issue with slicing empty tensors. (20914)
  • Tensor.index_copy_: fix segfault by properly checking dimension is in range. (21617)
  • Tensor.copy_: Fix a bug where non-blocking was not being respected. (20305)
  • Tensor.clone: Fix an issue with MKLDNN tensors. (20943)
  • Tensor subclassing: give a proper error instead of crashing. (20283)
  • Fix segfault with tensors that can’t be indexed with 32-bit ints. (21530)
  • torch.range / torch.linspace / torch.logspace: properly respect the current Stream. (21619)
  • return the identity permutation instead of zeros when not using pivoting. (22242)
  • torch.einsum: Fix an issue where the backward pass would potentially be skipped. (22111)
  • torch.cosh: Fix an issue where torch.cos was instead calculated with torch.double dtype and vectorized instructions. (20797)
  • torch.triu / torch.tril: handle strides correctly for in-place versions. (22730).
  • torch.triu / torch.tril: Fix handling of batches > 65535 on CUDA. (21067)
  • torch.inverse / torch.solve / torch.cholesky_solve / torch.triangular_solve: Fix batch sizes > 65535 on CUDA. (21689)
  • torch.histc: return dtype is now the same as the input tensor on CUDA, matching CPU behavior. (20369)
  • torch.histc: properly return 1-dim tensor on CPU with 0-dim input and 1 bin. (21497)
  • torch.randperm: handle non-contiguous out parameter. (23043)
  • torch.unique: Fix empty tensor handling when dim is passed as an argument. (19000)
  • torch.min / torch.max: properly error on empty tensor inputs, as with CPU tensors. (19612).
  • CUDA: fix launch parameters for reductions. (22827).
  • torch.hub: fix an issue with find_module. (20782)
  • autograd: Fix a number of custom autograd Function corner cases by inverting the relationship between PyFunction and THPFunction. (22983)
  • autograd: give “Trying to backward through the graph a second time” error instead of internal assert when the buffers are a list of Tensors (with indexing). (21533)
  • optim.lr_scheduler.CosineAnnealingLR: rename from CosineAnnealingLr. (23242)
  • distributions.Binomial: Fix overflow of log_prob when logits is large. (20679)
  • distributions.SigmoidTransform: Fix numerical issues that could result in inf / -inf return values. (20288)
  • distributions.Categorical.sample: fix a view bug. (23328)
  • CUDA: Give proper error message for bad cuda forks. (23322)
  • pickle: Fix Unpickling error when loading multiple objects from a file. (20270)
  • NCCL: Fix race condition. (23040)

torch.nn Bug Fixes

  • nn.Conv{1,2,3}D: fix memory leak on MKLDNN code path. (22392)
  • nn.Conv{1,2,3}D: properly unpickle older pickled versions. (21687)
  • nn.CTCLoss: fix backward on CUDA when 2d target tensor is larger than max_target_length. (20971)
  • nn.CTCLoss: fix some numerical stability issues. (21392)
  • nn.CTCLoss: disable buggy non-deterministic cuDNN algorithm. (22977)
  • nn.CTCLoss: fixed empty target handling. (21910, 23298)
  • nn.SyncBatchNorm: fix syncing of running statistics when count size differs between GPUs. (22248)
  • nn.SyncBatchNorm: retain requires_grad value when converting from nn.BatchNorm. (22569)
  • nn.SyncBatchNorm: correctly handle process_group in convert_sync_batchnorm. (19240)
  • nn.MultiheadAttention: fix for torch.float16 dtype. (21658)
  • nn.EmbeddingBag: fix NaN output when input is empty. (21400)
  • nn.Dropout: fix python crash (with SIGFPE) when called on an empty cuda tensor. (20541)
  • nn.MaxPool: fix output size calculation in some corner cases. (22304)
  • nn.MaxPool: return valid indices if all entries are -inf. (23161)
  • nn.Softmax: respect the current Stream. (22470)
  • nn.LogSoftmax: fix numerical stability issues. (21672)
  • nn.Module.load_state_dict: break ref cycle. (20397)
  • nn.Module: fix loading in 32-bit environments. (20900)
  • nn.utils.rnn.pack_padded_sequence: Fix segfault on empty tensors. (21461)
  • nn.utils.spectral_norm: fix loading state_dict when strict=False. (22545)
  • cuDNN: Fix uninitialized PoolWindow on Windows. (22405)
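Two of the nn.SyncBatchNorm fixes above concern nn.SyncBatchNorm.convert_sync_batchnorm. The sketch below shows a typical conversion and checks that a frozen affine weight keeps requires_grad=False after the swap. It only converts the model; actually running SyncBatchNorm additionally needs an initialized torch.distributed process group and GPUs, which this sketch assumes you set up separately. The model itself is illustrative.

```python
import torch.nn as nn

# A small model with an ordinary BatchNorm layer whose affine scale is frozen.
model = nn.Sequential(nn.Conv2d(3, 8, kernel_size=3), nn.BatchNorm2d(8), nn.ReLU())
model[1].weight.requires_grad_(False)

# Replace every BatchNorm*d with SyncBatchNorm; process_group=None means the default group.
sync_model = nn.SyncBatchNorm.convert_sync_batchnorm(model)
bn = sync_model[1]
```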

Distributed Bug fixes

  • nn.parallel.DataParallel: fix error in no_grad mode. (21262)
  • torch.distributed.all_gather: fix errors for views and aliases. (21490)
  • c10d: fix collective communication errors on empty tensors. (20658)

JIT Bug Fixes

  • Fix specialized list from dict keys. (23267)
  • Switch keys to be sequential and stable in pickle serialization. (23280)
  • deepCopy also copies type information of lists. (23271)
  • dictKeys and dictItems ops on typed dicts return typed lists. (23270)
  • Fix pickler bug where it would not load if no tensors were saved. (23263)
  • Avoid multiple writes to files on export. (21186)
  • Better error msg for mismatched dict key type. (22231)
  • Better error msg for using Python builtin_function_or_method. (22935)
  • Better error msg in __get_state__ to let a user know that ScriptModules can’t be deep-copied at the moment. (20885)
  • Better error msg when seeing an unsupported builtin function. (21068)
  • dropout derivative should respect the train flag. (20760)
  • Fix __constants__ for some nn modules. (21071)
  • Fix ScriptModule.__dir__(). (22426)
  • Fix 3x DenseNet compile time regression by restoring earlier-out tests in AliasDB::writesToAlias. (21425)
  • Fix a bug in loop unrolling. (21239)
  • Fix alias annotations for dict ops. (22900)
  • Fix inaccurate SourceRange reporting. (21109)
  • Fix broken indexing when using None and ellipsis indexing together. (22905)
  • Fix bug in CompilationUnit::define. (21886)
  • Fix compilation order for class methods. (20094)
  • Fix dead code elimination over loops. (22632)
  • Fix dead code elimination in onnx export. (22476)
  • Fix incorrect default on Graph::toString. (21370)
  • Fix optional type promotion for classes. (21593)
  • Fix optional type unification. (19813)
  • Fix NameError with PYTORCH_JIT=0. (20120)
  • Fix overspecializing constants in compilation. (22816)
  • Fix pow() bug on overloads. (20824)
  • Fix recursive method compilation. (21862)
  • Fix reflection on weak modules, copy attributes. (20190)
  • Fix slow unpickling. (21542)
  • Fix input/output type mismatch. (20829)
  • Fix insert_guard for norm decomposition. (19646)
  • Fix Trace inlining of graphs with optional inputs. (22686)
  • Fix tracing bugs where using 1 - x in C++ would cause the size of 1 to get hardcoded. (20932)
  • Fix tuple indexing bug. (21521)
  • Fix type hints for None constants. (23029)
  • Fix weak module cuda() _flat_weights bug. (21107)
  • Fix WeakIValueEq. (21891)
  • Fixed gcd to use 64 bit integers. (21041)
  • Fixed list() not making a copy. (22093)
  • Fix race condition on Module::forward method. (21398)
  • Made a += b for lists do an in place add. (21896)
  • Made floor/ceil return ints. (21124)
  • Fix out-of-memory on GPU due to the “weak_script” decorators. (20588)
  • Override print when python is present. (21625)
  • Set __file__ for torch.ops. (21888)
  • Set correct list type in pybind_utils. (23188)
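A few of the list and math fixes above are easy to observe from Python. The sketch below assumes a PyTorch build where torch.jit.script is available and that the functions live in a regular .py file (TorchScript reads their source code). The function names are hypothetical. It checks that math.floor returns an int, that += on a list mutates it in place, and that list() makes a copy instead of an alias.

```python
import math
from typing import List

import torch


@torch.jit.script
def to_floor(x: float) -> int:
    # In TorchScript, math.floor on a float returns an int.
    return math.floor(x)


@torch.jit.script
def iadd_is_in_place(a: List[int], b: List[int]) -> bool:
    alias = a      # `alias` and `a` refer to the same list
    alias += b     # in-place extend, so `a` sees the new elements too
    return len(a) == len(alias)


@torch.jit.script
def list_copies(a: List[int]) -> bool:
    c = list(a)    # list() builds a new list rather than aliasing `a`
    c.append(0)
    return len(c) == len(a) + 1
```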

C++ Frontend bug fixes for PyTorch

  • nn::RNN: Fix assertions in bidirectional RNN. (22850)
  • nn::MaxPool / nn::AvgPool: expand incomplete kernel size, as in Python. (22073, 22075)
  • Optim: Fix memory leak when weight_decay is applied to Adam, Adagrad and RMSProp. (23125)
  • Optim::SGD: fix memory leak with weight_decay. (23007)
  • torch::autograd::Scatter / torch::autograd::Gather: Fix nullptr bug. (20286)
  • torch::nn::parallel::data_parallel: fix gradient computation error. (20910)
  • [C++ Extensions] Fix an issue when building multiple extensions in the same directory. (20221)

PyTorch Deprecations

Masking via torch.uint8 Tensors is now deprecated in favor of masking via torch.bool Tensors.

See the Breaking Changes section for more details about torch.bool Tensors and comparison operators.

torch.masked_select, torch.masked_fill, and torch.masked_scatter now expect torch.bool masks rather than torch.uint8.

>>> import torch
>>> a = torch.tensor([1, 2, 3])

>>> a.masked_select(torch.tensor([0, 1, 1], dtype=torch.uint8))
UserWarning: masked_select received a mask with dtype torch.uint8,
this behavior is now deprecated, please use a mask with dtype torch.bool instead.

tensor([2, 3])

# instead use torch.bool
>>> a.masked_select(torch.tensor([False, True, True]))
tensor([2, 3])
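If existing code still builds masks as torch.uint8, a simple migration path is to convert the mask once and reuse it. The sketch below applies the same bool mask to all three masking functions named above; the variable names are illustrative.

```python
import torch

a = torch.tensor([1, 2, 3])

# A mask built the pre-1.2 way, as torch.uint8 (deprecated for masking).
legacy_mask = torch.tensor([0, 1, 1], dtype=torch.uint8)

# Convert it once to torch.bool and reuse it for every masking call.
mask = legacy_mask.bool()

selected = a.masked_select(mask)                            # elements where mask is True
filled = a.masked_fill(~mask, 0)                            # zero where mask is False
scattered = a.masked_scatter(mask, torch.tensor([20, 30]))  # copy source values, in order, into True slots
```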

Comparison operators with out= parameters now expect torch.bool dtype rather than torch.uint8.

>>> a = torch.tensor([1, 2, 3])
>>> b = torch.tensor([3, 1, 2])
>>> res = torch.empty_like(a, dtype=torch.uint8)
>>> torch.gt(a, b, out=res)
UserWarning: received 'out' parameter with dtype torch.uint8, this behavior
is now deprecated, please use 'out' parameter with dtype torch.bool instead.

tensor([0, 1, 1], dtype=torch.uint8)

# instead use torch.bool
>>> res = torch.empty_like(a, dtype=torch.bool)
>>> torch.gt(a, b, out=res)
tensor([False, True, True])
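Because comparison operators now produce torch.bool results, their output can feed straight into indexing and masking without a dtype conversion. A preallocated out= buffer only needs to be allocated as torch.bool. A minimal sketch:

```python
import torch

a = torch.tensor([1, 2, 3])
b = torch.tensor([3, 1, 2])

# Comparisons already return torch.bool, so the result works as a mask directly.
mask = a > b
picked = a[mask]

# When reusing a preallocated output buffer, allocate it as torch.bool.
res = torch.empty(a.shape, dtype=torch.bool)
torch.gt(a, b, out=res)
```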

Legacy autograd.Function (Function without static forward method) is now deprecated

>>> from torch.autograd import Function
>>> class MyLegacyFunction(Function):
...     def forward(self, x):
...         return x
...     def backward(self, grad_output):
...         return grad_output
...
>>> MyLegacyFunction()(torch.randn((3,), requires_grad=True))
UserWarning: Legacy autograd function with non-static forward method is deprecated
and will be removed in 1.3. Please use new-style autograd function
with static forward method.

# instead use new-style Autograd Function
>>> class MyFunction(Function):
...     @staticmethod
...     def forward(ctx, x):
...         return x
...     @staticmethod
...     def backward(ctx, grad_output):
...         return grad_output
...
>>> MyFunction.apply(torch.randn((3,), requires_grad=True))

See the torch.autograd.Function documentation for more details.
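The identity function in the example above hides how a new-style Function passes state from forward to backward. The sketch below uses a hypothetical Square function that stores its input with ctx.save_for_backward, returns the chain-rule gradient 2x times the upstream gradient, and is checked numerically with torch.autograd.gradcheck (which expects double-precision inputs).

```python
import torch
from torch.autograd import Function, gradcheck


class Square(Function):
    """Hypothetical new-style Function computing x * x."""

    @staticmethod
    def forward(ctx, x):
        ctx.save_for_backward(x)  # keep the input for the backward pass
        return x * x

    @staticmethod
    def backward(ctx, grad_output):
        (x,) = ctx.saved_tensors
        return grad_output * 2 * x  # d(x^2)/dx = 2x, chained with the upstream gradient


x = torch.randn(4, dtype=torch.double, requires_grad=True)
y = Square.apply(x)  # new-style Functions are invoked through .apply
y.sum().backward()
```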

torch.gels has been renamed to torch.lstsq; torch.gels will work for this release but is now deprecated. (23460)

Performance upgrades to PyTorch

  • Advanced Indexing: significantly improve performance of advanced indexing backward. (20557)
  • Tensor.copy_: increase broadcasting CUDA copy performance by 25%. (20685)
  • torch.matmul: Optimize the case A.ndim <= 2 && B.ndim >= 3, shows up to 15x speed up. (20448)
  • torch.bmm: Improve performance by up to 3x for small cases on CPU by applying TensorAccessor. (20266)
  • torch.inverse: Move workspace query and allocation outside loop to improve performance by up to 5x. (20904)
  • torch.topk: Optimize CPU perf using parallel and partial sort, up to 6x improvement. (22865)
  • torch.cdist: Improve CPU perf by up to 10x for some cases. (20605)
  • torch.normal: Move normal, normal_means, normal_stddevs, and normal_means_stddevs to ATen, increasing performance by up to 3x. (21287)
  • torch.bernoulli: Speedup bernoulli_scalar_cuda_kernel with grid-stride loop, increasing performance by up to 2x. (21300)
  • torch.coalesce: Use _sparse_coo_tensor_unsafe in coalesce for up to 10x speedup. (21214)
  • torch.sinh / torch.cosh: Parallelize and vectorize on CPU. (21115)
  • torch.lerp: Vectorize on CPU. (22038)
  • torch.eye: Parallelize on CPU. (21077)
  • torch.randperm: Parallelize initialization in randperm on CPU. (21529)
  • Vectorization: Don’t split 256-bit AVX2 load/store intrinsics. (20609)
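The torch.matmul optimization above targets the case where the first operand has at most two dimensions and the second is batched. The sketch below only illustrates what that case computes (it does not measure speed): A is broadcast across B's batch dimension, which matches multiplying A with each batch matrix separately. Shapes are arbitrary.

```python
import torch

torch.manual_seed(0)
A = torch.randn(4, 5)       # A.ndim == 2
B = torch.randn(8, 5, 6)    # B.ndim == 3: a batch of 8 matrices

C = torch.matmul(A, B)      # A is broadcast over B's batch dimension

# Reference result: multiply A with each matrix in the batch explicitly.
ref = torch.stack([A @ B[i] for i in range(B.shape[0])])

v = torch.randn(5)          # a 1-D first operand also has ndim <= 2
d = torch.matmul(v, B)      # v acts as a row vector; that dimension is then dropped
```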

Torch.NN Performance Improvements

  • nn.Softmax: Add persistent CUDA kernels that increase performance 2-10x on small inputs. (20827)
  • nn.Embedding / nn.EmbeddingBag: Optimize CUDA kernel, increasing performance up to 2.7x. (22016)
  • nn.Linear: optimize BERT model perf by using mkldnn inner product. (21851)
  • nn.Conv{1,2,3}D: improve perf for depthwise convolutions in torch.float16 on Volta and Turing GPUs. (22302)
  • nn.RNN: optimize on CPU by fusing matmul ops. (22512)
  • nn.Upsample: a number of significant perf improvements on CUDA. (21879, 21694)
  • nn.functional.layer_norm: optimize a fast path for layer_norm, increasing perf by up to 4x on CPU. (20345, 20883)

PyTorch Documentation

  • torch.bool: doc the Boolean tensor type. (21601)
  • torch.as_strided: add docs. (22842)
  • torch.empty_strided: add docs. (23740)
  • torch.lerp: clarify broadcasting requirements. (23268)
  • torch.enable_grad / torch.no_grad / torch.set_grad_enabled: clarify interaction between these features. (23310)
  • torch.autograd.grad_mode: Document that no_grad is thread local. (21755)
  • torch.multiprocessing: Explain refcounting of CUDA tensors. (19904)
  • torch.Tensor: Add a warning about memory usage. (20801)
  • Document RNG state consumption. (22540)
  • torch.optim.lr_scheduler.CyclicLR: Clarify base_momentum and max_momentum. (20880)
  • Document production environment features. (23010)
  • Add note about contributing recently released research. (23513)
  • Clarify performance implications of deterministic mode. (21337)
  • Update cuda pinned memory note to include (20977)
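The grad-mode documentation items above are easiest to see side by side. In this sketch, torch.enable_grad re-enables tracking inside a torch.no_grad block, torch.set_grad_enabled(False) behaves like no_grad, and a thread started inside no_grad still has gradients enabled because grad mode is thread local. The variable names are illustrative.

```python
import threading

import torch

x = torch.ones(2, requires_grad=True)
seen_in_thread = []

with torch.no_grad():
    y = x * 2                  # not tracked by autograd
    with torch.enable_grad():
        z = x * 2              # tracked again inside the nested enable_grad block
    # Grad mode is thread local: a new thread does not inherit no_grad.
    t = threading.Thread(target=lambda: seen_in_thread.append(torch.is_grad_enabled()))
    t.start()
    t.join()

with torch.set_grad_enabled(False):
    w = x * 2                  # same effect as no_grad
```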

Torch.NN Documentation

  • nn.functional / nn.init: Break up NN in docs so they load faster. (21291)
  • nn.functional.conv{1,2,3}d: Remove padding_mode. (20891)
  • nn.functional.upsample / nn.functional.interpolate: add note about overshooting with mode=‘bicubic’. (23321)
  • nn.init.zeros_ / nn.init.ones_: add documentation. (23145)
  • nn.MultiheadAttention: Add documentation for add_bias_kv, add_zero_attn, and attn_mask. (20071)
  • nn.MultiheadAttention: Fix documentation for attention mask shape. (20850)
  • nn.Softmax: Fixed to specify dimension to prevent warning in 1.1.0. (20310)

PyTorch Contributor Documentation

  • Updated web links on contribution_guide and governance documentation. (21243)
  • Improve documentation for publishing hub models. (21307)
  • Suggest a faster linker in the contributing guide. (21334)
  • Add CUDA C++11 and profiling notes to the contribution guide. (21386)

PyTorch Build Documentation

  • Add magma for CUDA 10.1 to Windows docs. (19914)
  • Improve build-from-source instructions. (20088)
  • Add ninja to build instructions. (20079)
  • Update libtorch build docs. (21150)

TensorBoard Documentation

  • TensorBoard documentation has been greatly improved.

Torch HUB Documentation

  • Improve docs for publishing hub models. (21307)
  • Update docs of entry point in hub. (21568)


PyTorch 1.2 adds full support for ONNX Opsets 7, 8, 9, and 10 in the ONNX exporter, and enhances the constant folding pass to support Opset 10. Export of ScriptModules is also better supported. Additionally, users can now register their own symbolic functions to export custom ops, and specify the dynamic dimensions of inputs during export.

PyTorch Supporting More ONNX Opsets

  • Add basic supports for multiple ONNX Opsets and support for Opset 10. (19294)
  • Support ONNX Opset 7 and 8 in PyTorch ONNX Exporter. (22421, 20036)
  • Export Dropout for Opset 10. (20710)
  • Export Slice and Flip for Opset 10. (20533)
  • Export Interpolate (Resize) for Opset 10. (21434)

Enhancing the Support for ScriptModule in PyTorch

  • Support multiple outputs in ScriptModule in ONNX Exporter. (20256)
  • Support tensor factories in ScriptModule in ONNX Exporter. (20255)
  • Support tuples as inputs and outputs in ScriptModule. (20784)

Exporting More Torch Operators to ONNX with PyTorch

  • Export custom ops. (21321)
  • Export torch.arange. (22601)
  • Export torch.masked_fill. (22521)
  • Export torch.floor, torch.ceil, torch.log2, and prim::shape. (17895)
  • Export torch._dim_arange. (20078)
  • Export torch.randn_like. (20093)
  • Export torch._standard_gamma. (20126)
  • Export torch.topk. (21104)
  • Export __and__ and __or__. (17894)
  • Export torch.sign. (20470)
  • Export torch.scatter. (18543)
  • Export torch.rand. (20559)
  • Export torch.gather. (21235)
  • Export torch.cosine_similarity. (21884)
  • Export torch.sum. (22240)
  • Export torch.logsumexp. (22306)
  • Export torch.layer_norm. (22265)

Extending Existing Exporting Logic in PyTorch

  • Support torch.min and torch.max with dim. (19689)
  • Support maxpool with dilations. (18721)
  • Support RNN with batch_first=True. (19766)
  • Support Upsample with dynamic input. (20116)
  • Improve support for Loop export. (20445)
  • Enable torch.full with scalar parameters. (21931)
  • Added support for exporting models with variable length input/output to ONNX. (20034)

Optimizing Exported ONNX Graph in PyTorch

  • Support constant folding in Opset 10. (22515)
  • Support negative indexing for Slice in constant folding optimization. (21811)

Bugfixes/Improvements in PyTorch

  • Fix the shape of PReLU weight. (21330)
  • Fix the export for torch.pixel_shuffle. (21486)
  • Fix the export for torch.full. (21669)
  • Update logic for folding onnx::Constant nodes. (20109)

