TensorFlow is an end-to-end open source platform for machine learning. It has a comprehensive, flexible ecosystem of tools, libraries, and community resources that lets researchers push the state-of-the-art in ML and developers easily build and deploy ML-powered applications.
TensorFlow 2.3 adds new mechanisms to solve input pipeline bottlenecks and save resources, new tools such as the Memory Profiler, experimental support for the Keras Preprocessing Layers API, and more.
Major Features and Improvements
- tf.data adds two new mechanisms to solve input pipeline bottlenecks and save resources: snapshot (tf.data.experimental.snapshot) and the tf.data service (tf.data.experimental.service). In addition, check out the detailed guide for analyzing input pipeline performance with TF Profiler.
- tf.distribute.TPUStrategy is now a stable API and is no longer considered experimental (it was previously tf.distribute.experimental.TPUStrategy).
- TF Profiler introduces two new tools: a memory profiler to visualize your model’s memory usage over time, and a Python tracer that lets you trace Python function calls in your model. Usability improvements include better diagnostic messages and profile options to customize the host and device trace verbosity levels.
- Introduces experimental support for Keras Preprocessing Layers API (tf.keras.layers.experimental.preprocessing.*) to handle data preprocessing operations, with support for composite tensor inputs. Please see below for additional details on these layers.
- TFLite now properly supports dynamic shapes during conversion and inference. We’ve also added opt-in support on Android and iOS for XNNPACK, a highly optimized set of CPU kernels, as well as opt-in support for executing quantized models on the GPU.
- Libtensorflow packages are available in GCS starting with this release. We have also started releasing nightly versions of these packages.
- The experimental Python API tf.debugging.experimental.enable_dump_debug_info() now allows you to instrument a TensorFlow program and dump debugging information to a directory on the file system. The directory can be read and visualized by a new interactive dashboard in TensorBoard 2.3 called Debugger V2, which reveals the details of the TensorFlow program, including graph structures, the history of op executions at the Python (eager) and intra-graph levels, the runtime dtype, shape, and numerical composition of tensors, and their code locations.
- Increases the minimum Bazel version required to build TF to 3.1.0.
- Makes the following (breaking) changes to the tf.data C++ API:
  - IteratorBase::RestoreInternal, IteratorBase::SaveInternal, and DatasetBase::CheckExternalState become pure virtual, and subclasses are now expected to provide an implementation.
  - The deprecated DatasetBase::IsStateful method is removed in favor of DatasetBase::CheckExternalState.
  - Deprecated overrides of DatasetBase::MakeIterator and MakeIteratorFromInputElement are removed.
  - The signature of tensorflow::data::IteratorBase::SaveInternal and tensorflow::data::IteratorBase::SaveInput has been extended with a SerializationContext argument to enable overriding the default policy for handling external state during iterator checkpointing. This is not a backward-compatible change, and all subclasses of IteratorBase need to be updated accordingly.
- Add a new BackupAndRestore callback for handling distributed training failures and restarts. See the BackupAndRestore tutorial for details on how to use the callback.
- tf.image.extract_glimpse has been updated to correctly process the case where centered=False and normalized=False. This is a breaking change, as the output differs from the (incorrect) output of previous versions. The change only affects the tf.image.extract_glimpse and tf.compat.v2.image.extract_glimpse API endpoints. The behavior of tf.compat.v1.image.extract_glimpse does not change, and neither does the behavior of the existing C++ kernel ExtractGlimpse, so saved models using tf.raw_ops.ExtractGlimpse are not affected.
- Keras-based LSTM models must be converted with an explicit batch size in the input layer.
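Snapshot, one of the new tf.data mechanisms for input pipeline bottlenecks, can be sketched as follows. This is a minimal illustration assuming the tf.data.experimental.snapshot(path) transformation; expensive_preprocess is a hypothetical stand-in for costly per-element work, and the temporary directory exists only for the example. The first full pass writes the preprocessed elements to disk so that later passes can read them back instead of recomputing the map.

```python
import tempfile

import tensorflow as tf


def expensive_preprocess(x):
    # Hypothetical stand-in for costly per-element work (decoding, parsing, ...).
    return tf.cast(x, tf.float32) * 2.0


snapshot_dir = tempfile.mkdtemp()

dataset = (
    tf.data.Dataset.range(8)
    .map(expensive_preprocess)
    # Persist the preprocessed elements so later epochs can skip the map above.
    .apply(tf.data.experimental.snapshot(snapshot_dir))
)

# Reading back from several shards need not preserve the original order,
# so compare sorted values between the writing pass and a later pass.
first_epoch = sorted(dataset.as_numpy_iterator())
second_epoch = sorted(dataset.as_numpy_iterator())
print(first_epoch == second_epoch)
```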
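The TF Profiler trace options mentioned above can be exercised with a short programmatic trace. This sketch assumes the tf.profiler.experimental start/stop API and its ProfilerOptions tuple (host, Python, and device tracer levels); train_step, the tensor sizes, and the log directory are illustrative, and the exact file layout of the written trace may vary between versions. The resulting directory is what TensorBoard's profiler reads.

```python
import os
import tempfile

import tensorflow as tf

log_dir = tempfile.mkdtemp()

# Verbosity knobs: python_tracer_level=1 turns on the Python function-call tracer.
options = tf.profiler.experimental.ProfilerOptions(
    host_tracer_level=2, python_tracer_level=1, device_tracer_level=1
)


@tf.function
def train_step(x):
    # Hypothetical workload standing in for a real training step.
    return tf.reduce_sum(tf.matmul(x, x))


tf.profiler.experimental.start(log_dir, options=options)
for _ in range(3):
    train_step(tf.random.normal([64, 64]))
tf.profiler.experimental.stop()

# Trace files are written somewhere below log_dir (typically plugins/profile/<run>/).
written = [name for _, _, files in os.walk(log_dir) for name in files]
print(written)
```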
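Dynamic-shape support in TFLite conversion and inference can be illustrated with a toy model whose batch dimension is left unspecified. This is a sketch assuming TFLiteConverter.from_concrete_functions and the Python tf.lite.Interpreter; affine is a hypothetical model. The dynamic dimension must be resized to a concrete size before allocate_tensors() is called.

```python
import numpy as np
import tensorflow as tf


# Hypothetical toy model: the batch dimension is dynamic (None).
@tf.function(input_signature=[tf.TensorSpec([None, 3], tf.float32)])
def affine(x):
    return x * 2.0 + 1.0


converter = tf.lite.TFLiteConverter.from_concrete_functions(
    [affine.get_concrete_function()]
)
tflite_model = converter.convert()

interpreter = tf.lite.Interpreter(model_content=tflite_model)
input_index = interpreter.get_input_details()[0]["index"]
output_index = interpreter.get_output_details()[0]["index"]

# Resize the dynamic batch dimension to 4 at runtime, then allocate buffers.
batch = np.arange(12, dtype=np.float32).reshape(4, 3)
interpreter.resize_tensor_input(input_index, batch.shape)
interpreter.allocate_tensors()

interpreter.set_tensor(input_index, batch)
interpreter.invoke()
result = interpreter.get_tensor(output_index)
print(result.shape)  # (4, 3)
```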
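The Debugger V2 workflow described above starts by dumping debug information from a running program. This is a minimal sketch assuming enable_dump_debug_info accepts the tensor_debug_mode and circular_buffer_size arguments; log_values is a hypothetical function chosen because log(0.0) produces -inf, the kind of numerical issue the dashboard is meant to locate.

```python
import os
import tempfile

import tensorflow as tf

dump_root = tempfile.mkdtemp()

# FULL_HEALTH records summary statistics (e.g. counts of inf/NaN values) per float tensor;
# circular_buffer_size=-1 disables the ring buffer so every execution event is kept.
tf.debugging.experimental.enable_dump_debug_info(
    dump_root, tensor_debug_mode="FULL_HEALTH", circular_buffer_size=-1
)


@tf.function
def log_values(x):
    # Hypothetical function: log(0.0) yields -inf, which the dump ties to this source line.
    return tf.math.log(x)


log_values(tf.constant([1.0, 0.0, 2.0]))

# Stop instrumenting and close the dump files.
tf.debugging.experimental.disable_dump_debug_info()
print(sorted(os.listdir(dump_root)))
```

Pointing TensorBoard's --logdir at dump_root should then make the run available in the Debugger V2 dashboard.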
Bug Fixes and Other Changes
- Set tf2_behavior to 1 to enable V2 for early loading cases.
- Add execute_fn_for_device function to dynamically choose the implementation based on underlying device placement.
- Add a reduce_logsumexp benchmark with experimental_compile.
- Give EagerTensors a meaningful __array__ implementation.
- Add another version of defun matmul for performance analysis.
- AutoGraph now includes into TensorFlow loops any variables that are closed over by local functions. Previously, such variables were sometimes incorrectly ignored.
- Functions returned by the get_concrete_function method of tf.function objects can now be called with arguments consistent with the original arguments or type specs passed to get_concrete_function. This calling convention is now the preferred way to use concrete functions with nested values and composite tensors. See the guide for more details on concrete functions.
- Update tf.function’s experimental_relax_shapes to handle composite tensors appropriately.
- Optimize tf.function invocation by removing a redundant list converter.
- tf.function now retraces when called with a different variable, instead of relying only on the variable's dtype and shape.
- Improve support for dynamically-sized TensorArray inside tf.function.
- Narrow down argmin/argmax contract to always return the smallest index for ties.
- tf.math.reduce_variance and tf.math.reduce_std return correct computation for complex types and no longer support integer types.
- Add Bessel functions of orders 0 and 1 to tf.math.special.
- tf.divide now always returns a tensor to be consistent with documentation and other APIs.
- Replaced tf.image.non_max_suppression_padded with a new implementation that supports batched inputs, which is considerably faster on TPUs and GPUs. Boxes with area=0 will be ignored. Existing usage with single inputs should still work as before.
- Add tf.linalg.banded_triangular_solve.
- Add tf.random.stateless_parameterized_truncated_normal.
- Add tf.ragged.cross and tf.ragged.cross_hashed operations.
- RaggedTensor.to_tensor() now preserves static shape.
- Extend tf.strings.format() and tf.print() to support RaggedTensors.
- @tf.function from SavedModel no longer ignores args after a RaggedTensor when selecting the concrete function to run.
- Fix a SavedModel issue for ops that take a list of functions.
- Add tf.saved_model.LoadOptions with an experimental_io_device argument (default None) to choose the I/O device for loading models and weights.
- Update tf.saved_model.SaveOptions with an experimental_io_device argument (default None) to choose the I/O device for saving models and weights.
- No longer includes PTX kernels for GPU except for sm_70, to reduce binary size. On systems with NVIDIA® Ampere GPUs (CUDA architecture 8.0) or newer, kernels are JIT-compiled from PTX, and TensorFlow can take over 30 minutes to start up. This overhead can be limited to the first startup by increasing the default JIT cache size with: export CUDA_CACHE_MAXSIZE=2147483648
- Retain the parent name scope for ops added inside tf.while_loop/tf.cond/tf.switch_case.
- Update tf.vectorized_map to support vectorizing tf.while_loop and TensorList operations.
- tf.custom_gradient can now be applied to functions that accept nested structures of tensors as inputs (instead of just a list of tensors). Note that Python structures such as tuples and lists now won’t be treated as tensors, so if you still want them to be treated that way, you need to wrap them with tf.convert_to_tensor.
- No lowering on gradient case op when input is DeviceIndex op.
- Extend the ragged version of tf.gather to support batch_dims and axis args.
- Update tf.map_fn to support RaggedTensors and SparseTensors.
- Deprecate tf.group. It is not useful in eager mode.
- Add CPU and GPU implementations of a modified variant of FTRL/FTRLV2 that can be triggered by multiply_linear_by_lr, allowing a learning rate of zero.
- tf.data.experimental.dense_to_ragged_batch works correctly with tuples.
- tf.data.experimental.dense_to_ragged_batch can now output a variable ragged rank.
- tf.data.experimental.cardinality is now a method on tf.data.Dataset.
- tf.data.Dataset now supports len(Dataset) when the cardinality is finite.
- Expose experimental tf.distribute.DistributedDataset and tf.distribute.DistributedIterator to distribute input data when using tf.distribute to scale training on multiple devices.
- Allow var.assign on MirroredVariables with aggregation=NONE in replica context. Previously this would raise an error. We now allow this because many users and library writers find using .assign in replica context to be more convenient, instead of having to use Strategy.extended.update which was the previous way of updating variables in this situation.
- tf.distribute.experimental.MultiWorkerMirroredStrategy adds support for partial batches. Workers running out of data now continue to participate in training with empty inputs, instead of raising an error. Learn more about partial batches in the distributed input guide.
- Improve the performance of reading metrics eagerly under tf.distribute.experimental.MultiWorkerMirroredStrategy.
- Fix an issue where strategy.reduce() inside tf.function could raise exceptions when the values to reduce come from loops or if-clauses.
- Fix an issue where tf.distribute.MirroredStrategy could not be used together with tf.distribute.experimental.MultiWorkerMirroredStrategy.
- Add a tf.distribute.cluster_resolver.TPUClusterResolver.connect API to simplify TPU initialization.
- Introduces experimental preprocessing layers API (tf.keras.layers.experimental.preprocessing) to handle data preprocessing operations such as categorical feature encoding, text vectorization, data normalization, and data discretization (binning). The newly added layers provide a replacement for the legacy feature column API, and support composite tensor inputs.
- Added categorical data processing layers:
  - IntegerLookup & StringLookup: build an index of categorical feature values
  - CategoryEncoding: turn integer-encoded categories into one-hot, multi-hot, or tf-idf encoded representations
  - CategoryCrossing: create new categorical features representing co-occurrences of previous categorical feature values
  - Hashing: the hashing trick, for large-vocabulary categorical features
  - Discretization: turn continuous numerical features into categorical features by binning their values
- Improved image preprocessing layers: CenterCrop, Rescaling
- Improved image augmentation layers: RandomCrop, RandomFlip, RandomTranslation, RandomRotation, RandomHeight, RandomWidth, RandomZoom, RandomContrast
- Improved TextVectorization layer, which handles string tokenization, n-gram generation, and token encoding
  - The TextVectorization layer now accounts for the mask_token as part of the vocabulary size when output_mode='int'. This means that if you have a max_tokens value of 5000, your output will have 5000 unique values (not 5001 as before).
  - Change the return value of TextVectorization.get_vocabulary() from bytes to strings. Users who previously called decode() on the output of this method no longer need to do so.
- Introduce new Keras dataset generation utilities:
  - image_dataset_from_directory is a utility based on tf.data.Dataset, meant to replace the legacy ImageDataGenerator. It takes you from a structured directory of images to a labeled dataset in one function call. Note that it doesn’t perform image data augmentation (which is meant to be done using preprocessing layers).
  - text_dataset_from_directory takes you from a structured directory of text files to a labeled dataset in one function call.
  - timeseries_dataset_from_array is a tf.data.Dataset-based replacement for the legacy TimeseriesGenerator. It takes you from an array of time series data to a dataset of shifting windows with their targets.
- Added the experimental_steps_per_execution argument to model.compile to indicate the number of batches to run per tf.function call. This can speed up Keras models on TPUs by up to 3x.
- Extends tf.keras.layers.Lambda layers to support multi-argument lambdas, and keyword arguments when calling the layer.
- Functional models are now constructed if any tensor in a layer call’s arguments or keyword arguments comes from a Keras input. Previously, the Functional API only worked if all of the elements in the first argument to the layer came from a Keras input.
- Clean up the BatchNormalization layer’s trainable property so that it acts like standard Python state when used inside tf.functions (frozen at tracing time), instead of acting like a pseudo-variable whose updates were only sometimes reflected in already-traced tf.function traces.
- Add the Conv1DTranspose layer.
- Refine the semantics of SensitivitySpecificityBase-derived metrics. See the updated API docstrings for tf.keras.metrics.SensitivityAtSpecificity and tf.keras.metrics.SpecificityAtSensitivity.
- Restored inference_input_type and inference_output_type flags in TF 2.x TFLiteConverter (backward compatible with TF 1.x) to support integer (tf.int8, tf.uint8) input and output types in post training full integer quantized models.
- Added support for converting and resizing models with dynamic (placeholder) dimensions. Previously, there was only limited support for dynamic batch size, and even that did not guarantee that the model could be properly resized at runtime.
- Enabled experimental support for a new quantization mode with 16-bit activations and 8-bit weights. See tf.lite.OpsSet.EXPERIMENTAL_TFLITE_BUILTINS_ACTIVATIONS_INT16_WEIGHTS_INT8.
- Fix an issue with dynamic weights and Conv2D on x86.
- Add a runtime Android flag for enabling XNNPACK for optimized CPU performance.
- Add a runtime iOS flag for enabling XNNPACK for optimized CPU performance.
- Add a compiler flag to enable building a TFLite library that applies the XNNPACK delegate automatically when the model has an fp32 operation.
- Allow GPU acceleration starting with internal graph nodes
- Experimental support for quantized models with the Android GPU delegate
- Add GPU delegate whitelist.
- Rename the GPU whitelist to the compatibility list.
- Improve GPU compatibility list entries from crash reports.
- Set default value for StatefulNnApiDelegate::Options::max_number_delegated_partitions to 3.
- Add capability to disable NNAPI CPU and check NNAPI Errno.
- Fix crashes when using NNAPI with a target accelerator specified and a model containing Conv2D, FullyConnected, or LSTM nodes with quantized weights.
- Fix ANEURALNETWORKS_BAD_DATA execution failures with sum/max/min/reduce operations with scalar inputs.
- The TFLite Hexagon delegate is no longer experimental.
- Experimental int8 support for most Hexagon ops.
- Experimental per-channel quantization support for conv in the Hexagon delegate.
- Support dynamic batch size in C++ API.
- Open-source the CoreML delegate.
- Enable building Android TFLite targets on Windows
- Add support for BatchMatMul.
- Add support for half_pixel_centers with ResizeNearestNeighbor.
- Add 3D support for BatchToSpaceND.
- Add 5D support for BroadcastSub, Maximum, Minimum, Transpose and BroadcastDiv.
- Rename kTfLiteActRelu1 to kTfLiteActReluN1To1.
- Enable flex delegate on tensorflow.lite.Interpreter Python package.
- Add Bucketize, SparseCross and BoostedTreesBucketize to the flex whitelist.
- Add support for selective registration of flex ops.
- Add missing kernels for flex delegate whitelisted ops.
- Fix issue when using direct ByteBuffer inputs with graphs that have dynamic shapes.
- Fix error checking supported operations in a model containing HardSwish.
- Added tf.sysconfig.get_build_info(). It returns a dict that describes the build environment of the currently installed TensorFlow package, e.g. the NVIDIA CUDA and NVIDIA cuDNN versions used when TensorFlow was built.
- Fix a subtle use-after-free issue in XStatVisitor::RefValue().
- Adds 3D mesh support in TPU configurations ops.
- Added TPU code for FTRL with multiply_linear_by_lr.
- Silently adds a new file system registry at gstpu.
- Support restartType in the Cloud TPU client.
- Depend on a specific version of google-api-python-client.
- Fixes apiclient import.
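The concrete-function calling convention described in the list above (calling a concrete function with arguments shaped like those passed to get_concrete_function) can be sketched with a nested dict argument. scaled_sum and its feature keys are hypothetical; the point is that the call mirrors the structure used at tracing time.

```python
import tensorflow as tf


@tf.function
def scaled_sum(features, scale):
    # features is a nested structure (a dict), not a flat list of tensors.
    return (features["a"] + features["b"]) * scale


concrete = scaled_sum.get_concrete_function(
    {"a": tf.TensorSpec([None], tf.float32), "b": tf.TensorSpec([None], tf.float32)},
    tf.TensorSpec([], tf.float32),
)

# Call the concrete function with the same structure that was used to trace it.
out = concrete({"a": tf.constant([1.0, 2.0]), "b": tf.constant([3.0, 4.0])}, tf.constant(10.0))
print(out.numpy())  # [40. 60.]
```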
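Several of the math changes listed above (argmin/argmax tie-breaking, tf.divide returning a tensor, complex reduce_variance, and the new Bessel functions) can be checked in a few lines. This sketch assumes "Bessel functions of orders 0 and 1" refers to tf.math.special.bessel_i0 and bessel_i1 (modified Bessel functions of the first kind); the input values are illustrative.

```python
import numpy as np
import tensorflow as tf

# argmin/argmax: ties resolve to the smallest index.
tie_index = tf.math.argmax([1, 3, 3, 2]).numpy()  # 1

# tf.divide returns a Tensor even for plain Python numbers.
quotient = tf.divide(6, 3)

# Complex reduce_variance: mean squared magnitude of deviations, with a real-valued result.
z = tf.constant([1 + 1j, -1 - 1j], dtype=tf.complex64)
variance = tf.math.reduce_variance(z)  # mean is 0 and |z|^2 is 2 for both entries

# Modified Bessel functions of the first kind, orders 0 and 1.
x = tf.constant([0.0, 1.0])
i0 = tf.math.special.bessel_i0(x).numpy()  # approx [1.0, 1.26607]
i1 = tf.math.special.bessel_i1(x).numpy()  # approx [0.0, 0.56516]

print(tie_index, isinstance(quotient, tf.Tensor), variance.dtype, variance.numpy())
```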
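Two of the RaggedTensor additions above, tf.ragged.cross and tf.map_fn over ragged inputs, can be sketched together. The feature values are illustrative; fn_output_signature tells map_fn that each row maps to a dense int32 scalar.

```python
import tensorflow as tf

# tf.ragged.cross joins feature values row by row with the "_X_" separator.
crossed = tf.ragged.cross([
    tf.ragged.constant([["a"], ["b", "c"]]),
    tf.ragged.constant([["d"], ["e"]]),
])
print(crossed.to_list())  # [[b'a_X_d'], [b'b_X_e', b'c_X_e']]

# tf.map_fn over a RaggedTensor: each row arrives as a dense 1-D tensor.
rows = tf.ragged.constant([[1, 2, 3], [4], [5, 6]])
row_sums = tf.map_fn(tf.reduce_sum, rows, fn_output_signature=tf.int32)
print(row_sums.numpy())  # [ 6  4 11]
```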
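The experimental_io_device option on SaveOptions and LoadOptions listed above mainly matters in distributed settings, but it can be passed in a local round trip too. This sketch assumes "/job:localhost" as the I/O device string; the module and its variable are illustrative.

```python
import tempfile

import tensorflow as tf

module = tf.Module()
module.v = tf.Variable(3.0)

export_dir = tempfile.mkdtemp()
# Route filesystem access for variables through the local host.
io_device = "/job:localhost"

tf.saved_model.save(
    module, export_dir, options=tf.saved_model.SaveOptions(experimental_io_device=io_device)
)
restored = tf.saved_model.load(
    export_dir, options=tf.saved_model.LoadOptions(experimental_io_device=io_device)
)
print(restored.v.numpy())  # 3.0
```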
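The tf.vectorized_map change above (support for vectorizing tf.while_loop) can be sketched with a per-element loop. cube is a hypothetical function; if a particular loop cannot be vectorized, vectorized_map may fall back to a sequential loop, so the result is the same either way.

```python
import tensorflow as tf


def cube(x):
    # A per-element tf.while_loop that multiplies by x three times.
    def body(i, acc):
        return i + 1, acc * x

    _, result = tf.while_loop(
        lambda i, acc: i < 3, body, [tf.constant(0), tf.ones_like(x)]
    )
    return result


cubes = tf.vectorized_map(cube, tf.constant([1.0, 2.0, 3.0]))
print(cubes.numpy())  # [ 1.  8. 27.]
```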
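Dataset.cardinality() and len(Dataset), described above, can be compared on finite, infinite, and filtered pipelines. A minimal sketch with illustrative data; len() is only defined when the cardinality is both finite and known.

```python
import tensorflow as tf

batched = tf.data.Dataset.range(10).batch(3)  # batches of 3, 3, 3, 1
print(batched.cardinality().numpy(), len(batched))  # 4 4

infinite = batched.repeat()
print(infinite.cardinality().numpy() == tf.data.INFINITE_CARDINALITY)  # True

# filter() makes the number of elements unknown until the data is consumed.
filtered = batched.filter(lambda x: tf.reduce_sum(x) > 5)
print(filtered.cardinality().numpy() == tf.data.UNKNOWN_CARDINALITY)  # True

try:
    len(filtered)
    len_failed = False
except TypeError as err:
    # len() raises TypeError for infinite or unknown cardinality.
    len_failed = True
    print("len() raised TypeError:", err)
```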
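The relaxed rule for MirroredVariables with aggregation=NONE, listed above, means assign() can be called directly from replica context. This is a sketch assuming a single-machine MirroredStrategy (which falls back to the CPU when no GPU is present); counter is a hypothetical variable.

```python
import tensorflow as tf

# Uses all local GPUs, or a single CPU replica on a CPU-only machine.
strategy = tf.distribute.MirroredStrategy()

with strategy.scope():
    counter = tf.Variable(0.0, aggregation=tf.VariableAggregation.NONE)

# assign() inside strategy.run executes in replica context; previously this raised an error.
strategy.run(lambda: counter.assign(5.0))
print(counter.numpy())  # 5.0
```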
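tf.sysconfig.get_build_info(), described above, can be queried at runtime. This sketch reads the CUDA and cuDNN entries defensively, assuming those keys may be absent on CPU-only builds.

```python
import tensorflow as tf

build_info = tf.sysconfig.get_build_info()

# On CPU-only builds the CUDA/cuDNN keys may be missing, so use .get with a fallback.
print("CUDA:", build_info.get("cuda_version", "n/a"))
print("cuDNN:", build_info.get("cudnn_version", "n/a"))
```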
Tracing and Debugging
- Add a TFE_Py_Execute traceme.
- Implement stable argmin and argmax
Bug Fixes and Other Changes
- Mutable tables now restore checkpointed values when loaded from SavedModel
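Restoring mutable table contents from a SavedModel can be sketched with a round trip. Vocab, its keys, and the exported lookup method are hypothetical; lookup is exported as a tf.function with an input signature so the restored object can call it after loading.

```python
import tempfile

import tensorflow as tf


class Vocab(tf.Module):
    def __init__(self):
        super().__init__()
        self.table = tf.lookup.experimental.MutableHashTable(
            key_dtype=tf.string, value_dtype=tf.int64, default_value=-1
        )

    @tf.function(input_signature=[tf.TensorSpec([None], tf.string)])
    def lookup(self, keys):
        return self.table.lookup(keys)


vocab = Vocab()
vocab.table.insert(tf.constant(["cat", "dog"]), tf.constant([1, 2], dtype=tf.int64))

export_dir = tempfile.mkdtemp()
tf.saved_model.save(vocab, export_dir)
restored = tf.saved_model.load(export_dir)

# Inserted entries survive the round trip; missing keys still map to default_value.
looked_up = restored.lookup(tf.constant(["dog", "bird"])).numpy()
print(looked_up)  # [ 2 -1]
```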
Whether you’re an expert or a beginner, TensorFlow is an end-to-end platform that makes it easy for you to build and deploy machine learning (ML) models.
TensorFlow was originally developed by researchers and engineers working on the Google Brain team within Google’s Machine Intelligence Research organization to conduct machine learning and deep neural networks research. The system is general enough to be applicable in a wide variety of other domains, as well.
Easy model building
TensorFlow offers multiple levels of abstraction so you can choose the right one for your needs. Build and train models by using the high-level Keras API, which makes getting started with TensorFlow and machine learning easy.
If you need more flexibility, eager execution allows for immediate iteration and intuitive debugging. For large ML training tasks, use the Distribution Strategy API for distributed training on different hardware configurations without changing the model definition.
Robust ML production anywhere
TensorFlow has always provided a direct path to production. Whether it’s on servers, edge devices, or the web, TensorFlow lets you train and deploy your model easily, no matter what language or platform you use.
Powerful experimentation for research
Build and train state-of-the-art models without sacrificing speed or performance. TensorFlow gives you the flexibility and control with features like the Keras Functional API and Model Subclassing API for creation of complex topologies. For easy prototyping and fast debugging, use eager execution.
TensorFlow also supports an ecosystem of powerful add-on libraries and models to experiment with, including Ragged Tensors, TensorFlow Probability, Tensor2Tensor and BERT.