TNC Guide
This guide acts as a starting point to get familiar with the library and its features. It is intended to be more abstract than the documentation, while providing more info and background than the examples. If anything is missing, feel free to open an issue on GitHub.
About
TNC is a library for general tensor network contraction, with a focus on classical simulation of quantum circuits.
It supports the local contraction of tensor networks as well as parallel contraction on distributed-memory systems (HPC). To this end, we employ partitioning of tensor networks to contract the individual partitions in parallel on different compute nodes. Different methods for optimizing a partitioning before contraction are available.
The library allows building and importing tensor networks and offers methods to find contraction paths.
Getting Started
TNC is not yet on crates.io. Instead, it can be added to your project by running
cargo add --git https://github.com/qc-tum/TNC tnc
or by directly modifying your Cargo.toml file to include
[dependencies]
tnc = { git = "https://github.com/qc-tum/TNC.git" }
Features
There are no default features activated. The list of optional features to include is:
- cotengra: Enables Rust bindings to the tree annealing, tree reconfiguration and tree tempering methods of cotengra (for improving contraction paths)
- mkl: Uses the Intel Math Kernel Library (MKL) for performing tensor contractions. Otherwise, a pure Rust implementation is used
System Requirements
As noted in the README, the library relies on system dependencies. To install them, run
sudo apt install libhdf5-dev openmpi-bin libopenmpi-dev libboost-program-options-dev
Furthermore, the library relies on C++ libraries that are built as part of the library's build process.
For this, cmake and a C++ compiler are required on the system.
Optionally, to use the cotengra feature, Python must be installed along with the following Python packages:
pip install cotengra kahypar optuna
Theoretical Background
Here, we want to give a quick, informal overview of the theory behind tensor networks for those not so familiar with the matter. Feel free to skip this if you already know the concepts.
Tensors
A tensor is a multi-dimensional array.
If you are familiar with the Python package numpy, you can think of it as a numpy array.
It has a number of dimensions and each dimension has a specific size.
A few examples:
- a tensor with zero dimensions is a scalar.
- a tensor with one dimension is a vector. The size of the dimension is the number of elements in the vector.
- a tensor with two dimensions is a matrix. The sizes of the two dimensions specify the number of rows and columns of the matrix.
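Since the guide already suggests thinking of tensors as numpy arrays, the examples translate directly. The snippet below is Python/numpy purely for intuition (in TNC the elements would be complex numbers, and the library itself is written in Rust):

```python
import numpy as np

scalar = np.array(3.0)               # zero dimensions
vector = np.array([1.0, 2.0, 3.0])   # one dimension of size 3
matrix = np.zeros((2, 4))            # two dimensions: 2 rows, 4 columns

print(scalar.ndim, vector.ndim, matrix.ndim)  # 0 1 2
print(vector.shape, matrix.shape)             # (3,) (2, 4)
```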
In the case of our library, the tensor elements are always complex numbers.
A tensor is usually visualized by a circle with an outgoing line (“leg”, “edge”, “bond”) for each dimension it has. For example, here is a tensor with three dimensions:
Contraction
Contraction is a binary tensor operation where specified dimensions of both tensors are multiplied together and summed over to create a new tensor. The contracted dimensions of both tensors must match in size.
To make it more concrete, let us consider two matrices, i.e., two tensors with two dimensions each.
Say tensor 1 has dimension sizes [a, b] and tensor 2 has dimension sizes [b, c].
Hence, the second dimension of tensor 1 has the same size as the first dimension of tensor 2.
Contracting the two tensors along this dimension is now exactly the same as performing a matrix-matrix multiplication:
It results in a tensor [a, c], where each element is obtained by summing the products of the corresponding elements of the two tensors.
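In Python/numpy terms (illustration only, independent of the library), this contraction is literally a matrix product:

```python
import numpy as np

rng = np.random.default_rng(0)
a, b, c = 2, 3, 4
t1 = rng.normal(size=(a, b))  # tensor with dimension sizes [a, b]
t2 = rng.normal(size=(b, c))  # tensor with dimension sizes [b, c]

# Contracting over the shared dimension b is exactly matrix multiplication:
contracted = np.einsum('ab,bc->ac', t1, t2)
assert np.allclose(contracted, t1 @ t2)
```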
In general, any contraction can be interpreted as matrix-matrix multiplication. We move all dimensions which are not involved to the front and move the involved dimensions to the back (and do the reverse for the second tensor). The order of the involved dimensions must be the same in both tensors. Then we reshape the tensors such that all involved dimensions are a single dimension and all uninvolved dimensions are a single dimension. This leaves us with matrices which we can then multiply. Finally, we can reshape the resulting matrix into the original dimensions again.
Example:
Contracting tensor [a, b, c, d] and tensor [d, a, e, f] along dimensions a, d:
- Group into non-involved and involved: [b, c, a, d] and [a, d, e, f]
- Reshape into matrices [I, J] and [J, K] with I = (b, c), J = (a, d), K = (e, f)
- Multiply to get matrix [I, K]
- Reshape into resulting tensor [b, c, e, f]
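The four steps can be verified numerically. Here is a sketch in Python/numpy (illustration only, independent of the library), with arbitrary small dimension sizes:

```python
import numpy as np

rng = np.random.default_rng(1)
da, db, dc, dd, de, df = 2, 3, 4, 5, 3, 2
t1 = rng.normal(size=(da, db, dc, dd))  # tensor [a, b, c, d]
t2 = rng.normal(size=(dd, da, de, df))  # tensor [d, a, e, f]

# 1. Group into non-involved and involved: [b, c, a, d] and [a, d, e, f].
#    Note the involved dimensions (a, d) appear in the same order in both.
t1_grouped = t1.transpose(1, 2, 0, 3)
t2_grouped = t2.transpose(1, 0, 2, 3)

# 2. Reshape into matrices [I, J] and [J, K] with I=(b,c), J=(a,d), K=(e,f)
m1 = t1_grouped.reshape(db * dc, da * dd)
m2 = t2_grouped.reshape(da * dd, de * df)

# 3. Multiply to get matrix [I, K]; 4. reshape into the tensor [b, c, e, f]
result = (m1 @ m2).reshape(db, dc, de, df)

# Cross-check against a direct contraction over a and d
assert np.allclose(result, np.einsum('abcd,daef->bcef', t1, t2))
```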
Visually, two tensors that are to be contracted along some dimensions share the legs corresponding to those dimensions. For example, a matrix-matrix multiplication would be visualized like this:
Tensor Networks
Often we need to contract multiple tensors together to compute something. A tensor network is a collection of tensors that are to be contracted together. The shape of the final tensor corresponds to the open legs of the network, i.e., the legs that are not connected to another tensor. For example, the following tensor network would end up as a tensor with 5 dimensions:
Contraction Paths
Contraction paths specify the order in which tensor networks are contracted. We assume the tensors in the tensor network are enumerated in some way. The contraction path is then a list of pairs of numbers, specifying which tensors to contract together.
Important
While any order will result in the same result, the required number of operations and the sizes of intermediate tensors can vary significantly!
Note that finding an optimal contraction path is NP-hard in general. However, there exist different heuristics to find good contraction paths in practice.
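The impact of the contraction order can be seen with a tiny cost model. The sketch below (plain Python, made-up dimension sizes) counts scalar multiplications for the two ways of contracting a three-matrix chain:

```python
def matmul_cost(m, k, n):
    # An [m, k] @ [k, n] product needs about m*k*n scalar multiplications.
    return m * k * n

# Chain A [2, 100], B [100, 100], C [100, 100]; result is [2, 100] either way.
cost_left = matmul_cost(2, 100, 100) + matmul_cost(2, 100, 100)     # (A @ B) @ C
cost_right = matmul_cost(100, 100, 100) + matmul_cost(2, 100, 100)  # A @ (B @ C)
print(cost_left, cost_right)  # 40000 1020000
```

The first order is about 25x cheaper, and its intermediate tensor [2, 100] is also much smaller than [100, 100].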
User guide
Structure of tensors
In contrast to the common design of having tensors and tensor networks, this library allows arbitrary nesting of tensors. This means that the children of tensor networks can themselves be tensor networks again, forming a tree-like structure. We call tensors with children “composite tensors” and those without children “leaf tensors”.
Composite tensors
Composite tensors are the equivalent to tensor networks, as they own a list of tensors that are their children.
However, these child tensors can also be composite, allowing for a hierarchical tree structure.
The outer legs of a composite tensor can be obtained by the external_tensor method.
Leaf tensors
Leaf tensors have legs with corresponding bond dimensions and can optionally have data.
Rationale
The idea for this recursive design is natural: When looking only at the outer (i.e. open) legs of a tensor network, it can be seen as a plain tensor again. It has legs with sizes and it has data (that is obtained by contracting the tensor network).
In addition, this format is useful for multi-level parallelization based on partitioning the network: For example, the top level could be a composite tensor, where each child is assigned to one compute node. Each child is again a composite tensor, where each child is assigned to one core. Finally, each of those children is again a composite tensor that is an actual tensor network (i.e., with leaf tensors as children).
Constructing tensor networks
There are multiple ways to construct tensors and tensor networks.
Quantum
Since this library focuses on simulation of quantum circuits, there are multiple methods to construct corresponding tensor networks.
OpenQASM2 code
If the goal is to classically simulate quantum circuits, one can directly load OpenQASM2 code and construct a Circuit out of it using import_qasm.
Note
This library implements many standard gates. When it encounters one of these in the QASM code, it does not look for a gate definition; only for unknown gates does it fall back to decomposing the gate using an earlier gate definition in the QASM code.
From the circuit, we can then construct different tensor networks, depending on what we want to compute:
- into_amplitude_network creates a tensor network that computes the amplitude(s) of one or more states.
- into_statevector_network creates a tensor network that computes the full statevector.
- into_expectation_value_network creates a tensor network that computes the expectation value of the circuit with respect to Z observables on each qubit.
Circuit builder
Similar to importing QASM2 code, the Circuit struct can also directly be used to construct tensor networks that simulate quantum circuits.
Sycamore circuit
There are special methods to construct tensor networks corresponding to the quantum circuits of the Sycamore experiment (Quantum supremacy using a programmable superconducting processor (Arute et al.)). See the sycamore_circuit method.
HDF5 files
Tensors and tensor networks can also be saved and loaded from HDF5 files, see the functions in hdf5.
The structure of the files is:
[Group name="tensors"]
    [Dataset name="tensorA" datatype=double complex tensor]
        [Attribute name="bids" datatype=int list]
    [Dataset name="tensorB" datatype=double complex tensor]
        [Attribute name="bids" datatype=int list]
    ...
where bids are the leg IDs.
General tensor networks
The Tensor struct can be used to directly construct arbitrary tensors and tensor networks.
Tensors are created from a list of leg IDs and the corresponding dimensions of these legs.
Connected tensors are identified by having at least one leg ID in common.
The corresponding bond dimensions have to match.
Tensors without data can already be used for e.g. finding a contraction path, but if you want to actually contract a tensor network, the tensors need data.
For this, there is the set_tensor_data method which takes a variant of TensorData.
A normal tensor network is a list of tensors. However, this library also supports hierarchical tensor network structures, which are detailed in another tutorial.
Pathfinding and Contraction
Contraction Paths
To contract a tensor network, we need to specify the order in which the contractions should happen. This is passed as a list of pairs of tensor indices. There are different formats / interpretations of contraction paths; this library uses what we call the “replace left” format:
| Format | Action for a single contraction (i, j) | Example for 4 tensors |
|---|---|---|
| SSA | Contract tensor i and j, then append resulting tensor at end | [(0, 2), (1, 3), (4, 5)], result at index 6 |
| opt-einsum | Pop tensors i and j, then contract and append resulting tensor at end | [(0, 2), (0, 2), (0, 1)], result at index 0 |
| Replace left | Contract tensor i and j, then replace i by the resulting tensor | [(0, 2), (1, 3), (0, 1)], result at index 0 |
The rationale behind this format is that it can operate in place without changing the size of the tensors list, hence avoiding element moves and reallocations.
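The three formats from the table can be illustrated with a small simulation (plain Python, independent of the library's API) that tracks, for each slot, which of the original tensors the tensor in that slot contains:

```python
def contract_ssa(n, path):
    tensors = [frozenset([i]) for i in range(n)]
    for i, j in path:
        tensors.append(tensors[i] | tensors[j])  # result appended at the end
    return tensors

def contract_opt_einsum(n, path):
    tensors = [frozenset([i]) for i in range(n)]
    for i, j in path:
        tj = tensors.pop(max(i, j))  # pop higher index first so the lower stays valid
        ti = tensors.pop(min(i, j))
        tensors.append(ti | tj)      # result appended at the end
    return tensors

def contract_replace_left(n, path):
    tensors = [frozenset([i]) for i in range(n)]
    for i, j in path:
        tensors[i] = tensors[i] | tensors[j]  # result replaces tensor i, list size unchanged
    return tensors

# The examples for 4 tensors from the table all end with the full contraction:
full = frozenset(range(4))
assert contract_ssa(4, [(0, 2), (1, 3), (4, 5)])[6] == full
assert contract_opt_einsum(4, [(0, 2), (0, 2), (0, 1)])[0] == full
assert contract_replace_left(4, [(0, 2), (1, 3), (0, 1)])[0] == full
```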
Hierarchical Paths
Since tensor networks can be nested to form a tree-like structure (see Tensor Structure), contraction paths can likewise be nested, specifying sub-paths for sub-networks.
A contraction path hence specifies the contraction paths for all composite children of a composite tensor, and a top-level path to use after all children have been contracted to single tensors.
An abstract example could look like this: [{0: [(0, 1), (0, 2)], 2: [(0, 1)]}, (2, 1), (0, 2)].
This specifies the paths to contract composite tensors 0 and 2, followed by the top-level path to contract the resulting tensor networks.
Note that the tensor numbering is not globally unique: In every composite tensor, we start labeling from 0 again.
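The numbering semantics can be illustrated with a toy simulation (plain Python, not the library's API; it assumes the “replace left” rule applies at every level and models tensors as sets of labels):

```python
def replace_left(tensors, path):
    # Contract a flat list of tensors (modelled as label sets) with a
    # "replace left" path; the result sits at the first index of the last pair.
    tensors = list(tensors)
    for i, j in path:
        tensors[i] = tensors[i] | tensors[j]
    return tensors[path[-1][0]]

def hierarchical_contract(children, path):
    sub_paths = path[0]  # dict: child index -> sub-path for that composite child
    top_path = path[1:]  # pairs applied after all children are single tensors
    contracted = [
        replace_left(child, sub_paths[idx]) if idx in sub_paths else child
        for idx, child in enumerate(children)
    ]
    return replace_left(contracted, top_path)

# Composite children 0 and 2, leaf child 1. The labels are globally unique only
# so we can check the final result -- the path indices restart at 0 in every child.
children = [
    [{'a'}, {'b'}, {'c'}],  # composite child 0 with three leaves
    {'d'},                  # leaf child 1
    [{'e'}, {'f'}],         # composite child 2 with two leaves
]
path = [{0: [(0, 1), (0, 2)], 2: [(0, 1)]}, (2, 1), (0, 2)]
result = hierarchical_contract(children, path)
assert result == {'a', 'b', 'c', 'd', 'e', 'f'}
```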
Pathfinding
To find contraction paths for a given tensor network, the library offers various contraction path finders.
The best choice is usually to use Cotengrust, which integrates a Rust port of path finders from the Python library cotengra.
It has three variants specified by OptMethod.
Contraction
To contract the tensor network locally, just use contract_tensor_network and pass in the tensor network and a contraction path.
Partitioning
The library uses partitioning of tensor networks to parallelize the contraction. For example, given the following tensor network
one can partition it into two separate tensor networks like this:
To partition a tensor network, use find_partitioning and specify how many partitions should be created.
This will use the hypergraph partitioner KaHyPar to find roughly equally sized partitions with minimal cost for the legs between the partitions.
To actually get a partitioned tensor network, use partition_tensor_network and pass the partitioning (you can of course also provide your own).
The partitioning just specifies, for each tensor, which partition it should belong to.
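To give an idea of what such a partitioning looks like and what the “cost for the legs between the partitions” means, here is a plain Python sketch with made-up tensors and leg IDs (not the library's API):

```python
from collections import defaultdict

# Toy tensor network: each tensor is a list of leg IDs; tensors sharing a
# leg ID are connected. (Hypothetical example data.)
tensors = {
    0: [1, 2],
    1: [2, 3],
    2: [3, 4],
    3: [4, 5],
}
# A partitioning simply assigns every tensor to a partition.
partitioning = {0: 0, 1: 0, 2: 1, 3: 1}

# The cost of the partitioning comes from the legs shared between tensors
# in different partitions ("cut" legs), since they require communication.
leg_owners = defaultdict(set)
for tensor, legs in tensors.items():
    for leg in legs:
        leg_owners[leg].add(partitioning[tensor])
cut_legs = [leg for leg, parts in leg_owners.items() if len(parts) > 1]
print(cut_legs)  # only leg 3 connects the two partitions
```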
This initial partitioning can be suboptimal for contraction, though.
For this reason, we provide multiple methods to iteratively refine the partitioning for lower time-to-solution.
The best method is simulated annealing with the IntermediatePartitioningModel.
For details on the method, see Optimizing Tensor Network Partitioning using Simulated Annealing (Geiger et al.).
Parallelization
Distributed memory parallelism
The parallelization strategy currently is partitioning: Given a tensor network, it can be partitioned into multiple networks using the find_partitioning function which makes use of the hypergraph partitioning library KaHyPar.
Then, the partitioned tensor network can be distributed to individual nodes using scatter_tensor_network.
Each node can then independently contract its part of the tensor network.
No communication is needed during this time.
Finally, the results are gathered in a parallel reduce operation, where the tensors are sent between nodes according to the contraction path, contracted locally and sent again, until the final contraction, which is guaranteed to happen on rank 0.
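To give an idea of such a reduce operation, the sketch below simulates a binomial-tree reduction in plain Python. Note that this particular pairing scheme is an illustrative assumption; the library's actual communication follows the contraction path:

```python
def reduce_to_rank0(n_ranks):
    # Binomial-tree reduction: in each round, half of the remaining active
    # ranks send their tensor to a partner, which contracts it locally.
    # After log2(n_ranks) rounds, only rank 0 holds a tensor.
    holders = {rank: {rank} for rank in range(n_ranks)}  # rank -> partitions held
    step = 1
    while step < n_ranks:
        for rank in range(0, n_ranks, 2 * step):
            partner = rank + step
            if partner < n_ranks:
                # `partner` sends its tensor to `rank`, which contracts it
                holders[rank] |= holders.pop(partner)
        step *= 2
    return holders

holders = reduce_to_rank0(4)
assert list(holders) == [0]        # only rank 0 still holds a tensor
assert holders[0] == {0, 1, 2, 3}  # and it contains every partition's result
```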
Shared memory parallelism
Given the large tensor sizes that can occur during contraction, we do not partition tensor networks further than the node level. Instead, we use the available cores to parallelize the individual tensor-tensor contractions.
What about slicing?
Slicing is currently not supported, as it is not easy to combine it with partitioning. We hope to implement it at a later point.
Dealing with memory limits
Unfortunately, the high memory requirements of tensor network contraction are a general problem.
One thing to try is to use fewer or more partitions, as there can be sweet spots – more is not always better.
In particular, the memory requirements can already be computed theoretically (with the functions in contraction_cost) before doing any actual run on a compute cluster.
Other than that, the library unfortunately currently lacks support for slicing, which would allow trading compute time for lower memory requirements.
Running on HPC
Running with MPI
The library can run on multiple connected compute nodes, such as in an HPC setting.
It uses MPI for communication between nodes.
To run the code with MPI, compile it (e.g. using cargo build -r) and then execute the binary found in the target/release
folder using an MPI launcher (such as mpirun).
For example:
cargo build -r --example distributed_contraction
mpirun -n 4 target/release/examples/distributed_contraction
This command runs the executable on 4 nodes in parallel. While the nodes could be on the same physical device, this would make memory limitations even more severe. Instead, you usually want to do this with distributed nodes, where each node has its own memory. This is common in HPC centers.
Running on a cluster
Usually, HPC clusters use a cluster management system, most commonly SLURM. There, the steps to run this library look roughly like this:
- Load modules for the required dependencies (HDF5, Boost, Python, …)
- Optionally create a virtual Python environment and install the Python dependencies for cotengra
- Build the code using cargo build -r (assuming the login nodes have the same architecture as the compute nodes)
- Write a SLURM script that calls mpirun on the compiled binary
- Submit the job
Tips
- For max performance:
  - Use export RUSTFLAGS='-Ctarget-cpu=native' when building (assuming the login nodes have the same hardware as the compute nodes)
  - Add the following to your Cargo.toml:
    [profile.release]
    codegen-units = 1 # don't split crate into multiple compilation units
    lto = true # enable link time optimization
- There might be problems with finding or loading dependencies if they come from modules.
  - You can write a custom build script to specify further linker args.
- MPI might not terminate if one of the MPI ranks panics, making the job hang until it hits the time limit.
  - Use panic = "abort" in the Cargo.toml file. The library doesn’t rely on unwinding panics, so aborting is fine.
- Use export RUST_BACKTRACE=full to get a stacktrace in case of errors
Future Work
There are a few things that would be nice to have for the library. The following list of features and ideas is sorted somewhat by priority. We cannot guarantee that these points will ever be implemented by us, but if you feel like working on something, check out the Contribution guide in the repository.
- Tensor Network Optimization: It is common to perform optimizations before contracting tensor networks. For instance, by performing trivial contractions in advance, the search space for the pathfinding can be reduced.
- Slicing: Currently, the library can face memory limitations with large tensor networks. We want to use slicing to reduce the required memory in such cases, at the cost of having to do additional computations.
- Support for shared-memory parallelism: While we already use multithreading for single contractions, this will only be worth it for large tensors. It would be nice to utilize the full system at all times, even when the individual tensor contractions are not so large.
- GPU Support: GPUs are undoubtedly superior to CPUs at matrix calculations. Having support for GPUs could speed up contractions tremendously.
- More I/O formats: Support more file formats for importing and exporting. For example, store and load tensors using the numpy file format, or import QIR and cirq files.
- Replacing the C++ dependencies: The installation of the library is cumbersome, particularly due to the C++ dependencies. If we found pure Rust replacements, the installation would be easier, build times probably faster, and we could likely support more platforms.
- A Python interface: This would make it possible to use the library from Python.
- More tensor operations: Having e.g. options for singular value decomposition or other operations could make the library more useful for other use cases.