tnc/_tutorial/hpc_overview.rs

//! # High-Performance Computing (HPC)
//!
//! The library can run on multiple connected compute nodes, such as in an HPC setting.
//! It uses MPI for communication between nodes. To run the code with MPI, compile it
//! (e.g. using `cargo build -r`) and then execute the binary found in the `target`
//! folder using an MPI launcher (such as `mpirun`). For example:
//! ```shell
//! cargo build -r --example basic_usage
//! mpirun -n 4 target/release/examples/basic_usage
//! ```
//!
//! ## Parallelization
//!
//! ### Distributed memory parallelism
//! The current parallelization strategy is partitioning: given a tensor network,
//! it can be partitioned into multiple networks using the [`find_partitioning`]
//! function, which makes use of the hypergraph partitioning library KaHyPar. Then,
//! the partitioned tensor network can be distributed to individual nodes using
//! [`scatter_tensor_network`]. Each node can then independently contract its part of
//! the tensor network. No communication is needed during this time. Finally, the
//! results are gathered in a parallel reduce operation, where the tensors are sent
//! between nodes according to the contraction path, contracted locally, and sent
//! again, until the final contraction, which is guaranteed to happen on rank 0.
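//!
//! A rough sketch of this workflow is shown below. The function and method
//! signatures are assumptions for illustration only; see the documentation of
//! [`find_partitioning`] and [`scatter_tensor_network`] for the actual
//! interfaces.
//! ```ignore
//! // Assumed setup: `tn` is the full tensor network; all names and signatures
//! // below are illustrative, not the crate's exact API.
//! let num_parts = 4; // one partition per MPI rank
//!
//! // 1. Partition the network with KaHyPar (signature assumed).
//! let partitioning = find_partitioning(&tn, num_parts);
//!
//! // 2. Scatter the parts so every rank holds its local sub-network
//! //    (signature assumed).
//! let local_tn = scatter_tensor_network(&tn, &partitioning);
//!
//! // 3. Contract the local part; no inter-node communication happens here
//! //    (method name assumed).
//! let local_result = local_tn.contract();
//!
//! // 4. Gather via a parallel reduce along the contraction path; the final
//! //    tensor ends up on rank 0 (function name assumed).
//! let final_tensor = reduce_along_path(local_result);
//! ```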
//!
//! ### Shared memory parallelism
//! Given the large tensor sizes that can occur during contraction, we do not
//! partition tensor networks further than the node level. Instead, we use the
//! available cores to parallelize the individual tensor-tensor contractions.
//!
//! ### What about slicing?
//! Slicing is currently not supported, as it is not easy to combine it with
//! partitioning. We hope to implement it at a later point.
//!
//! ## Dealing with memory limits
//! Unfortunately, the high memory requirements of tensor network contraction are a
//! general problem. One thing to try is to use fewer or more partitions, as there can
//! be sweet spots -- more is not always better. Helpfully, the memory requirements
//! can already be computed theoretically (with the functions in [`contraction_cost`])
//! before doing any actual run on a compute cluster, so different partition counts
//! can be compared cheaply. Beyond that, the library currently lacks support for
//! slicing, which would allow trading compute time for lower memory requirements.
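//!
//! As a sketch of such a pre-flight check, the loop below compares several
//! partition counts. The helper `estimate_max_memory` is a hypothetical
//! stand-in for the actual functions in [`contraction_cost`], and the
//! partitioning call follows the sketch above.
//! ```ignore
//! // Compare theoretical peak memory for several partition counts before
//! // submitting anything to the cluster (all names here are illustrative).
//! for num_parts in [2, 4, 8, 16] {
//!     let partitioning = find_partitioning(&tn, num_parts);
//!     let peak_bytes = estimate_max_memory(&tn, &partitioning);
//!     println!("{num_parts} partitions -> ~{peak_bytes} bytes peak");
//! }
//! ```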
#![allow(unused_imports)]
use crate::contractionpath::contraction_cost;
use crate::mpi::communication::scatter_tensor_network;
use crate::tensornetwork::partitioning::find_partitioning;
use crate::tensornetwork::tensor::Tensor;

pub use crate::_tutorial as table_of_contents;