2.5. drrc.parallelreservoirs module

This module contains the implementation of the ParallelReservoirsBase class, its three corresponding subclasses ParallelReservoirs, ParallelReservoirsFFT and ParallelReservoirsPCA, and the ParallelReservoirsArguments dataclass.

The subclasses are used to predict high dimensional (spatially extended) systems using multiple Reservoir Computers in parallel:
  1. The ParallelReservoirs class is used for high dimensional systems, without dimension reduction.

  2. The ParallelReservoirsFFT class is used for high dimensional systems, using only a selection of FFT modes for dimension reduction.

  3. The ParallelReservoirsPCA class is used for high dimensional systems, using only a selection of PCA modes for dimension reduction.

The base class ParallelReservoirsBase contains everything needed for parallel reservoir applications that predict high dimensional (spatially extended) systems using multiple Reservoir Computers in parallel. The classes are configured via an instance of ParallelReservoirsArguments, a dataclass that holds the parameters of the (parallel) reservoirs.

Author: Luk Fleddermann, Gerrit Wellecke Date: 13.06.2024

class ParallelReservoirsArguments(adjacency_degree: int, adjacency_dense: bool, adjacency_spectralradius: float, reservoir_leakage: float, reservoir_nodes: int, input_scaling: float, input_bias: float, spatial_shape: tuple[int, ...], system_variables: int, boundary_condition: str, parallelreservoirs_grid_shape: tuple[int, ...], parallelreservoirs_ghosts: int, dimensionreduction_fraction: float, training_includeinput: bool, training_regularization: float, training_output_bias: float, identical_inputmatrix: bool, identical_adjacencymatrix: bool, identical_outputmatrix: bool | str | tuple[int, ...])[source]

Bases: object

Parallel-reservoir parameters.

Parameters:
  • adjacency_degree (int) – average degree of the adjacency matrix

  • adjacency_dense (bool) – whether the adjacency matrix is sparse (False) or dense (True)

  • adjacency_spectralradius (float) – spectral radius, largest eigenvalue of the adjacency matrix

  • reservoir_leakage (float) – leakage, i.e. the strength with which a reservoir state retains old excitations (0.0: driven only by new data, no memory; 1.0: no update of the reservoir state)

  • reservoir_nodes (int) – number of nodes in each of the parallel reservoirs

  • input_scaling (float) – input scaling, the maximal absolute value of entries in the input matrix

  • input_bias (float) – scaling of the bias strength, i.e. double the maximal absolute value of the bias input to a reservoir node; None defaults to the input scaling

  • spatial_shape (tuple[int, ...]) – shape of the input data, without boundary condition (e.g. (128,) in the 1D case, (128, 128) in the 2D case)

  • system_variables (int) – number of variables in the system (data for one time step is of shape (system_variables, *spatial_shape)).

  • boundary_condition (str) – the type of boundary condition to apply to the input data

  • parallelreservoirs_grid_shape (tuple[int, ...]) – the number of reservoirs per dimension that are used together as a multi-reservoir (e.g. (2, 1) in the 2D case for 2 reservoirs in the x direction)

  • parallelreservoirs_ghosts (int) – number of variables that a reservoir sees from outside the region it is predicting, used for synchronization

  • dimensionreduction_fraction (float) – fraction of variables that actually enters the reservoir after dimension reduction

  • training_includeinput (bool) – whether to also fit the input signal for predicting the next timestep

  • training_regularization (float) – regularization strength for the ridge regression

  • training_output_bias (float) – scaling of the output bias

  • identical_inputmatrix (bool) – whether the same input matrix is used for all reservoirs

  • identical_adjacencymatrix (bool) – whether the same adjacency matrix is used for all reservoirs

  • identical_outputmatrix (bool | str | tuple[int, ...]) – whether we train each domain with a separate reservoir (False) or one reservoir on all domains (‘combine_data’) or one reservoir on one domain (tuple of indices)
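Example: a minimal construction sketch. The field names follow the signature above; the values (including the boundary_condition string) are illustrative assumptions for a small 1D system.

    from drrc.parallelreservoirs import ParallelReservoirsArguments

    # Illustrative values for a 1D system of 128 grid points, split over 4
    # parallel reservoirs with 4 ghost cells each. The boundary_condition
    # string "periodic" is an assumption; valid values are not listed here.
    args = ParallelReservoirsArguments(
        adjacency_degree=3,
        adjacency_dense=False,
        adjacency_spectralradius=0.9,
        reservoir_leakage=0.5,
        reservoir_nodes=500,
        input_scaling=0.5,
        input_bias=0.5,
        spatial_shape=(128,),
        system_variables=1,
        boundary_condition="periodic",
        parallelreservoirs_grid_shape=(4,),
        parallelreservoirs_ghosts=4,
        dimensionreduction_fraction=1.0,
        training_includeinput=False,
        training_regularization=1e-6,
        training_output_bias=1.0,
        identical_inputmatrix=False,
        identical_adjacencymatrix=False,
        identical_outputmatrix=False,
    )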

classmethod from_config(conf: Config, job_idx: int, sub_idx: int)[source]

Make a ParallelReservoirsArguments object from a Config object.

This method generates a ParallelReservoirsArguments object based on the provided Config object. The Config object should have keys that match the names defined in this class.

Parameters:
  • conf (Config) – The Config object corresponding to the YAML for the parameter scan.

  • job_idx (int) – The index of the current parameter set.

Returns:

The generated ParallelReservoirsArguments object.

Return type:

ParallelReservoirsArguments

classmethod from_dict(input_dict: dict)[source]

Generate parameters from a dictionary.

This function supports passing a dictionary that contains more than the needed keys; any additional entries are simply ignored.

Parameters:

input_dict – dictionary containing at least all the needed keys for the initialiser.
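Example (continuing the sketch above): since extra keys are ignored, a larger configuration dictionary can be passed directly. Here the dictionary is rebuilt from the previous args object with one unrelated entry added.

    from dataclasses import asdict

    # Build a dictionary from the args of the previous sketch, add an
    # unrelated key, and reconstruct the dataclass; the extra entry is ignored.
    config = asdict(args)
    config["experiment_comment"] = "not a field of the dataclass"
    args_again = ParallelReservoirsArguments.from_dict(config)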

class ParallelReservoirsBase(args: ParallelReservoirsArguments, **kwargs)[source]

Bases: ABC

Base class for parallel reservoir applications. Implements the use of multiple Reservoir Computers in parallel to predict high dimensional (spatially extended) systems. Parallel is to be understood in terms of domain splitting of the input data.

  • Base class: Set up reservoir and parallel-reservoir parameters, initialize reservoirs and slices.

Parameters:

args (ParallelReservoirsArguments) – All needed arguments in a dataclass object.

train(input_training_data: ndarray, output_training_data: ndarray, transient_steps: int = 0) None[source]

Train the output matrix of all parallel reservoirs using ridge regression. The following training modes are distinguished:

  1. Each reservoir is trained individually. Works for arbitrary systems.

  2. One reservoir (used for the predictions of all domains) is trained on one domain only. Works only for homogeneous systems.

  3. One reservoir (used for the predictions of all domains) is trained on all domains by combining the data. Works only for homogeneous systems.

Parameters:
  • input_training_data – np.ndarray[float] input data for training, without boundary-condition ghost cells. Data needs to be of shape (time_steps, variables, *spatial_shape).

  • output_training_data – np.ndarray[float] output data for training, without boundary-condition ghost cells. Data needs to be of shape (time_steps - transient_steps, variables, *spatial_shape).

  • transient_steps – int, optional number of transient steps to be used for training (default: 0)

Notes

The training method includes a transient phase, where the reservoirs are driven by the input data without using the results for the training.
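Example: a minimal training sketch, assuming the args object from the dataclass example above and random data in the documented shapes. The one-step shift between input and output (teacher forcing) is an assumption, not taken from this docstring.

    import numpy as np
    from drrc.parallelreservoirs import ParallelReservoirs

    rng = np.random.default_rng(0)
    time_steps, transient = 2000, 100
    u = rng.standard_normal((time_steps, 1, 128))    # (time_steps, variables, *spatial_shape)

    model = ParallelReservoirs(Parameter=args)       # args from the dataclass sketch above
    model.train(
        input_training_data=u[:-1],                  # 1999 steps drive the reservoirs
        output_training_data=u[transient + 1:],      # 1999 - transient = 1899 target steps
        transient_steps=transient,
    )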

iterative_predict(initial, max_steps, supervision_data=None, **kwargs) tuple[ndarray, ndarray | None, int][source]

Iteratively predict a time series.

Parameters:
  • initial (np.ndarray[float]) – Initial condition for the time series prediction.

  • max_steps (int) – Maximum number of steps to predict.

  • supervision_data (np.ndarray[float], optional) – Supervision data for evaluating the prediction. Defaults to None.

  • **kwargs – Additional keyword arguments for evaluating the prediction: an error function error_function of {‘NRMSE’}, the (temporal) mean of the norm of the data mean_norm, a threshold value for the error function error_stop, and the number of extra steps to predict after the error threshold is exceeded extra_steps.

Returns:

A tuple containing the predicted time series, the prediction errors (if supervision data is provided, else None), and the number of steps predicted.

Return type:

tuple[np.ndarray, np.ndarray | None, int]

Notes: Before the prediction, the parallel reservoir states need to be adjusted to the state of the system to be predicted by using reservoir_transient.

Warning: Using this function with supervision data is neither fully implemented nor tested yet.
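Example (continuing the training sketch above): synchronize the reservoir states on the end of the known trajectory, then predict freely. The shape of initial is assumed to be (variables, *spatial_shape).

    # Synchronise the reservoir states on the last part of the known
    # trajectory, then predict from the last known snapshot.
    model.reservoir_transient(u[-200:])
    prediction, errors, n_steps = model.iterative_predict(
        initial=u[-1],        # assumed shape: (variables, *spatial_shape)
        max_steps=500,
    )
    # prediction: predicted time series, errors: None (no supervision data given)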

reservoir_transient(input, predict_on_transient=False) None | ndarray[source]

Transient dynamics of the reservoirs. Updates the reservoir states and optionally predicts one step ahead on the transient dynamics, without reusing the predictions.

Parameters:
  • input – np.ndarray[float] input data for the transient dynamics

  • predict_on_transient – bool, if True, one-step-ahead predictions are performed on the transient data and returned; otherwise only the reservoir states are updated

_initialize_reservoirs(seed: int) list[ReservoirComputer][source]

Initialize the reservoirs for the parallel reservoirs using rc_grid_shape, which specifies how many reservoirs in each dimension are used.

Parameters:

seed (int) – Seed for the random number generator, used for initializing the reservoirs.

Returns:

The list of initialized reservoirs.

Return type:

list[ReservoirComputer]

_initialize_reservoir_slices() list[tuple[slice]][source]

Prepare the slices for the different reservoirs. The first two dimensions are ignored, because they are (time_steps, variables) and not spatial. In each spatial dimension of length \(dim\), the slice for reservoir number \(i\) (w.r.t. this dimension) out of a total of \(n\) reservoirs is chosen as slice(start, stop), where

\[\text{start} = i \cdot \left\lfloor\frac{dim}{n}\right\rfloor\quad\text{and}\quad\text{stop} = (i+1) \cdot \left\lfloor\frac{dim}{n}\right\rfloor + 2 \cdot \text{ghosts}.\]
Returns:

List of slices for each reservoir

Return type:

list[tuple[slice]]
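A standalone sketch of the documented slicing rule, restricted to the spatial axes (in the class, the two leading (time_steps, variables) axes are prepended). The helper name and return layout are assumptions.

    import itertools

    def reservoir_slices(spatial_shape, grid_shape, ghosts):
        # One list of slices per spatial dimension, following the formula above.
        per_dim = []
        for dim, n in zip(spatial_shape, grid_shape):
            width = dim // n
            per_dim.append(
                [slice(i * width, (i + 1) * width + 2 * ghosts) for i in range(n)]
            )
        # Cartesian product: one tuple of slices per reservoir.
        return list(itertools.product(*per_dim))

    reservoir_slices((128,), (4,), ghosts=4)
    # covers index ranges 0:40, 32:72, 64:104 and 96:136 (overlap of 2*ghosts)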

_boundary_condition(data: ndarray, add_ghostcells: bool = False) ndarray[source]

The function enforces boundary conditions on all spatial dimensions and returns the result.

Either boundary cells are used as ghost cells or ghost cells are added. Ghost cell values depend on the boundary condition.

Parameters:
  • data (np.ndarray) – Input data in region to be predicted. Shape is (time_steps, variables, *spatial_shape).

  • add_ghostcells (bool) – If True, the boundary condition is fulfilled by adding ghost cells, else the outer cells of the array are updated to fulfill the boundary condition.

Returns:

The array, of size data.shape or with each spatial dimension extended by 2*window_size, depending on add_ghostcells.

Return type:

np.ndarray

Notes

The boundary condition is applied to spatial dimensions only. Therefore, the first and second dimension of the input data are ignored.

Attention

This has not been tested for arbitrary dimensions, only up to 2D. It might be useful to precompile this function (Numba JIT), as it runs in every time step.
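A standalone sketch of the add_ghostcells=True case for a periodic boundary condition; other boundary types and the in-place update branch are not shown.

    import numpy as np

    def periodic_ghostcells(data, ghosts):
        # Pad only the spatial axes; the (time, variables) axes stay untouched.
        pad = [(0, 0), (0, 0)] + [(ghosts, ghosts)] * (data.ndim - 2)
        return np.pad(data, pad, mode="wrap")

    u = np.arange(2 * 1 * 8).reshape(2, 1, 8)        # (time_steps, variables, x)
    periodic_ghostcells(u, ghosts=2).shape           # -> (2, 1, 12)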

static _evaluate_one_step(prediction: ndarray, supervision_data: ndarray, errorfunction='NRMSE', mean_norm: float = 1, error_stop: float = 1) tuple[bool, float][source]

Evaluate the prediction of one time step using the error function errorfunction. If errorfunction = "NRMSE", the normalized root mean square error

\[\frac{\|\vec{u}(t)-\vec{u}^{\mathrm{true}}(t)\|_2}{\langle\|\vec{u}^{\mathrm{true}}(t)\|^2\rangle_{\mathrm{t}}^{1/2}}\]

is used.

Parameters:
  • prediction (np.ndarray) – Prediction of the system at one time step. Shape is (variables, *spatial_shape).

  • supervision_data (np.ndarray) – Supervision data for the prediction. Shape is (variables, *spatial_shape).

  • errorfunction (str) – Error function to evaluate the prediction. Only the normalized root mean square error "NRMSE" is implemented so far.

  • mean_norm (float) – Mean norm of the supervision data, used to normalize the root mean square error.

  • error_stop (float) – Threshold for the error function. Iterative predictions are stopped if the error is above the threshold.

Returns:

A flag that is True if the error is below the threshold error_stop (else False), together with the computed error value.

Return type:

tuple[bool, float]

Warning: This method is not tested and might have errors!
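A sketch of the evaluation above, assuming mean_norm is the temporal mean of the squared norm of the supervision data (so its square root appears in the denominator of the NRMSE).

    import numpy as np

    def evaluate_one_step(prediction, truth, mean_norm=1.0, error_stop=1.0):
        # NRMSE of a single time step, normalised by the precomputed mean norm.
        error = np.linalg.norm(prediction - truth) / np.sqrt(mean_norm)
        return error < error_stop, error

    truth = np.ones((1, 128))
    pred = 0.9 * truth
    ok, err = evaluate_one_step(pred, truth, mean_norm=np.mean(np.sum(truth**2, axis=-1)))
    # err == 0.1 here, so ok is True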

abstractmethod _get_input_length() int[source]

Depends on the dimension reduction method.

Get the length of the input data for each parallel object of class ReservoirComputer. Depends on the dimension reduction fraction.

Returns:

The length of the input data for each ReservoirComputer

Return type:

int

abstractmethod _transform_data(data: ndarray, fraction: float) ndarray[source]

Depends on dimension reduction method.

Transform the data to the shape used in the reservoirs. Data of shape (time_steps, variables, *spatial_shape) is transformed to (time, res_variables), where all variables and spatial shapes are used in the dimension reduction and flattened into one dimension.

Parameters:
  • data (np.ndarray) – Data to be transformed. First dimension is temporal and not transformed. Second dimension chooses different variables and will be used in dimension reduction. All others are spatial dimensions and are flattened as well.

  • fraction (float) – Fraction of variables that actually enters the reservoir after dimension reduction.

Returns:

Transformed data. First dimension is temporal and not transformed. Rest is flattened.

Return type:

np.ndarray

abstractmethod _inv_transform_data(data: ndarray) ndarray[source]

Depends on dimension reduction method.

Inverse dimension reduction transformation of the flattened reservoir data. The output is transformed to the geometric shape of the output prediction. First dimension of the input data is temporal and not transformed. Second dimension is split into different variables and spatial dimensions.

Parameters:

data (np.ndarray) – Data to be transformed. First dimension is temporal and not transformed.

Returns:

Inverse transformed data. First dimension is temporal and not transformed. Rest is reshaped to number of variables and spatial shape.

Return type:

np.ndarray

save(filename: str) None[source]

Save a ParallelReservoir to a pkl file.

A possible use case of this is to conserve a trained multi-reservoir for later.

Parameters:

filename (str) – name of file to generate

Warning: This function might be deprecated. Otherwise, it might need to be moved to the derived classes.

load(filename: str) None[source]

Load a ParallelReservoir from a pkl file.

A possible use case of this is to load a trained model and perform further predictions without having to do the training again.

Parameters:

filename (str) – name of file to read in

Warning: This function might be deprecated. Otherwise, it might need to be moved to the derived classes.
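A possible use, continuing the training sketch above. It assumes load restores the state in place on a freshly constructed instance, as the None return type suggests.

    model.save("trained_parallel_reservoirs.pkl")

    restored = ParallelReservoirs(Parameter=args)
    restored.load("trained_parallel_reservoirs.pkl")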

class ParallelReservoirs(*, Parameter: ParallelReservoirsArguments, **kwargs)[source]

Bases: ParallelReservoirsBase

Use multiple Reservoir Computers in parallel to predict high dimensional systems without dimensionality reduction. Parallel is to be understood in terms of domain splitting of the input data. This class uses the base class ParallelReservoirsBase, which initializes reservoirs of the class ReservoirComputer and handles the training and prediction of the parallel reservoirs.

Initialization:

  • Only base class setup is done. No further initialization is needed.

  • Base class: Set up reservoir and parallel-reservoir parameters, initialize reservoirs and slices.

Parameters:

args (ParallelReservoirsArguments) – All needed arguments in a dataclass object.

_get_input_length() int[source]

Get the length of the input data for each parallel object of class ReservoirComputer. Without dimension reduction, the input length is

\[\text{input length} = \prod_{i=1}^{d} \left( \frac{N_i}{R_i} + 2 g\right)\]

where \(d\) is the number of spatial dimensions of the predicted system, \(N_i\) is the number of support points in the \(i\)-th dimension, \(R_i\) is the number of parallel reservoirs in the \(i\)-th dimension, and \(g\) is the number of ghost cells used around the predicted region of each reservoir.

Returns:

The length of the input data for each ReservoirComputer.

Return type:

int
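A sketch of the formula above; fraction=1.0 reproduces this class, fraction < 1.0 the FFT/PCA classes below. Integer division stands in for the division in the formula.

    import math

    def input_length(spatial_shape, grid_shape, ghosts, fraction=1.0):
        full = math.prod(n // r + 2 * ghosts for n, r in zip(spatial_shape, grid_shape))
        return math.floor(fraction * full)

    input_length((128,), (4,), ghosts=4)                       # -> 40
    input_length((128, 128), (2, 2), ghosts=2, fraction=0.1)   # -> 462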

_transform_data(data: ndarray, fraction: float) ndarray[source]

Transform the data to the shape used in the reservoirs.

Parameters:
  • data (np.ndarray) – Data to be transformed. First dimension is temporal and not transformed. Second dimension chooses different variables and will be flattened. All others are spatial dimensions and are flattened as well.

  • fraction (float) – Fraction of variables that actually enters the reservoir after dimension reduction. Should be 1.0; it is not used here.

Returns:

Transformed data. First dimension is temporal and not transformed. Rest is flattened.

Return type:

np.ndarray

_inv_transform_data(data) ndarray[source]

Inverse transformation of the flattened reservoir data back to the geometric shape. First dimension of the input is temporal and not transformed. Second dimension is split into different variables and spatial dimensions.

Parameters:

data (np.ndarray) – Data to be transformed. First dimension is temporal and not transformed.

Returns:

Transformed data. First dimension is temporal and not transformed. Rest is reshaped to number of variables and spatial shape.

Return type:

np.ndarray
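A sketch of this transformation pair, which without dimension reduction reduces to flattening and reshaping.

    import numpy as np

    def transform(data):
        # Flatten everything but the first (time) axis.
        return data.reshape(data.shape[0], -1)

    def inv_transform(flat, variables, spatial_shape):
        # Restore (time_steps, variables, *spatial_shape).
        return flat.reshape(flat.shape[0], variables, *spatial_shape)

    u = np.random.default_rng(1).standard_normal((10, 1, 40))
    assert inv_transform(transform(u), 1, (40,)).shape == u.shape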

class ParallelReservoirsFFT(*, Parameter: ParallelReservoirsArguments, prediction_model: str, **kwargs)[source]

Bases: ParallelReservoirsBase

Use multiple Reservoir Computers in parallel to predict high dimensional systems using the largest modes of an FFT for dimensionality reduction. Parallel is to be understood in terms of domain splitting of the input data. This class uses the base class ParallelReservoirsBase, which initializes reservoirs of the class ReservoirComputer and handles the training and prediction of the parallel reservoirs.

Initialization:

  • Calculates or loads the largest FFT modes for the given data type (i.e. model and parameters), as well as the input and output dimension for each parallel Reservoir Computer.

  • Base class: Set up reservoir and parallel-reservoir parameters, initialize reservoirs and slices.

Parameters:
  • args (ParallelReservoirsArguments) – All needed arguments in a dataclass object.

  • prediction_model (str) – Name of the model to be predicted. Used to load or choose FFT modes with largest amplitude.

  • **kwargs

    dict nr_fft_datasets (int):

    The number of training data sets used to determine the FFT modes. If not given, all 10 training data sets are used. This is useful for computations with insufficient memory for the full training.

Warning

The same modes are used everywhere and are chosen based on all training data sets and all domains. Hence, the class only works if two conditions are met: 1. The predicted systems are homogeneous. 2. All data sets are sampled from the same attractor.

_get_input_length() int[source]

Get the length of the input data for each parallel object of class ReservoirComputer. With dimension reduction, the input length is

\[\text{input length} = \left\lfloor f \times \prod_{i=1}^{d} \left( \frac{N_i}{R_i} + 2g \right) \right\rfloor\]

where \(f\) is the fraction self.dimensionreduction_fraction of the dimensions used, \(d\) is the number of spatial dimensions of the predicted system, \(N_i\) is the number of support points in the \(i\)-th dimension, \(R_i\) is the number of parallel reservoirs in the \(i\)-th dimension, and \(g\) is the number of ghost cells used around the predicted region of each reservoir.

Returns:

The length of the input data for each ReservoirComputer.

Return type:

int

_transform_data(data: ndarray, fraction: float) ndarray[source]

Transform and reduce the data using the largest modes of the FFT, flattening all but the first dimension to the shape used in the reservoirs.

Parameters:
  • data (np.ndarray) – Data to be transformed. First dimension is temporal and not transformed. Second dimension chooses different variables, all others are spatial dimensions.

  • fraction (float) – Fraction of variables that actually enters the reservoir after dimension reduction.

Returns:

Transformed data. First dimension is temporal and not transformed. Rest is flattened.

Return type:

np.ndarray

Warning: Tested only in 1D.

_inv_transform_data(data: ndarray) ndarray[source]

Inverse FFT transformation of the flattened reservoir output (all components need to be predicted and are realigned to the spatial shape). The data is transformed back to the geometric shape, including the boundary, which is not used when stitching the reservoir predictions together. First dimension of the input is temporal and not transformed. Second dimension is split into different variables and spatial dimensions.

Parameters:

data (np.ndarray) – Data to be transformed, of shape (time_steps, reservoir_variables). First dimension is temporal and not transformed.

Returns:

Transformed data of shape (time_steps, variables, *spatial_shape). First dimension is temporal and not transformed. Rest is reshaped to number of variables and spatial shape.

Return type:

np.ndarray

Notes

The input data is the output of one reservoir, which corresponds to one spatial domain.

Warning: Tested only in 1D.
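A 1D sketch of an FFT reduction and its inverse in the spirit of this class (which is only tested in 1D). The mode selection, the real/imaginary packing, and the number of kept modes are assumptions, not the exact scheme of the class.

    import numpy as np

    def largest_fft_modes(training_data, fraction):
        # Mean rFFT amplitude over time and variables; keep the strongest modes.
        spectrum = np.abs(np.fft.rfft(training_data, axis=-1)).mean(axis=(0, 1))
        n_keep = max(1, int(fraction * training_data.shape[-1]) // 2)
        return np.argsort(spectrum)[::-1][:n_keep]

    def fft_transform(data, modes):
        # Keep only the selected modes; pack real and imaginary parts, then flatten.
        coeffs = np.fft.rfft(data, axis=-1)[..., modes]
        return np.concatenate([coeffs.real, coeffs.imag], axis=-1).reshape(data.shape[0], -1)

    def fft_inv_transform(flat, modes, variables, n_x):
        # Zero-fill the discarded modes and transform back to the spatial shape.
        flat = flat.reshape(flat.shape[0], variables, -1)
        k = len(modes)
        coeffs = flat[..., :k] + 1j * flat[..., k:]
        full = np.zeros((flat.shape[0], variables, n_x // 2 + 1), dtype=complex)
        full[..., modes] = coeffs
        return np.fft.irfft(full, n=n_x, axis=-1)

    u_train = np.random.default_rng(2).standard_normal((100, 1, 40))
    modes = largest_fft_modes(u_train, fraction=0.3)
    reduced = fft_transform(u_train, modes)                   # shape (100, 12)
    recovered = fft_inv_transform(reduced, modes, 1, 40)      # shape (100, 1, 40)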

class ParallelReservoirsPCA(*, Parameter: ParallelReservoirsArguments, prediction_model: str, **kwargs)[source]

Bases: ParallelReservoirsBase

Use multiple Reservoir Computers in parallel to predict high dimensional systems using a PCA for dimensionality reduction. Parallel is to be understood in terms of domain splitting of the input data. This class uses the base class ParallelReservoirsBase, which initializes reservoirs of the class ReservoirComputer and handles the training and prediction of the parallel reservoirs.

Initialization:

  • Trains or loads the PCA object for the given data type (i.e. model and parameters), as well as the output dimension for each parallel Reservoir Computer.

Due to memory constraints, the PCA training data is taken from 10 training data sets and only every second time step is used. In addition, for multiple parallel reservoirs the training data is sampled from all domains. In this case, the number of time steps per domain is shortened (divided by the number of domains) to keep the number of time steps independent of the number of parallel reservoirs. Computational complexity: \(\mathcal{O}(n\cdot d^2)\), where \(n\) is the number of samples and \(d\) is the number of dimensions.

  • Base class: Set up reservoir and parallel-reservoir parameters, initialize reservoirs and slices.

Parameters:
  • args (ParallelReservoirsArguments) – All needed arguments in a dataclass object.

  • prediction_model (str) – Name of the model to be predicted. Used to load or train the PCA object.

  • **kwargs

    dict nr_pca_trainings (int):

    The number of training data sets used to train the PCA. If not given, all 10 training data sets are used. This is useful for computations with insufficient memory for the full training.

Warning

The same PCA is used everywhere and is trained on all training data sets and all domains. Hence, the class only works if two conditions are met: 1. The predicted systems are homogeneous. 2. All data sets are sampled from the same attractor.

Warning:

Training PCAs with many parallel reservoirs and large ghost cells can be memory intensive.

_get_input_length() int[source]

Get the length of the input data for each parallel object of class ReservoirComputer. With dimension reduction, the input length is

\[\text{input length} = \left\lfloor f \times \prod_{i=1}^{d} \left( \frac{N_i}{R_i} + 2g \right) \right\rfloor\]

where \(f\) is the fraction self.dimensionreduction_fraction of the dimensions used, \(d\) is the number of spatial dimensions of the predicted system, \(N_i\) is the number of support points in the \(i\)-th dimension, \(R_i\) is the number of parallel reservoirs in the \(i\)-th dimension, and \(g\) is the number of ghost cells used around the predicted region of each reservoir.

Returns:

The length of the input data for each ReservoirComputer.

Return type:

int

_transform_data(data: ndarray, fraction: float) ndarray[source]

Transform and reduce the data using the largest explained variances of the PCA, flattening all but the first dimension to the shape used in the reservoirs.

Parameters:
  • data (np.ndarray) – Data to be transformed. First dimension is temporal and not transformed. Second dimension chooses different variables, all others are spatial dimensions.

  • fraction (float) – Fraction of variables that actually enters the reservoir after dimension reduction.

Returns:

Transformed data. First dimension is temporal and not transformed. Rest is flattened.

Return type:

np.ndarray

Warning: Not implemented

_inv_transform_data(data: ndarray) ndarray[source]

Inverse PCA transformation of the flattened reservoir output (all components need to be predicted). The data is transformed back to the geometric shape, including the boundary, which is not used when stitching the reservoir predictions together. First dimension of the input is temporal and not transformed. Second dimension is split into different variables and spatial dimensions.

Parameters:

data (np.ndarray) – Data to be transformed, of shape (time_steps, reservoir_variables). First dimension is temporal and not transformed.

Returns:

Transformed data of shape (time_steps, variables, *spatial_shape). First dimension is temporal and not transformed. Rest is reshaped to number of variables and spatial shape.

Return type:

np.ndarray

Notes

The input data is the output of one reservoir, which corresponds to one spatial domain.
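A sketch of the PCA reduction using scikit-learn (whether the class uses scikit-learn internally is not stated here). The data layout follows the docstrings, and fitting on every second time step mirrors the note in the class description.

    import numpy as np
    from sklearn.decomposition import PCA

    u_train = np.random.default_rng(3).standard_normal((200, 1, 40))
    flat = u_train.reshape(u_train.shape[0], -1)          # (time, variables * space)

    fraction = 0.25
    pca = PCA(n_components=int(fraction * flat.shape[1]))
    pca.fit(flat[::2])                                    # every second time step

    reduced = pca.transform(flat)                         # (time, reduced dimensions)
    recovered = pca.inverse_transform(reduced).reshape(u_train.shape)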