2.5. drrc.parallelreservoirs module
This module contains the implementation of the ParallelReservoirsBase class, its three corresponding subclasses ParallelReservoirs, ParallelReservoirsFFT and ParallelReservoirsPCA, and the ParallelReservoirsArguments dataclass.
- The subclasses are used to predict high-dimensional (spatially extended) systems using multiple Reservoir Computers in parallel:
The ParallelReservoirs class is used for high-dimensional systems, without a dimension reduction.
The ParallelReservoirsFFT class is used for high-dimensional systems, using only a selection of FFT modes as a dimension reduction.
The ParallelReservoirsPCA class is used for high-dimensional systems, using only a selection of PCA modes as a dimension reduction.
The base class ParallelReservoirsBase contains everything needed for parallel reservoir applications that predict high-dimensional (spatially extended) systems using multiple Reservoir Computers in parallel.
The classes use an instance of ParallelReservoirsArguments, a dataclass that holds the parameters for configuring the (parallel) reservoirs.
Author: Luk Fleddermann, Gerrit Wellecke Date: 13.06.2024
- class ParallelReservoirsArguments(adjacency_degree: int, adjacency_dense: bool, adjacency_spectralradius: float, reservoir_leakage: float, reservoir_nodes: int, input_scaling: float, input_bias: float, spatial_shape: tuple[int, ...], system_variables: int, boundary_condition: str, parallelreservoirs_grid_shape: tuple[int, ...], parallelreservoirs_ghosts: int, dimensionreduction_fraction: float, training_includeinput: bool, training_regularization: float, training_output_bias: float, identical_inputmatrix: bool, identical_adjacencymatrix: bool, identical_outputmatrix: bool | str | tuple[int, ...])[source]
Bases:
object
Parallel-reservoir parameters.
- Parameters:
adjacency_degree (int) – average degree of the adjacency matrix
adjacency_dense (bool) – whether the adjacency matrix is sparse (False) or dense (True)
adjacency_spectralradius (float) – spectral radius of the adjacency matrix, i.e. the largest absolute value of its eigenvalues
reservoir_leakage (float) – leakage, i.e. how strongly a reservoir state retains old excitations (0.0: driven only by new data, no memory; 1.0: the reservoir state is not updated)
reservoir_nodes (int) – number of nodes in each of the parallel reservoirs
input_scaling (float) – input scaling, the maximal absolute value of the entries in the input matrix
input_bias (float) – scaling of the bias strength (twice the maximal absolute value of the bias input to a reservoir node); None defaults to the input scaling
spatial_shape (tuple[int, ...]) – shape of the input data without boundary condition (for example (128,) in the 1D case, (128, 128) in the 2D case)
system_variables (int) – number of variables in the system (data for one time step has shape (system_variables, *spatial_shape))
boundary_condition (str) – the type of boundary condition to apply to the input data
parallelreservoirs_grid_shape (tuple[int, ...]) – number of reservoirs per dimension that are used together as a multi-reservoir (e.g. (2, 1) in the 2D case for 2 reservoirs in x direction)
parallelreservoirs_ghosts (int) – number of ghost cells, i.e. grid points outside its own prediction region that each reservoir sees on each side, used for synchronization with neighbouring domains
dimensionreduction_fraction (float) – fraction of variables that actually enters the reservoir after dimension reduction
training_includeinput (bool) – whether to also include the input signal in the fit for predicting the next time step
training_regularization (float) – regularization strength for the ridge regression
training_output_bias (float) – scaling of the output bias
identical_inputmatrix (bool) – whether all reservoirs share the same input matrix (True) or each reservoir uses its own (False)
identical_adjacencymatrix (bool) – whether all reservoirs share the same adjacency matrix (True) or each reservoir uses its own (False)
identical_outputmatrix (bool | str | tuple[int, ...]) – whether we train each domain with a separate reservoir (False) or one reservoir on all domains (‘combine_data’) or one reservoir on one domain (tuple of indices)
- classmethod from_config(conf: Config, job_idx: int, sub_idx: int)[source]
Make a ParallelReservoirsArguments object from a Config object.
This method generates a ParallelReservoirsArguments object based on the provided Config object. The Config object should have keys that match the names defined in this class.
- Parameters:
- Returns:
The generated ParallelReservoirsArguments object.
- Return type:
ParallelReservoirsArguments
- classmethod from_dict(input_dict: dict)[source]
Generate parameters from a dictionary.
This method accepts a dictionary that contains more than the needed keys. In this case, any additional entries are simply ignored.
- Parameters:
input_dict – dictionary containing at least all the needed keys for the initialiser.
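The key filtering described for from_dict can be sketched with the standard dataclasses module. This is a minimal sketch, not the module's code: ToyArguments is a hypothetical two-field stand-in for ParallelReservoirsArguments, and filtering via dataclasses.fields is an assumed implementation.

```python
from dataclasses import dataclass, fields


@dataclass
class ToyArguments:
    """Hypothetical reduced stand-in for ParallelReservoirsArguments."""

    reservoir_nodes: int
    reservoir_leakage: float

    @classmethod
    def from_dict(cls, input_dict: dict) -> "ToyArguments":
        # Keep only keys that correspond to dataclass fields; ignore the rest.
        names = {f.name for f in fields(cls)}
        return cls(**{k: v for k, v in input_dict.items() if k in names})


args = ToyArguments.from_dict(
    {"reservoir_nodes": 500, "reservoir_leakage": 0.1, "comment": "ignored"}
)
print(args)  # ToyArguments(reservoir_nodes=500, reservoir_leakage=0.1)
```

Passing a missing required key still raises a TypeError from the generated initializer, which matches the requirement that the dictionary contain at least all needed keys.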
- class ParallelReservoirsBase(args: ParallelReservoirsArguments, **kwargs)[source]
Bases:
ABC
Base class for parallel reservoir applications. Implements the use of multiple Reservoir Computers in parallel to predict high dimensional (spatially extended) systems. Parallel is to be understood in terms of domain splitting of the input data.
Base class: Set up reservoir and parallel-reservoir parameters, initialize reservoirs and slices.
- Parameters:
args (ParallelReservoirsArguments) – All needed arguments in a dataclass object.
- train(input_training_data: ndarray, output_training_data: ndarray, transient_steps: int = 0) None [source]
Train the output matrices of all parallel reservoirs using ridge regression. Three methods are distinguished, selected by identical_outputmatrix:
Each reservoir is trained individually (False). Works for arbitrary systems.
One reservoir (used for the predictions of all domains) is trained on one domain only (tuple of indices). Works only for homogeneous systems.
One reservoir (used for the predictions of all domains) is trained on all domains by combining the data (‘combine_data’). Works only for homogeneous systems.
- Parameters:
input_training_data – np.ndarray[float], input data for training, without boundary-condition ghost cells. Data needs to be of shape (time_steps, variables, *spatial_shape).
output_training_data – np.ndarray[float], output data for training, without boundary-condition ghost cells. Data needs to be of shape (time_steps - transient_steps, variables, *spatial_shape).
transient_steps – int, optional, number of initial transient steps that drive the reservoirs but are not used for the fit (default: 0)
Notes
The training method includes a transient phase, where the reservoirs are driven by the input data without using the results for the training.
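A minimal NumPy sketch of the ridge-regression step with a discarded transient phase, assuming reservoir states have already been collected into a (time_steps, reservoir_nodes) array. The helper ridge_output_matrix, its argument names and the constant bias column are assumptions for illustration; the module's handling of training_includeinput and of the three training methods is not reproduced.

```python
import numpy as np


def ridge_output_matrix(states, targets, transient_steps=0,
                        regularization=1e-6, output_bias=1.0):
    """Fit W_out so that [states, bias] @ W_out approximates targets (hypothetical helper).

    states:  (time_steps, reservoir_nodes) reservoir states driven by the input
    targets: (time_steps - transient_steps, output_dim) training targets
    """
    # Drop the transient phase: these states are not used for the fit.
    R = states[transient_steps:]
    # Append a constant column so that the fit can learn an output bias.
    R = np.hstack([R, output_bias * np.ones((R.shape[0], 1))])
    # Ridge regression normal equations: (R^T R + beta I) W = R^T Y
    gram = R.T @ R + regularization * np.eye(R.shape[1])
    return np.linalg.solve(gram, R.T @ targets)


# Usage: recover a known linear readout from synthetic states.
rng = np.random.default_rng(0)
states = rng.standard_normal((200, 20))
W_true = rng.standard_normal((21, 3))
targets = np.hstack([states[50:], np.ones((150, 1))]) @ W_true
W = ridge_output_matrix(states, targets, transient_steps=50, regularization=1e-10)
```

For the ‘combine_data’ method, one would presumably concatenate the states and targets of all domains along the time axis before a single fit.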
- iterative_predict(initial, max_steps, supervision_data=None, **kwargs) tuple[ndarray, ndarray | None, int] [source]
Iteratively predict a time series.
- Parameters:
initial (np.ndarray[float]) – Initial condition for the time series prediction.
max_steps (int) – Maximum number of steps to predict.
supervision_data (np.ndarray[float], optional) – Supervision data for evaluating the prediction. Defaults to None.
**kwargs – Additional keyword arguments for evaluating the prediction: an error function error_function (one of {‘NRMSE’}), the (temporal) mean of the norm of the data mean_norm, a threshold value for the error function error_stop, and the number of extra steps to predict after the error threshold is exceeded extra_steps.
- Returns:
A tuple containing the predicted time series, the prediction errors (if supervision data is provided, else None), and the number of steps predicted.
- Return type:
tuple[np.ndarray, np.ndarray | None, int]
Notes: Before the prediction, the parallel reservoir states need to be adjusted to the state of the system to be predicted by using reservoir_transient.
Warning: This function with supervision data is neither tested nor fully implemented yet.
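The closed-loop prediction with early stopping can be sketched as follows, assuming a hypothetical one-step map step(u) that stands in for a synchronized multi-reservoir, and an error normalized by mean_norm as in the NRMSE of _evaluate_one_step. The function name and signature are illustrative, not the module's API.

```python
import numpy as np


def iterative_predict_sketch(step, initial, max_steps, supervision_data=None,
                             mean_norm=1.0, error_stop=1.0, extra_steps=0):
    """Closed-loop prediction: each prediction is fed back as the next input.

    step: hypothetical one-step map u(t) -> u(t+1).
    Returns (predictions, errors or None, number of predicted steps).
    """
    predictions, errors = [], []
    current = np.asarray(initial, dtype=float)
    exceeded_at = None  # first step whose error exceeded error_stop
    for t in range(max_steps):
        current = step(current)
        predictions.append(current)
        if supervision_data is not None:
            # Normalized error of this step (cf. the NRMSE in _evaluate_one_step).
            err = np.linalg.norm(current - supervision_data[t]) / mean_norm
            errors.append(err)
            if exceeded_at is None and err > error_stop:
                exceeded_at = t
        # After the threshold is exceeded, predict extra_steps more steps, then stop.
        if exceeded_at is not None and t >= exceeded_at + extra_steps:
            break
    errs = np.array(errors) if supervision_data is not None else None
    return np.array(predictions), errs, len(predictions)


# Usage: the model drifts by +0.1 per step while the true system stays at 1.
preds, errs, n = iterative_predict_sketch(
    lambda u: u + 0.1, np.ones(2), max_steps=10,
    supervision_data=np.ones((10, 2)), error_stop=0.3, extra_steps=2,
)
print(n)  # 5
```

The error first exceeds 0.3 at the third step, and two extra steps follow before the loop stops.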
- reservoir_transient(input, predict_on_transient=False) None | ndarray [source]
Transient dynamics of the reservoirs. Updates the reservoir states and optionally predicts one step ahead on the transient dynamics, without reusing the predictions.
- Parameters:
input – np.ndarray[float] input data for the transient dynamics
predict_on_transient – bool if True, one step-ahead predictions are performed on transient data and returned, else only the reservoir states are updated
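For illustration, driving a reservoir over transient data can be sketched with a leaky tanh update. The exact update rule lives in ReservoirComputer and is not documented here; the form below is an assumption chosen to match the reservoir_leakage convention (0.0: no memory, 1.0: no update), and the dense adjacency matrix is rescaled to a requested spectral radius. The function name drive_reservoir is hypothetical.

```python
import numpy as np


def drive_reservoir(A, W_in, bias, inputs, r0, leakage):
    """Drive one reservoir with an input sequence and return all states.

    Assumed leaky-tanh update (not taken from ReservoirComputer):
        r(t+1) = leakage * r(t) + (1 - leakage) * tanh(A r(t) + W_in u(t) + bias)
    """
    r = np.array(r0, dtype=float)
    states = np.empty((len(inputs), r.size))
    for t, u in enumerate(inputs):
        r = leakage * r + (1.0 - leakage) * np.tanh(A @ r + W_in @ u + bias)
        states[t] = r
    return states


rng = np.random.default_rng(1)
n_nodes, n_inputs = 50, 3
A = rng.uniform(-1.0, 1.0, (n_nodes, n_nodes))
A *= 0.9 / np.max(np.abs(np.linalg.eigvals(A)))  # rescale to spectral radius 0.9
W_in = rng.uniform(-0.5, 0.5, (n_nodes, n_inputs))  # input scaling 0.5
u_seq = rng.standard_normal((100, n_inputs))
states = drive_reservoir(A, W_in, np.zeros(n_nodes), u_seq, np.zeros(n_nodes), leakage=0.3)
print(states.shape)  # (100, 50)
```

With leakage=1.0 the states never change, which is the "no update" limit described for reservoir_leakage.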
- _initialize_reservoirs(seed: int) list[ReservoirComputer] [source]
Initialize the reservoirs for the parallel reservoirs using rc_grid_shape, which specifies how many reservoirs are used in each dimension.
- Parameters:
seed (int) – Seed for the random number generator, used for initializing the reservoirs.
- Returns:
The list of initialized reservoirs.
- Return type:
list[ReservoirComputer]
- _initialize_reservoir_slices() list[tuple[slice]] [source]
Prepare the reservoir slices for the different reservoirs. The first two dimensions, (time_steps, variables), are not spatial and are ignored. In each spatial dimension of length \(dim\), the slice for reservoir number \(i\) (w.r.t. this dimension) out of a total of \(n\) reservoirs is chosen to be slice(start, stop), with indices referring to the data after ghost cells have been added, where
\[\text{start} = i \cdot \left\lfloor\frac{dim}{n}\right\rfloor\quad\text{and}\quad\text{stop} = (i+1) \cdot \left\lfloor\frac{dim}{n}\right\rfloor + 2 \cdot \text{ghosts}.\]
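The slice rule can be transcribed directly. In this sketch (function name hypothetical), the indices refer to the ghost-padded array, and the ordering of reservoirs via itertools.product is an assumption.

```python
import itertools


def reservoir_slices(spatial_shape, grid_shape, ghosts):
    """Index tuples for each reservoir on ghost-padded data (time, variables, *spatial)."""
    per_dim = []
    for dim, n in zip(spatial_shape, grid_shape):
        width = dim // n  # floor(dim / n) points predicted per reservoir
        per_dim.append(
            [slice(i * width, (i + 1) * width + 2 * ghosts) for i in range(n)]
        )
    # The first two axes (time_steps, variables) are not split.
    return [(slice(None), slice(None)) + combo for combo in itertools.product(*per_dim)]


for s in reservoir_slices((128,), (4,), ghosts=2):
    print(s[2])
# slice(0, 36, None) ... slice(96, 132, None)
```

For 128 points, 4 reservoirs and 2 ghost cells, each window has width 32 + 4 = 36 and the last one ends at 132, the padded length.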
- _boundary_condition(data: ndarray, add_ghostcells: bool = False) ndarray [source]
- The function enforces boundary conditions on all spatial dimensions and returns the result.
Either boundary cells are used as ghost cells or ghost cells are added. Ghost cell values depend on the boundary condition.
- Parameters:
data (np.ndarray) – Input data in the region to be predicted. Shape is (time_steps, variables, *spatial_shape).
add_ghostcells (bool) – If True, the boundary condition is fulfilled by adding ghost cells; otherwise the outer cells of the array are updated to fulfill the boundary condition.
- Returns:
The array of shape data.shape, or the array extended by 2*window_size in each spatial dimension, depending on add_ghostcells.
- Return type:
np.ndarray
Notes
The boundary condition is applied to spatial dimensions only. Therefore, the first and second dimensions of the input data are ignored.
Attention
This has not been tested for arbitrary dimensions, only up to 2D. Since it runs in every time step, precompiling it (Numba JIT) might be useful.
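For a periodic boundary, both modes of _boundary_condition can be sketched with NumPy. The string values accepted by boundary_condition are not listed here, so restricting the sketch to periodic boundaries, the helper name periodic_boundary and its ghost-width argument ghosts are assumptions.

```python
import numpy as np


def periodic_boundary(data, ghosts, add_ghostcells=False):
    """Periodic boundary on all spatial axes of data shaped (time, variables, *spatial).

    add_ghostcells=True:  pad each spatial axis with `ghosts` wrapped cells per side.
    add_ghostcells=False: treat the outer `ghosts` cells as ghost cells and overwrite
                          them with the wrapped interior values.
    """
    if add_ghostcells:
        pad = [(0, 0), (0, 0)] + [(ghosts, ghosts)] * (data.ndim - 2)
        return np.pad(data, pad, mode="wrap")
    out = data.copy()
    if ghosts == 0:
        return out
    for axis in range(2, out.ndim):
        view = np.moveaxis(out, axis, -1)  # a view: writes go through to `out`
        view[..., :ghosts] = view[..., -2 * ghosts:-ghosts]
        view[..., -ghosts:] = view[..., ghosts:2 * ghosts]
    return out


rng = np.random.default_rng(0)
x = rng.standard_normal((2, 1, 6, 5))  # (time_steps, variables, ny, nx)
padded = periodic_boundary(x, ghosts=2, add_ghostcells=True)
print(padded.shape)  # (2, 1, 10, 9)
```

Processing the spatial axes one after another also fills the corner ghost cells correctly, since the second pass wraps rows that were already updated by the first.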
- static _evaluate_one_step(prediction: ndarray, supervision_data: ndarray, errrofunction='NRMSE', mean_norm: float = 1, error_stop: float = 1) tuple[bool, float] [source]
Evaluate the prediction of one time step using the error function errorfunction. If errorfunction="NRMSE", the normalized root mean square error
\[\frac{\|\vec{u}(t)-\vec{u}^{\mathrm{true}}(t)\|_2}{\langle\|\vec{u}^{\mathrm{true}}(t)\|^2\rangle_{\mathrm{t}}^{1/2}}\]
is used.
- Parameters:
prediction (np.ndarray) – Prediction of the system at one time step. Shape is (variables, *spatial_shape).
supervision_data (np.ndarray) – Supervision data for the prediction. Shape is (variables, *spatial_shape).
errorfunction (str) – Error function to evaluate the prediction. Only the normalized root mean square error "NRMSE" is implemented so far.
mean_norm (float) – Mean norm of the supervision data, used to normalize the root mean square error.
error_stop (float) – Threshold for the error function. Iterative predictions are stopped if the error is above the threshold.
- Returns:
A tuple whose boolean is True if the error is below the threshold error_stop, else False, and whose float is the computed error.
- Return type:
tuple[bool, float]
Warning: This method is not tested and might have errors!
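The NRMSE of a single step and its normalization constant can be computed as follows. The helper names are hypothetical; the denominator follows the formula above, i.e. the square root of the temporal mean of the squared norms of the supervision data.

```python
import numpy as np


def mean_norm_of(truth_series):
    """<||u_true(t)||^2>_t^(1/2) for a series shaped (time_steps, variables, *spatial_shape)."""
    flat = truth_series.reshape(len(truth_series), -1)
    return np.sqrt(np.mean(np.sum(flat**2, axis=1)))


def nrmse_one_step(prediction, truth, mean_norm):
    """||u(t) - u_true(t)||_2 / mean_norm for one time step of any shape."""
    return np.linalg.norm((prediction - truth).ravel()) / mean_norm


truth = np.ones((10, 1, 4))  # every snapshot has norm 2
norm = mean_norm_of(truth)  # 2.0
error = nrmse_one_step(truth[0] + 0.5, truth[0], norm)  # ||0.5 * ones(4)|| / 2 = 0.5
below_threshold = error < 1.0  # the boolean reported for error_stop = 1.0
```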
- abstractmethod _get_input_length() int [source]
Depends on dimension reduction method.
Get the length of the input data for each parallel object of class ReservoirComputer. Depends on the dimension reduction fraction.
- Returns:
The length of the input data for each ReservoirComputer.
- Return type:
int
- abstractmethod _transform_data(data: ndarray, fraction: float) ndarray [source]
Depends on dimension reduction method.
Transform the data to the shape used in the reservoirs. Data of shape (time_steps, variables, *spatial_shape) is transformed to (time, res_variables), where all variables and spatial dimensions are used in the dimension reduction and flattened into one dimension.
- Parameters:
data (np.ndarray) – Data to be transformed. The first dimension is temporal and not transformed. The second dimension selects the different variables and is used in the dimension reduction. All others are spatial dimensions and are flattened as well.
fraction (float) – Fraction of variables that actually enters the reservoir after dimension reduction.
- Returns:
Transformed data. First dimension is temporal and not transformed. Rest is flattened.
- Return type:
np.ndarray
- abstractmethod _inv_transform_data(data: ndarray) ndarray [source]
Depends on dimension reduction method.
Inverse dimension reduction transformation of the flattened reservoir data data. The output is transformed to the geometric shape of the output prediction. The first dimension of the input data is temporal and not transformed. The second dimension is split into the different variables and spatial dimensions.
- Parameters:
data (np.ndarray) – Data to be transformed. First dimension is temporal and not transformed.
- Returns:
Inverse transformed data. First dimension is temporal and not transformed. Rest is reshaped to number of variables and spatial shape.
- Return type:
np.ndarray
- save(filename: str) None [source]
Save a ParallelReservoir to a pkl file.
A possible use case of this is to conserve a trained multi-reservoir for later.
- Parameters:
filename (str) – name of file to generate
Warning: This function might be deprecated. Otherwise, it might need to be moved to the derived classes.
- load(filename: str) None [source]
Load a ParallelReservoir from a pkl file.
A possible use case of this is to load a trained model and perform further predictions without having to do the training again.
- Parameters:
filename (str) – name of file to read in
Warning: This function might be deprecated. Otherwise, it might need to be moved to the derived classes.
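A pickle round trip matching the save/load signatures (both returning None, with load restoring state in place) might look like this. PicklableModel is a hypothetical stand-in, and pickling the instance __dict__ is an assumption about how the module stores its state.

```python
import os
import pickle
import tempfile


class PicklableModel:
    """Hypothetical stand-in for an object with save/load like ParallelReservoirsBase."""

    def __init__(self, weights=None):
        self.weights = weights

    def save(self, filename: str) -> None:
        # Serialize all instance attributes (e.g. trained output matrices).
        with open(filename, "wb") as f:
            pickle.dump(self.__dict__, f)

    def load(self, filename: str) -> None:
        # Restore attributes in place, matching the `load(...) -> None` signature.
        with open(filename, "rb") as f:
            self.__dict__.update(pickle.load(f))


path = os.path.join(tempfile.mkdtemp(), "model.pkl")
PicklableModel(weights=[1.0, 2.0]).save(path)
restored = PicklableModel()
restored.load(path)
print(restored.weights)  # [1.0, 2.0]
```

As with any pickle file, only load files from trusted sources.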
- class ParallelReservoirs(*, Parameter: ParallelReservoirsArguments, **kwargs)[source]
Bases:
ParallelReservoirsBase
Use multiple Reservoir Computers in parallel to predict high-dimensional systems without dimensionality reduction. Parallel is to be understood in terms of domain splitting of the input data. This class uses the base class ParallelReservoirsBase, which instantiates reservoirs of the class ReservoirComputer and handles the training and prediction of the parallel reservoirs.
Initialization:
Only the base class setup is done; no further initialization is needed.
Base class: Set up reservoir and parallel-reservoir parameters, initialize reservoirs and slices.
- Parameters:
args (ParallelReservoirsArguments) – All needed arguments in a dataclass object.
- _get_input_length() int [source]
Get the length of the input data for each parallel object of class ReservoirComputer. Without dimension reduction, the input length is
\[\text{input length} = \prod_{i=1}^{d} \left( \frac{N_i}{R_i} + 2 g\right)\]
where \(d\) is the number of spatial dimensions of the predicted system, \(N_i\) is the number of support points in the \(i\)-th dimension, \(R_i\) is the number of parallel reservoirs in the \(i\)-th dimension, and \(g\) is the number of ghost cells used around the predicted region of each reservoir.
- Returns:
The length of the input data for each ReservoirComputer.
- Return type:
int
- _transform_data(data: ndarray, fraction: float) ndarray [source]
Transform the data to the shape used in the reservoirs.
- Parameters:
data (np.ndarray) – Data to be transformed. The first dimension is temporal and not transformed. The second dimension selects the different variables and is flattened. All others are spatial dimensions and are flattened as well.
fraction (float) – Fraction of variables that actually enters the reservoir after dimension reduction. Should be 1; it is not used.
- Returns:
Transformed data. First dimension is temporal and not transformed. Rest is flattened.
- Return type:
np.ndarray
- _inv_transform_data(data) ndarray [source]
Inverse transformation of the flattened reservoir data to the geometric shape. The first dimension of the input is temporal and not transformed. The second dimension is split into the different variables and spatial dimensions.
- Parameters:
data (np.ndarray) – Data to be transformed. First dimension is temporal and not transformed.
- Returns:
Transformed data. First dimension is temporal and not transformed. Rest is reshaped to number of variables and spatial shape.
- Return type:
np.ndarray
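Without dimension reduction, the transform and its inverse reduce to reshapes. In this sketch the inverse takes the number of variables and the local spatial shape explicitly; the class presumably stores these itself, so the extra arguments and the function names are assumptions.

```python
import numpy as np


def transform_no_reduction(data):
    """(time_steps, variables, *local_shape) -> (time_steps, variables * prod(local_shape))."""
    return data.reshape(data.shape[0], -1)


def inv_transform_no_reduction(flat, variables, local_shape):
    """Inverse of transform_no_reduction: split the flat axis into variables and space."""
    return flat.reshape(flat.shape[0], variables, *local_shape)


x = np.arange(2 * 3 * 4 * 5).reshape(2, 3, 4, 5)  # 2 steps, 3 variables, 4x5 window
flat = transform_no_reduction(x)  # shape (2, 60)
back = inv_transform_no_reduction(flat, 3, (4, 5))
```

Because both functions use NumPy's default C order, the round trip restores the original array exactly.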
- class ParallelReservoirsFFT(*, Parameter: ParallelReservoirsArguments, prediction_model: str, **kwargs)[source]
Bases:
ParallelReservoirsBase
Use multiple Reservoir Computers in parallel to predict high-dimensional systems using the largest modes of an FFT for dimensionality reduction. Parallel is to be understood in terms of domain splitting of the input data. This class uses the base class ParallelReservoirsBase, which instantiates reservoirs of the class ReservoirComputer and handles the training and prediction of the parallel reservoirs.
Initialization:
Calculates or loads the largest FFT modes for the given data type, i.e. model and parameters, and the input and output dimension for each parallel Reservoir Computer.
Base class: Set up reservoir and parallel-reservoir parameters, initialize reservoirs and slices.
- Parameters:
args (ParallelReservoirsArguments) – All needed arguments in a dataclass object.
prediction_model (str) – Name of the model to be predicted. Used to load or choose FFT modes with largest amplitude.
**kwargs –
dict nr_fft_datasets (int):
The number of training data sets used to select the FFT modes. If not given, all 10 training data sets are used. This is useful for computations without enough memory for the full set.
Warning
The same modes are used everywhere and are chosen on all training data sets and all domains. Hence, the class only works if two conditions are met: 1. The predicted systems are homogeneous. 2. All data sets are sampled from the same attractor.
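The idea of keeping only the largest FFT modes can be illustrated in 1D. This sketch ranks rfft modes by their time-averaged amplitude (an assumed criterion) and reconstructs a snapshot from the kept modes; how the module encodes complex coefficients as real reservoir inputs is not documented here and is not reproduced. Both function names are hypothetical.

```python
import numpy as np


def largest_fft_modes(training_data, k):
    """Indices of the k rfft modes with the largest time-averaged amplitude.

    training_data: (time_steps, n) real 1D snapshots.
    """
    amplitudes = np.abs(np.fft.rfft(training_data, axis=-1)).mean(axis=0)
    return np.sort(np.argsort(amplitudes)[::-1][:k])


def truncate_and_reconstruct(snapshot, modes):
    """Keep only the selected modes of one snapshot and transform back to real space."""
    coeffs = np.fft.rfft(snapshot)
    kept = np.zeros_like(coeffs)
    kept[modes] = coeffs[modes]
    return np.fft.irfft(kept, n=snapshot.size)


# Usage: snapshots built from modes 3 and 5 only.
x = np.arange(64)
t = np.arange(10)[:, None]
training = (1 + 0.1 * t) * np.sin(2 * np.pi * 3 * x / 64) + 0.5 * np.cos(2 * np.pi * 5 * x / 64)
modes = largest_fft_modes(training, k=2)
recon = truncate_and_reconstruct(training[0], modes)
print(modes)  # [3 5]
```

Since the example data contain exactly two modes, keeping two reconstructs the snapshot up to rounding; for real data the truncation discards the remaining low-amplitude modes.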
- _get_input_length() int [source]
Get the length of the input data for each parallel object of class ReservoirComputer. With dimension reduction, the input length is
\[\text{input length} = \left\lfloor f \times \prod_{i=1}^{d} \left( \frac{N_i}{R_i} + 2g \right) \right\rfloor\]
where \(f\) is the fraction self.dimensionreduction_fraction of the dimensions used, \(d\) is the number of spatial dimensions of the predicted system, \(N_i\) is the number of support points in the \(i\)-th dimension, \(R_i\) is the number of parallel reservoirs in the \(i\)-th dimension, and \(g\) is the number of ghost cells used around the predicted region of each reservoir.
- Returns:
The length of the input data for each ReservoirComputer.
- Return type:
int
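The input-length formula can be transcribed directly; with f = 1 it reproduces the ParallelReservoirs case. The sketch assumes each N_i is divisible by R_i (integer division, as in the slice rule of _initialize_reservoir_slices), and the function name is hypothetical.

```python
import math


def input_length(spatial_shape, grid_shape, ghosts, fraction=1.0):
    """floor(f * prod_i (N_i / R_i + 2 g)), assuming each N_i is divisible by R_i."""
    local = [n // r + 2 * ghosts for n, r in zip(spatial_shape, grid_shape)]
    return math.floor(fraction * math.prod(local))


print(input_length((128,), (4,), ghosts=2))                # 36
print(input_length((128,), (4,), ghosts=2, fraction=0.5))  # 18
print(input_length((128, 128), (2, 2), ghosts=3, fraction=0.25))  # 1225
```

In the 2D example each window covers 64 + 6 = 70 points per dimension, i.e. 4900 points, of which a quarter enter the reservoir.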
- _transform_data(data: ndarray, fraction: float) ndarray [source]
Transform and reduce the data using the largest modes of the FFT, flattening all but the first dimension to the shape used in the reservoirs.
- Parameters:
data (np.ndarray) – Data to be transformed. First dimension is temporal and not transformed. Second dimension chooses different variables, all others are spatial dimensions.
fraction (float) – Fraction of variables that actually enters the reservoir after dimension reduction.
- Returns:
Transformed data. First dimension is temporal and not transformed. Rest is flattened.
- Return type:
np.ndarray
Warning: Tested only in 1D.
- _inv_transform_data(data: ndarray) ndarray [source]
Inverse FFT transformation of the flattened reservoir output (all components need to be predicted and are realigned to the spatial shape), transformed back to the geometric shape. This includes the boundary, which will not be used when stitching the reservoir predictions together. The first dimension of the input is temporal and not transformed. The second dimension is split into the different variables and spatial dimensions.
- Parameters:
data (np.ndarray) – Data to be transformed, of shape (time_steps, reservoir_variables). The first dimension is temporal and not transformed.
- Returns:
Transformed data of shape (time_steps, variables, *spatial_shape). The first dimension is temporal and not transformed. The rest is reshaped to the number of variables and the spatial shape.
- Return type:
np.ndarray
Notes
The input data is the output of one reservoir, which corresponds to one spatial domain.
Warning: Tested only in 1D.
- class ParallelReservoirsPCA(*, Parameter: ParallelReservoirsArguments, prediction_model: str, **kwargs)[source]
Bases:
ParallelReservoirsBase
Use multiple Reservoir Computers in parallel to predict high-dimensional systems using a PCA for dimensionality reduction. Parallel is to be understood in terms of domain splitting of the input data. This class uses the base class ParallelReservoirsBase, which instantiates reservoirs of the class ReservoirComputer and handles the training and prediction of the parallel reservoirs.
Initialization:
Trains or loads the PCA object for the given data type, i.e. model and parameters, and the output dimension for each parallel Reservoir Computer.
Due to memory constraints, the PCA training data is taken from 10 training data sets and only every second time step is used. In addition, for multiple parallel reservoirs the training data is sampled from all domains. In this case, the number of time steps per domain is shortened (divided by the number of domains) to keep the total number of time steps independent of the number of parallel reservoirs. Computational complexity: \(\mathcal{O}(n\cdot d^2)\), where \(n\) is the number of samples and \(d\) is the number of dimensions.
Base class: Set up reservoir and parallel-reservoir parameters, initialize reservoirs and slices.
- Parameters:
args (ParallelReservoirsArguments) – All needed arguments in a dataclass object.
prediction_model (str) – Name of the model to be predicted. Used to load or train the PCA object.
**kwargs –
dict nr_pca_trainings (int):
The number of training data sets used to train the PCA. If not given, all 10 training data sets are used. This is useful for computations without enough memory for the full training.
Warning
The same PCA is used everywhere and is trained on all training data sets and all domains. Hence, the class only works if two conditions are met: 1. The predicted systems are homogeneous. 2. All data sets are sampled from the same attractor.
- Warning:
Training PCAs with many parallel reservoirs and large ghost cell regions can be memory intensive.
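A PCA fit, reduction and inverse transform can be sketched with a thin SVD in NumPy, whose cost for n ≥ d samples is O(n·d²), in line with the stated complexity. The module speaks of a "PCA object", possibly from a library; plain NumPy here is a swap-in, and all function names are hypothetical.

```python
import numpy as np


def fit_pca(samples, n_components):
    """Fit a PCA via thin SVD; samples has shape (n_samples, d). Returns (mean, components)."""
    mean = samples.mean(axis=0)
    # Rows of vt are principal directions, sorted by decreasing explained variance.
    _, _, vt = np.linalg.svd(samples - mean, full_matrices=False)
    return mean, vt[:n_components]


def pca_transform(x, mean, components):
    """Project onto the kept components: (n, d) -> (n, n_components)."""
    return (x - mean) @ components.T


def pca_inverse(z, mean, components):
    """Map reduced coordinates back to the original d-dimensional space."""
    return z @ components + mean


rng = np.random.default_rng(2)
latent = rng.standard_normal((500, 2))
samples = latent @ rng.standard_normal((2, 6)) + 3.0  # data on a 2D affine subspace of R^6
mean, components = fit_pca(samples[::2], n_components=2)  # every second time step
reduced = pca_transform(samples, mean, components)
restored = pca_inverse(reduced, mean, components)
```

Because the example data lie exactly on a 2D subspace, two components reconstruct all samples, including those skipped during fitting.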
- _get_input_length() int [source]
Get the length of the input data for each parallel object of class ReservoirComputer. With dimension reduction, the input length is
\[\text{input length} = \left\lfloor f \times \prod_{i=1}^{d} \left( \frac{N_i}{R_i} + 2g \right) \right\rfloor\]
where \(f\) is the fraction self.dimensionreduction_fraction of the dimensions used, \(d\) is the number of spatial dimensions of the predicted system, \(N_i\) is the number of support points in the \(i\)-th dimension, \(R_i\) is the number of parallel reservoirs in the \(i\)-th dimension, and \(g\) is the number of ghost cells used around the predicted region of each reservoir.
- Returns:
The length of the input data for each ReservoirComputer.
- Return type:
int
- _transform_data(data: ndarray, fraction: float) ndarray [source]
Transform and reduce the data using the components with the largest explained variances of the PCA, flattening all but the first dimension to the shape used in the reservoirs.
- Parameters:
data (np.ndarray) – Data to be transformed. First dimension is temporal and not transformed. Second dimension chooses different variables, all others are spatial dimensions.
fraction (float) – Fraction of variables that actually enters the reservoir after dimension reduction.
- Returns:
Transformed data. First dimension is temporal and not transformed. Rest is flattened.
- Return type:
np.ndarray
Warning: Not implemented
- _inv_transform_data(data: ndarray) ndarray [source]
Inverse PCA transformation of the flattened reservoir output (all components need to be predicted), transformed back to the geometric shape. This includes the boundary, which will not be used when stitching the reservoir predictions together. The first dimension of the input is temporal and not transformed. The second dimension is split into the different variables and spatial dimensions.
- Parameters:
data (np.ndarray) – Data to be transformed, of shape (time_steps, reservoir_variables). The first dimension is temporal and not transformed.
- Returns:
Transformed data of shape (time_steps, variables, *spatial_shape). The first dimension is temporal and not transformed. The rest is reshaped to the number of variables and the spatial shape.
- Return type:
np.ndarray
Notes
The input data is the output of one reservoir, which corresponds to one spatial domain.