gents.evaluation.model_free package

Module contents

class gents.evaluation.model_free.WassersteinDistances(original_data: ndarray, other_data: ndarray, normalisation: str | None = 'none', seed: int | None = None)

Bases: object

Calculate Wasserstein distances between two datasets in several ways. Adapted from https://gitlab.developers.cam.ac.uk/maths/cia/covid-19-projects/missing_data_fitting_quality

Parameters:

original_data (np.ndarray) – Original data set, an (n, d) ndarray.
other_data (np.ndarray) – Other data set, which might be imputed or simulated data, also an (n, d) ndarray.
normalisation (str | None) – Method used to normalise the data. If ‘none’, no normalisation is applied. If ‘standardise’, the data are standardised by dividing by the standard deviation of the original data. (Subtracting the mean is unnecessary: shifting both datasets by the same vector does not change the Wasserstein distance.) Defaults to ‘none’.
seed (int | None) – Random seed. Defaults to None.
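The ‘standardise’ option can be illustrated with a minimal sketch. This is not the package's code: the helper name `standardise_pair` is hypothetical, and it assumes the standard deviation is taken per feature (column) of the original data and that constant columns are left unscaled.

```python
import numpy as np

def standardise_pair(original: np.ndarray, other: np.ndarray):
    """Hypothetical helper: scale both (n, d) datasets by the per-column
    standard deviation of the ORIGINAL data."""
    scale = original.std(axis=0)
    # Assumption: leave constant columns unscaled to avoid division by zero.
    scale = np.where(scale > 0, scale, 1.0)
    return original / scale, other / scale

rng = np.random.default_rng(0)
original = rng.normal(loc=5.0, scale=[1.0, 10.0], size=(1000, 2))
other = rng.normal(loc=5.0, scale=[1.0, 10.0], size=(1000, 2))
orig_s, other_s = standardise_pair(original, other)
# Each column of orig_s now has unit standard deviation (up to rounding).
```

Scaling both datasets by the original data's statistics, rather than each by its own, keeps any difference in spread between them visible to the distance.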

directional_distance(direction: ndarray) → float

Calculate the dataset distance in a specified direction.

This projects the two datasets onto the specified direction (that is, a 1-dimensional subspace), and calculates the Wasserstein distance between the two resulting distributions.

Parameters: direction (np.ndarray) – The direction onto which both datasets are projected before the Wasserstein distance is computed.
Returns: distance – The calculated W_2^2 distance.
Return type: float
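For two datasets with the same number of rows (both are documented as (n, d)), the 1-D Wasserstein distance between the projections has a closed form: the optimal coupling pairs sorted values. The sketch below uses a hypothetical function `directional_w2_squared` and returns the squared W_2 distance, as the Returns entry states; it is an illustration under the equal-sample-size assumption, not the package's implementation.

```python
import numpy as np

def directional_w2_squared(x: np.ndarray, y: np.ndarray, direction: np.ndarray) -> float:
    """Hypothetical helper: squared 2-Wasserstein distance between the 1-D
    projections of two (n, d) datasets with the same number of rows n."""
    u = direction / np.linalg.norm(direction)  # make sure the direction is a unit vector
    px = np.sort(x @ u)  # sorted projections = empirical quantiles
    py = np.sort(y @ u)
    # In 1-D with equal sample sizes, optimal transport matches order statistics,
    # so W_2^2 is the mean squared gap between them.
    return float(np.mean((px - py) ** 2))

rng = np.random.default_rng(0)
x = rng.normal(size=(500, 3))
y = x + np.array([2.0, 0.0, 0.0])  # shift along the first axis only
d_first = directional_w2_squared(x, y, np.array([1.0, 0.0, 0.0]))   # close to 4.0
d_second = directional_w2_squared(x, y, np.array([0.0, 1.0, 0.0]))  # close to 0.0
```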

feature_distance(feature: int) → float

Calculate the dataset distance for a specific feature.

This calculates the Wasserstein 2-distance between the specified feature in the two datasets.

Parameters: feature (int) – The column index of the feature to consider: 0, 1, 2, …, d - 1.
Returns: distance – The Wasserstein 2-distance.
Return type: float
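For a single feature, the 2-Wasserstein distance between two equal-size samples reduces to the root-mean-square gap between their sorted values. A minimal sketch with a hypothetical `feature_w2` helper, assuming both datasets have the same number of rows:

```python
import numpy as np

def feature_w2(x: np.ndarray, y: np.ndarray, feature: int) -> float:
    """Hypothetical helper: W_2 between column `feature` of two (n, d) datasets."""
    a = np.sort(x[:, feature])
    b = np.sort(y[:, feature])
    # Sorted values are paired by the optimal 1-D coupling; take the square
    # root to get W_2 rather than W_2^2.
    return float(np.sqrt(np.mean((a - b) ** 2)))

rng = np.random.default_rng(1)
x = rng.normal(size=(300, 2))
y = x.copy()
y[:, 1] += 3.0
print(feature_w2(x, y, 0))  # 0.0
print(feature_w2(x, y, 1))  # approximately 3.0
```

Note that `scipy.stats.wasserstein_distance` computes the first Wasserstein distance (W_1), so it is not a drop-in replacement for this W_2 computation.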

get_marginal_directions() → list[ndarray]

Get marginal directions for an experiment.

These are just the standard basis vectors.

Returns: directions – A list of standard unit vectors.
Return type: list[np.ndarray]
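The standard basis vectors are simply the rows of the identity matrix. A short sketch (the function name `marginal_directions` is hypothetical); projecting a dataset onto e_j selects column j, which suggests that distances along these directions coincide with the per-feature distances.

```python
import numpy as np

def marginal_directions(d: int) -> list[np.ndarray]:
    """Hypothetical helper: the d standard basis vectors e_0, ..., e_{d-1}."""
    return list(np.eye(d))

dirs = marginal_directions(3)
x = np.arange(12.0).reshape(4, 3)
projected = x @ dirs[1]  # equals x[:, 1], the second column
```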

get_random_directions(n_directions: int) → list[ndarray]

Get random directions for an experiment.

Parameters: n_directions (int) – The number of directions to produce.
Returns: directions – A list of unit vectors specifying the directions to use. The results will be given in the same order.
Return type: list[np.ndarray]

marginal_distances() → ndarray

Calculate the marginal Wasserstein distances between datasets.

Returns: distances – The Wasserstein distances between the datasets, one per feature.
Return type: np.ndarray
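Per-feature distances can be computed for all columns at once by sorting along axis 0. This is a sketch under assumptions: the name `marginal_w2` is hypothetical, both datasets are assumed to have the same number of rows, and it returns W_2 (not squared) per feature, matching the feature_distance description; the package's own return convention may differ.

```python
import numpy as np

def marginal_w2(x: np.ndarray, y: np.ndarray) -> np.ndarray:
    """Hypothetical helper: per-feature W_2 between two (n, d) datasets."""
    xs = np.sort(x, axis=0)  # sort each column independently
    ys = np.sort(y, axis=0)
    # Mean over rows, then square root: one distance per column, shape (d,).
    return np.sqrt(np.mean((xs - ys) ** 2, axis=0))

rng = np.random.default_rng(2)
x = rng.normal(size=(400, 3))
y = x + np.array([0.0, 1.0, -2.0])
dists = marginal_w2(x, y)  # approximately [0.0, 1.0, 2.0]
```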

random_direction(dim: int) → ndarray

Generate a unit vector in a random direction.

Parameters: dim (int) – Dimension of vector to be generated.
Returns: unit_vector – A unit vector of shape (dim,).
Return type: np.ndarray
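A common way to draw a direction uniformly on the unit sphere is to normalise a standard Gaussian vector; normalising points drawn uniformly from a cube would over-weight the corners. The sketch below uses a hypothetical `sample_unit_vector` helper and assumes the class seed feeds a NumPy generator; repeating the call n times is one plausible way to build a list like the one get_random_directions returns, not a description of its actual code.

```python
import numpy as np

def sample_unit_vector(rng: np.random.Generator, dim: int) -> np.ndarray:
    """Hypothetical helper: a unit vector drawn uniformly from the sphere in R^dim."""
    # The standard normal distribution is rotationally invariant, so the
    # normalised sample has no preferred direction.
    v = rng.standard_normal(dim)
    return v / np.linalg.norm(v)

# Assumption: a seed is turned into a generator like this for reproducibility.
rng = np.random.default_rng(42)
directions = [sample_unit_vector(rng, 5) for _ in range(10)]
```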

sliced_distances(num_directions: int) → ndarray

Calculate the sliced Wasserstein distance between datasets.

Parameters: num_directions (int) – Number of directions used in the sliced Wasserstein estimate.
Returns: distances – The Wasserstein distances between the datasets, one per direction.
Return type: np.ndarray
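The sliced Wasserstein distance projects both datasets onto many random unit directions and computes a 1-D distance along each. A minimal sketch, assuming equal sample sizes and Gaussian-sampled directions; the name `sliced_w2_squared` is hypothetical, and returning squared W_2 per direction is an assumption chosen to mirror directional_distance.

```python
import numpy as np

def sliced_w2_squared(x: np.ndarray, y: np.ndarray, num_directions: int, seed=None) -> np.ndarray:
    """Hypothetical helper: squared W_2 along random unit directions, one entry per direction."""
    rng = np.random.default_rng(seed)
    dirs = rng.standard_normal((num_directions, x.shape[1]))
    dirs /= np.linalg.norm(dirs, axis=1, keepdims=True)  # rows are unit vectors
    # Project both datasets onto every direction: shape (n, num_directions).
    px = np.sort(x @ dirs.T, axis=0)
    py = np.sort(y @ dirs.T, axis=0)
    # Equal-size 1-D samples: optimal transport pairs sorted values.
    return np.mean((px - py) ** 2, axis=0)

rng = np.random.default_rng(3)
x = rng.normal(size=(200, 4))
y = rng.normal(loc=0.5, size=(200, 4))
per_direction = sliced_w2_squared(x, y, num_directions=50, seed=0)
sliced_estimate = per_direction.mean()  # Monte Carlo estimate over directions
```

Returning the whole per-direction array, rather than only its mean, lets the caller inspect the spread of distances across directions.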

gents.evaluation.model_free.crps(y_true: ndarray, y_pred: ndarray, quantiles=array([0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8, 0.9])) → float

Calculate the continuous ranked probability score (CRPS) via the multi-quantile loss.

Adapted from neuralforecast.losses.numpy.

Parameters:

y_true (np.ndarray) – Ground truth time series, with shape [B, T, C].
y_pred (np.ndarray) – Predicted time series, either as scenarios with shape [B, T, C, N] (N is the number of scenarios) or as quantiles with shape [B, T, C, Q] (Q is the number of quantiles).
quantiles (np.ndarray, optional) – Quantile levels. More levels give a more accurate approximation of the CRPS. Defaults to np.arange(0.1, 1.0, 0.1).
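The link between CRPS and the multi-quantile loss is the identity CRPS(F, y) = 2 ∫_0^1 ρ_τ(y − F⁻¹(τ)) dτ, where ρ_τ(u) = max(τu, (τ − 1)u) is the pinball loss; averaging over a finite set of levels approximates the integral. The sketch below is an assumption-laden illustration, not the gents code: the name `crps_from_samples` is hypothetical, it handles only scenario input [B, T, C, N], and whether gents applies the factor of 2 or another scaling is not stated in the docs.

```python
import numpy as np

def crps_from_samples(
    y_true: np.ndarray,
    y_samples: np.ndarray,
    quantiles: np.ndarray = np.arange(0.1, 1.0, 0.1),
) -> float:
    """Hypothetical helper: approximate CRPS from N scenarios.

    y_true has shape [B, T, C]; y_samples has shape [B, T, C, N].
    """
    # Empirical quantiles along the scenario axis: [Q, B, T, C] -> [B, T, C, Q].
    q_pred = np.moveaxis(np.quantile(y_samples, quantiles, axis=-1), 0, -1)
    # Positive error means the quantile forecast lies below the observation.
    err = y_true[..., None] - q_pred
    # Pinball loss for each level, broadcast over the last axis.
    pinball = np.maximum(quantiles * err, (quantiles - 1.0) * err)
    # CRPS = 2 * integral of the pinball loss over levels; the mean over the
    # Q levels is a simple approximation of that integral.
    return float(2.0 * pinball.mean())

rng = np.random.default_rng(0)
y_true = rng.normal(size=(2, 5, 3))
y_samples = y_true[..., None] + rng.normal(size=(2, 5, 3, 100))
score = crps_from_samples(y_true, y_samples)
```

As a sanity check, a forecast whose scenarios all sit exactly 1 above the truth gives a score of about 1, matching the CRPS of a point forecast, |error|.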

gents.evaluation.model_free.mse(y_true: ndarray, y_pred: ndarray) → float

Calculate the mean squared error (MSE).

Adapted from sklearn.metrics.mean_squared_error.

Parameters:

y_true (np.ndarray) – Ground truth time series, with shape [B, T, C].
y_pred (np.ndarray) – Predicted time series, with shape [B, T, C].
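Averaging the squared error over every batch, time and channel entry gives the same value as scikit-learn's default uniform average over channels after reshaping to (B*T, C), because each channel then has the same number of samples. A minimal sketch; the name `mse_all` is hypothetical, and that gents reduces over all axes this way is an assumption.

```python
import numpy as np

def mse_all(y_true: np.ndarray, y_pred: np.ndarray) -> float:
    """Hypothetical helper: MSE averaged over every B, T, C entry."""
    if y_true.shape != y_pred.shape:
        raise ValueError(f"shape mismatch: {y_true.shape} vs {y_pred.shape}")
    return float(np.mean((y_true - y_pred) ** 2))

y_true = np.zeros((2, 3, 1))
y_pred = np.full((2, 3, 1), 2.0)
print(mse_all(y_true, y_pred))  # 4.0
```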