Datasets
- class Data(trajs, labels, dataset_name='custom')
Structure that groups trajectories with their labels and provides methods for working with them as a set.
- Parameters:
trajs (List[Trajectory]) – A list that contains a subset of the dataset trajectories.
labels (List[Any]) – A list that contains the label of each trajectory from the subset.
dataset_name (str) – Name of the dataset where the trajectories come from. If not provided, it will be set to “custom”.
- property classes: List[Any]
Classes present in the dataset.
- featurize(featurizer)
Featurizes the trajectories.
- Parameters:
featurizer (Featurizer) – Featurizer to be used.
- Returns:
A numpy array with the featurized trajectories.
- Return type:
np.ndarray
- take(size, stratify=True, shuffle=True, random_state=None)
Takes a subset of the dataset.
- Parameters:
size (Union[float, int]) – If float, it should be between 0 and 1 and it will be interpreted as the proportion of the dataset to be taken. If int, it should be between 0 and the dataset size and it will be interpreted as the number of trajectories to be taken.
stratify (bool, optional) – If True, the dataset will be stratified by the labels, by default True.
shuffle (bool, optional) – If True, the dataset will be shuffled before taking the subset, by default True.
random_state (Union[int, None], optional) – Random state to be used, by default None.
- Returns:
A new Data object with the subset of the dataset.
- Return type:
Data
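The way take interprets size and stratify can be illustrated with a minimal standard-library sketch. It is not the library's implementation: resolve_size and take_indices are hypothetical helper names, and the truncation and rounding rules used here are assumptions.

```python
import random
from collections import defaultdict
from typing import Any, List, Optional, Union


def resolve_size(size: Union[float, int], total: int) -> int:
    """Turn a proportion (float) or a count (int) into a number of items."""
    if isinstance(size, float):
        if not 0 <= size <= 1:
            raise ValueError("a float size must be between 0 and 1")
        return int(size * total)  # assumption: truncate toward zero
    if not 0 <= size <= total:
        raise ValueError("an int size must be between 0 and the dataset size")
    return size


def take_indices(
    labels: List[Any],
    size: Union[float, int],
    stratify: bool = True,
    shuffle: bool = True,
    random_state: Optional[int] = None,
) -> List[int]:
    """Return the indices of the items that would form the subset."""
    rng = random.Random(random_state)
    n = resolve_size(size, len(labels))
    indices = list(range(len(labels)))
    if shuffle:
        rng.shuffle(indices)
    if not stratify:
        return indices[:n]
    # Group the (possibly shuffled) indices by label.
    by_label = defaultdict(list)
    for i in indices:
        by_label[labels[i]].append(i)
    # Give each class a share proportional to its frequency.
    chosen: List[int] = []
    for group in by_label.values():
        quota = round(n * len(group) / len(labels))
        chosen.extend(group[:quota])
    # Rounding can overshoot n, so trim; it can also undershoot slightly.
    return chosen[:n]


labels = ["a"] * 6 + ["b"] * 4
subset = take_indices(labels, 0.5, random_state=0)
picked = [labels[i] for i in subset]
```

With six "a" and four "b" labels, taking half of the data keeps the 6:4 class ratio in the subset.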
- cut(size)
Similar to split, but without shuffling or stratification: it simply slices the dataset into two consecutive parts.
- Parameters:
size (Union[float, int]) – If float, it should be between 0 and 1 and it will be interpreted as the proportion of the dataset to be taken. If int, it should be between 0 and the dataset size and it will be interpreted as the number of trajectories to be taken.
- Returns:
A tuple with two Data objects, the first one with the first part of the cut and the second one with the second part.
- Return type:
Tuple[Data, Data]
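As a rough illustration of the slicing described above, the sketch below cuts parallel lists of trajectories and labels at the same position. cut_pairs is a hypothetical helper, plain strings stand in for Trajectory objects, and truncating a float size toward zero is an assumption.

```python
from typing import Any, List, Tuple, Union


def cut_pairs(
    trajs: List[Any], labels: List[Any], size: Union[float, int]
) -> Tuple[Tuple[List[Any], List[Any]], Tuple[List[Any], List[Any]]]:
    """Slice trajectories and labels into two consecutive parts."""
    # A float is read as a proportion, an int as an absolute count.
    n = int(size * len(trajs)) if isinstance(size, float) else size
    first = (trajs[:n], labels[:n])
    second = (trajs[n:], labels[n:])
    return first, second


trajs = [f"traj_{i}" for i in range(8)]  # stand-ins for Trajectory objects
labels = ["a", "b"] * 4
first, second = cut_pairs(trajs, labels, 0.25)
```

Because nothing is shuffled, the first part always holds the leading items in their original order.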
- split(train_size=0.8, stratify=True, shuffle=True, random_state=None)
Splits the dataset into train and test dataset slices.
It uses the sklearn.model_selection.train_test_split function.
- Parameters:
train_size (Union[float, int], optional) – Size of the train split. If float, it should be between 0.0 and 1.0 and is interpreted as the proportion of the dataset to include in the train split. If int, it is the absolute number of train samples. By default 0.8.
stratify (bool, optional) – If True, the split will be stratified according to the labels, by default True.
shuffle (bool, optional) – If True, the dataset will be shuffled before splitting, by default True.
random_state (Union[int, None], optional) – Random seed for reproducibility, by default None.
- Returns:
A tuple with the train and test Data objects.
- Return type:
Tuple[Data, Data]
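Since split relies on sklearn.model_selection.train_test_split, the sketch below shows that function applied directly to parallel lists of trajectories and labels. How the arguments are forwarded internally is an assumption, and plain strings stand in for Trajectory objects.

```python
from sklearn.model_selection import train_test_split

trajs = [f"traj_{i}" for i in range(10)]  # stand-ins for Trajectory objects
labels = ["walk"] * 5 + ["bike"] * 5

# stratify=labels keeps the class proportions equal in both parts.
train_trajs, test_trajs, train_labels, test_labels = train_test_split(
    trajs,
    labels,
    train_size=0.8,
    stratify=labels,
    shuffle=True,
    random_state=0,
)
```

With ten samples and train_size=0.8, eight trajectories land in the train part and two in the test part, with both classes represented in each.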
- map(func)
Applies a function to each trajectory and label pair.
Useful for preprocessing the trajectories or the labels.
- Parameters:
func (Callable[[Trajectory, Any], Tuple[Trajectory, Any]]) – Function to be applied to each trajectory and label pair.
- Returns:
A new Data object with the results of the function.
- Return type:
Data
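The contract of func (one trajectory and one label in, one trajectory and one label out) can be sketched with plain lists. map_pairs and normalize_label are hypothetical names, and lists of (x, y) tuples stand in for Trajectory objects.

```python
from typing import Any, Callable, List, Tuple

Traj = List[Tuple[float, float]]  # stand-in for a Trajectory


def map_pairs(
    trajs: List[Traj],
    labels: List[Any],
    func: Callable[[Traj, Any], Tuple[Traj, Any]],
) -> Tuple[List[Traj], List[Any]]:
    """Apply func to every (trajectory, label) pair and collect the results."""
    new_trajs: List[Traj] = []
    new_labels: List[Any] = []
    for traj, label in zip(trajs, labels):
        new_traj, new_label = func(traj, label)
        new_trajs.append(new_traj)
        new_labels.append(new_label)
    return new_trajs, new_labels


def normalize_label(traj: Traj, label: Any) -> Tuple[Traj, Any]:
    """Leave the trajectory untouched and lowercase the label."""
    return traj, str(label).lower()


trajs = [[(0.0, 0.0), (1.0, 1.0)], [(2.0, 2.0), (3.0, 1.0)]]
labels = ["Walk", "BIKE"]
mapped_trajs, mapped_labels = map_pairs(trajs, labels, normalize_label)
```

The inputs are left unchanged and new lists are returned, mirroring the description of map as producing a new Data object.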
- class Dataset(name, trajs, labels, version=0)
Wraps the data with some general properties that describe a full dataset.
- Parameters:
name (str) – Name of the dataset.
trajs (List[Trajectory]) – A list that contains the dataset trajectories.
labels (List[Any]) – A list that contains the label of each trajectory.
version (int) – Dataset version.
- static from_file(path, name)
Loads a dataset from a file.
- Parameters:
path (Path | str) – Path to the file that contains the dataset.
name (str) – Name of the dataset.
- Return type:
- static geolife(redownload=False)
Loads the geolife dataset.
- Parameters:
redownload (bool) –
- Return type:
- static animals(redownload=False)
Loads the animals dataset.
- Parameters:
redownload (bool) –
- Return type:
- static mnist_stroke(redownload=False)
Loads the mnist_stroke dataset.
- Parameters:
redownload (bool) –
- Return type:
- static hurdat2(redownload=False)
Loads the hurdat2 dataset.
- Parameters:
redownload (bool) –
- Return type:
- static cma_bst(redownload=False)
Loads the cma_bst dataset.
- Parameters:
redownload (bool) –
- Return type:
- static uci_gotrack(redownload=False)
Loads the uci_gotrack dataset.
- Parameters:
redownload (bool) –
- Return type:
- static uci_movement_libras(redownload=False)
Loads the uci_movement_libras dataset.
- Parameters:
redownload (bool) –
- Return type:
- static uci_pen_digits(redownload=False)
Loads the uci_pen_digits dataset.
- Parameters:
redownload (bool) –
- Return type:
- static uci_characters(redownload=False)
Loads the uci_characters dataset.
- Parameters:
redownload (bool) –
- Return type:
- static diffusive_particles(redownload=False)
Loads the diffusive particles dataset.
- Parameters:
redownload (bool) –
- Return type: