Example 0

In this introductory example, we show how to explore a pactus-compatible dataset. For simplicity, the example is built around a single dataset, hurdat2. However, the same procedure applies to all the other datasets included in pactus.

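The example is structured as follows: setting up the dependencies, loading the data, inspecting a single trajectory, inspecting the first trajectories as a group, and inspecting the class and length distributions.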

Note

You can access the script of this example.

1. Setup dependencies

Import all the dependencies:

import matplotlib.pyplot as plt
import numpy as np
from yupi.graphics import plot_2d, plot_hist

from pactus import Dataset

2. Loading Data

To load the original hurdat2 dataset, we can simply do:

ds = Dataset.hurdat2()

Then, we can inspect its content:

print(f"Loaded dataset: {ds.name}")
print(f"Total trajectories: {len(ds.trajs)}")
print(f"Different classes: {ds.classes}")
Loaded dataset: hurdat2
Total trajectories: 1903
Different classes: [1, 3, 0, 2, 4, 5]

Note

In this particular case, the classes are integers that reflect the hurricane category on the Saffir-Simpson scale. However, the classes of other datasets may be strings.
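
Because the labels are plain integers, it can help to attach readable names when printing them or labelling plots. The following is a minimal sketch and not part of pactus: the `category_name` helper is hypothetical, and reading class 0 as "below hurricane strength" is an assumption that should be checked against the dataset's own documentation.

```python
def category_name(label: int) -> str:
    """Return a readable name for an integer class label.

    Hypothetical helper, not part of pactus. Treating 0 as "below
    hurricane strength" is an assumption made for illustration.
    """
    if label == 0:
        return "Below hurricane strength (assumed)"
    if 1 <= label <= 5:
        return f"Category {label}"
    raise ValueError(f"Unexpected class label: {label}")


# Class list as printed above; sorting gives a stable display order.
classes = [1, 3, 0, 2, 4, 5]
for label in sorted(classes):
    print(label, "->", category_name(label))
```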

3. Inspecting a single trajectory

Here, we pick trajectory no. 20 and its corresponding label from the dataset and plot it using yupi. Many other operations can be performed on a trajectory; for a comprehensive guide, see yupi’s documentation.

traj_idx = 20
traj, label = ds.trajs[traj_idx], ds.labels[traj_idx]
plot_2d([traj], legend=False, show=False)
plt.legend([f"Label: {label}"])
plt.title(f"Trajectory no. {traj_idx}")
plt.xlabel("lon")
plt.ylabel("lat")
plt.show()
[Figure: plot of trajectory no. 20, lon vs. lat (ex00_1.png)]

4. Inspecting a subset of the first trajectories

Similarly, we can plot a group of trajectories all together. Next, we will pick the first 200 trajectories from the dataset and plot them:

traj_count = 200
first_trajs = ds.trajs[:traj_count]
plot_2d(first_trajs, legend=False, color="#2288dd", show=False)
plt.title(f"First {traj_count} trajectories")
plt.xlabel("lon")
plt.ylabel("lat")
plt.show()
[Figure: plot of the first 200 trajectories, lon vs. lat (ex00_2.png)]
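
The same idea extends to selecting trajectories by class rather than by position. The following is a minimal sketch that assumes only that the trajectories and labels are parallel sequences, as the indexing in step 3 suggests. `select_by_label` is a hypothetical helper, and the strings stand in for real trajectory objects.

```python
from typing import Any, List, Sequence


def select_by_label(trajs: Sequence[Any], labels: Sequence[Any], wanted: Any) -> List[Any]:
    """Return the trajectories whose label equals `wanted`.

    Hypothetical helper: relies only on `trajs` and `labels` being
    parallel sequences of equal length.
    """
    if len(trajs) != len(labels):
        raise ValueError("trajs and labels must have the same length")
    return [traj for traj, label in zip(trajs, labels) if label == wanted]


# Placeholder data standing in for ds.trajs and ds.labels.
toy_trajs = ["t0", "t1", "t2", "t3", "t4"]
toy_labels = [1, 0, 1, 3, 0]
print(select_by_label(toy_trajs, toy_labels, 1))
```

With the dataset loaded above, `select_by_label(ds.trajs, ds.labels, 3)` would return a list that can be passed to `plot_2d` in the same way as `first_trajs`.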

5. Inspecting the distribution of trajectories across classes

In any classification task, it is useful to know how balanced the dataset is across the available classes. The following code produces a bar chart with the number of trajectories in each class.

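# Convert the dict views to lists so the call does not depend on how matplotlib handles them
plt.bar(list(ds.label_counts.keys()), list(ds.label_counts.values()))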
plt.title("Trajectory count by class")
plt.xlabel("Class")
plt.show()
[Figure: bar chart of trajectory count by class (ex00_3.png)]
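
A chart shows the balance at a glance, but a couple of numbers are easier to compare across datasets. The following is a minimal sketch using only the standard library. The label list is placeholder data, and the "imbalance ratio" (largest class divided by smallest class) is one simple choice of summary among several, not something pactus computes.

```python
from collections import Counter

# Placeholder labels standing in for ds.labels.
toy_labels = [0, 0, 0, 0, 1, 1, 1, 2, 2, 3]

counts = Counter(toy_labels)
total = sum(counts.values())

# Share of the dataset that falls in each class, in label order.
shares = {label: counts[label] / total for label in sorted(counts)}

# Ratio between the largest and the smallest class; 1.0 means balanced.
imbalance_ratio = max(counts.values()) / min(counts.values())

for label, share in shares.items():
    print(f"class {label}: {counts[label]} trajectories ({share:.0%})")
print(f"imbalance ratio: {imbalance_ratio:.1f}")
```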

6. Inspecting the length distribution of the trajectories in the dataset

Another useful piece of information to extract from a trajectory dataset is the distribution of trajectory lengths. The following code produces a histogram of the lengths of all the trajectories in the dataset.

lengths = np.array([len(traj) for traj in ds.trajs])
plot_hist(lengths, bins=40, show=False)
plt.title("Trajectory lengths histogram")
plt.xlabel("Length")
plt.show()
[Figure: histogram of trajectory lengths (ex00_4.png)]
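
Alongside the histogram, a few summary statistics make it easier to spot very short or very long trajectories. The following is a minimal sketch using the standard library. The lengths are placeholder values, not taken from hurdat2; with the dataset loaded, the `lengths` array computed above could be used instead.

```python
import statistics

# Placeholder lengths standing in for [len(traj) for traj in ds.trajs].
toy_lengths = [12, 18, 25, 25, 31, 40, 57, 120]

summary = {
    "min": min(toy_lengths),
    "median": statistics.median(toy_lengths),
    "mean": statistics.mean(toy_lengths),
    "max": max(toy_lengths),
}

for name, value in summary.items():
    print(f"{name}: {value}")
```

A mean well above the median, as in this placeholder data, indicates a few long trajectories pulling the average up.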