📦 Dataset Design

⏬ Data Download

To get started with the datasets, download the all_data.zip file from either Google Drive or Baidu Netdisk. After downloading, unzip the files into the datasets/ directory:

cd /path/to/BasicTS # the repository root, not BasicTS/basicts
unzip /path/to/all_data.zip -d datasets/

These datasets are preprocessed and ready for immediate use.

💿 Data Format

Each dataset contains at least two essential files: data.dat and desc.json:

- data.dat: This file stores the raw time series data in numpy.memmap format with a shape of [L, N, C].
  - L: Number of time steps. Typically, the training, validation, and test sets are split along this dimension.
  - N: Number of time series, also referred to as the number of nodes.
  - C: Number of features. Usually, this includes [target feature, time of day, day of week, day of month, day of year], with the target feature being mandatory and the others optional.
- desc.json: This file contains metadata about the dataset, including:
  - Dataset name
  - Domain of the dataset
  - Shape of the data
  - Number of time slices
  - Number of nodes (i.e., the number of time series)
  - Feature descriptions
  - Presence of prior graph structures
  - Regular settings:
    - Input and output lengths
    - Ratios for training, validation, and test sets
    - Whether normalization is applied individually to each channel (i.e., time series)
    - Whether to re-normalize during evaluation
    - Evaluation metrics
    - Handling of outliers
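
As a rough illustration of the layout described above, the following sketch opens data.dat with numpy.memmap using the shape recorded in desc.json. The "shape" key, the float32 dtype, the load_dataset helper, and the TOY dataset are assumptions made for this example, not confirmed details of BasicTS; check a real desc.json before relying on them.

```python
import json
import tempfile
from pathlib import Path

import numpy as np


def load_dataset(dataset_dir, dtype="float32"):
    """Open data.dat as a read-only memmap, using the shape stored in desc.json."""
    dataset_dir = Path(dataset_dir)
    with open(dataset_dir / "desc.json") as f:
        desc = json.load(f)
    # Assumption: desc.json keeps the [L, N, C] shape under a "shape" key.
    shape = tuple(desc["shape"])
    data = np.memmap(dataset_dir / "data.dat", dtype=dtype, mode="r", shape=shape)
    return data, desc


# Build a tiny synthetic dataset so the sketch runs without the real download.
toy_dir = Path(tempfile.mkdtemp()) / "TOY"
toy_dir.mkdir()
toy = np.arange(10 * 3 * 2, dtype=np.float32).reshape(10, 3, 2)  # L=10, N=3, C=2
toy.tofile(toy_dir / "data.dat")
with open(toy_dir / "desc.json", "w") as f:
    json.dump({"name": "TOY", "shape": list(toy.shape)}, f)

data, desc = load_dataset(toy_dir)
# Assumption: the mandatory target feature is stored in channel 0.
target = data[..., 0]  # shape [L, N]
print(data.shape, target.shape)
```

Because the array is memory-mapped, only the slices that are actually indexed are read from disk, which matters for long series with many nodes.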

🧑‍💻 Dataset Class Design

In time series forecasting, datasets are typically generated from raw time series data using a sliding window approach. As illustrated above, the raw time series is split into training, validation, and test sets along the time dimension, and samples are generated by sliding a window whose length is the input length plus the target length. Most datasets adhere to this structure.
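
The sliding window procedure can be sketched in plain NumPy as follows. This is a minimal illustration, not the BasicTS implementation: the make_samples helper, the split boundaries, and the choice to keep every window inside its own split are assumptions made for the example.

```python
import numpy as np


def make_samples(data, input_len, output_len):
    """Slide a window of input_len + output_len steps over [L, N, C] data."""
    window = input_len + output_len
    num_samples = data.shape[0] - window + 1
    if num_samples <= 0:
        raise ValueError("series is shorter than one window")
    inputs = np.stack([data[i:i + input_len] for i in range(num_samples)])
    targets = np.stack([data[i + input_len:i + window] for i in range(num_samples)])
    return inputs, targets


# Toy series with L=20 time steps, N=1 node, C=1 feature.
data = np.arange(20, dtype=np.float32).reshape(20, 1, 1)

# Split along the time dimension first (12 / 4 / 4 steps here, chosen arbitrarily),
# then generate the windows within each split.
train, val, test = data[:12], data[12:16], data[16:]
train_inputs, train_targets = make_samples(train, input_len=3, output_len=2)

print(train_inputs.shape, train_targets.shape)
```

With 12 training steps and a window of 5 steps, the training split yields 12 - 5 + 1 = 8 samples; the first one uses steps 0 to 2 as inputs and steps 3 to 4 as the target.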

BasicTS provides a built-in Dataset class called TimeSeriesForecastingDataset, designed specifically for time series data. Each sample it generates is a dictionary with two entries: inputs, the input data, and target, the target data. Detailed documentation can be found in the class's comments.

🧑‍🍳 How to Add or Customize Datasets

If your dataset follows the structure described above, you can preprocess your data into the data.dat and desc.json format and place it in the datasets/ directory, e.g., datasets/YOUR_DATA/{data.dat, desc.json}. BasicTS will then automatically recognize and utilize your dataset.
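
For orientation, a conversion script might look roughly like the sketch below. It builds an [L, N, C] array from a target feature plus a time-of-day feature and writes it next to a small desc.json. The save_dataset helper, the float32 dtype, the scaling of time of day to [0, 1), and the desc.json key names are all assumptions for illustration; the scripts in scripts/data_preparation/ remain the authoritative reference for the real fields.

```python
import json
import tempfile
from pathlib import Path

import numpy as np


def save_dataset(data, out_dir, name):
    """Write a [L, N, C] array as data.dat plus a minimal desc.json."""
    data = np.asarray(data, dtype=np.float32)
    if data.ndim != 3:
        raise ValueError("expected an array of shape [L, N, C]")
    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    mm = np.memmap(out_dir / "data.dat", dtype="float32", mode="w+", shape=data.shape)
    mm[:] = data
    mm.flush()
    del mm  # release the memory map
    # Hypothetical key names; mirror an existing desc.json in practice.
    desc = {"name": name, "shape": list(data.shape)}
    with open(out_dir / "desc.json", "w") as f:
        json.dump(desc, f, indent=4)
    return out_dir


L, N, steps_per_day = 8, 2, 4
target = np.arange(L * N, dtype=np.float32).reshape(L, N)
# Time of day as a fraction of the day, repeated for every node (assumed encoding).
time_of_day = np.broadcast_to(((np.arange(L) % steps_per_day) / steps_per_day)[:, None], (L, N))
data = np.stack([target, time_of_day], axis=-1)  # shape [L, N, C=2]

out_dir = save_dataset(data, Path(tempfile.mkdtemp()) / "YOUR_DATA", name="YOUR_DATA")
restored = np.fromfile(out_dir / "data.dat", dtype=np.float32).reshape(L, N, 2)
with open(out_dir / "desc.json") as f:
    desc = json.load(f)
print(sorted(p.name for p in out_dir.iterdir()))
```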

For reference, you can review the scripts in scripts/data_preparation/, which are used to process datasets from raw_data.zip (Google Drive, Baidu Netdisk).

If your dataset does not conform to the standard format or has specific requirements, you can define your own dataset class by inheriting from torch.utils.data.Dataset. In this custom class, the __getitem__ method should return a dictionary containing inputs and target.
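
A custom class along these lines could look like the sketch below. SlidingWindowDataset is a hypothetical name, and the windowing logic is only one possible choice; the only requirement taken from the text above is that __getitem__ returns a dictionary with inputs and target. The ImportError fallback exists solely so the sketch can be executed without PyTorch installed.

```python
import numpy as np

try:
    from torch.utils.data import Dataset
except ImportError:  # fallback so the sketch still runs without PyTorch
    Dataset = object


class SlidingWindowDataset(Dataset):
    """Hypothetical custom dataset that yields sliding-window samples."""

    def __init__(self, data, input_len, output_len):
        self.data = data  # array of shape [L, N, C]
        self.input_len = input_len
        self.output_len = output_len

    def __len__(self):
        window = self.input_len + self.output_len
        return max(0, self.data.shape[0] - window + 1)

    def __getitem__(self, index):
        if not 0 <= index < len(self):
            raise IndexError(index)
        mid = index + self.input_len
        end = mid + self.output_len
        # Return a dictionary containing inputs and target.
        return {
            "inputs": np.array(self.data[index:mid]),
            "target": np.array(self.data[mid:end]),
        }


series = np.arange(10, dtype=np.float32).reshape(10, 1, 1)
dataset = SlidingWindowDataset(series, input_len=4, output_len=2)
sample = dataset[0]
print(len(dataset), sample["inputs"].shape, sample["target"].shape)
```

When such a dataset is wrapped in torch.utils.data.DataLoader, the default collate function converts the NumPy arrays in each dictionary into batched tensors, so the samples do not need to be tensors themselves.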

🧑‍💻 Explore Further

🎉 Getting Started
💡 Understanding the Overall Design Convention of BasicTS
📦 Exploring the Dataset Convention and Customizing Your Own Dataset
🛠️ Navigating The Scaler Convention and Designing Your Own Scaler
🧠 Diving into the Model Convention and Creating Your Own Model
📉 Examining the Metrics Convention and Developing Your Own Loss & Metrics
🏃‍♂️ Mastering The Runner Convention and Building Your Own Runner
📜 Interpreting the Config File Convention and Customizing Your Configuration
🔍 Exploring a Variety of Baseline Models
