State Transformer (STR) is a general model for both motion prediction and motion planning across multiple large-scale real-world datasets. We reformulate the motion prediction and motion planning problems by arranging all elements into a sequence modeling task. Our experimental results reveal that large trajectory models (LTMs), such as STR, adhere to the scaling laws by presenting outstanding adaptability and learning efficiency when trained using larger Transformer backbones. Qualitative analysis illustrates that LTMs are capable of generating plausible predictions in scenarios that diverge significantly from the training dataset’s distribution. LTMs can also learn to perform complex reasoning for long-term planning, extending beyond the horizon of 8 seconds, without explicit loss designs or costly high-level annotations. Check out our paper for more details.
Large Trajectory Models are Scalable Motion Predictors and Planners
Authors: Qiao Sun, Shiduo Zhang, Danjiao Ma, Jingzhe Shi, Derun Li, Simian Luo, Yu Wang, Ningyi Xu, Guangzhi Cao, Hang Zhao
[Arxiv]
Motion prediction and planning are vital tasks in autonomous driving, and recent efforts have shifted to machine learning-based approaches. The challenges include understanding diverse road topologies, reasoning traffic dynamics over a long time horizon, interpreting heterogeneous behaviors, and generating policies in a large continuous state space. Inspired by the success of large language models in addressing similar complexities through model scaling, we introduce a scalable trajectory model called State Transformer (STR). STR reformulates the motion prediction and motion planning problems by arranging observations, states, and actions into one unified sequence modeling task. Our approach unites trajectory generation problems with other sequence modeling problems, powering rapid iterations with breakthroughs in neighbor domains such as language modeling. Remarkably, experimental results reveal that large trajectory models (LTMs), such as STR, adhere to the scaling laws by presenting outstanding adaptability and learning efficiency. Qualitative results further demonstrate that LTMs are capable of making plausible predictions in scenarios that diverge significantly from the training data distribution. LTMs also learn to make complex reasonings for long-term planning, without explicit loss designs or costly high-level annotations.
- Causal Transformer for sequence modeling instead of Encoder-Decoder models
- Easy to modify the backbone to scale it up or replace it with other LLMs
- Versatile for adding classification losses or rule-based post-processing
Install PyTorch with CUDA first (we recommend pip install torch==1.9.0+cu111 torchvision==0.10.0+cu111 torchaudio==0.9.0 -f https://download.pytorch.org/whl/torch_stable.html). Then run pip install -r requirements.txt to install all dependencies. There are some additional dependencies for the NuScenes and Waymo datasets. Please refer to the official websites for more details.
run pip install -e .
from the root directory of Transformer4Planning.
(tested with v1.2)
run pip install -e .
from the root directory of NuPlan-Devkit.
Then install these packages:
pip install aioboto3
pip install retry
pip install aiofiles
pip install bokeh==2.4.1
run pip install waymo-open-dataset-tf-2-11-0==1.6.0
to install WOMD-Devkit following here.
We process the dataset into the Hugging Face Dataset format. Click here to download the NuPlan dataset. Unzip the file and pass the path to the '--saved_dataset_folder' argument to use it. This dataset contains the training, validation, and the test set.
The Waymo dataset will be made available for download on the server soon.
If you need to customize your dataset, you need to process the official dataset with our scripts.
- For the NuPlan dataset, please refer to generation.py and read the following 'NuPlan Dataset Pipeline' section for more details.
- For the Waymo dataset, please refer to waymo_generation.py and read the following 'Waymo Dataset Pipeline' section for more details.
Usage:
- process NuPlan .db files to .pkl files (to agent dictionaries)
- generate filtered scenario index and cache in .arrow files
- generate map dictionary to pickles
We recommend organizing your downloaded dataset in the following way. If you need to process data from a customized directory, check generation.py for more details.
{DATASET_ROOT}
├── maps
│ ├── us-ma-boston
│ ├── us-pa-pittsburgh-hazelwood
│ ├── us-nv-las-vegas-strip
│ ├── sg-one-north
│ ├── nuplan-maps-v1.0.json
├── nuplan-v1.1
│ │── train_singapore
│ │ ├── *.db
│ │── train_boston
│ │ ├── *.db
│ │── train_pittsburgh
│ │ ├── *.db
│ │── train_las_vegas_1
│ │ ├── *.db
│ │── train_las_vegas_2
│ │ ├── *.db
│ │── train_las_vegas_3
│ │ ├── *.db
│ │── train_las_vegas_4
│ │ ├── *.db
│ │── train_las_vegas_5
│ │ ├── *.db
│ │── train_las_vegas_6
│ │ ├── *.db
│ │── test
│ │ ├── *.db
│ │── val
│ │ ├── *.db
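Before running the pipeline, it can help to confirm that a download matches this layout. The following is a minimal sketch that only checks for the folder names shown above; `find_missing` is a hypothetical helper written for illustration and is not part of generation.py.

```python
from pathlib import Path

# Folder names copied from the layout above.
MAP_DIRS = ["us-ma-boston", "us-pa-pittsburgh-hazelwood",
            "us-nv-las-vegas-strip", "sg-one-north"]
SPLIT_DIRS = ["train_singapore", "train_boston", "train_pittsburgh",
              *[f"train_las_vegas_{i}" for i in range(1, 7)], "test", "val"]


def find_missing(dataset_root):
    """Return the expected sub-folders that are absent under dataset_root."""
    root = Path(dataset_root)
    expected = [root / "maps" / name for name in MAP_DIRS]
    expected += [root / "nuplan-v1.1" / name for name in SPLIT_DIRS]
    # Report paths relative to the root so the output is easy to read.
    return [str(p.relative_to(root)) for p in expected if not p.is_dir()]
```

Calling `find_missing` on your own dataset root returns an empty list when every expected folder is present.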
Step 1: Process .db to .pkl by running (data_path is the folder name like 'train_singapore'):
python generation.py --num_proc 40 --sample_interval 100 \
--dataset_name boston_index_demo --starting_file_num 0 \
--ending_file_num 10000 --cache_folder {PATH_TO_CACHE_FOLDER} \
--data_path {PATH_TO_DATASET_FOLDER} --only_data_dic
Step 2: Generate scenarios to .arrow datasets
python generation.py --num_proc 40 --sample_interval 100 \
--dataset_name boston_index_interval100 --starting_file_num 0 \
--ending_file_num 10000 --cache_folder {PATH_TO_CACHE_FOLDER} \
--data_path {PATH_TO_DATASET_FOLDER} --only_index
Step 3: Generate Map files to .pickle files
python generation.py --num_proc 40 --sample_interval 1 --dataset_name {DATASET_NAME} \
--starting_file_num 0 --ending_file_num 10000 \
--cache_folder {PATH_TO_CACHE_FOLDER} \
--data_path {PATH_TO_DATASET_FOLDER} --save_map
python generation.py --num_proc 40 --sample_interval 1 \
--dataset_name vegas2_datadic_float32 --starting_file_num 0 --ending_file_num 10000 \
--cache_folder {PATH_TO_CACHE_FOLDER} --data_path {PATH_TO_DATASET_FOLDER} --save_map
You only need to process Vegas's map once for all Vegas subsets.
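The three steps differ mainly in their final switch, so they can be scripted together. The sketch below builds the argument list for one step using only the flags shown in the commands above; `generation_command` is a hypothetical helper and the paths in the example are placeholders.

```python
import shlex


def generation_command(mode_flag, dataset_name, cache_folder, data_path,
                       sample_interval=100, num_proc=40,
                       starting_file_num=0, ending_file_num=10000):
    """Build the argument list for one generation.py step.

    mode_flag is one of the three switches used above:
    --only_data_dic (Step 1), --only_index (Step 2), --save_map (Step 3).
    """
    if mode_flag not in ("--only_data_dic", "--only_index", "--save_map"):
        raise ValueError(f"unknown step flag: {mode_flag}")
    return [
        "python", "generation.py",
        "--num_proc", str(num_proc),
        "--sample_interval", str(sample_interval),
        "--dataset_name", dataset_name,
        "--starting_file_num", str(starting_file_num),
        "--ending_file_num", str(ending_file_num),
        "--cache_folder", cache_folder,
        "--data_path", data_path,
        mode_flag,
    ]


# Placeholder paths; replace them with your own folders.
cmd = generation_command("--only_index", "boston_index_interval100",
                         "/path/to/cache", "/path/to/train_boston")
print(shlex.join(cmd))
```

The returned list can be passed directly to `subprocess.run(cmd, check=True)` from the directory that contains generation.py.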
Why process .db files to .pkl files? The .pkl files use less disk space (values are stored at lower precision) and load faster (there is no need to initialize the NuPlan DataWrapper).
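The disk-usage argument can be illustrated with a toy example. This sketch assumes that "lower precision" means storing values as 32-bit rather than 64-bit floats (the dataset name vegas2_datadic_float32 hints at this, but it is an assumption), and it uses made-up coordinates rather than real agent dictionaries.

```python
import pickle
from array import array

# 10,000 made-up coordinates; real agent dictionaries hold far richer data.
coords = [i * 0.001 for i in range(10_000)]

as_float64 = pickle.dumps(array("d", coords))  # 8 bytes per value
as_float32 = pickle.dumps(array("f", coords))  # 4 bytes per value

# The float32 payload is roughly half the size of the float64 one.
print(len(as_float64), len(as_float32))

# The price is a small rounding error on every stored value.
restored = pickle.loads(as_float32)
max_error = max(abs(a - b) for a, b in zip(coords, restored))
print(max_error)
```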
root
|--train
   |--us-ma-boston
      --*.pkl
   |--us-pa-pittsburgh-hazelwood
   ...
|--test
   |--us-ma-pittsburgh
      --*.pkl
   ...
|--map
   --us-ma-boston.pkl
   --us-pa-pittsburgh-hazelwood.pkl
   --us-nv-las-vegas-strip.pkl
   --sg-one-north
   ...
|--index (can be organized dynamically)
   |--train
      |--train-index_boston
         --*.arrow
   |--test
      |--test-index_pittsburgh
         --*.arrow
We provide pretrained checkpoints for STR(CKS)-16M, including one with a diffusion decoder. You can download STR(CKS)-16M-Diff.12e via this link, and STR(CKS)-16M via this link. These checkpoints were trained with the NuPlan dataset, so please make sure to read its non-commercial policies carefully before using them.
The OLS (Open Loop Simulation) performance of these checkpoints is as follows:
Model | OLS | 8sADE | 3sFDE | 5sFDE | 8sFDE |
---|---|---|---|---|---|
STR(CKS)-16M | 83.42 | 1.9134 | 1.0463 | 2.2173 | 4.8369 |
STR(CKS)-16M-Diff.12e | 86.52 | 1.773 | 0.9477 | 2.104 | 4.505 |
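For readers unfamiliar with the ADE and FDE columns, the sketch below uses the usual definitions: ADE is the mean Euclidean distance between predicted and ground-truth positions over a horizon, and FDE is the distance at the final step of that horizon. It is an illustration with a toy trajectory, assumed to be in meters, and is not the evaluation code that produced the table.

```python
import math


def displacement_errors(pred, truth):
    """Per-step Euclidean distances between two (x, y) trajectories."""
    if len(pred) != len(truth):
        raise ValueError("trajectories must have the same length")
    return [math.dist(p, t) for p, t in zip(pred, truth)]


def ade(pred, truth):
    """Average displacement error over the whole horizon."""
    errors = displacement_errors(pred, truth)
    return sum(errors) / len(errors)


def fde(pred, truth):
    """Final displacement error: distance at the last time step."""
    return displacement_errors(pred, truth)[-1]


# Toy example: the prediction drifts laterally by 0, 1, and 2 meters.
truth = [(0.0, 0.0), (1.0, 0.0), (2.0, 0.0)]
pred = [(0.0, 0.0), (1.0, 1.0), (2.0, 2.0)]
print(ade(pred, truth), fde(pred, truth))  # 1.0 2.0
```

Under these definitions, a metric such as 3sFDE would be obtained by truncating both trajectories at the 3-second mark before calling `fde`.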
The CLS-NR (Closed Loop Simulation - Not Reactive) performance of this checkpoint is as follows:
Model | CLS-NR | drivable_area_compliance | driving_direction_compliance | ego_is_comfortable | ego_is_making_progress | ego_progress_along_expert_route | no_ego_at_fault_collisions | speed_limit_compliance | time_to_collision_within_bound |
---|---|---|---|---|---|---|---|---|---|
STR(CKS)-16M-Diff.12e | 54.46 | 0.81216458 | 0.961091234 | 0.973166369 | 0.921288014 | 0.709096891 | 0.763416816 | 0.957543143 | 0.711091234 |
The CLS-R (Closed Loop Simulation - Reactive) performance of this checkpoint is as follows:
Model | CLS-R | drivable_area_compliance | driving_direction_compliance | ego_is_comfortable | ego_is_making_progress | ego_progress_along_expert_route | no_ego_at_fault_collisions | speed_limit_compliance | time_to_collision_within_bound |
---|---|---|---|---|---|---|---|---|---|
STR(CKS)-16M-Diff.12e | 57.34 | 0.810375671 | 0.96019678 | 0.972271914 | 0.913237925 | 0.708428908 | 0.813953488 | 0.955975959 | 0.770125224 |
--model_name can be chosen from ["gpt-large","gpt-medium","gpt-small","gpt-mini"], with a prefix of scratch- or pretrain- that determines whether to load pretrained weights from an existing checkpoint. The checkpoint path is given by --model_pretrain_name_or_path.
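The naming rule can be made concrete with a small sketch. `split_model_name` and `needs_checkpoint` are hypothetical helpers written for illustration and are not part of runner.py; they assume the prefix is everything before the first hyphen.

```python
def split_model_name(model_name):
    """Split e.g. 'scratch-gpt-mini' into ('scratch', 'gpt-mini')."""
    prefix, _, backbone = model_name.partition("-")
    if prefix not in ("scratch", "pretrain") or not backbone:
        raise ValueError(f"expected scratch-<name> or pretrain-<name>, "
                         f"got: {model_name}")
    return prefix, backbone


def needs_checkpoint(model_name):
    """True when --model_pretrain_name_or_path must also be supplied."""
    return split_model_name(model_name)[0] == "pretrain"


print(split_model_name("scratch-gpt-mini"))    # ('scratch', 'gpt-mini')
print(needs_checkpoint("pretrain-gpt-small"))  # True
```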
The common training settings are shown below.
python -m torch.distributed.run \
--nproc_per_node=8 runner.py \
--do_train --do_eval \
--model_name scratch-gpt-mini \
--saved_dataset_folder {PATH_TO_DATASET_FOLDER} \
--output_dir {PATH_TO_OUTPUT_FOLDER} \
--logging_dir {PATH_TO_LOG_FOLDER} \
--run_name {RUN_NAME} \
--num_train_epochs 100 \
--per_device_train_batch_size 16 --warmup_steps 50 \
--weight_decay 0.01 --logging_steps 2 --save_strategy steps \
--dataloader_num_workers 10 \
--save_total_limit 2 \
--dataloader_drop_last True \
--task nuplan \
--remove_unused_columns False \
--evaluation_strategy steps \
--eval_steps 1000 \
--per_device_eval_batch_size 8 \
--predict_yaw True \
--past_sample_interval 2 \
--x_random_walk 0.1 --y_random_walk 0.1 --augment_current_pose_rate 0.1
To choose a different encoder, set --encoder_type. The choices are [raster, vector]. Depending on the --task setting, the encoder is initialized with a different class.
To choose a different decoder, set --decoder_type. The choices are [mlp, diffusion].
Training the diffusion decoder together with the backbone is not recommended due to extremely slow convergence. We recommend training the backbone and encoders with the MLP decoder first, and training the diffusion decoder after convergence.
After training with an MLP decoder, you need to generate the dataset for training the diffusion decoder using generate_diffusion_feature.py. This uses the same command as for evaluation, except that you need to set --generate_diffusion_dataset_for_key_points_decoder to True and --diffusion_feature_save_dir to the directory where the .pth files for training diffusion decoders should be saved.
An example:
python -m torch.distributed.run \
--nproc_per_node=8 generate_diffusion_feature.py \
--model_name pretrain-gpt-mini \
--model_pretrain_name_or_path {PATH_TO_PRETRAINED_CHECKPOINT} \
--saved_dataset_folder {PATH_TO_DATASET_FOLDER} \
--output_dir {PATH_TO_OUTPUT_FOLDER} \
--logging_dir {PATH_TO_LOG_FOLDER} \
--run_name {RUN_NAME} \
--dataloader_num_workers 10 \
--dataloader_drop_last True \
--task nuplan \
--future_sample_interval 2 \
--past_sample_interval 5 \
--evaluation_strategy steps \
--overwrite_output_dir \
--next_token_scorer True \
--pred_key_points_only False \
--diffusion_feature_save_dir {PATH_TO_DIFFUSION_FEATURE_SAVE_DIR}
After running this, you should get three folders under diffusion_feature_save_dir, holding the train, val, and test diffusion features respectively.
diffusion_feature_save_dir
|--train
   --future_key_points_[0-9]*.pth
   --future_key_points_hidden_state_[0-9]*.pth
|--val
   --future_key_points_[0-9]*.pth
   --future_key_points_hidden_state_[0-9]*.pth
|--test
   --future_key_points_[0-9]*.pth
   --future_key_points_hidden_state_[0-9]*.pth
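A quick sanity check on the generated features is to confirm that every key-point file has a matching hidden-state file. The sketch below assumes that the part after the last underscore is a purely numeric index and that files sharing an index belong together; both are assumptions based on the naming pattern above, and `pair_feature_files` is a hypothetical helper.

```python
import re
from pathlib import Path

KEY_POINTS = re.compile(r"future_key_points_(\d+)\.pth$")
HIDDEN = re.compile(r"future_key_points_hidden_state_(\d+)\.pth$")


def pair_feature_files(split_dir):
    """Match key-point files with hidden-state files by numeric index."""
    labels, hiddens = {}, {}
    for path in Path(split_dir).glob("*.pth"):
        if (m := HIDDEN.match(path.name)):
            hiddens[int(m.group(1))] = path
        elif (m := KEY_POINTS.match(path.name)):
            labels[int(m.group(1))] = path
    shared = sorted(set(labels) & set(hiddens))
    # Indices that appear in only one of the two groups.
    unpaired = sorted(set(labels) ^ set(hiddens))
    return [(labels[i], hiddens[i]) for i in shared], unpaired
```

An empty `unpaired` list for each of the train, val, and test folders suggests the generation step finished cleanly.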
After saving the .pth files, run convert_diffusion_dataset.py to convert them into an arrow dataset that is consistent with the training format used here.
python3 convert_diffusion_dataset.py \
--save_dir {PATH_TO_SAVE_DIR} \
--data_dir {PATH_TO_DIFFUSION_FEATURE_SAVE_DIR} \
--num_proc 10 \
--dataset_name train \
--map_dir {PATH_TO_MAP_DIR} \
--saved_datase_folder {PATH_TO_DATASET_FOLDER} \
--use_centerline False \
--split train
Afterwards, you should see the following structure in save_dir:
save_dir
|--generator
|--*
|--train
*.arrow
dataset_info.json
state.json
*.lock
Finally, set --task to train_diffusion_decoder. In this case, the model is initialized without a transformer backbone, which saves the inference time otherwise spent building backbone features. Meanwhile, change --saved_dataset_folder to the folder that stores the backbone features dataset obtained previously with convert_diffusion_dataset.py. It should be consistent with save_dir in the previous step.
python3 -m torch.distributed.run \
--nproc_per_node=8 runner.py \
--model_name pretrain-gpt-small \
--model_pretrain_name_or_path {PATH_TO_PRETRAINED_CHECKPOINT} \
--saved_dataset_folder {PATH_TO_DATASET_FOLDER} \
--output_dir {PATH_TO_OUTPUT_FOLDER} \
--logging_dir {PATH_TO_LOG_FOLDER} \
--run_name {RUN_NAME} \
--num_train_epochs 10 \
--weight_decay 0.00001 --learning_rate 0.0001 --logging_steps 2 --save_strategy steps \
--dataloader_num_workers 10 \
--per_device_train_batch_size 2 \
--save_total_limit 2 --predict_yaw True \
--dataloader_drop_last True \
--task train_diffusion_decoder \
--remove_unused_columns False \
--do_train \
--loss_fn mse \
--per_device_eval_batch_size 2 \
--past_sample_interval 2 \
--trajectory_loss_rescale 0.00001
After training the diffusion key point decoder separately, you can use the following command to evaluate its performance:
export CUDA_VISIBLE_DEVICES=0; \
nohup python3 -m torch.distributed.run \
--nproc_per_node=1 runner.py \
--model_name pretrain-gpt-small-gen1by1 \
--model_pretrain_name_or_path {PATH_TO_PRETRAINED_CHECKPOINT} \
--saved_dataset_folder {PATH_TO_DATASET_FOLDER} \
--output_dir {PATH_TO_OUTPUT_FOLDER} \
--logging_dir {PATH_TO_LOG_FOLDER} \
--run_name {RUN_NAME} \
--num_train_epochs 1 \
--weight_decay 0.00 --learning_rate 0.00 --logging_steps 2 --save_strategy steps \
--dataloader_num_workers 10 \
--per_device_train_batch_size 2 \
--save_total_limit 2 --predict_yaw True \
--dataloader_drop_last True \
--task nuplan \
--remove_unused_columns False \
--do_eval \
--evaluation_strategy epoch \
--per_device_eval_batch_size 2 \
--past_sample_interval 2 \
--kp_decoder_type diffusion --key_points_diffusion_decoder_feat_dim 256 --diffusion_condition_sequence_lenth 1 \
--key_points_diffusion_decoder_load_from {KEY_POINT_DIFFUSION_CHECKPOINT_PATH}
export CUDA_VISIBLE_DEVICES=3; \
python -m torch.distributed.run \
--nproc_per_node=1 \
--master_port 12345 \
runner.py --model_name pretrain-gpt \
--model_pretrain_name_or_path data/example/training_results/checkpoint-xxxxx \
--saved_dataset_folder /localdata_ssd/nuplan/nsm_autoregressive_rapid \
--output_dir data/example/prediction_results/checkpoint-xxxxx \
--per_device_eval_batch_size 32 \
--dataloader_num_workers 16 \
--predict_trajectory True \
--do_predict \
--saved_valid_dataset_folder /localdata_ssd/nuplan/boston_test_byscenario/ \
--max_predict_samples 1000 \
--dataloader_drop_last True \
--with_traffic_light True \
--remove_unused_columns True \
--next_token_scorer True \
--trajectory_loss_rescale 1e-3 \
--pred_key_points_only False \
--specified_key_points True \
--forward_specified_key_points False
Generate WOMD training data dictionary and index:
python waymo_generation.py --mode train --save_dict --agent_type 1 2 3
This produces the original pickle data files. By default, we require the data and index files to be organized as follows.
WOMD_data_save_dir
|--index
   --train
   --val
   --test
|--origin
   --train
   --val
   --test
To accelerate data loading, we suggest splitting the large pickle files into small ones, one per scenario, by running:
python split_scenario_waymo.py --delete_origin True
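The idea behind the split can be sketched in a few lines. This is not the code of split_scenario_waymo.py: it assumes, purely for illustration, that a large pickle holds a dictionary keyed by scenario id, and `split_pickle_by_scenario` is a hypothetical helper.

```python
import pickle
from pathlib import Path


def split_pickle_by_scenario(big_pickle, out_dir, delete_origin=False):
    """Write one small pickle per scenario from a {scenario_id: data} pickle."""
    big_pickle, out_dir = Path(big_pickle), Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    with big_pickle.open("rb") as f:
        scenarios = pickle.load(f)  # assumed: dict keyed by scenario id
    for scenario_id, data in scenarios.items():
        with (out_dir / f"{scenario_id}.pkl").open("wb") as f:
            pickle.dump(data, f)
    if delete_origin:
        big_pickle.unlink()  # same idea as passing --delete_origin True
    return len(scenarios)
```

Loading one small file per scenario avoids deserializing the whole large pickle whenever a single sample is requested.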
WARNING: This is still under development.
We use the NuPlan Garage environment to evaluate the NuPlan model on NuBoard. The NuPlan Garage environment is a wrapper around the NuPlan-Devkit environment and is used to run the NuPlan model in the NuBoard environment. It is located in the nuplan_garage folder.
run pip install -e .
from the root directory of Transformer4Planning.
(tested with v1.2)
run pip install -e .
from the root directory of NuPlan-Devkit.
Then install these additional packages:
pip install aioboto3
pip install retry
pip install aiofiles
pip install bokeh==2.4.1
Create a new yaml file for Hydra at: script/config/simulation/planner/str_planner.yaml
with:
control_tf_planner:
  _target_: transformer4planning.submission.planner.STRPlanner
  horizon_seconds: 10.0
  sampling_time: 0.1
  acceleration: [5.0, 5.0] # x (longitudinal), y (lateral)
  thread_safe: true
Create a new yaml file for Hydra at: script/config/simulation/planner/rule_based_planner.yaml
with:
rule_based_planner:
  _target_: transformer4planning.submission.rule_based_planner.RuleBasedPlanner
  horizon_seconds: 10.0
  sampling_time: 0.1
  acceleration: [5.0, 5.0] # x (longitudinal), y (lateral)
  thread_safe: true
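In standard Hydra configs, the class to build is named by a `_target_` key holding a dotted import path, and the remaining keys are passed to the constructor. The sketch below is a deliberately tiny stand-in for that mechanism, shown with a standard-library class instead of a planner; real Hydra instantiation handles nesting and many other options that are omitted here.

```python
import importlib


def instantiate(config):
    """Tiny stand-in for Hydra's instantiate: build the class named by _target_."""
    kwargs = dict(config)
    module_path, _, class_name = kwargs.pop("_target_").rpartition(".")
    cls = getattr(importlib.import_module(module_path), class_name)
    # Every remaining key becomes a keyword argument of the constructor.
    return cls(**kwargs)


# Demonstration with a standard-library class instead of a planner.
step = instantiate({"_target_": "datetime.timedelta", "seconds": 10.0})
print(step.total_seconds())  # 10.0
```

This is why the dotted path in the planner config must be importable from the environment in which the simulation runs.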
- Install Transformer4Planning and NuPlan-Devkit
- (Optional) Copy the script folder from NuPlan's Official Repo to update
- Modify the dataset path in run_simulation.py and run it to evaluate the model with the STR planner:
python nuplan_garage/run_simulation.py 'planner=str_planner' \
'scenario_filter=val14_split' \
'job_name=closed_loop_nonreactive_agents' 'scenario_builder=nuplan' \
'ego_controller=perfect_tracking_controller' 'observation=box_observation' \
hydra.searchpath="[pkg://nuplan_garage.planning.script.config.common, pkg://nuplan_garage.planning.script.config.simulation, pkg://nuplan.planning.script.config.common, pkg://nuplan.planning.script.config.simulation, pkg://nuplan.planning.script.experiments]"
'scenario_filter.limit_total_scenarios=1' 'scenario_filter.num_scenarios_per_type=1' \
nuplan/planning/script/config/common/default_experiment.yaml
job_name: open_loop_boxes
or
job_name: closed_loop_nonreactive_agents
or
job_name: closed_loop_reactive_agents
nuplan/planning/script/config/common/default_common.yaml
scenario_builder: nuplan
corresponds to the yaml filename in scripts/config/common/scenario_builder
scenario_filter: all_scenarios
corresponds to the yaml filename in scripts/config/common/scenario_filter
nuplan/planning/script/config/common/worker/ray_distributed.yaml
threads_per_node: 6
nuplan/planning/script/config/common/worker/single_machine_thread_pool.yaml
max_workers: 6
nuplan/planning/script/config/simulation/default_simulation.yaml
observation: box_observation
ego_controller: perfect_tracking_controller
planner: str_planner
nuplan/planning/script/config/common/default_common.yaml
debug mode - worker: sequential
multi-threading - worker: ray_distributed
nuplan/planning/script/config/common/scenario_filter/all_scenarios.yaml
num_scenarios_per_type: 1
limit_total_scenarios: 5
log_names: []
to filter the db files to run simulation
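import os

os.environ['USE_PYGEOS'] = '0'  # must be set before geopandas is imported
import geopandas
os.environ['HYDRA_FULL_ERROR'] = '1'
# Fill in the paths and map version for your local NuPlan installation.
os.environ['NUPLAN_DATA_ROOT'] = ''
os.environ['NUPLAN_MAPS_ROOT'] = ''
os.environ['NUPLAN_DB_FILES'] = ''
os.environ['NUPLAN_MAP_VERSION'] = ''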
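Since the four NUPLAN_* values above are left empty, a simulation launched without filling them in is likely to fail later with a less obvious error. A minimal sketch of an early check follows; `unset_nuplan_variables` is a hypothetical helper and is not part of the devkit.

```python
import os

REQUIRED = ("NUPLAN_DATA_ROOT", "NUPLAN_MAPS_ROOT",
            "NUPLAN_DB_FILES", "NUPLAN_MAP_VERSION")


def unset_nuplan_variables(environ=None):
    """Return the required variables that are missing or empty."""
    environ = os.environ if environ is None else environ
    return [name for name in REQUIRED if not environ.get(name)]


missing = unset_nuplan_variables()
if missing:
    print("Please set:", ", ".join(missing))
```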
Set scenario_builder and scenario_filter in script/config/common/default_common.yaml to the names of the corresponding yaml files. The scenario_builder files are in script/config/common/scenario_builder and the scenario_filter files are in script/config/common/scenario_filter. For example, choose nuplan as the builder and all_scenarios as the filter. If you want to restrict the simulation to specific logs or maps, add [log_name1, ..., log_namen] to log_names.
python nuplan/planning/script/run_simulation.py
to set configs:
planner choice: planner=control_tf_planner
Optional [control_tf_planner, rule_based_planner]
challenge choice: +simulation=closed_loop_reactive_agents
Optional [closed_loop_reactive_agents, open_loop_boxes, closed_loop_nonreactive_agents]
python script/run_nuboard.py simulation_path='[/home/sunq/nuplan/exp/exp/simulation/test/2023.05.08.19.17.16]' 'scenario_builder=nuplan' 'port_number=5005'
or
python nuplan/planning/script/run_nuboard.py simulation_path='[/home/sunq/nuplan/exp/exp/simulation/open_loop_boxes/2023.04.21.21.47.58]'
python script/run_metric_scores.py --file_path ... --save_dir ... --exp_name ... --simulation_type ...
The default transformer backbone is the GPT-2 model from the Hugging Face Transformers library.
We also tried the recent Mamba backbone and it works even better (with some minor bugs to be fixed in the current version).
Follow the instructions from the official Mamba repo to install it, then change the model name from gpt to mamba to use it.
If you find this work useful in your research, please consider citing:
@article{sun2023large,
title={Large Trajectory Models are Scalable Motion Predictors and Planners},
author={Qiao Sun and Shiduo Zhang and Danjiao Ma and Jingzhe Shi and Derun Li and Simian Luo and Yu Wang and Ningyi Xu and Guangzhi Cao and Hang Zhao},
journal={arXiv preprint arXiv:2310.19620},
year={2023}
}