Baseline Modelling

This section describes two baseline models for analyzing the scenario data: a frame classifier, and an autoencoder that runs on a Dynamic Time Warping (DTW) representation of the scenarios.

Preliminary Dataprep

The scenarios are provided as part of this dataset, and the data can be accessed in two ways.

The first option is to use the raw CARLA recordings of the scenarios. These are in the CARLA log format (https://carla.readthedocs.io/en/0.9.11/ref_recorder_binary_file_format/) and are generally named with a .log extension. We provide a scenario recorder script (recorder.py) that replays these recordings in a CARLA simulator and stores specific features about each scenario to a CSV or Parquet file. Refer to the get_metadata function in recorder.py to see which features are extracted for each actor. This is only an example: you are free to extract any other features from the simulator, as long as they help with the anomaly detection task, and to update recorder.py for your own use cases.
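As a rough illustration of the kind of per-actor table such a recorder produces, the sketch below writes one CSV row per (frame, actor) pair. The column names and the write_actor_rows helper are hypothetical and do not reproduce the schema of get_metadata; in a real recorder the values would be read from the replayed simulation (for example, via CARLA's Actor.get_velocity()).

```python
import csv
import io

# Hypothetical column set; the real per-actor features are defined by get_metadata in recorder.py.
FIELDS = ["frame", "actor_id", "x", "y", "z", "velocity_x", "velocity_y", "velocity_z"]


def write_actor_rows(records, stream):
    """Write a header plus one CSV row per (frame, actor) record.

    `records` is assumed to be an iterable of dicts keyed by FIELDS.
    """
    writer = csv.DictWriter(stream, fieldnames=FIELDS)
    writer.writeheader()
    for record in records:
        # Keep only the known columns so extra keys do not make DictWriter raise ValueError.
        writer.writerow({name: record[name] for name in FIELDS})


buffer = io.StringIO()
write_actor_rows(
    [{"frame": 0, "actor_id": 7, "x": 1.0, "y": 2.0, "z": 0.0,
      "velocity_x": 3.5, "velocity_y": 0.0, "velocity_z": 0.0}],
    buffer,
)
print(buffer.getvalue())
```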

The second option is to use the scripts we provide to set up scenarios directly in CARLA and record them to logs. You may use these scripts instead of the CARLA logs described above.

Generating Data

Here we generate frame-level and agent-level data from the consolidated scenario data produced by the record_data.sh script. This data is written to a folder called agent_maps/ in the root directory.

To run the preliminary dataprep process, follow these steps:

  1. Run the record_data.sh script. If you want to customize which data is stored from the simulation, refer to the Customizing Recordings document.
  2. Update the folders list in the generate_agent_maps.py file in the root directory, listing only the directories that were generated when you ran record_data.sh, then run generate_agent_maps.py. If you customized which data recorder.py records, also update generate_agent_maps.py so that the new data is included in the agent maps (see the CarlaCsvParser class, in particular its run function).
  3. Step 2 generates the individual agent maps, so you can now continue to the model-specific steps. The model-specific steps and generate_agent_maps.py share filename conventions, which helps parse the agent data consistently and efficiently. Refer to Agent Map Filename Convention for details (by default, everything should work automatically).
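Conceptually, building agent maps means regrouping the consolidated frame-level rows into one time series per agent. The sketch below shows only that regrouping; build_agent_maps and the "frame"/"actor_id" keys are hypothetical, and it follows neither the repository's filename convention nor the CarlaCsvParser implementation.

```python
from collections import defaultdict


def build_agent_maps(rows):
    """Group consolidated (frame, actor) rows into per-agent trajectories.

    Each row is assumed to be a dict with at least 'frame' and 'actor_id' keys.
    Rows for each agent are sorted by frame so they form a time series.
    """
    agents = defaultdict(list)
    for row in rows:
        agents[row["actor_id"]].append(row)
    return {
        actor_id: sorted(agent_rows, key=lambda r: r["frame"])
        for actor_id, agent_rows in agents.items()
    }


rows = [
    {"frame": 1, "actor_id": 7, "x": 1.5},
    {"frame": 0, "actor_id": 7, "x": 1.0},
    {"frame": 0, "actor_id": 9, "x": -4.0},
]
agent_maps = build_agent_maps(rows)
print({actor: [r["frame"] for r in agent_rows] for actor, agent_rows in agent_maps.items()})
# → {7: [0, 1], 9: [0]}
```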

Frame Classifier

This is a Random Forest classifier that runs on a set of precalculated features. Each scenario contains data for each "frame", where a frame is the data for a given timestamp. Each frame contains multiple agents and their positions, velocities, etc. We view each scenario from the perspective of every vehicle in it: a scenario with 5 vehicles results in 5 different scenarios, one with each vehicle treated as the "ego" vehicle.

To extract features for the Random Forest model, for each frame we compute the minimum and maximum values of velocity, angular velocity, and acceleration along each of the 3 axes (x, y, z in the global frame). These 18 values are the features for our simple classification models: "max_velocity_x", "max_velocity_y", "max_velocity_z", "max_ang_velocity_x", "max_ang_velocity_y", "max_ang_velocity_z", "min_velocity_x", "min_velocity_y", "min_velocity_z", "min_ang_velocity_x", "min_ang_velocity_y", "min_ang_velocity_z", "max_acc_x", "max_acc_y", "max_acc_z", "min_acc_x", "min_acc_y", "min_acc_z".
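The per-frame feature extraction can be sketched as follows. This is a minimal illustration, assuming each agent in a frame is a dict whose (hypothetical) keys mirror the feature names, such as velocity_x or acc_z; frame_features is not a function from the repository.

```python
AXES = ("x", "y", "z")
QUANTITIES = ("velocity", "ang_velocity", "acc")


def frame_features(agents):
    """Compute the 18 min/max features for a single frame.

    `agents` is a list of dicts with keys like 'velocity_x', 'ang_velocity_y',
    'acc_z'; values are assumed to be expressed in the global frame.
    """
    features = {}
    for quantity in QUANTITIES:
        for axis in AXES:
            values = [agent[f"{quantity}_{axis}"] for agent in agents]
            features[f"max_{quantity}_{axis}"] = max(values)
            features[f"min_{quantity}_{axis}"] = min(values)
    return features


frame = [
    {f"{q}_{a}": 0.0 for q in QUANTITIES for a in AXES},
    {f"{q}_{a}": 1.0 for q in QUANTITIES for a in AXES},
]
frame[1]["velocity_x"] = 12.0
feats = frame_features(frame)
print(len(feats), feats["max_velocity_x"], feats["min_velocity_x"])
# → 18 12.0 0.0
```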

To run this classifier, follow these steps from the root folder:

  1. Ensure the Preliminary Dataprep steps are completed.
  2. python modelling/frame_classifier/run.py
  3. This trains a random forest classifier, prints the results, and shows the permutation importance plots for the model. Refer to the individual function calls inside frame_classifier/run.py for examples of how to use the generated model. You may write your own "run" file to save the model or to run inference on another dataset.
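Permutation importance measures how much a model's score drops when one feature's values are shuffled across samples. The model-agnostic sketch below uses a hypothetical toy_predict rule as a stand-in for the trained random forest; all names here are illustrative. For real scikit-learn models, sklearn.inspection.permutation_importance provides this computation.

```python
import random


def accuracy(predict, X, y):
    """Fraction of rows whose prediction matches the label."""
    return sum(predict(row) == label for row, label in zip(X, y)) / len(y)


def permutation_importance(predict, X, y, n_repeats=5, seed=0):
    """Mean accuracy drop when each feature column is shuffled independently."""
    rng = random.Random(seed)
    baseline = accuracy(predict, X, y)
    importances = []
    for col in range(len(X[0])):
        drops = []
        for _ in range(n_repeats):
            shuffled = [row[col] for row in X]
            rng.shuffle(shuffled)
            # Replace only column `col`; the other features keep their values.
            X_perm = [row[:col] + [v] + row[col + 1:] for row, v in zip(X, shuffled)]
            drops.append(baseline - accuracy(predict, X_perm, y))
        importances.append(sum(drops) / n_repeats)
    return importances


def toy_predict(row):
    # Toy stand-in: flags a frame as anomalous (1) when feature 0
    # (think max_velocity_x) exceeds 10, and ignores feature 1 entirely.
    return int(row[0] > 10.0)


X = [[float(v), float(v % 3)] for v in range(20)]
y = [toy_predict(row) for row in X]
imp = permutation_importance(toy_predict, X, y)
print(imp)  # feature 1 scores exactly 0.0 because the model never reads it
```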

A "fast" and simpler version of this classifier (including the dataprep on the raw recordings) is available in simple_frame_classifier.py. It skips the transformations and the ego vehicle "vicinity" filtering, so it is, in a sense, a "privileged" classifier, and it achieves extremely high accuracy.
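The ego "vicinity" filtering that the full classifier applies can be pictured as a radius test around the ego vehicle. The sketch below is an assumption about what such a filter might look like: the vicinity_filter name, the 50 m default radius, and excluding the ego itself are all hypothetical choices, not taken from frame_classifier.

```python
import math


def vicinity_filter(ego, agents, radius=50.0):
    """Keep only agents within `radius` metres of the ego vehicle (ego excluded).

    Positions are (x, y) tuples in the global frame; `radius` is a hypothetical default.
    """
    ex, ey = ego["position"]
    return [
        agent
        for agent in agents
        if agent["id"] != ego["id"]
        and math.hypot(agent["position"][0] - ex, agent["position"][1] - ey) <= radius
    ]


ego = {"id": 1, "position": (0.0, 0.0)}
agents = [
    ego,
    {"id": 2, "position": (30.0, 40.0)},  # 50 m away: kept
    {"id": 3, "position": (100.0, 0.0)},  # 100 m away: dropped
]
print([a["id"] for a in vicinity_filter(ego, agents)])
# → [2]
```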

DTW Autoencoder

The DTW Autoencoder converts each scenario into a DTW map. A DTW map is always computed with respect to a given "ego" vehicle, and any vehicle in the scenario can serve as the ego. In our case, we treat every vehicle in each scenario as the ego in turn, so a scenario with 5 vehicles yields 5 different sets of data points (one from each vehicle's ego perspective).

One DTW map is a tensor of up to 10 interactions. An interaction is computed from the positions of the ego vehicle and one other vehicle in the scenario: the trajectories of the 2 vehicles generate one DTW frame (we use the 2D DTW cost map as one frame). Ten DTW frames make up one tensor, which we refer to as a DTW map. Within one DTW map, every interaction involves the same ego vehicle.
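A DTW frame can be sketched as the accumulated cost matrix of dynamic time warping between two position trajectories. We assume here that the 2D "cost map" is the accumulated matrix built from Euclidean point distances; if the repository uses the raw pairwise distance matrix instead, the accumulation step would be dropped. dtw_cost_map is a hypothetical name.

```python
import math


def dtw_cost_map(traj_a, traj_b):
    """Return the accumulated DTW cost matrix between two 2D trajectories.

    Entry [i][j] is the minimum cumulative Euclidean distance of any warping
    path aligning traj_a[:i+1] with traj_b[:j+1]; the last entry is the DTW distance.
    """
    n, m = len(traj_a), len(traj_b)
    inf = math.inf
    cost = [[inf] * m for _ in range(n)]
    for i in range(n):
        for j in range(m):
            d = math.dist(traj_a[i], traj_b[j])
            if i == 0 and j == 0:
                best_prev = 0.0
            else:
                best_prev = min(
                    cost[i - 1][j] if i > 0 else inf,  # step from the cell above
                    cost[i][j - 1] if j > 0 else inf,  # step from the cell to the left
                    cost[i - 1][j - 1] if i > 0 and j > 0 else inf,  # diagonal step
                )
            cost[i][j] = d + best_prev
    return cost


ego = [(0.0, 0.0), (1.0, 0.0), (2.0, 0.0)]
other = [(0.0, 1.0), (1.0, 1.0), (2.0, 1.0)]
frame = dtw_cost_map(ego, other)
print(frame[-1][-1])  # DTW distance between the two trajectories
# → 3.0
```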

So in a scenario with 11 vehicles, there are 11 possible "ego" vehicles, and each ego has 10 "other" vehicles (all vehicles besides itself). Hence each ego vehicle has 10 DTW frames, which are stacked to produce the tensor we refer to as the DTW map.
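The per-ego stacking can be sketched as a loop over vehicles in which each ego collects one frame per other vehicle. This is a hypothetical sketch: build_dtw_maps, zero-padding maps with fewer than 10 interactions, keeping only the first 10 other vehicles, and assuming equal-length trajectories are all illustrative choices, not taken from the repository. To keep it short, the frame function is passed in, and a plain pairwise-distance matrix stands in for the DTW cost map.

```python
import math

MAX_INTERACTIONS = 10  # a DTW map holds up to 10 interactions


def build_dtw_maps(trajectories, pair_frame, max_interactions=MAX_INTERACTIONS):
    """Build one DTW map (a stack of 2D frames) per ego vehicle.

    `trajectories` maps vehicle id -> list of (x, y) positions, all assumed to
    have the same length T, so every frame is T x T. Missing interactions are
    zero-padded (an assumption).
    """
    maps = {}
    for ego_id, ego_traj in trajectories.items():
        frames = [
            pair_frame(ego_traj, other_traj)
            for other_id, other_traj in trajectories.items()
            if other_id != ego_id
        ][:max_interactions]
        t = len(ego_traj)
        while len(frames) < max_interactions:
            frames.append([[0.0] * t for _ in range(t)])
        maps[ego_id] = frames  # shape: (max_interactions, T, T)
    return maps


def pairwise_distances(a, b):
    # Placeholder frame: local Euclidean distances, standing in for a DTW cost map.
    return [[math.dist(p, q) for q in b] for p in a]


tracks = {
    "a": [(0.0, 0.0), (1.0, 0.0)],
    "b": [(0.0, 1.0), (1.0, 1.0)],
    "c": [(5.0, 5.0), (6.0, 5.0)],
}
dtw_maps = build_dtw_maps(tracks, pairwise_distances)
print(len(dtw_maps), len(dtw_maps["a"]), len(dtw_maps["a"][0]))
# → 3 10 2
```

With 11 vehicles, each ego gets exactly 10 real frames and no padding, matching the description above.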

To train the autoencoder, follow these steps:

  1. Ensure the Preliminary Dataprep steps are completed.
  2. python modelling/dtw_autoencoder/train.py - This trains the DTW model and saves the model checkpoints.
  3. Use the autoencoder's "encoder" to return an embedding for a given scenario (using the ScenarioModel's embedding() function). This embedding can be used for clustering or for training a supervised model. You can use the inferer.py file to run a trained encoder on the data generated during the preliminary dataprep steps.
  4. The supervised model is available in playground/embedding_classifier.ipynb. It shows two examples: a t-SNE + NN classifier and a simple NN classifier on the 1024-dimensional embedding.
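As one example of clustering scenario embeddings, the sketch below runs plain k-means (Lloyd's algorithm) on embedding vectors. The 4-dimensional vectors are made-up stand-ins for the 1024-dimensional embeddings returned by ScenarioModel.embedding(), and kmeans is a hypothetical helper, not part of the repository.

```python
import math


def kmeans(points, k, n_iters=20):
    """Plain k-means (Lloyd's algorithm) on embedding vectors.

    Centroids are initialised from the first k points for determinism; a real
    pipeline would typically use a library implementation with better initialisation.
    """
    centroids = [list(p) for p in points[:k]]
    labels = [0] * len(points)
    for _ in range(n_iters):
        # Assignment step: each point goes to its nearest centroid (Euclidean).
        labels = [min(range(k), key=lambda c: math.dist(p, centroids[c])) for p in points]
        # Update step: each non-empty cluster's centroid moves to its members' mean.
        for c in range(k):
            members = [p for p, label in zip(points, labels) if label == c]
            if members:
                centroids[c] = [sum(dim) / len(members) for dim in zip(*members)]
    return labels, centroids


embeddings = [
    [0.0, 0.0, 0.0, 0.0],
    [10.0, 10.0, 10.0, 10.0],
    [0.1, 0.0, 0.0, 0.0],
    [10.0, 9.9, 10.0, 10.0],
]
labels, centroids = kmeans(embeddings, k=2)
print(labels)
# → [0, 1, 0, 1]
```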