PyTorch implementation of GOMAA-Geo: GOal Modality Agnostic Active Geo-localization (NeurIPS 2024)
Anindya Sarkar*, Srikumar Sastry*, Aleksis Pirinen, Chongjie Zhang, Nathan Jacobs, Yevgeniy Vorobeychik (*Corresponding Author)
This repository is the official implementation of the NeurIPS 2024 paper GOMAA-Geo, a goal modality agnostic active geo-localization agent that can geo-localize a goal location -- specified as an aerial patch, ground-level image, or textual description -- by navigating partially observed aerial imagery.
- Update Gradio demo
- Release Models to 🤗 HuggingFace
- Release PyTorch
ckpt
files for all models
You can use the following commands to install the necessary dependencies to run the code:
conda create --name gomaa_geo
conda activate gomaa_geo
conda install python==3.11
pip install -r requirements.txt
To run the code with the Masa or xBD data, download the zip file at the following link: https://www.kaggle.com/datasets/balraj98/massachusetts-buildings-dataset To run the code with the xBD data, download the zip file at the following link: https://xview2.org/ (Note that, In order to download the dataset, first login using a valid email id) To run the code with our MM-GAG data, download the zip file at the following Anonymous Huggingface link: https://huggingface.co/datasets/Gomaa-Geo/MM-GAG
The uncompressed folder named data
should be placed at the root directory of this repository.
This folder includes processed data for the following active geo localization problems:
- Masa dataset in
masa_data
, xBD dataset in 'xBD_data'
This folder also includes other intermediate results to recreate our analyses and figures.
Before setting up data or running experiments, setup all parameters of interest in the file config.py
. This includes grid size, model configuration, training configuration etc.
Extract all data of interest in the folder gomaa_geo/data
Then, create patches (grids) for each image in a dataset using the following script:
python -m gamma_geo.data_utils.get_patches
Then get CLIP-MMFE embeddings for each patch:
python -m gamma_geo.data_utils.get_sat_embeddings_sat2cap
Using the same script and function get_ground_embeds
one can create embeddings for ground level images.
To create text embeddings, run the following script:
python -m gamma_geo.data_utils.get_text_embeddings
To run our pre-training procedure with GOMAA-Geo, use the following commands:
python -m gomaa_geo.pretrain
Again, all parameters of interest must be specified in the config.py
file.
The weights of the trained llm network at each iteration will be locally saved in gomaa_geo/checkpoint/
To run training of the pre-trained model, use the following command:
python -m gomaa_geo.train
To run inference, run the following command:
python -m gomaa_geo.validate
Set the path to the pre-trained llm in the variable: cfg.train.llm_checkpoint
To visualize exploration behaviour of the trained model, run the following script:
python -m gomaa_geo.viz_path --idx=77 --start=0 --end=24
where, idx
is image id, start
is the starting position and end
is the goal position.
Download GOMAA-Geo models from the links below: Coming Soon ...
@article{sarkar2024gomaa,
title={GOMAA-Geo: GOal Modality Agnostic Active Geo-localization},
author={Sarkar, Anindya and Sastry, Srikumar and Pirinen, Aleksis and Zhang, Chongjie and Jacobs, Nathan and Vorobeychik, Yevgeniy},
journal={arXiv preprint arXiv:2406.01917},
year={2024}
}
Check out our lab website for other interesting works on geospatial understanding and mapping: