forked from samacqua/LARC

Language-annotated Abstraction and Reasoning Corpus

Language-complete Abstraction and Reasoning Corpus (LARC)

This repository contains the LARC dataset and supporting assets.

The entire dataset can be browsed in the explorer interface. Take a look!

Here's a quick 5-minute SlidesLive video explaining this work.

"How can we build intelligent systems that achieve human-level performance on challenging and structured domains (like ARC), with or without additional human guidance? We posit the answer may be found in studying natural programs - instructions humans give to each other to communicate how to solve a task. Like a computer program, these instructions can be reliably "executed" by others to produce intended outputs."

A comprehensive view of this dataset and its goals can be found in Communicating Natural Programs to Humans and Machines (NeurIPS Datasets and Benchmarks, 2022).

LARC is curated from a communication game in which one participant, the describer, solves an ARC task and describes the solution to a different participant, the builder, who must solve the task on a new input using the description alone. Successful descriptions are "language-complete" in the sense that they fully capture the underlying ARC task in the absence of the original input-output examples.
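The success criterion of the communication game can be sketched in a few lines of Python. This is only an illustration of the idea, not the code or data format used in this repository: `Trial`, `is_successful`, and `toy_builder` are hypothetical names, and the builder here is a stand-in function rather than a human participant.

```python
from dataclasses import dataclass
from typing import Callable, List

# ARC grids are 2D arrays of small integers (color indices).
Grid = List[List[int]]


@dataclass
class Trial:
    """One round of the game (hypothetical structure, for illustration only)."""
    description: str      # natural-language instructions written by the describer
    test_input: Grid      # the new input shown to the builder
    target_output: Grid   # the correct output, hidden from the builder


def is_successful(trial: Trial, builder: Callable[[str, Grid], Grid]) -> bool:
    """A description succeeds if the builder, given only the description
    and the new input, reproduces the target output exactly."""
    return builder(trial.description, trial.test_input) == trial.target_output


def toy_builder(description: str, grid: Grid) -> Grid:
    """Stand-in builder that only understands the word 'mirror'."""
    if "mirror" in description:
        return [list(reversed(row)) for row in grid]
    return grid


trial = Trial(
    description="mirror each row left to right",
    test_input=[[1, 0], [2, 3]],
    target_output=[[0, 1], [3, 2]],
)
print(is_successful(trial, toy_builder))
```

Note that the builder never sees the original input-output examples, which is what makes a successful description language-complete.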


Citation

@article{acquaviva2021communicating,
  title={Communicating Natural Programs to Humans and Machines},
  author={Acquaviva, Samuel and Pu, Yewen and Kryven, Marta and Wong, Catherine and Ecanow, Gabrielle E and Nye, Maxwell and Sechopoulos, Theodoros and Tessler, Michael Henry and Tenenbaum, Joshua B},
  journal={arXiv preprint arXiv:2106.07824},
  year={2021}
}

The original ARC data can be found here: The Abstraction and Reasoning Corpus

Contents

  • dataset contains the language-complete ARC tasks and successful natural program phrase annotations
  • explorer contains the explorer code that allows for easy browsing of the annotated tasks
  • collection contains the source code used to curate the dataset
  • bandit contains the formulation and environment for the bandit algorithm used for collection

Language-guided program synthesis code can be found here.

GPT-4 (vision only) program induction results can be found here.

License

The dataset is licensed under the Creative Commons Attribution 4.0 International License.

All supporting code follows the MIT License.
