
Incomplete instructions regarding 'Add New Bug Benchmark' #156

Open
jose opened this issue Dec 4, 2023 · 11 comments

Comments

@jose

jose commented Dec 4, 2023

Hi @nus-apr,

Although this page/document provides some initial instructions on how to add a new bug benchmark, it does not provide instructions for the second and third points listed.

  • Benchmark Image: a Dockerfile describing how to construct the benchmark container
  • Benchmark metadata file: a JSON file containing an array of objects with the following features

(The last sentence ends abruptly.) It would be great if you could provide more information.

@Marti2203
Collaborator

Hi @jose ,
We appreciate your interest in Cerberus! I have pushed commit a111e52 to the dev branch with extra information in the aforementioned document. Your feedback is highly appreciated.

@jose
Author

jose commented Dec 7, 2023

Thanks @Marti2203 for your prompt reply.

I believe there is a typo in that commit. Where it's written

Create a new file in app/core/drivers/benchmarks/ with the Benchmark name (i.e. NewBenchmark.py) that contains the following code:

should be

Create a new file in app/drivers/benchmarks/ with the Benchmark name (i.e. NewBenchmark.py) that contains the following code:

right?

Another question that's not yet clear to me is, how does one create the meta-data.json file? Manually?

@Marti2203
Collaborator

Hi,
Thank you for catching that incorrect path; I will fix it in a commit now. To create the meta-data.json file, you can construct it manually or write a script that generates it. For benchmarks that are more uniform in structure and metadata, I have used a script (the ITSP and Refactory benchmarks are examples).
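To make the scripted option concrete, here is a minimal sketch of such a generator. The field names below are hypothetical placeholders, not Cerberus's actual schema; the documentation lists the real required features.

```python
import json

# Hypothetical bug entries. The field names are illustrative placeholders,
# NOT Cerberus's actual schema; consult the benchmark documentation for the
# real required fields.
bugs = []
for bug_id in range(1, 3):
    bugs.append(
        {
            "id": bug_id,
            "subject": "example-project",
            "bug_id": f"bug-{bug_id}",
            "failing_test_identifiers": [f"test_{bug_id}"],
            "passing_test_identifiers": [],
        }
    )

# The metadata file is a JSON array of objects, one per bug.
with open("meta-data.json", "w") as f:
    json.dump(bugs, f, indent=2)
```

For a large, uniform benchmark, the loop body would instead be filled in from each bug's directory layout.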

@brojackvn

brojackvn commented Sep 3, 2024

Hello @Marti2203 , I would like to add a benchmark to Cerberus for an experiment. So far, I have not found the functions in Cerberus that process the output from compiling and running test cases in a benchmark. I am curious how Cerberus understands my custom output from these steps, since I have not found a fixed format for them.

Thank you.

@Marti2203
Collaborator

Hi @brojackvn ,
This processing has to be manually implemented in the Benchmark class.
In Cerberus, we have modified the scripts of the benchmarks to return a status code, which is subsequently read.
If that is not possible, you can read the standard output and standard error and process them yourself, provided you know the benchmark's format. You would need to replace self.run_command with self.exec_command (I see that the method does not exist yet, but the point still stands; I can add it to the development branch if needed).

@brojackvn

brojackvn commented Sep 26, 2024

Hello @Marti2203 , I noticed that you have created a version of Defects4J in https://github.com/nus-apr/defects4j. Could you kindly explain the main purpose of this? I also see that you have included it in all benchmarks.

Thank you so much.

@Marti2203
Collaborator

Hi, we fork the benchmarks, such as Defects4J, to have a version of the repository where we can add additional metadata or files, which Cerberus uses. Initially, we stored a local copy but decided that using submodules would be more effective for storage.
I am not sure I understood "I also see that you have included it in all benchmarks".

@brojackvn

My point is that every benchmark has its own fork, similar to this repository https://github.com/nus-apr/defects4j.

In Defects4J, each bug includes files like "build_script", "config_script", "clean_script", and "test_script". Do I need to create a similar repository when adding a new benchmark? I'm still unclear on the purpose of this repository, especially when looking at the code for adding benchmark drivers here.

@Marti2203
Collaborator

Having a repository with files such as build_script and config_script provides a unified interface, so the different tools can use it without caring about the specifics of the benchmark. The Defects4J driver was written before this and has not been updated.
For example, tools in the repository like LLMR or Prompter read from the metadata and use these scripts.
Hopefully this was helpful, and I am happy to answer more questions :)
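To show how such a unified interface can be consumed, here is a sketch (hypothetical code, not taken from Cerberus or from LLMR/Prompter) that runs one of the per-bug scripts by name:

```python
import os
import subprocess
import tempfile

def run_stage(bug_dir: str, stage: str) -> bool:
    # Hypothetical driver: because every bug directory exposes the same
    # script names (build_script, config_script, clean_script, test_script),
    # a tool can run a stage without knowing the benchmark's internals.
    script = os.path.join(bug_dir, stage)
    result = subprocess.run(["sh", script], capture_output=True)
    return result.returncode == 0

# Demo: a dummy bug directory containing a trivial build_script.
demo_dir = tempfile.mkdtemp()
with open(os.path.join(demo_dir, "build_script"), "w") as f:
    f.write("#!/bin/sh\nexit 0\n")
print(run_stage(demo_dir, "build_script"))
```

The same call works for any benchmark that follows the convention, which is the point of forking each benchmark repository to add these files.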

@brojackvn

brojackvn commented Sep 27, 2024

Thank you very much for your quick response.

I think I understand the requirements for adding a new benchmark. I am trying to run an example to understand the flow of Cerberus and make sure that I am able to run Cerberus for experiments. Here’s what I have done so far, step by step:

  1. Activated the source environment source activate
  2. Installed the required packages pip install -r requirements.txt in my environment.
  3. Ran the command: cerberus -task repair --tool=arja --benchmark=defects4j --bug-index=1.
    3.1. First, I received the following error message: "Could not get the submodule. Maybe the system asked for an SSH key, and it could not be provided." Instead of fixing this by adding an SSH key to GitHub, I traced the issue in the log file and ran git submodule update --init benchmark/defects4j manually, which resolved the error.
    3.2. Then, I encountered another error: "[error] Image was not built successfully. Please check whether the file builds outside of Cerberus." I am unsure why Cerberus cannot build the benchmark image. Here's the log output:
    [framework] loading benchmark defects4j Sep 27 22:54:49
    [framework] benchmark environment not found for defects4j-benchmark Sep 27 22:54:49
    [framework] building benchmark environment Sep 27 22:54:49
    [framework] building docker image defects4j-benchmark Sep 27 22:54:49
    Task repair failed Sep 27 22:55:38
    [error] Image was not built successfully. Please check whether the file builds outside of Cerberus Sep 27 22:55:38
    Error. Exiting... Sep 27 22:55:38
    Task repair failed Sep 27 22:55:38
    [error] unable to build image: unhandled exception Sep 27 22:55:38
    Runtime Error Sep 27 22:55:38
    Error. Exiting... Sep 27 22:55:38
    

Then I printed the image-build process by adding a line of code, print(line["stream"]), at line 124 in container.py. I found an error here:

Get:129 http://archive.ubuntu.com/ubuntu focal-updates/main amd64 software-properties-common all 0.99.9.12 [10.4 kB]
Get:130 http://archive.ubuntu.com/ubuntu focal-updates/main amd64 unattended-upgrades all 2.3ubuntu0.3 [48.5 kB]
debconf: delaying package configuration, since apt-utils is not installed

Fetched 35.9 MB in 15s (2451 kB/s)
(Reading database ... 
(Reading database ... 65%
(Reading database ... 70%
(Reading database ... 75%
(Reading database ... 80%
(Reading database ... 85%
(Reading database ... 90%
(Reading database ... 95%
(Reading database ... 4124 files and directories currently installed.)

Preparing to unpack .../libsystemd0_245.4-4ubuntu3.24_amd64.deb ...
Unpacking libsystemd0:amd64 (245.4-4ubuntu3.24) over (245.4-4ubuntu3.23) ...
Setting up libsystemd0:amd64 (245.4-4ubuntu3.24) ...
Selecting previously unselected package libssl1.1:amd64.
(Reading database ... 
(Reading database ... 70%
(Reading database ... 75%
(Reading database ... 80%
(Reading database ... 85%
(Reading database ... 90%
(Reading database ... 95%
(Reading database ... 4124 files and directories currently installed.)
Preparing to unpack .../libssl1.1_1.1.1f-1ubuntu2.23_amd64.deb ...
Unpacking libssl1.1:amd64 (1.1.1f-1ubuntu2.23) ...
Selecting previously unselected package libpython3.8-minimal:amd64.
Preparing to unpack .../libpython3.8-minimal_3.8.10-0ubuntu1~20.04.12_amd64.deb ...
Unpacking libpython3.8-minimal:amd64 (3.8.10-0ubuntu1~20.04.12) ...
Selecting previously unselected package libexpat1:amd64.
Preparing to unpack .../libexpat1_2.2.9-1ubuntu0.7_amd64.deb ...
Unpacking libexpat1:amd64 (2.2.9-1ubuntu0.7) ...
Selecting previously unselected package python3.8-minimal.
Preparing to unpack .../python3.8-minimal_3.8.10-0ubuntu1~20.04.12_amd64.deb ...
Unpacking python3.8-minimal (3.8.10-0ubuntu1~20.04.12) ...
Setting up libssl1.1:amd64 (1.1.1f-1ubuntu2.23) ...
debconf: unable to initialize frontend: Dialog
debconf: (TERM is not set, so the dialog frontend is not usable.)
debconf: falling back to frontend: Readline
debconf: unable to initialize frontend: Readline
debconf: (Can't locate Term/ReadLine.pm in @INC (you may need to install the Term::ReadLine module) (@INC contains: /etc/perl /usr/local/lib/x86_64-linux-gnu/perl/5.30.0 /usr/local/share/perl/5.30.0 /usr/lib/x86_64-linux-gnu/perl5/5.30 /usr/share/perl5 /usr/lib/x86_64-linux-gnu/perl/5.30 /usr/share/perl/5.30 /usr/local/lib/site_perl /usr/lib/x86_64-linux-gnu/perl-base) at /usr/share/perl5/Debconf/FrontEnd/Readline.pm line 7.)
debconf: falling back to frontend: Teletype
Setting up libpython3.8-minimal:amd64 (3.8.10-0ubuntu1~20.04.12) ...
Setting up libexpat1:amd64 (2.2.9-1ubuntu0.7) ...
Setting up python3.8-minimal (3.8.10-0ubuntu1~20.04.12) ...
dpkg: error processing package python3.8-minimal (--configure):
 installed python3.8-minimal package post-installation script subprocess was killed by signal (Killed)
Errors were encountered while processing:
 python3.8-minimal
E: Sub-process /usr/bin/dpkg returned an error code (1)

Could you help me figure out how to resolve this issue?

@Marti2203
Collaborator

Hi, let's take this to a different issue. This sounds like some dependency is stale, and I will need to try rebuilding the docker image used by the benchmark (the image is described by the Dockerfile in the benchmark's repository).
