
Incomplete instructions regarding 'Add New Bug Benchmark' #156

Open
jose opened this issue Dec 4, 2023 · 11 comments

Comments

@jose

jose commented Dec 4, 2023

Hi @nus-apr,

Although this page/document provides some initial instructions on how to add a new bug benchmark, it does not provide instructions for the second and third points listed.

  • Benchmark Image: a Dockerfile describing how to construct the benchmark container
  • Benchmark metadata file: a JSON file containing an array of objects with the following features

(The last sentence ends abruptly.) It would be great if you could provide more information.

@Marti2203
Collaborator

Hi @jose ,
We appreciate your interest in Cerberus! I have pushed commit a111e52 to the dev branch with extra information in the aforementioned document. Your feedback is highly appreciated.

@jose
Author

jose commented Dec 7, 2023

Thanks @Marti2203 for your prompt reply.

I believe there is a typo in that commit. Where it's written

Create a new file in app/core/drivers/benchmarks/ with the Benchmark name (i.e. NewBenchmark.py) that contains the following code:

should be

Create a new file in app/drivers/benchmarks/ with the Benchmark name (i.e. NewBenchmark.py) that contains the following code:

right?

Another question that's not yet clear to me is, how does one create the meta-data.json file? Manually?

@Marti2203
Collaborator

Hi,
Thank you for catching that incorrect path; I will fix it in a commit now. To create the meta-data.json file, you can construct it manually or write a script that generates it. For benchmarks that are more uniform in structure and metadata, I have used a script (the ITSP and Refactory benchmarks are examples).
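To make the scripted option concrete, here is a minimal sketch of such a generator. The field names below are hypothetical placeholders, not Cerberus's actual schema; the documentation lists the real required features.

```python
import json

# Hypothetical bug entries. The field names are illustrative placeholders,
# NOT Cerberus's actual schema; consult the benchmark documentation for the
# real required fields.
bugs = []
for bug_id in range(1, 3):
    bugs.append(
        {
            "id": bug_id,
            "subject": "example-project",
            "bug_id": f"bug-{bug_id}",
            "failing_test_identifiers": [f"test_{bug_id}"],
            "passing_test_identifiers": [],
        }
    )

# The metadata file is a JSON array of objects, one per bug.
with open("meta-data.json", "w") as f:
    json.dump(bugs, f, indent=2)
```

For a large, uniform benchmark, the loop body would instead be filled in from each bug's directory layout.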

@brojackvn

brojackvn commented Sep 3, 2024

Hello @Marti2203 , I would like to add a benchmark to Cerberus for an experiment. So far, I have not found the functions in Cerberus that process the output from compiling and running test cases in a benchmark. I am curious how Cerberus understands my custom output from these steps, since I have not found a fixed format for them.

Thank you.

@Marti2203
Collaborator

Hi @brojackvn ,
This processing has to be manually implemented in the Benchmark class.
In Cerberus, we have modified the scripts of the benchmarks to return a status code, which is subsequently read.
If that is not possible, you can read the standard output and standard error and process them yourself, provided you know the benchmark's format. You would need to replace self.run_command with self.exec_command (I see that the method does not exist yet, but the point still stands; I can add it to the development branch if needed).

@brojackvn

brojackvn commented Sep 26, 2024

Hello @Marti2203 , I noticed that you have created a version of Defects4J in https://github.com/nus-apr/defects4j. Could you kindly explain the main purpose of this? I also see that you have included it in all benchmarks.

Thank you so much.

@Marti2203
Collaborator

Hi, we fork the benchmarks, such as Defects4J, to have a version of the repository where we can add additional metadata or files, which Cerberus uses. Initially, we stored a local copy but decided that using submodules would be more effective for storage.
I am not sure I understood "I also see that you have included it in all benchmarks".

@brojackvn

My point is that every benchmark has its own fork, similar to this repository https://github.com/nus-apr/defects4j.

In Defects4J, each bug includes files like "build_script", "config_script", "clean_script", and "test_script". Do I need to create a similar repository when adding a new benchmark? I'm still unclear on the purpose of this repository, especially when looking at the code for adding benchmark drivers here.

@Marti2203
Collaborator

Having a repository with files such as build_script and config_script provides a unified interface, so the different tools can use it without caring about the specifics of the benchmark. The Defects4J driver was written before this and has not been updated.
For example, tools in the repository like LLMR or Prompter read from the metadata and use these scripts.
Hopefully this was helpful, and I am happy to answer more questions :)
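To show how such a unified interface can be consumed, here is a sketch (hypothetical code, not taken from Cerberus or from LLMR/Prompter) that runs one of the per-bug scripts by name:

```python
import os
import subprocess
import tempfile

def run_stage(bug_dir: str, stage: str) -> bool:
    # Hypothetical driver: because every bug directory exposes the same
    # script names (build_script, config_script, clean_script, test_script),
    # a tool can run a stage without knowing the benchmark's internals.
    script = os.path.join(bug_dir, stage)
    result = subprocess.run(["sh", script], capture_output=True)
    return result.returncode == 0

# Demo: a dummy bug directory containing a trivial build_script.
demo_dir = tempfile.mkdtemp()
with open(os.path.join(demo_dir, "build_script"), "w") as f:
    f.write("#!/bin/sh\nexit 0\n")
print(run_stage(demo_dir, "build_script"))
```

The same call works for any benchmark that follows the convention, which is the point of forking each benchmark repository to add these files.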

@brojackvn

brojackvn commented Sep 27, 2024

Thank you very much for your quick response.

I think I understand the requirements for adding a new benchmark. I am trying to run an example to understand the flow of Cerberus and make sure that I am able to run Cerberus for experiments. Here’s what I have done so far, step by step:

  1. Activated the source environment source activate
  2. Installed the required packages pip install -r requirements.txt in my environment.
  3. Ran the command: cerberus -task repair --tool=arja --benchmark=defects4j --bug-index=1.
    3.1. First, I received the following error message: "Could not get the submodule. Maybe the system asked for an SSH key, and it could not be provided." Instead of fixing this by adding an SSH key to GitHub, I traced the issue in the log file and ran git submodule update --init benchmark/defects4j manually, which resolved the error.
    3.2. Then, I encountered another error: "[error] Image was not built successfully. Please check whether the file builds outside of Cerberus." I am unsure why Cerberus cannot build the benchmark image. Here's the log output:
    [framework] loading benchmark defects4j Sep 27 22:54:49
    [framework] benchmark environment not found for defects4j-benchmark Sep 27 22:54:49
    [framework] building benchmark environment Sep 27 22:54:49
    [framework] building docker image defects4j-benchmark Sep 27 22:54:49
    Task repair failed Sep 27 22:55:38
    [error] Image was not built successfully. Please check whether the file builds outside of Cerberus Sep 27 22:55:38
    Error. Exiting... Sep 27 22:55:38
    Task repair failed Sep 27 22:55:38
    [error] unable to build image: unhandled exception Sep 27 22:55:38
    Runtime Error Sep 27 22:55:38
    Error. Exiting... Sep 27 22:55:38
    

Then I printed the image-build process by adding a line of code, print(line["stream"]), at line 124 in container.py. I found an error here:

Get:129 http://archive.ubuntu.com/ubuntu focal-updates/main amd64 software-properties-common all 0.99.9.12 [10.4 kB]
Get:130 http://archive.ubuntu.com/ubuntu focal-updates/main amd64 unattended-upgrades all 2.3ubuntu0.3 [48.5 kB]
debconf: delaying package configuration, since apt-utils is not installed

Fetched 35.9 MB in 15s (2451 kB/s)
(Reading database ... 
(Reading database ... 65%
(Reading database ... 70%
(Reading database ... 75%
(Reading database ... 80%
(Reading database ... 85%
(Reading database ... 90%
(Reading database ... 95%
(Reading database ... 4124 files and directories currently installed.)

Preparing to unpack .../libsystemd0_245.4-4ubuntu3.24_amd64.deb ...
Unpacking libsystemd0:amd64 (245.4-4ubuntu3.24) over (245.4-4ubuntu3.23) ...
Setting up libsystemd0:amd64 (245.4-4ubuntu3.24) ...
Selecting previously unselected package libssl1.1:amd64.
(Reading database ... 
(Reading database ... 70%
(Reading database ... 75%
(Reading database ... 80%
(Reading database ... 85%
(Reading database ... 90%
(Reading database ... 95%
(Reading database ... 4124 files and directories currently installed.)
Preparing to unpack .../libssl1.1_1.1.1f-1ubuntu2.23_amd64.deb ...
Unpacking libssl1.1:amd64 (1.1.1f-1ubuntu2.23) ...
Selecting previously unselected package libpython3.8-minimal:amd64.
Preparing to unpack .../libpython3.8-minimal_3.8.10-0ubuntu1~20.04.12_amd64.deb ...
Unpacking libpython3.8-minimal:amd64 (3.8.10-0ubuntu1~20.04.12) ...
Selecting previously unselected package libexpat1:amd64.
Preparing to unpack .../libexpat1_2.2.9-1ubuntu0.7_amd64.deb ...
Unpacking libexpat1:amd64 (2.2.9-1ubuntu0.7) ...
Selecting previously unselected package python3.8-minimal.
Preparing to unpack .../python3.8-minimal_3.8.10-0ubuntu1~20.04.12_amd64.deb ...
Unpacking python3.8-minimal (3.8.10-0ubuntu1~20.04.12) ...
Setting up libssl1.1:amd64 (1.1.1f-1ubuntu2.23) ...
debconf: unable to initialize frontend: Dialog
debconf: (TERM is not set, so the dialog frontend is not usable.)
debconf: falling back to frontend: Readline
debconf: unable to initialize frontend: Readline
debconf: (Can't locate Term/ReadLine.pm in @INC (you may need to install the Term::ReadLine module) (@INC contains: /etc/perl /usr/local/lib/x86_64-linux-gnu/perl/5.30.0 /usr/local/share/perl/5.30.0 /usr/lib/x86_64-linux-gnu/perl5/5.30 /usr/share/perl5 /usr/lib/x86_64-linux-gnu/perl/5.30 /usr/share/perl/5.30 /usr/local/lib/site_perl /usr/lib/x86_64-linux-gnu/perl-base) at /usr/share/perl5/Debconf/FrontEnd/Readline.pm line 7.)
debconf: falling back to frontend: Teletype
Setting up libpython3.8-minimal:amd64 (3.8.10-0ubuntu1~20.04.12) ...
Setting up libexpat1:amd64 (2.2.9-1ubuntu0.7) ...
Setting up python3.8-minimal (3.8.10-0ubuntu1~20.04.12) ...
dpkg: error processing package python3.8-minimal (--configure):
 installed python3.8-minimal package post-installation script subprocess was killed by signal (Killed)
Errors were encountered while processing:
 python3.8-minimal
E: Sub-process /usr/bin/dpkg returned an error code (1)

Could you help me figure out how to resolve this issue?

@Marti2203
Collaborator

Hi, let's take this to a different issue. This sounds like some dependency is stale, and I will need to try rebuilding the docker image used by the benchmark (the image is described by the Dockerfile in the benchmark's repository).
