DoSC is a dataset for benchmarking software analysis techniques that dynamically discover semantic changes.

Chenguang-Zhu/DoSC

DoSC

About

DoSC is a dataset created for benchmarking techniques that dynamically discover semantic changes, including Semantic History Slicing techniques and Dynamic Feature Location techniques.

License

DoSC is distributed under the Apache license. See license.txt for details.

People

Chenguang Zhu
Yi Li
Julia Rubin
Marsha Chechik

Publication

C. Zhu, Y. Li, J. Rubin, and M. Chechik, “A Dataset for Dynamic Discovery of Semantic Changes in Version Controlled Software Histories,” in Proc. of MSR’17, May 2017, pp. 523–526.

Dataset

Usage

  1. Pick a functionality from the dataset. (See the table below.)
  2. View its meta-data file. See meta-data collections.
  3. Access the repository of the project via the link provided in the project url field of the meta-data.
  4. Use the git clone command to clone the project repository to your local file system.
  5. Extract the names of all test methods listed in the test suite field of the meta-data.
  6. Run the history slicing tool, using as input the extracted test cases and the history segment delimited by its starting point (field history start) and ending point (field history end).
  7. Compare the resulting semantic history slice with the 1-minimal ground truth we provide (field history slice).
  8. Repeat steps 1-7 with other functionalities until the evaluation is sufficient.
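
Steps 4 to 7 can be sketched in Python as follows. This is a minimal sketch, not part of DoSC: the helper names are hypothetical, the meta-data file format is not described here (so the field values are passed in as plain strings), and the sketch assumes that the history segment corresponds to the commits reported by `git log start..end`, which excludes the starting commit itself.

```python
import subprocess


def clone_command(project_url, dest_dir):
    """Step 4: build the `git clone` command for the project repository."""
    return ["git", "clone", project_url, dest_dir]


def segment_command(history_start, history_end):
    """Step 6: build a `git log` command listing the segment, oldest commit first.

    Assumption: the segment is the range `history_start..history_end`.
    """
    return ["git", "log", "--reverse", "--format=%H",
            f"{history_start}..{history_end}"]


def list_segment_commits(repo_dir, history_start, history_end):
    """Run the command above inside a local clone and return the commit IDs."""
    result = subprocess.run(
        segment_command(history_start, history_end),
        cwd=repo_dir, check=True, capture_output=True, text=True,
    )
    return result.stdout.split()


def compare_slices(computed, ground_truth):
    """Step 7: report how a computed slice differs from the provided slice.

    Assumption: both slices are given as commit IDs of the same length.
    """
    computed, ground_truth = set(computed), set(ground_truth)
    return {
        "missing": sorted(ground_truth - computed),  # in ground truth only
        "extra": sorted(computed - ground_truth),    # in computed slice only
    }


# Example with the LANG-825 segment from the table; no command is executed here.
print(" ".join(segment_command("bae9f7c3", "15a51f1d")))
print(compare_slices(computed=["c1", "c2"], ground_truth=["c2", "c3"]))
```

The command builders are kept separate from `list_segment_commits` so that the commands can be inspected before anything is run against a real clone.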

Overview of the dataset

Columns in the table:

  • Functionality ID: The JIRA issue key of the functionality - a unique identifier originally assigned by developers in the JIRA issue tracking system.
  • History Start: The starting point of the history segment where the functionality was developed. It is the SHA-1 ID of the closest release version before the functionality was developed.
  • History End: The ending point of the history segment where the functionality was developed. It is the SHA-1 ID of the closest release version after the functionality was developed.
  • #Commits: The length of the history segment in which the functionality was developed, expressed as the number of commits.
  • #Files Edited: The number of files changed during the development of the functionality.
  • #LOC +: The number of code lines inserted during the development of the functionality.
  • #LOC -: The number of code lines deleted during the development of the functionality.
  • #Test Cases: The number of test cases in the associated test suite of the functionality.
  • Slice Size: The size of the 1-minimal history slice of each functionality, expressed as the number of commits.
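  • Reduction %: The reduction rate, i.e., the percentage of commits in the history segment that are not part of the 1-minimal history slice and are therefore unrelated to the implementation of the functionality.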
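
The Reduction % column can be recomputed from #Commits and Slice Size. The formula below is not spelled out in this README; it is inferred from the column definitions and agrees with the table rows it was checked against (for example LANG-825 and CONFIGURATION-624). The function name is illustrative.

```python
def reduction_percent(num_commits, slice_size):
    """Percentage of commits in the history segment that lie outside the slice."""
    if num_commits <= 0:
        raise ValueError("num_commits must be positive")
    if not 0 <= slice_size <= num_commits:
        raise ValueError("slice_size must be between 0 and num_commits")
    return round(100.0 * (num_commits - slice_size) / num_commits, 2)


print(reduction_percent(475, 118))  # LANG-825: 75.16
print(reduction_percent(50, 48))    # CONFIGURATION-624: 4.0
```
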

| Functionality ID | History Start | History End | #Commits | #Files Edited | #LOC + | #LOC - | #Test Cases | Slice Size | Reduction % |
|---|---|---|---|---|---|---|---|---|---|
| LANG-825 | bae9f7c3 | 15a51f1d | 475 | 265 | 27630 | 11935 | 2 | 118 | 75.16 |
| LANG-839 | bae9f7c3 | 15a51f1d | 475 | 265 | 27630 | 11935 | 2 | 200 | 57.89 |
| LANG-841 | bae9f7c3 | 15a51f1d | 475 | 265 | 27630 | 11935 | 2 | 200 | 57.89 |
| LANG-906 | bae9f7c3 | 15a51f1d | 475 | 265 | 27630 | 11935 | 5 | 1 | 99.79 |
| LANG-834 | c4ecd75 | 66a3717 | 179 | 121 | 6889 | 1807 | 12 | 12 | 93.3 |
| LANG-944 | c4ecd75 | 66a3717 | 179 | 121 | 6889 | 1807 | 1 | 24 | 86.59 |
| LANG-993 | 24767d6 | 76cc69c | 262 | 146 | 6741 | 2076 | 10 | 6 | 97.71 |
| LANG-999 | 24767d6 | 76cc69c | 262 | 146 | 6741 | 2076 | 5 | 15 | 94.27 |
| LANG-1006 | 24767d6 | 76cc69c | 262 | 146 | 6741 | 2076 | 2 | 14 | 94.66 |
| LANG-1033 | 24767d6 | 76cc69c | 262 | 146 | 6741 | 2076 | 1 | 22 | 91.6 |
| LANG-1088 | 24767d6 | 76cc69c | 262 | 146 | 6741 | 2076 | 2 | 1 | 99.62 |
| LANG-536 | 24767d6 | 76cc69c | 262 | 146 | 6741 | 2076 | 17 | 30 | 88.55 |
| LANG-883 | 24767d6 | 76cc69c | 262 | 146 | 6741 | 2076 | 1 | 36 | 86.26 |
| LANG-1015 | 24767d6 | 76cc69c | 262 | 146 | 6741 | 2076 | 9 | 39 | 85.11 |
| LANG-1021 | 24767d6 | 76cc69c | 262 | 146 | 6741 | 2076 | 16 | 28 | 89.31 |
| LANG-1080 | 24767d6 | 76cc69c | 262 | 146 | 6741 | 2076 | 8 | 38 | 85.5 |
| LANG-1093 | 24767d6 | 76cc69c | 262 | 146 | 6741 | 2076 | 2 | 63 | 75.95 |
| LANG-1050 | 0d5d666 | 36f98d8 | 515 | 309 | 18885 | 6395 | 4 | 8 | 98.45 |
| LANG-1074 | 0d5d666 | 36f98d8 | 515 | 309 | 18885 | 6395 | 9 | 6 | 98.83 |
| LANG-1119 | 0d5d666 | 36f98d8 | 515 | 309 | 18885 | 6395 | 1 | 1 | 99.81 |
| CALCITE-627 | f10ea367 | d60f2aa3 | 51 | 135 | 8274 | 1446 | 2 | 19 | 62.75 |
| CALCITE-655 | f10ea367 | d60f2aa3 | 51 | 135 | 8274 | 1446 | 1 | 19 | 62.75 |
| CALCITE-674 | d60f2aa3 | 495f1859 | 59 | 196 | 14861 | 9173 | 1 | 11 | 81.36 |
| CALCITE-718 | 495f1859 | 0c0c203d | 92 | 304 | 21348 | 7686 | 1 | 14 | 84.78 |
| CALCITE-758 | 495f1859 | 0c0c203d | 92 | 304 | 21348 | 7686 | 1 | 1 | 98.91 |
| CALCITE-811 | 495f1859 | 0c0c203d | 92 | 304 | 21348 | 7686 | 1 | 1 | 98.91 |
| CALCITE-803 | 495f1859 | 0c0c203d | 92 | 304 | 21348 | 7686 | 1 | 1 | 98.91 |
| CALCITE-925 | 0c0c203d | ba6e43c6 | 120 | 468 | 68314 | 6096 | 3 | 1 | 99.17 |
| CALCITE-767 | ba6e43c6 | c4d346b0 | 103 | 465 | 31647 | 13594 | 1 | 8 | 92.23 |
| CALCITE-996 | ba6e43c6 | c4d346b0 | 103 | 465 | 31647 | 13594 | 1 | 1 | 99.03 |
| CALCITE-1003 | ba6e43c6 | c4d346b0 | 103 | 465 | 31647 | 13594 | 25 | 14 | 86.41 |
| CALCITE-1028 | ba6e43c6 | c4d346b0 | 103 | 465 | 31647 | 13594 | 1 | 6 | 94.17 |
| CALCITE-1168 | 8eebfc6d | aeb6bf14 | 122 | 399 | 30975 | 4800 | 3 | 2 | 98.36 |
| CALCITE-1200 | 8eebfc6d | aeb6bf14 | 122 | 399 | 30975 | 4800 | 3 | 2 | 98.36 |
| CALCITE-991 | aeb6bf14 | 08c56b15 | 78 | 295 | 14908 | 3637 | 5 | 1 | 98.72 |
| CALCITE-1288 | aeb6bf14 | 08c56b15 | 78 | 295 | 14908 | 3637 | 1 | 6 | 92.31 |
| CALCITE-1309 | aeb6bf14 | 08c56b15 | 78 | 295 | 14908 | 3637 | 8 | 7 | 91.03 |
| CALCITE-1337 | aeb6bf14 | 08c56b15 | 78 | 295 | 14908 | 3637 | 2 | 5 | 93.59 |
| MNG-4904 | b175144 | 308d4d4 | 51 | 78 | 1816 | 713 | 1 | 7 | 86.27 |
| MNG-4909 | b175144 | 308d4d4 | 51 | 78 | 1816 | 713 | 2 | 7 | 86.27 |
| MNG-4910 | b175144 | 308d4d4 | 51 | 78 | 1816 | 713 | 1 | 7 | 86.27 |
| MNG-4953 | 38ced22 | 0023226 | 47 | 96 | 2448 | 329 | 1 | 6 | 87.23 |
| MNG-5159 | 089a9f8 | 6d37598 | 120 | 318 | 3003 | 1098 | 4 | 2 | 98.33 |
| MNG-5530 | b7e3ce2 | ea8b2b0 | 97 | 160 | 4431 | 4144 | 1 | 1 | 98.97 |
| MNG-5549 | b7e3ce2 | ea8b2b0 | 97 | 160 | 4431 | 4144 | 1 | 13 | 86.6 |
| MNG-5754 | d13c288 | cab6659 | 97 | 235 | 9500 | 3930 | 4 | 8 | 91.75 |
| MNG-5755 | d13c288 | cab6659 | 97 | 235 | 9500 | 3930 | 5 | 7 | 92.78 |
| MNG-5767 | d13c288 | cab6659 | 97 | 235 | 9500 | 3930 | 3 | 21 | 78.35 |
| MNG-5805 | 0ddab5f | bb52d85 | 98 | 341 | 3751 | 3030 | 2 | 11 | 88.78 |
| COMPRESS-295 | 083e7a4 | 1dcab3f | 169 | 181 | 6638 | 1580 | 2 | 1 | 99.41 |
| COMPRESS-296 | 083e7a4 | 1dcab3f | 169 | 181 | 6638 | 1580 | 3 | 37 | 78.11 |
| COMPRESS-313 | 083e7a4 | 1dcab3f | 169 | 181 | 6638 | 1580 | 3 | 40 | 76.33 |
| COMPRESS-327 | 99bc508 | b29395d | 148 | 144 | 4644 | 2006 | 18 | 26 | 82.43 |
| COMPRESS-368 | 99bc508 | b29395d | 148 | 144 | 4644 | 2006 | 6 | 12 | 91.89 |
| COMPRESS-369 | 99bc508 | b29395d | 148 | 144 | 4644 | 2006 | 2 | 10 | 93.24 |
| COMPRESS-373 | 99bc508 | b29395d | 148 | 144 | 4644 | 2006 | 1 | 14 | 90.54 |
| COMPRESS-374 | 99bc508 | b29395d | 148 | 144 | 4644 | 2006 | 8 | 15 | 89.86 |
| COMPRESS-375 | 99bc508 | b29395d | 148 | 144 | 4644 | 2006 | 2 | 1 | 99.32 |
| FLUME-1710 | 31d45f1b | f7560038 | 133 | 258 | 15949 | 2783 | 1 | 1 | 99.25 |
| FLUME-2052 | cda3bd10 | 31d45f1b | 101 | 181 | 14742 | 3097 | 5 | 3 | 97.03 |
| FLUME-2056 | cda3bd10 | 31d45f1b | 101 | 181 | 14742 | 3097 | 1 | 5 | 95.05 |
| FLUME-2130 | cda3bd10 | 31d45f1b | 101 | 181 | 14742 | 3097 | 1 | 3 | 97.03 |
| FLUME-2206 | cda3bd10 | 31d45f1b | 101 | 181 | 14742 | 3097 | 1 | 4 | 96.04 |
| FLUME-2498 | f7560038 | 5e400ea8 | 100 | 428 | 17341 | 8187 | 17 | 65 | 35 |
| FLUME-2628 | f7560038 | 5e400ea8 | 100 | 428 | 17341 | 8187 | 7 | 1 | 99 |
| FLUME-2955 | f7560038 | 5e400ea8 | 100 | 428 | 17341 | 8187 | 1 | 65 | 35 |
| FLUME-2982 | f7560038 | 5e400ea8 | 100 | 428 | 17341 | 8187 | 2 | 35 | 65 |
| PDFBOX-3307 | a281f71 | 9e102f2 | 37 | 42 | 1138 | 268 | 2 | 1 | 97.3 |
| PDFBOX-3069 | 3b5ae83 | 5848e90 | 272 | 255 | 9737 | 5398 | 2 | 1 | 99.63 |
| PDFBOX-3418 | 3b5ae83 | 5848e90 | 272 | 255 | 9737 | 5398 | 2 | 3 | 98.9 |
| PDFBOX-3461 | 3b5ae83 | 5848e90 | 272 | 255 | 9737 | 5398 | 24 | 3 | 98.9 |
| PDFBOX-3262 | 7c1a2c8 | 69a8e03 | 162 | 135 | 3295 | 814 | 1 | 2 | 98.77 |
| CONFIGURATION-466 | 5270237 | f81ff1a | 252 | 694 | 79920 | 80096 | 3 | 13 | 94.84 |
| CONFIGURATION-624 | 89428f1 | 9fb4ad8 | 50 | 34 | 1201 | 655 | 11 | 48 | 4 |
| CONFIGURATION-626 | 89428f1 | 9fb4ad8 | 50 | 34 | 1201 | 655 | 4 | 1 | 98 |
| NET-436 | d8812a3 | 4c3860e | 77 | 99 | 2357 | 774 | 5 | 7 | 90.9 |
| NET-525 | d483631 | abd6711 | 269 | 233 | 6845 | 2393 | 14 | 40 | 85.13 |
| NET-527 | d483631 | abd6711 | 269 | 233 | 6845 | 2393 | 1 | 40 | 85.13 |
| CSV-159 | b230a6f5 | 7310e5c6 | 79 | 28 | 1640 | 713 | 1 | 10 | 87.34 |
| CSV-175 | b230a6f5 | 7310e5c6 | 79 | 28 | 1640 | 713 | 11 | 48 | 39.24 |
| CSV-179 | b230a6f5 | 7310e5c6 | 79 | 28 | 1640 | 713 | 1 | 56 | 29.11 |
| CSV-180 | b230a6f5 | 7310e5c6 | 79 | 28 | 1640 | 713 | 2 | 56 | 29.11 |
| IO-126 | 61519de4 | f6724182 | 140 | 140 | 7365 | 1242 | 2 | 6 | 95.71 |
| IO-129 | 61519de4 | f6724182 | 140 | 140 | 7365 | 1242 | 7 | 10 | 92.86 |
| IO-130 | 61519de4 | f6724182 | 140 | 140 | 7365 | 1242 | 4 | 11 | 92.14 |
| IO-135 | 61519de4 | f6724182 | 140 | 140 | 7365 | 1242 | 4 | 23 | 83.57 |
| IO-138 | 61519de4 | f6724182 | 140 | 140 | 7365 | 1242 | 7 | 13 | 90.71 |
| IO-144 | 61519de4 | f6724182 | 140 | 140 | 7365 | 1242 | 2 | 1 | 99.29 |
| IO-145 | 61519de4 | f6724182 | 140 | 140 | 7365 | 1242 | 2 | 61 | 56.43 |
| IO-148 | 61519de4 | f6724182 | 140 | 140 | 7365 | 1242 | 2 | 30 | 78.57 |
| IO-153 | 61519de4 | f6724182 | 140 | 140 | 7365 | 1242 | 6 | 56 | 60 |
| IO-173 | 8de491fc | b1b9f1af | 136 | 182 | 5647 | 1681 | 2 | 32 | 76.47 |
| IO-275 | 8de491fc | b1b9f1af | 136 | 182 | 5647 | 1681 | 2 | 1 | 99.26 |
| IO-288 | 8de491fc | b1b9f1af | 136 | 182 | 5647 | 1681 | 81 | 16 | 88.24 |
| IO-290 | 8de491fc | b1b9f1af | 136 | 182 | 5647 | 1681 | 2 | 5 | 96.32 |
| IO-291 | 8de491fc | b1b9f1af | 136 | 182 | 5647 | 1681 | 10 | 24 | 82.35 |
| IO-297 | 8de491fc | b1b9f1af | 136 | 182 | 5647 | 1681 | 9 | 13 | 90.44 |
| IO-305 | 8de491fc | b1b9f1af | 136 | 182 | 5647 | 1681 | 10 | 83 | 38.97 |
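
For scripted evaluation, the data rows of the table above can be loaded into typed records. The following is a rough sketch that assumes rows are copied from this README as text; the `Entry` class, its field names, and `parse_row` are hypothetical and not part of the dataset.

```python
from dataclasses import dataclass


@dataclass
class Entry:
    functionality_id: str
    history_start: str
    history_end: str
    commits: int
    files_edited: int
    loc_added: int
    loc_deleted: int
    test_cases: int
    slice_size: int
    reduction: float

    @property
    def project(self):
        """JIRA project key, e.g. 'LANG' for 'LANG-825'."""
        return self.functionality_id.rsplit("-", 1)[0]


def parse_row(line):
    """Parse one data row; cells may be separated by spaces or by '|'."""
    cells = line.replace("|", " ").split()
    if len(cells) != 10:
        raise ValueError(f"expected 10 cells, got {len(cells)}")
    return Entry(cells[0], cells[1], cells[2],
                 *map(int, cells[3:9]), float(cells[9]))


rows = [
    "LANG-825 bae9f7c3 15a51f1d 475 265 27630 11935 2 118 75.16",
    "IO-305 8de491fc b1b9f1af 136 182 5647 1681 10 83 38.97",
]
for entry in map(parse_row, rows):
    print(entry.project, entry.functionality_id, entry.slice_size)
```
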
