Merge branch 'acl-org:master' into master
yufei118liu authored Jul 12, 2024
2 parents 17c4047 + f2b9278 commit f73c620
Showing 118 changed files with 19,826 additions and 208 deletions.
Binary file added .coverage
Binary file not shown.
8 changes: 8 additions & 0 deletions .github/ISSUE_TEMPLATE/02-name-correction.yml
Original file line number Diff line number Diff line change
@@ -5,6 +5,14 @@ labels: ["correction", "metadata"]
assignees:
- anthology-assist
body:
- type: markdown
attributes:
value: >
This form will report author metadata issues to Anthology staff.
For simple cases (where paper metadata in the [XML](https://github.com/acl-org/acl-anthology/tree/master/data/xml)
record doesn't match the PDF, or
[`name_variants.yaml`](https://github.com/acl-org/acl-anthology/blob/master/data/yaml/name_variants.yaml) needs modification),
submitting a __pull request__ instead will expedite the process. Thanks!
- type: textarea
id: name_pages_affected
attributes:
18 changes: 16 additions & 2 deletions .github/ISSUE_TEMPLATE/04-ingestion-request.yml
@@ -24,12 +24,19 @@ body:
placeholder: ex. emnlp, repl4nlp
validations:
required: true
- type: input
id: venue_sig
attributes:
label: "ACL SIG(s) sponsoring or endorsing the whole venue"
description: |
Provide a comma-separated list of any SIGs that apply to the whole venue. If there are multiple subvenues/volumes with different SIGs, provide the mapping under Supporting Information.
placeholder: ex. SIGLEX, SIGSEM
- type: input
id: volume_title
attributes:
label: Volume Title
description: |
What is the title of the volume that should be published?
What is the title of the (main) volume that should be published?
placeholder: ex. Proceedings of the 2019 Meeting of the Conference on Empirical Methods in Natural Language Processing (EMNLP)
validations:
required: true
@@ -61,9 +68,16 @@ body:
description: |
When would you like the material to be published on the ACL Anthology? If you are submitting material that can be published immediately (e.g. for conferences that already happened in the past), you can leave this field blank.
placeholder: ex. 2023-12-31
- type: input
id: volume_address
attributes:
label: Location
description: |
What address should be included in bibliography entries, if any? For conferences this is the location of the conference. For a fully-online event use "Online", optionally following the host team location. Ensure the address field is consistent across submitted volumes.
placeholder: ex. Barcelona, Spain (Online)
- type: textarea
id: ingestion_information
attributes:
label: Supporting Information
description: |
If there is anything else we should know about this ingestion request, please provide the information here. You can also use this field to **provide links or attach files** of the material, if you already have them.
If there is anything else we should know about this ingestion request, please provide the information here. E.g. for venues with multiple volumes, list them with the volume identifier, volume title, and any SIGs for the volume. You can also use this field to **provide links or attach files** of the material, if you already have them.
11 changes: 11 additions & 0 deletions .github/ingestion-review-checklist.md
@@ -0,0 +1,11 @@
1. [ ] In the GitHub sidebar, add workshop to work items and the current milestone
1. [ ] In the GitHub sidebar, make sure to link to a corresponding PR under "Development"
1. [ ] Make sure the branch is merged with the latest `master` branch
1. [ ] Ensure that there are editors listed in the `<meta>` block
1. [ ] If it's a workshop, add a `<venue>ws</venue>` tag
1. [ ] Add events to their relevant SIGs
1. [ ] Look at the venue listing for prior years, and ensure that the new volume titles are consistent. You can do this by clicking on the venue name from a paper page, which will take you to the venue listing.
1. [ ] Navigate to the event page preview (e.g., https://preview.aclanthology.org/icnlsp-ingestion/events/icnlsp-2021/), and page through, to see if there are any glaring mistakes
1. [ ] Skim through the complete listing, looking for mis-parsed author names.
1. [ ] Download the frontmatter and verify that the table of contents matches at least three randomly-selected papers
1. [ ] Download 3–5 PDFs (including the first and last one) and make sure they are correct (title, authors, page numbers).
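
The two XML items in the checklist above (editors present in `<meta>`, and a `<venue>ws</venue>` tag for workshops) could be checked mechanically. The following is a minimal sketch, not part of this commit: the sample collection, the `check_volume_meta` helper, and the assumption that editors appear as `<editor>` children of `<meta>` are all illustrative.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample volume; the IDs, title, and editor are made up for illustration.
SAMPLE = """
<collection id="2021.icnlsp">
  <volume id="1" type="proceedings">
    <meta>
      <booktitle>Proceedings of an Example Workshop</booktitle>
      <editor><first>Ada</first><last>Example</last></editor>
      <venue>ws</venue>
    </meta>
  </volume>
</collection>
"""


def check_volume_meta(volume, is_workshop):
    """Return a list of human-readable problems found in one <volume> element."""
    meta = volume.find("meta")
    if meta is None:
        return ["missing <meta> block"]
    problems = []
    # Assumption: editors are stored as <editor> children of <meta>.
    if not meta.findall("editor"):
        problems.append("no editors listed in <meta>")
    venues = [venue.text for venue in meta.findall("venue")]
    if is_workshop and "ws" not in venues:
        problems.append("workshop volume lacks <venue>ws</venue>")
    return problems


root = ET.fromstring(SAMPLE)
for volume in root.iter("volume"):
    print(volume.get("id"), check_volume_meta(volume, is_workshop=True))
```

An empty list means neither problem was found; the remaining checklist items (title consistency, PDF spot checks) still need a human reviewer.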
32 changes: 32 additions & 0 deletions .github/workflows/link-to-checklist.yml
@@ -0,0 +1,32 @@
name: link-to-checklist

on:
  workflow_dispatch:
  pull_request_target:
    types: [opened]

jobs:
  add-review-checklist:
    if : ${{ github.event_name == 'pull_request_target' && github.event.action == 'opened' && startsWith(github.event.pull_request.title, 'ingestion') == true}}
    runs-on: ubuntu-latest
    steps:
      - name: Check out repository
        uses: actions/checkout@v3
      - name: Print event details
        run: |
          echo "Event Name: ${{ github.event_name }}"
          echo "Action: ${{ github.event.action }}"
          echo "PR Title: ${{ github.event.pull_request.title }}"
          echo "Starts with ingestion: ${{ startsWith(github.event.pull_request.title, 'ingestion') }}"
      - name: Add review checklist
        uses: actions/github-script@v6
        with:
          script: |
            const fs = require('fs');
            const review_checklist = fs.readFileSync('.github/ingestion-review-checklist.md', 'utf8');
            github.rest.pulls.update({
              pull_number: context.issue.number,
              owner: context.repo.owner,
              repo: context.repo.repo,
              body: context.payload.pull_request.body + review_checklist,
            });
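
The workflow above appends the checklist to the pull request body when the title starts with "ingestion". The following Python sketch restates that logic for illustration only; it is not the workflow's code. The function names are hypothetical, and the null handling and blank-line separator are additions that the workflow itself does not perform (an empty PR description can arrive as null, which plain string concatenation in JavaScript would render as the text "null").

```python
from typing import Optional


def should_add_checklist(event_name: str, action: str, title: str) -> bool:
    """Mirror the job's `if:` condition.

    GitHub's startsWith() expression is documented as not case sensitive,
    so the title is lowercased before comparing.
    """
    return (
        event_name == "pull_request_target"
        and action == "opened"
        and title.lower().startswith("ingestion")
    )


def build_body(body: Optional[str], checklist: str) -> str:
    """Append the checklist to an existing PR description.

    Assumption for this sketch: a missing description is treated as empty,
    and a blank line separates the description from the checklist.
    """
    existing = (body or "").rstrip()
    if not existing:
        return checklist
    return existing + "\n\n" + checklist


if should_add_checklist("pull_request_target", "opened", "Ingestion: ICNLSP 2021"):
    print(build_body(None, "1. [ ] Example checklist item"))
```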
16 changes: 16 additions & 0 deletions .github/workflows/print-info.yml
@@ -0,0 +1,16 @@
name: print-info

on:
  workflow_dispatch:
  pull_request_target:
    types: [opened]

jobs:
  add-review-checklist:
    runs-on: ubuntu-latest
    steps:
      - name: Print event details
        run: |
          echo "Event Name: ${{ github.event_name }}"
          echo "Action: ${{ github.event.action }}"
          echo "PR Title: ${{ github.event.pull_request.title }}"
1 change: 1 addition & 0 deletions .gitignore
@@ -23,6 +23,7 @@ Icon

#######################################
**/__pycache__
.idea

# generated website
/build/
13 changes: 7 additions & 6 deletions bin/ingest_mitpress.py
@@ -255,7 +255,7 @@ def get_article_journal_info(xml_front_node: etree.Element, is_tacl: bool) -> st
date=string_date_text,
)
logging.debug(format_string.format(**data))
return format_string.format(**data), issue_text
return format_string.format(**data), issue_text, volume_text


def process_xml(xml: Path, is_tacl: bool) -> Optional[etree.Element]:
@@ -266,7 +266,7 @@ def process_xml(xml: Path, is_tacl: bool) -> Optional[etree.Element]:
root = tree.getroot()
front = root.find("front", root.nsmap)

info, issue = get_article_journal_info(front, is_tacl)
info, issue, volume = get_article_journal_info(front, is_tacl)

paper = etree.Element("paper")

@@ -303,11 +303,11 @@ def process_xml(xml: Path, is_tacl: bool) -> Optional[etree.Element]:
pages.text = "–".join(pages_tuple) # en-dash, not hyphen!
paper.append(pages)

return paper, info, issue
return paper, info, issue, volume


def issue_info_to_node(
issue_info: str, year_: str, volume_id: str, venue: str
issue_info: str, year_: str, journal_issue: str, venue: str, volume: str
) -> etree.Element:
"""Creates the meta block for a new issue / volume"""
meta = make_simple_element("meta")
@@ -339,6 +339,7 @@ def issue_info_to_node(

make_simple_element("year", str(year_), parent=meta)
make_simple_element("venue", venue, parent=meta)
make_simple_element("journal-volume", volume, parent=meta)

return meta

@@ -378,7 +379,7 @@ def main(args):

papers = []
for xml in sorted(args.root_dir.glob("*.xml")):
papernode, issue_info, issue = process_xml(xml, is_tacl)
papernode, issue_info, issue, volume = process_xml(xml, is_tacl)
if papernode is None or papernode.find("title").text.startswith("Erratum: “"):
continue

@@ -416,7 +417,7 @@ def sort_papers_by_page(paper_tuple):
"volume", attrib={"id": issue, "type": "journal"}, parent=collection
)
volume_xml.append(
issue_info_to_node(issue_info, year, collection_id, venue)
issue_info_to_node(issue_info, year, issue, venue, volume)
)
paper_id = 1
else:
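
The hunks above thread a new `volume` value from `get_article_journal_info` through `process_xml` into `issue_info_to_node`, which now writes it as a `<journal-volume>` element. A minimal, self-contained sketch of that last step follows. It uses the standard library rather than `lxml`; `make_simple_element` here is a hypothetical stand-in for the repository helper of the same name, and `build_meta` and the sample values are illustrative only.

```python
import xml.etree.ElementTree as ET


def make_simple_element(tag, text=None, parent=None):
    """Hypothetical stand-in for the repository helper of the same name."""
    element = ET.Element(tag) if parent is None else ET.SubElement(parent, tag)
    if text is not None:
        element.text = text
    return element


def build_meta(year, venue, volume):
    """Build a reduced <meta> block containing only the fields shown in the diff."""
    meta = make_simple_element("meta")
    make_simple_element("year", str(year), parent=meta)
    make_simple_element("venue", venue, parent=meta)
    # New in this commit: the journal volume is recorded alongside year and venue.
    make_simple_element("journal-volume", volume, parent=meta)
    return meta


# Sample values are made up for illustration.
meta = build_meta("2024", "tacl", "12")
print(ET.tostring(meta, encoding="unicode"))
```

Because `process_xml` now returns four values, every caller has to unpack four names, which is why the loop in `main` changes in the same commit.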
1 change: 1 addition & 0 deletions bin/requirements.txt
@@ -1,3 +1,4 @@
filelock==3.15.1
black~=23.9.0
citeproc-py
citeproc-py-styles
2 changes: 1 addition & 1 deletion data/xml/2014.eamt.xml
@@ -65,7 +65,7 @@
<paper id="6">
<title>Translation model based weighting for phrase extraction</title>
<author><first>Saab</first><last>Mansour</last></author>
<author><first>Herman</first><last>Ney</last></author>
<author><first>Hermann</first><last>Ney</last></author>
<pages>35–43</pages>
<url hash="3956ad0c">2014.eamt-1.6</url>
<bibkey>mansour-ney-2014-translation</bibkey>
2 changes: 2 additions & 0 deletions data/xml/2020.aacl.xml
@@ -462,6 +462,7 @@
<bibkey>nadeem-etal-2020-systematic</bibkey>
<pwccode url="https://github.com/moinnadeem/characterizing-sampling-algorithms" additional="false">moinnadeem/characterizing-sampling-algorithms</pwccode>
<pwcdataset url="https://paperswithcode.com/dataset/wikitext-103">WikiText-103</pwcdataset>
<pwcdataset url="https://paperswithcode.com/dataset/wikitext-2">WikiText-2</pwcdataset>
</paper>
<paper id="37">
<title><fixed-case>C</fixed-case>hinese Content Scoring: Open-Access Datasets and Features on Different Segmentation Levels</title>
@@ -1553,6 +1554,7 @@
<url hash="ba6e2aa3">2020.aacl-demo.6</url>
<bibkey>wang-etal-2020-fairseq</bibkey>
<pwccode url="https://github.com/pytorch/fairseq" additional="true">pytorch/fairseq</pwccode>
<pwcdataset url="https://paperswithcode.com/dataset/covost2">CoVoST2</pwcdataset>
<pwcdataset url="https://paperswithcode.com/dataset/librispeech">LibriSpeech</pwcdataset>
<pwcdataset url="https://paperswithcode.com/dataset/must-c">MuST-C</pwcdataset>
</paper>
4 changes: 3 additions & 1 deletion data/xml/2020.acl.xml
@@ -995,7 +995,7 @@
<doi>10.18653/v1/2020.acl-main.68</doi>
<video href="http://slideslive.com/38928912"/>
<bibkey>li-etal-2020-rigid</bibkey>
<pwccode url="https://github.com/lipiji/SongNet" additional="false">lipiji/SongNet</pwccode>
<pwccode url="https://github.com/lipiji/SongNet" additional="true">lipiji/SongNet</pwccode>
</paper>
<paper id="69">
<title>Syn-<fixed-case>QG</fixed-case>: Syntactic and Shallow Semantic Rules for Question Generation</title>
@@ -4006,6 +4006,7 @@
<bibkey>press-etal-2020-improving</bibkey>
<pwccode url="" additional="true"/>
<pwcdataset url="https://paperswithcode.com/dataset/wikitext-103">WikiText-103</pwcdataset>
<pwcdataset url="https://paperswithcode.com/dataset/wikitext-2">WikiText-2</pwcdataset>
</paper>
<paper id="271">
<title>Single Model Ensemble using Pseudo-Tags and Distinct Vectors</title>
@@ -12696,6 +12697,7 @@
<bibkey>bhatt-etal-2020-much</bibkey>
<pwccode url="https://github.com/bhattg/Decay-RNN-ACL-SRW2020" additional="false">bhattg/Decay-RNN-ACL-SRW2020</pwccode>
<pwcdataset url="https://paperswithcode.com/dataset/wikitext-103">WikiText-103</pwcdataset>
<pwcdataset url="https://paperswithcode.com/dataset/wikitext-2">WikiText-2</pwcdataset>
</paper>
<paper id="34">
<title>Unsupervised Multilingual Sentence Embeddings for Parallel Corpus Mining</title>
1 change: 1 addition & 0 deletions data/xml/2020.clinicalnlp.xml
@@ -258,6 +258,7 @@
<doi>10.18653/v1/2020.clinicalnlp-1.16</doi>
<video href="https://slideslive.com/38939821"/>
<bibkey>luo-etal-2020-knowledge</bibkey>
<pwccode url="" additional="true"/>
<pwcdataset url="https://paperswithcode.com/dataset/guesswhat">GuessWhat?!</pwcdataset>
</paper>
<paper id="17">
3 changes: 2 additions & 1 deletion data/xml/2020.coling.xml
@@ -4761,6 +4761,7 @@
<pwcdataset url="https://paperswithcode.com/dataset/glue">GLUE</pwcdataset>
<pwcdataset url="https://paperswithcode.com/dataset/squad">SQuAD</pwcdataset>
<pwcdataset url="https://paperswithcode.com/dataset/wikitext-103">WikiText-103</pwcdataset>
<pwcdataset url="https://paperswithcode.com/dataset/wikitext-2">WikiText-2</pwcdataset>
</paper>
<paper id="356">
<title>How <fixed-case>LSTM</fixed-case> Encodes Syntax: Exploring Context Vectors and Semi-Quantization on Natural Text</title>
@@ -7997,7 +7998,7 @@
<url hash="355d274d">2020.coling-main.598</url>
<doi>10.18653/v1/2020.coling-main.598</doi>
<bibkey>chan-etal-2020-germans</bibkey>
<pwccode url="https://github.com/dbmdz/berts" additional="true">dbmdz/berts</pwccode>
<pwccode url="https://github.com/dbmdz/berts" additional="false">dbmdz/berts</pwccode>
</paper>
<paper id="599">
<title>Language Model Transformers as Evaluators for Open-domain Dialogues</title>
1 change: 1 addition & 0 deletions data/xml/2020.conll.xml
@@ -643,6 +643,7 @@
<doi>10.18653/v1/2020.conll-1.49</doi>
<bibkey>eisape-etal-2020-cloze</bibkey>
<pwcdataset url="https://paperswithcode.com/dataset/wikitext-103">WikiText-103</pwcdataset>
<pwcdataset url="https://paperswithcode.com/dataset/wikitext-2">WikiText-2</pwcdataset>
</paper>
<paper id="50">
<title>Disentangling dialects: a neural approach to <fixed-case>I</fixed-case>ndo-<fixed-case>A</fixed-case>ryan historical phonology and subgrouping</title>
6 changes: 4 additions & 2 deletions data/xml/2020.emnlp.xml
@@ -413,12 +413,12 @@
<doi>10.18653/v1/2020.emnlp-main.26</doi>
<video href="https://slideslive.com/38938866"/>
<bibkey>madureira-schlangen-2020-incremental</bibkey>
<revision id="1" href="2020.emnlp-main.26v1" hash="09d22bbc"/>
<revision id="2" href="2020.emnlp-main.26v2" hash="3ba95a3f" date="2024-05-07">Added a few missing citations and fixed results of a previously wrong implementation of one secondary evaluation metric.</revision>
<pwccode url="https://github.com/briemadu/inc-bidirectional" additional="false">briemadu/inc-bidirectional</pwccode>
<pwcdataset url="https://paperswithcode.com/dataset/atis">ATIS</pwcdataset>
<pwcdataset url="https://paperswithcode.com/dataset/ontonotes-5-0">OntoNotes 5.0</pwcdataset>
<pwcdataset url="https://paperswithcode.com/dataset/snips">SNIPS</pwcdataset>
<revision id="1" href="2020.emnlp-main.26v1" hash="09d22bbc"/>
<revision id="2" href="2020.emnlp-main.26v2" hash="3ba95a3f" date="2024-05-07">Added a few missing citations and fixed results of a previously wrong implementation of one secondary evaluation metric.</revision>
</paper>
<paper id="27">
<title>Augmented Natural Language for Generative Sequence Labeling</title>
@@ -6348,6 +6348,7 @@
<bibkey>shen-etal-2020-blank</bibkey>
<pwccode url="https://github.com/Varal7/blank_language_model" additional="false">Varal7/blank_language_model</pwccode>
<pwcdataset url="https://paperswithcode.com/dataset/wikitext-103">WikiText-103</pwcdataset>
<pwcdataset url="https://paperswithcode.com/dataset/wikitext-2">WikiText-2</pwcdataset>
</paper>
<paper id="421">
<title><fixed-case>COD3S</fixed-case>: Diverse Generation with Discrete Semantic Signatures</title>
@@ -9712,6 +9713,7 @@
<video href="https://slideslive.com/38938778"/>
<bibkey>khoury-etal-2020-vector</bibkey>
<pwcdataset url="https://paperswithcode.com/dataset/wikitext-103">WikiText-103</pwcdataset>
<pwcdataset url="https://paperswithcode.com/dataset/wikitext-2">WikiText-2</pwcdataset>
</paper>
<paper id="641">
<title>The importance of fillers for text representations of speech transcripts</title>
1 change: 1 addition & 0 deletions data/xml/2020.eval4nlp.xml
@@ -193,6 +193,7 @@
<bibkey>dudy-bedrick-2020-words</bibkey>
<pwccode url="https://github.com/shiranD/word_level_evaluation" additional="false">shiranD/word_level_evaluation</pwccode>
<pwcdataset url="https://paperswithcode.com/dataset/wikitext-103">WikiText-103</pwcdataset>
<pwcdataset url="https://paperswithcode.com/dataset/wikitext-2">WikiText-2</pwcdataset>
</paper>
<paper id="14">
<title>On Aligning <fixed-case>O</fixed-case>pen<fixed-case>IE</fixed-case> Extractions with Knowledge Bases: A Case Study</title>
4 changes: 4 additions & 0 deletions data/xml/2020.findings.xml
@@ -115,6 +115,7 @@
<bibkey>huang-etal-2020-reducing</bibkey>
<pwcdataset url="https://paperswithcode.com/dataset/sst">SST</pwcdataset>
<pwcdataset url="https://paperswithcode.com/dataset/wikitext-103">WikiText-103</pwcdataset>
<pwcdataset url="https://paperswithcode.com/dataset/wikitext-2">WikiText-2</pwcdataset>
</paper>
<paper id="8">
<title>Improving Text Understanding via Deep Syntax-Semantics Communication</title>
@@ -3739,6 +3740,7 @@
<doi>10.18653/v1/2020.findings-emnlp.250</doi>
<bibkey>lioutas-etal-2020-improving</bibkey>
<pwcdataset url="https://paperswithcode.com/dataset/wikitext-103">WikiText-103</pwcdataset>
<pwcdataset url="https://paperswithcode.com/dataset/wikitext-2">WikiText-2</pwcdataset>
</paper>
<paper id="251">
<title><fixed-case>P</fixed-case>harm<fixed-case>MT</fixed-case>: A Neural Machine Translation Approach to Simplify Prescription Directions</title>
@@ -6438,6 +6440,7 @@
<pwcdataset url="https://paperswithcode.com/dataset/ncbi-disease-1">NCBI Disease</pwcdataset>
<pwcdataset url="https://paperswithcode.com/dataset/sst">SST</pwcdataset>
<pwcdataset url="https://paperswithcode.com/dataset/wikitext-103">WikiText-103</pwcdataset>
<pwcdataset url="https://paperswithcode.com/dataset/wikitext-2">WikiText-2</pwcdataset>
</paper>
<paper id="435">
<title><fixed-case>E</fixed-case>xploiting <fixed-case>U</fixed-case>nsupervised <fixed-case>D</fixed-case>ata for <fixed-case>E</fixed-case>motion <fixed-case>R</fixed-case>ecognition in <fixed-case>C</fixed-case>onversations</title>
@@ -6468,6 +6471,7 @@
<pwcdataset url="https://paperswithcode.com/dataset/imdb-movie-reviews">IMDb Movie Reviews</pwcdataset>
<pwcdataset url="https://paperswithcode.com/dataset/sst">SST</pwcdataset>
<pwcdataset url="https://paperswithcode.com/dataset/wikitext-103">WikiText-103</pwcdataset>
<pwcdataset url="https://paperswithcode.com/dataset/wikitext-2">WikiText-2</pwcdataset>
</paper>
<paper id="437">
<title>Speaker or Listener? The Role of a Dialog Agent</title>
1 change: 1 addition & 0 deletions data/xml/2020.msr.xml
@@ -33,6 +33,7 @@
<bibkey>mille-etal-2020-third</bibkey>
<pwccode url="https://gitlab.com/talnupf/ud2deep" additional="false">talnupf/ud2deep</pwccode>
<pwcdataset url="https://paperswithcode.com/dataset/wikitext-103">WikiText-103</pwcdataset>
<pwcdataset url="https://paperswithcode.com/dataset/wikitext-2">WikiText-2</pwcdataset>
</paper>
<paper id="2">
<title><fixed-case>BME</fixed-case>-<fixed-case>TUW</fixed-case> at <fixed-case>SR</fixed-case>’20: Lexical grammar induction for surface realization</title>
1 change: 1 addition & 0 deletions data/xml/2020.scil.xml
@@ -356,6 +356,7 @@
<bibkey>hu-etal-2020-closer</bibkey>
<pwccode url="https://github.com/jennhu/reflexive-anaphor-licensing" additional="false">jennhu/reflexive-anaphor-licensing</pwccode>
<pwcdataset url="https://paperswithcode.com/dataset/wikitext-103">WikiText-103</pwcdataset>
<pwcdataset url="https://paperswithcode.com/dataset/wikitext-2">WikiText-2</pwcdataset>
</paper>
<paper id="40">
<title><fixed-case>M</fixed-case>ona<fixed-case>L</fixed-case>og: a Lightweight System for Natural Language Inference Based on Monotonicity</title>