Feedback on the new Anthology website #170

Closed
mbollmann opened this issue Mar 10, 2019 · 118 comments

@mbollmann
Member

mbollmann commented Mar 10, 2019

This thread is intended to collect all feedback, suggestions, bug reports, etc. for the new Anthology website in the static-rewrite branch.

(Edit: live demo here at http://aclweb.org/anthology)

If you do not have a GitHub account, you're also welcome to send me feedback via e-mail ([email protected]) or Twitter (@mmbollmann)!

Known Issues

  • The search functionality now uses Google Custom Search. We're still fine-tuning its settings and waiting for some pages to be indexed, so please don't report any weird search behaviour just yet.
  • Author name variations (Authors being stored under multiple spellings #86) are an open problem that we plan to address before the site launch.
@mbollmann mbollmann added the help wanted Interesting but beyond current volunteer bandwidth label Mar 10, 2019
@mbollmann mbollmann self-assigned this Mar 10, 2019
@mbollmann mbollmann pinned this issue Mar 10, 2019
@akoehn
Member

akoehn commented Mar 11, 2019

I really like it, especially the speed!

There is a display:none span containing the text "bib" in the bibtex block inside the acl-paper-link-block block. When using a text browser, this leads to the text being BibTeXbib. That span should be removed.

As a minor comment: Could you be a bit more specific about the hardware requirements for building the Anthology? How much time and memory does a build take? "A considerable amount of memory" could be 8 GB or 512, depending on whom you ask :-)

@davidweichiang
Collaborator

It looks great! On Safari, when you click on a pdf/bib link and then click the browser's back button, the little callout ("Open PDF" or "Export BibTeX") remains on.

@danielgildea
Collaborator

awesome!!!!!!!!!

@texttheater
Contributor

I think it would look better if the header had the same width as the content. I.e., the ACL logo would move to the left and the search box to the right, in order to align with the content.

@desilinguist
Member

Looks awesome! Great work! 👏

@mjpost
Member

mjpost commented Mar 11, 2019

What's the reason for inserting newlines in the bib field values? (for example, in booktitle here, and titles elsewhere).

@stevenbedrick

Disclaimer: This is about search, but is not about weird search behavior as such. Is Google Custom Search the long-term search solution for the new version of the Anthology? It is inherently waaaaay less functional than the existing search system on the current Anthology- for example, the current search page has really great result faceting, etc.

@stevenbedrick

And I just saw #165 - glad to see that something more flexible is on the roadmap/radar. In the meantime, we could also link to the DFKI "ACL Anthology Searchbench".

@aryamccarthy
Member

On mobile, the magnifying glass of the search bar gets forced to the next row for me.

@aryamccarthy
Member

Is the BibTeX generation handling special characters properly?

This entry has weird quotation marks in the abstract. http://aclweb.org/anthology/papers/C/C18/C18-1137.bib
This one has weird things going on in the title field. http://aclweb.org/anthology/papers/K/K18/K18-3001.bib

@danielhers

When there is just one paper in a conference, the noun after the number should be singular "paper" and not "papers".
Example: Proceedings of the Pilot SENSEVAL 1 papers in http://www.aclweb.org/anthology/venues/semeval/

@rahular

rahular commented Mar 12, 2019

Awesome work! One small issue I saw is that when I am browsing through papers in pages like this, there is no way for me to scroll back to the top instantly. The up button which is present at the beginning of the page could be floating around a corner.

@danielgildea
Collaborator

> Is the BibTeX generation handling special characters properly?

Fixed by [6bbc5a1]

@davidweichiang
Collaborator

Re: #170 (comment), when I view in Chrome or iOS Safari, I see mojibake, but on macOS Safari, it looks fine.

Although @danielgildea's fix puts the .bib file into ASCII (as it should be), I wonder whether, as a failsafe, the server could put Content-Type: application/x-bibtex; charset=utf-8 into the response header.
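
To make the failure mode concrete, here is a minimal Python sketch. It is not the Anthology's server setup, which this thread does not describe. `misdecode` simulates a client that reads UTF-8 bytes as Windows-1252, one common cause of mojibake when no charset is declared (whether that is what Chrome did here is an assumption). `BibAwareHandler` is a hypothetical handler for Python's built-in `http.server` that sends the suggested header for `.bib` files.

```python
from http.server import SimpleHTTPRequestHandler

# The header value suggested in the comment above.
BIB_CONTENT_TYPE = "application/x-bibtex; charset=utf-8"


def misdecode(text: str) -> str:
    """Simulate a client that interprets UTF-8 bytes as Windows-1252."""
    return text.encode("utf-8").decode("cp1252")


class BibAwareHandler(SimpleHTTPRequestHandler):
    """Hypothetical static-file handler that declares UTF-8 for .bib files."""

    def guess_type(self, path):
        # http.server calls guess_type() to fill in the Content-type header.
        if str(path).endswith(".bib"):
            return BIB_CONTENT_TYPE
        return super().guess_type(path)
```

Calling `misdecode("CoNLL–SIGMORPHON")` turns the en dash into three unrelated characters, which is the kind of garbling an explicit charset is meant to prevent.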

@danielgildea
Collaborator

> What's the reason for inserting newlines in the bib field values? (for example, in booktitle here, and titles elsewhere).

anth2bib.py is just passing through newlines that are in the titles in the xml files.
I can't figure out where they come from originally. Personally, I think they make the bibtex more readable anyway.

anth2bib.py does insert newlines between author names. I think this makes it more readable,
especially when names are in "Last, First" format.

@aryamccarthy
Member

> > Is the BibTeX generation handling special characters properly?
>
> Fixed by [6bbc5a1]

I'm seeing "CoNLL–SIGMORPHON" in macOS Safari, instead of "CoNLL–SIGMORPHON". Does the build script need to be re-run to show the fix?

@mbollmann
Member Author

> Does the build script need to be re-run to show the fix?

Absolutely. Fixes are not reflected on the live website until @mjpost rebuilds it and pushes it there.

@mjpost
Member

mjpost commented Mar 12, 2019

I agree the one-line-per-author variant is more readable and is fine with me, as long as we make sure to use spaces and not tabs (per #16).

I'll rebuild soon, by tonight at the latest. Once we have continuous integration checks built (#102) and other checks against commits to the master branch, we can have it automated.

@mbollmann
Member Author

mbollmann commented Mar 12, 2019

Thanks for all the feedback so far! I've implemented a bunch of minor layout fixes based on the comments here (with the same caveat as above: will not be live until Matt rebuilds).

> Disclaimer: This is about search, but is not about weird search behavior as such. Is Google Custom Search the long-term search solution for the new version of the Anthology? It is inherently waaaaay less functional than the existing search system on the current Anthology- for example, the current search page has really great result faceting, etc.

I believe Google Custom Search is much more powerful than people give it credit for, and it offers customization options that should allow for similar result faceting and features as before. However, that requires some more work on my part, and it wasn't really possible to implement and test this earlier as, by its very nature, it requires the new site to be live and getting indexed by Google first.

I'd really like to advocate for some more patience here over the coming weeks as I'm hoping to improve this. Maintaining a custom-made search solution is a huge liability IMO, and I would really like for people to give the Google version a fair chance first.

@stevenbedrick

@mbollmann That's totally fair, and thank you for the reply. I certainly see the value of using an off-the-shelf/hosted search platform in general, and also of using Google Custom Search in particular as a "getting things up and running" solution. For the sake of clarity, my concerns are less about the search behavior of GCS- if anybody can build a decent text search engine, it'd be Google! My concerns are more about search UI/UX- result faceting, etc. I'm happy to give GCS more of a chance, and am looking forward to seeing what we're able to do with GCS in terms of customization. Thank you (all of you!) for your efforts on this project; I do very much like the redesign overall and am excited to see it evolve!

@mjpost
Member

mjpost commented Mar 12, 2019

Okay, rebuilt. I also merged in master which had some corrections.

@aryamccarthy
Member

Unclear whether this is a parsing error or a data error: this BibTeX has no article title.

@mjpost
Member

mjpost commented Mar 13, 2019

Thanks! The title appears in the HTML: (http://aclweb.org/anthology/D13-1088/) and is in the XML, so I'm not sure what's going on here.

@mbollmann
Member Author

> Thanks! The title appears in the HTML: (http://aclweb.org/anthology/D13-1088/) and is in the XML, so I'm not sure what's going on here.

Pretty sure it's related somehow to the title starting with <fixed-case>. It's fixed with the refactored BibTeX generation in 7cd20c3.

@mjpost
Member

mjpost commented Mar 13, 2019

Ah, I was looking at the master branch. I’ll rebuild tonight.

@mjpost
Member

mjpost commented Mar 13, 2019

Done, and the problem is indeed fixed. Thanks!

@akoehn akoehn mentioned this issue Jul 18, 2019
@LuckyJLin

I can't read this PDF https://www.aclweb.org/anthology/W19-3604
There is nothing in the page.

@akoehn
Member

akoehn commented Jul 30, 2019 via email

@LuckyJLin

> > I can't read this PDF https://www.aclweb.org/anthology/W19-3604
> > There is nothing in the page.
>
> There is a PDF, but the PDF is empty. Maybe an error in the ingestion process?

I opened this PDF from the previous page (https://www.aclweb.org/anthology/papers/W/W19/W19-3604/),
but I still get an empty PDF.

@mjpost
Member

mjpost commented Jul 30, 2019

That is the PDF that the Widening NLP workshop gave us. There are many other empty (W19-3601 W19-3604 W19-3607 W19-3618 W19-3629 W19-3644 W19-3648) and improperly-formatted papers.

@alphadl

alphadl commented Jul 31, 2019

Thanks, I am trying to find the WMT19 papers in the set, but I cannot find them.

@mjpost
Member

mjpost commented Jul 31, 2019

They have not been ingested yet. Please see statmt.org/wmt19 where they are available.

@jihunchoi

It seems that ACL 2019 tutorial abstracts have ACL 2017 publication information on their footer (see https://www.aclweb.org/anthology/P19-4#page=11 for example); is it intended?

@mjpost
Member

mjpost commented Aug 6, 2019

@jihunchoi Forgotten rsync, fixed, thank you!

@sedimentation-fault

I want to commend you and thank you for putting so much thought into the naming system for papers and their related information, such as BibTeX files. This is by no means a given; in fact, I have seen it only at ACL. It is incredible, but you are the only people on the planet who name related resources with the same basename! This puts you light years ahead of your time!

Let me explain:

Suppose I look at the aclweb.org site and say to myself: "WOW! What an incredible treasure. I would love to download it all and have it in my local paper library for my reading pleasure!". Well, that's easy. I remember looking at it 10 years ago - or even further back in the past. It has always been easy to "get them all". But that's only part of the story. Having PDFs named like "pennington-etal-2014-glove.pdf" does not help at all - you must rename them to some naming scheme amenable to searching, e.g.:

Venue Volume Issue Year DOI Authors Title

For example, for the above paper:

Proceedings 2014 Conference on Empirical Methods in Natural Language Processing EMNLP 2014 [doi 10.3115%2Fv1%2FD14-1162] Pennington, Jeffrey; Socher, Richard; Manning, Christopher -- Glove - Global Vectors for Word Representation.pdf

Notice that you can reconstruct a basic BibTeX entry from the above name, knowing that semicolons delimit author names, the title comes after ' -- ', the DOI is the URL-encoded string XXX in '[doi XXX]', the year is the 4-digit string before the DOI part, and "Venue" is before that.
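
The parsing rules just listed can be sketched in Python as follows. This is only an illustration of the naming scheme described in this comment: the pattern `NAME_RE` and the helper `parse_paper_name` are hypothetical names, and the sketch assumes every file follows the scheme exactly (a single '[doi ...]' part, authors separated by '; ', and the title after the first ' -- ').

```python
import re
from urllib.parse import unquote

# "<venue> <year> [doi <url-encoded DOI>] <authors> -- <title>.pdf"
NAME_RE = re.compile(
    r"^(?P<venue>.+) (?P<year>\d{4}) \[doi (?P<doi>[^\]]+)\] "
    r"(?P<authors>.+?) -- (?P<title>.+)\.pdf$"
)


def parse_paper_name(filename: str) -> dict:
    """Split a renamed PDF's file name into basic bibliographic fields."""
    match = NAME_RE.match(filename)
    if match is None:
        raise ValueError(f"unrecognised file name: {filename!r}")
    fields = match.groupdict()
    # The DOI is stored URL-encoded so that '/' can appear in a file name.
    fields["doi"] = unquote(fields["doi"])
    # Semicolons delimit the "Last, First" author names.
    fields["authors"] = fields["authors"].split("; ")
    return fields
```

Applied to the example name above, this yields the venue string, the year 2014, the decoded DOI, a list of three authors, and the title.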

For this to work, you need bibliographic information for each paper, say in the form of a .bib file. You have that - everybody has that. But what you have - and everybody else is still missing - is this:

The paper and its associated bibliographic information have the same basename! That is, if the above paper has a URL

https://www.aclweb.org/anthology/D14-1162

then I know that the paper is at

https://www.aclweb.org/anthology/D14-1162.pdf

and its associated bibtex at

https://www.aclweb.org/anthology/D14-1162.bib

I can get those two just by looking at the 'url={...}' lines of the 'cumulative' bibtex at

https://www.aclweb.org/anthology/anthology.bib.gz

and as soon as I have two files, one PDF and one BIB, with the same basename

D14-1162.pdf
D14-1162.bib

I know they are connected!
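
As a rough Python sketch of that step: pull every url field out of a BibTeX text and derive the matching .pdf and .bib addresses by appending the extension. The names `URL_FIELD_RE` and `related_urls` are made up for this example, and the pattern assumes each url field sits on its own line and is written with either braces or double quotes.

```python
import re

# Matches a field such as  url={...}  or  url = "..."  at the start of a line.
URL_FIELD_RE = re.compile(r'^\s*url\s*=\s*[{"]([^}"]+)[}"]', re.MULTILINE)


def related_urls(bibtex: str):
    """Yield (pdf_url, bib_url) for every url field found in `bibtex`."""
    for match in URL_FIELD_RE.finditer(bibtex):
        base = match.group(1).rstrip("/")
        # Same basename, different extension.
        yield base + ".pdf", base + ".bib"
```

The cumulative file is gzip-compressed, so its text could be read with `gzip.open(path, "rt", encoding="utf-8")` before being passed to `related_urls`.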

It's so simple, but its impact is immense. Imagine you had a PDF

D14-1162.pdf

but its bibtex had a different basename, say

pennington-etal-2014-glove.bib

How on earth would you know they belong together? You would have to resort to web scraping: parse each proceedings HTML page and, for each PDF link on it, find the '.bib' HTML link that is visually 'nearest' to it. This is programming hell.
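
With shared basenames, by contrast, pairing needs no scraping at all. A minimal Python sketch, assuming the PDFs and .bib files are available as a flat list of file names; `pair_by_basename` is a hypothetical helper, not part of any Anthology tooling:

```python
from collections import defaultdict
from pathlib import PurePath


def pair_by_basename(filenames):
    """Map each basename to its (pdf, bib) pair, skipping incomplete pairs."""
    by_stem = defaultdict(dict)
    for name in filenames:
        path = PurePath(name)
        # Group by the name without its final extension, e.g. "D14-1162".
        by_stem[path.stem][path.suffix.lower()] = name
    return {
        stem: (found[".pdf"], found[".bib"])
        for stem, found in by_stem.items()
        if ".pdf" in found and ".bib" in found
    }
```

A file such as pennington-etal-2014-glove.bib would simply be left unpaired, because no PDF shares its basename.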

Having a local paper collection, with papers renamed as above, makes searching (an issue that has been the subject of quite a few postings above) a dream: just list your local papers and pipe the list to a text file. Now use that text file as a "poor man's index" using, say, grep. You can grep it with any regular expression you like

grep -E 'your regexp' index.txt

If you rename your local papers as above, you will be amazed at what you can find by such a simple method!
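
For readers who prefer a script to the shell, the same "poor man's index" can be sketched in Python. The function names and the single-folder layout are assumptions made for this example, and Python's `re` syntax differs in some details from the extended regular expressions of `grep -E`.

```python
import re
from pathlib import Path


def build_index(folder):
    """List the PDF file names in `folder`, one entry per paper."""
    return sorted(path.name for path in Path(folder).glob("*.pdf"))


def search_index(lines, pattern):
    """Return the index entries that match `pattern` anywhere in the name."""
    regex = re.compile(pattern)
    return [line for line in lines if regex.search(line)]
```

For example, `search_index(build_index("papers"), r"Manning.*Glove")` would list every renamed paper whose file name mentions Manning followed by Glove, where "papers" is a placeholder folder name.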

So thank you for making local collections possible with such ingenious ideas as providing a cumulative .bib file and using consistent names across the whole site for both papers and their bibliographic information. Forget OAI-PMH, federated repository aggregators and all that! All a truly open access paper repository needs is those two simple things!

@lucy3

lucy3 commented Nov 11, 2020

It's possible my brain is pudding right now, but is there a way to navigate to EMNLP Findings papers from the homepage of the ACL anthology? I see they're posted here: https://www.aclweb.org/anthology/volumes/2020.findings-emnlp/, but ctrl-F for "Findings" on the main page or the EMNLP page doesn't lead to any results.

@mjpost
Member

mjpost commented Nov 11, 2020

Hi @lucy3—it's not currently linked from the front page, but will be soon.

@Pranav-Goel

A paper I have not authored was wrongly assigned to my ACL Anthology page because I have the exact same name as the first author on that paper. How do I get it removed from my profile?

@mbollmann
Member Author

@Pranav-Goel Please open a new issue for that, and make sure to include the Anthology ID(s) of the paper(s) in question. We can disambiguate authors in the metadata then. If you have an academic website and/or an ORCID ID, feel free to include a link to it too, as it might help us with the disambiguation process.

@AmyOlex

AmyOlex commented Apr 23, 2021

When search results come up for a keyword search, it would be helpful to see the date of publication and the list of authors. Some of the PDFs don't have any dates in the footer. Also, would there be a way to subscribe to a certain search result and get email updates when new papers are posted that match?

@BramVanroy
Contributor

The form that is linked to in the side bar "The Anthology can archive your poster or presentation! Please submit them in PDF format by filling out this form." is not accessible anymore.

@AGalassi

Not sure if this is the right place to ask these things; please redirect me if it is not:

  • Would it be possible to show the ORCID identifier of the author on the author page?
  • Are ACL anthology publications indexed in services such as Scopus and Web of Science?

@akoehn
Member

akoehn commented Nov 12, 2021

ORCID: We do not have ORCID data for authors, so currently not. See e.g. #1179 for WIP.

@zixiu-alex-wu

Hi, first of all, thank you for your work on the ACL Anthology website --- it's amazing!

My question is about the (lack of) Scopus indexing of my paper (https://aclanthology.org/2021.clpsych-1.22/), accepted to the Seventh Workshop on Computational Linguistics and Clinical Psychology, co-located with NAACL 2021.

Right now, the paper is NOT indexed on Scopus, and upon further digging, I have found that neither the workshop itself (2021 occurrence) nor NAACL 2021 is Scopus-indexed, despite the fact that previous proceedings of both are indexed. I was wondering if you could help to make sure that my paper and the proceedings of the workshop and NAACL 2021 are indexed on Scopus? Without the indexing, my paper will not count towards my PhD degree.

I have tried to contact [email protected] about this multiple times, but I have not heard from them.

Thank you!

Best regards,

Zixiu Wu

@mbollmann
Member Author

@zixiu-alex-wu Have you tried asking Scopus about this? I'm not aware of anything we do on our side related to indexing papers in other databases, and would be surprised if we had any control about this.

@mjpost
Member

mjpost commented Jan 11, 2022

We have been manually submitting proceedings here and there, as we find motivated volunteers to do so. I’m in the process of working with a volunteer to be more systematic about this, but it’s currently something we are not handling well. It is on our 2022 roadmap, however!

@zixiu-alex-wu

> @zixiu-alex-wu Have you tried asking Scopus about this? I'm not aware of anything we do on our side related to indexing papers in other databases, and would be surprised if we had any control about this.

Hi Marcel, thank you for your response! I have actually asked Scopus already, and they have an investigation underway, so I thought I'd ask the ACL anthology people about this as well.

In fact, the workshop's organiser referred me to the chairs of NAACL 2021, who in turn referred me to "the ACL Anthology folks", because, as he put it, "they are maybe the only ones who would know the most recent year of NAACL is not indexed yet in Scopus".

Once Scopus informs me of the results of their investigation, I will put an update here.

Thank you again for your response!

@zixiu-alex-wu

> We have been manually submitting proceedings here and there, as we find motivated volunteers to do so. I’m in the process of working with a volunteer to be more systematic about this, but it’s currently something we are not handling well. It is on our 2022 roadmap, however!

Hi Matt,

Thank you for your response!

So, if I am not mistaken, the proceedings of ACL conferences such as NAACL, as well as the proceedings of the co-located workshops, have mostly been submitted to Scopus manually, which has resulted in the indexing of the proceedings of the conferences in previous years.

In that case, I was wondering if you could perhaps arrange for the proceedings of the workshop in question (https://aclanthology.org/volumes/2021.clpsych-1/) as well as the proceedings of NAACL 2021 to be submitted to Scopus, so that both of them as well as my workshop paper (https://aclanthology.org/2021.clpsych-1.22/) would be indexed?

Thank you!

Best regards,

Zixiu Wu

@AGalassi

> We have been manually submitting proceedings here and there, as we find motivated volunteers to do so. I’m in the process of working with a volunteer to be more systematic about this, but it’s currently something we are not handling well. It is on our 2022 roadmap, however!

Hi, does this apply only to the main ACL conferences or also to non-ACL events, such as COLING?

@mjpost
Member

mjpost commented Jan 12, 2022

The ACL can only assume responsibility for ACL events, unless some arrangement is made.

@AGalassi

> The ACL can only assume responsibility for ACL events, unless some arrangement is made.

Thank you! Since these other events are present in the ACL anthology as well I was not sure if they were managed independently or not.

@mbollmann
Member Author

Closing this as the “new” website is now over 4 years old. Feedback is still welcome, but can go into more specific issues.
