Standardizing attributions for display on scaife.perseus.org #2308
Comments
Hi @jacobwegner The working header template is here: https://docs.google.com/document/d/16fZThJUuTwJJKFiJgi1cLW0ONKeVnE3MdqMHVdCcmDM/edit#heading=h.c5otxj8nwtfd
Overall the header consistency is a challenge that extends beyond the credits. We know there is a lot of work to be done on the older file headers. As you note, we have inconsistent labels. Some describe roles, others describe tasks. (The practice of using the dates is not something we want to preserve.) We need better labels that fully capture the contributions of the students, so we have to settle on the vocabulary. I don't know that we can do much with the older work. There are just some roles/labels there that we no longer use. I think much of the older stuff was boilerplate.
I prefer a lighter and streamlined header, so I would lean against splitting up a description such as "Proofreading and CTS conversion." I also like keeping Greg, Lenny, and Bruce together in their grouping; it makes for less bulk and more readability. If it doesn't work, that's fine.
I presume the file headers should read in order of the desired presentation; if so, that impacts how files are exported from Lace. I do not know if that affects Zenodo display. I believe we added the three top names as part of the bibliographic cleanup so that we would have a consistent set of names to appear at the top of our releases.
Will people be displayed as role/name/org or name/org/role? If it's the former, perhaps we could toggle the roles, so that
@lcerrato Thanks for your feedback and the link to the existing header template–very helpful!
Part of my goal in providing the spreadsheet was to demonstrate some of the variance between labels and headers; I know the "hard" part is building that vocabulary, as you've said, but hopefully the bulk update process helps to simplify updating them once the vocabulary is established.
I don't intend to enforce a "Scaife-specific" convention on the headers, and the readability issue is a good one to bring up. In the Scaife attributions model, we tie each attribution to a person and/or organization. See for example, tlg0093.tlg005.1st1K-grc1:
If the desire is to have those principals and OGL as the first
<respStmt>
  <resp>Published original versions of the electronic texts</resp>
  <orgName ref="https://www.opengreekandlatin.org">
    Open Greek and Latin
    <persName role="principal">Gregory Crane</persName>
    <persName role="principal">Leonard Muellner</persName>
    <persName role="principal">Bruce Robertson</persName>
  </orgName>
</respStmt>
I think we could account for that type of nested
That's correct, but I could also see us making use of the
<respStmt>
  <resp>Published original versions of the electronic texts</resp>
  <orgName ref="https://www.opengreekandlatin.org">
    Open Greek and Latin
    <persName role="principal">Gregory Crane</persName>
    <persName role="principal">Leonard Muellner</persName>
    <persName role="principal">Bruce Robertson</persName>
  </orgName>
</respStmt>
<respStmt n="1">
  <resp>Proofreading</resp>
  <orgName>Mt. Allison University</orgName>
  <persName>Kirsten Mason</persName>
</respStmt>
<respStmt n="2">
  <resp>CTS conversion</resp>
  <orgName>Center for Hellenic Studies</orgName>
  <persName>Michael Konieczny</persName>
</respStmt>
(Where a
So ideally I'd like to come up with a convention that works for you / OGL to allow the Scaife Viewer environment to extract attributions from the files directly, but in the case of the "principals" header and sorting of proofreading / CTS conversions, I could also see having a repository-level mapping configuration file (much like
Another way to say this is that we want to place as much control as possible of the display of attributions with the providing data source (so the Scaife dev team is not a bottleneck on updating mappings, etc).
With either the convention or the mapping file, I'd expect to end up with something more like this in Scaife:
(With the near-term goal of being able to click on Kirsten's or Michael's name to see other works they have contributed to)
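To make "extract attributions from the files directly" concrete, here is a minimal sketch in Python. It is not the Scaife Viewer implementation: the function name `extract_attributions` and the flat dictionary shape are assumptions for illustration, and the sample omits the TEI namespace that real files usually declare, so real input would need namespace-aware lookups.

```python
import xml.etree.ElementTree as ET

# Sample taken from the respStmt markup quoted in this thread, wrapped in a
# hypothetical <titleStmt> so it parses as one document. No TEI namespace here.
SAMPLE = """
<titleStmt>
  <respStmt>
    <resp>Published original versions of the electronic texts</resp>
    <orgName ref="https://www.opengreekandlatin.org">
      Open Greek and Latin
      <persName role="principal">Gregory Crane</persName>
      <persName role="principal">Leonard Muellner</persName>
      <persName role="principal">Bruce Robertson</persName>
    </orgName>
  </respStmt>
  <respStmt n="1">
    <resp>Proofreading</resp>
    <orgName>Mt. Allison University</orgName>
    <persName>Kirsten Mason</persName>
  </respStmt>
  <respStmt n="2">
    <resp>CTS conversion</resp>
    <orgName>Center for Hellenic Studies</orgName>
    <persName>Michael Konieczny</persName>
  </respStmt>
</titleStmt>
"""


def extract_attributions(xml_text):
    """Return one {role, org, person} dict per person (or per org if no person)."""
    root = ET.fromstring(xml_text)
    rows = []
    for stmt in root.iter("respStmt"):
        role = (stmt.findtext("resp") or "").strip()
        org_el = stmt.find("orgName")
        # .text holds only the text before the first child element, which is
        # the organization name in the nested "principals" form.
        org = (org_el.text or "").strip() if org_el is not None else None
        # iter() finds persName both directly under respStmt and nested in orgName.
        people = [(p.text or "").strip() for p in stmt.iter("persName")]
        for person in people or [None]:
            rows.append({"role": role, "org": org, "person": person})
    return rows


if __name__ == "__main__":
    for row in extract_attributions(SAMPLE):
        print(row)
```

Flattening to one row per person is a design choice made for this sketch; it keeps the nested "principals" grouping and the simple proofreading / CTS conversion statements in the same shape.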
I don't know for certain either, but it looks like .zenodo.json controls the Zenodo display directly.
I think you may have got cut off here, but currently we display name + org and then role: |
@jacobwegner I would like other input on this but I like the middle display and the notion of adding the weighted attribute. My only concern is that it is an easy thing to miss in file review and it wouldn't be in the testing regime.
@lcerrato I'd be happy to change the scope of this bulk update to just cleaning up / standardizing the existing attributions. If they're standardized, I'd be happy to write a small config file that could handle the "weighting" (e.g.):
That would keep you / other repo maintainers in control of the weighting without having to rely on keeping the attributes up to date.
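As a rough illustration of role weighting driven by a config rather than by attributes in each file, here is a minimal Python sketch. The mapping `ROLE_WEIGHTS`, the default weight, and the function name are all hypothetical; the actual config format is not shown in this thread.

```python
# Hypothetical repository-level config: lower weight sorts first.
ROLE_WEIGHTS = {
    "Proofreading": 0,
    "CTS conversion": 1,
}
DEFAULT_WEIGHT = 100  # assumption: unlisted roles sort after weighted ones


def order_attributions(attributions, weights=ROLE_WEIGHTS):
    """Sort attributions by configured role weight, keeping file order on ties."""
    # sorted() is stable, so attributions with equal weight keep their
    # original (file) order.
    return sorted(attributions, key=lambda a: weights.get(a["role"], DEFAULT_WEIGHT))


if __name__ == "__main__":
    extracted = [
        {"role": "Published original versions of the electronic texts",
         "person": "Gregory Crane"},
        {"role": "Proofreading", "person": "Kirsten Mason"},
        {"role": "CTS conversion", "person": "Michael Konieczny"},
    ]
    for item in order_attributions(extracted):
        print(item["person"], "-", item["role"])
```

With a mapping like this, changing the display order means editing one file in the repo instead of touching every XML header.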
@jacobwegner |
Hi @jacobwegner! I believe that others who have CTS conversion or something similar in the line should be "Digital editor" (not entirely sure we settled on that). This would be the second level of display. I can get to work spotting other issues.
One thing that @ThomasK81 mentioned is that while the group preferred the middle display example (with Kirsten Mason first), the OGL itself is in parentheses with the three director names. Note on the last screen shot, the Org name appears first. Would it be possible to have
If I/we want to regularize other information, it should be on the first tab, yes? (I saw some minor inconsistencies and started fixing them.)
Edit: I see your notes above. I will make notations to my changes and then go from there. Let me know the next steps.
@jacobwegner |
(As I mentioned to @lcerrato in chat, outside of the |
@jacobwegner Thanks! Will resume next week when I'm less distracted. |
Should we just use the publication statement to get the OGL info rather than "hack" the OGL into the responsibility statement? |
@ahanhardt @brobertson |
@jacobwegner |
@lcerrato I have just removed the locks so you can add rows. I hadn't been thinking about adding new entries from the sheet (just updating), but as long as you populate the urns cell for each entry and leave the key blank, that should work. Happy to hop on a chat or call to discuss further. |
@jacobwegner |
@lcerrato: I'm happy to add them now or wait until the first batch is done. If we do it now (and we're adding a name rather than modifying an existing name), let's use the "New Attributions" worksheet:
@jacobwegner I did not edit the other tabs, except when I first fixed Greg's name and in the new tab you added above.
@lcerrato Great! I'll try and circle back to this tomorrow. As far as someone having two different organizations, that may be desirable for OGL, but completely optional / allowable to the underlying data model in Scaife Viewer. We'd show that a person affiliated with Org A contributed on Text 1 and then that same person affiliated with Org B on Text 2. |
@jacobwegner |
@jacobwegner |
@lcerrato I'll circle back here this week; apologies for the delay. |
- Extracted from the Google Spreadsheet (OpenGreekAndLatin/First1KGreek#2308)
- Compared substitutions to extractions
- Added explicit lookups for the required matches
@lcerrato Apologies that it has taken a while for me to finish this task. Here's a status update:
Originally, I had hoped to apply the changes made in the spreadsheet "in place" on the existing XML files, updating or appending additional
Unfortunately, there isn't a good consistent way to do this because:
Talking with @jtauber, I came up with what I think is a good compromise. I've created a configuration file that is currently in the scaife-viewer/scaife-viewer repository. This file has substitutions that map the
I will soon create a pull request that moves that file to this repo and provides a brief overview of how it works. I think that will help us "clean up" the older attribution data without bloating up the Git history. I can also document our discussion on this thread on the "preferred" structure of a
If there are further tweaks that need to be made to the substitutions list, any repo maintainer here can make changes to the config file, and Scaife Viewer will pick up the changes the next time that this repository is ingested.
The development instance (https://scaife-dev.perseus.org) has these substitutions applied. When I circle back to move the config file to this repo, I can also ensure we're ingesting the latest release onto the production scaife.perseus.org.
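For readers unfamiliar with the idea, a substitutions list can be applied at ingestion time roughly as in the Python sketch below. The real config file's format is not reproduced in this thread, so the `match` / `replace` structure, the field names, and the sample entries are assumptions made for illustration only.

```python
# Hypothetical substitutions: each entry rewrites extracted values that match.
SUBSTITUTIONS = [
    {"match": {"role": "proofreader"}, "replace": {"role": "Proofreading"}},
    {"match": {"org": "Mount Allison University"},
     "replace": {"org": "Mt. Allison University"}},
]


def apply_substitutions(attribution, substitutions=SUBSTITUTIONS):
    """Return a copy of the attribution with every matching substitution applied."""
    result = dict(attribution)  # never mutate the extracted data
    for sub in substitutions:
        # A substitution applies only if all of its match fields agree.
        if all(result.get(field) == value for field, value in sub["match"].items()):
            result.update(sub["replace"])
    return result


if __name__ == "__main__":
    extracted = {"role": "proofreader", "org": "Mount Allison University",
                 "person": "Kirsten Mason"}
    print(apply_substitutions(extracted))
```

Because the XML files are left untouched, the substitutions can be revised at any time and take effect on the next ingestion.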
@jacobwegner |
@jacobwegner |
Just recapping the current options as I see them:
1) Use a configuration file that maps existing
2) Reformat and update headers
3) Generate an archive file containing the updated resp stmts to be applied manually
We're using "1)" on the current deployment at https://scaife-dev.perseus.org/. I think we'd continue to use the config file to "promote" certain roles (which would allow the existing resp "Published original versions of the electronic texts" to stay structured as preferred for Zenodo, etc). Please let me know if I can help answer further questions or if you'd like me to be part of a call to discuss further.
@jacobwegner Future headers: I don't see any substantive changes to the header itself. I know we discussed reordering things, weighting attributes, consistent presentation, etc., but if we were to create a file today, is there something that needs to change in the header structure or markup itself? I get a sense I am missing something there.
Existing headers: I don't think there is a way around manually applying the consistent role vocabulary to the files. Otherwise, we'd have xml files that have outdated headers. (I guess your point 3) is what is going to assist with that?) The group would like me to work backwards since the most important stuff is the recent student work. Do I have that right? Is that work going to create conflicts in the way the software handles things?
Both of these points have to do with what I can do now and any implications for what you are doing on the SV side. I don't want to resume work using bad headers and I don't want to fix old headers in a way that conflicts with the configuration file.
TL;DR What should I be doing now that would be most useful for you? Should I go ahead and run some new headers or other edits by you?
I've created this issue to track updates to the underlying attribution data that we're now extracting / displaying on scaife.perseus.org
Overview
I've extracted the existing attributions (from respStmt elements) and exported them to a Google Spreadsheet, OGL - First1kGreek Attributions. I can grant access to the appropriate persons within OGL to perform bulk edits to the data.
Once the preferred edits have been made to the spreadsheet, I will use the spreadsheet to bulk update the underlying XML files with the new attribution information and open a pull request.
If this workflow works well, we can do it for other OGL repos (and ideally any other repos contributing texts to scaife.perseus.org)
Desired data model
Here are a few samples of what the updated respStmt elements will look like:
Thibault Clérice, Lead Developer (University of Leipzig) 2015 - 2017
From https://github.com/OpenGreekAndLatin/First1KGreek/blob/master/data/tlg0062/tlg001/tlg0062.tlg001.1st1K-grc1.xml#L28
to:
Notes:
- from and to attrs to denote the timeframe of the resp.
- persName.ref
Simona Stoyanova, Project Manager (University of Leipzig), 2015, Project Assistant (University of Leipzig), 2013-2014
From https://github.com/OpenGreekAndLatin/First1KGreek/blob/master/data/stoa0146d/stoa001/stoa0146d.stoa001.opp-grc1.xml#L47
to:
Notes:
- resp elements to a 1:1 relationship between respStmt and resp
- when and from|to attrs denote the resp. timeframe
Gregory Crane, Leonard Muellner, Bruce Robertson, Published original versions of the electronic texts, Open Greek and Latin
From First1KGreek/data/tlg0093/tlg005/tlg0093.tlg005.1st1K-grc1.xml (line 12 in 3f5519b)
to:
Notes:
- respStmt containing multiple persName elements to a 1:1 relationship between respStmt and persName.
- orgName in each respStmt
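The target markup for this sample is not reproduced above, so the following Python sketch only illustrates the two notes (one persName per respStmt, orgName repeated in each). Placing orgName as a sibling of persName, and the function name `split_resp_stmt`, are assumptions rather than the agreed structure.

```python
import xml.etree.ElementTree as ET

# The nested "principals" statement quoted earlier in this thread (no namespace).
NESTED = """
<respStmt>
  <resp>Published original versions of the electronic texts</resp>
  <orgName ref="https://www.opengreekandlatin.org">
    Open Greek and Latin
    <persName role="principal">Gregory Crane</persName>
    <persName role="principal">Leonard Muellner</persName>
    <persName role="principal">Bruce Robertson</persName>
  </orgName>
</respStmt>
"""


def split_resp_stmt(stmt):
    """Return one new respStmt element per persName found in stmt."""
    resp = stmt.find("resp")
    org = stmt.find("orgName")
    results = []
    for person in stmt.iter("persName"):
        new_stmt = ET.Element("respStmt")
        if resp is not None:
            ET.SubElement(new_stmt, "resp").text = (resp.text or "").strip()
        new_person = ET.SubElement(new_stmt, "persName", dict(person.attrib))
        new_person.text = (person.text or "").strip()
        if org is not None:
            # Assumption: orgName becomes a sibling of persName in each statement.
            new_org = ET.SubElement(new_stmt, "orgName", dict(org.attrib))
            new_org.text = (org.text or "").strip()
        results.append(new_stmt)
    return results


if __name__ == "__main__":
    for part in split_resp_stmt(ET.fromstring(NESTED)):
        print(ET.tostring(part, encoding="unicode"))
```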
Implementation
Extraction process
Each row in the attributions-data worksheet corresponds to a set of URNs extracted from the underlying XML files.
There are "key" and "urn" fields which should not be modified and will be used to perform the bulk update.
Editing attribution data in the spreadsheet
I went through and made an initial pass to clean up the data. This involved fixing small typos in organization names, normalizing names (Mt. Allison vs Mount Allison, etc) and restructuring data to fit the desired model (discussed above).
The unique-* worksheets show unique values for the resp, orgName and persName.
Ideally, we can standardize on "Proofreading" vs "proofreader" vs "Proofreading and CTS conversion" as appropriate. If proofreading and CTS conversion are two distinct responsibilities for a given text, I would suggest:
- Add an additional row beneath "Proofreading and CTS conversion"
- Edit the original resp to Proofreading
- Set the resp in the new row to CTS conversion
- Copy the other relevant fields (resp, orgName and persName) to the new row
- Leave a comment on the row so I can ensure that the urn and key fields are also populated.
There are also several instances where slight variants in a person's name are used, or resp possibly contains data better suited for orgName.
We should not delete any rows; if there are duplicate rows in the spreadsheet, we'll use the urn and key fields to de-duplicate data.
Bulk update process
Once edits have been finalized in the spreadsheet, I'll use the urn and key fields to map the edits back to the desired data model (see above).
I will also perform a reordering of the desired "proofreading / conversion" role(s) so that they are weighted before any other roles.
I'll open up a PR and link it back to this issue. The PR can be merged and then the updated attributions will be made available on scaife.perseus.org
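As an illustration of the de-duplication step, here is a small Python sketch over a CSV export of the worksheet. The column names, the sample key values, and the placeholder URN are hypothetical, and treating a row as a duplicate only when key, urn, and all attribution fields match is an assumption (it keeps the extra row created when "Proofreading and CTS conversion" is split in two).

```python
import csv
import io

# Hypothetical CSV export; keys and the URN are placeholders, not real data.
SHEET = """key,urn,resp,persName,orgName
k1,urn:cts:greekLit:tlg0000.tlg000.example-grc1,Proofreading,Kirsten Mason,Mt. Allison University
k1,urn:cts:greekLit:tlg0000.tlg000.example-grc1,CTS conversion,Kirsten Mason,Mt. Allison University
k1,urn:cts:greekLit:tlg0000.tlg000.example-grc1,Proofreading,Kirsten Mason,Mt. Allison University
"""

FIELDS = ("key", "urn", "resp", "persName", "orgName")


def dedupe_rows(rows):
    """Drop exact repeats while preserving the spreadsheet's row order."""
    seen = set()
    unique = []
    for row in rows:
        identity = tuple(row[field] for field in FIELDS)
        if identity not in seen:
            seen.add(identity)
            unique.append(row)
    return unique


if __name__ == "__main__":
    rows = list(csv.DictReader(io.StringIO(SHEET)))
    for row in dedupe_rows(rows):
        print(row["key"], row["resp"], row["persName"])
```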
Closing thoughts
I'm not sure if there is a "template" for future XML files, but I would also be happy to take the examples in Desired data model above and integrate them into that template.
As long as the XML files have respStmt with resp and one of persName or orgName, we can extract attributions for display on scaife.perseus.org.
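That minimum requirement can be expressed as a simple check, sketched here in Python. The function name `is_extractable` is hypothetical, and the sketch ignores the TEI namespace that real files usually declare.

```python
import xml.etree.ElementTree as ET


def is_extractable(stmt):
    """True if a respStmt has a non-empty resp and at least one persName or orgName."""
    resp = (stmt.findtext("resp") or "").strip()
    # ".//" also matches a persName nested inside an orgName.
    has_agent = (stmt.find(".//persName") is not None
                 or stmt.find(".//orgName") is not None)
    return bool(resp) and has_agent


if __name__ == "__main__":
    stmt = ET.fromstring(
        "<respStmt><resp>Proofreading</resp>"
        "<persName>Kirsten Mason</persName></respStmt>"
    )
    print(is_extractable(stmt))
```

A check like this could run as part of file review, so that headers missing a resp or a name are caught before ingestion.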