Remove deleted entries from index #159

Open
acka47 opened this issue Oct 2, 2018 · 7 comments

acka47 commented Oct 2, 2018

Several GND entries are deleted without a redirect to an existing entry. Here are the numbers for the last few months, provided by S. Hartmann in http://jira.dnb.de/browse/GND-63:

09.2018: 123 GND records
08.2018: 89 GND records
07.2018: 41 GND records
06.2018: 83 GND records
05.2018: 104 GND records
04.2018: 80 GND records
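
For scale, a quick tally of these figures. This is a trivial Python sketch; the dictionary simply restates the list above. As long as deletions are only picked up by a full rebuild, this is roughly how many stale entries could accumulate in the index over these six months:

```python
# Monthly counts of GND records deleted without a redirect (restated from the list above).
deleted_per_month = {
    "2018-09": 123,
    "2018-08": 89,
    "2018-07": 41,
    "2018-06": 83,
    "2018-05": 104,
    "2018-04": 80,
}

total = sum(deleted_per_month.values())
average = total / len(deleted_per_month)
print(f"{total} stale entries over {len(deleted_per_month)} months "
      f"(about {average:.0f} per month)")
# prints "520 stale entries over 6 months (about 87 per month)"
```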

These entries are only removed when we build a whole new index from a new GND dump. The day-to-day updates do not remove deleted entries. We should find out how to get information on deleted entries via OAI-PMH and remove them with each update.
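
The intended update step can be sketched as follows. This is a minimal Python sketch under stated assumptions, not the project's indexing code: the index is a plain dict keyed by GND ID, and `apply_update` and the `(gnd_id, doc-or-None)` record shape are hypothetical, with `None` standing for a record reported as deleted upstream.

```python
from typing import Dict, Iterable, Optional, Tuple

# Hypothetical record shape: (GND id, document), where document is None
# when the upstream source reports the record as deleted.
Record = Tuple[str, Optional[dict]]


def apply_update(index: Dict[str, dict], records: Iterable[Record]) -> Tuple[int, int]:
    """Apply one day's harvested changes: upsert changed records and drop
    deleted ones. Returns (upserted, removed)."""
    upserted = removed = 0
    for gnd_id, doc in records:
        if doc is None:
            # Deleted upstream: remove it instead of keeping a stale entry.
            if index.pop(gnd_id, None) is not None:
                removed += 1
        else:
            index[gnd_id] = doc
            upserted += 1
    return upserted, removed


# Illustrative data only: the IDs appear in this thread, the documents are made up.
index = {"000460265": {"name": "A"}, "1231757663": {"name": "B"}}
print(apply_update(index, [("000460265", {"name": "A2"}), ("1231757663", None)]))  # (1, 1)
print(sorted(index))  # ['000460265']
```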

acka47 commented Dec 18, 2018

fsteeg added working and removed ready labels Dec 20, 2018

fsteeg commented Dec 20, 2018

The DNB repository does not seem to provide that information. It declares its level of support for deletions as transient, which means "the repository does not guarantee that a list of deletions is maintained persistently or consistently" (see https://www.openarchives.org/OAI/openarchivesprotocol.html#DeletedRecords).
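
A harvester can check this declaration before relying on deletions by reading the `deletedRecord` element of the repository's `Identify` response (for the DNB presumably `https://services.dnb.de/oai/repository?verb=Identify`). A minimal Python sketch; the XML below is a trimmed, hand-written example, not the actual DNB response, and `deleted_record_support` is a hypothetical helper:

```python
import xml.etree.ElementTree as ET

OAI_NS = {"oai": "http://www.openarchives.org/OAI/2.0/"}

# Trimmed, hand-written Identify response for illustration (not a verbatim DNB response;
# a real one also carries baseURL, protocolVersion, granularity, etc.).
IDENTIFY_XML = """<OAI-PMH xmlns="http://www.openarchives.org/OAI/2.0/">
  <Identify>
    <repositoryName>Example repository</repositoryName>
    <deletedRecord>transient</deletedRecord>
  </Identify>
</OAI-PMH>"""


def deleted_record_support(identify_xml: str) -> str:
    """Return the declared deletedRecord level: "no", "transient" or "persistent"."""
    root = ET.fromstring(identify_xml)
    level = root.findtext("oai:Identify/oai:deletedRecord", namespaces=OAI_NS)
    if level is None:
        raise ValueError("Identify response has no deletedRecord element")
    return level.strip()


level = deleted_record_support(IDENTIFY_XML)
# Only "persistent" promises that deletions stay available for harvesting.
print(level, level == "persistent")  # transient False
```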

The record headers in the responses do not contain the optional status attribute (see https://www.openarchives.org/OAI/openarchivesprotocol.html#header); they all look like this:

<header>
  <identifier>oai:dnb.de/authorities/000460265</identifier>
  <datestamp>2018-12-20T04:38:17Z</datestamp>
  <setSpec>authorities</setSpec>
</header>
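
Per the OAI-PMH spec, a deleted record is signalled by `status="deleted"` on its `<header>`, so the standard check is a single attribute lookup. A minimal Python sketch: the first header is the DNB one quoted above, the second is a hypothetical deleted record added for contrast, and the `ListIdentifiers` wrapper is trimmed (a real response is wrapped in an `OAI-PMH` root element).

```python
import xml.etree.ElementTree as ET

OAI = "{http://www.openarchives.org/OAI/2.0/}"  # OAI-PMH 2.0 namespace in ElementTree's {uri} form

# Trimmed payload: first header as quoted from the DNB, second one hypothetical.
HEADERS_XML = """<ListIdentifiers xmlns="http://www.openarchives.org/OAI/2.0/">
  <header>
    <identifier>oai:dnb.de/authorities/000460265</identifier>
    <datestamp>2018-12-20T04:38:17Z</datestamp>
    <setSpec>authorities</setSpec>
  </header>
  <header status="deleted">
    <identifier>oai:example.org/hypothetical-deleted-record</identifier>
    <datestamp>2018-12-20T05:00:00Z</datestamp>
  </header>
</ListIdentifiers>"""


def is_deleted(header: ET.Element) -> bool:
    """A record is marked as deleted when its header carries status="deleted"."""
    return header.get("status") == "deleted"


root = ET.fromstring(HEADERS_XML)
deleted_ids = [
    h.findtext(OAI + "identifier") for h in root.iter(OAI + "header") if is_deleted(h)
]
print(deleted_ids)  # ['oai:example.org/hypothetical-deleted-record']
```

With headers like the DNB one, nothing is ever flagged, which is the problem described here.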

Maybe the information is available in other formats like MARC21-xml or PicaPlus-xml? But we can't get these: only RDFxml works, the others return 403 (Forbidden). @acka47 maybe this is something to bring up in the GND dev expert group?

fsteeg assigned acka47 and unassigned fsteeg Dec 20, 2018
fsteeg removed the working label Dec 20, 2018
fsteeg added a commit that referenced this issue Dec 20, 2018
fsteeg self-assigned this Dec 20, 2018
fsteeg added the working label Dec 20, 2018
fsteeg removed their assignment Dec 20, 2018
fsteeg removed the working label Dec 20, 2018

acka47 commented Jan 7, 2019

The best approach is probably to open an issue in the DNB Jira asking for support of deletions via OAI-PMH. I will do this.

acka47 commented Mar 5, 2019

The Jira issue is at https://jira.dnb.de/browse/GND-77 (login required).

acka47 added the upstream (changes in upstream data/API needed) label Mar 5, 2019

acka47 commented Jun 21, 2021

There is an update on the Jira issue which reads:

Will be implemented with the next release, 2021.03. An advance announcement with the necessary information will follow on June 28, 2021.

acka47 commented Jul 8, 2021

From Metadatendienste: Changes to the RDF format as of September 28, 2021 (export release 2021.03):

With release 2021_03, it becomes possible to obtain information about deleted GND records via the interfaces (OAI and SRU). For this purpose, the new class "dnbt:DeletedResource" was introduced. Example:

<rdf:Description rdf:about="https://d-nb.info/gnd/1109770197">
  <rdf:type rdf:resource= "https://d-nb.info/standards/elementset/dnb#DeletedResource"/>
</rdf:Description>
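
Picking these up from harvested RDF/XML amounts to finding `rdf:Description` nodes whose `rdf:type` is the new class. A minimal Python sketch using only the standard library, assuming the type URI shown in the example above and that GND URIs start with `https://d-nb.info/gnd/`; `deleted_gnd_ids` is a hypothetical helper, not part of any DNB tooling:

```python
import xml.etree.ElementTree as ET
from typing import List

RDF = "{http://www.w3.org/1999/02/22-rdf-syntax-ns#}"
DELETED_TYPE = "https://d-nb.info/standards/elementset/dnb#DeletedResource"
GND_PREFIX = "https://d-nb.info/gnd/"  # assumption: all GND URIs share this prefix


def deleted_gnd_ids(rdf_xml: str) -> List[str]:
    """Return the GND IDs of all rdf:Description nodes typed as DeletedResource."""
    root = ET.fromstring(rdf_xml)
    ids = []
    for desc in root.iter(RDF + "Description"):
        types = {t.get(RDF + "resource") for t in desc.findall(RDF + "type")}
        if DELETED_TYPE in types:
            about = desc.get(RDF + "about", "")
            # Strip the URI prefix to get the bare GND ID; keep the full URI otherwise.
            ids.append(about[len(GND_PREFIX):] if about.startswith(GND_PREFIX) else about)
    return ids


SAMPLE = """<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#">
  <rdf:Description rdf:about="https://d-nb.info/gnd/1109770197">
    <rdf:type rdf:resource="https://d-nb.info/standards/elementset/dnb#DeletedResource"/>
  </rdf:Description>
</rdf:RDF>"""

print(deleted_gnd_ids(SAMPLE))  # ['1109770197']
```

Since `iter()` walks the whole tree, the same function should also work when the `rdf:Description` is nested inside an OAI-PMH `GetRecord` or `ListRecords` response.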

The release is scheduled for 2021-09-28. I'm already assigning @fsteeg but leaving the issue in the backlog.

acka47 assigned fsteeg and unassigned acka47 Jul 8, 2021

acka47 commented Sep 28, 2021

The Jira issue is at https://jira.dnb.de/browse/GND-77 (login required).

This issue was just closed with this comment:

Deletions are now communicated via OAI.

Example:
GET https://services.dnb.de/oai/repository?verb=GetRecord&metadataPrefix=RDFxml&identifier=oai:dnb.de/authorities/1231757663

<rdf:Description rdf:about="https://d-nb.info/gnd/1231757663">
    <rdf:type rdf:resource="https://d-nb.info/standards/elementset/dnb#DeletedResource"/>
</rdf:Description>
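
The request from the Jira comment can be built for any GND ID. A minimal Python sketch; `get_record_url` is a hypothetical helper, and only the base URL and parameters come from the example above:

```python
from urllib.parse import urlencode

BASE_URL = "https://services.dnb.de/oai/repository"


def get_record_url(gnd_id: str, metadata_prefix: str = "RDFxml") -> str:
    """Build an OAI-PMH GetRecord request URL for one GND authority record."""
    params = {
        "verb": "GetRecord",
        "metadataPrefix": metadata_prefix,
        "identifier": f"oai:dnb.de/authorities/{gnd_id}",
    }
    # Keep ':' and '/' unescaped so the URL reads like the GetRecord example in the Jira comment.
    return f"{BASE_URL}?{urlencode(params, safe=':/')}"


print(get_record_url("1231757663"))
```

The response body can then be checked for an `rdf:type` of `https://d-nb.info/standards/elementset/dnb#DeletedResource`. For daily updates, a `ListRecords` request with a `from` date would presumably be used instead of one `GetRecord` per ID.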

acka47 removed the upstream (changes in upstream data/API needed) label Aug 2, 2022