Match affiliation strings to institutes (powered by Wikidata and GeoNames)

instmatcher

This library finds the institution matching an affiliation string, i.e. a string containing, for example, the name, address or similar information associated with an institution. Its main features are:

  1. Search for an institution by its name in a list based on Wikidata and GeoNames.
  2. Parse affiliation strings using grobid to retrieve the corresponding name and address.
  3. Geocode the parsed data to enrich it with geographical coordinates based on GeoNames.
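
To give a feel for the first feature, here is a minimal sketch of fuzzy name lookup using Python's standard difflib module. The INSTITUTIONS list and the find_institution helper are hypothetical; the sketch does not reproduce how instmatcher itself searches its Wikidata and GeoNames based list.

```python
import difflib

# Hypothetical stand-in for the institution list; instmatcher builds its own
# list from Wikidata and GeoNames, which is not reproduced here.
INSTITUTIONS = [
    {"name": "Technical University of Berlin", "alias": "TU Berlin"},
    {"name": "Humboldt University of Berlin", "alias": "HU Berlin"},
    {"name": "Technical University of Munich", "alias": "TUM"},
]


def find_institution(query, institutions=INSTITUTIONS, cutoff=0.6):
    """Return the institution whose name or alias best resembles query, or None."""
    # Index every lower-cased name and alias by the record it belongs to.
    candidates = {}
    for inst in institutions:
        candidates[inst["name"].lower()] = inst
        candidates[inst["alias"].lower()] = inst
    # get_close_matches ranks candidates by difflib.SequenceMatcher similarity
    # and drops any that score below cutoff.
    best = difflib.get_close_matches(query.lower(), list(candidates), n=1, cutoff=cutoff)
    return candidates[best[0]] if best else None


print(find_institution("Technical Univ. of Berlin")["name"])
```

Plain string similarity like this is only a rough illustration: real affiliation strings mix names, departments and addresses, which is where the grobid parsing step listed above comes in.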

Installation

To install instmatcher, clone the git repository and install it using pip:

git clone https://github.com/qtux/instmatcher.git
cd instmatcher
pip install .

Usage Example

The match function searches for the institution matching a given affiliation string. Note that this example assumes a grobid server listening on http://0.0.0.0:8080.
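
Because the example depends on that server, it can help to check that grobid is reachable first. The following sketch uses only the standard library and assumes the server exposes GROBID's /api/isalive endpoint; the grobid_is_alive helper is hypothetical and not part of instmatcher.

```python
import urllib.error
import urllib.request


def grobid_is_alive(base_url="http://0.0.0.0:8080", timeout=5.0):
    """Return True if a server at base_url answers /api/isalive with HTTP 200."""
    url = base_url.rstrip("/") + "/api/isalive"
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            # Assumption: an HTTP 200 answer means the server is ready.
            return resp.status == 200
    except (urllib.error.URLError, OSError):
        # Refused connections, timeouts and HTTP errors all count as "not alive".
        return False


if not grobid_is_alive():
    print("No grobid server reachable at http://0.0.0.0:8080")
```

Calling such a check before instmatcher.match lets a script stop early with a clear message when no server is running.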

import instmatcher
response = instmatcher.match('TU Berlin, Institute of Mathematics, Berlin, Germany')
print(response)

Depending on how well grobid is trained, executing the code above will most likely print:

{'name': 'Technical University of Berlin', 'lat': '52.511944444444', 'lon': '13.326388888889',...
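
The printed response appears to be a dictionary whose lat and lon values are strings. A small hypothetical helper can turn it into numeric coordinates; it assumes that match returns a falsy value (e.g. None) when no institution is found, which the example above does not confirm.

```python
def extract_location(response):
    """Return (name, lat, lon) with float coordinates, or None if nothing matched."""
    if not response:
        # Assumption: a falsy response means no institution was matched.
        return None
    # The sample output shows coordinates as strings, so convert them to floats.
    return response["name"], float(response["lat"]), float(response["lon"])


# Keys and values copied from the sample output above (other keys omitted).
sample = {
    "name": "Technical University of Berlin",
    "lat": "52.511944444444",
    "lon": "13.326388888889",
}
print(extract_location(sample))
```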

Development

To run the tests, execute:

python setup.py test

To build the documentation, install the required packages

pip install ".[docs]"

and use the Makefile in the docs folder to build the documentation.

Attribution

  1. The list of institutions is queried from Wikidata (available under CC0).
  2. The list of institutions is enhanced using the reverse-geocoder library, which contains GeoNames data (available under CC BY 3.0).
  3. The list of cities and the list of countries are taken from GeoNames (available under CC BY 3.0).

https://raw.githubusercontent.com/qtux/instmatcher/master/attribution.png

License

This software is licensed under the Apache License, Version 2.0.
