An implementation of MarcSpec on top of pymarc for searching MARC records.
The idea is to use simple string expressions to search over MARC data without writing complicated handling code.
import sys

from pymarcspec import MarcSearchParser
from pymarc import MARCReader

parser = MarcSearchParser()
spec = parser.parse('650$a$0')

with open(sys.argv[1], 'rb') as f:
    for record in MARCReader(f):
        subjects = spec.search(record)
        print(subjects)
The TextStyle class governs how results are combined into strings (or not). You can subclass TextStyle or BaseTextStyle to combine the results however you like, or you can handle the results yourself.
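As a purely illustrative sketch of the subclassing route: the overridden method name here (join_fields) is a guess at the extension point, not a documented hook, so check the TextStyle source for the real one before copying this.

from pymarcspec import TextStyle

class SemicolonStyle(TextStyle):
    # Hypothetical override: join_fields is an assumed hook name, not the
    # library's documented API; inspect TextStyle to find the real one.
    def join_fields(self, values):
        # Combine matched values with "; " instead of the default delimiter.
        return '; '.join(values)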
There is also a MarcSearch object that memoizes each search expression, so that you can conveniently run a number of different searches without creating several parsed specs. For example:
import csv
import sys

from pymarcspec import MarcSearch, TextStyle
from pymarc import MARCReader

writer = csv.writer(sys.stdout, dialect='unix', quoting=csv.QUOTE_MINIMAL)
writer.writerow(['id', 'title', 'subjects'])

style = TextStyle(field_delimiter=':')
marcsearch = MarcSearch(style)

with open(sys.argv[1], 'rb') as f:
    for record in MARCReader(f):
        control_id = marcsearch.search('001', record)
        title = marcsearch.search('245[0]$a-c', record)
        subjects = marcsearch.search('650$a', record)
        writer.writerow([control_id, title, subjects])
To build the parser, run:
python -m tatsu -o marcparser/parser.py marcparser/marcparser.ebnf
Note that this builds a class MarcSpecParser, which implements the full specification from MarcSpec. MarcSearchParser is a subclass that builds an instance of MarcSpec; building this structure imposes some restrictions, reflecting what I needed when I wrote it.
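If you just want to experiment with the grammar without regenerating parser.py, TatSu can also compile the EBNF at runtime. This is only a sketch, assuming the grammar path from the build command above; the library itself uses the generated, checked-in parser:

import tatsu

# Compile the MARCspec grammar at runtime and parse a spec into an AST.
# Sketch only: assumes the grammar's default start rule accepts a full spec.
with open('marcparser/marcparser.ebnf') as f:
    grammar = tatsu.compile(f.read())

ast = grammar.parse('650$a$0')
print(ast)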
The test in test/test_ebnf.py compiles the parser from the EBNF into a temporary path, which makes sure that coffee-driven programmers like me remember to recompile the parser and check in the changes.
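A minimal sketch of that kind of check (not the actual contents of test/test_ebnf.py): regenerate the parser into a temporary path with the same command as above and fail if it differs from the checked-in module.

import subprocess
import sys
import tempfile
from pathlib import Path

def test_generated_parser_is_up_to_date():
    with tempfile.TemporaryDirectory() as tmp:
        out = Path(tmp) / 'parser.py'
        # Same command as the build step above, but writing to a temporary path.
        subprocess.run(
            [sys.executable, '-m', 'tatsu', '-o', str(out), 'marcparser/marcparser.ebnf'],
            check=True,
        )
        # A real test might normalize generated headers (timestamps, versions)
        # before comparing; this sketch just compares the files directly.
        assert out.read_text() == Path('marcparser/parser.py').read_text()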
It is not obvious that this library is needed; it might be fine, for instance, to use XPath expressions instead. Suppose we are going to run a lot of these searches: if XPath is fast enough, the work of converting from a pymarc.Record to MARCXML will be amortized over many searches. Jupyter notebooks have a %timeit magic that lets us check this:
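For reference, the doc object used below has to come from serializing the pymarc.Record to MARCXML first; a sketch assuming pymarc's record_to_xml helper and lxml:

import sys

from lxml import etree
from pymarc import MARCReader, record_to_xml

with open(sys.argv[1], 'rb') as f:
    record = next(MARCReader(f))

# record_to_xml emits an un-namespaced <record> element by default,
# which is what the un-namespaced XPath below expects.
doc = etree.fromstring(record_to_xml(record))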
Let us check the performance of the simplest such XPath expression:
In [34]: %timeit ''.join(doc.xpath('./controlfield[@tag="001"]/text()'))
19.4 µs ± 1.07 µs per loop (mean ± std. dev. of 7 runs, 100000 loops each)
And compare it to parsing a spec and searching:
In [37]: from pymarcspec import MarcSearchParser
In [38]: parser = MarcSearchParser()
In [39]: spec = parser.parse('001')
In [40]: spec.search(record)
Out[40]: '1589530'
In [41]: %timeit spec.search(record)
7.89 µs ± 253 ns per loop (mean ± std. dev. of 7 runs, 100000 loops each)
So, from a performance perspective this is clearly a win, and the expression syntax is much closer to what library IT people are used to.