Commit

Update documentation for delphin.itsdb
goodmami committed Mar 28, 2019
1 parent 03c23d3 commit ed63805
Showing 3 changed files with 51 additions and 31 deletions.
42 changes: 25 additions & 17 deletions delphin/itsdb.py
@@ -20,16 +20,16 @@
* Selecting data by table name, record index, and column name or index:
>>> items = ts['item'] # get the items table
>>> rec = items[0] # get the first record
>>> rec['i-input'] # input sentence of the first item
>>> items = ts['item'] # get the items table
>>> rec = items[0] # get the first record
>>> rec['i-input'] # input sentence of the first item
'雨 が 降っ た .'
>>> rec['i-id'] # key-access does not cast types
'11'
>>> rec.get('i-id') # nor does get() by default
'11'
>>> rec.get('i-id', cast=True) # but it does with cast=True
>>> rec[0] # values are cast on index retrieval
11
>>> rec.get('i-id') # and on key retrieval
11
>>> rec.get('i-id', cast=False) # unless cast=False
'11'
* Selecting data as a query (note that types are cast by default):
@@ -69,14 +69,15 @@
easy to load the tables of a testsuite into memory, inspect its
contents, modify or create data, and write the data to disk.
By default, the `itsdb` module expects testsuites to use the
standard [incr tsdb()] schema. Testsuites are always read and written
according to the associated or specified relations file, but other
things, such as default field values and the list of "core" tables,
are defined for the standard schema. It is, however, possible to
define non-standard schemata for particular applications, and most
functions will continue to work.
By default, the `itsdb` module expects testsuites to use the standard
[incr tsdb()] schema. Testsuites are always read and written according
to the associated or specified relations file, but other things, such
as default field values and the list of "core" tables, are defined for
the standard schema. It is, however, possible to define non-standard
schemata for particular applications, and most functions will continue
to work. One notable exception is the :meth:`TestSuite.process`
method, for which a new :class:`~delphin.interfaces.base.FieldMapper`
class must be defined.
"""

from __future__ import print_function
@@ -625,7 +626,7 @@ class Table(object):
and other operations where high-speed random-access is required.
See the :meth:`attach` and :meth:`detach` methods for more
information. The :meth:`is_attached` method is useful for
determining which mode a table is in.
determining the mode of a table.
Args:
relation: the Relation schema for this table
@@ -848,18 +849,21 @@ def list_changes(self):
if row is not None]

def _sync_with_file(self):
"""Clear in-memory structures so table is synced with the file."""
self._records = []
i = -1
for i, line in self._enum_lines():
self._records.append(None)
self._last_synced_index = i

def _enum_lines(self):
"""Enumerate lines from the attached file."""
with _open_table(self.path, self.encoding) as lines:
for i, line in enumerate(lines):
yield i, line

def _enum_attached_rows(self, indices):
"""Enumerate on-disk and in-memory records."""
records = self._records
i = 0
# first rows covered by the file
@@ -886,6 +890,7 @@ def __getitem__(self, index):
return self._getitem(index)

def _iterslice(self, slice):
"""Yield records from a slice index."""
indices = range(*slice.indices(len(self._records)))
if self.is_attached():
rows = self._enum_attached_rows(indices)
@@ -899,6 +904,7 @@ def _iterslice(self, slice):
yield Record._make(fields, row, self, i)

def _getitem(self, index):
"""Get a single non-slice index."""
row = self._records[index]
if row is not None:
pass
@@ -918,12 +924,14 @@ def _getitem(self, index):
return Record._make(self.relation, row, self, index)

def __setitem__(self, index, value):
# first normalize the arguments for slices and regular indices
if isinstance(index, slice):
values = list(value)
else:
self._records[index] # check for IndexError
values = [value]
index = slice(index, index + 1)
# now prepare the records for being in a table
fields = self.relation
for i, record in enumerate(values):
values[i] = _cast_record_to_str_tuple(record, fields)
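The normalization step commented in `__setitem__` above treats a single index as a one-element slice, so that slices and regular indices can share one code path. A minimal standalone sketch of that idea on a plain list follows. `normalize_index` is a hypothetical helper and not part of `delphin.itsdb`; the negative-index adjustment is an assumption added here, because in Python `slice(-1, 0)` selects nothing.

```python
def normalize_index(index, length):
    """Return a slice equivalent to `index` for a sequence of `length` items.

    Hypothetical helper for illustration only; not the library's code.
    """
    if isinstance(index, slice):
        return index
    if not -length <= index < length:
        raise IndexError('index out of range')
    if index < 0:
        # slice(-1, 0) would be empty, so convert to a non-negative index
        index += length
    return slice(index, index + 1)


records = ['a', 'b', 'c']
records[normalize_index(1, len(records))] = ['B']              # regular index
records[normalize_index(-1, len(records))] = ['C']             # negative index
records[normalize_index(slice(0, 1), len(records))] = ['x', 'y']  # slice
print(records)  # ['x', 'y', 'B', 'C']
```

Once every index is a slice, the values can always be handled as a list, which is why the single-value case wraps `value` in a one-element list.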
1 change: 1 addition & 0 deletions docs/api/delphin.itsdb.rst
@@ -39,6 +39,7 @@ databases:

.. automethod:: from_file
.. automethod:: write
.. automethod:: commit
.. automethod:: attach
.. automethod:: detach
.. automethod:: is_attached
39 changes: 25 additions & 14 deletions docs/tutorials/itsdb.rst
@@ -4,8 +4,12 @@ Working with [incr tsdb()] Testsuites

[incr tsdb()] is the canonical software for managing
**testsuites**---collections of test items for judging the performance
of an implemented grammar---within DELPH-IN. There are several words in
use to describe testsuites:
of an implemented grammar---within DELPH-IN. While the original
purpose of testsuites is to aid in grammar development, they are also
useful more generally for batch processing. PyDelphin has good support
for a range of [incr tsdb()] functionality.

There are several words in use to describe testsuites:

:skeleton:

@@ -131,17 +135,19 @@ By default, writing a table deletes any previous contents, so the
entire file contents need to be written at once. If you want to write
results one-by-one, the `append` parameter is useful. You may need to
clear the in-memory table before appending the first time, and this can
be done with a standard list operation:
be done by writing an empty list with `append=False`:

>>> ts['result'].clear() # list.clear() is Python3 only
>>> ts.write({'result': []}, append=False) # to erase on-disk table
>>> ts['result'][:] = [] # to clear in-memory table
>>> for record in result_records:
... ts.write({'result': [record]}, append=True)
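
The erase-then-append pattern described above can be illustrated without PyDelphin, using plain files. This is only a sketch under stated assumptions: `write_rows` is a hypothetical stand-in for the `append` behavior, not the `TestSuite.write` implementation, and the `@` field delimiter is used here to resemble [incr tsdb()] table files.

```python
import os
import tempfile


def write_rows(path, rows, append=False):
    """Write rows as '@'-delimited lines; truncate the file unless append=True.

    Hypothetical helper for illustration only.
    """
    mode = 'a' if append else 'w'
    with open(path, mode, encoding='utf-8') as f:
        for row in rows:
            f.write('@'.join(row) + '\n')


path = os.path.join(tempfile.mkdtemp(), 'result')
write_rows(path, [['1', 'stale']])      # pre-existing on-disk contents
write_rows(path, [], append=False)      # writing an empty list erases them
for row in [['1', '0', 'a'], ['1', '1', 'b']]:
    write_rows(path, [row], append=True)  # then append one record at a time

with open(path, encoding='utf-8') as f:
    lines = f.read().splitlines()
print(lines)  # ['1@0@a', '1@1@b']
```

Without the initial empty write, the appended records would follow the stale row instead of replacing it.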

Processing Testsuites with ACE
------------------------------

PyDelphin has the ability to process testsuites using ACE, similar to
the `art <http://sweaglesw.org/linguistics/libtsdb/art>`_ utility and
PyDelphin has the ability to process testsuites using `ACE
<http://sweaglesw.org/linguistics/ace>`_, similar to the
`art <http://sweaglesw.org/linguistics/libtsdb/art>`_ utility and
`[incr tsdb()] <http://www.delph-in.net/itsdb/>`_ itself. The simplest
method is to pass in a running
:class:`~delphin.interfaces.ace.AceProcess` instance to
@@ -160,27 +166,32 @@ attribute) and select the appropriate inputs from the testsuite.
NOTE: parsed 2 / 3 sentences, avg 887k, time 0.04736s
>>> ts.write(path='tsdb/current/matrix')

Note that processing does not write results to disk, but stores them in
memory. By writing with TestSuite's
:meth:`~delphin.itsdb.TestSuite.write` method using the `path`
parameter, the results can be written to a new profile.
Processing a testsuite that has a path (that is, backed by files on
disk) will write the results to disk. Processing an in-memory
testsuite will store the results in-memory. For other options please
see the API documentation for :meth:`TestSuite.process
<delphin.itsdb.TestSuite.process>`, specifically the `buffer_size`
parameter. When the results are all in-memory, you can write them
to disk with TestSuite's :meth:`~delphin.itsdb.TestSuite.write` method
with the `path` parameter.

.. warning::

PyDelphin does not prevent or warn you about overwriting skeletons or
gold profiles, so take care when using the `write()` method without
the `path` parameter.

If you have a testsuite object `ts` and call `ts.process()`, the
results of the processing will be stored in `ts`. For parsing this
isn't a problem, but when transferring or generating, you may want to
If you have a testsuite object `ts` and call `ts.process()`, both the
source items and the results are stored in `ts`. For parsing this
isn't a problem because the source items and results are located in
different tables, but for transferring or generating you may want to
use the `source` parameter to select inputs from a testsuite other
than the one where results will be stored:

>>> from delphin.interfaces import ace
>>> from delphin import itsdb
>>> src_ts = itsdb.TestSuite('tsdb/current/mrs')
>>> tgt_ts = itsdb.TestSuite('tsdb/skeletons/mrs')
>>> tgt_ts = itsdb.TestSuite('tsdb/current/mrs-gen')
>>> with ace.AceGenerator('jacy-0.9.27.dat') as cpu:
... tgt_ts.process(cpu, source=src_ts)
...
