praatIO

Questions? Comments? Feedback?

A batteries-included library for working with Praat, time-aligned audio transcripts, and audio files.

Praat stores time-aligned speech transcripts in a file format called a textgrid. This library is more than a data structure for reading and writing textgrids: it also provides many utilities that make it easy to work with transcripts and their associated audio files, along with some other tools for use with Praat.

Praat is an open source software program for doing phonetic analysis and annotation of speech. Praat can be downloaded here

Common Use Cases

- Create or augment textgrids using data from other sources.
- Found that you clipped your audio file five seconds early, added the missing audio back to your wav file, and now your textgrid is misaligned? Add five seconds to every interval in the textgrid:
```
from praatio import textgrid

tg = textgrid.openTextgrid("path_to_textgrid", False)
# Shift every timestamp in the textgrid forward by 5 seconds
moddedTG = tg.editTimestamps(5)
moddedTG.save('output_path_to_textgrid', 'long_textgrid', True)
```

- Utilize the klattgrid interface to raise all speech formants by 20%:

```
from os.path import join
from praatio import klattgrid

# outputPath is assumed to be defined elsewhere as the output directory
kg = klattgrid.openKlattGrid("path_to_klattgrid")
incrTwenty = lambda x: x * 1.2
kg.tierDict["oral_formants"].modifySubtiers("formants", incrTwenty)
kg.save(join(outputPath, "bobby_twenty_percent_less.KlattGrid"))
```

- Replace labeled segments in a recording with silence, or delete them.
  - see /examples/deleteVowels.py
- Use set operations (union, intersection, difference) on textgrid tiers.
  - see /examples/textgrid_set_operations.py
- See /praatio/praatio_scripts.py for various ready-to-use functions, such as:
  - splitAudioOnTier(): split an audio file into chunks specified by intervals in one tier
  - spellCheckEntries(): spellcheck a textgrid tier
  - tgBoundariesToZeroCrossings(): adjust all boundaries and points to fall at the nearest zero crossing in the corresponding audio file
  - alignBoundariesAcrossTiers(): in handmade textgrids, entries may look as if they are aligned at the same time but actually be off by a small amount; this function corrects them
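
To illustrate what a set operation on tiers means, here is a minimal plain-Python sketch of an intersection between two interval tiers. It does not use praatIO: `intersect_intervals` is a hypothetical helper, and the tiers are assumed to be sorted, non-overlapping lists of `(start, end, label)` tuples. For the library's actual interface, see /examples/textgrid_set_operations.py.

```python
def intersect_intervals(tier_a, tier_b):
    """Return the (start, end) spans where entries of both tiers overlap.

    Both inputs are assumed to be sorted, non-overlapping lists of
    (start, end, label) tuples. This is an illustrative helper only.
    """
    result = []
    i = j = 0
    while i < len(tier_a) and j < len(tier_b):
        start = max(tier_a[i][0], tier_b[j][0])
        end = min(tier_a[i][1], tier_b[j][1])
        if start < end:  # keep only spans with positive duration
            result.append((start, end))
        # Advance whichever interval finishes first
        if tier_a[i][1] <= tier_b[j][1]:
            i += 1
        else:
            j += 1
    return result


words = [(0.0, 1.0, "hello"), (1.5, 2.5, "world")]
speech = [(0.5, 2.0, "speaker_a")]
overlap = intersect_intervals(words, speech)
print(overlap)  # [(0.5, 1.0), (1.5, 2.0)]
```

The two-pointer walk visits each interval once, so the cost grows linearly with the total number of entries in the two tiers.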

Output types

PraatIO supports 4 textgrid output file types: short textgrid, long textgrid, json, and textgrid-like json.

Short textgrids and long textgrids are both formats that are natively supported by Praat. Short textgrids are meant to be more concise, while long textgrids are meant to be more human-readable. For more information on these file formats, please see Praat's official documentation.

JSON and textgrid-like JSON are more developer-friendly formats, but they are not supported by praat. The default JSON format is more minimal while the textgrid-like JSON is formatted with information similar to a textgrid file.

The default JSON format does not support one use case. A textgrid has a specified minimum and maximum timestamp, and each of its tiers also has its own specified minimum and maximum timestamp. In most cases these are the same, but the user can set them to differ, and praat will respect this. The default JSON stores only the textgrid-level start and end, so if you have such textgrids, you should use the textgrid-like JSON.

Here is the schema for the JSON output file:

```
{
    "start": 0.0,
    "end": 1.8,
    "tiers": {
        "phone": {
            "type": "IntervalTier",
            "entries": [[0.0, 0.3, ""], [0.3, 0.38, "m"]]
        },
        "pitch": {
            "type": "TextTier",
            "entries": [[0.32, "120"], [0.37, "85"]]
        }
    }
}
```
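
Because the default JSON is a plain mapping keyed by tier name, it can be read with any JSON library. The following is a minimal sketch using Python's standard `json` module on the sample above. `labeled_intervals` is a hypothetical helper written for this illustration and is not part of praatIO.

```python
import json

# Sample document copied from the schema above
raw = """
{
    "start": 0.0,
    "end": 1.8,
    "tiers": {
        "phone": {"type": "IntervalTier", "entries": [[0.0, 0.3, ""], [0.3, 0.38, "m"]]},
        "pitch": {"type": "TextTier", "entries": [[0.32, "120"], [0.37, "85"]]}
    }
}
"""

data = json.loads(raw)


def labeled_intervals(doc, tier_name):
    """Return (start, end, label) tuples with non-empty labels from an IntervalTier."""
    tier = doc["tiers"][tier_name]
    if tier["type"] != "IntervalTier":
        raise ValueError(f"{tier_name!r} is not an IntervalTier")
    return [(start, end, label) for start, end, label in tier["entries"] if label != ""]


phones = labeled_intervals(data, "phone")

# TextTier entries are [time, label] pairs; here the labels happen to be numeric strings
pitch_points = [(time, float(value)) for time, value in data["tiers"]["pitch"]["entries"]]

print(phones)        # [(0.3, 0.38, 'm')]
print(pitch_points)  # [(0.32, 120.0), (0.37, 85.0)]
```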

Here is the schema for the textgrid-like JSON output file. Notably, tiers is a list of hashes, rather than a hash of hashes. Also, each tier specifies its name and its own min and max time.

```
{
    "xmin": 0.0,
    "xmax": 1.8,
    "tiers": [
        {
            "class": "IntervalTier",
            "name": "phone",
            "xmin": 0.0,
            "xmax": 1.8,
            "entries": [[0.0, 0.3, ""], [0.3, 0.38, "m"]]
        },
        {
            "class": "TextTier",
            "name": "pitch",
            "xmin": 0.0,
            "xmax": 1.8,
            "entries": [[0.32, "120"], [0.37, "85"]]
        }
    ]
}
```
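
The relationship between the two JSON layouts can be sketched as a small conversion. The code below is an illustration only, assuming the field names shown in the two schemas above and unique tier names. `to_default_json` is a hypothetical helper, not a praatIO function. It also reports which tiers would lose information, namely those whose own `xmin`/`xmax` differ from the textgrid's.

```python
def to_default_json(tg_like):
    """Convert a textgrid-like dict to the default JSON layout.

    Returns (converted, lossy_tiers), where lossy_tiers lists the names of
    tiers whose own xmin/xmax differ from the textgrid-level values and
    therefore cannot be represented in the default layout.
    """
    converted = {"start": tg_like["xmin"], "end": tg_like["xmax"], "tiers": {}}
    lossy_tiers = []
    for tier in tg_like["tiers"]:
        converted["tiers"][tier["name"]] = {
            "type": tier["class"],
            "entries": tier["entries"],
        }
        if (tier["xmin"], tier["xmax"]) != (tg_like["xmin"], tg_like["xmax"]):
            lossy_tiers.append(tier["name"])
    return converted, lossy_tiers


# Same data as the schema above, except that the pitch tier's xmax has been
# changed to 1.5 for this illustration so that it differs from the textgrid's.
tg_like = {
    "xmin": 0.0,
    "xmax": 1.8,
    "tiers": [
        {"class": "IntervalTier", "name": "phone", "xmin": 0.0, "xmax": 1.8,
         "entries": [[0.0, 0.3, ""], [0.3, 0.38, "m"]]},
        {"class": "TextTier", "name": "pitch", "xmin": 0.0, "xmax": 1.5,
         "entries": [[0.32, "120"], [0.37, "85"]]},
    ],
}

converted, lossy_tiers = to_default_json(tg_like)
print(lossy_tiers)  # ['pitch']
```

A non-empty `lossy_tiers` list signals the case described earlier, where the textgrid-like JSON is the safer output format.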

Citing praatIO

PraatIO is general-purpose code and does not need to be cited, but if you would like to cite it, you can do so as follows:

Tim Mahrt. PraatIO. https://github.com/timmahrt/praatIO, 2016.

Acknowledgements

Development of PraatIO was possible thanks to NSF grant BCS 12-51343 to Jennifer Cole, José I. Hualde, and Caroline Smith and to the A*MIDEX project (n° ANR-11-IDEX-0001-02) to James Sneed German funded by the Investissements d'Avenir French Government program, managed by the French National Research Agency (ANR).

