This repository has been archived by the owner on Jun 15, 2021. It is now read-only.

Using Local Source Content

Stéfan Sinclair edited this page Apr 19, 2018 · 2 revisions

N.B. This functionality is experimental and may change.

Voyant allows users to open an existing corpus or to create their own corpus by pasting in texts or URLs, or by uploading documents. All of these corpus creation mechanisms require network transfers, which can significantly slow down the process.

VoyantServer allows you to provide a local source for documents, following a specific pattern. This is especially useful when you're integrating Voyant with an existing collection (it could be a subset of Gutenberg, for instance).

There are two important parameters for using local sources:

  1. localSource: a name (letter characters only) that identifies the collection
  2. input: one or more URLs whose filename or path identifies the local file to use

VoyantServer has a data directory. By default it's a first-level subdirectory within the zip archive that you downloaded, but the location can be overridden in the server-settings.txt file. Within the data directory, create a directory called trombone-local-sources (if it's not there already), and within that create a folder with the same name as the value you specify for localSource. You can have several such folders, one for each collection.
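As an illustration of the layout just described, here is a minimal sketch that creates the folder for one collection. The function name `ensure_local_source_dir` and its arguments are hypothetical (they are not part of VoyantServer), and `data_dir` is assumed to be whatever data directory your server is configured to use.

```python
from pathlib import Path


def ensure_local_source_dir(data_dir: str, local_source: str) -> Path:
    """Create <data_dir>/trombone-local-sources/<local_source> if it is missing.

    Hypothetical helper: VoyantServer does not ship this function.
    """
    # The wiki says the localSource name should use letter characters only.
    if not local_source.isalpha():
        raise ValueError("localSource should contain letter characters only")
    target = Path(data_dir) / "trombone-local-sources" / local_source
    # parents=True also creates trombone-local-sources when it is absent.
    target.mkdir(parents=True, exist_ok=True)
    return target
```

Calling `ensure_local_source_dir("data", "gutenberg")` would leave you with `data/trombone-local-sources/gutenberg`, ready to receive the documents.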

The URL given for input is interpreted in one of two ways:

  1. Filename: the last part of the URL names a file directly under the localSource folder.
  2. Path: the part of the URL path that follows the localSource name gives the location of the file in a subdirectory of the localSource folder.

Examples:

localSource=gutenberg
input=http://examples.com/austen.zip

This matches the local file data/trombone-local-sources/gutenberg/austen.zip (filename)

localSource=gutenberg
input=http://examples.com/texts/emma.txt

This matches the local file data/trombone-local-sources/gutenberg/emma.txt (filename)

localSource=gutenberg
input=http://examples.com/texts/gutenberg/19th/persuasion.pdf

This matches the local file data/trombone-local-sources/gutenberg/19th/persuasion.pdf (path)
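The mapping shown in these examples can be sketched as a small function. This is only an illustration of the two URL formats, not VoyantServer's actual code: the name `candidate_local_paths` is hypothetical, the root is assumed to be the default `data` directory, and trying the path format before the filename format is an assumption (the page does not say which is checked first).

```python
from pathlib import PurePosixPath
from urllib.parse import urlparse

# Assumed default location; the real data directory may be configured elsewhere.
ROOT = PurePosixPath("data/trombone-local-sources")


def candidate_local_paths(local_source, input_url):
    """Return the local paths an input URL could refer to (hypothetical helper)."""
    parts = [p for p in urlparse(input_url).path.split("/") if p]
    if not parts:
        return []
    base = ROOT / local_source
    candidates = []
    # Path format: use whatever follows the localSource name in the URL path.
    if local_source in parts[:-1]:
        rest = parts[parts.index(local_source) + 1:]
        candidates.append(str(base.joinpath(*rest)))
    # Filename format: the last URL segment, directly under the localSource folder.
    by_filename = str(base / parts[-1])
    if by_filename not in candidates:
        candidates.append(by_filename)
    return candidates
```

For the first two examples the function returns a single filename candidate. For the third it lists `data/trombone-local-sources/gutenberg/19th/persuasion.pdf` first, followed by the filename candidate `data/trombone-local-sources/gutenberg/persuasion.pdf`.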

Note that if the local file can't be found, an attempt is made to fetch the given URL if it starts with http or https, so you can use this technique even if not all the files are available locally.
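The fallback rule can be pictured as follows. This is a sketch of the behaviour described in the note, not VoyantServer's implementation: `choose_source` is a hypothetical name, and it assumes the local path for the input has already been worked out.

```python
from pathlib import Path
from typing import Optional, Tuple


def choose_source(local_path: Path, input_url: str) -> Tuple[str, Optional[str]]:
    """Decide where a document would come from (hypothetical helper)."""
    # Prefer the local copy whenever it exists.
    if local_path.is_file():
        return ("local", str(local_path))
    # Otherwise fall back to the network, but only for http(s) URLs.
    if input_url.startswith(("http://", "https://")):
        return ("remote", input_url)
    # Neither a local file nor a fetchable URL.
    return ("missing", None)
```

With this rule a collection can be mirrored locally in stages: files already copied into the localSource folder are read from disk, and the rest are still fetched from their URLs.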

Note also that with both URL formats it's possible to provide multiple input values:

localSource=gutenberg
input=http://examples.com/texts/gutenberg/19th/persuasion.txt
input=http://examples.com/texts/gutenberg/19th/emma.txt
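If you are building such a request programmatically, repeating the input parameter is straightforward with Python's standard library. The sketch below only assembles the query string; `build_query` is a hypothetical helper, and the base URL you append the query to depends on where your own VoyantServer is running.

```python
from urllib.parse import urlencode


def build_query(local_source, input_urls):
    """Build a query string with one localSource and repeated input values."""
    params = [("localSource", local_source)]
    # A list of pairs lets the same key ("input") appear several times.
    params.extend(("input", url) for url in input_urls)
    return urlencode(params)


query = build_query(
    "gutenberg",
    [
        "http://examples.com/texts/gutenberg/19th/persuasion.txt",
        "http://examples.com/texts/gutenberg/19th/emma.txt",
    ],
)
```

`urlencode` percent-encodes the `:` and `/` characters inside each input URL, so the values survive being embedded in another URL.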

This functionality is not currently available in the production release but should be available with the next release.
