Documentation
- Java 8 only (JDK 11 for the head revision)
- Only tested on *nix platforms; YMMV elsewhere
To process data and turn it into documents in a search index with JesterJ, the following general process is intended for basic usage:
- Identify your data source(s) (filesystem, database, etc.)
- Identify your output targets (the search indexes/collections you wish to populate)
- Write a very simple Java class that implements PlanProvider and:
  - Configures a Scanner to read from your source(s)
  - Configures some DocumentProcessors to parse/manipulate/enrich the data on its way to the search index
  - Configures a processor that writes to a search engine
- Make a jar containing this single Java class
- Pass the location of that jar file to JesterJ on startup
- Begin searching your data.
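The flow above (a scanner reads records, processors transform them, a final step writes to the search engine) can be sketched in plain Java. This is a hypothetical, self-contained model: PlanSketch, Doc and withProcessor are illustrative names, not JesterJ classes, and a real plan is assembled in your PlanProvider implementation rather than like this.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.function.UnaryOperator;

// Hypothetical model of a linear plan: source records -> processors -> index.
// None of these types are part of the JesterJ API.
public class PlanSketch {

    // A "document" is modeled as a mutable map of field name -> value.
    static final class Doc {
        final Map<String, String> fields = new HashMap<>();

        Doc(String id, String body) {
            fields.put("id", id);
            fields.put("body", body);
        }
    }

    private final List<UnaryOperator<Doc>> processors = new ArrayList<>();
    // Stands in for the search engine that the final step writes to.
    private final List<Doc> index = new ArrayList<>();

    // Register a processing step; steps run in registration order.
    PlanSketch withProcessor(UnaryOperator<Doc> processor) {
        processors.add(processor);
        return this;
    }

    // Plays the role of the scanner: turn each source record into a Doc,
    // pass it through every processor, then "send" it to the index.
    void run(List<String[]> sourceRecords) {
        for (String[] record : sourceRecords) {
            Doc doc = new Doc(record[0], record[1]);
            for (UnaryOperator<Doc> processor : processors) {
                doc = processor.apply(doc);
            }
            index.add(doc);
        }
    }

    List<Doc> indexed() {
        return index;
    }

    public static void main(String[] args) {
        PlanSketch plan = new PlanSketch()
            // Clean up: trim surrounding whitespace from the body.
            .withProcessor(d -> { d.fields.put("body", d.fields.get("body").trim()); return d; })
            // Enrich: add a derived field holding the body length.
            .withProcessor(d -> { d.fields.put("length", String.valueOf(d.fields.get("body").length())); return d; });

        List<String[]> records = new ArrayList<>();
        records.add(new String[] {"1", "  To be, or not to be  "});
        plan.run(records);

        Doc doc = plan.indexed().get(0);
        System.out.println(doc.fields.get("body") + " (" + doc.fields.get("length") + " chars)");
        // prints: To be, or not to be (19 chars)
    }
}
```

Keeping each step a small, independent function is what makes it easy to add, remove or reorder processing without touching the scanner or the output step.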
As your transformations get more interesting and perhaps require custom code, that code is meant to be plugged in by implementing the DocumentProcessor interface, configuring your custom class in the PlanProvider implementation like any of the standard processors, and then including your custom class in the same jar file as the plan. The intention is that no special configuration or changes beyond writing your code are required.
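A custom processor is essentially a transformation from an incoming document to what should continue down the pipeline. The sketch below assumes a simplified, hypothetical SketchProcessor interface standing in for DocumentProcessor; the real interface's method names and types may differ, so check the JesterJ sources before implementing it. LowercaseCopyProcessor is likewise a made-up example.

```java
import java.util.HashMap;
import java.util.Locale;
import java.util.Map;

public class CustomProcessorSketch {

    // Hypothetical stand-in for JesterJ's DocumentProcessor interface.
    // The real interface may use different method names and document types.
    interface SketchProcessor {
        String getName();

        Map<String, String> process(Map<String, String> doc);
    }

    // Example custom logic: copy a field into a lower-cased field,
    // e.g. to support case-insensitive matching.
    static final class LowercaseCopyProcessor implements SketchProcessor {
        private final String from;
        private final String to;

        LowercaseCopyProcessor(String from, String to) {
            this.from = from;
            this.to = to;
        }

        @Override
        public String getName() {
            return "lowercase_" + from + "_to_" + to;
        }

        @Override
        public Map<String, String> process(Map<String, String> doc) {
            Map<String, String> out = new HashMap<>(doc); // leave the input untouched
            String value = doc.get(from);
            if (value != null) {
                out.put(to, value.toLowerCase(Locale.ROOT));
            }
            return out;
        }
    }

    public static void main(String[] args) {
        Map<String, String> doc = new HashMap<>();
        doc.put("title", "Hamlet, Prince of Denmark");
        SketchProcessor p = new LowercaseCopyProcessor("title", "title_lc");
        System.out.println(p.getName() + " -> " + p.process(doc).get("title_lc"));
        // prints: lowercase_title_to_title_lc -> hamlet, prince of denmark
    }
}
```

Returning a new map rather than mutating the input keeps the processor free of side effects, which makes it easier to test in isolation before packaging it with the plan.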
The present directory structure in the repository was created long ago and was very forward-looking. Presently the only area of interest is /code/ingest. An example is available in /code/examples/shakespeare. Everything else should be ignored for now.
- Import the JesterJ classes into your Java project, e.g. for Gradle:
    compile ('org.jesterj:jesterj-ingest:1.0-beta2')
- In your project, create a Java class that implements PlanProvider
- Annotate your PlanProvider class with @JavaPlanConfig
- In your project, produce a jar file that contains your PlanProvider (e.g. myPlan.jar)
- Download the "node" jar for our latest release
- Run:
java -jar jesterj-node-1.0-beta2.jar myPlan.jar NODENAME NODEPASS
NODENAME and NODEPASS are not currently used. In the future they will serve as a means of identifying instances (nodes) and controlling access to the cooperating set of nodes, but that is still in the works. For now I just use "foo" and "bar".
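JesterJ locates your plan by scanning myPlan.jar for the annotated class. One way such a lookup could work with plain JDK reflection is sketched below. It is not JesterJ's actual implementation: @SketchPlanConfig is a hypothetical stand-in for @JavaPlanConfig, and the helper names are illustrative. The annotation needs RUNTIME retention to be visible through reflection.

```java
import java.io.IOException;
import java.lang.annotation.ElementType;
import java.lang.annotation.Retention;
import java.lang.annotation.RetentionPolicy;
import java.lang.annotation.Target;
import java.util.ArrayList;
import java.util.Enumeration;
import java.util.List;
import java.util.jar.JarEntry;
import java.util.jar.JarFile;

public class PlanDiscoverySketch {

    // Hypothetical marker annotation standing in for @JavaPlanConfig.
    // RUNTIME retention is what makes it visible to reflection.
    @Retention(RetentionPolicy.RUNTIME)
    @Target(ElementType.TYPE)
    @interface SketchPlanConfig {}

    // "com/example/MyPlan.class" -> "com.example.MyPlan"
    static String toClassName(String entryName) {
        String withoutSuffix = entryName.substring(0, entryName.length() - ".class".length());
        return withoutSuffix.replace('/', '.');
    }

    // Class names in a jar, in the order JarFile.entries() returns them.
    static List<String> classNamesIn(String jarPath) throws IOException {
        List<String> names = new ArrayList<>();
        try (JarFile jar = new JarFile(jarPath)) {
            Enumeration<JarEntry> entries = jar.entries();
            while (entries.hasMoreElements()) {
                String name = entries.nextElement().getName();
                if (name.endsWith(".class") && !name.endsWith("module-info.class")) {
                    names.add(toClassName(name));
                }
            }
        }
        return names;
    }

    // First class carrying the marker annotation, or null if none does.
    // Classes that cannot be loaded (e.g. missing dependencies) are skipped.
    static Class<?> firstAnnotated(List<String> classNames, ClassLoader loader) {
        for (String name : classNames) {
            try {
                // initialize=false: inspect the class without running static initializers
                Class<?> candidate = Class.forName(name, false, loader);
                if (candidate.isAnnotationPresent(SketchPlanConfig.class)) {
                    return candidate;
                }
            } catch (ClassNotFoundException | LinkageError e) {
                // not loadable here; keep looking
            }
        }
        return null;
    }

    @SketchPlanConfig
    static class ExamplePlan {}

    public static void main(String[] args) {
        List<String> candidates = new ArrayList<>();
        candidates.add("java.lang.String");          // not annotated
        candidates.add(ExamplePlan.class.getName());  // annotated
        Class<?> plan = firstAnnotated(candidates, PlanDiscoverySketch.class.getClassLoader());
        System.out.println(plan == null ? "no plan found" : plan.getSimpleName());
        // prints: ExamplePlan
    }
}
```

For a jar that is not already on the classpath, the names returned by classNamesIn could be loaded through a java.net.URLClassLoader pointed at that jar file.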
The convenient thing about packaging your plan this way is that you can also include your custom processors or scanners (see below) in myPlan.jar. JesterJ will scan the jar and execute the plan from the first annotated class. This may sound restrictive, but remember that plans are directed acyclic graphs (DAGs) and need not be fully connected, so any number of pipelines can be included in one plan. Also, since it's Java, you are free to create other cooperating classes that contribute portions of the config if a single class becomes unwieldy.
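Because a plan is a DAG that need not be connected, one plan can hold several independent pipelines. The hypothetical sketch below models steps as named nodes, rejects cycles using Kahn's topological sort, and counts the independent pipelines as weakly connected components. PlanGraphSketch and the step names are made up for illustration and are not JesterJ classes or steps.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.HashMap;
import java.util.HashSet;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;

public class PlanGraphSketch {

    // Adjacency list: step name -> downstream step names (insertion order kept).
    private final Map<String, List<String>> edges = new LinkedHashMap<>();

    PlanGraphSketch step(String name) {
        edges.computeIfAbsent(name, k -> new ArrayList<>());
        return this;
    }

    PlanGraphSketch link(String from, String to) {
        step(from);
        step(to);
        edges.get(from).add(to);
        return this;
    }

    // Kahn's algorithm: a valid execution order, or an exception if there is a cycle.
    List<String> topologicalOrder() {
        Map<String, Integer> inDegree = new HashMap<>();
        for (String s : edges.keySet()) inDegree.put(s, 0);
        for (List<String> outs : edges.values())
            for (String t : outs) inDegree.merge(t, 1, Integer::sum);

        Deque<String> ready = new ArrayDeque<>();
        for (String s : edges.keySet()) if (inDegree.get(s) == 0) ready.add(s);

        List<String> order = new ArrayList<>();
        while (!ready.isEmpty()) {
            String s = ready.poll();
            order.add(s);
            for (String t : edges.get(s)) {
                if (inDegree.merge(t, -1, Integer::sum) == 0) ready.add(t);
            }
        }
        if (order.size() != edges.size()) throw new IllegalStateException("plan contains a cycle");
        return order;
    }

    // Independent pipelines = weakly connected components (ignore edge direction).
    int pipelineCount() {
        Map<String, List<String>> undirected = new HashMap<>();
        for (String s : edges.keySet()) undirected.put(s, new ArrayList<>());
        for (Map.Entry<String, List<String>> e : edges.entrySet())
            for (String t : e.getValue()) {
                undirected.get(e.getKey()).add(t);
                undirected.get(t).add(e.getKey());
            }

        Set<String> seen = new HashSet<>();
        int count = 0;
        for (String start : undirected.keySet()) {
            if (!seen.add(start)) continue; // already part of a counted component
            count++;
            Deque<String> stack = new ArrayDeque<>();
            stack.push(start);
            while (!stack.isEmpty()) {
                for (String n : undirected.get(stack.pop()))
                    if (seen.add(n)) stack.push(n);
            }
        }
        return count;
    }

    public static void main(String[] args) {
        PlanGraphSketch plan = new PlanGraphSketch()
            .link("file_scanner", "tokenize")
            .link("tokenize", "solr_sender")
            .link("tokenize", "es_sender")    // fan-out: one stream, two outputs
            .link("db_scanner", "db_sender"); // a second, unconnected pipeline
        System.out.println(plan.topologicalOrder());
        System.out.println(plan.pipelineCount()); // prints: 2
    }
}
```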
If your plan's classes need additional dependencies, those must be added to the JesterJ node jar file. For example, if you use the JDBC scanner, you need to add a driver jar like this:
$ mkdir lib
$ cp /Users/gus/.gradle/<blah blah>/postgresql-42.2.5.jar lib
$ jar uf jesterj-node-1.0-beta2.jar lib/postgresql-42.2.5.jar
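When a JDBC scanner fails because the driver is missing, it can help to check whether the driver class is visible to a class loader at all. A minimal JDK-only sketch follows; org.postgresql.Driver is the PostgreSQL JDBC driver's class name, while DriverCheckSketch and isLoadable are hypothetical names. Running it with your own classpath only approximates how the node jar loads its lib/ entries.

```java
public class DriverCheckSketch {

    // True if the named class can be loaded by the given class loader.
    // initialize=false avoids running the class's static initializers.
    static boolean isLoadable(String className, ClassLoader loader) {
        try {
            Class.forName(className, false, loader);
            return true;
        } catch (ClassNotFoundException | LinkageError e) {
            return false;
        }
    }

    public static void main(String[] args) {
        String driver = "org.postgresql.Driver"; // PostgreSQL JDBC driver class
        ClassLoader loader = DriverCheckSketch.class.getClassLoader();
        if (isLoadable(driver, loader)) {
            System.out.println(driver + " is on the classpath");
        } else {
            System.out.println(driver + " is missing; the driver jar has not been added");
        }
    }
}
```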
Coming soon (alongside JDK 11 support), you will be able to package your code and its dependencies using uno-jar, eliminating the need to customize our node jar at all.