DHFPROD-646 3.x documentation, Tutorial
Additional updates to tutorial from this task.
Other updates to tutorial were done here: Marklogic-retired#924
Comments added for TODOs.
wooldridge committed Apr 24, 2018
1 parent 2286d3e commit 6ca9a99
Showing 8 changed files with 35 additions and 21 deletions.
14 changes: 8 additions & 6 deletions _pages/tutorial/3x/3x.md
@@ -5,6 +5,12 @@ lead_text: ''
permalink: /tutorial/
---

## Introducing QuickStart

This tutorial uses QuickStart, an easy-to-use user interface that you can run locally to start working with the Data Hub Framework quickly. With QuickStart, you will have a working data hub in a matter of minutes. No need to worry about deployment strategies or configuration details. Simply run the QuickStart .war (Java web application archive) and point it at your MarkLogic installation.

_QuickStart is a DevOps tool. It is meant to be run on your development machine to aid you in quickly deploying your hub._

## Before You Start
You might want to check out our high-level introductions before starting this tutorial:

@@ -27,11 +33,6 @@ You will take the following approach:
### In a Hurry?
The finished version of this tutorial is available for you to download and play with: [Finished Online Shopping Hub Example](https://github.com/marklogic-community/marklogic-data-hub/tree/develop/examples/online-store){:target="_blank"}

### Introducing QuickStart
This tutorial uses QuickStart, an easy-to-use user interface that you can run locally to start working with the Data Hub Framework quickly. With QuickStart, you will have a working data hub in a matter of minutes. No need to worry about deployment strategies or configuration details. Simply run the QuickStart .war (Java web application archive) and point it at your MarkLogic installation.

_QuickStart is a DevOps tool. It is meant to be run on your development machine to aid you in quickly deploying your hub._

## Prerequisites

Before you can begin this tutorial and work with the Data Hub Framework, you need to have some software installed.
@@ -64,12 +65,13 @@ Before you can begin this tutorial and work with the Data Hub Framework, you nee
Chrome or Firefox works best. Use IE at your own risk.

## Common Concerns
**I have a MarkLogic instance but it already has awesome stuff in it. Will this tutorial mess that up?**
**I have a MarkLogic instance, but it already has awesome stuff in it. Will this tutorial mess that up?**
No. The Data Hub Framework is installed on isolated databases and application servers. It is possible that the default DHF ports (8010, 8011, 8012, 8013) may already be in use. In that case you will be warned about the conflicts and given the opportunity to change them. The DHF will not harm any existing settings.

**How difficult is it to remove this tutorial when I am finished?**
Easy. Just click Settings at the top of QuickStart and then click Uninstall.

<!--- DHFPROD-646 TODO add navigation to the header/footer of tutorial pages to avoid having to click back to the TOC -->

## Table of Contents
1. [Install the Data Hub Framework](./install/)
25 changes: 15 additions & 10 deletions _pages/tutorial/3x/harmonizing-order-data.md
@@ -28,15 +28,19 @@ Note that this time we used the default option of **Create Structure from Entity

### Collector Plugin

Because each order can consist of multiple rows which are then turned into multiple documents in MarkLogic, we cannot do a 1:1 mapping like we did for products. This means we cannot simply return a list of URIs. Instead we need to return a unique list of all of the values from the relation **id** column.
Because each order can consist of multiple rows which are then turned into multiple documents in MarkLogic, we cannot do a 1:1 mapping like we did for products. This means we cannot simply return a list of URIs. Instead, we need to return a unique list of all of the values from the relation **id** column.

We use the [jsearch library](https://docs.marklogic.com/guide/search-dev/javascript) to run our query.
We can use the [jsearch library](https://docs.marklogic.com/guide/search-dev/javascript) to run our query. The following code finds all the values of id in the Order collection:

This code is simply returning all unique values in the **id** field. The one tricky bit is the `slice()` call:
```$javascript
jsearch
.values('id')
.where(cts.collectionQuery(options.entity))
.slice(0, Number.MAX_SAFE_INTEGER)
.result();
```

`.slice(0, Number.MAX_SAFE_INTEGER)`

By default jsearch will paginate results. The slice is telling it to return all results from 0 to a really big number.
By default jsearch will paginate results. The `slice()` call tells jsearch to return all results from 0 to a really big number.

Here is the final collector.sjs code:

@@ -53,14 +57,16 @@ Here is the final collector.sjs code:
### Content Plugin
For the Order entity, the id is the id from the original relational system. Instead of a 1:1 mapping of source documents, we must find all source documents that match the given id.

After we get all of the matching documents we must then build up an array of the products while also summing the total price.
After we get all of the matching documents we must build up an array of the products while also summing the total price.

Once again we use the [jsearch library](https://docs.marklogic.com/guide/search-dev/javascript) to run our query.
Once again, we use the [jsearch library](https://docs.marklogic.com/guide/search-dev/javascript) to run our query.

Note how we query all Order documents containing the matching id. We use the `map` function to extract out the original content (stored in the instance part of the envelope). The `orders` variable will contain an array of original JSON objects.
Note how the `createContent()` function queries all Order documents containing the matching id. We use the `map` function to extract out the original content (stored in the instance part of the envelope). The `orders` variable will contain an array of original JSON objects.

You can also see how we iterate over the orders to sum up the price and add pointers to the Product entities into the `products` array.
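The iteration described above can be sketched in plain JavaScript. This is only an illustration, not the tutorial's content plugin: the field names (`sku`, `price`, `quantity`), the line-total calculation, and the shape of the product pointers are assumptions, and the `orders` array is hard-coded here instead of being returned by a jsearch query.

```javascript
// Hypothetical source rows that share one order id.
// Field names and values are assumptions made for this sketch.
const orders = [
  { id: 1, sku: 'A-100', price: 10, quantity: 2 },
  { id: 1, sku: 'B-200', price: 5, quantity: 1 },
];

// Build one harmonized order from many source rows:
// sum the total price and collect a pointer to each product.
function summarizeOrder(id, rows) {
  let price = 0;
  const products = [];
  for (const row of rows) {
    // Assumes the total is price times quantity for each row.
    price += row.price * row.quantity;
    // Assumes a product pointer only needs the sku.
    products.push({ sku: row.sku });
  }
  return { id, price, products };
}

const summary = summarizeOrder(1, orders);
```

With the sample rows above, `summary.products` holds one entry per source row and `summary.price` holds the summed line totals.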

<!--- DHFPROD-646 TODO https://github.com/marklogic/marklogic-data-hub/issues/790#issuecomment-373142377 -->

The final content plugin looks like:

<div class="embed-git lang-js" href="//raw.githubusercontent.com/marklogic-community/marklogic-data-hub/develop/examples/online-store/plugins/entities/Order/harmonize/Harmonize Orders/content/content.sjs"></div>
@@ -91,7 +97,6 @@ You might also want to explore your harmonized data.

1. <i class="fa fa-hand-pointer-o"></i> Click **Browse Data**.
1. Change the database to **Final**.
1. <i class="fa fa-hand-pointer-o"></i> Click **Search**{:.blue-button}.
1. <i class="fa fa-hand-pointer-o"></i> Click the **Order** facet to filter the results.

You should see harmonized documents in the search results.
9 changes: 5 additions & 4 deletions _pages/tutorial/3x/harmonizing-product-data.md
@@ -17,6 +17,8 @@ Now that we have modeled the Product entity we can use the Data Hub Framework's

This time we want to use the default option of **Create Structure from Entity Definition**. This means that the Data Hub Framework will create boilerplate code based on our entity model. The code will prepopulate the fields we need to add.

<!--- DHFPROD-646 TODO Pre-populate them in what/where? Add to what/where? -->

![Create Product Harmonize Flow]({{site.baseurl}}/images/3x/harmonizing-product-data/create-product-harmonize-flow.png)

1. <i class="fa fa-hand-pointer-o"></i> Click the **Harmonize Products** flow.
@@ -25,9 +27,9 @@ You can run the harmonize flow from the **Flow Info** tab. The other tabs allow

![Harmonize Flow Overview]({{site.baseurl}}/images/3x/harmonizing-product-data/harmonize-flow-overview.png)

Harmonize flows were designed to be run as batch jobs. To support this batch running, the Data Hub Framework exposes a collector plugin whose purpose is to return a list of things to batch over. The Data Hub Framework then breaks the list of things into parallel batches of a configurable size and sends each and every single thing to the (content, headers, triples, writer) plugins as a transaction. The main plugin receives id values from the collector and orchestrates the behavior of the other plugins.
Harmonize flows were designed to be run as batch jobs. To support this batch running, the Data Hub Framework exposes a collector plugin whose purpose is to return a list of things to operate on. The Data Hub Framework then breaks the list of things into parallel batches of a configurable size and sends each and every single thing to the (content, headers, triples, writer) plugins as a transaction. The main plugin receives id values from the collector and orchestrates the behavior of the other plugins.

If you are not interested in running harmonization flows as batches we do provide ways for running them on-demand for single items.
If you are not interested in running harmonization flows as batches we do [provide ways](../../faqs/#how-can-i-run-a-harmonize-flow-immediately-for-1-document) for running them on-demand for single items.
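To make the batching idea concrete, here is a minimal plain-JavaScript sketch of splitting a collector's list into batches of a configurable size and handing each item to a per-item function. The names `toBatches`, `runFlow`, and `processItem` are hypothetical and are not part of the Data Hub Framework. The sketch also runs sequentially, whereas the framework processes batches in parallel and runs each item as a transaction.

```javascript
// Split a list of ids into batches of at most batchSize items.
function toBatches(ids, batchSize) {
  if (!Number.isInteger(batchSize) || batchSize < 1) {
    throw new RangeError('batchSize must be a positive integer');
  }
  const batches = [];
  for (let i = 0; i < ids.length; i += batchSize) {
    batches.push(ids.slice(i, i + batchSize));
  }
  return batches;
}

// Send every id in every batch to processItem, a stand-in for the
// main plugin that orchestrates content, headers, triples, and writer.
function runFlow(ids, batchSize, processItem) {
  const results = [];
  for (const batch of toBatches(ids, batchSize)) {
    for (const id of batch) {
      results.push(processItem(id));
    }
  }
  return results;
}

const batches = toBatches(['a', 'b', 'c', 'd', 'e'], 2);
const processed = runFlow(['a', 'b', 'c'], 2, (id) => id.toUpperCase());
```

Five ids with a batch size of 2 produce three batches, the last of which holds a single id.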

![Harmonize Flow Overview]({{site.baseurl}}/images/3x/harmonizing-product-data/harmonize-flow-diagram.png)

@@ -66,7 +68,7 @@ The default options passed in to the plugin are:

### Content Plugin

The content code receives an id as the first parameter. This id happens to be the URI for a staging product document. The id can be anything: a URI, a relational row id, a twitter handle, a random number. It's up to you to decide how to use that id to harmonize your data.
The `createContent()` function receives an id as the first parameter. The id can be anything: a URI, a relational row id, a twitter handle, a random number. It's up to you to decide how to use that id to harmonize your data. For this flow, the id is the URI for a staging product document.

The only modification we need to make to this file is to change the way we look up the sku.

@@ -106,7 +108,6 @@ Now let's explore our harmonized data.

1. <i class="fa fa-hand-pointer-o"></i> Click **Browse Data** to view your data.
1. Select **Final** in the database menu.
1. <i class="fa fa-hand-pointer-o"></i> Click **Search**{:.blue-button}.

The search results should show the harmonized documents.

2 changes: 1 addition & 1 deletion _pages/tutorial/3x/load-products-as-is.md
@@ -66,7 +66,7 @@ Similar to the Jobs view, the Traces view offers <strong>free-text search</stron

![Trace View]({{site.baseurl}}/images/3x/load-products-as-is/trace-view.png)

<i class="fa fa-hand-pointer-o"></i> Click one of the rows in the Traces table so see a detailed view of the trace.
<i class="fa fa-hand-pointer-o"></i> Click one of the rows in the Traces table to see a detailed view of the trace.

The trace detail view allows you to click each plugin in the flow to see the inputs and outputs. You can also see the **identifier** that was being processed as well as the time each plugin took to execute.

2 changes: 2 additions & 0 deletions _pages/tutorial/3x/serve-data.md
@@ -5,6 +5,8 @@ lead_text: ''
permalink: /tutorial/serve-data/
---

<!--- DHFPROD-646 https://github.com/marklogic/marklogic-data-hub/issues/790#issuecomment-373201418 -->

You have just successfully loaded two data sources and harmonized them both.

Now you can access your data via several REST endpoints. Your harmonized data is available on the Final HTTP server on port 8011 by default. A full list of REST endpoints is described in the [Client API documentation](https://docs.marklogic.com/REST/client){:target="_blank"}.
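As a rough illustration, the snippet below builds a URL for the Client API's `/v1/search` endpoint on the Final server. The host `localhost`, the helper name `buildSearchUrl`, and the assumption that harmonized orders sit in a collection named `Order` are all hypothetical. The sketch only constructs the URL. An actual request would also need valid MarkLogic credentials.

```javascript
// Build a search URL for the Final HTTP server (port 8011 by default).
// The collection name passed in is an assumption about how the
// harmonized documents are organized.
function buildSearchUrl(host, port, collection) {
  const url = new URL(`http://${host}:${port}/v1/search`);
  url.searchParams.set('format', 'json');
  url.searchParams.set('collection', collection);
  return url.toString();
}

const searchUrl = buildSearchUrl('localhost', 8011, 'Order');
```

If you changed the default ports during installation, pass the port you chose for the Final server instead of 8011.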
4 changes: 4 additions & 0 deletions _pages/tutorial/3x/wrapping-up.md
@@ -15,6 +15,10 @@ You just created a data hub.

A finished version of this tutorial is available here: [Finished Online Shopping Hub Example](https://github.com/marklogic-community/marklogic-data-hub/tree/develop/examples/online-store){:target="_blank"}

## Uninstalling QuickStart

You can uninstall QuickStart by clicking Settings in the top navigation and then Uninstall Hub on the page that appears.

## Next Steps

There are more resources available to help you on your MarkLogic Data Hub journey:
Binary file modified images/3x/harmonizing-order-data/harmonized-orders.png
Binary file modified images/3x/harmonizing-product-data/harmonized-products.png
