Revert "Fixes #1372: apoc.load.html ability to read runtime structure of the page (#1990)" (#2005) (#2006)

This reverts commit 8b64f01.

Co-authored-by: Andrea Santurbano <[email protected]>
2 people authored and conker84 committed Jun 16, 2021
1 parent c2be0bc commit 9b77682
Showing 19 changed files with 79 additions and 903 deletions.
64 changes: 1 addition & 63 deletions docs/asciidoc/modules/ROOT/partials/usage/apoc.load.html.adoc
@@ -224,66 +224,4 @@ a|
]
}
----
|===

If we have a `.html` file with a jQuery script like:

[source,html]
----
<!DOCTYPE html>
<html>
<head>
<script type="text/javascript">
$(() => {
var newP = document.createElement("strong");
var textNode = document.createTextNode("This is a new text node");
newP.appendChild(textNode);
document.getElementById("appendStuff").appendChild(newP);
});
</script>
<meta charset="UTF-8"/>
</head>
<body onLoad="loadData()" class="mediawiki ltr sitedir-ltr mw-hide-empty-elt ns-0 ns-subject page-Aap_Kaa_Hak rootpage-Aap_Kaa_Hak skin-vector action-view">
<div id="appendStuff"></div>
</body>
</html>
----

we can read the content generated by the JavaScript through the `browser` config.
Note that to use a browser, you have to install <<selenium-depencencies,these dependencies>>:

[source,cypher]
----
CALL apoc.load.html("test.html",{strong: "strong"}, {browser: "FIREFOX"});
----
.Results
[opts="header"]
|===
| Output
a|
[source,json]
----
{
"strong": [
{
"tagName": "strong",
"text": "This is a new text node"
}
]
}
----
|===

If we need to parse a tag that is rendered by a slow asynchronous call, we can use the `wait` config to wait for it (up to 10 seconds in this example):

[source,cypher]
----
CALL apoc.load.html("test.html",{asyncTag: "#asyncTag"}, {browser: "FIREFOX", wait: 10});
----
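
The waiting behaviour described for `wait` can be pictured as a poll-with-deadline loop. The following is only a minimal Python sketch of that idea, not the APOC or Selenium implementation: `wait_for_selectors`, `find`, and `poll_interval` are hypothetical names introduced for illustration, and `find` stands in for whatever the browser uses to look up elements.

```python
import time
from typing import Callable, Dict, List


def wait_for_selectors(
    find: Callable[[str], List[str]],
    selectors: Dict[str, str],
    wait_seconds: float,
    poll_interval: float = 0.1,
) -> Dict[str, List[str]]:
    """Poll until every selector matches at least one element or time runs out.

    `find` is a hypothetical lookup function standing in for the browser;
    it is not part of APOC or Selenium.
    """
    deadline = time.monotonic() + wait_seconds
    while True:
        # Look up every selector from the query map on each pass.
        results = {name: find(css) for name, css in selectors.items()}
        all_found = all(results.values())
        # Stop when everything is present, or when the wait budget is spent;
        # in the latter case execution continues with whatever was found.
        if all_found or time.monotonic() >= deadline:
            return results
        time.sleep(poll_interval)
```

With `wait_seconds` set to 0 the loop performs a single lookup and returns, which mirrors the documented default of not waiting at all.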

[[selenium-depencencies]]
== Dependencies

To use the `apoc.load.html` procedure with the `browser` config set to a value other than `NONE`, you have to add additional dependencies.

This dependency is included in https://github.com/neo4j-contrib/neo4j-apoc-procedures/releases/download/{apoc-release}/apoc-selenium-dependencies-{apoc-release}.jar[apoc-selenium-dependencies-{apoc-release}.jar^], which can be downloaded from the https://github.com/neo4j-contrib/neo4j-apoc-procedures/releases/tag/{apoc-release}[releases page^].
Once that file is downloaded, it should be placed in the `plugins` directory and the Neo4j Server restarted.
|===
@@ -1,19 +1,10 @@
The procedure supports the following config parameters:

.Config parameters
[opts="header",cols="1m,2m,1m,4"]
[opts=header]
|===
| name | type | default | description
| browser | Enum [NONE, CHROME, FIREFOX] | NONE | If set to "CHROME" or "FIREFOX", https://www.selenium.dev/documentation/en/webdriver/[Selenium Web Driver] is used to read content generated dynamically by JavaScript.
If set to "NONE" (default), dynamic content cannot be read.
Note that to use the Chrome or Firefox driver, the browser must be installed on your machine and additional jars must be downloaded into the plugin folder. <<selenium-depencencies, See below>>
| wait | long | 0 | If greater than 0, it waits until at least one element is found for each selector entered in the query parameter
(up to the given number of seconds, after which execution continues).
Useful for handling elements that are rendered after the page is loaded (e.g. by slow asynchronous calls).
| charset | String | "UTF-8" | the character set of the page being scraped, if `http-equiv` meta-tag is not set.
| headless | boolean | true | Valid when `browser` is not `NONE`. Allows the browser to run in https://chromium.googlesource.com/chromium/src/+/lkgr/headless/README.md[headless mode],
that is, without actually opening the browser UI (recommended).
| acceptInsecureCerts | boolean | true | If true, allows reading HTML from sites with insecure certificates
| charset | String | "UTF-8" | the character set of the page being scraped
| baseUri | String | "" | Base URI used to resolve relative paths
| failSilently | Enum [FALSE, WITH_LOG, WITH_LIST] | FALSE | If the parse fails with one or more elements, using `FALSE` it throws a `RuntimeException`, using `WITH_LOG` a `log.warn` is created for each incorrect item and using `WITH_LIST` an `errorList` key is added to the result with the failed tags.
|===
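
The three `failSilently` modes in the table above can be illustrated with a small Python sketch. This is not the APOC implementation: `collect_elements` and `parse` are hypothetical names, and the sketch assumes that in `WITH_LIST` mode the `errorList` key is always present, which the table does not state.

```python
import logging
from typing import Callable, Dict, List

logger = logging.getLogger("load_html_sketch")


def collect_elements(
    elements: List[str],
    parse: Callable[[str], Dict[str, str]],
    fail_silently: str = "FALSE",
) -> Dict[str, list]:
    """Parse each element, handling failures per the three documented modes."""
    if fail_silently not in ("FALSE", "WITH_LOG", "WITH_LIST"):
        raise ValueError(f"unknown failSilently value: {fail_silently}")
    parsed, errors = [], []
    for element in elements:
        try:
            parsed.append(parse(element))
        except Exception as exc:
            if fail_silently == "FALSE":
                # FALSE: abort on the first failure with a runtime error.
                raise RuntimeError(f"cannot parse {element!r}") from exc
            if fail_silently == "WITH_LOG":
                # WITH_LOG: warn about the bad item and keep going.
                logger.warning("cannot parse %r: %s", element, exc)
            else:
                # WITH_LIST: remember the failed item for the result.
                errors.append(element)
    result = {"parsed": parsed}
    if fail_silently == "WITH_LIST":
        # Assumption: the key is added even when no item failed.
        result["errorList"] = errors
    return result
```

The trade-off between the modes is visibility versus robustness: `FALSE` surfaces a problem immediately, while the other two let a partially malformed page still yield results.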
22 changes: 0 additions & 22 deletions extra-dependencies/selenium/build.gradle

This file was deleted.

Empty file.

This file was deleted.

183 changes: 0 additions & 183 deletions extra-dependencies/selenium/gradlew

This file was deleted.
