The whole documentation is rewritten in this change. Closes: #147
Showing 44 changed files with 643 additions and 783 deletions.
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,30 @@ | ||
.. include:: /Includes.rst.txt | ||
.. index:: Configuration | ||
.. _configuration-tika-check: | ||
|
||
Check if it works | ||
================= | ||
|
||
TYPO3 Reports | ||
------------- | ||
|
||
First, check the TYPO3 Reports module for any errors reported by the extension. | ||
They are listed under "Apache Tika". | ||
|
||
The extension checks whether you have Java installed when using the Tika app or Tika server. | ||
|
||
It also checks your configuration: whether the configured paths for Tika App and Tika Server exist, | ||
and whether the Tika Server or Solr server can be reached, depending on which one you use. | ||
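
The same checks can be approximated by hand from a shell on the TYPO3 host. The following is a minimal sketch, not the extension's own check: it assumes a Tika Server on ``localhost`` and port ``9998`` (the Tika Server default), uses the server's ``/version`` endpoint, and the function names are hypothetical.

```shell
#!/usr/bin/env bash
# Manual counterpart to the Reports checks (a sketch, not the extension's code).
# Host and port are assumptions; 9998 is the Tika Server default port.
TIKA_HOST="${TIKA_HOST:-localhost}"
TIKA_PORT="${TIKA_PORT:-9998}"

# Build the URL of a Tika Server endpoint, e.g. "version".
tika_url() {
  printf 'http://%s:%s/%s\n' "$TIKA_HOST" "$TIKA_PORT" "$1"
}

# Report whether a Java runtime is available on the PATH.
check_java() {
  if command -v java >/dev/null 2>&1; then
    echo "java: found"
  else
    echo "java: missing"
  fi
}

# Report whether the Tika Server answers on its /version endpoint.
check_tika_server() {
  if curl --silent --fail --max-time 5 "$(tika_url version)" >/dev/null 2>&1; then
    echo "tika server: reachable"
  else
    echo "tika server: not reachable"
  fi
}

check_java
check_tika_server
```

Set ``TIKA_HOST`` and ``TIKA_PORT`` in the environment to point the sketch at a remote server.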
|
||
If everything is configured as expected, you'll get the following in TYPO3 Reports: | ||
|
||
.. figure:: /Images/BE_Reports_Tika_OK.png | ||
:class: with-shadow | ||
:alt: EXT:tika Check configs - OK | ||
|
||
EXT:tika Check configs - OK | ||
|
||
Real test via Tika Preview | ||
-------------------------- | ||
|
||
If everything looks fine, you can run a real extraction via :ref:`Tika Preview <index-editors-and-tika-preview>`. |
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,53 @@ | ||
.. include:: /Includes.rst.txt | ||
|
||
.. _configuration: | ||
|
||
============= | ||
Configuration | ||
============= | ||
|
||
All settings for the extension can be made through the TYPO3 Extension Configuration module. | ||
Select which service you would like to use: | ||
|
||
* *Tika App (not recommended)* | ||
* *Tika Server (recommended)* | ||
* *Solr Server* | ||
|
||
Then configure the settings for that service on the | ||
corresponding settings tab. | ||
|
||
About Tika variants | ||
=================== | ||
|
||
Each variant has its advantages and its drawbacks. | ||
|
||
App - variant (not recommended) | ||
------------------------------- | ||
|
||
The App variant requires a Java runtime and spawns a new Java process for each processed file, | ||
but it causes no network traffic, because files are not sent over the wire. | ||
|
||
Solr Cell - variant | ||
------------------- | ||
|
||
The Apache Solr Content Extraction Library (Solr Cell) variant does not support all the features supported by the App and Server variants, | ||
but it does not require running and maintaining any additional service or stack if EXT:solr is already configured. | ||
Any connection/core used by EXT:solr can be reused for it. | ||
Possible implications are described on the `Apache Solr docs page <https://solr.apache.org/guide/solr/latest/indexing-guide/indexing-with-tika.html#solr-cell-performance-implications>`_. | ||
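
To get a feel for what Solr Cell returns, the extraction handler can be called by hand. This is a minimal sketch under assumptions: the core exposes the extraction request handler at ``/update/extract``, and the host, port (``8983`` is the Solr default) and core path ``/solr/core_en`` are hypothetical placeholders. The ``extractOnly=true`` parameter asks Solr to return the extracted content without indexing it.

```shell
#!/usr/bin/env bash
# Hypothetical connection values; replace them with those of your Solr core.
SOLR_SCHEME="${SOLR_SCHEME:-http}"
SOLR_HOST="${SOLR_HOST:-localhost}"
SOLR_PORT="${SOLR_PORT:-8983}"
SOLR_CORE_PATH="${SOLR_CORE_PATH:-/solr/core_en}"

# Build the URL of the extraction handler; extractOnly=true returns the
# extracted text and metadata without adding a document to the index.
solr_extract_url() {
  printf '%s://%s:%s%s/update/extract?extractOnly=true&wt=json\n' \
    "$SOLR_SCHEME" "$SOLR_HOST" "$SOLR_PORT" "$SOLR_CORE_PATH"
}

# Upload one file to the extraction handler and print the response.
extract_with_solr() {
  curl --silent --fail --max-time 30 -F "file=@$1" "$(solr_extract_url)"
}

echo "Extract endpoint: $(solr_extract_url)"
# Usage (requires a reachable Solr core):
#   extract_with_solr /path/to/document.pdf
```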
|
||
Server - variant (recommended) | ||
------------------------------ | ||
|
||
The Server variant supports the broadest set of features and performs better than the App, | ||
but it requires running and maintaining an additional service. | ||
|
||
.. toctree:: | ||
:maxdepth: 5 | ||
:titlesonly: | ||
|
||
TikaApp | ||
TikaServer | ||
SolrCell | ||
Check | ||
TikaAllServices | ||
|
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,55 @@ | ||
.. include:: /Includes.rst.txt | ||
.. index:: Configuration | ||
.. _configuration-tika-solr-cell: | ||
|
||
|
||
Configuration of Solr Cell | ||
========================== | ||
|
||
Requirements | ||
------------ | ||
|
||
* A running and configured Apache Solr service. | ||
|
||
.. tip:: | ||
|
||
For example, `dkd's Hosted-Solr <https://hosted-solr.com/en/>`_ | ||
|
||
* EXT:tika configured to use the Apache Solr server connection. | ||
|
||
Setup EXT:tika for Solr Server | ||
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ | ||
|
||
Open the **General** tab of the extension settings for EXT:tika and choose **"Solr Server"** as the **Extractor**. | ||
|
||
|
||
.. figure:: /Images/BE_Settings_ExtensionConfiguration_General.png | ||
:class: with-shadow | ||
:alt: Extension configuration for EXT:tika - Choosing Solr Server extractor in General tab | ||
:width: 60% | ||
|
||
Extension configuration for EXT:tika - Choosing Solr Server extractor in General tab | ||
|
||
|
||
After that, open the **Solr** tab and enter the connection details in the corresponding fields. | ||
|
||
|
||
.. figure:: /Images/BE_Settings_ExtensionConfiguration_Solr.png | ||
:class: with-shadow | ||
:alt: Extension configuration for EXT:tika - Provide the connection details for Solr Server | ||
|
||
Extension configuration for EXT:tika - Provide the connection details for Solr Server | ||
|
||
|
||
.. tip:: | ||
|
||
All Solr settings now accept the :php:`%env(<SOME_SOLR_ENV_VAR>)%` syntax, as in the site configuration. | ||
|
||
If the settings for :php:`solrUsername` or :php:`solrPassword` do not use the :php:`%env(<SOME_SOLR_ENV_VAR>)%` syntax, | ||
their values are masked to avoid accidentally exposing secrets and credentials via TYPO3 backend configuration tools such as: | ||
|
||
* Extension Settings module | ||
* Configuration module | ||
|
||
|
||
See :ref:`Check if it works <configuration-tika-check>` for test instructions. |
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,52 @@ | ||
.. include:: /Includes.rst.txt | ||
.. index:: Configuration | ||
.. _configuration-tika-services: | ||
|
||
|
||
Configuring Tika Services | ||
========================= | ||
|
||
**General information about how to configure the Tika Services can be found in the** | ||
`official Tika documentation <https://tika.apache.org/1.28/configuring.html>`_ | ||
|
||
.. tip:: | ||
|
||
The :file:`tika-config.xml` can be applied to all variants of Tika services. | ||
|
||
To exclude certain MIME types from being processed by Tika, | ||
you can do the following: | ||
|
||
Create the file :file:`/etc/tika/tika-config.xml` with this content: | ||
|
||
.. code-block:: xml | ||
|
||
   <?xml version="1.0" encoding="UTF-8"?> | ||
   <properties> | ||
     <parsers> | ||
       <parser class="org.apache.tika.parser.DefaultParser"> | ||
         <mime-exclude>application/zip</mime-exclude> | ||
       </parser> | ||
       <parser class="org.apache.tika.parser.EmptyParser"> | ||
         <mime>application/zip</mime> | ||
       </parser> | ||
     </parsers> | ||
   </properties> | ||
|
||
This tells Tika to exclude zip files from DefaultParser and to use EmptyParser for them instead, | ||
which does essentially nothing. | ||
|
||
Apply tika-config.xml | ||
--------------------- | ||
|
||
.. tip:: | ||
|
||
`Tika docs "Using a Tika Configuration XML file" <https://tika.apache.org/1.28/configuring.html#Using_a_Tika_Configuration_XML_file>`_ | ||
explains how to apply the tika-config.xml file; however, pam_env can make things simpler. | ||
|
||
Adding the following line to :file:`/etc/security/pam_env.conf` makes the TIKA_CONFIG environment variable available globally on the host. | ||
|
||
.. code-block:: bash | ||
|
||
   TIKA_CONFIG DEFAULT="/etc/tika/tika-config.xml" | ||
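
Whether the variable actually reaches a shell can be checked after opening a new login session. A minimal sketch follows; the function name ``check_tika_config`` is hypothetical, and the check only confirms that the variable is set and points to a readable file, not that Tika accepts the configuration.

```shell
#!/usr/bin/env bash
# Verify that TIKA_CONFIG is set and points to a readable file.
# Prints a short status line and returns non-zero on failure.
check_tika_config() {
  local path="${TIKA_CONFIG:-}"
  if [ -z "$path" ]; then
    echo "TIKA_CONFIG is not set"
    return 1
  fi
  if [ ! -r "$path" ]; then
    echo "TIKA_CONFIG points to an unreadable file: $path"
    return 1
  fi
  echo "TIKA_CONFIG OK: $path"
}

# Do not abort the calling shell if the check fails.
check_tika_config || true
```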
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,52 @@ | ||
.. include:: /Includes.rst.txt | ||
.. index:: Configuration | ||
.. _configuration-tika-app: | ||
|
||
|
||
Configuration of Tika App (not recommended) | ||
=========================================== | ||
|
||
Requirements | ||
------------ | ||
|
||
* A Java runtime on the host TYPO3 is running on. Please refer to the Apache Tika docs or other sources. | ||
* The Tika App jar file. See the download instructions below. | ||
* EXT:tika configured to use the downloaded jar file for data extraction. | ||
|
||
Download Tika App | ||
~~~~~~~~~~~~~~~~~ | ||
|
||
The following command downloads the :file:`tika-app-<required-version>.jar` file into the :file:`/opt/tika` directory and verifies its integrity. | ||
|
||
.. code-block:: bash | ||
|
||
   composer --working-dir="$(composer config vendor-dir)/apache-solr-for-typo3/tika" tika:download:app -- -C -D /opt/tika | ||
   # or alternatively, change into the EXT:tika directory and run | ||
   # composer tika:download:app -- -C -D /opt/tika | ||
|
||
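
After the download, the jar can be smoke-tested from the command line before it is wired into TYPO3. This is a minimal sketch: the helper ``find_tika_app_jar`` is hypothetical, the directory defaults to the :file:`/opt/tika` used above, and ``--version`` is the Tika App option that prints the Apache Tika version.

```shell
#!/usr/bin/env bash
# Directory the jar was downloaded to (as in the composer command above).
TIKA_DIR="${TIKA_DIR:-/opt/tika}"

# Print the first tika-app-*.jar found in the given directory.
# Returns non-zero if there is none.
find_tika_app_jar() {
  local jar
  for jar in "$1"/tika-app-*.jar; do
    if [ -f "$jar" ]; then
      printf '%s\n' "$jar"
      return 0
    fi
  done
  return 1
}

if jar="$(find_tika_app_jar "$TIKA_DIR")"; then
  echo "Found: $jar"
  # Running the jar needs a Java runtime on the host.
  java -jar "$jar" --version || echo "Could not run the jar; is Java installed?"
else
  echo "No tika-app-*.jar found in $TIKA_DIR"
fi
```

The path printed after ``Found:`` is the value to paste into the extension settings.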
Setup EXT:tika for Tika App | ||
~~~~~~~~~~~~~~~~~~~~~~~~~~~ | ||
|
||
Open the **General** tab of the extension settings for EXT:tika and choose **"Tika App"** as the **Extractor**. | ||
|
||
|
||
.. figure:: /Images/BE_Settings_ExtensionConfiguration_General.png | ||
:class: with-shadow | ||
:alt: Extension configuration for EXT:tika - Choosing App extractor in General tab | ||
:width: 60% | ||
|
||
Extension configuration for EXT:tika - Choosing App extractor in General tab | ||
|
||
|
||
After that, open the **Jar** tab and paste the path of the downloaded :file:`tika-app-<required-version>.jar` into the **Tika App Jar Path** input field. | ||
|
||
|
||
.. figure:: /Images/BE_Settings_ExtensionConfiguration_Jar.png | ||
:class: with-shadow | ||
:alt: Extension configuration for EXT:tika - Provide the path to the downloaded App file | ||
|
||
Extension configuration for EXT:tika - Provide the path to the downloaded App file | ||
|
||
|
||
See :ref:`Check if it works <configuration-tika-check>` for test instructions. |
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,53 @@ | ||
.. include:: /Includes.rst.txt | ||
.. index:: Configuration | ||
.. _configuration-tika-server: | ||
|
||
|
||
Configuration of Tika Server | ||
============================ | ||
|
||
Requirements | ||
------------ | ||
|
||
* A running and configured Apache Tika service, | ||
for example `the Docker container <https://hub.docker.com/r/apache/tika>`_. | ||
|
||
.. note:: | ||
|
||
It is possible, but **not recommended**, to run and manage the Tika Server on the TYPO3 host | ||
if the "Tika Server Jar Path" is provided. | ||
|
||
**This feature is still available but will be removed soon.** | ||
See: `#135 <https://github.com/TYPO3-Solr/ext-tika/issues/135>`_ | ||
|
||
.. seealso:: | ||
|
||
Refer to our `solr-ddev-site Tika integration <https://github.com/TYPO3-Solr/solr-ddev-site/tree/main/packages/introduction_tika>`_ | ||
to set up the Tika service via Docker on hosts with ARM-based processors. | ||
|
||
* EXT:tika configured to use the Apache Tika server connection. | ||
|
||
Setup EXT:tika for Tika Server | ||
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ | ||
|
||
Open the **General** tab of the extension settings for EXT:tika and choose **"Tika Server"** as the **Extractor**. | ||
|
||
|
||
.. figure:: /Images/BE_Settings_ExtensionConfiguration_General.png | ||
:class: with-shadow | ||
:alt: Extension configuration for EXT:tika - Choosing Server extractor in General tab | ||
:width: 60% | ||
|
||
Extension configuration for EXT:tika - Choosing Server extractor in General tab | ||
|
||
|
||
After that, open the **Server** tab and enter the connection details in the corresponding fields. | ||
|
||
|
||
.. figure:: /Images/BE_Settings_ExtensionConfiguration_Server.png | ||
:class: with-shadow | ||
:alt: Extension configuration for EXT:tika - Provide the connection details for Tika Server | ||
|
||
Extension configuration for EXT:tika - Provide the connection details for Tika Server | ||
|
||
See :ref:`Check if it works <configuration-tika-check>` for test instructions. |
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,31 @@ | ||
.. include:: /Includes.rst.txt | ||
|
||
.. _for-editors: | ||
|
||
=========== | ||
For Editors | ||
=========== | ||
|
||
.. _index-editors-and-tika-preview: | ||
|
||
Tika Preview | ||
------------ | ||
|
||
Editors can preview the extractable content and metadata in the Filelist backend module | ||
via the context menu of a file: | ||
|
||
.. figure:: /Images/TikaPreviewOnFileContextMenu.png | ||
:class: with-shadow | ||
:alt: File context menu - Tika Preview | ||
|
||
Tika Preview button on file context menu. | ||
|
||
When you click the "Tika Preview" button, Tika processes the file and the extracted data is listed in a pop-up window. | ||
This pop-up window contains the extracted file contents and metadata: | ||
|
||
.. figure:: /Images/TikaPreviewExtractedData.png | ||
:class: with-shadow | ||
:alt: Extracted data from Tika Preview | ||
|
||
Extracted data from Tika Preview. | ||
|
File renamed without changes