[DOC] reafactor the docs
The whole documentation is rewritten in this change.

Closes: #147
dkd-kaehm committed Oct 20, 2023
1 parent 6618965 commit c899cf8
Showing 44 changed files with 643 additions and 783 deletions.
4 changes: 3 additions & 1 deletion .gitattributes
@@ -3,7 +3,9 @@
/.gitattributes export-ignore
/.github export-ignore
/.styleci.yml export-ignore
-/Build/ export-ignore
+# Do not add Build/Helpers, it is used by composer.json
+/Build/Test/ export-ignore
+/Build/generate_documentation.sh export-ignore
/CONTRIBUTING.md export-ignore
/Dockerfile export-ignore
/Tests/ export-ignore
14 changes: 7 additions & 7 deletions Build/Helpers/download_tika_binaries.sh
@@ -12,9 +12,9 @@ Help()
cat <<-EOF
Usage:
-$(basename ${COMPOSER_BINARY}) tika:download
-$(basename ${COMPOSER_BINARY}) tika:download [--] [<flags>]
-$(basename ${COMPOSER_BINARY}) tika:download [--] [<flags>] [<option> <parameter>]
+$(basename "${COMPOSER_BINARY}") tika:download
+$(basename "${COMPOSER_BINARY}") tika:download [--] [<flags>]
+$(basename "${COMPOSER_BINARY}") tika:download [--] [<flags>] [<option> <parameter>]
Options:
--tika-version <tika-version> Specific TIKA version. Default: ${REQUIRED_TIKA_VERSION}
@@ -27,10 +27,10 @@ Flags:
Note: imports Apaches TIKA public keys
Examples:
-$(basename ${COMPOSER_BINARY}) tika:download -- -D /tmp/tika-jars
-$(basename ${COMPOSER_BINARY}) tika:download -- -D /tmp/tika-jars
-$(basename ${COMPOSER_BINARY}) tika:download -- -D /tmp/tika-jars -C -a
-$(basename ${COMPOSER_BINARY}) tika:download -- -D /tmp/tika-jars -C -a --tika-version 1.24.1
+$(basename "${COMPOSER_BINARY}") tika:download -- -D /tmp/tika-jars
+$(basename "${COMPOSER_BINARY}") tika:download -- -D /tmp/tika-jars
+$(basename "${COMPOSER_BINARY}") tika:download -- -D /tmp/tika-jars -C -a
+$(basename "${COMPOSER_BINARY}") tika:download -- -D /tmp/tika-jars -C -a --tika-version 1.24.1
EOF
exit
30 changes: 30 additions & 0 deletions Documentation/Configuration/Check.rst
@@ -0,0 +1,30 @@
.. include:: /Includes.rst.txt
.. index:: Configuration
.. _configuration-tika-check:

Check if it works
=================

TYPO3 Reports
-------------

First, check the TYPO3 Reports module for errors reported by the extension.
They are listed under "Apache Tika".

The extension checks whether Java is installed when the Tika App or Tika Server is used.

It also checks your configuration: whether the configured paths for the Tika App and Tika Server are
available, and whether the Tika Server or Solr server can be reached, depending on which one you use.
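
The checks described above can also be approximated by hand from a shell on the TYPO3 host.
The following is a minimal sketch, not the extension's own implementation: all function names are
hypothetical helpers, and the host, port and jar path passed to them are placeholders. It assumes
that ``curl`` is installed and that your Tika Server answers on its ``/version`` endpoint.

```bash
#!/usr/bin/env bash
# Manual spot checks, loosely mirroring what the Reports module verifies.
# All function names are hypothetical helpers for this sketch.

# Print the URL of a Tika Server endpoint. Args: scheme, host, port, path.
tika_url() {
    printf '%s://%s:%s%s\n' "$1" "$2" "$3" "$4"
}

# Succeed if a Java runtime is available on PATH.
check_java() {
    command -v java >/dev/null 2>&1
}

# Succeed if the given jar path is a readable regular file.
check_jar() {
    [ -f "$1" ] && [ -r "$1" ]
}

# Succeed if the Tika Server answers on /version within 5 seconds.
# Args: scheme, host, port.
check_tika_server() {
    curl --silent --fail --max-time 5 "$(tika_url "$1" "$2" "$3" /version)" >/dev/null
}
```

For example, ``check_java && check_tika_server http localhost 9998`` exits with status 0 only if
both checks pass; ``9998`` is used here because it is the port Tika Server usually listens on.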

If everything is configured as expected, the TYPO3 Reports module shows the following:

.. figure:: /Images/BE_Reports_Tika_OK.png
   :class: with-shadow
   :alt: EXT:tika Check configs - OK

   EXT:tika Check configs - OK

Real test via Tika Preview
--------------------------

If the report shows no errors, run a real extraction test via :ref:`Tika Preview <index-editors-and-tika-preview>`.
53 changes: 53 additions & 0 deletions Documentation/Configuration/Index.rst
@@ -0,0 +1,53 @@
.. include:: /Includes.rst.txt

.. _configuration:

=============
Configuration
=============

All settings for the extension are made in the TYPO3 Extension Configuration module.
Select the service you would like to use:

* *Tika App (not recommended)*
* *Tika Server (recommended)*
* *Solr Server*

Then configure the settings for that service on the
corresponding settings tab.

About Tika variants
===================

Each variant has its advantages and its drawbacks.

App - variant (not recommended)
-------------------------------

The App requires a Java runtime on the TYPO3 host and spawns a new Java process for each processed file,
but it causes no network traffic, because files do not have to be sent over the wire.

Solr Cell - variant
-------------------

The Apache Solr Content Extraction Library (Solr Cell) variant does not support all features of the App and Server variants,
but it requires no additional service or stack to run and maintain if EXT:solr is already configured.
Any connection/core used by EXT:solr can be reused.
Possible implications are described on the `Apache Solr docs page <https://solr.apache.org/guide/solr/latest/indexing-guide/indexing-with-tika.html#solr-cell-performance-implications>`_.
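
To get a feeling for what Solr Cell returns, a file can be sent to Solr's extracting request handler by hand.
This is a minimal sketch under assumptions: the handler is assumed to be enabled in your core at the
conventional ``/update/extract`` path, the function names are hypothetical, and host, port and core name
are placeholders. The ``extractOnly=true`` parameter asks Solr to return the extracted content without indexing it.

```bash
#!/usr/bin/env bash
# Sketch: extract a file through Solr Cell without indexing it.
# Function names are hypothetical helpers for this example.

# Print the extract-only URL for a core. Args: scheme, host, port, core name.
solr_extract_url() {
    printf '%s://%s:%s/solr/%s/update/extract?extractOnly=true&wt=json\n' "$1" "$2" "$3" "$4"
}

# Upload a file as multipart form data and print Solr's JSON response.
# Args: scheme, host, port, core name, file to extract.
extract_with_solr_cell() {
    curl --silent --fail --form "file=@$5" "$(solr_extract_url "$1" "$2" "$3" "$4")"
}
```

A call could look like ``extract_with_solr_cell http localhost 8983 core_en ./sample.pdf``,
where ``core_en`` and ``sample.pdf`` are made-up names.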

Server - variant (recommended)
------------------------------

The Server variant supports the broadest set of features and performs better than the App,
but it requires an additional service that has to be maintained.
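
The difference between the App and the Server variant shows in how each one is invoked.
The sketch below is an illustration only, not what EXT:tika executes internally: the function names are
hypothetical, and the jar path and server URL are placeholders you would replace with your own values.

```bash
#!/usr/bin/env bash
# Sketch: the same plain-text extraction through the App and through the Server.
# Function names are hypothetical helpers for this example.

# App variant: starts a new JVM for every call; no network involved.
# Args: path to the tika-app jar, file to extract.
extract_with_app() {
    java -jar "$1" --text "$2"
}

# Server variant: uploads the file via HTTP PUT to a long-running Tika Server.
# Args: base URL of the server, file to extract.
extract_with_server() {
    curl --silent --fail --upload-file "$2" --header 'Accept: text/plain' "${1%/}/tika"
}
```

With the App, every call pays the JVM start-up cost; with the Server, that cost is paid once when the
service starts, at the price of sending each file over HTTP.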

.. toctree::
   :maxdepth: 5
   :titlesonly:

   TikaApp
   TikaServer
   SolrCell
   Check
   TikaAllServices

55 changes: 55 additions & 0 deletions Documentation/Configuration/SolrCell.rst
@@ -0,0 +1,55 @@
.. include:: /Includes.rst.txt
.. index:: Configuration
.. _configuration-tika-solr-cell:


Configuration of Solr Cell
==========================

Requirements
------------

* A running and configured Apache Solr service.

  .. tip::

     For example, `dkd's Hosted-Solr <https://hosted-solr.com/en/>`_.

* EXT:tika configured to use the Apache Solr server connection.

Setup EXT:tika for Solr Server
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

Open the extension settings for EXT:tika, switch to the **General** tab and choose **"Solr Server"** as **Extractor**.


.. figure:: /Images/BE_Settings_ExtensionConfiguration_General.png
   :class: with-shadow
   :alt: Extension configuration for EXT:tika - Choosing Solr Server extractor in General tab
   :width: 60%

   Extension configuration for EXT:tika - Choosing Solr Server extractor in General tab


Then open the **Solr** tab and enter the connection data in the corresponding fields.


.. figure:: /Images/BE_Settings_ExtensionConfiguration_Solr.png
   :class: with-shadow
   :alt: Extension configuration for EXT:tika - Provide the connection data for Solr Server

   Extension configuration for EXT:tika - Provide the connection data for Solr Server


.. tip::

   All Solr settings now accept the :php:`%env(<SOME_SOLR_ENV_VAR>)%` syntax, as in the site configuration.

If the settings for :php:`solrUsername` or :php:`solrPassword` do not contain a :php:`%env(<SOME_SOLR_ENV_VAR>)%` placeholder,
they are blinded/hidden to avoid accidentally exposing secrets and credentials via TYPO3 backend configuration tools such as:

* Extension Settings module
* Configuration module
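
The blinding rule above can be illustrated with a small shell sketch. It is only a model of the described
behaviour, not the extension's actual code: the function names, the mask string ``********`` and the
variable name ``SOLR_PASSWORD`` are assumptions made for this example.

```bash
#!/usr/bin/env bash
# Sketch of the rule described above; not the extension's actual code.

# Succeed if the value contains a %env(NAME)% placeholder.
uses_env_placeholder() {
    case "$1" in
        *%env\(?*\)%*) return 0 ;;
        *) return 1 ;;
    esac
}

# Print the value as a backend tool might show it:
# placeholders stay visible, literal secrets are masked.
display_setting() {
    if uses_env_placeholder "$1"; then
        printf '%s\n' "$1"
    else
        printf '%s\n' '********'
    fi
}
```

A placeholder such as ``%env(SOLR_PASSWORD)%`` reveals only the name of a variable, so it can be shown
as-is, while a literal password is masked.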


See :ref:`Check if it works <configuration-tika-check>` for test instructions.
52 changes: 52 additions & 0 deletions Documentation/Configuration/TikaAllServices.rst
@@ -0,0 +1,52 @@
.. include:: /Includes.rst.txt
.. index:: Configuration
.. _configuration-tika-services:


Configuring Tika Services
=========================

**General information about how to configure the Tika Services can be found in the**
`official Tika documentation <https://tika.apache.org/1.28/configuring.html>`_

.. tip::

   The :file:`tika-config.xml` can be applied to all variants of Tika services.

In case you want to exclude certain mime types from being processed by Tika,
you can do the following:

Create the file :file:`/etc/tika/tika-config.xml` with this content:

.. code-block:: xml

   <?xml version="1.0" encoding="UTF-8"?>
   <properties>
     <parsers>
       <parser class="org.apache.tika.parser.DefaultParser">
         <mime-exclude>application/zip</mime-exclude>
       </parser>
       <parser class="org.apache.tika.parser.EmptyParser">
         <mime>application/zip</mime>
       </parser>
     </parsers>
   </properties>

This tells Tika to exclude ZIP files from the DefaultParser and to use the EmptyParser for them instead,
which does essentially nothing.

Apply tika-config.xml
---------------------

.. tip::

   `Tika docs "Using a Tika Configuration XML file" <https://tika.apache.org/1.28/configuring.html#Using_a_Tika_Configuration_XML_file>`_
   explain how to apply the :file:`tika-config.xml` file; however, pam_env can make things simpler.

Adding the following line to :file:`/etc/security/pam_env.conf` makes the ``TIKA_CONFIG`` environment variable available globally on the host.

.. code-block:: bash

   TIKA_CONFIG DEFAULT="/etc/tika/tika-config.xml"
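
Whether the variable actually reaches a shell can be verified with a small check. This is a minimal sketch
with a hypothetical function name; it assumes that a new login session has been started after editing the
file, since pam_env typically applies its settings when a session is opened.

```bash
#!/usr/bin/env bash
# Sketch: verify that TIKA_CONFIG is set and points to a readable file.
# The function name is a hypothetical helper for this example.
check_tika_config() {
    if [ -z "${TIKA_CONFIG:-}" ]; then
        echo "TIKA_CONFIG is not set" >&2
        return 1
    fi
    if [ ! -r "$TIKA_CONFIG" ]; then
        echo "TIKA_CONFIG is set, but not readable: $TIKA_CONFIG" >&2
        return 1
    fi
    echo "TIKA_CONFIG points to $TIKA_CONFIG"
}
```

Run ``check_tika_config`` as the user that executes Java for Tika, because that is the environment
the configuration has to be visible in.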
52 changes: 52 additions & 0 deletions Documentation/Configuration/TikaApp.rst
@@ -0,0 +1,52 @@
.. include:: /Includes.rst.txt
.. index:: Configuration
.. _configuration-tika-app:


Configuration of Tika App (not recommended)
===========================================

Requirements
------------

* A Java runtime on the host that TYPO3 runs on. For installation, refer to the Apache Tika docs or other sources.
* The Tika App jar file. See "Download Tika App" below.
* EXT:tika configured to use the downloaded jar file for data extraction.

Download Tika App
~~~~~~~~~~~~~~~~~

The following command downloads the :file:`tika-app-<required-version>.jar` file into the :file:`/opt/tika` directory and verifies its integrity.

.. code-block:: bash

   composer --working-dir="$(composer config vendor-dir)/apache-solr-for-typo3/tika" tika:download:app -- -C -D /opt/tika
   # or alternatively, change into the EXT:tika directory and run
   # composer tika:download:app -- -C -D /opt/tika

Setup EXT:tika for Tika App
~~~~~~~~~~~~~~~~~~~~~~~~~~~

Open the extension settings for EXT:tika, switch to the **General** tab and choose **"Tika App"** as **Extractor**.


.. figure:: /Images/BE_Settings_ExtensionConfiguration_General.png
   :class: with-shadow
   :alt: Extension configuration for EXT:tika - Choosing App extractor in General tab
   :width: 60%

   Extension configuration for EXT:tika - Choosing App extractor in General tab


Then open the **Jar** tab and paste the path of the downloaded :file:`tika-app-<required-version>.jar` into the **Tika App Jar Path** input field.


.. figure:: /Images/BE_Settings_ExtensionConfiguration_Jar.png
   :class: with-shadow
   :alt: Extension configuration for EXT:tika - Provide the path to downloaded App file

   Extension configuration for EXT:tika - Provide the path to downloaded App file


See :ref:`Check if it works <configuration-tika-check>` for test instructions.
53 changes: 53 additions & 0 deletions Documentation/Configuration/TikaServer.rst
@@ -0,0 +1,53 @@
.. include:: /Includes.rst.txt
.. index:: Configuration
.. _configuration-tika-server:


Configuration of Tika Server
============================

Requirements
------------

* A running and configured Apache Tika service,
  for example `the Docker container <https://hub.docker.com/r/apache/tika>`_.

  .. note::

     It is possible, though **not recommended**, to run and manage the Tika Server on the TYPO3 host
     if the "Tika Server Jar Path" is provided.

     **This feature is still available but will be removed soon.**
     See: `#135 <https://github.com/TYPO3-Solr/ext-tika/issues/135>`_

  .. seealso::

     Refer to our `solr-ddev-site Tika integration <https://github.com/TYPO3-Solr/solr-ddev-site/tree/main/packages/introduction_tika>`_
     to set up the Tika service via Docker on hosts with ARM-based processors.

* EXT:tika configured to use the Apache Tika server connection.
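
For a quick local trial, the Docker image mentioned in the requirements can be started with a single command.
The sketch below wraps that command in hypothetical helper functions; the host port and the image tag are
placeholders, and it assumes that Docker is installed and that the container listens on port ``9998``,
the port Tika Server usually uses.

```bash
#!/usr/bin/env bash
# Sketch: start a Tika Server container and expose it on a host port.
# Function names are hypothetical helpers for this example.

# Print the docker port mapping for a host port; 9998 is assumed to be
# the port the Tika Server listens on inside the container.
tika_port_mapping() {
    printf '%s:9998\n' "$1"
}

# Start the container detached. Args: host port, image tag.
start_tika_container() {
    docker run --detach --publish "$(tika_port_mapping "$1")" "apache/tika:$2"
}
```

``start_tika_container 9998 <tag>`` would make the service reachable at ``http://localhost:9998``;
pinning a concrete tag instead of a moving one keeps the Tika version predictable.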

Setup EXT:tika for Tika Server
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

Open the extension settings for EXT:tika, switch to the **General** tab and choose **"Tika Server"** as **Extractor**.


.. figure:: /Images/BE_Settings_ExtensionConfiguration_General.png
   :class: with-shadow
   :alt: Extension configuration for EXT:tika - Choosing Server extractor in General tab
   :width: 60%

   Extension configuration for EXT:tika - Choosing Server extractor in General tab


Then open the **Server** tab and enter the connection data in the corresponding fields.


.. figure:: /Images/BE_Settings_ExtensionConfiguration_Server.png
   :class: with-shadow
   :alt: Extension configuration for EXT:tika - Provide the connection data for Tika Server

   Extension configuration for EXT:tika - Provide the connection data for Tika Server

See :ref:`Check if it works <configuration-tika-check>` for test instructions.
31 changes: 31 additions & 0 deletions Documentation/Editor/Index.rst
@@ -0,0 +1,31 @@
.. include:: /Includes.rst.txt

.. _for-editors:

===========
For Editors
===========

.. _index-editors-and-tika-preview:

Tika Preview
------------

Editors can preview the extractable contents and metadata of a file via its context menu
in the Filelist backend module:

.. figure:: /Images/TikaPreviewOnFileContextMenu.png
   :class: with-shadow
   :alt: File context menu - Tika Preview

   Tika Preview button on file context menu.

Clicking the "Tika Preview" button sends the file to Tika and lists the extracted data in a pop-up window.
The pop-up window contains the extracted file contents and metadata:

.. figure:: /Images/TikaPreviewExtractedData.png
   :class: with-shadow
   :alt: Extracted data from Tika Preview

   Extracted data from Tika Preview.

Binary file added Documentation/Images/ApacheTikaLogo.png
Binary file added Documentation/Images/BE_Configuration.png
Binary file added Documentation/Images/BE_Reports_Tika_OK.png
Binary file added Documentation/Images/TikaPreviewExtractedData.png
File renamed without changes