TIKA-4138 -- move BoilerpipeContentHandler (#1355)
tballison authored Sep 22, 2023
1 parent 6871c91 commit e04c478
Showing 14 changed files with 86 additions and 87 deletions.
5 changes: 5 additions & 0 deletions CHANGES.txt
@@ -1,7 +1,12 @@
Release 3.0.0-BETA - ??

BREAKING CHANGES

* Require Java 11 (TIKA-4128).

* The BoilerpipeContentHandler has been moved to the new tika-handler-boilerpipe module (TIKA-4138).

Other Changes/Updates
* Fix bug in DateUtils that stripped timezone information from
incoming Calendar objects (TIKA-4126).

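For a downstream Maven build that got BoilerpipeContentHandler through tika-parser-html-commons, the breaking change above amounts to the same artifact swap this commit makes in tika-app, tika-bom, tika-bundle-standard and tika-server. A minimal sketch, assuming the org.apache.tika groupId and the 3.0.0-SNAPSHOT version that appear in this commit's POMs (substitute the version you actually build against):

```xml
<dependency>
  <groupId>org.apache.tika</groupId>
  <!-- previously: tika-parser-html-commons, which this commit drops from the parsers' module list -->
  <artifactId>tika-handler-boilerpipe</artifactId>
  <!-- assumed placeholder: the in-development version used throughout this commit -->
  <version>3.0.0-SNAPSHOT</version>
</dependency>
```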
1 change: 1 addition & 0 deletions pom.xml
@@ -54,6 +54,7 @@
<module>tika-example</module>
<module>tika-java7</module>
<module>tika-detectors</module>
+<module>tika-handlers</module>
</modules>

<profiles>
2 changes: 1 addition & 1 deletion tika-app/pom.xml
@@ -45,7 +45,7 @@
</dependency>
<dependency>
<groupId>${project.groupId}</groupId>
-<artifactId>tika-parser-html-commons</artifactId>
+<artifactId>tika-handler-boilerpipe</artifactId>
<version>${project.version}</version>
</dependency>
<dependency>
2 changes: 1 addition & 1 deletion tika-bom/pom.xml
@@ -222,7 +222,7 @@
</dependency>
<dependency>
<groupId>org.apache.tika</groupId>
-<artifactId>tika-parser-html-commons</artifactId>
+<artifactId>tika-handler-boilerpipe</artifactId>
<version>3.0.0-SNAPSHOT</version>
</dependency>
<dependency>
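Since tika-bom now manages tika-handler-boilerpipe, a project that imports the BOM should be able to declare the handler without a version of its own. A sketch using Maven's standard BOM import (type pom with scope import), again assuming the 3.0.0-SNAPSHOT version from this commit:

```xml
<dependencyManagement>
  <dependencies>
    <!-- Import Tika's bill of materials so its managed versions apply to this project -->
    <dependency>
      <groupId>org.apache.tika</groupId>
      <artifactId>tika-bom</artifactId>
      <!-- assumed placeholder: the in-development version used throughout this commit -->
      <version>3.0.0-SNAPSHOT</version>
      <type>pom</type>
      <scope>import</scope>
    </dependency>
  </dependencies>
</dependencyManagement>

<dependencies>
  <!-- No version here: it is resolved from the imported tika-bom -->
  <dependency>
    <groupId>org.apache.tika</groupId>
    <artifactId>tika-handler-boilerpipe</artifactId>
  </dependency>
</dependencies>
```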
2 changes: 1 addition & 1 deletion tika-bundles/tika-bundle-standard/pom.xml
@@ -58,7 +58,7 @@
</dependency>
<dependency>
<groupId>${project.groupId}</groupId>
-<artifactId>tika-parser-html-commons</artifactId>
+<artifactId>tika-handler-boilerpipe</artifactId>
<version>${project.version}</version>
</dependency>
<dependency>
2 changes: 2 additions & 0 deletions tika-handlers/README.md
@@ -0,0 +1,2 @@
This module is intended to hold non-standard handlers: handlers with dependencies that some users may not want,
or with a focus that is too narrow to warrant adding them to tika-core.
48 changes: 48 additions & 0 deletions tika-handlers/pom.xml
@@ -0,0 +1,48 @@
<?xml version="1.0" encoding="UTF-8"?>
<!--
Licensed to the Apache Software Foundation (ASF) under one
or more contributor license agreements. See the NOTICE file
distributed with this work for additional information
regarding copyright ownership. The ASF licenses this file
to you under the Apache License, Version 2.0 (the
"License"); you may not use this file except in compliance
with the License. You may obtain a copy of the License at
http://www.apache.org/licenses/LICENSE-2.0
Unless required by applicable law or agreed to in writing,
software distributed under the License is distributed on an
"AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
KIND, either express or implied. See the License for the
specific language governing permissions and limitations
under the License.
-->
<project xmlns="http://maven.apache.org/POM/4.0.0"
xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
xsi:schemaLocation="http://maven.apache.org/POM/4.0.0 http://maven.apache.org/xsd/maven-4.0.0.xsd">
<modelVersion>4.0.0</modelVersion>
<parent>
<groupId>org.apache.tika</groupId>
<artifactId>tika-parent</artifactId>
<version>3.0.0-SNAPSHOT</version>
<relativePath>../tika-parent/pom.xml</relativePath>
</parent>

<artifactId>tika-handlers</artifactId>

<name>Apache Tika handlers</name>
<packaging>pom</packaging>

<modules>
<module>tika-handler-boilerpipe</module>
</modules>

<dependencies>
<dependency>
<groupId>${project.groupId}</groupId>
<artifactId>tika-core</artifactId>
<version>${project.version}</version>
<scope>provided</scope>
</dependency>
</dependencies>
</project>
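The tika-handlers parent above declares tika-core with provided scope, and Maven hands a parent's dependencies down to each child module, tika-handler-boilerpipe included. Because provided-scope dependencies are not transitive, a project that depends on a handler module but on nothing else from Tika would presumably have to declare tika-core itself. A sketch, with the 3.0.0-SNAPSHOT version from this commit as an assumed placeholder:

```xml
<!-- tika-core is 'provided' in tika-handlers, so a handler module does not bring it in transitively -->
<dependency>
  <groupId>org.apache.tika</groupId>
  <artifactId>tika-core</artifactId>
  <!-- assumed placeholder: the in-development version used throughout this commit -->
  <version>3.0.0-SNAPSHOT</version>
</dependency>
```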
@@ -1,4 +1,5 @@
-<!---
+<?xml version="1.0" encoding="UTF-8"?>
+<!--
Licensed to the Apache Software Foundation (ASF) under one
or more contributor license agreements. See the NOTICE file
distributed with this work for additional information
@@ -16,7 +17,24 @@
specific language governing permissions and limitations
under the License.
-->
-This module only contains the BoilerPipeContentHandler. The boilerpipe dependency is no
-longer maintained and contains clashes with NekoHTML.
+<project xmlns="http://maven.apache.org/POM/4.0.0"
+xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
+xsi:schemaLocation="http://maven.apache.org/POM/4.0.0 http://maven.apache.org/xsd/maven-4.0.0.xsd">
+<modelVersion>4.0.0</modelVersion>
+<parent>
+<groupId>org.apache.tika</groupId>
+<artifactId>tika-handlers</artifactId>
+<version>3.0.0-SNAPSHOT</version>
+<relativePath>../pom.xml</relativePath>
+</parent>

-In Tika 3.x, we should rename this module to tika-handler-boilerpipe or similar.
+<artifactId>tika-handler-boilerpipe</artifactId>
+
+<dependencies>
+<dependency>
+<groupId>de.l3s.boilerpipe</groupId>
+<artifactId>boilerpipe</artifactId>
+<version>1.1.0</version>
+</dependency>
+</dependencies>
+</project>
@@ -44,7 +44,6 @@
</dependency>
</dependencies>
<modules>
-<module>tika-parser-html-commons</module>
<module>tika-parser-jdbc-commons</module>
<module>tika-parser-digest-commons</module>
<module>tika-parser-mail-commons</module>

This file was deleted.

@@ -186,7 +186,7 @@
</dependency>
<dependency>
<groupId>${project.groupId}</groupId>
-<artifactId>tika-parser-html-commons</artifactId>
+<artifactId>tika-handler-boilerpipe</artifactId>
<version>${project.version}</version>
<scope>test</scope>
</dependency>
2 changes: 1 addition & 1 deletion tika-server/tika-server-core/pom.xml
@@ -54,7 +54,7 @@
</dependency>
<dependency>
<groupId>${project.groupId}</groupId>
-<artifactId>tika-parser-html-commons</artifactId>
+<artifactId>tika-handler-boilerpipe</artifactId>
<version>${project.version}</version>
</dependency>
<dependency>
6 changes: 3 additions & 3 deletions tika-server/tika-server-standard/pom.xml
@@ -50,8 +50,8 @@
</exclusions>
</dependency>
<dependency>
-<groupId>org.apache.tika</groupId>
-<artifactId>tika-parser-html-commons</artifactId>
+<groupId>${project.groupId}</groupId>
+<artifactId>tika-handler-boilerpipe</artifactId>
<version>${project.version}</version>
</dependency>
<dependency>
@@ -128,7 +128,7 @@
<exclude>org.apache.tika:tika-parsers-standard-package:jar:</exclude>
<exclude>org.apache.tika:tika-serialization:jar:</exclude>
<exclude>org.apache.tika:tika-langdetect-optimaize:jar:</exclude>
-<exclude>org.apache.tika:tika-parser-html-commons:jar:</exclude>
+<exclude>org.apache.tika:tika-handler-boilerpipe:jar:</exclude>
<exclude>org.apache.tika:tika-parser-digest-commons:jar:</exclude>
<exclude>org.apache.tika:tika-parser-zip-commons:jar:</exclude>
<exclude>commons-codec:commons-codec:jar:</exclude>