Merge pull request #2 from apache/master
merge master in
fengjian428 authored Jan 24, 2022
2 parents eb14274 + cfde45b commit c281194
Showing 812 changed files with 39,621 additions and 8,315 deletions.
2 changes: 1 addition & 1 deletion .github/ISSUE_TEMPLATE/SUPPORT_REQUEST.md
@@ -8,7 +8,7 @@ labels: question

**_Tips before filing an issue_**

- Have you gone through our [FAQs](https://cwiki.apache.org/confluence/display/HUDI/FAQ)?
- Have you gone through our [FAQs](https://hudi.apache.org/learn/faq/)?

- Join the mailing list to engage in conversations and get faster support at [email protected].

8 changes: 8 additions & 0 deletions .github/workflows/bot.yml
@@ -18,8 +18,16 @@ jobs:
include:
- scala: "scala-2.11"
spark: "spark2"
- scala: "scala-2.11"
spark: "spark2,spark-shade-unbundle-avro"
- scala: "scala-2.12"
spark: "spark3.1.x"
- scala: "scala-2.12"
spark: "spark3.1.x,spark-shade-unbundle-avro"
- scala: "scala-2.12"
spark: "spark3"
- scala: "scala-2.12"
spark: "spark3,spark-shade-unbundle-avro"
steps:
- uses: actions/checkout@v2
- name: Set up JDK 8
3 changes: 2 additions & 1 deletion .gitignore
@@ -61,7 +61,8 @@ local.properties
# IntelliJ specific files/directories #
#######################################
.out
.idea
.idea/*
!.idea/vcs.xml
*.ipr
*.iws
*.iml
36 changes: 36 additions & 0 deletions .idea/vcs.xml

Some generated files are not rendered by default.

6 changes: 6 additions & 0 deletions NOTICE
@@ -159,3 +159,9 @@ its NOTICE file:
This product includes software developed at
StreamSets (http://www.streamsets.com/).

--------------------------------------------------------------------------------

This product includes code from hilbert-curve project
* Copyright https://github.com/davidmoten/hilbert-curve
* Licensed under the Apache-2.0 License

10 changes: 7 additions & 3 deletions README.md
Expand Up @@ -51,7 +51,7 @@ Prerequisites for building Apache Hudi:
* Unix-like system (like Linux, Mac OS X)
* Java 8 (Java 9 or 10 may work)
* Git
* Maven
* Maven (>=3.3.1)

```
# Checkout code and build
@@ -78,12 +78,16 @@ The default Scala version supported is 2.11. To build for Scala 2.12 version, bu
mvn clean package -DskipTests -Dscala-2.12
```

### Build with Spark 3.0.0
### Build with Spark 3

The default Spark version supported is 2.4.4. To build for Spark 3.0.0 version, build using `spark3` profile
The default Spark version supported is 2.4.4. To build for different Spark 3 versions, use the corresponding profile

```
# Build against Spark 3.2.0 (the default build shipped with the public Spark 3 bundle)
mvn clean package -DskipTests -Dspark3
# Build against Spark 3.1.2
mvn clean package -DskipTests -Dspark3.1.x
```
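
Since the CI matrix in `.github/workflows/bot.yml` (changed above) pairs the Spark 3 profiles with Scala 2.12, a Spark 3 build typically combines both flags. A sketch for illustration, reusing the `-Dspark3` and `-Dscala-2.12` options shown above rather than quoting the README verbatim:

```
# Build against Spark 3 with Scala 2.12, mirroring the scala-2.12 + spark3 CI matrix entry
mvn clean package -DskipTests -Dspark3 -Dscala-2.12
```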

### Build without spark-avro module
43 changes: 27 additions & 16 deletions azure-pipelines.yml
@@ -26,13 +26,14 @@ variables:
MAVEN_OPTS: '-Dmaven.repo.local=$(MAVEN_CACHE_FOLDER) -Dcheckstyle.skip=true -Drat.skip=true -Djacoco.skip=true'
SPARK_VERSION: '2.4.4'
HADOOP_VERSION: '2.7'
SPARK_HOME: $(Pipeline.Workspace)/spark-$(SPARK_VERSION)-bin-hadoop$(HADOOP_VERSION)
SPARK_ARCHIVE: spark-$(SPARK_VERSION)-bin-hadoop$(HADOOP_VERSION)

stages:
- stage: test
jobs:
- job: UT_FT_1
displayName: UT FT common & flink & UT client/spark-client
timeoutInMinutes: '90'
steps:
- task: Cache@2
displayName: set cache
@@ -47,7 +48,7 @@ stages:
inputs:
mavenPomFile: 'pom.xml'
goals: 'install'
options: -DskipTests
options: -T 2.5C -DskipTests
publishJUnitResults: false
jdkVersionOption: '1.8'
mavenOptions: '-Xmx2g $(MAVEN_OPTS)'
@@ -71,6 +72,7 @@ stages:
mavenOptions: '-Xmx2g $(MAVEN_OPTS)'
- job: UT_FT_2
displayName: FT client/spark-client
timeoutInMinutes: '90'
steps:
- task: Cache@2
displayName: set cache
@@ -85,7 +87,7 @@ stages:
inputs:
mavenPomFile: 'pom.xml'
goals: 'install'
options: -DskipTests
options: -T 2.5C -DskipTests
publishJUnitResults: false
jdkVersionOption: '1.8'
mavenOptions: '-Xmx2g $(MAVEN_OPTS)'
@@ -99,7 +101,8 @@ stages:
jdkVersionOption: '1.8'
mavenOptions: '-Xmx2g $(MAVEN_OPTS)'
- job: UT_FT_3
displayName: UT FT cli & utilities & sync/hive-sync
displayName: UT FT clients & cli & utilities & sync/hive-sync
timeoutInMinutes: '90'
steps:
- task: Cache@2
displayName: set cache
@@ -114,30 +117,31 @@ stages:
inputs:
mavenPomFile: 'pom.xml'
goals: 'install'
options: -DskipTests
options: -T 2.5C -DskipTests
publishJUnitResults: false
jdkVersionOption: '1.8'
mavenOptions: '-Xmx2g $(MAVEN_OPTS)'
- task: Maven@3
displayName: UT cli & utilities & sync/hive-sync
displayName: UT clients & cli & utilities & sync/hive-sync
inputs:
mavenPomFile: 'pom.xml'
goals: 'test'
options: -Punit-tests -pl hudi-cli,hudi-utilities,hudi-sync/hudi-hive-sync
options: -Punit-tests -pl hudi-client/hudi-client-common,hudi-client/hudi-flink-client,hudi-client/hudi-java-client,hudi-cli,hudi-utilities,hudi-sync/hudi-hive-sync
publishJUnitResults: false
jdkVersionOption: '1.8'
mavenOptions: '-Xmx2g $(MAVEN_OPTS)'
- task: Maven@3
displayName: FT cli & utilities & sync/hive-sync
displayName: FT clients & cli & utilities & sync/hive-sync
inputs:
mavenPomFile: 'pom.xml'
goals: 'test'
options: -Pfunctional-tests -pl hudi-cli,hudi-utilities,hudi-sync/hudi-hive-sync
options: -Pfunctional-tests -pl hudi-client/hudi-client-common,hudi-client/hudi-flink-client,hudi-client/hudi-java-client,hudi-cli,hudi-utilities,hudi-sync/hudi-hive-sync
publishJUnitResults: false
jdkVersionOption: '1.8'
mavenOptions: '-Xmx2g $(MAVEN_OPTS)'
- job: UT_FT_4
displayName: UT FT other modules
timeoutInMinutes: '90'
steps:
- task: Cache@2
displayName: set cache
@@ -152,7 +156,7 @@ stages:
inputs:
mavenPomFile: 'pom.xml'
goals: 'install'
options: -DskipTests
options: -T 2.5C -DskipTests
publishJUnitResults: false
jdkVersionOption: '1.8'
mavenOptions: '-Xmx2g $(MAVEN_OPTS)'
@@ -161,7 +165,7 @@ stages:
inputs:
mavenPomFile: 'pom.xml'
goals: 'test'
options: -Punit-tests -pl !hudi-common,!hudi-flink,!hudi-client/hudi-spark-client,!hudi-cli,!hudi-utilities,!hudi-sync/hudi-hive-sync
options: -Punit-tests -pl !hudi-common,!hudi-flink,!hudi-client/hudi-spark-client,!hudi-client/hudi-client-common,!hudi-client/hudi-flink-client,!hudi-client/hudi-java-client,!hudi-cli,!hudi-utilities,!hudi-sync/hudi-hive-sync
publishJUnitResults: false
jdkVersionOption: '1.8'
mavenOptions: '-Xmx2g $(MAVEN_OPTS)'
@@ -170,16 +174,23 @@ stages:
inputs:
mavenPomFile: 'pom.xml'
goals: 'test'
options: -Pfunctional-tests -pl !hudi-common,!hudi-flink,!hudi-client/hudi-spark-client,!hudi-cli,!hudi-utilities,!hudi-sync/hudi-hive-sync
options: -Pfunctional-tests -pl !hudi-common,!hudi-flink,!hudi-client/hudi-spark-client,!hudi-client/hudi-client-common,!hudi-client/hudi-flink-client,!hudi-client/hudi-java-client,!hudi-cli,!hudi-utilities,!hudi-sync/hudi-hive-sync
publishJUnitResults: false
jdkVersionOption: '1.8'
mavenOptions: '-Xmx2g $(MAVEN_OPTS)'
- job: IT
steps:
- task: AzureCLI@2
displayName: Prepare for IT
inputs:
azureSubscription: apachehudici-service-connection
scriptType: bash
scriptLocation: inlineScript
inlineScript: |
echo 'Downloading $(SPARK_ARCHIVE)'
az storage blob download -c ci-caches -n $(SPARK_ARCHIVE).tgz -f $(Pipeline.Workspace)/$(SPARK_ARCHIVE).tgz --account-name apachehudici
tar -xvf $(Pipeline.Workspace)/$(SPARK_ARCHIVE).tgz -C $(Pipeline.Workspace)/
mkdir /tmp/spark-events/
- script: |
echo 'Downloading spark-$(SPARK_VERSION)-bin-hadoop$(HADOOP_VERSION)'
wget https://archive.apache.org/dist/spark/spark-$(SPARK_VERSION)/spark-$(SPARK_VERSION)-bin-hadoop$(HADOOP_VERSION).tgz -O $(Pipeline.Workspace)/spark-$(SPARK_VERSION).tgz
tar -xvf $(Pipeline.Workspace)/spark-$(SPARK_VERSION).tgz -C $(Pipeline.Workspace)/
mkdir /tmp/spark-events/
mvn $(MAVEN_OPTS) -Pintegration-tests verify
displayName: IT
26 changes: 26 additions & 0 deletions conf/hudi-defaults.conf.template
@@ -0,0 +1,26 @@
#
# Licensed to the Apache Software Foundation (ASF) under one or more
# contributor license agreements. See the NOTICE file distributed with
# this work for additional information regarding copyright ownership.
# The ASF licenses this file to You under the Apache License, Version 2.0
# (the "License"); you may not use this file except in compliance with
# the License. You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
#

# Default system properties included when running Hudi jobs.
# This is useful for setting default environmental settings.

# Example:
# hoodie.datasource.hive_sync.jdbcurl jdbc:hive2://localhost:10000
# hoodie.datasource.hive_sync.use_jdbc true
# hoodie.datasource.hive_sync.support_timestamp false
# hoodie.index.type BLOOM
# hoodie.metadata.enable false
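
How this template gets picked up is not shown in this diff. A minimal usage sketch, assuming Hudi 0.10+ resolves an external `hudi-defaults.conf` from the directory named by the `HUDI_CONF_DIR` environment variable (commonly defaulting to `/etc/hudi/conf`):

```shell
# Assumption: Hudi jobs read hudi-defaults.conf from $HUDI_CONF_DIR
mkdir -p /etc/hudi/conf
cp conf/hudi-defaults.conf.template /etc/hudi/conf/hudi-defaults.conf
export HUDI_CONF_DIR=/etc/hudi/conf   # point Hudi jobs at the defaults file
```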
5 changes: 5 additions & 0 deletions doap_HUDI.rdf
@@ -76,6 +76,11 @@
<created>2021-08-26</created>
<revision>0.9.0</revision>
</Version>
<Version>
<name>Apache Hudi 0.10.0</name>
<created>2021-12-08</created>
<revision>0.10.0</revision>
</Version>
</release>
<repository>
<GitRepository>
93 changes: 93 additions & 0 deletions docker/README.md
@@ -0,0 +1,93 @@
<!--
* Licensed to the Apache Software Foundation (ASF) under one
* or more contributor license agreements. See the NOTICE file
* distributed with this work for additional information
* regarding copyright ownership. The ASF licenses this file
* to you under the Apache License, Version 2.0 (the
* "License"); you may not use this file except in compliance
* with the License. You may obtain a copy of the License at
*
* http://www.apache.org/licenses/LICENSE-2.0
*
* Unless required by applicable law or agreed to in writing, software
* distributed under the License is distributed on an "AS IS" BASIS,
* WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
* See the License for the specific language governing permissions and
-->

# Docker Demo for Hudi

This repo contains the docker demo resources for building docker demo images, setting up the demo, and running Hudi in
the docker demo environment.

## Repo Organization

### Configs for assembling docker images - `/hoodie`

The `/hoodie` folder contains all the configs for assembling the necessary docker images. The name and repository of each
docker image, e.g., `apachehudi/hudi-hadoop_2.8.4-trinobase_368`, are defined in the Maven configuration file `pom.xml`.

### Docker compose config for the Demo - `/compose`

The `/compose` folder contains the yaml file to compose the Docker environment for running the Hudi Demo.

### Resources and Sample Data for the Demo - `/demo`

The `/demo` folder contains useful resources and sample data used for the Demo.

## Build and Test Image locally

To build all docker images locally, you can run the script:

```shell
./build_local_docker_images.sh
```

To build a single image target, you can run

```shell
mvn clean pre-integration-test -DskipTests -Ddocker.compose.skip=true -Ddocker.build.skip=false -pl :<image_target> -am
# For example, to build hudi-hadoop-trinobase-docker
mvn clean pre-integration-test -DskipTests -Ddocker.compose.skip=true -Ddocker.build.skip=false -pl :hudi-hadoop-trinobase-docker -am
```

Alternatively, you can use the `docker` CLI directly under `hoodie/hadoop`. Note that you need to manually name your local
image using the `-t` option to match the naming in `pom.xml`, so that you can update the corresponding image
repository in Docker Hub (detailed steps in the next section).

```shell
# Run under hoodie/hadoop, the <tag> is optional, "latest" by default
docker build <image_folder_name> -t <hub-user>/<repo-name>[:<tag>]
# For example, to build trinobase
docker build trinobase -t apachehudi/hudi-hadoop_2.8.4-trinobase_368
```

After new images are built, you can run the following script to bring up the docker demo with your local images:

```shell
./setup_demo.sh dev
```

## Upload Updated Image to Repository on Docker Hub

Once you have built the updated image locally, you can push it to the corresponding image repository in the Docker
Hub registry, designated by its name or tag:

```shell
docker push <hub-user>/<repo-name>:<tag>
# For example
docker push apachehudi/hudi-hadoop_2.8.4-trinobase_368
```

You can also push the image to Docker Hub using the Docker Desktop app: go to `Images`, search for the image by
name, then click the three dots and select `Push to Hub`.

![Push to Docker Hub](push_to_docker_hub.png)

Note that you need to ask for permission to upload the Hudi Docker Demo images to the repositories.

You can find more information in the [Docker Hub Repositories Manual](https://docs.docker.com/docker-hub/repos/).

## Docker Demo Setup

Please refer to the [Docker Demo Docs page](https://hudi.apache.org/docs/docker_demo).