
TiSpark Publish 1.0.1 #407

Merged 17 commits on Aug 24, 2018
33 changes: 28 additions & 5 deletions README.md
@@ -1,7 +1,29 @@
## What is TiSpark?
# TiSpark
[![Maven Central](https://maven-badges.herokuapp.com/maven-central/com.pingcap.tispark/tispark-core/badge.svg)](https://maven-badges.herokuapp.com/maven-central/com.pingcap.tispark/tispark-core)
[![Javadocs](http://javadoc.io/badge/com.pingcap.tispark/tispark-core.svg)](http://javadoc.io/doc/com.pingcap.tispark/tispark-core)
[![License](https://img.shields.io/github/license/pingcap/tispark.svg)](https://github.com/pingcap/tispark/blob/master/LICENSE)

TiSpark is a thin layer built for running Apache Spark on top of TiDB/TiKV to answer complex OLAP queries. It takes advantage of both the Spark platform and the distributed TiKV cluster and, at the same time, seamlessly glues to TiDB, the distributed OLTP database, to provide Hybrid Transactional/Analytical Processing (HTAP) as a one-stop solution for online transactions and analysis.

## Getting TiSpark
The current stable version is 1.0.1.

If you are using Maven, add the following to your pom.xml:
```xml
<dependency>
    <groupId>com.pingcap.tispark</groupId>
    <artifactId>tispark-core</artifactId>
    <version>1.0.1</version>
</dependency>
```

If you're using SBT, add the following line to your build file:
```scala
libraryDependencies += "com.pingcap.tispark" % "tispark-core" % "1.0.1"
```

For other build tools, you can visit search.maven.org and search for GroupId [![Maven Search](https://img.shields.io/badge/com.pingcap-tikv/tispark-green.svg)](http://search.maven.org/#search%7Cga%7C1%7Cpingcap) (this search also lists all available TiSpark modules, including tikv-client).

## TiSpark Architecture

![architecture](./docs/architecture.png)
@@ -17,17 +39,17 @@ TiSpark is a thin layer built for running Apache Spark on top of TiDB/TiKV to an

TiSpark depends on the existence of TiKV clusters and PDs. It also needs to set up and use a Spark clustering platform.

A thin layer of TiSpark. Most of the logic is inside the tikv-java-client library.
https://github.com/pingcap/tikv-client-lib-java
A thin layer of TiSpark. Most of the logic is inside the tikv-client library.
https://github.com/pingcap/tispark/tree/master/tikv-client


Uses as below
## Quick Start
From Spark-shell:
```
./bin/spark-shell --jars /wherever-it-is/tispark-${version}-jar-with-dependencies.jar
```

```

import org.apache.spark.sql.TiContext
val ti = new TiContext(spark)
```
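A sketch of what a session might look like from there, assuming a TiDB database named `tpch_test` with a `customer` table and the single-argument `tidbMapDatabase` overload:

```scala
// Map every table of the database into Spark's catalog, then query with plain SQL.
ti.tidbMapDatabase("tpch_test")
spark.sql("select count(*) from customer").show()
```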
@@ -93,6 +115,7 @@ Below configurations can be put together with spark-defaults.conf or passed in t
| spark.tispark.plan.downgrade.index_threshold | 10000 | If the index scan ranges on one region exceed this limit in the original request, downgrade this region's request to a table scan rather than the planned index scan |
| spark.tispark.type.unsupported_mysql_types | "time,enum,set,year,json" | A comma-separated list of MySQL types that TiSpark does not currently support; refer to `Unsupported MySQL Type List` below |
| spark.tispark.request.timezone.offset | Local timezone offset | An integer representing the timezone offset from UTC in seconds (e.g. 28800 for GMT+8); this value is added to requests issued to TiKV |
| spark.tispark.show_rowid | false | Whether to show the implicit row ID if one exists |
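A minimal sketch of supplying these options through the standard `SparkSession` builder instead of spark-defaults.conf (key names are from the table above; the values are illustrative only):

```scala
import org.apache.spark.sql.SparkSession

// Illustrative values only; tune them for your own cluster.
val spark = SparkSession
  .builder()
  .appName("tispark-demo")
  .config("spark.tispark.request.timezone.offset", "28800") // GMT+8
  .config("spark.tispark.show_rowid", "true")               // expose the implicit row ID
  .getOrCreate()
```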

## Unsupported MySQL Type List

49 changes: 40 additions & 9 deletions core/pom.xml
@@ -17,7 +17,7 @@
<properties>
    <scalatest.version>3.0.4</scalatest.version>
    <scalaj.version>2.3.0</scalaj.version>
    <mysql.connector.version>5.1.18</mysql.connector.version>
    <mysql.connector.version>5.1.44</mysql.connector.version>
    <play.version>2.6.8</play.version>
</properties>
<dependencies>
@@ -104,6 +104,12 @@
                <goal>testCompile</goal>
            </goals>
        </execution>
        <execution>
            <id>attach-javadocs</id>
            <goals>
                <goal>doc-jar</goal>
            </goals>
        </execution>
    </executions>
    <configuration>
        <scalaVersion>${scala.version}</scalaVersion>
@@ -173,11 +179,10 @@
<plugin>
    <groupId>org.apache.maven.plugins</groupId>
    <artifactId>maven-source-plugin</artifactId>
    <version>2.1.2</version>
    <version>3.0.1</version>
    <executions>
        <execution>
            <id>attach-sources</id>
            <phase>deploy</phase>
            <goals>
                <goal>jar-no-fork</goal>
            </goals>
@@ -194,13 +199,38 @@
<plugin>
    <groupId>org.apache.maven.plugins</groupId>
    <artifactId>maven-javadoc-plugin</artifactId>
    <version>2.8</version>
    <version>2.9.1</version>
    <configuration>
        <charset>UTF-8</charset>
        <encoding>UTF-8</encoding>
        <docencoding>UTF-8</docencoding>
        <locale>zh_CN</locale>
        <skip>${javadoc.skip}</skip>
    </configuration>
    <executions>
        <execution>
            <id>attach-javadocs</id>
            <goals>
                <goal>jar</goal>
            </goals>
            <configuration> <!-- add this to disable checking -->
                <additionalparam>-Xdoclint:none</additionalparam>
            </configuration>
        </execution>
    </executions>
</plugin>
<!--GPG Signed Components-->
<plugin>
    <groupId>org.apache.maven.plugins</groupId>
    <artifactId>maven-gpg-plugin</artifactId>
    <version>1.5</version>
    <configuration>
        <skip>${gpg.skip}</skip>
    </configuration>
    <executions>
        <execution>
            <id>sign-artifacts</id>
            <goals>
                <goal>sign</goal>
            </goals>
        </execution>
    </executions>
</plugin>
<!-- Assembly Plug-in -->
<plugin>
@@ -239,7 +269,7 @@
<plugin>
    <groupId>org.apache.maven.plugins</groupId>
    <artifactId>maven-surefire-plugin</artifactId>
    <version>2.7</version>
    <version>2.22.0</version>
    <configuration>
        <skipTests>true</skipTests>
    </configuration>
@@ -252,6 +282,7 @@
        <reportsDirectory>${project.build.directory}/surefire-reports</reportsDirectory>
        <junitxml>.</junitxml>
        <filereports>WDF TestSuite.txt</filereports>
        <argLine>-Dfile.encoding=UTF-8</argLine>
    </configuration>
    <executions>
        <execution>
2 changes: 1 addition & 1 deletion core/scripts/tispark-sql
@@ -17,7 +17,7 @@
# limitations under the License.
#

TISPARK_JAR=tispark-1.0.1-jar-with-dependencies.jar
TISPARK_JAR=tispark-core-1.0.1-jar-with-dependencies.jar

if [ -z "${SPARK_HOME}" ]; then
  source "$(dirname "$0")"/find-spark-home
@@ -38,4 +38,5 @@ object TiConfigConst {
  val UNSUPPORTED_TYPES: String = "spark.tispark.type.unsupported_mysql_types"
  val ENABLE_AUTO_LOAD_STATISTICS: String = "spark.tispark.statistics.auto_load"
  val CACHE_EXPIRE_AFTER_ACCESS: String = "spark.tispark.statistics.expire_after_access"
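  // Config key for exposing TiKV's implicit row ID (spark.tispark.show_rowid; see the README table)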
  val SHOW_ROWID: String = "spark.tispark.show_rowid"
}
5 changes: 5 additions & 0 deletions core/src/main/scala/com/pingcap/tispark/TiUtils.scala
@@ -143,6 +143,7 @@ object TiUtils {
      case _: EnumType => sql.types.LongType
      case _: SetType => sql.types.LongType
      case _: YearType => sql.types.LongType
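      // Spark SQL has no native JSON column type, so JSON values surface as strings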
      case _: JsonType => sql.types.StringType
    }
  }

@@ -214,6 +215,10 @@ object TiUtils {
      val priority = CommandPri.valueOf(conf.get(TiConfigConst.REQUEST_COMMAND_PRIORITY))
      tiConf.setCommandPriority(priority)
    }

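    // Pass the optional show_rowid flag through to the TiKV client configuration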
    if (conf.contains(TiConfigConst.SHOW_ROWID)) {
      tiConf.setShowRowId(conf.get(TiConfigConst.SHOW_ROWID).toBoolean)
    }
    tiConf
  }

18 changes: 14 additions & 4 deletions core/src/main/scala/org/apache/spark/sql/TiContext.scala
@@ -129,15 +129,25 @@ class TiContext(val session: SparkSession) extends Serializable with Logging {
    }
  }

  // tidbMapTable does not check any meta information;
  // it just registers the table for later use
  def tidbMapTable(dbName: String, tableName: String): Unit = {
  def getDataFrame(dbName: String, tableName: String): DataFrame = {
    val tiRelation = new TiDBRelation(
      tiSession,
      new TiTableReference(dbName, tableName),
      meta
    )(sqlContext)
    sqlContext.baseRelationToDataFrame(tiRelation).createTempView(tableName)
    sqlContext.baseRelationToDataFrame(tiRelation)
  }

  // tidbMapTable does not check any meta information;
  // it just registers the table for later use
  def tidbMapTable(dbName: String, tableName: String): DataFrame = {
    val df = getDataFrame(dbName, tableName)
    df.createOrReplaceTempView(tableName)
    df
  }
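A minimal sketch of the refactored API from a Spark shell (database and table names here are hypothetical):

```scala
// getDataFrame returns a DataFrame without registering anything in the catalog;
// tidbMapTable additionally registers it as a temp view and now returns it too.
val df = ti.tidbMapTable("tpch_test", "customer")
df.printSchema()
spark.sql("select * from customer limit 10").show()
```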

  def tidbMapDatabase(dbName: String, dbNameAsPrefix: Boolean): Unit = {
    tidbMapDatabase(dbName, dbNameAsPrefix, autoLoad)
  }

  def tidbMapDatabase(dbName: String,
2 changes: 1 addition & 1 deletion core/src/main/scala/org/apache/spark/sql/TiStrategy.scala
@@ -61,7 +61,7 @@ class TiStrategy(context: SQLContext) extends Strategy with Logging {

  def typeBlackList: TypeBlacklist = {
    val blacklistString =
      sqlConf.getConfString(TiConfigConst.UNSUPPORTED_TYPES, "time,enum,set,year,json")
      sqlConf.getConfString(TiConfigConst.UNSUPPORTED_TYPES, "time,enum,set,year")
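      // json is no longer blacklisted by default: JsonType now maps to StringType (see TiUtils)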
    new TypeBlacklist(blacklistString)
  }
