
[WIP] Support for update/delete #6

Open · wants to merge 5 commits into base: master
Conversation

@scwf (Contributor) commented Aug 18, 2015

Todos:

  • add tests
  • optimize update/delete so that only the updated columns are written (a sketch follows below)
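
A rough, purely illustrative sketch of that optimization, assuming HBaseRelation already knows the encoded row key and the (family, qualifier) mapping of each updated column; buildUpdatePut and its parameter shapes are not the PR's actual code:

import org.apache.hadoop.hbase.client.Put
import org.apache.hadoop.hbase.util.Bytes

// Build a Put carrying only the columns touched by the UPDATE's SET clause,
// instead of re-writing every column of the row.
def buildUpdatePut(rowKey: Array[Byte],
                   updatedColumns: Seq[(String, String, Array[Byte])]): Put = {
  val put = new Put(rowKey)
  updatedColumns.foreach { case (family, qualifier, value) =>
    put.add(Bytes.toBytes(family), Bytes.toBytes(qualifier), value)
  }
  put
}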

@scwf changed the title from "[WIP] draft support for update/delete" to "[WIP] Support for update/delete" on Aug 18, 2015
@yzhou2001 (Member)

Thanks for the PR. It's a very nice implementation. A few comments:

  1. For DELETE, there is no guarantee that all row keys are returned from the child query plan. You need to add any row key columns not already present to the child's projection to guarantee it (see the sketch after this list).
  2. closeHTable should not be called inside HBaseRelation.deleteFromHBase: the HTable handle is a cached entity managed by HBaseRelation.
  3. HBaseRelation.insert now assumes "overwrite" to always be false, when it should actually be the opposite. This is a legacy issue, so you can leave it as is and defer it to a future fix if you wish.
  4. We need reasonably comprehensive test coverage of these two important features.
  5. Please also update the design doc to cover the two features, including adding yourself as an updater in the "revision history".
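
A minimal sketch of point 1, assuming a Catalyst logical plan and that the relation can hand over its row key attributes; withAllRowKeys and the simple Project match are illustrative, not the project's real code:

import org.apache.spark.sql.catalyst.expressions.Attribute
import org.apache.spark.sql.catalyst.plans.logical.{LogicalPlan, Project}

// Make sure the child plan of a DELETE produces every row key column, so a
// complete row key can be assembled for each Delete mutation.
def withAllRowKeys(child: LogicalPlan, keyColumns: Seq[Attribute]): LogicalPlan = child match {
  case Project(projectList, grandChild) =>
    val missing = keyColumns.filterNot(k => projectList.exists(_.exprId == k.exprId))
    if (missing.isEmpty) child else Project(projectList ++ missing, grandChild)
  case other => other // a bare table scan already outputs all columns, keys included
}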

@yzhou2001 (Member)

Regarding the call to closeHTable, I think you can leave it as is now to achieve a sure flush. Thanks.
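
For reference, a hedged sketch of the trade-off, written against the pre-1.0 HTable client API the project used at the time; deleteFromHBase's real signature and batching may differ:

import java.util.{ArrayList => JArrayList}
import org.apache.hadoop.hbase.client.{Delete, HTable}

def deleteFromHBase(htable: HTable, rowKeys: Seq[Array[Byte]]): Unit = {
  val deletes = new JArrayList[Delete](rowKeys.size)
  rowKeys.foreach(key => deletes.add(new Delete(key)))
  htable.delete(deletes)
  // Closing the handle, as the patch currently does, also flushes any buffered
  // mutations, which is the "sure flush" mentioned above. If HBaseRelation must
  // keep its cached handle open instead, htable.flushCommits() flushes without closing.
  htable.close()
}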

@@ -304,3 +304,6 @@ case class BulkLoadIntoTableCommand(
override def output = Nil
}




scwf (Contributor Author)

I should revert the changes to this file.

@scwf (Contributor Author) commented Aug 20, 2015

I have tested this patch on my own cluster, and the functionality works as expected.

@@ -19,13 +19,14 @@ package org.apache.spark.sql.hbase
import org.apache.hadoop.conf.Configuration
import org.apache.hadoop.hbase.util.Bytes
import org.apache.hadoop.hbase.{HBaseConfiguration, _}
import org.apache.hadoop.hbase.client.{Get, HTable, Put, Result, Scan}
import org.apache.hadoop.hbase.client._
Contributor

We should not use a wildcard import here; please follow the coding convention.
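
For example, the import could stay enumerated and simply gain the newly required classes (Delete being an assumption about what this patch needs):

import org.apache.hadoop.hbase.client.{Delete, Get, HTable, Put, Result, Scan}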
