Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Added R wrappers for released dataset ops #40

Merged
merged 13 commits into from
Jan 7, 2019

Conversation

terrytangyuan
Copy link
Member

@terrytangyuan terrytangyuan commented Dec 26, 2018

This PR is on top of #39. The wrappers were scaffolded using scripts created in #39 and then manually edited for cosmetic purposes as well as additional checks for arguments.

  • Added R wrappers for released dataset ops, including kafka, kinesis, hadoop, and ignite. Note that only wrappers for the released dataset ops are included here - parquet and libsvm support are not included yet.
  • Re-exported useful functions from tfdatasets package so tfio would play nicely with tfdatasets.
  • Removed unnecessary dependencies and fixed R CMD check errors and warnings.

This is part of #6.

@terrytangyuan
Copy link
Member Author

@yongtang Just rebased and resolved conflicts. Could you take a look when you get a chance?

@yongtang
Copy link
Member

yongtang commented Jan 6, 2019

I don't have a lot of experience with R other than some brief knowledge many year ago. I tried locally but had the error:

> dataset <- sequence_file_dataset("/home/ubuntu/io/tensorflow_io/hadoop/python/kernel_tests/testdata/string.seq")
Error in sequence_file_dataset("/home/ubuntu/io/tensorflow_io/hadoop/python/kernel_tests/testdata/string.seq") : 
  attempt to apply non-function

I am wondering if there is a simple readme about the usage?

I think sequence_file_dataset is appropriate as an example, as it is pretty straightforward and only need a file name. (Kafka/Ignite requires additional setup).

@terrytangyuan
Copy link
Member Author

Good call. I’ll add some test cases and instructions (maybe on top of your README PR). I am marking this PR as WIP for now and will ping when ready.

@terrytangyuan terrytangyuan changed the title Added R wrappers for released dataset ops [WIP] Added R wrappers for released dataset ops Jan 6, 2019
@yongtang
Copy link
Member

yongtang commented Jan 6, 2019

Thanks @terrytangyuan. The repo have a simple sequence file:
https://github.com/tensorflow/io/tree/master/tensorflow_io/hadoop/python/kernel_tests/testdata

The sample sequence file consists of key-value pairs:

The file is generated with `org.apache.hadoop.io.Text` for key/value.
There are 25 records in the file with the format of:
key = XXX
value = VALUEXXX
where XXX is replaced as the line number (starts with 001).

Let me know if I can be of any help. I am interested in playing with R as well 😃

@terrytangyuan terrytangyuan changed the title [WIP] Added R wrappers for released dataset ops Added R wrappers for released dataset ops Jan 7, 2019
@terrytangyuan
Copy link
Member Author

@yongtang Thanks. I fixed the error you encountered and added instruction to README.md as an alternative to the instruction you added in the other PR. Also added a very simple test case (we can add more strict ones later).

I haven't used tensorflow/tensorflow:custom-op image too often yet but the r-base image is used more often in the R community and solves some issues with certain R package dependencies that we don't have to deal with. We can investigate some time to make tensorflow/tensorflow:custom-op work too later on.

@yongtang
Copy link
Member

yongtang commented Jan 7, 2019

@terrytangyuan I tried locally and it works. Thanks! We could add some additional test cases later.

Copy link
Member

@yongtang yongtang left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@yongtang yongtang merged commit 8fa6568 into tensorflow:master Jan 7, 2019
@terrytangyuan terrytangyuan deleted the r-wrappers-impl branch January 7, 2019 04:25
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants