
cf-rclone-buildpack

Cloud Foundry buildpack to manage buckets (S3, GCS, ...) based on rclone

Functionalities of this buildpack:

  • Automatically configures rclone from AWS and GCP service broker services
  • Provides a web interface to explore the contents of the buckets
  • Serves remote objects via HTTP
  • Clones data from one bucket to another, keeping them in sync periodically
  • Runs an rclone server with an HTTP API

Using it

Example manifest.yml:

---
applications:
- name: rclone
  memory: 512M
  instances: 1
  stack: cflinuxfs3
  random-route: true
  buildpacks:
  - https://github.com/SpringerPE/cf-rclone-buildpack.git
  services:
  - jose-rclone-gcs
  - jose-rclone-aws
  env:
    AUTH_USER: "admin"
    AUTH_PASSWORD: "admin"
    CLONE_SOURCE_SERVICE: "jose-rclone-aws"
    CLONE_DESTINATION_SERVICE: "jose-rclone-gcs"
    CLONE_MODE: sync
    CLONE_TIMER: 600

With this configuration, the buildpack will run rclone sync to synchronize data from the bucket provided by jose-rclone-aws to the one provided by jose-rclone-gcs every 10 minutes. Since each service offers only one bucket, you do not need to know the bucket names.
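The two services bound in the manifest must exist before pushing. A sketch of creating them with the cf CLI (the offering and plan names aws-s3, google-storage and default are hypothetical; check cf marketplace for the ones available on your platform):

cf create-service aws-s3 default jose-rclone-aws
cf create-service google-storage default jose-rclone-gcs
cf push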

Environment variables

The web service always requires authentication. If AUTH_USER is not defined, it defaults to admin, and AUTH_PASSWORD will be autogenerated, printed to stdout (you can see it with cf logs) and stored in /home/vcap/auth/${AUTH_USER}.password.
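If you need to recover the autogenerated password later, one way is to read the stored file over cf ssh (a sketch, assuming the app is named rclone as in the manifest above and AUTH_USER kept its default admin):

# Print the autogenerated password stored by the buildpack
cf ssh rclone -c 'cat /home/vcap/auth/admin.password'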

GCS_PROJECT_NUMBER is predefined, but if you have your own project in GCP you will need to redefine it.

CLONE_SOURCE_SERVICE and CLONE_DESTINATION_SERVICE should match the name of the services bound to the application and both need to be set in order to run the clone operation.

CLONE_MODE is one of:

  • copy (default): copies data from one bucket to another, only adding files to the destination bucket. It does not delete files in either the source or the destination bucket. See rclone copy.
  • sync: synchronizes data from source to destination, making both identical by modifying the destination only. The destination is updated to match the source, including deleting files if necessary. See rclone sync.
  • move: moves the contents of the source bucket to the destination bucket. Source contents are deleted as soon as they are copied to the destination. See rclone move.

Be careful with CLONE_MODE=sync and CLONE_MODE=move: these are destructive options.

CLONE_TIMER specifies the number of seconds to wait before re-running the clone operation. By default it is 0, so the clone process does not run periodically, only once after the program starts. The timer counts from the end of the previous run; jobs are not queued, so even if a clone takes one hour, the next run starts 10 minutes after it finishes (with the manifest above).

Extra rclone parameters can be defined via environment variables (see https://rclone.org/docs/#environment-variables), but be aware that the automatic clone process uses the rclone API, so most likely those environment variables will be ignored.

This buildpack does not allow more than one instance; if you deploy more than one, the extra instances will fail.

What if ...

... my service (bucket) is not defined/available in the current platform

Just copy the environment variable VCAP_SERVICES from the other CF platform and create a file called VCAP_SERVICES in the root of the application with the contents of the variable. At startup, the buildpack will merge the contents of the file with the environment variable and set up the rclone configuration.
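A minimal sketch of that workflow, assuming the cf CLI is targeted at the other platform and the app there is called other-rclone-app (a hypothetical name; the JSON is an abbreviated placeholder for the real value of the variable):

# On the other platform: print the app environment and copy the VCAP_SERVICES JSON
cf env other-rclone-app

# In the root of this application: paste the JSON into a file named VCAP_SERVICES
cat > VCAP_SERVICES <<'EOF'
{ "aws-s3": [ { "name": "jose-rclone-aws", "credentials": { "...": "..." } } ] }
EOF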

... my bucket is not provided by CF service brokers, there is no VCAP_SERVICES variable

Create an rclone configuration file named rclone.conf with the parameters of the bucket, something like:

# S3 example, please fill the access key and key id
[s3-service]
type = s3
provider = AWS
access_key_id = <S3-KEY-ID>
secret_access_key = <S3-ACCESS-KEY>
region = eu-central-1
location_constraint = eu-central-1
acl = private
env_auth = false

# GCS Example. Please put the `auth.json` file in the app folder
[gcs-service]
type = google cloud storage
client_id =
client_secret =
project_number =
service_account_file = /home/vcap/app/auth.json
storage_class = REGIONAL
location = europe-west4

Note that the bucket names are not defined in this configuration; you have to define them in the environment variables CLONE_SOURCE_BUCKET and CLONE_DESTINATION_BUCKET, and set the variables CLONE_SOURCE_SERVICE and CLONE_DESTINATION_SERVICE to the name of the entry between brackets (s3-service or gcs-service in this example, without the brackets), as shown below.
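For the configuration above, the corresponding environment block in manifest.yml could look like this (a sketch; the bucket names bucket1 and bucket2 are hypothetical):

  env:
    CLONE_SOURCE_SERVICE: "s3-service"
    CLONE_SOURCE_BUCKET: "bucket1"
    CLONE_DESTINATION_SERVICE: "gcs-service"
    CLONE_DESTINATION_BUCKET: "bucket2"
    CLONE_MODE: copy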

... I need something else, other actions or more

Just create a file post-start.sh, like this:

#!/bin/bash
# $RCLONE is a predefined environment variable; use it to execute commands

# Example command
$RCLONE rc core/version

# Sync these 2 buckets
$RCLONE -vv rc sync/sync srcFs=s3-service:bucket1 dstFs=gcs-service:bucket2

# Alternative way to do it (async == true)
$RCLONE rc sync/sync  --json '{ "srcFs": "s3-service:bucket1", "dstFs": "gcs-service:bucket2", "_async": true }'

The variables CLONE_SOURCE_BUCKET and CLONE_DESTINATION_BUCKET are automatically defined if the corresponding SERVICE variables are provided.

If a post-start.sh file is found, no automatic clone operation will be performed. You can define all kinds of logic in this file, synchronous or asynchronous operations alike; the file will be executed automatically in the background at startup.
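For instance, a post-start.sh that reuses the automatically defined variables could look like this (a sketch, assuming both SERVICE variables, and therefore both BUCKET variables, are set):

#!/bin/bash
# Sync using the automatically defined service and bucket variables
$RCLONE -vv rc sync/sync \
  srcFs="${CLONE_SOURCE_SERVICE}:${CLONE_SOURCE_BUCKET}" \
  dstFs="${CLONE_DESTINATION_SERVICE}:${CLONE_DESTINATION_BUCKET}"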

Remote objects via HTTP

Just open https://rclone.example.app/[SERVICE_NAME:]BUCKET_NAME/ in a browser, changing SERVICE_NAME and BUCKET_NAME to the correct values.

Or using curl:

# Note the square brackets are escaped with \
# curl -u admin:password 'https://rclone.example.app/\[SERVICE_NAME:\]BUCKET_NAME/'
curl -u admin:password 'https://rclone.example.app/\[s3-service:\]bucket1/'

Use rclone as server

You can define many buckets and use the rclone API to trigger actions on those buckets (and also retrieve their objects via HTTP). All calls must be made using POST.

https://rclone.org/rc/#accessing-the-remote-control-via-http

curl -u admin:password -H "Content-Type: application/json" -X POST -d '{"potato":2,"sausage":1}' http://rclone.example.com/rc/noop

A real-world example, performing a sync between 2 buckets:

curl -u admin:password -H "Content-Type: application/json" -X POST -d '{ "srcFs": "s3-service:bucket1", "dstFs": "gcs-service:bucket2", "_async": true }'  http://rclone.example.com/sync/sync
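Since the request is submitted with "_async": true, it returns immediately with a jobid. A sketch of polling the job afterwards via the rclone job/status endpoint (the jobid value 1 is hypothetical):

curl -u admin:password -H "Content-Type: application/json" -X POST -d '{ "jobid": 1 }' http://rclone.example.com/job/status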

Known issues

All issues found concern the new web UI. It is a quite new piece of software (Aug 2019) and is currently in alpha.

  • When logging in, you have to enter the auth settings twice. The second time, in the program interface, click first on Verify and then on Login.

  • It allows you to visualize the contents of the buckets, see current operations and view/delete objects. To see the contents of a bucket, go to Explorer, type <name-of-service>:<name-of-bucket> and click Open (yes, you need to know the name of the bucket!).

  • The graph does not get refreshed after the transfer is done.

Development

The buildpack is implemented using bash scripts to make it easy to understand and change.

https://docs.cloudfoundry.org/buildpacks/understand-buildpacks.html

The buildpack uses the deps and cache folders according to the buildpack implementation conventions, so the first time the buildpack is used it downloads all resources; on subsequent runs it uses the cached resources.

Author

(c) 2019 Jose Riguera Lopez <[email protected]>, Springernature Engineering Enablement

MIT License
