DLP samples #752

Merged
merged 9 commits into from
Jul 18, 2017
Changes from 6 commits
117 changes: 117 additions & 0 deletions dlp/README.md
@@ -0,0 +1,117 @@
# Cloud Data Loss Prevention (DLP) API Samples
The [Data Loss Prevention API](https://cloud.google.com/dlp/docs/) provides programmatic access to
a powerful detection engine for personally identifiable information and other privacy-sensitive data
in unstructured data streams.

## Setup
- A Google Cloud project with billing enabled
- [Enable](https://console.cloud.google.com/launcher/details/google/dlp.googleapis.com) the DLP API.
- (Local testing) [Create a service account](https://cloud.google.com/docs/authentication/getting-started)
and set the `GOOGLE_APPLICATION_CREDENTIALS` environment variable to the path of the downloaded credentials file.
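
Before running the samples locally, it can help to confirm that the credentials variable points at a real file. The following is a minimal sketch using only the JDK; the class `CredentialsCheck` and its `diagnose` method are hypothetical helpers, not part of these samples or of the DLP client library.

```java
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;

/** Hypothetical pre-flight check; not part of the DLP samples. */
final class CredentialsCheck {

  /** Returns a short diagnosis for a GOOGLE_APPLICATION_CREDENTIALS value. */
  static String diagnose(String value) {
    if (value == null || value.isEmpty()) {
      return "unset";
    }
    Path path = Paths.get(value);
    if (!Files.isRegularFile(path)) {
      return "not a file";
    }
    return Files.isReadable(path) ? "ok" : "unreadable";
  }

  public static void main(String[] args) {
    // Reads the same variable the Setup section asks you to export.
    String value = System.getenv("GOOGLE_APPLICATION_CREDENTIALS");
    System.out.println("GOOGLE_APPLICATION_CREDENTIALS: " + diagnose(value));
  }
}
```

This only checks that the path exists and is readable; it does not validate the key or its permissions.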

## Build
This project uses the [Assembly Plugin](https://maven.apache.org/plugins/maven-assembly-plugin/usage.html) to build an uber jar.
Run:
```
mvn clean package
```

## Retrieve InfoTypes
An [InfoType identifier](https://cloud.google.com/dlp/docs/infotypes-categories) represents an element of sensitive data.

[Info types](https://cloud.google.com/dlp/docs/infotypes-reference#global) are updated periodically. Use the API to retrieve the most current
info types for a given category, for example `HEALTH` or `GOVERNMENT`.
```
java -cp target/dlp-samples-1.0-jar-with-dependencies.jar com.example.dlp.Metadata -category GOVERNMENT
```

## Retrieve Categories
[Categories](https://cloud.google.com/dlp/docs/infotypes-categories) provide a way to easily access a group of related InfoTypes.
```
java -cp target/dlp-samples-1.0-jar-with-dependencies.jar com.example.dlp.Metadata
```

## Inspect data for sensitive elements
Use the DLP API to inspect strings, local files, files on Google Cloud Storage, and Cloud Datastore kinds.

Note: image scanning is not currently supported on Google Cloud Storage.
For more information, refer to the [API documentation](https://cloud.google.com/dlp/docs).
Optional flags are explained in [this resource](https://cloud.google.com/dlp/docs/reference/rest/v2beta1/content/inspect#InspectConfig).
```
Commands:
  -s <string>
      Inspect a string using the Data Loss Prevention API.
  -f <filepath>
      Inspect a local text, PNG, or JPEG file using the Data Loss Prevention API.
  -gcs -bucketName <bucketName> -fileName <fileName>
      Inspect a text file stored on Google Cloud Storage using the Data Loss Prevention API.
  -ds -projectId [projectId] -namespace [namespace] -kind <kind>
      Inspect a Datastore instance using the Data Loss Prevention API.

Options:
  --help
      Show help.
  -minLikelihood [string]
      Minimum reporting likelihood threshold.
      [choices: "LIKELIHOOD_UNSPECIFIED", "VERY_UNLIKELY", "UNLIKELY", "POSSIBLE", "LIKELY", "VERY_LIKELY"]
      [default: "LIKELIHOOD_UNSPECIFIED"]
  -f, --maxFindings [number]
      Maximum number of results to retrieve. [default: 0]
  -q, --includeQuote [boolean]
      Include the matching string in results. [default: true]
  -t, --infoTypes
      Restrict to a limited set of infoTypes, e.g. PHONE_NUMBER US_PASSPORT. [default: []]
```
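
To illustrate what a minimum likelihood threshold means, the sketch below orders the likelihood values as the help text lists them and keeps only findings at or above a threshold. It is a local illustration only: `LikelihoodFilter` and `Finding` are hypothetical types, the DLP service applies the threshold itself, and its handling of `LIKELIHOOD_UNSPECIFIED` may differ from this sketch.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

/** Hypothetical illustration of a likelihood threshold; not the DLP client API. */
final class LikelihoodFilter {

  /** Values in ascending order, as listed in the help text. */
  enum Likelihood {
    LIKELIHOOD_UNSPECIFIED, VERY_UNLIKELY, UNLIKELY, POSSIBLE, LIKELY, VERY_LIKELY
  }

  /** Local stand-in for a finding returned by an inspection (assumed shape). */
  static final class Finding {
    final String infoType;
    final Likelihood likelihood;

    Finding(String infoType, Likelihood likelihood) {
      this.infoType = infoType;
      this.likelihood = likelihood;
    }
  }

  /** Keeps findings whose likelihood is at or above the given minimum. */
  static List<Finding> atLeast(List<Finding> findings, Likelihood min) {
    List<Finding> kept = new ArrayList<>();
    for (Finding finding : findings) {
      // Enum constants compare by declaration order.
      if (finding.likelihood.compareTo(min) >= 0) {
        kept.add(finding);
      }
    }
    return kept;
  }

  public static void main(String[] args) {
    List<Finding> findings = Arrays.asList(
        new Finding("PHONE_NUMBER", Likelihood.VERY_LIKELY),
        new Finding("US_PASSPORT", Likelihood.UNLIKELY));
    for (Finding finding : atLeast(findings, Likelihood.POSSIBLE)) {
      System.out.println(finding.infoType + " " + finding.likelihood);
    }
  }
}
```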
### Examples
- Inspect a string:
```
java -cp target/dlp-samples-1.0-jar-with-dependencies.jar com.example.dlp.Inspect -s "My phone number is (123) 456-7890 and my email address is [email protected]"
```
- Inspect a local file (text / image):
```
java -cp target/dlp-samples-1.0-jar-with-dependencies.jar com.example.dlp.Inspect -f resources/test.txt
java -cp target/dlp-samples-1.0-jar-with-dependencies.jar com.example.dlp.Inspect -f resources/test.png
```
- Inspect a file on Google Cloud Storage:
```
java -cp target/dlp-samples-1.0-jar-with-dependencies.jar com.example.dlp.Inspect -gcs -bucketName my-bucket -fileName my-file.txt
```
- Inspect a Google Cloud Datastore kind:
```
java -cp target/dlp-samples-1.0-jar-with-dependencies.jar com.example.dlp.Inspect -ds -kind my-kind
```

## Automatic redaction of sensitive data
[Automatic redaction](https://cloud.google.com/dlp/docs/classification-redaction) produces an output with sensitive data matches removed.

```
Commands:
  -s <string>
      Source input string.
  -r <replacement string>
      String to replace detected info types.

Options:
  --help
      Show help.
  -minLikelihood
      Minimum reporting likelihood threshold.
      [choices: "LIKELIHOOD_UNSPECIFIED", "VERY_UNLIKELY", "UNLIKELY", "POSSIBLE", "LIKELY", "VERY_LIKELY"]
      [default: "LIKELIHOOD_UNSPECIFIED"]
  -infoTypes
      Restrict the operation to a limited set of info types, e.g. PHONE_NUMBER US_PASSPORT. [default: []]
```

### Example
- Replace sensitive data in text with `_REDACTED_`:
```
java -cp target/dlp-samples-1.0-jar-with-dependencies.jar com.example.dlp.Redact -s "My phone number is (123) 456-7890 and my email address is [email protected]" -r "_REDACTED_"
```
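
The effect of the replacement string can be pictured with a small local sketch. Here a simple regular expression for `(123) 456-7890`-style numbers stands in for detection; this is an assumption made purely for illustration, since the real matching is done by the DLP service and not by this regex. `RedactSketch` is a hypothetical class, not the `com.example.dlp.Redact` sample.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Hypothetical illustration of replacing matches; not the DLP Redact sample. */
final class RedactSketch {

  // Assumption: a regex stands in for the DLP detection engine.
  private static final Pattern US_PHONE = Pattern.compile("\\(\\d{3}\\) \\d{3}-\\d{4}");

  /** Replaces every phone-number match in the input with the replacement text. */
  static String redact(String input, String replacement) {
    // quoteReplacement keeps '$' and '\' in the replacement literal.
    return US_PHONE.matcher(input).replaceAll(Matcher.quoteReplacement(replacement));
  }

  public static void main(String[] args) {
    // prints "My phone number is _REDACTED_"
    System.out.println(redact("My phone number is (123) 456-7890", "_REDACTED_"));
  }
}
```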

## Integration tests
### Setup
- [Create a Google Cloud Storage bucket](https://console.cloud.google.com/storage) and upload [test.txt](src/test/resources/test.txt).
- [Create a Google Cloud Datastore](https://console.cloud.google.com/datastore) kind and add an entity with properties:
  - `property1` : [email protected]
  - `property2` : 343-343-3435
- Ensure the following environment variables are set:
  - `GOOGLE_APPLICATION_CREDENTIALS` points to an authorized service account credentials file.
  - `DLP_BUCKET_ID` points to the Google Cloud Storage bucket that contains the sample text document.
  - `DLP_DATASTORE_KIND` points to a Datastore kind under the default project.
Review comment (Contributor): Both of these are still required?
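
One way to fail fast before the integration tests run is to list any of the three variables that are missing. This is a minimal sketch under the assumption that all three names from the setup list are required; `TestEnvCheck` is a hypothetical helper and is not part of this change.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Map;

/** Hypothetical helper that reports unset integration-test variables. */
final class TestEnvCheck {

  // Names taken from the integration test setup list.
  static final List<String> REQUIRED = Arrays.asList(
      "GOOGLE_APPLICATION_CREDENTIALS", "DLP_BUCKET_ID", "DLP_DATASTORE_KIND");

  /** Returns the required names that are absent or blank in the given environment. */
  static List<String> missing(Map<String, String> env) {
    List<String> result = new ArrayList<>();
    for (String name : REQUIRED) {
      String value = env.get(name);
      if (value == null || value.trim().isEmpty()) {
        result.add(name);
      }
    }
    return result;
  }

  public static void main(String[] args) {
    List<String> missing = missing(System.getenv());
    if (missing.isEmpty()) {
      System.out.println("All required environment variables are set.");
    } else {
      System.out.println("Missing environment variables: " + missing);
    }
  }
}
```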


## Run
Run all tests:
```
mvn clean verify
```

101 changes: 101 additions & 0 deletions dlp/pom.xml
@@ -0,0 +1,101 @@
<?xml version="1.0" encoding="UTF-8"?>
<!--
Copyright 2017 Google Inc.

Licensed under the Apache License, Version 2.0 (the "License");
you may not use this file except in compliance with the License.
You may obtain a copy of the License at

http://www.apache.org/licenses/LICENSE-2.0

Unless required by applicable law or agreed to in writing, software
distributed under the License is distributed on an "AS IS" BASIS,
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
See the License for the specific language governing permissions and
limitations under the License.
-->
<!-- [START pom] -->
<project>
  <modelVersion>4.0.0</modelVersion>
  <packaging>jar</packaging>
  <groupId>com.example</groupId>
  <artifactId>dlp-samples</artifactId>
  <version>1.0</version>

  <!-- Parent defines config for testing & linting. -->
  <parent>
    <artifactId>doc-samples</artifactId>
    <groupId>com.google.cloud</groupId>
    <version>1.0.0</version>
    <relativePath>..</relativePath>
  </parent>

  <properties>
    <maven.compiler.source>1.8</maven.compiler.source>
    <maven.compiler.target>1.8</maven.compiler.target>
    <google.auth.version>0.7.0</google.auth.version>
    <project.build.sourceEncoding>UTF-8</project.build.sourceEncoding>
  </properties>

  <!-- Temporary workaround for known issue : https://github.com/GoogleCloudPlatform/google-cloud-java/issues/2192 -->
  <dependencyManagement>
    <dependencies>
      <dependency>
        <groupId>com.google.auth</groupId>
        <artifactId>google-auth-library-credentials</artifactId>
        <version>${google.auth.version}</version>
      </dependency>
      <dependency>
        <groupId>com.google.auth</groupId>
        <artifactId>google-auth-library-oauth2-http</artifactId>
        <version>${google.auth.version}</version>
      </dependency>
    </dependencies>
  </dependencyManagement>
  <!-- End of workaround -->

  <dependencies>
    <!-- [START dlp_maven] -->
    <dependency>
      <groupId>com.google.cloud</groupId>
      <artifactId>google-cloud-dlp</artifactId>
      <version>0.20.2-alpha</version>
    </dependency>
    <!-- [END dlp_maven] -->
    <dependency>
      <groupId>commons-cli</groupId>
      <artifactId>commons-cli</artifactId>
      <version>1.4</version>
    </dependency>
    <!-- Test dependencies -->
    <dependency>
      <groupId>junit</groupId>
      <artifactId>junit</artifactId>
      <version>4.12</version>
    </dependency>
  </dependencies>
  <!-- Build jar with dependencies for testing -->
  <build>
    <plugins>
      <plugin>
        <artifactId>maven-assembly-plugin</artifactId>
        <version>3.0.0</version>
        <configuration>
          <descriptorRefs>
            <descriptorRef>jar-with-dependencies</descriptorRef>
          </descriptorRefs>
        </configuration>
        <executions>
          <execution>
            <id>make-assembly</id> <!-- this is used for inheritance merges -->
            <phase>package</phase> <!-- bind to the packaging phase -->
            <goals>
              <goal>single</goal>
            </goals>
          </execution>
        </executions>
      </plugin>
    </plugins>
  </build>
</project>
<!-- [END pom] -->