Add DLP samples (BigQuery, DeID, RiskAnalysis) #474

Merged: 10 commits, Oct 18, 2017
Changes from 8 commits
1 change: 1 addition & 0 deletions dlp/.gitignore
@@ -0,0 +1 @@
**/*.result.png
67 changes: 65 additions & 2 deletions dlp/README.md
@@ -13,6 +13,8 @@ The [Data Loss Prevention API](https://cloud.google.com/dlp/docs/) provides prog
* [Inspect](#inspect)
* [Redact](#redact)
* [Metadata](#metadata)
* [DeID](#deid)
* [Risk Analysis](#risk-analysis)
* [Running the tests](#running-the-tests)

## Setup
@@ -47,6 +49,7 @@ Commands:
Prevention API and the promise pattern.
gcsFileEvent <bucketName> <fileName> Inspects a text file stored on Google Cloud Storage using the Data Loss
Prevention API and the event-handler pattern.
bigquery <datasetName> <tableName> Inspects a BigQuery table using the Data Loss Prevention API.
datastore <kind> Inspects a Datastore instance using the Data Loss Prevention API.

Options:
@@ -56,14 +59,15 @@ Options:
[default: "LIKELIHOOD_UNSPECIFIED"]
-f, --maxFindings [number] [default: 0]
-q, --includeQuote [boolean] [default: true]
-l, --languageCode [string] [default: "en-US"]
-t, --infoTypes [array] [default: []]
-t, --infoTypes [array] [default: ["PHONE_NUMBER","EMAIL_ADDRESS","CREDIT_CARD_NUMBER"]]

Examples:
node inspect.js string "My phone number is (123) 456-7890 and my email address is [email protected]"
node inspect.js file resources/test.txt
node inspect.js gcsFilePromise my-bucket my-file.txt
node inspect.js gcsFileEvent my-bucket my-file.txt
node inspect.js bigquery my-dataset my-table
node inspect.js datastore my-datastore-kind

For more information, see https://cloud.google.com/dlp/docs. Optional flags are explained at
https://cloud.google.com/dlp/docs/reference/rest/v2beta1/content/inspect#InspectConfig
@@ -81,6 +85,7 @@ __Usage:__ `node redact.js --help`
```
Commands:
string <string> <replaceString> Redact sensitive data from a string using the Data Loss Prevention API.
image <filepath> <outputPath> Redact sensitive data from an image using the Data Loss Prevention API.

Options:
--help Show help [boolean]
@@ -91,6 +96,7 @@ Options:

Examples:
node redact.js string "My name is Gary" "REDACTED" -t US_MALE_NAME
node redact.js image resources/test.png redaction_result.png -t US_MALE_NAME

For more information, see https://cloud.google.com/dlp/docs. Optional flags are explained at
https://cloud.google.com/dlp/docs/reference/rest/v2beta1/content/inspect#InspectConfig
@@ -124,6 +130,63 @@ For more information, see https://cloud.google.com/dlp/docs
[metadata_2_docs]: https://cloud.google.com/dlp/docs
[metadata_2_code]: metadata.js

### DeID

View the [documentation][deid_3_docs] or the [source code][deid_3_code].

__Usage:__ `node deid.js --help`

```
Commands:
mask <string> Deidentify sensitive data by masking it with a character.
fpe <string> <wrappedKey> <keyName> Deidentify sensitive data using Format Preserving Encryption (FPE).

Options:
--help Show help [boolean]

Examples:
node deid.js mask "My SSN is 372819127"
node deid.js fpe "My SSN is 372819127" <YOUR_ENCRYPTED_AES_256_KEY> <YOUR_KEY_NAME>

For more information, see https://cloud.google.com/dlp/docs.
```

[deid_3_docs]: https://cloud.google.com/dlp/docs
[deid_3_code]: deid.js
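
As a rough local illustration of what character masking does, the sketch below masks runs of digits in a string without calling the DLP API. It is not how the API detects sensitive data: the function name `maskDigitRuns`, the digit-only regular expression, and the `'*'` default are assumptions made for this example. The `numberToMask` parameter follows the behavior described in `deid.js`, where `0` means no limit.

```javascript
'use strict';

// Hypothetical helper (not part of the samples): treats every run of digits
// as a "match" and masks up to numberToMask characters of each run.
// numberToMask of 0 masks the whole run; '*' is an assumed default character.
function maskDigitRuns (text, maskingCharacter = '*', numberToMask = 0) {
  return text.replace(/\d+/g, (run) => {
    const count = numberToMask > 0 ? Math.min(numberToMask, run.length) : run.length;
    return maskingCharacter.repeat(count) + run.slice(count);
  });
}

console.log(maskDigitRuns('My SSN is 372819127', 'x', 5)); // My SSN is xxxxx9127
console.log(maskDigitRuns('My SSN is 372819127', 'x')); // My SSN is xxxxxxxxx
```

The real `mask` command sends the string to the API, which decides what counts as sensitive; the output of `deid.js` may therefore differ from this sketch.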

### Risk Analysis

View the [documentation][risk_4_docs] or the [source code][risk_4_code].

__Usage:__ `node risk.js --help`

```
Commands:
numerical <datasetId> <tableId> <columnName> Computes risk metrics of a column of numbers in a Google BigQuery table.
categorical <datasetId> <tableId> <columnName> Computes risk metrics of a column of data in a Google BigQuery table.
kAnonymity <datasetId> <tableId> [quasiIdColumnNames..] Computes the k-anonymity of a column set in a Google BigQuery table.
lDiversity <datasetId> <tableId> <sensitiveAttribute> [quasiIdColumnNames..] Computes the l-diversity of a column set in a Google BigQuery table.

Options:
--help Show help [boolean]
-p, --projectId [string] [default: "nodejs-docs-samples"]

Examples:
node risk.js numerical nhtsa_traffic_fatalities accident_2015 state_number -p bigquery-public-data
node risk.js categorical nhtsa_traffic_fatalities accident_2015 state_name -p bigquery-public-data
node risk.js kAnonymity nhtsa_traffic_fatalities accident_2015 state_number county -p bigquery-public-data
node risk.js lDiversity nhtsa_traffic_fatalities accident_2015 city state_number county -p bigquery-public-data

For more information, see https://cloud.google.com/dlp/docs.
```

[risk_4_docs]: https://cloud.google.com/dlp/docs
[risk_4_code]: risk.js
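
To make the `kAnonymity` and `lDiversity` commands more concrete, here is a minimal local sketch of the two metrics over an in-memory array of rows. It does not call the DLP API or BigQuery. The helper names and the sample rows are invented for illustration, and the output of `risk.js` is not expected to match this format.

```javascript
'use strict';

// Groups rows that share the same values in all quasi-identifier columns.
function groupByQuasiIds (rows, quasiIds) {
  const groups = new Map();
  for (const row of rows) {
    const key = JSON.stringify(quasiIds.map((col) => row[col]));
    if (!groups.has(key)) {
      groups.set(key, []);
    }
    groups.get(key).push(row);
  }
  return groups;
}

// k-anonymity: size of the smallest group of rows sharing quasi-identifier values.
function kAnonymity (rows, quasiIds) {
  const sizes = [...groupByQuasiIds(rows, quasiIds).values()].map((g) => g.length);
  return sizes.length ? Math.min(...sizes) : 0;
}

// l-diversity: smallest number of distinct sensitive values found in any group.
function lDiversity (rows, quasiIds, sensitiveAttribute) {
  const counts = [...groupByQuasiIds(rows, quasiIds).values()]
    .map((g) => new Set(g.map((row) => row[sensitiveAttribute])).size);
  return counts.length ? Math.min(...counts) : 0;
}

// Made-up rows; the column names only echo the README examples.
const rows = [
  { state_number: 1, county: 10, city: 'A' },
  { state_number: 1, county: 10, city: 'B' },
  { state_number: 1, county: 10, city: 'A' },
  { state_number: 2, county: 20, city: 'C' },
  { state_number: 2, county: 20, city: 'C' }
];

console.log(kAnonymity(rows, ['state_number', 'county'])); // 2
console.log(lDiversity(rows, ['state_number', 'county'], 'city')); // 1
```

A low k means some combination of quasi-identifier values is shared by only a few rows, and a low l means some group has little variety in its sensitive attribute; both suggest a higher re-identification risk.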

## Running the tests

1. Set the **GCLOUD_PROJECT** and **GOOGLE_APPLICATION_CREDENTIALS** environment variables.
164 changes: 164 additions & 0 deletions dlp/deid.js
@@ -0,0 +1,164 @@
/**
* Copyright 2017, Google, Inc.
* Licensed under the Apache License, Version 2.0 (the "License");
* you may not use this file except in compliance with the License.
* You may obtain a copy of the License at
*
* http://www.apache.org/licenses/LICENSE-2.0
*
* Unless required by applicable law or agreed to in writing, software
* distributed under the License is distributed on an "AS IS" BASIS,
* WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
* See the License for the specific language governing permissions and
* limitations under the License.
*/

'use strict';

function deidentifyWithMask (string, maskingCharacter, numberToMask) {
  // [START deidentify_masking]
  // Imports the Google Cloud Data Loss Prevention library
  const DLP = require('@google-cloud/dlp');

  // Instantiates a client
  const dlp = new DLP.DlpServiceClient();

  // The string to deidentify
  // const string = 'My SSN is 372819127';

  // (Optional) The maximum number of sensitive characters to mask in a match
  // If omitted from the request or set to 0, the API will mask all matching characters
  // const numberToMask = 5;

  // (Optional) The character to mask matching sensitive data with
  // If omitted from the request, the API will use '-' for strings and 'N' for digits
  // const maskingCharacter = 'x';

  // Construct deidentification request
  const items = [{ type: 'text/plain', value: string }];
  const request = {
    deidentifyConfig: {
      infoTypeTransformations: {
        transformations: [{
          primitiveTransformation: {
            characterMaskConfig: {
              maskingCharacter: maskingCharacter,
              numberToMask: numberToMask
            }
          }
        }]
      }
    },
    items: items
  };

  // Run deidentification request
  dlp.deidentifyContent(request)
    .then((response) => {
      const deidentifiedItems = response[0].items;
      console.log(deidentifiedItems[0].value);
    })
    .catch((err) => {
      console.log(`Error in deidentifyWithMask: ${err.message || err}`);
    });
  // [END deidentify_masking]
}

function deidentifyWithFpe (string, alphabet, keyName, wrappedKey) {
  // [START deidentify_fpe]
  // Imports the Google Cloud Data Loss Prevention library
  const DLP = require('@google-cloud/dlp');

  // Instantiates a client
  const dlp = new DLP.DlpServiceClient();

  // The string to deidentify
  // const string = 'My SSN is 372819127';

  // The set of characters to replace sensitive ones with
  // For more information, see https://cloud.google.com/dlp/docs/reference/rest/v2beta1/content/deidentify#FfxCommonNativeAlphabet
  // const alphabet = 'ALPHA_NUMERIC';

  // The name of the Cloud KMS key used to encrypt ('wrap') the AES-256 key
  // const keyName = 'projects/YOUR_GCLOUD_PROJECT/locations/YOUR_LOCATION/keyRings/YOUR_KEYRING_NAME/cryptoKeys/YOUR_KEY_NAME';

  // The encrypted ('wrapped') AES-256 key to use
  // This key should be encrypted using the Cloud KMS key specified above
  // const wrappedKey = 'YOUR_ENCRYPTED_AES_256_KEY';

  // Construct deidentification request
  const items = [{ type: 'text/plain', value: string }];
  const request = {
    deidentifyConfig: {
      infoTypeTransformations: {
        transformations: [{
          primitiveTransformation: {
            cryptoReplaceFfxFpeConfig: {
              cryptoKey: {
                kmsWrapped: {
                  wrappedKey: wrappedKey,
                  cryptoKeyName: keyName
                }
              },
              commonAlphabet: alphabet
            }
          }
        }]
      }
    },
    items: items
  };

  // Run deidentification request
  dlp.deidentifyContent(request)
    .then((response) => {
      const deidentifiedItems = response[0].items;
      console.log(deidentifiedItems[0].value);
    })
    .catch((err) => {
      console.log(`Error in deidentifyWithFpe: ${err.message || err}`);
    });
  // [END deidentify_fpe]
}

const cli = require(`yargs`)
  .demand(1)
  .command(
    `mask <string>`,
    `Deidentify sensitive data by masking it with a character.`,
    {
      maskingCharacter: {
        type: 'string',
        alias: 'c',
        default: ''
      },
      numberToMask: {
        type: 'number',
        alias: 'n',
        default: 0
      }
    },
    (opts) => deidentifyWithMask(opts.string, opts.maskingCharacter, opts.numberToMask)
  )
  .command(
    `fpe <string> <wrappedKey> <keyName>`,
    `Deidentify sensitive data using Format Preserving Encryption (FPE).`,
    {
      alphabet: {
        type: 'string',
        alias: 'a',
        default: 'ALPHA_NUMERIC',
        choices: ['NUMERIC', 'HEXADECIMAL', 'UPPER_CASE_ALPHA_NUMERIC', 'ALPHA_NUMERIC']
      }
    },
    (opts) => deidentifyWithFpe(opts.string, opts.alphabet, opts.keyName, opts.wrappedKey)
  )
  .example(`node $0 mask "My SSN is 372819127"`)
  .example(`node $0 fpe "My SSN is 372819127" <YOUR_ENCRYPTED_AES_256_KEY> <YOUR_KEY_NAME>`)
  .wrap(120)
  .recommendCommands()
  .epilogue(`For more information, see https://cloud.google.com/dlp/docs.`);

if (module === require.main) {
  cli.help().strict().argv; // eslint-disable-line
}