diff --git a/samples/snippets/README.md b/samples/snippets/README.md index d0ba5691..5689d7c2 100644 --- a/samples/snippets/README.md +++ b/samples/snippets/README.md @@ -10,17 +10,6 @@ This directory contains Python examples that use the - [api](api) has a simple command line tool that shows off the API's features. -- [movie_nl](movie_nl) combines sentiment and entity analysis to come up with -actors/directors who are the most and least popular in the imdb movie reviews. - -- [ocr_nl](ocr_nl) uses the [Cloud Vision API](https://cloud.google.com/vision/) -to extract text from images, then uses the NL API to extract entity information -from those texts, and stores the extracted information in a database in support -of further analysis and correlation. - - [sentiment](sentiment) contains the [Sentiment Analysis Tutorial](https://cloud.google.com/natural-language/docs/sentiment-tutorial) code as used within the documentation. - -- [syntax_triples](syntax_triples) uses syntax analysis to find -subject-verb-object triples in a given piece of text. diff --git a/samples/snippets/cloud-client/v1beta2/README.rst b/samples/snippets/cloud-client/v1beta2/README.rst deleted file mode 100644 index 03400319..00000000 --- a/samples/snippets/cloud-client/v1beta2/README.rst +++ /dev/null @@ -1,151 +0,0 @@ -.. This file is automatically generated. Do not edit this file directly. - -Google Cloud Natural Language API Python Samples -=============================================================================== - -.. image:: https://gstatic.com/cloudssh/images/open-btn.png - :target: https://console.cloud.google.com/cloudshell/open?git_repo=https://github.com/GoogleCloudPlatform/python-docs-samples&page=editor&open_in_editor=language/cloud-client/v1beta2/README.rst - - -This directory contains samples for Google Cloud Natural Language API. The `Google Cloud Natural Language API`_ provides natural language understanding technologies to developers, including sentiment analysis, entity recognition, and syntax analysis. This API is part of the larger Cloud Machine Learning API. - -- See the `migration guide`_ for information about migrating to Python client library v0.26.1. - -.. _migration guide: https://cloud.google.com/natural-language/docs/python-client-migration - - - - -.. _Google Cloud Natural Language API: https://cloud.google.com/natural-language/docs/ - -Setup -------------------------------------------------------------------------------- - - -Authentication -++++++++++++++ - -This sample requires you to have authentication setup. Refer to the -`Authentication Getting Started Guide`_ for instructions on setting up -credentials for applications. - -.. _Authentication Getting Started Guide: - https://cloud.google.com/docs/authentication/getting-started - -Install Dependencies -++++++++++++++++++++ - -#. Clone python-docs-samples and change directory to the sample directory you want to use. - - .. code-block:: bash - - $ git clone https://github.com/GoogleCloudPlatform/python-docs-samples.git - -#. Install `pip`_ and `virtualenv`_ if you do not already have them. You may want to refer to the `Python Development Environment Setup Guide`_ for Google Cloud Platform for instructions. - - .. _Python Development Environment Setup Guide: - https://cloud.google.com/python/setup - -#. Create a virtualenv. Samples are compatible with Python 2.7 and 3.4+. - - .. code-block:: bash - - $ virtualenv env - $ source env/bin/activate - -#. Install the dependencies needed to run the samples. - - .. 
code-block:: bash - - $ pip install -r requirements.txt - -.. _pip: https://pip.pypa.io/ -.. _virtualenv: https://virtualenv.pypa.io/ - -Samples -------------------------------------------------------------------------------- - -Quickstart -+++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++ - -.. image:: https://gstatic.com/cloudssh/images/open-btn.png - :target: https://console.cloud.google.com/cloudshell/open?git_repo=https://github.com/GoogleCloudPlatform/python-docs-samples&page=editor&open_in_editor=language/cloud-client/v1beta2/quickstart.py,language/cloud-client/v1beta2/README.rst - - - - -To run this sample: - -.. code-block:: bash - - $ python quickstart.py - - -Snippets -+++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++ - -.. image:: https://gstatic.com/cloudssh/images/open-btn.png - :target: https://console.cloud.google.com/cloudshell/open?git_repo=https://github.com/GoogleCloudPlatform/python-docs-samples&page=editor&open_in_editor=language/cloud-client/v1beta2/snippets.py,language/cloud-client/v1beta2/README.rst - - - - -To run this sample: - -.. code-block:: bash - - $ python snippets.py - - usage: snippets.py [-h] - {classify-text,classify-file,sentiment-entities-text,sentiment-entities-file,sentiment-text,sentiment-file,entities-text,entities-file,syntax-text,syntax-file} - ... - - This application demonstrates how to perform basic operations with the - Google Cloud Natural Language API - - For more information, the documentation at - https://cloud.google.com/natural-language/docs. - - positional arguments: - {classify-text,classify-file,sentiment-entities-text,sentiment-entities-file,sentiment-text,sentiment-file,entities-text,entities-file,syntax-text,syntax-file} - classify-text Classifies content categories of the provided text. - classify-file Classifies content categories of the text in a Google - Cloud Storage file. - sentiment-entities-text - Detects entity sentiment in the provided text. - sentiment-entities-file - Detects entity sentiment in a Google Cloud Storage - file. - sentiment-text Detects sentiment in the text. - sentiment-file Detects sentiment in the file located in Google Cloud - Storage. - entities-text Detects entities in the text. - entities-file Detects entities in the file located in Google Cloud - Storage. - syntax-text Detects syntax in the text. - syntax-file Detects syntax in the file located in Google Cloud - Storage. - - optional arguments: - -h, --help show this help message and exit - - - - - -The client library -------------------------------------------------------------------------------- - -This sample uses the `Google Cloud Client Library for Python`_. -You can read the documentation for more details on API usage and use GitHub -to `browse the source`_ and `report issues`_. - -.. _Google Cloud Client Library for Python: - https://googlecloudplatform.github.io/google-cloud-python/ -.. _browse the source: - https://github.com/GoogleCloudPlatform/google-cloud-python -.. _report issues: - https://github.com/GoogleCloudPlatform/google-cloud-python/issues - - -.. 
_Google Cloud SDK: https://cloud.google.com/sdk/ \ No newline at end of file diff --git a/samples/snippets/cloud-client/v1beta2/README.rst.in b/samples/snippets/cloud-client/v1beta2/README.rst.in deleted file mode 100644 index d1166745..00000000 --- a/samples/snippets/cloud-client/v1beta2/README.rst.in +++ /dev/null @@ -1,32 +0,0 @@ -# This file is used to generate README.rst - -product: - name: Google Cloud Natural Language API - short_name: Cloud Natural Language API - url: https://cloud.google.com/natural-language/docs/ - description: > - The `Google Cloud Natural Language API`_ provides natural language - understanding technologies to developers, including sentiment analysis, - entity recognition, and syntax analysis. This API is part of the larger - Cloud Machine Learning API. - - - - See the `migration guide`_ for information about migrating to Python client library v0.26.1. - - - .. _migration guide: https://cloud.google.com/natural-language/docs/python-client-migration - -setup: -- auth -- install_deps - -samples: -- name: Quickstart - file: quickstart.py -- name: Snippets - file: snippets.py - show_help: true - -cloud_client_library: true - -folder: language/cloud-client/v1beta2 \ No newline at end of file diff --git a/samples/snippets/cloud-client/v1beta2/quickstart.py b/samples/snippets/cloud-client/v1beta2/quickstart.py deleted file mode 100644 index b19d11b7..00000000 --- a/samples/snippets/cloud-client/v1beta2/quickstart.py +++ /dev/null @@ -1,43 +0,0 @@ -#!/usr/bin/env python - -# Copyright 2017 Google Inc. All Rights Reserved. -# -# Licensed under the Apache License, Version 2.0 (the "License"); -# you may not use this file except in compliance with the License. -# You may obtain a copy of the License at -# -# http://www.apache.org/licenses/LICENSE-2.0 -# -# Unless required by applicable law or agreed to in writing, software -# distributed under the License is distributed on an "AS IS" BASIS, -# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. -# See the License for the specific language governing permissions and -# limitations under the License. - - -def run_quickstart(): - # [START language_quickstart] - # Imports the Google Cloud client library - from google.cloud import language_v1beta2 - from google.cloud.language_v1beta2 import enums - from google.cloud.language_v1beta2 import types - - # Instantiates a client with the v1beta2 version - client = language_v1beta2.LanguageServiceClient() - - # The text to analyze - text = u'Hallo Welt!' - document = types.Document( - content=text, - type=enums.Document.Type.PLAIN_TEXT, - language='de') - # Detects the sentiment of the text - sentiment = client.analyze_sentiment(document).document_sentiment - - print('Text: {}'.format(text)) - print('Sentiment: {}, {}'.format(sentiment.score, sentiment.magnitude)) - # [END language_quickstart] - - -if __name__ == '__main__': - run_quickstart() diff --git a/samples/snippets/cloud-client/v1beta2/quickstart_test.py b/samples/snippets/cloud-client/v1beta2/quickstart_test.py deleted file mode 100644 index 839faae2..00000000 --- a/samples/snippets/cloud-client/v1beta2/quickstart_test.py +++ /dev/null @@ -1,22 +0,0 @@ -# Copyright 2017 Google Inc. All Rights Reserved. -# -# Licensed under the Apache License, Version 2.0 (the "License"); -# you may not use this file except in compliance with the License. 
-# You may obtain a copy of the License at -# -# http://www.apache.org/licenses/LICENSE-2.0 -# -# Unless required by applicable law or agreed to in writing, software -# distributed under the License is distributed on an "AS IS" BASIS, -# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. -# See the License for the specific language governing permissions and -# limitations under the License. - - -import quickstart - - -def test_quickstart(capsys): - quickstart.run_quickstart() - out, _ = capsys.readouterr() - assert 'Sentiment' in out diff --git a/samples/snippets/cloud-client/v1beta2/requirements.txt b/samples/snippets/cloud-client/v1beta2/requirements.txt deleted file mode 100644 index 2cbc37eb..00000000 --- a/samples/snippets/cloud-client/v1beta2/requirements.txt +++ /dev/null @@ -1 +0,0 @@ -google-cloud-language==1.0.2 diff --git a/samples/snippets/cloud-client/v1beta2/resources/android_text.txt b/samples/snippets/cloud-client/v1beta2/resources/android_text.txt deleted file mode 100644 index c05c452d..00000000 --- a/samples/snippets/cloud-client/v1beta2/resources/android_text.txt +++ /dev/null @@ -1 +0,0 @@ -Android is a mobile operating system developed by Google, based on the Linux kernel and designed primarily for touchscreen mobile devices such as smartphones and tablets. diff --git a/samples/snippets/cloud-client/v1beta2/resources/text.txt b/samples/snippets/cloud-client/v1beta2/resources/text.txt deleted file mode 100644 index 97a1cea0..00000000 --- a/samples/snippets/cloud-client/v1beta2/resources/text.txt +++ /dev/null @@ -1 +0,0 @@ -President Obama is speaking at the White House. \ No newline at end of file diff --git a/samples/snippets/cloud-client/v1beta2/snippets.py b/samples/snippets/cloud-client/v1beta2/snippets.py deleted file mode 100644 index abf16ada..00000000 --- a/samples/snippets/cloud-client/v1beta2/snippets.py +++ /dev/null @@ -1,346 +0,0 @@ -#!/usr/bin/env python - -# Copyright 2016 Google, Inc. -# -# Licensed under the Apache License, Version 2.0 (the "License"); -# you may not use this file except in compliance with the License. -# You may obtain a copy of the License at -# -# http://www.apache.org/licenses/LICENSE-2.0 -# -# Unless required by applicable law or agreed to in writing, software -# distributed under the License is distributed on an "AS IS" BASIS, -# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. -# See the License for the specific language governing permissions and -# limitations under the License. - -"""This application demonstrates how to perform basic operations with the -Google Cloud Natural Language API - -For more information, the documentation at -https://cloud.google.com/natural-language/docs. -""" - -import argparse -import sys - -# [START beta_import] -from google.cloud import language_v1beta2 -from google.cloud.language_v1beta2 import enums -from google.cloud.language_v1beta2 import types -# [END beta_import] -import six - - -def sentiment_text(text): - """Detects sentiment in the text.""" - client = language_v1beta2.LanguageServiceClient() - - if isinstance(text, six.binary_type): - text = text.decode('utf-8') - - # Instantiates a plain text document. - document = types.Document( - content=text, - type=enums.Document.Type.PLAIN_TEXT) - - # Detects sentiment in the document. 
You can also analyze HTML with: - # document.type == enums.Document.Type.HTML - sentiment = client.analyze_sentiment(document).document_sentiment - - print('Score: {}'.format(sentiment.score)) - print('Magnitude: {}'.format(sentiment.magnitude)) - - -def sentiment_file(gcs_uri): - """Detects sentiment in the file located in Google Cloud Storage.""" - client = language_v1beta2.LanguageServiceClient() - - # Instantiates a plain text document. - document = types.Document( - gcs_content_uri=gcs_uri, - type=enums.Document.Type.PLAIN_TEXT) - - # Detects sentiment in the document. You can also analyze HTML with: - # document.type == enums.Document.Type.HTML - sentiment = client.analyze_sentiment(document).document_sentiment - - print('Score: {}'.format(sentiment.score)) - print('Magnitude: {}'.format(sentiment.magnitude)) - - -def entities_text(text): - """Detects entities in the text.""" - client = language_v1beta2.LanguageServiceClient() - - if isinstance(text, six.binary_type): - text = text.decode('utf-8') - - # Instantiates a plain text document. - document = types.Document( - content=text, - type=enums.Document.Type.PLAIN_TEXT) - - # Detects entities in the document. You can also analyze HTML with: - # document.type == enums.Document.Type.HTML - entities = client.analyze_entities(document).entities - - # entity types from enums.Entity.Type - entity_type = ('UNKNOWN', 'PERSON', 'LOCATION', 'ORGANIZATION', - 'EVENT', 'WORK_OF_ART', 'CONSUMER_GOOD', 'OTHER') - - for entity in entities: - print('=' * 20) - print(u'{:<16}: {}'.format('name', entity.name)) - print(u'{:<16}: {}'.format('type', entity_type[entity.type])) - print(u'{:<16}: {}'.format('metadata', entity.metadata)) - print(u'{:<16}: {}'.format('salience', entity.salience)) - print(u'{:<16}: {}'.format('wikipedia_url', - entity.metadata.get('wikipedia_url', '-'))) - - -def entities_file(gcs_uri): - """Detects entities in the file located in Google Cloud Storage.""" - client = language_v1beta2.LanguageServiceClient() - - # Instantiates a plain text document. - document = types.Document( - gcs_content_uri=gcs_uri, - type=enums.Document.Type.PLAIN_TEXT) - - # Detects sentiment in the document. You can also analyze HTML with: - # document.type == enums.Document.Type.HTML - entities = client.analyze_entities(document).entities - - # entity types from enums.Entity.Type - entity_type = ('UNKNOWN', 'PERSON', 'LOCATION', 'ORGANIZATION', - 'EVENT', 'WORK_OF_ART', 'CONSUMER_GOOD', 'OTHER') - - for entity in entities: - print('=' * 20) - print(u'{:<16}: {}'.format('name', entity.name)) - print(u'{:<16}: {}'.format('type', entity_type[entity.type])) - print(u'{:<16}: {}'.format('metadata', entity.metadata)) - print(u'{:<16}: {}'.format('salience', entity.salience)) - print(u'{:<16}: {}'.format('wikipedia_url', - entity.metadata.get('wikipedia_url', '-'))) - - -# [START def_entity_sentiment_text] -def entity_sentiment_text(text): - """Detects entity sentiment in the provided text.""" - client = language_v1beta2.LanguageServiceClient() - - if isinstance(text, six.binary_type): - text = text.decode('utf-8') - - document = types.Document( - content=text.encode('utf-8'), - type=enums.Document.Type.PLAIN_TEXT) - - # Detect and send native Python encoding to receive correct word offsets. 
- encoding = enums.EncodingType.UTF32 - if sys.maxunicode == 65535: - encoding = enums.EncodingType.UTF16 - - result = client.analyze_entity_sentiment(document, encoding) - - for entity in result.entities: - print('Mentions: ') - print(u'Name: "{}"'.format(entity.name)) - for mention in entity.mentions: - print(u' Begin Offset : {}'.format(mention.text.begin_offset)) - print(u' Content : {}'.format(mention.text.content)) - print(u' Magnitude : {}'.format(mention.sentiment.magnitude)) - print(u' Sentiment : {}'.format(mention.sentiment.score)) - print(u' Type : {}'.format(mention.type)) - print(u'Salience: {}'.format(entity.salience)) - print(u'Sentiment: {}\n'.format(entity.sentiment)) -# [END def_entity_sentiment_text] - - -def entity_sentiment_file(gcs_uri): - """Detects entity sentiment in a Google Cloud Storage file.""" - client = language_v1beta2.LanguageServiceClient() - - document = types.Document( - gcs_content_uri=gcs_uri, - type=enums.Document.Type.PLAIN_TEXT) - - # Detect and send native Python encoding to receive correct word offsets. - encoding = enums.EncodingType.UTF32 - if sys.maxunicode == 65535: - encoding = enums.EncodingType.UTF16 - - result = client.analyze_entity_sentiment(document, encoding) - - for entity in result.entities: - print(u'Name: "{}"'.format(entity.name)) - for mention in entity.mentions: - print(u' Begin Offset : {}'.format(mention.text.begin_offset)) - print(u' Content : {}'.format(mention.text.content)) - print(u' Magnitude : {}'.format(mention.sentiment.magnitude)) - print(u' Sentiment : {}'.format(mention.sentiment.score)) - print(u' Type : {}'.format(mention.type)) - print(u'Salience: {}'.format(entity.salience)) - print(u'Sentiment: {}\n'.format(entity.sentiment)) - - -def syntax_text(text): - """Detects syntax in the text.""" - client = language_v1beta2.LanguageServiceClient() - - if isinstance(text, six.binary_type): - text = text.decode('utf-8') - - # Instantiates a plain text document. - document = types.Document( - content=text, - type=enums.Document.Type.PLAIN_TEXT) - - # Detects syntax in the document. You can also analyze HTML with: - # document.type == enums.Document.Type.HTML - tokens = client.analyze_syntax(document).tokens - - # part-of-speech tags from enums.PartOfSpeech.Tag - pos_tag = ('UNKNOWN', 'ADJ', 'ADP', 'ADV', 'CONJ', 'DET', 'NOUN', 'NUM', - 'PRON', 'PRT', 'PUNCT', 'VERB', 'X', 'AFFIX') - - for token in tokens: - print(u'{}: {}'.format(pos_tag[token.part_of_speech.tag], - token.text.content)) - - -def syntax_file(gcs_uri): - """Detects syntax in the file located in Google Cloud Storage.""" - client = language_v1beta2.LanguageServiceClient() - - # Instantiates a plain text document. - document = types.Document( - gcs_content_uri=gcs_uri, - type=enums.Document.Type.PLAIN_TEXT) - - # Detects syntax in the document. 
You can also analyze HTML with: - # document.type == enums.Document.Type.HTML - tokens = client.analyze_syntax(document).tokens - - # part-of-speech tags from enums.PartOfSpeech.Tag - pos_tag = ('UNKNOWN', 'ADJ', 'ADP', 'ADV', 'CONJ', 'DET', 'NOUN', 'NUM', - 'PRON', 'PRT', 'PUNCT', 'VERB', 'X', 'AFFIX') - - for token in tokens: - print(u'{}: {}'.format(pos_tag[token.part_of_speech.tag], - token.text.content)) - - -# [START def_classify_text] -def classify_text(text): - """Classifies content categories of the provided text.""" - # [START beta_client] - client = language_v1beta2.LanguageServiceClient() - # [END beta_client] - - if isinstance(text, six.binary_type): - text = text.decode('utf-8') - - document = types.Document( - content=text.encode('utf-8'), - type=enums.Document.Type.PLAIN_TEXT) - - categories = client.classify_text(document).categories - - for category in categories: - print(u'=' * 20) - print(u'{:<16}: {}'.format('name', category.name)) - print(u'{:<16}: {}'.format('confidence', category.confidence)) -# [END def_classify_text] - - -# [START def_classify_file] -def classify_file(gcs_uri): - """Classifies content categories of the text in a Google Cloud Storage - file. - """ - client = language_v1beta2.LanguageServiceClient() - - document = types.Document( - gcs_content_uri=gcs_uri, - type=enums.Document.Type.PLAIN_TEXT) - - categories = client.classify_text(document).categories - - for category in categories: - print(u'=' * 20) - print(u'{:<16}: {}'.format('name', category.name)) - print(u'{:<16}: {}'.format('confidence', category.confidence)) -# [END def_classify_file] - - -if __name__ == '__main__': - parser = argparse.ArgumentParser( - description=__doc__, - formatter_class=argparse.RawDescriptionHelpFormatter) - subparsers = parser.add_subparsers(dest='command') - - classify_text_parser = subparsers.add_parser( - 'classify-text', help=classify_text.__doc__) - classify_text_parser.add_argument('text') - - classify_text_parser = subparsers.add_parser( - 'classify-file', help=classify_file.__doc__) - classify_text_parser.add_argument('gcs_uri') - - sentiment_entities_text_parser = subparsers.add_parser( - 'sentiment-entities-text', help=entity_sentiment_text.__doc__) - sentiment_entities_text_parser.add_argument('text') - - sentiment_entities_file_parser = subparsers.add_parser( - 'sentiment-entities-file', help=entity_sentiment_file.__doc__) - sentiment_entities_file_parser.add_argument('gcs_uri') - - sentiment_text_parser = subparsers.add_parser( - 'sentiment-text', help=sentiment_text.__doc__) - sentiment_text_parser.add_argument('text') - - sentiment_file_parser = subparsers.add_parser( - 'sentiment-file', help=sentiment_file.__doc__) - sentiment_file_parser.add_argument('gcs_uri') - - entities_text_parser = subparsers.add_parser( - 'entities-text', help=entities_text.__doc__) - entities_text_parser.add_argument('text') - - entities_file_parser = subparsers.add_parser( - 'entities-file', help=entities_file.__doc__) - entities_file_parser.add_argument('gcs_uri') - - syntax_text_parser = subparsers.add_parser( - 'syntax-text', help=syntax_text.__doc__) - syntax_text_parser.add_argument('text') - - syntax_file_parser = subparsers.add_parser( - 'syntax-file', help=syntax_file.__doc__) - syntax_file_parser.add_argument('gcs_uri') - - args = parser.parse_args() - - if args.command == 'sentiment-text': - sentiment_text(args.text) - elif args.command == 'sentiment-file': - sentiment_file(args.gcs_uri) - elif args.command == 'entities-text': - entities_text(args.text) - elif 
args.command == 'entities-file': - entities_file(args.gcs_uri) - elif args.command == 'syntax-text': - syntax_text(args.text) - elif args.command == 'syntax-file': - syntax_file(args.gcs_uri) - elif args.command == 'sentiment-entities-text': - entity_sentiment_text(args.text) - elif args.command == 'sentiment-entities-file': - entity_sentiment_file(args.gcs_uri) - elif args.command == 'classify-text': - classify_text(args.text) - elif args.command == 'classify-file': - classify_file(args.gcs_uri) diff --git a/samples/snippets/cloud-client/v1beta2/snippets_test.py b/samples/snippets/cloud-client/v1beta2/snippets_test.py deleted file mode 100644 index 5924ffb4..00000000 --- a/samples/snippets/cloud-client/v1beta2/snippets_test.py +++ /dev/null @@ -1,106 +0,0 @@ -# -*- coding: utf-8 -*- -# Copyright 2017 Google, Inc. -# -# Licensed under the Apache License, Version 2.0 (the "License"); -# you may not use this file except in compliance with the License. -# You may obtain a copy of the License at -# -# http://www.apache.org/licenses/LICENSE-2.0 -# -# Unless required by applicable law or agreed to in writing, software -# distributed under the License is distributed on an "AS IS" BASIS, -# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. -# See the License for the specific language governing permissions and -# limitations under the License. - -import os - -import snippets - -BUCKET = os.environ['CLOUD_STORAGE_BUCKET'] -TEST_FILE_URL = 'gs://{}/text.txt'.format(BUCKET) -LONG_TEST_FILE_URL = 'gs://{}/android_text.txt'.format(BUCKET) - - -def test_sentiment_text(capsys): - snippets.sentiment_text('President Obama is speaking at the White House.') - out, _ = capsys.readouterr() - assert 'Score: 0' in out - - -def test_sentiment_utf(capsys): - snippets.sentiment_text( - u'1er site d\'information. 
Les articles du journal et toute l\'' + - u'actualité en continu : International, France, Société, Economie, ' + - u'Culture, Environnement') - out, _ = capsys.readouterr() - assert 'Score: 0' in out - - -def test_sentiment_file(capsys): - snippets.sentiment_file(TEST_FILE_URL) - out, _ = capsys.readouterr() - assert 'Score: 0' in out - - -def test_entities_text(capsys): - snippets.entities_text('President Obama is speaking at the White House.') - out, _ = capsys.readouterr() - assert 'name' in out - assert ': Obama' in out - - -def test_entities_file(capsys): - snippets.entities_file(TEST_FILE_URL) - out, _ = capsys.readouterr() - assert 'name' in out - assert ': Obama' in out - - -def test_syntax_text(capsys): - snippets.syntax_text('President Obama is speaking at the White House.') - out, _ = capsys.readouterr() - assert 'NOUN: President' in out - - -def test_syntax_file(capsys): - snippets.syntax_file(TEST_FILE_URL) - out, _ = capsys.readouterr() - assert 'NOUN: President' in out - - -def test_sentiment_entities_text(capsys): - snippets.entity_sentiment_text( - 'President Obama is speaking at the White House.') - out, _ = capsys.readouterr() - assert 'Content : White House' in out - - -def test_sentiment_entities_file(capsys): - snippets.entity_sentiment_file(TEST_FILE_URL) - out, _ = capsys.readouterr() - assert 'Content : White House' in out - - -def test_sentiment_entities_utf(capsys): - snippets.entity_sentiment_text( - 'foo→bar') - out, _ = capsys.readouterr() - assert 'Begin Offset : 4' in out - - -def test_classify_text(capsys): - snippets.classify_text( - 'Android is a mobile operating system developed by Google, ' - 'based on the Linux kernel and designed primarily for touchscreen ' - 'mobile devices such as smartphones and tablets.') - out, _ = capsys.readouterr() - assert 'name' in out - assert '/Computers & Electronics' in out - - -def test_classify_file(capsys): - snippets.classify_file(LONG_TEST_FILE_URL) - out, _ = capsys.readouterr() - assert 'name' in out - assert '/Computers & Electronics' in out diff --git a/samples/snippets/movie_nl/README.md b/samples/snippets/movie_nl/README.md deleted file mode 100644 index 95c05dbb..00000000 --- a/samples/snippets/movie_nl/README.md +++ /dev/null @@ -1,157 +0,0 @@ -# Introduction - -[![Open in Cloud Shell][shell_img]][shell_link] - -[shell_img]: http://gstatic.com/cloudssh/images/open-btn.png -[shell_link]: https://console.cloud.google.com/cloudshell/open?git_repo=https://github.com/GoogleCloudPlatform/python-docs-samples&page=editor&open_in_editor=language/movie_nl/README.md -This sample is an application of the Google Cloud Platform Natural Language API. -It uses the [imdb movie reviews data set](https://www.cs.cornell.edu/people/pabo/movie-review-data/) -from [Cornell University](http://www.cs.cornell.edu/) and performs sentiment & entity -analysis on it. It combines the capabilities of sentiment analysis and entity recognition -to come up with actors/directors who are the most and least popular. - -### Set Up to Authenticate With Your Project's Credentials - -Please follow the [Set Up Your Project](https://cloud.google.com/natural-language/docs/getting-started#set_up_your_project) -steps in the Quickstart doc to create a project and enable the -Cloud Natural Language API. 
Following those steps, make sure that you -[Set Up a Service Account](https://cloud.google.com/natural-language/docs/common/auth#set_up_a_service_account), -and export the following environment variable: - -``` -export GOOGLE_APPLICATION_CREDENTIALS=/path/to/your-project-credentials.json -``` - -**Note:** If you get an error saying your API hasn't been enabled, make sure -that you have correctly set this environment variable, and that the project that -you got the service account from has the Natural Language API enabled. - -## How it works -This sample uses the Natural Language API to annotate the input text. The -movie review document is broken into sentences using the `extract_syntax` feature. -Each sentence is sent to the API for sentiment analysis. The positive and negative -sentiment values are combined to come up with a single overall sentiment of the -movie document. - -In addition to the sentiment, the program also extracts the entities of type -`PERSON`, who are the actors in the movie (including the director and anyone -important). These entities are assigned the sentiment value of the document to -come up with the most and least popular actors/directors. - -### Movie document -We define a movie document as a set of reviews. These reviews are individual -sentences and we use the NL API to extract the sentences from the document. See -an example movie document below. - -``` - Sample review sentence 1. Sample review sentence 2. Sample review sentence 3. -``` - -### Sentences and Sentiment -Each sentence from the above document is assigned a sentiment as below. - -``` - Sample review sentence 1 => Sentiment 1 - Sample review sentence 2 => Sentiment 2 - Sample review sentence 3 => Sentiment 3 -``` - -### Sentiment computation -The final sentiment is computed by simply adding the sentence sentiments. - -``` - Total Sentiment = Sentiment 1 + Sentiment 2 + Sentiment 3 -``` - - -### Entity extraction and Sentiment assignment -Entities with type `PERSON` are extracted from the movie document using the NL -API. Since these entities are mentioned in their respective movie document, -they are associated with the document sentiment. - -``` - Document 1 => Sentiment 1 - - Person 1 - Person 2 - Person 3 - - Document 2 => Sentiment 2 - - Person 2 - Person 4 - Person 5 -``` - -Based on the above data we can calculate the sentiment associated with Person 2: - -``` - Person 2 => (Sentiment 1 + Sentiment 2) -``` - -## Movie Data Set -We have used the Cornell Movie Review data as our input. Please follow the instructions below to download and extract the data. - -### Download Instructions - -``` - $ curl -O http://www.cs.cornell.edu/people/pabo/movie-review-data/mix20_rand700_tokens.zip - $ unzip mix20_rand700_tokens.zip -``` - -## Command Line Usage -In order to use the movie analyzer, follow the instructions below. (Note that the `--sample` parameter below runs the script on -fewer documents, and can be omitted to run it on the entire corpus) - -### Install Dependencies - -Install [pip](https://pip.pypa.io/en/stable/installing) if not already installed. - -Then, install dependencies by running the following pip command: - -``` -$ pip install -r requirements.txt -``` -### How to Run - -``` -$ python main.py analyze --inp "tokens/*/*" \ - --sout sentiment.json \ - --eout entity.json \ - --sample 5 -``` - -You should see the log file `movie.log` created. - -## Output Data -The program produces sentiment and entity output in json format. 
For example: - -### Sentiment Output -``` - { - "doc_id": "cv310_tok-16557.txt", - "sentiment": 3.099, - "label": -1 - } -``` - -### Entity Output - -``` - { - "name": "Sean Patrick Flanery", - "wiki_url": "http://en.wikipedia.org/wiki/Sean_Patrick_Flanery", - "sentiment": 3.099 - } -``` - -### Entity Output Sorting -In order to sort and rank the entities generated, use the same `main.py` script. For example, -this will print the top 5 actors with negative sentiment: - -``` -$ python main.py rank --entity_input entity.json \ - --sentiment neg \ - --reverse True \ - --sample 5 -``` diff --git a/samples/snippets/movie_nl/main.py b/samples/snippets/movie_nl/main.py deleted file mode 100644 index 06be1c9c..00000000 --- a/samples/snippets/movie_nl/main.py +++ /dev/null @@ -1,334 +0,0 @@ -# Copyright 2016 Google, Inc -# -# Licensed under the Apache License, Version 2.0 (the "License"); -# you may not use this file except in compliance with the License. -# You may obtain a copy of the License at -# -# http://www.apache.org/licenses/LICENSE-2.0 -# -# Unless required by applicable law or agreed to in writing, software -# distributed under the License is distributed on an "AS IS" BASIS, -# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. -# See the License for the specific language governing permissions and -# limitations under the License. - -import argparse -import codecs -import glob -import json -import logging -import os - -import googleapiclient.discovery -from googleapiclient.errors import HttpError -import requests - - -def analyze_document(service, document): - """Analyze the document and get the distribution of sentiments and - the movie name.""" - logging.info('Analyzing {}'.format(document.doc_id)) - - sentiments, entities = document.extract_sentiment_entities(service) - return sentiments, entities - - -def get_request_body(text, syntax=True, entities=True, sentiment=True): - """Creates the body of the request to the language api in - order to get an appropriate api response.""" - body = { - 'document': { - 'type': 'PLAIN_TEXT', - 'content': text, - }, - 'features': { - 'extract_syntax': syntax, - 'extract_entities': entities, - 'extract_document_sentiment': sentiment, - }, - 'encoding_type': 'UTF32' - } - - return body - - -class Document(object): - """Document class captures a single document of movie reviews.""" - - def __init__(self, text, doc_id, doc_path): - self.text = text - self.doc_id = doc_id - self.doc_path = doc_path - self.sentiment_entity_pair = None - self.label = None - - def extract_sentiment_entities(self, service): - """Extract the sentences in a document.""" - - if self.sentiment_entity_pair is not None: - return self.sentence_entity_pair - - docs = service.documents() - request_body = get_request_body( - self.text, - syntax=False, - entities=True, - sentiment=True) - request = docs.annotateText(body=request_body) - - ent_list = [] - - response = request.execute() - entities = response.get('entities', []) - documentSentiment = response.get('documentSentiment', {}) - - for entity in entities: - ent_type = entity.get('type') - wiki_url = entity.get('metadata', {}).get('wikipedia_url') - - if ent_type == 'PERSON' and wiki_url is not None: - ent_list.append(wiki_url) - - self.sentiment_entity_pair = (documentSentiment, ent_list) - - return self.sentiment_entity_pair - - -def to_sentiment_json(doc_id, sent, label): - """Convert the sentiment info to json. 
- - Args: - doc_id: Document id - sent: Overall Sentiment for the document - label: Actual label +1, 0, -1 for the document - - Returns: - String json representation of the input - - """ - json_doc = {} - - json_doc['doc_id'] = doc_id - json_doc['sentiment'] = float('%.3f' % sent) - json_doc['label'] = label - - return json.dumps(json_doc) - - -def get_wiki_title(wiki_url): - """Get the wikipedia page title for a given wikipedia URL. - - Args: - wiki_url: Wikipedia URL e.g., http://en.wikipedia.org/wiki/Sean_Connery - - Returns: - Wikipedia canonical name e.g., Sean Connery - - """ - try: - content = requests.get(wiki_url).text - return content.split('title')[1].split('-')[0].split('>')[1].strip() - except KeyError: - return os.path.basename(wiki_url).replace('_', ' ') - - -def to_entity_json(entity, entity_sentiment, entity_frequency): - """Convert entities and their associated sentiment to json. - - Args: - entity: Wikipedia entity name - entity_sentiment: Sentiment associated with the entity - entity_frequency: Frequency of the entity in the corpus - - Returns: - Json string representation of input - - """ - json_doc = {} - - avg_sentiment = float(entity_sentiment) / float(entity_frequency) - - json_doc['wiki_url'] = entity - json_doc['name'] = get_wiki_title(entity) - json_doc['sentiment'] = float('%.3f' % entity_sentiment) - json_doc['avg_sentiment'] = float('%.3f' % avg_sentiment) - - return json.dumps(json_doc) - - -def get_sentiment_entities(service, document): - """Compute the overall sentiment volume in the document. - - Args: - service: Client to Google Natural Language API - document: Movie review document (See Document object) - - Returns: - Tuple of total sentiment and entities found in the document - - """ - - sentiments, entities = analyze_document(service, document) - score = sentiments.get('score') - - return (score, entities) - - -def get_sentiment_label(sentiment): - """Return the sentiment label based on the sentiment quantity.""" - if sentiment < 0: - return -1 - elif sentiment > 0: - return 1 - else: - return 0 - - -def process_movie_reviews(service, reader, sentiment_writer, entity_writer): - """Perform some sentiment math and come up with movie review.""" - collected_entities = {} - - for document in reader: - try: - sentiment_total, entities = get_sentiment_entities( - service, document) - except HttpError as e: - logging.error('Error process_movie_reviews {}'.format(e.content)) - continue - - document.label = get_sentiment_label(sentiment_total) - - sentiment_writer.write( - to_sentiment_json( - document.doc_id, - sentiment_total, - document.label - ) - ) - - sentiment_writer.write('\n') - - for ent in entities: - ent_sent, frequency = collected_entities.get(ent, (0, 0)) - ent_sent += sentiment_total - frequency += 1 - - collected_entities[ent] = (ent_sent, frequency) - - for entity, sentiment_frequency in collected_entities.items(): - entity_writer.write(to_entity_json(entity, sentiment_frequency[0], - sentiment_frequency[1])) - entity_writer.write('\n') - - sentiment_writer.flush() - entity_writer.flush() - - -def document_generator(dir_path_pattern, count=None): - """Generator for the input movie documents. 
- - Args: - dir_path_pattern: Input dir pattern e.g., "foo/bar/*/*" - count: Number of documents to read else everything if None - - Returns: - Generator which contains Document (See above) - - """ - for running_count, item in enumerate(glob.iglob(dir_path_pattern)): - if count and running_count >= count: - raise StopIteration() - - doc_id = os.path.basename(item) - - with codecs.open(item, encoding='utf-8') as f: - try: - text = f.read() - except UnicodeDecodeError: - continue - - yield Document(text, doc_id, item) - - -def rank_entities(reader, sentiment=None, topn=None, reverse_bool=False): - """Rank the entities (actors) based on their sentiment - assigned from the movie.""" - - items = [] - for item in reader: - json_item = json.loads(item) - sent = json_item.get('sentiment') - entity_item = (sent, json_item) - - if sentiment: - if sentiment == 'pos' and sent > 0: - items.append(entity_item) - elif sentiment == 'neg' and sent < 0: - items.append(entity_item) - else: - items.append(entity_item) - - items.sort(reverse=reverse_bool) - items = [json.dumps(item[1]) for item in items] - - print('\n'.join(items[:topn])) - - -def analyze(input_dir, sentiment_writer, entity_writer, sample, log_file): - """Analyze the document for sentiment and entities""" - - # Create logger settings - logging.basicConfig(filename=log_file, level=logging.DEBUG) - - # Create a Google Service object - service = googleapiclient.discovery.build('language', 'v1') - - reader = document_generator(input_dir, sample) - - # Process the movie documents - process_movie_reviews(service, reader, sentiment_writer, entity_writer) - - -if __name__ == '__main__': - parser = argparse.ArgumentParser( - description=__doc__, - formatter_class=argparse.RawDescriptionHelpFormatter) - - subparsers = parser.add_subparsers(dest='command') - - rank_parser = subparsers.add_parser('rank') - - rank_parser.add_argument( - '--entity_input', help='location of entity input') - rank_parser.add_argument( - '--sentiment', help='filter sentiment as "neg" or "pos"') - rank_parser.add_argument( - '--reverse', help='reverse the order of the items', type=bool, - default=False - ) - rank_parser.add_argument( - '--sample', help='number of top items to process', type=int, - default=None - ) - - analyze_parser = subparsers.add_parser('analyze') - - analyze_parser.add_argument( - '--inp', help='location of the input', required=True) - analyze_parser.add_argument( - '--sout', help='location of the sentiment output', required=True) - analyze_parser.add_argument( - '--eout', help='location of the entity output', required=True) - analyze_parser.add_argument( - '--sample', help='number of top items to process', type=int) - analyze_parser.add_argument('--log_file', default='movie.log') - - args = parser.parse_args() - - if args.command == 'analyze': - with open(args.sout, 'w') as sout, open(args.eout, 'w') as eout: - analyze(args.inp, sout, eout, args.sample, args.log_file) - elif args.command == 'rank': - with open(args.entity_input, 'r') as entity_input: - rank_entities( - entity_input, args.sentiment, args.sample, args.reverse) diff --git a/samples/snippets/movie_nl/main_test.py b/samples/snippets/movie_nl/main_test.py deleted file mode 100644 index 7e33cefd..00000000 --- a/samples/snippets/movie_nl/main_test.py +++ /dev/null @@ -1,130 +0,0 @@ -# Copyright 2016 Google, Inc -# -# Licensed under the Apache License, Version 2.0 (the "License"); -# you may not use this file except in compliance with the License. 
-# You may obtain a copy of the License at -# -# http://www.apache.org/licenses/LICENSE-2.0 -# -# Unless required by applicable law or agreed to in writing, software -# distributed under the License is distributed on an "AS IS" BASIS, -# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. -# See the License for the specific language governing permissions and -# limitations under the License. - -import json - -import googleapiclient.discovery -import six - -import main - - -def test_get_request_body(): - text = 'hello world' - body = main.get_request_body(text, syntax=True, entities=True, - sentiment=False) - assert body.get('document').get('content') == text - - assert body.get('features').get('extract_syntax') is True - assert body.get('features').get('extract_entities') is True - assert body.get('features').get('extract_document_sentiment') is False - - -def test_get_sentiment_label(): - assert main.get_sentiment_label(20.50) == 1 - assert main.get_sentiment_label(-42.34) == -1 - - -def test_to_sentiment_json(): - doc_id = '12345' - sentiment = 23.344564 - label = 1 - - sentiment_json = json.loads( - main.to_sentiment_json(doc_id, sentiment, label) - ) - - assert sentiment_json.get('doc_id') == doc_id - assert sentiment_json.get('sentiment') == 23.345 - assert sentiment_json.get('label') == label - - -def test_process_movie_reviews(): - service = googleapiclient.discovery.build('language', 'v1') - - doc1 = main.Document('Top Gun was awesome and Tom Cruise rocked!', 'doc1', - 'doc1') - doc2 = main.Document('Tom Cruise is a great actor.', 'doc2', 'doc2') - - reader = [doc1, doc2] - swriter = six.StringIO() - ewriter = six.StringIO() - - main.process_movie_reviews(service, reader, swriter, ewriter) - - sentiments = swriter.getvalue().strip().split('\n') - entities = ewriter.getvalue().strip().split('\n') - - sentiments = [json.loads(sentiment) for sentiment in sentiments] - entities = [json.loads(entity) for entity in entities] - - # assert sentiments - assert sentiments[0].get('sentiment') > 0 - assert sentiments[0].get('label') == 1 - - assert sentiments[1].get('sentiment') > 0 - assert sentiments[1].get('label') == 1 - - # assert entities - assert len(entities) == 1 - assert entities[0].get('name') == 'Tom Cruise' - assert (entities[0].get('wiki_url') == - 'https://en.wikipedia.org/wiki/Tom_Cruise') - assert entities[0].get('sentiment') > 0 - - -def test_rank_positive_entities(capsys): - reader = [ - ('{"avg_sentiment": -12.0, ' - '"name": "Patrick Macnee", "sentiment": -12.0}'), - ('{"avg_sentiment": 5.0, ' - '"name": "Paul Rudd", "sentiment": 5.0}'), - ('{"avg_sentiment": -5.0, ' - '"name": "Martha Plimpton", "sentiment": -5.0}'), - ('{"avg_sentiment": 7.0, ' - '"name": "Lucy (2014 film)", "sentiment": 7.0}') - ] - - main.rank_entities(reader, 'pos', topn=1, reverse_bool=False) - out, err = capsys.readouterr() - - expected = ('{"avg_sentiment": 5.0, ' - '"name": "Paul Rudd", "sentiment": 5.0}') - - expected = ''.join(sorted(expected)) - out = ''.join(sorted(out.strip())) - assert out == expected - - -def test_rank_negative_entities(capsys): - reader = [ - ('{"avg_sentiment": -12.0, ' - '"name": "Patrick Macnee", "sentiment": -12.0}'), - ('{"avg_sentiment": 5.0, ' - '"name": "Paul Rudd", "sentiment": 5.0}'), - ('{"avg_sentiment": -5.0, ' - '"name": "Martha Plimpton", "sentiment": -5.0}'), - ('{"avg_sentiment": 7.0, ' - '"name": "Lucy (2014 film)", "sentiment": 7.0}') - ] - - main.rank_entities(reader, 'neg', topn=1, reverse_bool=True) - out, err = 
capsys.readouterr() - - expected = ('{"avg_sentiment": -5.0, ' - '"name": "Martha Plimpton", "sentiment": -5.0}') - - expected = ''.join(sorted(expected)) - out = ''.join(sorted(out.strip())) - assert out == expected diff --git a/samples/snippets/movie_nl/requirements.txt b/samples/snippets/movie_nl/requirements.txt deleted file mode 100644 index 9718b185..00000000 --- a/samples/snippets/movie_nl/requirements.txt +++ /dev/null @@ -1,4 +0,0 @@ -google-api-python-client==1.7.4 -google-auth==1.5.1 -google-auth-httplib2==0.0.3 -requests==2.19.1 diff --git a/samples/snippets/ocr_nl/README.md b/samples/snippets/ocr_nl/README.md deleted file mode 100644 index a34ff317..00000000 --- a/samples/snippets/ocr_nl/README.md +++ /dev/null @@ -1,232 +0,0 @@ - - -[![Open in Cloud Shell][shell_img]][shell_link] - -[shell_img]: http://gstatic.com/cloudssh/images/open-btn.png -[shell_link]: https://console.cloud.google.com/cloudshell/open?git_repo=https://github.com/GoogleCloudPlatform/python-docs-samples&page=editor&open_in_editor=language/ocr_nl/README.md -# Using the Cloud Natural Language API to analyze image text found with Cloud Vision - -This example uses the [Cloud Vision API](https://cloud.google.com/vision/) to -detect text in images, then analyzes that text using the [Cloud NL (Natural -Language) API](https://cloud.google.com/natural-language/) to detect -[entities](https://cloud.google.com/natural-language/docs/basics#entity_analysis) -in the text. It stores the detected entity -information in an [sqlite3](https://www.sqlite.org) database, which may then be -queried. - -(This kind of analysis can be useful with scans of brochures and fliers, -invoices, and other types of company documents... or maybe just organizing your -memes). - -After the example script has analyzed a directory of images, it outputs some -information on the images' entities to STDOUT. You can also further query -the generated sqlite3 database. - -## Setup - -### Install sqlite3 as necessary - -The example requires that sqlite3 be installed. Most likely, sqlite3 is already -installed for you on your machine, but if not, you can find it -[here](https://www.sqlite.org/download.html). - -### Set Up to Authenticate With Your Project's Credentials - -* Please follow the [Set Up Your Project](https://cloud.google.com/natural-language/docs/getting-started#set_up_your_project) -steps in the Quickstart doc to create a project and enable the -Cloud Natural Language API. -* Following those steps, make sure that you [Set Up a Service - Account](https://cloud.google.com/natural-language/docs/common/auth#set_up_a_service_account), - and export the following environment variable: - - ``` - export GOOGLE_APPLICATION_CREDENTIALS=/path/to/your-project-credentials.json - ``` -* This sample also requires that you [enable the Cloud Vision - API](https://console.cloud.google.com/apis/api/vision.googleapis.com/overview?project=_) - -## Running the example - -Install [pip](https://pip.pypa.io/en/stable/installing) if not already installed. - -To run the example, install the necessary libraries using pip: - -```sh -$ pip install -r requirements.txt -``` - -You must also be set up to authenticate with the Cloud APIs using your -project's service account credentials, as described above. 
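
For reference, a minimal sketch of how the sample picks up those credentials: both API clients are built with `googleapiclient.discovery` and rely on Application Default Credentials, so no key file is passed explicitly (this assumes `GOOGLE_APPLICATION_CREDENTIALS` is exported as described above).

```python
# Sketch: both clients read GOOGLE_APPLICATION_CREDENTIALS via
# Application Default Credentials; nothing is passed explicitly.
import googleapiclient.discovery

vision_service = googleapiclient.discovery.build('vision', 'v1')      # Cloud Vision API
language_service = googleapiclient.discovery.build('language', 'v1')  # Cloud Natural Language API
```
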
- -Then, run the script on a directory of images to do the analysis, E.g.: - -```sh -$ python main.py --input_directory= -``` - -You can try this on a sample directory of images: - -```sh -$ curl -O http://storage.googleapis.com/python-docs-samples-tests/language/ocr_nl-images.zip -$ unzip ocr_nl-images.zip -$ python main.py --input_directory=images/ -``` - -## A walkthrough of the example and its results - -Let's take a look at what the example generates when run on the `images/` -sample directory, and how it does it. - -The script looks at each image file in the given directory, and uses the Vision -API's text detection capabilities (OCR) to find any text in each image. It -passes that info to the NL API, and asks it to detect [entities](xxx) in the -discovered text, then stores this information in a queryable database. - -To keep things simple, we're just passing to the NL API all the text found in a -given image, in one string. Note that sometimes this string can include -misinterpreted characters (if the image text was not very clear), or list words -"out of order" from how a human would interpret them. So, the text that is -actually passed to the NL API might not be quite what you would have predicted -with your human eyeballs. - -The Entity information returned by the NL API includes *type*, *name*, *salience*, -information about where in the text the given entity was found, and detected -language. It may also include *metadata*, including a link to a Wikipedia URL -that the NL API believes this entity maps to. See the -[documentation](https://cloud.google.com/natural-language/docs/) and the [API -reference pages](https://cloud.google.com/natural-language/reference/rest/v1beta1/Entity) -for more information about `Entity` fields. - -For example, if the NL API was given the sentence: - -``` -"Holmes and Watson walked over to the cafe." -``` - -it would return a response something like the following: - -``` -{ - "entities": [{ - "salience": 0.51629782, - "mentions": [{ - "text": { - "content": "Holmes", - "beginOffset": 0 - }}], - "type": "PERSON", - "name": "Holmes", - "metadata": { - "wikipedia_url": "http://en.wikipedia.org/wiki/Sherlock_Holmes" - }}, - { - "salience": 0.22334209, - "mentions": [{ - "text": { - "content": "Watson", - "beginOffset": 11 - }}], - "type": "PERSON", - "name": "Watson", - "metadata": { - "wikipedia_url": "http://en.wikipedia.org/wiki/Dr._Watson" - }}], - "language": "en" -} -``` - -Note that the NL API determined from context that "Holmes" was referring to -'Sherlock Holmes', even though the name "Sherlock" was not included. - -Note also that not all nouns in a given sentence are detected as Entities. An -Entity represents a phrase in the text that is a known entity, such as a person, -an organization, or location. The generic mention of a 'cafe' is not treated as -an entity in this sense. - -For each image file, we store its detected entity information (if any) in an -sqlite3 database. - -### Querying for information about the detected entities - -Once the detected entity information from all the images is stored in the -sqlite3 database, we can run some queries to do some interesting analysis. The -script runs a couple of such example query sets and outputs the result to STDOUT. - -The first set of queries outputs information about the top 15 most frequent -entity names found in the images, and the second outputs information about the -top 15 most frequent Wikipedia URLs found. 
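
A sketch of the first of those queries, assuming the `entities` table that `main.py` creates (columns `locale`, `type`, `name`, `salience`, `wiki_url`, `filename`) and substituting the database filename the script generated for you:

```python
import sqlite3

# Hypothetical filename; use the entities<timestamp>.db file the script created.
conn = sqlite3.connect('entities1466518508.db')

# Top 15 most frequently detected entity names, as described above.
for name, count in conn.execute(
        'SELECT name, COUNT(name) AS wc FROM entities '
        'GROUP BY name ORDER BY wc DESC LIMIT 15;'):
    print(u'{} was found with count {}'.format(name, count))
```
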
- -For example, with the sample image set, note that the name 'Sherlock Holmes' is -found three times, but entities associated with the URL -http://en.wikipedia.org/wiki/Sherlock_Holmes are found four times; one of the -entity names was only "Holmes", but the NL API detected from context that it -referred to Sherlock Holmes. Similarly, you can see that mentions of 'Hive' and -'Spark' mapped correctly – given their context – to the URLs of those Apache -products. - -``` -----entity: http://en.wikipedia.org/wiki/Apache_Hive was found with count 1 -Found in file images/IMG_20160621_133020.jpg, detected as type OTHER, with - locale en. -names(s): set([u'hive']) -salience measure(s): set([0.0023808887]) -``` - -Similarly, 'Elizabeth' (in screencaps of text from "Pride and Prejudice") is -correctly mapped to http://en.wikipedia.org/wiki/Elizabeth_Bennet because of the -context of the surrounding text. - -``` -----entity: http://en.wikipedia.org/wiki/Elizabeth_Bennet was found with count 2 -Found in file images/Screenshot 2016-06-19 11.51.50.png, detected as type PERSON, with - locale en. -Found in file images/Screenshot 2016-06-19 12.08.30.png, detected as type PERSON, with - locale en. -names(s): set([u'elizabeth']) -salience measure(s): set([0.34601286, 0.0016268975]) -``` - -## Further queries to the sqlite3 database - -When the script runs, it makes a couple of example queries to the database -containing the entity information returned from the NL API. You can make further -queries on that database by starting up sqlite3 from the command line, and -passing it the name of the database file generated by running the example. This -file will be in the same directory, and have `entities` as a prefix, with the -timestamp appended. (If you have run the example more than once, a new database -file will be created each time). - -Run sqlite3 as follows (using the name of your own database file): - -```sh -$ sqlite3 entities1466518508.db -``` - -You'll see something like this: - -``` -SQLite version 3.8.10.2 2015-05-20 18:17:19 -Enter ".help" for usage hints. -sqlite> -``` - -From this prompt, you can make any queries on the data that you want. E.g., -start with something like: - -``` -sqlite> select * from entities limit 20; -``` - -Or, try this to see in which images the most entities were detected: - -``` -sqlite> select filename, count(filename) from entities group by filename; -``` - -You can do more complex queries to get further information about the entities -that have been discovered in your images. E.g., you might want to investigate -which of the entities are most commonly found together in the same image. See -the [SQLite documentation](https://www.sqlite.org/docs.html) for more -information. - - diff --git a/samples/snippets/ocr_nl/main.py b/samples/snippets/ocr_nl/main.py deleted file mode 100755 index db156054..00000000 --- a/samples/snippets/ocr_nl/main.py +++ /dev/null @@ -1,354 +0,0 @@ -#!/usr/bin/env python -# Copyright 2016 Google Inc. All Rights Reserved. -# -# Licensed under the Apache License, Version 2.0 (the "License"); -# you may not use this file except in compliance with the License. -# You may obtain a copy of the License at -# -# http://www.apache.org/licenses/LICENSE-2.0 -# -# Unless required by applicable law or agreed to in writing, software -# distributed under the License is distributed on an "AS IS" BASIS, -# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. 
-# See the License for the specific language governing permissions and -# limitations under the License. - -""" -This example uses the Google Cloud Vision API to detect text in images, then -analyzes that text using the Google Cloud Natural Language API to detect -entities in the text. It stores the detected entity information in an sqlite3 -database, which may then be queried. - -After this script has analyzed a directory of images, it outputs some -information on the images' entities to STDOUT. You can also further query -the generated sqlite3 database; see the README for more information. - -Run the script on a directory of images to do the analysis, E.g.: - $ python main.py --input_directory= - -You can try this on a sample directory of images: - $ curl -O http://storage.googleapis.com/python-docs-samples-tests/language/ocr_nl-images.zip - $ unzip ocr_nl-images.zip - $ python main.py --input_directory=images/ - -""" # noqa - -import argparse -import base64 -import contextlib -import logging -import os -import sqlite3 -import sys -import time - -import googleapiclient.discovery -import googleapiclient.errors - -BATCH_SIZE = 10 - - -class VisionApi(object): - """Construct and use the Cloud Vision API service.""" - - def __init__(self): - self.service = googleapiclient.discovery.build('vision', 'v1') - - def detect_text(self, input_filenames, num_retries=3, max_results=6): - """Uses the Vision API to detect text in the given file.""" - batch_request = [] - for filename in input_filenames: - request = { - 'image': {}, - 'features': [{ - 'type': 'TEXT_DETECTION', - 'maxResults': max_results, - }] - } - - # Accept both files in cloud storage, as well as local files. - if filename.startswith('gs://'): - request['image']['source'] = { - 'gcsImageUri': filename - } - else: - with open(filename, 'rb') as image_file: - request['image']['content'] = base64.b64encode( - image_file.read()).decode('UTF-8') - - batch_request.append(request) - - request = self.service.images().annotate( - body={'requests': batch_request}) - - try: - responses = request.execute(num_retries=num_retries) - if 'responses' not in responses: - return {} - - text_response = {} - for filename, response in zip( - input_filenames, responses['responses']): - - if 'error' in response: - logging.error('API Error for {}: {}'.format( - filename, - response['error'].get('message', ''))) - continue - - text_response[filename] = response.get('textAnnotations', []) - - return text_response - - except googleapiclient.errors.HttpError as e: - logging.error('Http Error for {}: {}'.format(filename, e)) - except KeyError as e2: - logging.error('Key error: {}'.format(e2)) - - -class TextAnalyzer(object): - """Construct and use the Google Natural Language API service.""" - - def __init__(self, db_filename=None): - self.service = googleapiclient.discovery.build('language', 'v1') - - # This list will store the entity information gleaned from the - # image files. - self.entity_info = [] - - # This is the filename of the sqlite3 database to save to - self.db_filename = db_filename or 'entities{}.db'.format( - int(time.time())) - - def _get_native_encoding_type(self): - """Returns the encoding type that matches Python's native strings.""" - if sys.maxunicode == 65535: - return 'UTF16' - else: - return 'UTF32' - - def nl_detect(self, text): - """Use the Natural Language API to analyze the given text string.""" - # We're only requesting 'entity' information from the Natural Language - # API at this time. 
- body = { - 'document': { - 'type': 'PLAIN_TEXT', - 'content': text, - }, - 'encodingType': self._get_native_encoding_type(), - } - entities = [] - try: - request = self.service.documents().analyzeEntities(body=body) - response = request.execute() - entities = response['entities'] - except googleapiclient.errors.HttpError as e: - logging.error('Http Error: %s' % e) - except KeyError as e2: - logging.error('Key error: %s' % e2) - return entities - - def add_entities(self, filename, locale, document): - """Apply the Natural Language API to the document, and collect the - detected entities.""" - - # Apply the Natural Language API to the document. - entities = self.nl_detect(document) - self.extract_and_save_entity_info(entities, locale, filename) - - def extract_entity_info(self, entity): - """Extract information about an entity.""" - type = entity['type'] - name = entity['name'].lower() - metadata = entity['metadata'] - salience = entity['salience'] - wiki_url = metadata.get('wikipedia_url', None) - return (type, name, salience, wiki_url) - - def extract_and_save_entity_info(self, entities, locale, filename): - for entity in entities: - type, name, salience, wiki_url = self.extract_entity_info(entity) - # Because this is a small example, we're using a list to hold - # all the entity information, then we'll insert it into the - # database all at once when we've processed all the files. - # For a larger data set, you would want to write to the database - # in batches. - self.entity_info.append( - (locale, type, name, salience, wiki_url, filename)) - - def write_entity_info_to_db(self): - """Store the info gleaned about the entities in the text, via the - Natural Language API, in an sqlite3 database table, and then print out - some simple analytics. - """ - logging.info('Saving entity info to the sqlite3 database.') - # Create the db. - with contextlib.closing(sqlite3.connect(self.db_filename)) as conn: - with conn as cursor: - # Create table - cursor.execute( - 'CREATE TABLE if not exists entities (locale text, ' - 'type text, name text, salience real, wiki_url text, ' - 'filename text)') - with conn as cursor: - # Load all the data - cursor.executemany( - 'INSERT INTO entities VALUES (?,?,?,?,?,?)', - self.entity_info) - - def output_entity_data(self): - """Output some info about the entities by querying the generated - sqlite3 database. - """ - - with contextlib.closing(sqlite3.connect(self.db_filename)) as conn: - - # This query finds the number of times each entity name was - # detected, in descending order by count, and returns information - # about the first 15 names, including the files in which they were - # found, their detected 'salience' and language (locale), and the - # wikipedia urls (if any) associated with them. 
- print('\n==============\nTop 15 most frequent entity names:') - - cursor = conn.cursor() - results = cursor.execute( - 'select name, count(name) as wc from entities ' - 'group by name order by wc desc limit 15;') - - for item in results: - cursor2 = conn.cursor() - print(u'\n----Name: {} was found with count {}'.format(*item)) - results2 = cursor2.execute( - 'SELECT name, type, filename, locale, wiki_url, salience ' - 'FROM entities WHERE name=?', (item[0],)) - urls = set() - for elt in results2: - print(('Found in file {}, detected as type {}, with\n' - ' locale {} and salience {}.').format( - elt[2], elt[1], elt[3], elt[5])) - if elt[4]: - urls.add(elt[4]) - if urls: - print('url(s): {}'.format(urls)) - - # This query finds the number of times each wikipedia url was - # detected, in descending order by count, and returns information - # about the first 15 urls, including the files in which they were - # found and the names and 'salience' with which they were - # associated. - print('\n==============\nTop 15 most frequent Wikipedia URLs:') - c = conn.cursor() - results = c.execute( - 'select wiki_url, count(wiki_url) as wc from entities ' - 'group by wiki_url order by wc desc limit 15;') - - for item in results: - cursor2 = conn.cursor() - print('\n----entity: {} was found with count {}'.format(*item)) - results2 = cursor2.execute( - 'SELECT name, type, filename, locale, salience ' - 'FROM entities WHERE wiki_url=?', (item[0],)) - names = set() - salience = set() - for elt in results2: - print(('Found in file {}, detected as type {}, with\n' - ' locale {}.').format(elt[2], elt[1], elt[3])) - names.add(elt[0]) - salience.add(elt[4]) - print('names(s): {}'.format(names)) - print('salience measure(s): {}'.format(salience)) - - -def extract_description(texts): - """Returns text annotations as a single string""" - document = [] - - for text in texts: - try: - document.append(text['description']) - locale = text['locale'] - # Process only the first entry, which contains all - # text detected. - break - except KeyError as e: - logging.error('KeyError: %s\n%s' % (e, text)) - return (locale, ' '.join(document)) - - -def extract_descriptions(input_filename, texts, text_analyzer): - """Gets the text that was detected in the image.""" - if texts: - locale, document = extract_description(texts) - text_analyzer.add_entities(input_filename, locale, document) - sys.stdout.write('.') # Output a progress indicator. - sys.stdout.flush() - elif texts == []: - print('%s had no discernible text.' % input_filename) - - -def get_text_from_files(vision, input_filenames, text_analyzer): - """Call the Vision API on a file and index the results.""" - texts = vision.detect_text(input_filenames) - if texts: - for filename, text in texts.items(): - extract_descriptions(filename, text, text_analyzer) - - -def batch(list_to_batch, batch_size=BATCH_SIZE): - """Group a list into batches of size batch_size. - - >>> tuple(batch([1, 2, 3, 4, 5], batch_size=2)) - ((1, 2), (3, 4), (5)) - """ - for i in range(0, len(list_to_batch), batch_size): - yield tuple(list_to_batch[i:i + batch_size]) - - -def main(input_dir, db_filename=None): - """Walk through all the image files in the given directory, extracting any - text from them and feeding that text to the Natural Language API for - analysis. 
- """ - # Create a client object for the Vision API - vision_api_client = VisionApi() - # Create an object to analyze our text using the Natural Language API - text_analyzer = TextAnalyzer(db_filename) - - if input_dir: - allfileslist = [] - # Recursively construct a list of all the files in the given input - # directory. - for folder, subs, files in os.walk(input_dir): - for filename in files: - allfileslist.append(os.path.join(folder, filename)) - - # Analyze the text in the files using the Vision and Natural Language - # APIs. - for filenames in batch(allfileslist, batch_size=1): - get_text_from_files(vision_api_client, filenames, text_analyzer) - - # Save the result to a database, then run some queries on the database, - # with output to STDOUT. - text_analyzer.write_entity_info_to_db() - - # now, print some information about the entities detected. - text_analyzer.output_entity_data() - - -if __name__ == '__main__': - parser = argparse.ArgumentParser( - description='Detects text in the images in the given directory.') - parser.add_argument( - '--input_directory', - help='The image directory you\'d like to detect text in. If left ' - 'unspecified, the --db specified will be queried without being ' - 'updated.') - parser.add_argument( - '--db', help='The filename to use for the sqlite3 database.') - args = parser.parse_args() - - if not (args.input_directory or args.db): - parser.error('Either --input_directory or --db must be specified.') - - main(args.input_directory, args.db) diff --git a/samples/snippets/ocr_nl/main_test.py b/samples/snippets/ocr_nl/main_test.py deleted file mode 100755 index 5a8f72f2..00000000 --- a/samples/snippets/ocr_nl/main_test.py +++ /dev/null @@ -1,100 +0,0 @@ -#!/usr/bin/env python -# Copyright 2016 Google Inc. All Rights Reserved. -# -# Licensed under the Apache License, Version 2.0 (the "License"); -# you may not use this file except in compliance with the License. -# You may obtain a copy of the License at -# -# http://www.apache.org/licenses/LICENSE-2.0 -# -# Unless required by applicable law or agreed to in writing, software -# distributed under the License is distributed on an "AS IS" BASIS, -# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. -# See the License for the specific language governing permissions and -# limitations under the License. - -import os -import re -import zipfile - -import requests - -import main - -BUCKET = os.environ['CLOUD_STORAGE_BUCKET'] -TEST_IMAGE_URI = 'gs://{}/language/image8.png'.format(BUCKET) -OCR_IMAGES_URI = 'http://storage.googleapis.com/{}/{}'.format( - BUCKET, 'language/ocr_nl-images-small.zip') - - -def test_batch_empty(): - for batch_size in range(1, 10): - assert len( - list(main.batch([], batch_size=batch_size))) == 0 - - -def test_batch_single(): - for batch_size in range(1, 10): - batched = tuple(main.batch([1], batch_size=batch_size)) - assert batched == ((1,),) - - -def test_single_image_returns_text(): - vision_api_client = main.VisionApi() - - image_path = TEST_IMAGE_URI - texts = vision_api_client.detect_text([image_path]) - - assert image_path in texts - _, document = main.extract_description(texts[image_path]) - assert "daughter" in document - assert "Bennet" in document - assert "hat" in document - - -def test_single_nonimage_returns_error(): - vision_api_client = main.VisionApi() - texts = vision_api_client.detect_text(['README.md']) - assert "README.md" not in texts - - -def test_text_returns_entities(): - text = "Holmes and Watson walked to the cafe." 
- text_analyzer = main.TextAnalyzer() - entities = text_analyzer.nl_detect(text) - assert entities - etype, ename, salience, wurl = text_analyzer.extract_entity_info( - entities[0]) - assert ename == 'holmes' - - -def test_entities_list(): - vision_api_client = main.VisionApi() - image_path = TEST_IMAGE_URI - texts = vision_api_client.detect_text([image_path]) - locale, document = main.extract_description(texts[image_path]) - text_analyzer = main.TextAnalyzer() - entities = text_analyzer.nl_detect(document) - assert entities - etype, ename, salience, wurl = text_analyzer.extract_entity_info( - entities[0]) - assert ename == 'bennet' - - -def test_main(tmpdir, capsys): - images_path = str(tmpdir.mkdir('images')) - - # First, pull down some test data - response = requests.get(OCR_IMAGES_URI) - images_file = tmpdir.join('images.zip') - images_file.write_binary(response.content) - - # Extract it to the image directory - with zipfile.ZipFile(str(images_file)) as zfile: - zfile.extractall(images_path) - - main.main(images_path, str(tmpdir.join('ocr_nl.db'))) - - stdout, _ = capsys.readouterr() - - assert re.search(r'.* found with count', stdout) diff --git a/samples/snippets/ocr_nl/requirements.txt b/samples/snippets/ocr_nl/requirements.txt deleted file mode 100644 index 5e902918..00000000 --- a/samples/snippets/ocr_nl/requirements.txt +++ /dev/null @@ -1,3 +0,0 @@ -google-api-python-client==1.7.4 -google-auth==1.5.1 -google-auth-httplib2==0.0.3 diff --git a/samples/snippets/syntax_triples/README.md b/samples/snippets/syntax_triples/README.md deleted file mode 100644 index 551057e7..00000000 --- a/samples/snippets/syntax_triples/README.md +++ /dev/null @@ -1,96 +0,0 @@ -# Using the Cloud Natural Language API to find subject-verb-object triples in text - -[![Open in Cloud Shell][shell_img]][shell_link] - -[shell_img]: http://gstatic.com/cloudssh/images/open-btn.png -[shell_link]: https://console.cloud.google.com/cloudshell/open?git_repo=https://github.com/GoogleCloudPlatform/python-docs-samples&page=editor&open_in_editor=language/syntax_triples/README.md - -This example finds subject-verb-object triples in a given piece of text using -syntax analysis capabilities of -[Cloud Natural Language API](https://cloud.google.com/natural-language/). -To do this, it calls the extractSyntax feature of the API -and uses the dependency parse tree and part-of-speech tags in the resposne -to build the subject-verb-object triples. The results are printed to STDOUT. -This type of analysis can be considered as the -first step towards an information extraction task. - -## Set Up to Authenticate With Your Project's Credentials - -Please follow the [Set Up Your Project](https://cloud.google.com/natural-language/docs/getting-started#set_up_your_project) -steps in the Quickstart doc to create a project and enable the -Cloud Natural Language API. Following those steps, make sure that you -[Set Up a Service Account](https://cloud.google.com/natural-language/docs/common/auth#set_up_a_service_account), -and export the following environment variable: - -``` -export GOOGLE_APPLICATION_CREDENTIALS=/path/to/your-project-credentials.json -``` - -## Running the example - -Install [pip](https://pip.pypa.io/en/stable/installing) if not already installed. - -To run the example, install the necessary libraries using pip: - -``` -$ pip install -r requirements.txt -``` -You must also be set up to authenticate with the Cloud APIs using your -project's service account credentials, as described above. 
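
To see the raw syntax annotations the script works from before running it end to end, the core API call can be sketched as below. This is a minimal sketch mirroring the request built in `main.py` (same `googleapiclient` discovery interface and `v1beta1` endpoint); the hard-coded `UTF32` encoding type is an assumption that you are on a wide-Unicode Python build, which `main.py` instead detects at runtime.

```python
import googleapiclient.discovery


def annotate_syntax(text):
    """Request a dependency parse for the given text from the NL API."""
    service = googleapiclient.discovery.build('language', 'v1beta1')
    body = {
        'document': {'type': 'PLAIN_TEXT', 'content': text},
        'features': {'extract_syntax': True},
        'encodingType': 'UTF32',  # assumption: wide-Unicode Python build
    }
    return service.documents().annotateText(body=body).execute()


# Each token carries a part-of-speech tag and a dependency edge pointing at
# its head token; these are what the triple search below relies on.
for token in annotate_syntax('He began his presidential campaign')['tokens']:
    print(token['text']['content'],
          token['partOfSpeech']['tag'],
          token['dependencyEdge']['label'])
```
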
- -Then, run the script on a file containing the text that you wish to analyze. -The text must be encoded in UTF8 or ASCII: - -``` -$ python main.py -``` - -Try this on a sample text in the resources directory: - -``` -$ python main.py resources/obama_wikipedia.txt -``` - -## A walkthrough of the example and its results - -Let's take a look at what the example generates when run on the -`obama_wikipedia.txt` sample file, and how it does it. - -The goal is to find all subject-verb-object -triples in the text. The example first sends the text to the Cloud Natural -Language API to perform extractSyntax analysis. Then, using part-of-speech tags, - it finds all the verbs in the text. For each verb, it uses the dependency -parse tree information to find all the dependent tokens. - -For example, given the following sentence in the `obama_wikipedia.txt` file: - -``` -"He began his presidential campaign in 2007" -``` -The example finds the verb `began`, and `He`, `campaign`, and `in` as its -dependencies. Then the script enumerates the dependencies for each verb and -finds all the subjects and objects. For the sentence above, the found subject -and object are `He` and `campaign`. - -The next step is to complete each subject and object token by adding their -dependencies to them. For example, in the sentence above, `his` and -`presidential` are dependent tokens for `campaign`. This is done using the -dependency parse tree, similar to verb dependencies as explained above. The -final result is (`He`, `began`, `his presidential campaign`) triple for -the example sentence above. - -The script performs this analysis for the entire text and prints the result. -For the `obama_wikipedia.txt` file, the result is the following: - -```sh -+------------------------------+------------+------------------------------+ -| Obama | received | national attention | -+------------------------------+------------+------------------------------+ -| He | began | his presidential campaign | -+------------------------------+------------+------------------------------+ -| he | won | sufficient delegates in the | -| | | Democratic Party primaries | -+------------------------------+------------+------------------------------+ -| He | defeated | Republican nominee John | -| | | McCain | -``` diff --git a/samples/snippets/syntax_triples/main.py b/samples/snippets/syntax_triples/main.py deleted file mode 100644 index bbe23866..00000000 --- a/samples/snippets/syntax_triples/main.py +++ /dev/null @@ -1,172 +0,0 @@ -#!/usr/bin/env python -# Copyright 2016 Google Inc. All Rights Reserved. -# -# Licensed under the Apache License, Version 2.0 (the "License"); -# you may not use this file except in compliance with the License. -# You may obtain a copy of the License at -# -# http://www.apache.org/licenses/LICENSE-2.0 -# -# Unless required by applicable law or agreed to in writing, software -# distributed under the License is distributed on an "AS IS" BASIS, -# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. -# See the License for the specific language governing permissions and -# limitations under the License. - -""" -This example finds subject-verb-object triples in a given piece of text using -the syntax analysis capabilities of Cloud Natural Language API. The triples are -printed to STDOUT. This can be considered as the first step towards an -information extraction task. - -Run the script on a file containing the text that you wish to analyze. 
-The text must be encoded in UTF8 or ASCII: - $ python main.py - -Try this on a sample text in the resources directory: - $ python main.py resources/obama_wikipedia.txt -""" - -import argparse -import sys -import textwrap - -import googleapiclient.discovery - - -def dependents(tokens, head_index): - """Returns an ordered list of the token indices of the dependents for - the given head.""" - # Create head->dependency index. - head_to_deps = {} - for i, token in enumerate(tokens): - head = token['dependencyEdge']['headTokenIndex'] - if i != head: - head_to_deps.setdefault(head, []).append(i) - return head_to_deps.get(head_index, ()) - - -def phrase_text_for_head(tokens, text, head_index): - """Returns the entire phrase containing the head token - and its dependents. - """ - begin, end = phrase_extent_for_head(tokens, head_index) - return text[begin:end] - - -def phrase_extent_for_head(tokens, head_index): - """Returns the begin and end offsets for the entire phrase - containing the head token and its dependents. - """ - begin = tokens[head_index]['text']['beginOffset'] - end = begin + len(tokens[head_index]['text']['content']) - for child in dependents(tokens, head_index): - child_begin, child_end = phrase_extent_for_head(tokens, child) - begin = min(begin, child_begin) - end = max(end, child_end) - return (begin, end) - - -def analyze_syntax(text): - """Use the NL API to analyze the given text string, and returns the - response from the API. Requests an encodingType that matches - the encoding used natively by Python. Raises an - errors.HTTPError if there is a connection problem. - """ - service = googleapiclient.discovery.build('language', 'v1beta1') - body = { - 'document': { - 'type': 'PLAIN_TEXT', - 'content': text, - }, - 'features': { - 'extract_syntax': True, - }, - 'encodingType': get_native_encoding_type(), - } - request = service.documents().annotateText(body=body) - return request.execute() - - -def get_native_encoding_type(): - """Returns the encoding type that matches Python's native strings.""" - if sys.maxunicode == 65535: - return 'UTF16' - else: - return 'UTF32' - - -def find_triples(tokens, - left_dependency_label='NSUBJ', - head_part_of_speech='VERB', - right_dependency_label='DOBJ'): - """Generator function that searches the given tokens - with the given part of speech tag, that have dependencies - with the given labels. For each such head found, yields a tuple - (left_dependent, head, right_dependent), where each element of the - tuple is an index into the tokens array. - """ - for head, token in enumerate(tokens): - if token['partOfSpeech']['tag'] == head_part_of_speech: - children = dependents(tokens, head) - left_deps = [] - right_deps = [] - for child in children: - child_token = tokens[child] - child_dep_label = child_token['dependencyEdge']['label'] - if child_dep_label == left_dependency_label: - left_deps.append(child) - elif child_dep_label == right_dependency_label: - right_deps.append(child) - for left_dep in left_deps: - for right_dep in right_deps: - yield (left_dep, head, right_dep) - - -def show_triple(tokens, text, triple): - """Prints the given triple (left, head, right). For left and right, - the entire phrase headed by each token is shown. For head, only - the head token itself is shown. - - """ - nsubj, verb, dobj = triple - - # Extract the text for each element of the triple. 
- nsubj_text = phrase_text_for_head(tokens, text, nsubj) - verb_text = tokens[verb]['text']['content'] - dobj_text = phrase_text_for_head(tokens, text, dobj) - - # Pretty-print the triple. - left = textwrap.wrap(nsubj_text, width=28) - mid = textwrap.wrap(verb_text, width=10) - right = textwrap.wrap(dobj_text, width=28) - print('+' + 30 * '-' + '+' + 12 * '-' + '+' + 30 * '-' + '+') - for l, m, r in zip(left, mid, right): - print('| {:<28s} | {:<10s} | {:<28s} |'.format( - l or '', m or '', r or '')) - - -def main(text_file): - # Extracts subject-verb-object triples from the given text file, - # and print each one. - - # Read the input file. - text = open(text_file, 'rb').read().decode('utf8') - - analysis = analyze_syntax(text) - tokens = analysis.get('tokens', []) - - for triple in find_triples(tokens): - show_triple(tokens, text, triple) - - -if __name__ == '__main__': - parser = argparse.ArgumentParser( - description=__doc__, - formatter_class=argparse.RawDescriptionHelpFormatter) - parser.add_argument( - 'text_file', - help='A file containing the document to process. ' - 'Should be encoded in UTF8 or ASCII') - args = parser.parse_args() - main(args.text_file) diff --git a/samples/snippets/syntax_triples/main_test.py b/samples/snippets/syntax_triples/main_test.py deleted file mode 100755 index 6aa87818..00000000 --- a/samples/snippets/syntax_triples/main_test.py +++ /dev/null @@ -1,53 +0,0 @@ -# Copyright 2016 Google Inc. All Rights Reserved. -# -# Licensed under the Apache License, Version 2.0 (the "License"); -# you may not use this file except in compliance with the License. -# You may obtain a copy of the License at -# -# http://www.apache.org/licenses/LICENSE-2.0 -# -# Unless required by applicable law or agreed to in writing, software -# distributed under the License is distributed on an "AS IS" BASIS, -# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. -# See the License for the specific language governing permissions and -# limitations under the License. 
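
The triple search itself can be exercised offline with hand-built token dictionaries shaped like the `annotateText` response. This is a minimal sketch, not part of the original sample: the token values are illustrative, and only `find_triples` and `show_triple` from `main.py` are used, so no API call is made.

```python
import main

TEXT = 'Obama defeated McCain'

# Hand-written tokens mimicking the API response; 'beginOffset' values
# index into TEXT, and 'headTokenIndex' points at each token's head.
TOKENS = [
    {'text': {'content': 'Obama', 'beginOffset': 0},
     'partOfSpeech': {'tag': 'NOUN'},
     'dependencyEdge': {'headTokenIndex': 1, 'label': 'NSUBJ'}},
    {'text': {'content': 'defeated', 'beginOffset': 6},
     'partOfSpeech': {'tag': 'VERB'},
     'dependencyEdge': {'headTokenIndex': 1, 'label': 'ROOT'}},
    {'text': {'content': 'McCain', 'beginOffset': 15},
     'partOfSpeech': {'tag': 'NOUN'},
     'dependencyEdge': {'headTokenIndex': 1, 'label': 'DOBJ'}},
]

# find_triples() yields (subject, verb, object) token indices -- here (0, 1, 2)
# -- and show_triple() prints the row | Obama | defeated | McCain |.
for triple in main.find_triples(TOKENS):
    main.show_triple(TOKENS, TEXT, triple)
```
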
- -import os -import re - -import main - -RESOURCES = os.path.join(os.path.dirname(__file__), 'resources') - - -def test_dependents(): - text = "I am eating a delicious banana" - analysis = main.analyze_syntax(text) - tokens = analysis.get('tokens', []) - assert [0, 1, 5] == main.dependents(tokens, 2) - assert [3, 4] == main.dependents(tokens, 5) - - -def test_phrase_text_for_head(): - text = "A small collection of words" - analysis = main.analyze_syntax(text) - tokens = analysis.get('tokens', []) - assert "words" == main.phrase_text_for_head(tokens, text, 4) - - -def test_find_triples(): - text = "President Obama won the noble prize" - analysis = main.analyze_syntax(text) - tokens = analysis.get('tokens', []) - triples = main.find_triples(tokens) - for triple in triples: - assert (1, 2, 5) == triple - - -def test_obama_example(capsys): - main.main(os.path.join(RESOURCES, 'obama_wikipedia.txt')) - stdout, _ = capsys.readouterr() - lines = stdout.split('\n') - assert re.match( - r'.*Obama\b.*\| received\b.*\| national attention\b', - lines[1]) diff --git a/samples/snippets/syntax_triples/requirements.txt b/samples/snippets/syntax_triples/requirements.txt deleted file mode 100644 index 5e902918..00000000 --- a/samples/snippets/syntax_triples/requirements.txt +++ /dev/null @@ -1,3 +0,0 @@ -google-api-python-client==1.7.4 -google-auth==1.5.1 -google-auth-httplib2==0.0.3 diff --git a/samples/snippets/syntax_triples/resources/obama_wikipedia.txt b/samples/snippets/syntax_triples/resources/obama_wikipedia.txt deleted file mode 100644 index 1e89d4ab..00000000 --- a/samples/snippets/syntax_triples/resources/obama_wikipedia.txt +++ /dev/null @@ -1 +0,0 @@ -In 2004, Obama received national attention during his campaign to represent Illinois in the United States Senate with his victory in the March Democratic Party primary, his keynote address at the Democratic National Convention in July, and his election to the Senate in November. He began his presidential campaign in 2007 and, after a close primary campaign against Hillary Clinton in 2008, he won sufficient delegates in the Democratic Party primaries to receive the presidential nomination. He then defeated Republican nominee John McCain in the general election, and was inaugurated as president on January 20, 2009. Nine months after his inauguration, Obama was named the 2009 Nobel Peace Prize laureate. diff --git a/samples/snippets/tutorial/README.rst b/samples/snippets/tutorial/README.rst deleted file mode 100644 index 3f83c1a2..00000000 --- a/samples/snippets/tutorial/README.rst +++ /dev/null @@ -1,93 +0,0 @@ -.. This file is automatically generated. Do not edit this file directly. - -Google Cloud Natural Language Tutorial Python Samples -=============================================================================== - -.. image:: https://gstatic.com/cloudssh/images/open-btn.png - :target: https://console.cloud.google.com/cloudshell/open?git_repo=https://github.com/GoogleCloudPlatform/python-docs-samples&page=editor&open_in_editor=language/tutorial/README.rst - - -This directory contains samples for Google Cloud Natural Language Tutorial. The `Google Cloud Natural Language API`_ provides natural language understanding technologies to developers, including sentiment analysis, entity recognition, and syntax analysis. This API is part of the larger Cloud Machine Learning API. - - - - -.. 
_Google Cloud Natural Language Tutorial: https://cloud.google.com/natural-language/docs/ - -Setup -------------------------------------------------------------------------------- - - -Authentication -++++++++++++++ - -This sample requires you to have authentication setup. Refer to the -`Authentication Getting Started Guide`_ for instructions on setting up -credentials for applications. - -.. _Authentication Getting Started Guide: - https://cloud.google.com/docs/authentication/getting-started - -Install Dependencies -++++++++++++++++++++ - -#. Clone python-docs-samples and change directory to the sample directory you want to use. - - .. code-block:: bash - - $ git clone https://github.com/GoogleCloudPlatform/python-docs-samples.git - -#. Install `pip`_ and `virtualenv`_ if you do not already have them. You may want to refer to the `Python Development Environment Setup Guide`_ for Google Cloud Platform for instructions. - - .. _Python Development Environment Setup Guide: - https://cloud.google.com/python/setup - -#. Create a virtualenv. Samples are compatible with Python 2.7 and 3.4+. - - .. code-block:: bash - - $ virtualenv env - $ source env/bin/activate - -#. Install the dependencies needed to run the samples. - - .. code-block:: bash - - $ pip install -r requirements.txt - -.. _pip: https://pip.pypa.io/ -.. _virtualenv: https://virtualenv.pypa.io/ - -Samples -------------------------------------------------------------------------------- - -Language tutorial -+++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++ - -.. image:: https://gstatic.com/cloudssh/images/open-btn.png - :target: https://console.cloud.google.com/cloudshell/open?git_repo=https://github.com/GoogleCloudPlatform/python-docs-samples&page=editor&open_in_editor=language/tutorial/tutorial.py,language/tutorial/README.rst - - - - -To run this sample: - -.. code-block:: bash - - $ python tutorial.py - - usage: tutorial.py [-h] movie_review_filename - - positional arguments: - movie_review_filename - The filename of the movie review you'd like to - analyze. - - optional arguments: - -h, --help show this help message and exit - - - - - -.. _Google Cloud SDK: https://cloud.google.com/sdk/ \ No newline at end of file diff --git a/samples/snippets/tutorial/README.rst.in b/samples/snippets/tutorial/README.rst.in deleted file mode 100644 index 945c701e..00000000 --- a/samples/snippets/tutorial/README.rst.in +++ /dev/null @@ -1,22 +0,0 @@ -# This file is used to generate README.rst - -product: - name: Google Cloud Natural Language Tutorial - short_name: Cloud Natural Language Tutorial - url: https://cloud.google.com/natural-language/docs/ - description: > - The `Google Cloud Natural Language API`_ provides natural language - understanding technologies to developers, including sentiment analysis, - entity recognition, and syntax analysis. This API is part of the larger - Cloud Machine Learning API. 
- -setup: -- auth -- install_deps - -samples: -- name: Language tutorial - file: tutorial.py - show_help: true - -folder: language/tutorial \ No newline at end of file diff --git a/samples/snippets/tutorial/requirements.txt b/samples/snippets/tutorial/requirements.txt deleted file mode 100644 index 5e902918..00000000 --- a/samples/snippets/tutorial/requirements.txt +++ /dev/null @@ -1,3 +0,0 @@ -google-api-python-client==1.7.4 -google-auth==1.5.1 -google-auth-httplib2==0.0.3 diff --git a/samples/snippets/tutorial/reviews/bladerunner-mixed.txt b/samples/snippets/tutorial/reviews/bladerunner-mixed.txt deleted file mode 100644 index 3b520b65..00000000 --- a/samples/snippets/tutorial/reviews/bladerunner-mixed.txt +++ /dev/null @@ -1,19 +0,0 @@ -I really wanted to love 'Bladerunner' but ultimately I couldn't get -myself to appreciate it fully. However, you may like it if you're into -science fiction, especially if you're interested in the philosophical -exploration of what it means to be human or machine. Some of the gizmos -like the flying cars and the Vouight-Kampff machine (which seemed very -steampunk), were quite cool. - -I did find the plot pretty slow and but the dialogue and action sequences -were good. Unlike most science fiction films, this one was mostly quiet, and -not all that much happened, except during the last 15 minutes. I didn't -understand why a unicorn was in the movie. The visual effects were fantastic, -however, and the musical score and overall mood was quite interesting. -A futurist Los Angeles that was both highly polished and also falling apart -reminded me of 'Outland.' Certainly, the style of the film made up for -many of its pedantic plot holes. - -If you want your sci-fi to be lasers and spaceships, 'Bladerunner' may -disappoint you. But if you want it to make you think, this movie may -be worth the money. \ No newline at end of file diff --git a/samples/snippets/tutorial/reviews/bladerunner-neg.txt b/samples/snippets/tutorial/reviews/bladerunner-neg.txt deleted file mode 100644 index dbef7627..00000000 --- a/samples/snippets/tutorial/reviews/bladerunner-neg.txt +++ /dev/null @@ -1,3 +0,0 @@ -What was Hollywood thinking with this movie! I hated, -hated, hated it. BORING! I went afterwards and demanded my money back. -They refused. \ No newline at end of file diff --git a/samples/snippets/tutorial/reviews/bladerunner-neutral.txt b/samples/snippets/tutorial/reviews/bladerunner-neutral.txt deleted file mode 100644 index 60556e60..00000000 --- a/samples/snippets/tutorial/reviews/bladerunner-neutral.txt +++ /dev/null @@ -1,2 +0,0 @@ -I neither liked nor disliked this movie. Parts were interesting, but -overall I was left wanting more. The acting was pretty good. \ No newline at end of file diff --git a/samples/snippets/tutorial/reviews/bladerunner-pos.txt b/samples/snippets/tutorial/reviews/bladerunner-pos.txt deleted file mode 100644 index a7faf815..00000000 --- a/samples/snippets/tutorial/reviews/bladerunner-pos.txt +++ /dev/null @@ -1,10 +0,0 @@ -`Bladerunner` is often touted as one of the best science fiction films ever -made. Indeed, it satisfies many of the requisites for good sci-fi: a future -world with flying cars and humanoid robots attempting to rebel against their -creators. But more than anything, `Bladerunner` is a fantastic exploration -of the nature of what it means to be human. If we create robots which can -think, will they become human? And if they do, what makes us unique? Indeed, -how can we be sure we're not human in any case? 
`Bladerunner` explored -these issues before such movies as `The Matrix,' and did so intelligently. -The visual effects and score by Vangelis set the mood. See this movie -in a dark theatre to appreciate it fully. Highly recommended! \ No newline at end of file diff --git a/samples/snippets/tutorial/tutorial.py b/samples/snippets/tutorial/tutorial.py deleted file mode 100644 index 5d14b223..00000000 --- a/samples/snippets/tutorial/tutorial.py +++ /dev/null @@ -1,69 +0,0 @@ -#!/usr/bin/env python - -# Copyright 2016 Google, Inc -# -# Licensed under the Apache License, Version 2.0 (the "License"); -# you may not use this file except in compliance with the License. -# You may obtain a copy of the License at -# -# http://www.apache.org/licenses/LICENSE-2.0 -# -# Unless required by applicable law or agreed to in writing, software -# distributed under the License is distributed on an "AS IS" BASIS, -# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. -# See the License for the specific language governing permissions and -# limitations under the License. -# [START full_tutorial_script] -# [START import_libraries] -import argparse -import io - -import googleapiclient.discovery -# [END import_libraries] - - -def print_sentiment(filename): - """Prints sentiment analysis on a given file contents.""" - # [START authenticating_to_the_api] - service = googleapiclient.discovery.build('language', 'v1') - # [END authenticating_to_the_api] - - # [START constructing_the_request] - with io.open(filename, 'r') as review_file: - review_file_contents = review_file.read() - - service_request = service.documents().analyzeSentiment( - body={ - 'document': { - 'type': 'PLAIN_TEXT', - 'content': review_file_contents, - } - } - ) - response = service_request.execute() - # [END constructing_the_request] - - # [START parsing_the_response] - score = response['documentSentiment']['score'] - magnitude = response['documentSentiment']['magnitude'] - - for n, sentence in enumerate(response['sentences']): - sentence_sentiment = sentence['sentiment']['score'] - print('Sentence {} has a sentiment score of {}'.format(n, - sentence_sentiment)) - - print('Overall Sentiment: score of {} with magnitude of {}'.format( - score, magnitude)) - # [END parsing_the_response] - - -# [START running_your_application] -if __name__ == '__main__': - parser = argparse.ArgumentParser() - parser.add_argument( - 'movie_review_filename', - help='The filename of the movie review you\'d like to analyze.') - args = parser.parse_args() - print_sentiment(args.movie_review_filename) -# [END running_your_application] -# [END full_tutorial_script] diff --git a/samples/snippets/tutorial/tutorial_test.py b/samples/snippets/tutorial/tutorial_test.py deleted file mode 100644 index 065076fb..00000000 --- a/samples/snippets/tutorial/tutorial_test.py +++ /dev/null @@ -1,51 +0,0 @@ -# Copyright 2016, Google, Inc. -# Licensed under the Apache License, Version 2.0 (the "License"); -# you may not use this file except in compliance with the License. -# You may obtain a copy of the License at -# -# http://www.apache.org/licenses/LICENSE-2.0 -# -# Unless required by applicable law or agreed to in writing, software -# distributed under the License is distributed on an "AS IS" BASIS, -# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. -# See the License for the specific language governing permissions and -# limitations under the License. 
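
For reference, the fields that `print_sentiment()` reads arrive from `analyzeSentiment` in a structure along these lines. This is a hand-written, abbreviated illustration: the key names follow the v1 REST response, but the numeric values are made up rather than real API output.

```python
# Abbreviated analyzeSentiment response (made-up values) showing the fields
# tutorial.py's print_sentiment() parses.
response = {
    'documentSentiment': {'score': 0.4, 'magnitude': 3.2},
    'sentences': [
        {'text': {'content': 'Highly recommended!'},
         'sentiment': {'score': 0.9, 'magnitude': 0.9}},
        # ... one entry per sentence in the review ...
    ],
}

score = response['documentSentiment']['score']          # overall score
magnitude = response['documentSentiment']['magnitude']  # overall strength
for n, sentence in enumerate(response['sentences']):
    print('Sentence {} has a sentiment score of {}'.format(
        n, sentence['sentiment']['score']))
```
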
-import re - -import tutorial - - -def test_neutral(capsys): - tutorial.print_sentiment('reviews/bladerunner-neutral.txt') - out, _ = capsys.readouterr() - assert re.search(r'Sentence \d has a sentiment score of \d', out, re.I) - assert re.search( - r'Overall Sentiment: score of -?[0-2]\.?[0-9]? with ' - r'magnitude of [0-1]\.?[0-9]?', out, re.I) - - -def test_pos(capsys): - tutorial.print_sentiment('reviews/bladerunner-pos.txt') - out, _ = capsys.readouterr() - assert re.search(r'Sentence \d has a sentiment score of \d', out, re.I) - assert re.search( - r'Overall Sentiment: score of [0-9]\.?[0-9]? with ' - r'magnitude of [0-9]\.?[0-9]?', out, re.I) - - -def test_neg(capsys): - tutorial.print_sentiment('reviews/bladerunner-neg.txt') - out, _ = capsys.readouterr() - assert re.search(r'Sentence \d has a sentiment score of \d', out, re.I) - assert re.search( - r'Overall Sentiment: score of -[0-9]\.?[0-9]? with ' - r'magnitude of [2-7]\.?[0-9]?', out, re.I) - - -def test_mixed(capsys): - tutorial.print_sentiment('reviews/bladerunner-mixed.txt') - out, _ = capsys.readouterr() - assert re.search(r'Sentence \d has a sentiment score of \d', out, re.I) - assert re.search( - r'Overall Sentiment: score of -?[0-9]\.?[0-9]? with ' - r'magnitude of [3-6]\.?[0-9]?', out, re.I)
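
These deleted samples call the Natural Language REST surface through `googleapiclient` discovery clients. The same operations are covered by the `google-cloud-language` client library; as a rough sketch only (constructor and method signatures differ between major versions of that library, and this follows the 2.x style), the tutorial's sentiment request looks approximately like this:

```python
# Rough 2.x-style google-cloud-language equivalent of tutorial.py's request
# (signatures differ in older releases of the library).
from google.cloud import language_v1

client = language_v1.LanguageServiceClient()
document = language_v1.Document(
    content='The visual effects and score by Vangelis set the mood.',
    type_=language_v1.Document.Type.PLAIN_TEXT)
response = client.analyze_sentiment(request={'document': document})

print('Overall Sentiment: score of {} with magnitude of {}'.format(
    response.document_sentiment.score, response.document_sentiment.magnitude))
for n, sentence in enumerate(response.sentences):
    print('Sentence {} has a sentiment score of {}'.format(
        n, sentence.sentiment.score))
```
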