BigQuery: Client.list_rows should prefetch the first page and populate total_rows and other attributes #4152

Closed
bits01 opened this issue Oct 11, 2017 · 17 comments
Labels: api: bigquery, priority: p2, type: feature request

Comments

@bits01

bits01 commented Oct 11, 2017

https://googlecloudplatform.github.io/google-cloud-python/latest/_modules/google/cloud/bigquery/table.html#Table.fetch_data

Return value doc:

:returns: Iterator of row data :class:`tuple`s. During each page, the
                  iterator will have the ``total_rows`` attribute set,
                  which counts the total number of rows **in the table**
                  (this is distinct from the total number of rows in the
                  current page: ``iterator.page.num_items``).

The returned iterator has neither a total_rows attribute nor a page attribute.

Can fetch_data() return a count of all rows, or is iterating and counting the only way?

@bits01
Author

bits01 commented Oct 11, 2017

@lukesneeringer
Contributor

Summoning @tswast since he has been in this code a lot recently and probably knows the answer off hand. :-)

lukesneeringer added the api: bigquery and type: question labels on Oct 12, 2017
@dhermes
Contributor

dhermes commented Oct 12, 2017

@tswast The iterator docs are great, they just got unpublished when the module moved from google.cloud.iterator to google.api.core.page_iterator.

/cc @jonparrott

@bits01
Author

bits01 commented Oct 12, 2017

#4153 is related.

@tswast
Contributor

tswast commented Oct 13, 2017

The iterator is using

https://github.com/GoogleCloudPlatform/google-cloud-python/blob/e6cc4b4a6486971f910139d766c7807a588ea53b/bigquery/google/cloud/bigquery/_helpers.py#L507-L522

which appears to be populating total_rows. I'll investigate further and make sure our system tests are checking for this property.

@bits01
Author

bits01 commented Oct 14, 2017

It works on QueryResult, but not on the iterator returned by the destination table fetch_data():
AttributeError: 'HTTPIterator' object has no attribute 'total_rows'

@tswast
Contributor

tswast commented Oct 16, 2017

My guess as to what is happening is that total_rows only gets populated after you start iterating through, because it gets that information from the first page of results.

I agree that this is confusing behavior.

@jonparrott Do you think we could prefetch the first page when we create the iterator?
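
The behavior described above can be reproduced with a small stand-in. This is only a sketch under stated assumptions: `LazyRowIterator` and `fake_fetch_page` are hypothetical names, the response shape (`totalRows`, `rows`, `pageToken`) is assumed for illustration, and none of this is the actual google-cloud-bigquery code. It shows why the attribute is missing until the first page has been requested.

```python
class LazyRowIterator:
    """Hypothetical stand-in for a paged row iterator (not the real library class)."""

    def __init__(self, fetch_page):
        self._fetch_page = fetch_page  # callable: page_token -> response dict
        self._next_token = None
        self._started = False
        # Note: total_rows is deliberately NOT set here, mirroring the
        # AttributeError reported in this thread.

    def __iter__(self):
        while not self._started or self._next_token is not None:
            response = self._fetch_page(self._next_token)
            self._started = True
            # The table-wide count arrives with each page response.
            self.total_rows = int(response["totalRows"])
            self._next_token = response.get("pageToken")
            yield from response["rows"]


def fake_fetch_page(page_token):
    """Hypothetical two-page backend standing in for the HTTP API."""
    pages = {
        None: {"totalRows": "3", "rows": [("a",), ("b",)], "pageToken": "p2"},
        "p2": {"totalRows": "3", "rows": [("c",)]},
    }
    return pages[page_token]


rows = LazyRowIterator(fake_fetch_page)
before = getattr(rows, "total_rows", None)  # no request has been made yet
row_iter = iter(rows)
first = next(row_iter)                      # triggers the first page fetch
after = rows.total_rows                     # now populated
```

In this sketch the count only becomes visible after `next()` has forced the first request, which matches the symptom reported earlier in the thread.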

@bits01
Author

bits01 commented Oct 16, 2017

Indeed. Once you start iterating, total_rows shows up. This should be made consistent with QueryResult, assuming that QueryResult even needs to exist at all.

@theacodes
Contributor

> @jonparrott Do you think we could prefetch the first page when we create the iterator?

That's a better question for @dhermes, who designed the iterator.

@dhermes
Contributor

dhermes commented Oct 16, 2017

Yes. The easiest way to do this would be to subclass the relevant iterator class and override the _next_page method.

Or you could monkey-patch the _next_page method on your instances. This way you wouldn't have to subclass both the HTTP and gRPC iterators, but then you have the problem of figuring out how to fetch the first page via HTTP or gRPC.
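
A rough sketch of the subclassing approach, under stated assumptions: `PagedIterator` is a simplified, hypothetical base class written for this example (it is not google.api.core.page_iterator), and the response shape is assumed. The subclass fetches the first page in its constructor and hands it back on the first `_next_page` call.

```python
class PagedIterator:
    """Simplified, hypothetical base class; an assumption for illustration only."""

    def __init__(self, fetch_page):
        self._fetch_page = fetch_page  # callable: page_token -> response dict
        self._next_token = None
        self._done = False

    def _next_page(self):
        """Return the next list of rows, or None when exhausted."""
        if self._done:
            return None
        response = self._fetch_page(self._next_token)
        self.total_rows = int(response["totalRows"])
        self._next_token = response.get("pageToken")
        self._done = self._next_token is None
        return response["rows"]

    def __iter__(self):
        page = self._next_page()
        while page is not None:
            yield from page
            page = self._next_page()


class PrefetchingIterator(PagedIterator):
    """Fetches the first page eagerly so total_rows exists before iteration."""

    def __init__(self, fetch_page):
        super().__init__(fetch_page)
        self._first_page = super()._next_page()  # one request up front
        self._first_served = False

    def _next_page(self):
        if not self._first_served:
            self._first_served = True
            return self._first_page  # replay the prefetched page
        return super()._next_page()


calls = []  # records every page request made by the fake backend


def fake_fetch_page(page_token):
    """Hypothetical two-page backend standing in for the HTTP API."""
    calls.append(page_token)
    pages = {
        None: {"totalRows": "3", "rows": [("a",), ("b",)], "pageToken": "p2"},
        "p2": {"totalRows": "3", "rows": [("c",)]},
    }
    return pages[page_token]


rows = PrefetchingIterator(fake_fetch_page)
total_before_iterating = rows.total_rows  # already set by the prefetch
all_rows = list(rows)
```

The trade-off in this design is that constructing the iterator always costs one request, even if the caller never iterates.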

tswast changed the title from "BigQuery: table.fetch_data() return documentation" to "BigQuery: Client.list_rows should prefetch the first page and populate total_rows and other attributes" on Oct 25, 2017
tswast added the priority: p1 and type: feature request labels and removed the type: question label on Oct 25, 2017
@tseaver
Contributor

tseaver commented Feb 21, 2018

@tswast Is this really a P1 issue? If so, we need to get it into SLO, i.e., fix it ASAP. If not, can you please switch it to P2?

@tswast
Contributor

tswast commented Feb 22, 2018

We can bump it down. I put it at P1 thinking it might be a breaking change for GA, but then forgot about it.

tswast added the priority: p2 label and removed the priority: p1 label on Feb 22, 2018
@chemelnucfin
Contributor

I'm actually going to track it through here for the time being.
Feature Requests

@ashutoshkumars

IMHO, total_rows should be accessible irrespective of whether we have started iterating over the pages or not.

@tswast
Contributor

tswast commented Dec 18, 2018

@ashutoshkumars I agree, which is why I opened this feature request.

I have changed my mind on the implementation details. I don't think we should prefetch the first page, because that could be an unnecessary API request.

Instead, total_rows is available both in the query results API response and the get table API response. It could be copied to the object in most cases. The only problem with that is with table references. We would still need to make an API request to get the table size since it's not available in just a reference.
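
The idea can be sketched as follows. Everything here is an assumption made for illustration: `TableReference`, `Table`, `initial_total_rows`, and `fake_get_table` are hypothetical stand-ins, not the library's classes or the eventual implementation. The point is only that a count already present in metadata can be copied for free, while a bare reference still needs one metadata request.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class TableReference:
    """Hypothetical stand-in: identifies a table but carries no metadata."""
    table_id: str


@dataclass
class Table:
    """Hypothetical stand-in for a fetched table resource with a known row count."""
    table_id: str
    num_rows: int


def initial_total_rows(source, get_table: Callable[[str], Table]) -> int:
    """Return the row count without fetching a page of row data."""
    if isinstance(source, Table):
        return source.num_rows  # copy from metadata already in hand
    # A bare reference has no size, so one metadata request is still needed.
    return get_table(source.table_id).num_rows


requests = []  # records metadata lookups made by the fake backend


def fake_get_table(table_id):
    """Hypothetical metadata lookup standing in for a get-table API call."""
    requests.append(table_id)
    return Table(table_id, num_rows=42)


from_table = initial_total_rows(Table("t1", num_rows=7), fake_get_table)
from_reference = initial_total_rows(TableReference("t2"), fake_get_table)
```

Only the reference case reaches the backend in this sketch, so callers that already hold table metadata pay nothing extra.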

@tswast
Contributor

tswast commented Apr 2, 2019

#7622 should address this request in the client for most cases.

parthea pushed a commit that referenced this issue Sep 22, 2023
* Add samples for DLP API v2beta1 [(#1369)](GoogleCloudPlatform/python-docs-samples#1369)

* Auto-update dependencies. [(#1377)](GoogleCloudPlatform/python-docs-samples#1377)

* Auto-update dependencies.

* Update requirements.txt

* Update DLP samples for release [(#1415)](GoogleCloudPlatform/python-docs-samples#1415)

* fix DLP region tags, and add @flaky to pub/sub sample tests [(#1418)](GoogleCloudPlatform/python-docs-samples#1418)

* Auto-update dependencies.

* Regenerate the README files and fix the Open in Cloud Shell link for some samples [(#1441)](GoogleCloudPlatform/python-docs-samples#1441)

* Update README for DLP GA [(#1426)](GoogleCloudPlatform/python-docs-samples#1426)

* Update READMEs to fix numbering and add git clone [(#1464)](GoogleCloudPlatform/python-docs-samples#1464)

* DLP: Add auto_populate_timespan option for create job trigger. [(#1543)](GoogleCloudPlatform/python-docs-samples#1543)

* Add DLP code samples for custom info types [(#1524)](GoogleCloudPlatform/python-docs-samples#1524)

* Add custom info type samples to inspect_content.py

Use flags to indicate dictionary word lists and regex patterns, then parse them into custom info types.

* Make code compatible with python 2.7

* Add missing commas

* Remove bad import

* Add tests for custom info types

* Add info_types parameter to deid.py

* Update deid tests to use info_types parameter

* Fix indentation

* Add blank lines

* Share logic for building custom info types

* Fix line too long

* Fix typo.

* Revert "Fix typo."

This reverts commit b4ffea6eef1fc2ccd2a4f17adb6e9492e54f1b76, so that
the sharing of the custom info type logic can be reverted as well to
make the code samples more readable.

* Revert "Share logic for building custom info types"

This reverts commit 47fc04f74c77db3bd5397459cf9242dc11521c37. This makes
the code samples more readable.

* Switch from indexes to using enumerate.

* Updated help message for custom dictionaries.

* Fix enumerate syntax error.

* upgrade DLP version and fix tests [(#1784)](GoogleCloudPlatform/python-docs-samples#1784)

* upgrade DLP version and fix tests

* bump dlp version again

* Auto-update dependencies. [(#1846)](GoogleCloudPlatform/python-docs-samples#1846)

ACK, merging.

* Per internal documentation complaint, fix the naming. [(#1933)](GoogleCloudPlatform/python-docs-samples#1933)

The documentation for DLP uses 'dlp' as the instance name.  As this is also the name of the python package, it could be confusing for people new to the API object model so switch to dlp_client.

* Add inspect table code sample for DLP and some nit fixes [(#1921)](GoogleCloudPlatform/python-docs-samples#1921)

* Remove claim that redact.py operates on strings

Reflect in the comments that this particular code sample does not support text redaction.

* Add code sample for inspecting table, fix requirements for running tests, quickstart example refactor

* Remove newline, if -> elif

* formatting

* More formatting

* Update DLP redact image code sample region to include mimetype import [(#1928)](GoogleCloudPlatform/python-docs-samples#1928)

In response to feedback where a user was confused that the mimetype
import was missing from the code sample in the documentation.

* Update to use new subscribe() syntax [(#1989)](GoogleCloudPlatform/python-docs-samples#1989)

* Update to use new subscribe() syntax

* Missed two subscribe() call changes before

* Cancel subscription when processed

* Update risk.py

* Fix waiting for message

* Unneeded try/except removed

* Auto-update dependencies. [(#1980)](GoogleCloudPlatform/python-docs-samples#1980)

* Auto-update dependencies.

* Update requirements.txt

* Update requirements.txt

* Convert append -> nargs, so arguments are not additive [(#2191)](GoogleCloudPlatform/python-docs-samples#2191)

* increase test timeout [(#2351)](GoogleCloudPlatform/python-docs-samples#2351)

* Adds updates including compute [(#2436)](GoogleCloudPlatform/python-docs-samples#2436)

* Adds updates including compute

* Python 2 compat pytest

* Fixing weird \r\n issue from GH merge

* Put asset tests back in

* Re-add pod operator test

* Hack parameter for k8s pod operator

* Update DLP samples to use dlp_v2 client. [(#2580)](GoogleCloudPlatform/python-docs-samples#2580)

* fix: correct dataset name, use env var for project [(#2621)](GoogleCloudPlatform/python-docs-samples#2621)

* fix: correct dataset name, use env var for project

* Add uuids to tests

* add uuids and fixtures for bq

* Add logic to delete job

* ran black

* Run black with line length

* Add utf encoding for python 2 tests

* Add skips for now

* Ran black

* Remove skips, adjust job tests

* fix lint and skips

* Cleanup commented things

Co-authored-by: Kurtis Van Gent <[email protected]>

* Remove param to reduce latency (per docs) [(#2853)](GoogleCloudPlatform/python-docs-samples#2853)

* chore(deps): update dependency google-cloud-storage to v1.26.0 [(#3046)](GoogleCloudPlatform/python-docs-samples#3046)

* chore(deps): update dependency google-cloud-storage to v1.26.0

* chore(deps): specify dependencies by python version

* chore: up other deps to try to remove errors

Co-authored-by: Leah E. Cole <[email protected]>
Co-authored-by: Leah Cole <[email protected]>

* Fix dlp tests [(#3058)](GoogleCloudPlatform/python-docs-samples#3058)

Since the tests are flaky and timing out, I'm proposing we do the ML API approach of creating an operation then canceling it. 

It would 
fix #2809
fix #2810  
fix #2811 
fix #2812

* Simplify noxfile setup. [(#2806)](GoogleCloudPlatform/python-docs-samples#2806)

* chore(deps): update dependency requests to v2.23.0

* Simplify noxfile and add version control.

* Configure appengine/standard to only test Python 2.7.

* Update Kokoro configs to match noxfile.

* Add requirements-test to each folder.

* Remove Py2 versions from everything except appengine/standard.

* Remove conftest.py.

* Remove appengine/standard/conftest.py

* Remove 'no-sucess-flaky-report' from pytest.ini.

* Add GAE SDK back to appengine/standard tests.

* Fix typo.

* Roll pytest to python 2 version.

* Add a bunch of testing requirements.

* Remove typo.

* Add appengine lib directory back in.

* Add some additional requirements.

* Fix issue with flake8 args.

* Even more requirements.

* Readd appengine conftest.py.

* Add a few more requirements.

* Even more Appengine requirements.

* Add webtest for appengine/standard/mailgun.

* Add some additional requirements.

* Add workaround for issue with mailjet-rest.

* Add responses for appengine/standard/mailjet.

Co-authored-by: Renovate Bot <[email protected]>

* [dlp] fix: fix periodic builds timeout [(#3420)](GoogleCloudPlatform/python-docs-samples#3420)

* [dlp] fix: remove gcp-devrel-py-tools

fixes #3375
fixes #3416
fixes #3417

* remove wrong usage of `eventually_consistent.call`
* only test if the operation has been started
* shorter timeout for polling
* correct use of `pytest.mark.flaky`
* use try-finally
* use uuid for job_id
* add a filter to allow state = DONE

* chore(deps): update dependency google-cloud-dlp to v0.14.0 [(#3431)](GoogleCloudPlatform/python-docs-samples#3431)

This PR contains the following updates:

| Package | Update | Change |
|---|---|---|
| [google-cloud-dlp](https://togithub.com/googleapis/python-dlp) | minor | `==0.13.0` -> `==0.14.0` |

---

### Release Notes

<details>
<summary>googleapis/python-dlp</summary>

### [`v0.14.0`](https://togithub.com/googleapis/python-dlp/blob/master/CHANGELOG.md#0140-httpswwwgithubcomgoogleapispython-dlpcomparev0130v0140-2020-02-21)

[Compare Source](https://togithub.com/googleapis/python-dlp/compare/v0.13.0...v0.14.0)

##### Features

-   **dlp:** undeprecate resource name helper methods, add 2.7 deprecation warning (via synth)  ([#10040](https://www.github.com/googleapis/python-dlp/issues/10040)) ([b30d7c1](https://www.github.com/googleapis/python-dlp/commit/b30d7c1cd48fba47fdddb7b9232e421261108a52))

</details>

---

### Renovate configuration

:date: **Schedule**: At any time (no schedule defined).

:vertical_traffic_light: **Automerge**: Disabled by config. Please merge this manually once you are satisfied.

:recycle: **Rebasing**: Never, or you tick the rebase/retry checkbox.

:no_bell: **Ignore**: Close this PR and you won't be reminded about this update again.

---

 - [x] <!-- rebase-check -->If you want to rebase/retry this PR, check this box

---

This PR has been generated by [WhiteSource Renovate](https://renovate.whitesourcesoftware.com). View repository job log [here](https://app.renovatebot.com/dashboard#GoogleCloudPlatform/python-docs-samples).

* Update dependency google-cloud-datastore to v1.12.0 [(#3296)](GoogleCloudPlatform/python-docs-samples#3296)

Co-authored-by: gcf-merge-on-green[bot] <60162190+gcf-merge-on-green[bot]@users.noreply.github.com>

* Update dependency google-cloud-pubsub to v1.4.2 [(#3340)](GoogleCloudPlatform/python-docs-samples#3340)

Co-authored-by: Leah E. Cole <[email protected]>

* chore(deps): update dependency google-cloud-storage to v1.28.0 [(#3260)](GoogleCloudPlatform/python-docs-samples#3260)

Co-authored-by: Takashi Matsuo <[email protected]>

* [dlp] fix: increase the number of retries for some tests [(#3685)](GoogleCloudPlatform/python-docs-samples#3685)

fixes #3673

* chore: some lint fixes [(#3744)](GoogleCloudPlatform/python-docs-samples#3744)

* chore(deps): update dependency google-cloud-pubsub to v1.4.3 [(#3725)](GoogleCloudPlatform/python-docs-samples#3725)

Co-authored-by: Bu Sun Kim <[email protected]>
Co-authored-by: Takashi Matsuo <[email protected]>

* chore(deps): update dependency google-cloud-dlp to v0.15.0 [(#3780)](GoogleCloudPlatform/python-docs-samples#3780)

* chore(deps): update dependency google-cloud-storage to v1.28.1 [(#3785)](GoogleCloudPlatform/python-docs-samples#3785)

* chore(deps): update dependency google-cloud-storage to v1.28.1

* [asset] testing: use uuid instead of time

Co-authored-by: Takashi Matsuo <[email protected]>

* chore(deps): update dependency google-cloud-pubsub to v1.5.0 [(#3781)](GoogleCloudPlatform/python-docs-samples#3781)

Co-authored-by: Bu Sun Kim <[email protected]>

* [dlp] fix: mitigate flakiness [(#3919)](GoogleCloudPlatform/python-docs-samples#3919)

* [dlp] fix: mitigate flakiness

* make the Pub/Sub fixture function level
* shorten the timeout for the tests from 300 secs to 30 secs
* retrying all the tests in risk_test.py 3 times

fixes #3897
fixes #3896
fixes #3895
fixes #3894
fixes #3893
fixes #3892
fixes #3890
fixes #3889

* more retries, comment

* 30 seconds operation wait and 20 minutes retry delay

* lint fix etc

* limit the max retry wait time

* [dlp] testing: fix Pub/Sub notifications [(#3925)](GoogleCloudPlatform/python-docs-samples#3925)

* re-generated README.rst with some more setup info
* use parent with the global location attached
* re-enabled some tests with Pub/Sub notification
* stop waiting between test retries

* Add text redaction sample using DLP [(#3964)](GoogleCloudPlatform/python-docs-samples#3964)

* Add text redaction sample using DLP

* Update dlp/deid.py

Co-authored-by: Bu Sun Kim <[email protected]>

* Rename string parameter to item

Co-authored-by: Bu Sun Kim <[email protected]>

* testing: start using btlr [(#3959)](GoogleCloudPlatform/python-docs-samples#3959)

* testing: start using btlr

The binary is at gs://cloud-devrel-kokoro-resources/btlr/v0.0.1/btlr

* add period after DIFF_FROM

* use array for btlr args

* fix websocket tests

* add debug message

* wait longer for the server to spin up

* dlp: bump the wait timeout to 10 minutes

* [run] copy noxfile.py to child directory to avoid gcloud issue

* [iam] fix: only display description when the key exists

* use uuid4 instead of uuid1

* [iot] testing: use the same format for registry id

* Stop asserting Out of memory not in the output

* fix missing imports

* [dns] testing: more retries with delay

* [dlp] testing: longer timeout

* use the max-concurrency flag

* use 30 workers

* [monitoring] use multiple projects

* [dlp] testing: longer timeout

* Add code sample for string replacement based deidentification. [(#3956)](GoogleCloudPlatform/python-docs-samples#3956)

Adds a code sample corresponding to the replacement based deidentification in the Cloud DLP API. The detected sensitive value is replaced with a specified surrogate.

* Add custom infoType snippets to DLP samples [(#3991)](GoogleCloudPlatform/python-docs-samples#3991)

* Replace GCLOUD_PROJECT with GOOGLE_CLOUD_PROJECT. [(#4022)](GoogleCloudPlatform/python-docs-samples#4022)

* Rename DLP code samples from 'redact' to 'replace' [(#4020)](GoogleCloudPlatform/python-docs-samples#4020)

In the DLP API, redaction and replacement are two separate, named concepts. Code samples recently added by #3964 were named 'redact' but are actually examples of replacement. This change renames those samples for clarity.

* Add DLP sample for redacting all image text [(#4018)](GoogleCloudPlatform/python-docs-samples#4018)

The sample shows how to remove all text found in an image with DLP.
The sample is integrated into the existing redact.py CLI application.

* Add DLP sample code for inspecting with custom regex detector [(#4031)](GoogleCloudPlatform/python-docs-samples#4031)

* code sample and test for medical record number custom regex detector

* fix linter error

* Using f-strings instead of string.format

Co-authored-by: Bu Sun Kim <[email protected]>

Co-authored-by: Bu Sun Kim <[email protected]>
Co-authored-by: gcf-merge-on-green[bot] <60162190+gcf-merge-on-green[bot]@users.noreply.github.com>

* Update dependency google-cloud-dlp to v1 [(#4047)](GoogleCloudPlatform/python-docs-samples#4047)

* Update dependency google-cloud-bigquery to v1.25.0 [(#4024)](GoogleCloudPlatform/python-docs-samples#4024)

This PR contains the following updates:

| Package | Update | Change |
|---|---|---|
| [google-cloud-bigquery](https://togithub.com/googleapis/python-bigquery) | minor | `==1.24.0` -> `==1.25.0` |

---

### Release Notes

<details>
<summary>googleapis/python-bigquery</summary>

### [`v1.25.0`](https://togithub.com/googleapis/python-bigquery/blob/master/CHANGELOG.md#1250-httpswwwgithubcomgoogleapispython-bigquerycomparev1240v1250-2020-06-06)

[Compare Source](https://togithub.com/googleapis/python-bigquery/compare/v1.24.0...v1.25.0)

##### Features

-   add BigQuery storage client support to DB API ([#36](https://www.github.com/googleapis/python-bigquery/issues/36)) ([ba9b2f8](https://www.github.com/googleapis/python-bigquery/commit/ba9b2f87e36320d80f6f6460b77e6daddb0fa214))
-   **bigquery:** add create job method ([#32](https://www.github.com/googleapis/python-bigquery/issues/32)) ([2abdef8](https://www.github.com/googleapis/python-bigquery/commit/2abdef82bed31601d1ca1aa92a10fea1e09f5297))
-   **bigquery:** add support of model for extract job ([#71](https://www.github.com/googleapis/python-bigquery/issues/71)) ([4a7a514](https://www.github.com/googleapis/python-bigquery/commit/4a7a514659a9f6f9bbd8af46bab3f8782d6b4b98))
-   add HOUR support for time partitioning interval ([#91](https://www.github.com/googleapis/python-bigquery/issues/91)) ([0dd90b9](https://www.github.com/googleapis/python-bigquery/commit/0dd90b90e3714c1d18f8a404917a9454870e338a))
-   add support for policy tags ([#77](https://www.github.com/googleapis/python-bigquery/issues/77)) ([38a5c01](https://www.github.com/googleapis/python-bigquery/commit/38a5c01ca830daf165592357c45f2fb4016aad23))
-   make AccessEntry objects hashable ([#93](https://www.github.com/googleapis/python-bigquery/issues/93)) ([23a173b](https://www.github.com/googleapis/python-bigquery/commit/23a173bc5a25c0c8200adc5af62eb05624c9099e))
-   **bigquery:** expose start index parameter for query result ([#121](https://www.github.com/googleapis/python-bigquery/issues/121)) ([be86de3](https://www.github.com/googleapis/python-bigquery/commit/be86de330a3c3801653a0ccef90e3d9bdb3eee7a))
-   **bigquery:** unit and system test for dataframe with int column with Nan values  ([#39](https://www.github.com/googleapis/python-bigquery/issues/39)) ([5fd840e](https://www.github.com/googleapis/python-bigquery/commit/5fd840e9d4c592c4f736f2fd4792c9670ba6795e))

##### Bug Fixes

-   allow partial streaming_buffer statistics ([#37](https://www.github.com/googleapis/python-bigquery/issues/37)) ([645f0fd](https://www.github.com/googleapis/python-bigquery/commit/645f0fdb35ee0e81ee70f7459e796a42a1f03210))
-   distinguish server timeouts from transport timeouts ([#43](https://www.github.com/googleapis/python-bigquery/issues/43)) ([a17be5f](https://www.github.com/googleapis/python-bigquery/commit/a17be5f01043f32d9fbfb2ddf456031ea9205c8f))
-   improve cell magic error message on missing query ([#58](https://www.github.com/googleapis/python-bigquery/issues/58)) ([6182cf4](https://www.github.com/googleapis/python-bigquery/commit/6182cf48aef8f463bb96891cfc44a96768121dbc))
-   **bigquery:** fix repr of model reference ([#66](https://www.github.com/googleapis/python-bigquery/issues/66)) ([26c6204](https://www.github.com/googleapis/python-bigquery/commit/26c62046f4ec8880cf6561cc90a8b821dcc84ec5))
-   **bigquery:** fix start index with page size for list rows ([#27](https://www.github.com/googleapis/python-bigquery/issues/27)) ([400673b](https://www.github.com/googleapis/python-bigquery/commit/400673b5d0f2a6a3d828fdaad9d222ca967ffeff))

</details>

* Add code sample and tests for redaction [(#4037)](GoogleCloudPlatform/python-docs-samples#4037)

Add A DLP code sample for redacting text.

Code will be linked to this documentation: https://cloud.google.com/dlp/docs/deidentify-sensitive-data

* dlp: add inspect string sample, person_name w/ custom hotword certainty boosting [(#4081)](GoogleCloudPlatform/python-docs-samples#4081)

* Add a simplified inspect string example to DLP code samples [(#4069)](GoogleCloudPlatform/python-docs-samples#4069)

* Add a simplified inspect string example

* Remove unnecessary try-catch block - all findings in this example should have quotes.

* dlp: Add sample for reid w/ fpe using surrogate type and unwrapped security key [(#4051)](GoogleCloudPlatform/python-docs-samples#4051)

* add code sample and test for reid w/ fpe using surrogate type and unwrapped security key

* refactor reidentify_config

* add code sample and test for medical number custom detector with hotwords [(#4071)](GoogleCloudPlatform/python-docs-samples#4071)

Co-authored-by: Kurtis Van Gent <[email protected]>

* Add DLP code sample and test for de-id free text with surrogate [(#4085)](GoogleCloudPlatform/python-docs-samples#4085)

## Description
Add DLP code sample and test for de-id free text with surrogate, meant for https://cloud.google.com/dlp/docs/pseudonymization#de-identification_in_free_text_code_example

## Checklist
- [x] I have followed [Sample Guidelines from AUTHORING_GUIDE.MD](https://github.com/GoogleCloudPlatform/python-docs-samples/blob/master/AUTHORING_GUIDE.md)
- [ ] README is updated to include [all relevant information](https://github.com/GoogleCloudPlatform/python-docs-samples/blob/master/AUTHORING_GUIDE.md#readme-file)
- [x] **Tests** pass:   `nox -s py-3.6` (see [Test Environment Setup](https://github.com/GoogleCloudPlatform/python-docs-samples/blob/master/AUTHORING_GUIDE.md#test-environment-setup))
- [x] **Lint** pass:   `nox -s lint` (see [Test Environment Setup](https://github.com/GoogleCloudPlatform/python-docs-samples/blob/master/AUTHORING_GUIDE.md#test-environment-setup))
- [ ] These samples need a new **API enabled** in testing projects to pass (let us know which ones)
- [ ] These samples need a new/updated **env vars** in testing projects set to pass (let us know which ones)
- [x] Please **merge** this PR for me once it is approved.

* chore(deps): update dependency google-cloud-storage to v1.29.0 [(#4040)](GoogleCloudPlatform/python-docs-samples#4040)

* Update dependency google-cloud-pubsub to v1.6.0 [(#4039)](GoogleCloudPlatform/python-docs-samples#4039)

This PR contains the following updates:

| Package | Update | Change |
|---|---|---|
| [google-cloud-pubsub](https://togithub.com/googleapis/python-pubsub) | minor | `==1.5.0` -> `==1.6.0` |

---

### Release Notes

<details>
<summary>googleapis/python-pubsub</summary>

### [`v1.6.0`](https://togithub.com/googleapis/python-pubsub/blob/master/CHANGELOG.md#160-httpswwwgithubcomgoogleapispython-pubsubcomparev150v160-2020-06-09)

[Compare Source](https://togithub.com/googleapis/python-pubsub/compare/v1.5.0...v1.6.0)

##### Features

-   Add flow control for message publishing ([#96](https://www.github.com/googleapis/python-pubsub/issues/96)) ([06085c4](https://www.github.com/googleapis/python-pubsub/commit/06085c4083b9dccdd50383257799904510bbf3a0))

##### Bug Fixes

-   Fix PubSub incompatibility with api-core 1.17.0+ ([#103](https://www.github.com/googleapis/python-pubsub/issues/103)) ([c02060f](https://www.github.com/googleapis/python-pubsub/commit/c02060fbbe6e2ca4664bee08d2de10665d41dc0b))

##### Documentation

-   Clarify that Schedulers shouldn't be used with multiple SubscriberClients ([#100](https://togithub.com/googleapis/python-pubsub/pull/100)) ([cf9e87c](https://togithub.com/googleapis/python-pubsub/commit/cf9e87c80c0771f3fa6ef784a8d76cb760ad37ef))
-   Fix update subscription/snapshot/topic samples ([#113](https://togithub.com/googleapis/python-pubsub/pull/113)) ([e62c38b](https://togithub.com/googleapis/python-pubsub/commit/e62c38bb33de2434e32f866979de769382dea34a))

##### Internal / Testing Changes

-   Re-generated service implementation using synth: removed experimental notes from the RetryPolicy and filtering features in anticipation of GA, added DetachSubscription (experimental) ([#114](https://togithub.com/googleapis/python-pubsub/pull/114)) ([0132a46](https://togithub.com/googleapis/python-pubsub/commit/0132a4680e0727ce45d5e27d98ffc9f3541a0962))
-   Incorporate will_accept() checks into publish() ([#108](https://togithub.com/googleapis/python-pubsub/pull/108)) ([6c7677e](https://togithub.com/googleapis/python-pubsub/commit/6c7677ecb259672bbb9b6f7646919e602c698570))

</details>

* [dlp] fix: add retry count to mitigate the flake [(#4152)](GoogleCloudPlatform/python-docs-samples#4152)

fixes #4100

* chore(deps): update dependency google-cloud-pubsub to v1.6.1 [(#4242)](GoogleCloudPlatform/python-docs-samples#4242)

Co-authored-by: gcf-merge-on-green[bot] <60162190+gcf-merge-on-green[bot]@users.noreply.github.com>

* chore(deps): update dependency google-cloud-datastore to v1.13.0 [(#4273)](GoogleCloudPlatform/python-docs-samples#4273)

* chore(deps): update dependency pytest to v5.4.3 [(#4279)](GoogleCloudPlatform/python-docs-samples#4279)

* chore(deps): update dependency pytest to v5.4.3

* specify pytest for python 2 in appengine

Co-authored-by: Leah Cole <[email protected]>

* chore(deps): update dependency mock to v4 [(#4287)](GoogleCloudPlatform/python-docs-samples#4287)

* chore(deps): update dependency mock to v4

* specify mock version for appengine python 2

Co-authored-by: Leah Cole <[email protected]>

* chore(deps): update dependency google-cloud-pubsub to v1.7.0 [(#4290)](GoogleCloudPlatform/python-docs-samples#4290)

This PR contains the following updates:

| Package | Update | Change |
|---|---|---|
| [google-cloud-pubsub](https://togithub.com/googleapis/python-pubsub) | minor | `==1.6.1` -> `==1.7.0` |

---

### Release Notes

<details>
<summary>googleapis/python-pubsub</summary>

### [`v1.7.0`](https://togithub.com/googleapis/python-pubsub/blob/master/CHANGELOG.md#170-httpswwwgithubcomgoogleapispython-pubsubcomparev161v170-2020-07-13)

[Compare Source](https://togithub.com/googleapis/python-pubsub/compare/v1.6.1...v1.7.0)

##### New Features

-   Add support for server-side flow control. ([#143](https://togithub.com/googleapis/python-pubsub/pull/143)) ([04e261c](https://www.github.com/googleapis/python-pubsub/commit/04e261c602a2919cc75b3efa3dab099fb2cf704c))

##### Dependencies

-   Update samples dependency `google-cloud-pubsub` to `v1.6.1`. ([#144](https://togithub.com/googleapis/python-pubsub/pull/144)) ([1cb6746](https://togithub.com/googleapis/python-pubsub/commit/1cb6746b00ebb23dbf1663bae301b32c3fc65a88))

##### Documentation

-   Add pubsub/cloud-client samples from the common samples repo (with commit history). ([#151](https://togithub.com/googleapis/python-pubsub/pull/151))
-   Add flow control section to publish overview. ([#129](https://togithub.com/googleapis/python-pubsub/pull/129)) ([acc19eb](https://www.github.com/googleapis/python-pubsub/commit/acc19eb048eef067d9818ef3e310b165d9c6307e))
-   Add a link to Pub/Sub filtering language public documentation to `pubsub.proto`. ([#121](https://togithub.com/googleapis/python-pubsub/pull/121)) ([8802d81](https://www.github.com/googleapis/python-pubsub/commit/8802d8126247f22e26057e68a42f5b5a82dcbf0d))

</details>

---

### Renovate configuration

:date: **Schedule**: At any time (no schedule defined).

:vertical_traffic_light: **Automerge**: Disabled by config. Please merge this manually once you are satisfied.

:recycle: **Rebasing**: Renovate will not automatically rebase this PR, because other commits have been found.

:no_bell: **Ignore**: Close this PR and you won't be reminded about this update again.

---

This PR has been generated by [WhiteSource Renovate](https://renovate.whitesourcesoftware.com). View repository job log [here](https://app.renovatebot.com/dashboard#GoogleCloudPlatform/python-docs-samples).

* Update dependency flaky to v3.7.0 [(#4300)](GoogleCloudPlatform/python-docs-samples#4300)

* Update dependency google-cloud-datastore to v1.13.1 [(#4295)](GoogleCloudPlatform/python-docs-samples#4295)

This PR contains the following updates:

| Package | Update | Change |
|---|---|---|
| [google-cloud-datastore](https://togithub.com/googleapis/python-datastore) | patch | `==1.13.0` -> `==1.13.1` |

---

### Release Notes

<details>
<summary>googleapis/python-datastore</summary>

### [`v1.13.1`](https://togithub.com/googleapis/python-datastore/blob/master/CHANGELOG.md#1131-httpswwwgithubcomgoogleapispython-datastorecomparev1130v1131-2020-07-13)

[Compare Source](https://togithub.com/googleapis/python-datastore/compare/v1.13.0...v1.13.1)

</details>


* chore(deps): update dependency google-cloud-datastore to v1.13.2 [(#4326)](GoogleCloudPlatform/python-docs-samples#4326)

* Update dependency google-cloud-storage to v1.30.0

* Update dependency pytest to v6 [(#4390)](GoogleCloudPlatform/python-docs-samples#4390)

* chore: update templates

* chore: update synth.py

* chore: update project env name

Co-authored-by: Andrew Gorcester <[email protected]>
Co-authored-by: DPE bot <[email protected]>
Co-authored-by: chenyumic <[email protected]>
Co-authored-by: Frank Natividad <[email protected]>
Co-authored-by: Mike DaCosta <[email protected]>
Co-authored-by: michaelawyu <[email protected]>
Co-authored-by: mwdaub <[email protected]>
Co-authored-by: realjordanna <[email protected]>
Co-authored-by: Ace <[email protected]>
Co-authored-by: djmailhot <[email protected]>
Co-authored-by: Charles Engelke <[email protected]>
Co-authored-by: Maximus <[email protected]>
Co-authored-by: Averi Kitsch <[email protected]>
Co-authored-by: Gus Class <[email protected]>
Co-authored-by: Leah E. Cole <[email protected]>
Co-authored-by: Kurtis Van Gent <[email protected]>
Co-authored-by: WhiteSource Renovate <[email protected]>
Co-authored-by: Leah Cole <[email protected]>
Co-authored-by: Takashi Matsuo <[email protected]>
Co-authored-by: gcf-merge-on-green[bot] <60162190+gcf-merge-on-green[bot]@users.noreply.github.com>
Co-authored-by: Bu Sun Kim <[email protected]>
Co-authored-by: Seth Moore <[email protected]>
Co-authored-by: Ace <[email protected]>
Co-authored-by: Seth Moore <[email protected]>
Co-authored-by: jlmwise <[email protected]>
Co-authored-by: Xiaohua (Victor) Liang <[email protected]>
Co-authored-by: Xiaohua (Victor) Liang <[email protected]>
Co-authored-by: Charles Engelke <[email protected]>