Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[Unified Fides] Logging too much PII because Test Mode is True by Default #1162

Merged
merged 1 commit into from
Oct 3, 2022

Conversation

pattisdr
Copy link
Contributor

@pattisdr pattisdr commented Sep 30, 2022

Closes #1143

Code Changes

  • Change hide_parameters on the BaseConnector to inspect "dev mode" instead of "test mode".

Steps to Confirm

  • Make sure "Dev mode" is False locally. Run a privacy request in Postman against a Postgres database and verify that PII isn't visible in the logs.

Pre-Merge Checklist

  • All CI Pipelines Succeeded
  • Documentation Updated:
    • documentation complete, or draft/outline provided (tag docs-team to complete/review on this branch)
    • documentation issue created (tag docs-team to complete issue separately)
  • Issue Requirements are Met
  • Relevant Follow-Up Issues Created
  • Update CHANGELOG.md

Description Of Changes

Logging query parameters for the BaseConnector in fidesops is tied to whether "test mode" is True. It is currently "True" by default in Fides, where it was "False" by default in Fidesops. This is causing too much PII to be logged when running a privacy request.

Instead, tying this field to look at "dev mode" instead of "test mode". We'd like to separately get rid of "test mode" altogether.

@pattisdr pattisdr self-assigned this Sep 30, 2022
@pattisdr
Copy link
Contributor Author

pattisdr commented Sep 30, 2022

Looking to see where else needs to be updated, sqlalchemy.exc.OperationalError: (psycopg2.OperationalError) connection to server at "fides-db" (172.18.0.3), port 5432 failed: FATAL: database "fides_test" does not exist when we do the fides_db_scan

This also breaks some ctl tests

@ThomasLaPiana
Copy link
Contributor

@pattisdr just asking for background here so I have context, why shouldn't we run in test mode when developing locally? I'm assuming it fundamentally changes how ops functions?

@pattisdr
Copy link
Contributor Author

pattisdr commented Sep 30, 2022

Fidesops intended test mode to just be used for pytest. Test mode populated the test database, here fides_test and also displayed the parameters inside queries for troubleshooting.

As-is, the fides database is never populated because test mode is True everywhere by default. I do think it's useful to be populating separate databases for running pytest versus running local application code. Fidesops pytest at least did a lot of cleanup of the database it connected to.

There would also be a concern that users would mimic code inside our docker-compose.yml and use test mode which logs excessive PII to the console when executing privacy requests.


As an aside, I think we should instead tie whether or not we hide query parameters to "dev mode" instead of tying it to "test mode". I think the meaning of test mode should really be just, "do we populate the regular database or the test one"? And "dev mode" should be "do we log more details than normal because we're developing"

@ThomasLaPiana
Copy link
Contributor

Fidesops intended test mode to just be used for pytest. Test mode populated the test database, here fides_test and also displayed the parameters inside queries for troubleshooting.

As-is, the fides database is never populated because test mode is True everywhere by default. I do think it's useful to be populating separate databases for running pytest versus running local application code. Fidesops pytest at least did a lot of cleanup of the database it connected to.

There would also be a concern that users would mimic code inside our docker-compose.yml and use test mode which logs excessive PII to the console when executing privacy requests.

As an aside, I think we should instead tie whether or not we hide query parameters to "dev mode" instead of tying it to "test mode". I think the meaning of test mode should really be just, "do we populate the regular database or the test one"? And "dev mode" should be "do we log more details than normal because we're developing"

That all makes sense to me, I appreciate the detailed explanation! On ctl we never worried about this, we just blew away the database for testing every single time and didn't separate it from an application database while developing, but I'm not against separating those concerns

Even though it'd be a pain, maybe what we need to do here is remove the concept of a "test" database altogether, and instead load a test_fides.toml config file that points to a test database? This would require injecting a test_config fixture into every test that needs the config (we do this in the ctl tests mostly), but then this would also mean the application is unaware of any "test mode" vs. normal operation.

This then ties in perfectly to your idea of dangerous features being tied to a "dev" mode instead of a "test" mode which I think sounds like a great improvement

@pattisdr
Copy link
Contributor Author

This sentence is unclear:

Even though it'd be a pain, maybe what we need to do here is remove the concept of a "test" database altogether, and instead load a test_fides.toml config file that points to a test database?

I wonder if I've made Fidesops seem like it was more tightly tied to "test mode" than it is.


To clarify, Fidesops just had the one PII logging feature tied to test mode, that just seems to be a leftover artifact from the early days. The rest of the application was not aware of test mode. You were also not intended to set Test Mode as an environment variable anywhere, it was a variable that was only set by Pytest to load the test database specifically. We also tore the test database up and down in pytest, but it was nice having a separate application database that was more stable while you were locally testing a feature.

My preference would be to:

  • Still have a separate application and test database
  • Have pytest populate the test database
  • Have everything outside of pytest populate the application database
  • Remove setting Test mode as an environment variable anywhere except by pytest.
  • Change the single application reference to test mode to be dev mode, now the application isn't test-mode aware.

I may just not understand your suggestion.

@pattisdr
Copy link
Contributor Author

pattisdr commented Oct 1, 2022

Where I'm stuck right now, migrations are starting, and then it's going to load the default taxonomy but there's no organization's table yet. The fides db only has alembic_version, auditlog, ctl_data_categories, and fidesuser

@pattisdr
Copy link
Contributor Author

pattisdr commented Oct 2, 2022

I'm thinking we should separate this - I could have this PR just switch to logging the database details on dev mode instead of test mode. That will fix the most pressing consequence of test mode being true by default in fides when it was false it fidesops. Then we could create a separate lower priority PR of separately populating a test db or the regular db.

@ThomasLaPiana
Copy link
Contributor

@pattisdr what you suggested is exactly what I had in mind! I'm in full agreement that...

  • Pytest should handle its own database creation/teardown
  • During regular development we can populate the normal application database
  • Change the application check for test_mode to dev_mode

The only thing I'd change is the reference to a test_mode at all in pytest. We can update pytest to always use the test database url I believe?

@pattisdr
Copy link
Contributor Author

pattisdr commented Oct 3, 2022

ok great @ThomasLaPiana sounds like we're on the same page, I agree we can figure out how to get pytest to just use the test database. (Fidesops set TestMode in a pytest.ini file so pytest would use the test database url, but we can look into other ways to do this)

@pattisdr pattisdr changed the title Set Test Mode is False by Default Logging too much PII because Test Mode is True by Default Oct 3, 2022
@pattisdr pattisdr changed the title Logging too much PII because Test Mode is True by Default [Unified Fides] Logging too much PII because Test Mode is True by Default Oct 3, 2022
@pattisdr
Copy link
Contributor Author

pattisdr commented Oct 3, 2022

@ThomasLaPiana I've split this issue into two, so this PR only changes the one application check to inspect "dev mode" instead of "test mode".

Separately we can look into populating the application database instead of the test database. I was having some issues on the fidesctl side when I was trying to get rid of test mode so we can look further into this here #1175

@pattisdr pattisdr marked this pull request as ready for review October 3, 2022 14:08
@pattisdr pattisdr requested a review from a team October 3, 2022 14:08
@pattisdr pattisdr linked an issue Oct 3, 2022 that may be closed by this pull request
@ThomasLaPiana
Copy link
Contributor

@pattisdr excellent! I'm happy to look into the fidesctl one as well, since the complexity there is probably mostly my fault 😂

@pattisdr
Copy link
Contributor Author

pattisdr commented Oct 3, 2022

thank you Thomas!

@pattisdr pattisdr merged commit 167ea70 into unified-fides-2 Oct 3, 2022
@pattisdr pattisdr deleted the fides_1143_test_mode branch October 3, 2022 20:19
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

[Unified Fides] Logging PII during Privacy Request Execution
2 participants