Error if user requests csv.gz file unless they've opted in #2092

Open
bloodearnest opened this issue Aug 29, 2024 · 1 comment

@bloodearnest
Member

bloodearnest commented Aug 29, 2024

The docs, and previous experience, are still leading users to generate csv.gz outputs, even when using ehrql. This is expensive, because compressing CSV consumes limited server CPU.

We should consider making this an error in ehrql, which exits with a message telling the user to use Arrow instead. However, we should probably provide an --allow-csv flag so that CSV output is still possible when needed.
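A minimal sketch of the opt-in behaviour proposed here, written as a standalone argparse script. It is not ehrql's actual interface: `check_output_format`, `main` and the `--output` argument in this sketch are hypothetical, and `--allow-csv` is only the flag suggested in this issue. It assumes that only paths ending in `.csv.gz` should be blocked, matching the issue title.

```python
import argparse

# Hypothetical: the compressed CSV suffix this issue proposes to block by default.
BLOCKED_SUFFIX = ".csv.gz"


def check_output_format(output_path: str, allow_csv: bool) -> None:
    """Exit with an explanatory message if csv.gz output is requested without opt-in."""
    if output_path.endswith(BLOCKED_SUFFIX) and not allow_csv:
        # Raising SystemExit with a string prints it to stderr and exits with status 1.
        raise SystemExit(
            f"Refusing to write {output_path}: csv.gz output is expensive to "
            "generate. Use an Arrow output instead, or pass --allow-csv if "
            "CSV is genuinely required."
        )


def main(argv=None) -> str:
    # Hypothetical CLI used only to illustrate where the check would sit.
    parser = argparse.ArgumentParser()
    parser.add_argument("--output", required=True)
    parser.add_argument("--allow-csv", action="store_true")
    args = parser.parse_args(argv)
    check_output_format(args.output, args.allow_csv)
    return args.output
```

With this sketch, `main(["--output", "dataset.csv.gz"])` exits with the message, while adding `"--allow-csv"` lets it through. Note that this is exactly the kind of flag the next comment warns could be copy/pasted between projects.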

This may need some coordination, as the docs need updating, and we need to make sure there's a good solution for viewing Arrow files in local development.

@evansd
Contributor

evansd commented Sep 2, 2024

Just copying some thoughts from a Slack thread so they're salient when we next come to look at this:

I'm fully behind the goal here, but I'm not sure adding flags like this is the way to do it. For one thing, users have shown a strong tendency to copy/paste stuff wholesale from previous projects. If someone adds the --allow-csv flag somewhere ("just to get things working"), it can start making its way into other projects without those users ever explicitly thinking about it.

In the first instance, I'd like to start by exploiting the fact that all our users' code is available to us and searchable, and run a regular (monthly?) check for people using CSV inappropriately.

We've also previously discussed having a more general "opensafely lint" step, which could check things like output formats (alongside lots of other stuff).
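A rough sketch of what the periodic search or a lint rule might look for, assuming each project's actions live in a `project.yaml` file somewhere under a local checkout directory. The function names are hypothetical, and the regex is a text heuristic rather than a proper YAML parse, so it will also flag `.csv.gz` paths that appear in comments.

```python
import re
from pathlib import Path

# Heuristic: any path-like token ending in ".csv.gz", e.g. "output/dataset.csv.gz".
CSV_GZ_PATTERN = re.compile(r"[\w./*-]+\.csv\.gz\b")


def find_csv_gz_outputs(text: str) -> list[str]:
    """Return every csv.gz path mentioned in a chunk of project config text."""
    return CSV_GZ_PATTERN.findall(text)


def scan_projects(root: Path) -> dict[str, list[str]]:
    """Map each project.yaml under root to the csv.gz paths it mentions."""
    hits: dict[str, list[str]] = {}
    for path in root.rglob("project.yaml"):
        text = path.read_text(encoding="utf-8", errors="replace")
        matches = find_csv_gz_outputs(text)
        if matches:
            hits[str(path)] = matches
    return hits
```

Reporting hits per file, instead of failing a build, fits the "check first, before changing the ehrql interface" approach suggested here.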

I wouldn't want to leap into adding weird shit to the ehrql interface as our first port of call.
