Add the possibility to download files and keep them in a variable #323
Force-pushed from e19055a to a280783.
I just pushed a replacement commit that went with the method suggested in issue #322 instead (using a Note that I had to use a
Thank you for your changes, @TytoCapensis. I'll find time to review them. Just from your comment, though, I'd question the type ignore: I'm afraid mypy is right in this case. A BytesIO is already an open buffer, ready for reading and writing, so you shouldn't call open on it, just write to it. Can you double-check?
This should be right, because there are two cases: either the argument is a

    if type(download_path) is BytesIO:
        for chunk in response.iter_content(chunk_size=4096):
            download_path.write(chunk)
        return download_path
    else:
        with open(download_path, "wb") as download_fp:  # type: ignore
            for chunk in response.iter_content(chunk_size=4096):
                download_fp.write(chunk)

But I am guessing mypy does not realize that, and raises a warning because it sees
That's because you're using:

    if type(download_path) is BytesIO

which mypy does not recognize properly. A more conventional check, which will also get rid of the type-hinting warning, is:

    if isinstance(download_path, BytesIO):
        ...
    else:
        ...

Can you check with this?
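To illustrate the `isinstance` suggestion, here is a minimal sketch, not the actual thehive4py code. The helper name `write_chunks` and its `chunks` argument (standing in for `response.iter_content(chunk_size=4096)`) are assumptions made for this example.

```python
from io import BytesIO
from os import PathLike
from typing import Iterable, Optional, Union


def write_chunks(
    chunks: Iterable[bytes],
    download_path: Union[str, PathLike, BytesIO],
) -> Optional[BytesIO]:
    """Hypothetical helper: write chunks to a buffer or to a file on disk."""
    if isinstance(download_path, BytesIO):
        # Narrowed to BytesIO: the buffer is already open, so write directly.
        for chunk in chunks:
            download_path.write(chunk)
        return download_path

    # Narrowed to str or PathLike here, so open() should type-check
    # without a "type: ignore" comment.
    with open(download_path, "wb") as download_fp:
        for chunk in chunks:
            download_fp.write(chunk)
    return None


buffer = write_chunks([b"abc", b"def"], BytesIO())
```

Because `isinstance` is a narrowing construct that type checkers understand, each branch sees only the types it can actually handle.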
Also, it might make sense to create a module, e.g. in types/_common.py:

    from io import BytesIO
    from os import PathLike

    PathOrBuffer = str | PathLike | BytesIO

And then just reuse this across the endpoints and the session modules. What do you think?
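As a sketch of how such a shared alias might be consumed, the snippet below spells it with `typing.Union`, since the `X | Y` form only evaluates at runtime on Python 3.10 and later. The `describe_target` helper is hypothetical and exists only to show the alias in a signature.

```python
from io import BytesIO
from os import PathLike, fspath
from typing import Union

# Same alias as suggested in the comment, written so that it also
# evaluates on Python versions older than 3.10.
PathOrBuffer = Union[str, PathLike, BytesIO]


def describe_target(target: PathOrBuffer) -> str:
    """Hypothetical helper: report where a download would end up."""
    if isinstance(target, BytesIO):
        return "in-memory buffer"
    # fspath() accepts both plain strings and PathLike objects.
    return f"file at {fspath(target)}"
```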
I've reworked the integration test setup. Can you please rebase on the latest main branch to check how it works out?
By the way, now that I'm taking a second look at this request, what are we really trying to solve here?
You mean creating a

    from thehive4py.types.case import PathOrBuffer

We could also import

I would not say that using a memory buffer is strictly mandatory (except in a specific environment, such as a read-only filesystem): it is more about optimization and adaptability. Most API libraries offer similar possibilities. I will address the other points (rebase, type checking...) soon.
Force-pushed from a280783 to d76a517.
Changes made! I also had to create
I disagree with the above: this is the perfect use case for the

    import os
    import tempfile

    with tempfile.TemporaryDirectory() as tmpdir:
        tmp_path = os.path.join(tmpdir, "my-attachment.path")
        hive.alert.download_attachment(alert_id="~1234", download_path=tmp_path)

        # then one can upload based on a filepath
        other_service.upload_attachment(tmp_path)

        # or can open it for read and pass it as a buffer
        with open(tmp_path, "rb") as tmp_buffer:
            other_service.upload_attachment(tmp_buffer)

With this you don't have to care about cleanup or name conflicts at all. It might not be as elegant as reading and writing to a buffer at the same time, but on the other hand it is much easier to understand, in my opinion.
That is an interesting solution; I was not aware of the However, I still think the library should offer a way for people to keep the file in a variable, instead of forcing a download to the filesystem. Most API libraries I know in the Python ecosystem only give the raw bytes and prefer to leave the file saving to the user (and some libs only return a URL that has to be requested). In the case of TheHive, the potentially large files we could encounter justify having the download function handle the file saving (so as not to fill up memory too much), but not at the price of removing the other possibility, in my opinion.

What I was originally thinking about was being able to omit the file path and get the file as raw bytes:

    f = thehive4py.case.download_attachment(case_id="myid", attachment_id="my_id")
    print(type(f))
    # <class 'bytes'>

And by specifying a download path, we get the file on the filesystem:

    thehive4py.case.download_attachment(case_id="myid", attachment_id="my_id", attachment_path="/tmp/myfile.txt")

However, That being said, using the
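The two calling styles described in this comment could look roughly like the sketch below. It is not thehive4py's API: `save_or_return` and its `chunks` argument (a stand-in for a streamed HTTP response body) are hypothetical names used only to show the idea.

```python
from typing import Iterable, Optional


def save_or_return(
    chunks: Iterable[bytes], attachment_path: Optional[str] = None
) -> Optional[bytes]:
    """Hypothetical: return raw bytes when no path is given, else write a file."""
    if attachment_path is None:
        # No path: the whole payload is held in memory and returned.
        return b"".join(chunks)

    # Path given: write chunk by chunk to disk to keep memory use low.
    with open(attachment_path, "wb") as fp:
        for chunk in chunks:
            fp.write(chunk)
    return None


f = save_or_return([b"abc", b"def"])
print(type(f))  # <class 'bytes'>
```

The trade-off raised in the thread still applies: the in-memory branch loads the entire attachment at once, which matters for large files.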
This is what I mean when I say you don't need to return anything:

    from io import BytesIO

    from thehive4py.client import TheHiveApi

    hive = TheHiveApi(
        url="http://localhost:9000", username="[email protected]", password="secret"
    )

    my_case = hive.case.create({"title": "...", "description": "..."})

    attachment_path = "my-attachment.txt"
    attachment_content = b"abcdef"
    with open("my-attachment.txt", "wb") as attachment_fp:
        attachment_fp.write(attachment_content)

    case_attachments = hive.case.add_attachment(
        case_id=my_case["_id"], attachment_paths=[attachment_path]
    )

    # this is the buffer that you pass to the call and then reuse it
    download_buffer = BytesIO()
    hive.case.download_attachment(
        case_id=my_case["_id"],
        attachment_id=case_attachments[0]["_id"],
        attachment_path=download_buffer,
    )
    hive.case.delete(case_id=my_case["_id"])

    assert download_buffer.getvalue() == attachment_content

As you can see, you just create the case, upload the attachment, and then initialize your buffer and pass it to the call. Once the lib has written the content to your buffer, you are free to use it for whatever you want. I understand your point, but I'm still not convinced that we need to complicate the lib internals with something that would be used in 20% of the use cases, while 80% of people will eventually save the downloads to disk or use temp files. And you know the saying in software: no is temporary but yes is forever :)
Let's leave this for another day.
Related to issue #322

As of today, all functions related to file downloads take a file path as an argument and write the file to the filesystem themselves (by using the functions in the session.py module).

This can cause issues because:

This pull request:
- Accepts False as a "filepath" in all file download functions, to indicate that the file should not be written to the filesystem but instead returned by the function, so it can be stored in a variable
- Uses False as the default for all file download functions (currently there is no default, so the user has to provide a file path regardless)