What is the purpose of checksum and what can I expect #1509
Hi there, very cool project, and I am very happy I stumbled across it at just the right time. I am implementing yet another synchronization app, and fsspec already provides access to every file storage I could think of. Another benefit (at least for 10 seconds) is that fsspec already provides a checksum method, which might come in handy when comparing files across different storage backends. You might already guess my question: why does checksum behave completely differently for different filesystem implementations? I would assume that this function follows a base specification on all filesystems, which would make the results interchangeable, but it seems not. Is this expected? Or do you see a chance to update the behavior to follow a common scheme? For me this is not a show stopper; I will just use another hashing algorithm. And again, thanks for this awesome library. BR ladi
Replies: 1 comment 1 reply
fsspec does not do any checksumming of its own. The method `checksum` could perhaps be better named "UID" or similar: it is a value based on whatever information the target storage provides for the path in question. For some backends that information includes read checksums (e.g., on S3 or GCS), but for others it does not (e.g., local filesystem); it also might be different every time (e.g., HTTP responses). In addition, some backends verify checksums on write, in which case fsspec does compute a checksum over the bytes as they are written.
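For comparing files across backends, as the question describes, one option is to hash the file contents yourself instead of relying on `checksum`. The following is only a minimal sketch, not something fsspec provides: the helper names `content_digest` and `same_content`, the SHA-256 default, and the chunk size are all assumptions made for illustration. The only thing it expects from a filesystem object is an fsspec-style `open(path, "rb")` method.

```python
import hashlib


def content_digest(fs, path, algorithm="sha256", chunk_size=1024 * 1024):
    """Hash the bytes of `path` as read through `fs`.

    `fs` is any object exposing an fsspec-style `open(path, "rb")`;
    this helper is hypothetical and not part of fsspec itself.
    """
    digest = hashlib.new(algorithm)
    with fs.open(path, "rb") as f:
        # Read in chunks so large files are never held in memory at once.
        while True:
            chunk = f.read(chunk_size)
            if not chunk:
                break
            digest.update(chunk)
    return digest.hexdigest()


def same_content(fs_a, path_a, fs_b, path_b):
    """Return True if both files contain identical bytes (by digest)."""
    return content_digest(fs_a, path_a) == content_digest(fs_b, path_b)
```

With fsspec installed, a call might look like `same_content(fsspec.filesystem("file"), "/tmp/a.bin", fsspec.filesystem("memory"), "/a.bin")`, where the paths are placeholders. Note that this reads every byte of both files, so it costs a full download per remote file, unlike `checksum`, which only uses metadata the backend already reports.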