Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

transferManager.downloadManyFiles not writing files locally #2200

Closed
hochoy opened this issue May 18, 2023 · 7 comments · Fixed by #2199
Closed

transferManager.downloadManyFiles not writing files locally #2200

hochoy opened this issue May 18, 2023 · 7 comments · Fixed by #2199
Assignees
Labels
api: storage Issues related to the googleapis/nodejs-storage API. priority: p3 Desirable enhancement or fix. May not be included in next release. type: feature request ‘Nice-to-have’ improvement, new feature or different behavior or design.

Comments

@hochoy
Copy link
Contributor

hochoy commented May 18, 2023

Issue summary

I hesitated to split this into 2 issues because they are rather tightly coupled

  1. Issue 1: When prefix is not set, transferManager.downloadManyFiles does not actually write files locally. This is because destination is undefined.
  2. Related issue 2: When prefix is set, file.download(...) attempts to write the file, but is unable to recursively create directory structure on local machine.

Proposed draft PR

#2199

Standard questions

  1. Is this a client library issue or a product issue? - client library
  2. Did someone already solve this? - it hasn't been asked yet in the Issue or Stack overflow
  3. Do you have a support contract? - At work yes, but this is a personal contribution outside work

Environment details

  • OS: macOS Catalina
  • Node.js version: v16.15.0
  • npm version: 8.5.5
  • @google-cloud/storage version: 6.10.1

Steps to reproduce

Issue 1:

  1. Implement code example for 'downloadManyFiles' without prefix
async function downloadDirectory({ bucketId, sourceDirectory }) {
  const storage = new Storage();
  const transferManager = new TransferManager(storage.bucket(bucketId));
  const options = {};

  async function downloadFolderWithTransferManager() {
    await transferManager.downloadManyFiles(sourceDirectory, options);
  }
  const res = await downloadFolderWithTransferManager();
}
  1. Running the function returns no error. However, no files are written locally, and a well-placed console.warn shows that this is because the destination is undefined. The draft PR contains a suggestion to always generate the destination.

Screen Shot 2023-05-18 at 12 53 20 AM

Issue 2:

  1. Implement same example for 'downloadManyFiles', except we now set prefix
async function downloadDirectory({ bucketId, sourceDirectory }) {
  const storage = new Storage();
  const transferManager = new TransferManager(storage.bucket(bucketId));

  const options = {
    prefix: "hello",
  };
  async function downloadFolderWithTransferManager() {
    await transferManager.downloadManyFiles(sourceDirectory, options);
  }
  const res = await downloadFolderWithTransferManager();
}
  1. Running the function returns
ENOENT: no such file or directory, open 'hello/test-folder/'

Screen Shot 2023-05-18 at 12 38 18 AM

This is due to the file stream at

nodejs-storage/src/file.ts

Lines 2108 to 2111 in fe9e3a4

fileStream
.pipe(writable)
.on('error', callback)
.on('finish', callback);
attempting to save a directory object locally. It appears that google storage api returns empty directory objects according to google docs here https://cloud.google.com/knowledge/kb/listing-only-objects-using-gsutil-and-not-directories-000004568. The draft PR proposes to filter out these empty directory objects from the final write.

Thanks, it is my first issue and fork of the repo! Open to feedback. I've already signed the CLA, but kept the PR as draft until the codeowner reviews the issue/intention of the existing design.

@hochoy hochoy added priority: p2 Moderately-important priority. Fix may not be included in next release. type: bug Error or flaw in code with unintended results or allowing sub-optimal usage patterns. labels May 18, 2023
@product-auto-label product-auto-label bot added the api: storage Issues related to the googleapis/nodejs-storage API. label May 18, 2023
@ddelgrosso1
Copy link
Contributor

Hi @hochoy thanks for opening this issue! We have been interested in hearing feedback about TransferManager since we initially implemented it late last year and yours is the first we have received.

Regarding the first issue you raised about nothing being written when there is no prefix this was the intended design (and not saying it is necessarily the best). As you noted / saw under the hood TransferManager utilizes a call to file.download. When file.download is called without a destination it is designed to download the file into memory. Our intention was to mimic this behavior in TransferManager but looking at it now I can see how this isn't clear. I also wonder about the use case of downloading many files into memory. I am apt to agree with you in removing the conditional check and defaulting to the file name if nothing else is provided.

As for the second issue this is a good catch and I think the fix looks good. I will just do some local testing to verify.

Please feel free to take your PR out of draft state if you feel it is ready for review and I will comment there accordingly. We much appreciate the contribution.

@ddelgrosso1 ddelgrosso1 self-assigned this May 18, 2023
@hochoy
Copy link
Contributor Author

hochoy commented May 18, 2023

Thank you, did not expect such a prompt reply. I'll take it out of draft. - done ✅

Regarding your first issue comment, I had suspected that might have been the intent. Writing files to the local machine can often be undesirable to the user of the TransferManager. Especially if they are running code on ephemeral virtual environments.

However, I'm not sure how I can access the files downloaded in memory? The underlying files.download() expects a second cb argument, but it seems it is not accessible from TransferManager.downloadManyFiles(). passThroughOptions seems to only flow to the first argument for files.download().

EDITED: Correction, it looks like I could just const data = await transferManager.downloadManyFiles(...) and then parse the nested array of buffers. I got confused with the callbacks, until I noticed that this library uses promisifyAll.

@ddelgrosso1 ddelgrosso1 added priority: p3 Desirable enhancement or fix. May not be included in next release. and removed priority: p2 Moderately-important priority. Fix may not be included in next release. labels Jul 20, 2023
@andrewsg andrewsg added type: feature request ‘Nice-to-have’ improvement, new feature or different behavior or design. and removed type: bug Error or flaw in code with unintended results or allowing sub-optimal usage patterns. labels Oct 7, 2023
@stabback
Copy link

stabback commented Mar 4, 2024

Hey there, any movement on this? As is, it doesn't look like you can use downloadManyFiles to download files to the local filesystem.

@convers39
Copy link

convers39 commented Mar 5, 2024

This API is confusing. The response object gives you Buffer. Even if you can write the Buffer into a local file, you still need the filename at least (or other metadata) about the file object, not just the Buffer, and it seems no way to retrieve that metadata alone through this API. The passthroughOptions.destination seems can not define a target directory, instead it will create a file.

      const localPath = path.resolve(process.cwd(), `dest`);
      
      const downloadRes = await transferManager.downloadManyFiles('active', {
        // passthroughOptions: {
        //   destination: localPath, 
        // },
      });

      if (Array.isArray(downloadRes)) {
        downloadRes.map((res, idx) => {
          writeFileSync(path.resolve(localPath, `file-${idx}`), res[0]);
        });
      }

The documentation and code example do not help much. For those who want to download multiple files, I would suggest the humble download API instead of the transfer manager.

      const [files] = await storage.bucket(bucketName).getFiles({ prefix: 'folder' });

      console.log('files', files);
      // make directories
      files
        .filter((f) => f.name.endsWith('/'))
        .forEach((dir) => {
          const folder = path.resolve(localPath, dir.name);
          if (!existsSync(folder)) {
            mkdirSync(folder, { recursive: true });
          }
        });
      // download files
      const res = await Promise.all(
        files
          .filter((f) => !f.name.endsWith('/'))
          .map(async (f) => {
            const destination = path.resolve(localPath, f.name);
            await client.file(f.name).download({ destination });
          }),
      );

Concerns

  • not sure if this will hit the concurrent download limit/quota(if there is any), but it would be better to limit the concurrent requests anyway.
  • nested folders need extra handling, the download method will not create the directory for nested folders in the prefix folder(if you inspect the files from getFiles, the dir will also be listed, and we have to create directories for all file which have a name ending with /.)
  • dealing with 1 single file would be much easier, trying to zip all files into a single file would be a better choice if possible, for example, the files should be uploaded in zip and also downloaded in zip, only if there is no need for accessing individual files.

In Python sdk there is a similar API called download_many_to_path but it does download files into a local directory, I think this is what we need.

https://cloud.google.com/storage/docs/samples/storage-transfer-manager-download-bucket

@ddelgrosso1
Copy link
Contributor

Hi @stabback and @convers39 I will circle back on this and try and pick up where the previous PR left off. I will update accordingly.

@Snugug
Copy link

Snugug commented Jul 19, 2024

Hey @ddelgrosso1, sorry for commenting on a closed issue, but I'm running into the same issue in @google-cloud/storage v7.12.0. Strangely enough, though, it's not consistently failing.

I've got two folders, each with subfolders, in storage, with files in each. I'm calling await downloadFiles(''); and the first folder that's downloaded, it's subfolder is downloaded as expected. The second folder all of its immediate children are properly downloaded, but the moment it tries to write the first file to its subfolder, it fails with "no such file or directory" and an ENOENT error.

Update

I wound up deleting the parent folder in Storage of that was causing download errors and recreating it, and downloading works now as expected. I'm not sure if the problem was Storage-side or client-side.

@ddelgrosso1
Copy link
Contributor

@Snugug if you hit the same problem again would you mind opening a new issue so that I can take a look / resolve it?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
api: storage Issues related to the googleapis/nodejs-storage API. priority: p3 Desirable enhancement or fix. May not be included in next release. type: feature request ‘Nice-to-have’ improvement, new feature or different behavior or design.
Projects
None yet
Development

Successfully merging a pull request may close this issue.

7 participants