
[Site Request] uploadir.com #3162

Closed
AlttiRi opened this issue Nov 5, 2022 · 8 comments


@AlttiRi

AlttiRi commented Nov 5, 2022

First, I need to say:

# Uploadir is shutting down on December 30th, 2022

https://uploadir.com/

So now is the time to download content from this site, if you haven't already.

In fact, that's the main reason I created this issue: to warn the people who lurk here.


OK, about the site:

Most URLs point directly to a file.

An archive URL leads to a web page instead.


A short archive URL redirects to a long version:
https://uploadir.com/u/5ps2qm6z
->
https://uploadir.com/uploads/5ps2qm6z/downloads/new

A long URL for a non-archive upload redirects to the plain /uploads/<id> version:
https://uploadir.com/uploads/rd3t46ry/downloads/new
->
https://uploadir.com/uploads/rd3t46ry

Some users may accidentally post a URL that looks like this:
https://uploadir.com/user/uploads/rd3t46ry
It should be fixed by replacing /user/uploads/ with /u/.
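The URL variants above can be collapsed into one canonical form before extraction. A minimal sketch, assuming upload IDs are lowercase alphanumeric (as in every example in this thread); `normalizeUploadirUrl` is a hypothetical helper name, not gallery-dl code:

```javascript
// Hypothetical helper: maps the uploadir.com URL variants described above
// to { id, url }, where url is the short /u/<id> form.
// Assumption: upload IDs are lowercase letters and digits.
function normalizeUploadirUrl(input) {
    const url = new URL(input);
    if (url.hostname !== "uploadir.com" && url.hostname !== "www.uploadir.com") {
        return null;
    }
    // Accepted paths:
    //   /u/<id>
    //   /uploads/<id>
    //   /uploads/<id>/downloads/new
    //   /user/uploads/<id>   (mistaken form, rewritten to /u/<id>)
    const match = url.pathname.match(
        /^\/(?:u|uploads|user\/uploads)\/([a-z0-9]+)(?:\/downloads\/new)?\/?$/
    );
    if (!match) {
        return null;
    }
    const id = match[1];
    return { id, url: `https://uploadir.com/u/${id}` };
}
```

Keeping the ID separately from the URL is deliberate: it is the value that identifies an upload regardless of which URL form was posted.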


The original filename is available in the Content-Disposition header.
The Last-Modified header is also present.
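Turning a Last-Modified value (an HTTP date such as `Tue, 04 Jan 2022 14:12:48 GMT`) into a date string usable in a filename could look like the following sketch. `lastModifiedToDateString` is a hypothetical name, and the `YYYY.MM.DD` output only mirrors the `%Y.%m.%d` format used in this thread:

```javascript
// Hypothetical helper: converts an HTTP Last-Modified header value into
// a "YYYY.MM.DD" string (UTC), or null if the header is missing or invalid.
function lastModifiedToDateString(headerValue) {
    if (!headerValue) {
        return null;
    }
    // HTTP dates use the same layout as Date.prototype.toUTCString(),
    // which Date parsing is expected to accept.
    const date = new Date(headerValue);
    if (Number.isNaN(date.getTime())) {
        return null;
    }
    const pad = (n) => String(n).padStart(2, "0");
    return `${date.getUTCFullYear()}.${pad(date.getUTCMonth() + 1)}.${pad(date.getUTCDate())}`;
}

console.log(lastModifiedToDateString("Tue, 04 Jan 2022 14:12:48 GMT")); // → 2022.01.04
```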

@mikf
Owner

mikf commented Nov 5, 2022

Support added with commit ccb80f1.

Last-Modified headers are only present for GET requests, so they can't be retrieved before the actual download, and using metadata from HTTP headers during downloads is not yet supported.

@AlttiRi
Author

AlttiRi commented Nov 5, 2022

For example, https://uploadir.com/uploads/gxe8ti9v/downloads/new:

Clicking the download button sends a request to
https://uploadir.com/uploads/gxe8ti9v/downloads (Status Code: 302)
which redirects to
https://uploadir.com/uploads/gxe8ti9v
and that response has a Last-Modified header.

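// Run in the browser console on https://uploadir.com/uploads/gxe8ti9v/downloads/new
// Read the CSRF token from the download form
let token = document.querySelector(`form > input[name="authenticity_token"]`).value;
// Submit the form the same way the "Download" button does; fetch() follows the 302 redirect
let resp = await fetch("https://uploadir.com/uploads/gxe8ti9v/downloads", {
  "headers": {
    "content-type": "application/x-www-form-urlencoded",
  },
  "body": "authenticity_token=" + encodeURIComponent(token) + "&upload_id=gxe8ti9v&commit=Download",
  "method": "POST",
});
// The headers are available as soon as fetch() resolves, before the body is read
console.log(resp.headers.get("content-disposition"));
console.log(resp.headers.get("last-modified"));
// Read the body to complete the download
await resp.blob();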

// attachment; filename="NYAN-Mods-Pack#1.zip"; filename*=UTF-8''NYAN-Mods-Pack#1.zip
// Tue, 04 Jan 2022 14:12:48 GMT
// Blob {size: 3405665, type: "application/zip"}

Technically, there is no difference between GET and POST requests when it comes to reading the headers.
An HTTP response always sends the headers first, and only then (optionally) a body.


I have checked it now: it correctly sets the file's mtime, but I can't use the date in the filename:

        "uploadir": {
            "filename": "[{category}] {date:%Y.%m.%d}—{id|_http_data[upload_id]}—{filename}.{extension}"
        },
uploadir: FilenameFormatError: Applying filename format string failed (TypeError: unsupported format string passed to NoneType.__format__)

The error occurs for both archives and images/videos.

Images/videos also lack the id key (_http_data[upload_id]).

I think id is the most important key:

  • It's required for the download history.

  • It's also needed in the filename, to avoid wrongly skipping a file when different uploads (different upload ids) have the same filename in the Content-Disposition header.
    The current default filename pattern has this issue.
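To illustrate why the upload id matters: two different uploads can share the same Content-Disposition filename, so a skip check keyed on the filename alone treats the second upload as already downloaded. A minimal sketch, assuming a download history kept as a `Set` of keys; the key format, the IDs, and the helper names are hypothetical and are not gallery-dl's actual `archive_fmt`:

```javascript
// Hypothetical download-history key: one entry per upload id.
function historyKey(category, id) {
    return `${category}:${id}`;
}

// Hypothetical filename pattern that includes the id, so same-named files
// from different uploads do not collide on disk.
function makeFilename({ category, id, filename, extension }) {
    return `[${category}] ${id}_${filename}.${extension}`;
}

// Returns true (and records the key) if the upload has not been seen yet.
function shouldDownload(history, file) {
    const key = historyKey(file.category, file.id);
    if (history.has(key)) {
        return false;
    }
    history.add(key);
    return true;
}

// Usage with placeholder IDs: both uploads are named "image.png".
const seen = new Set();
const first = { category: "uploadir", id: "aaaaaaaa", filename: "image", extension: "png" };
const second = { ...first, id: "bbbbbbbb" };
shouldDownload(seen, first);  // true
shouldDownload(seen, second); // true: same filename, different upload id
shouldDownload(seen, first);  // false: already recorded
```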

mikf added a commit that referenced this issue Nov 5, 2022
- prevent extra HTTP request from redirects
- add 'id' metadata field
- set 'filename_fmt' and 'archive_fmt'
@AlttiRi
Author

AlttiRi commented Nov 5, 2022

Also, gallery-dl should prefer the filename*=UTF-8'' value (when present) when parsing a filename from the Content-Disposition header.

https://uploadir.com/u/fllda6xl

produces ______image_.bmp, while it should be _圖片_🖼_image_.png

(bmp? Well, acceptable. It's possibly the real format.)

This happens because the header is:
content-disposition: inline; filename="_%3F%3F_%3F_image_.png"; filename*=UTF-8''_%E5%9C%96%E7%89%87_%F0%9F%96%BC_image_.png

(Usually filename= is a ByteString (UTF-8 bytes as Latin-1 characters within a string), but not in this case. Either way, filename*=UTF-8'' should be preferred, since it can always be decoded without problems.)
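A parameter-based way to apply that preference, independent of where `filename*` appears in the header: split the header into parameters, decode `filename*` using its `charset'language'value` syntax (RFC 5987 / RFC 6266), and fall back to `filename`. This is a simplified sketch; it assumes no `;` or escaped quotes inside quoted values and decodes only the UTF-8 charset, and `pickContentDispositionFilename` is a hypothetical name:

```javascript
// Simplified Content-Disposition filename extraction that prefers filename*.
// Assumptions: no ';' inside quoted strings, no backslash escapes,
// and only UTF-8 is decoded for filename*.
function pickContentDispositionFilename(header) {
    const params = new Map();
    // Skip the disposition type ("inline" / "attachment").
    for (const part of header.split(";").slice(1)) {
        const eq = part.indexOf("=");
        if (eq === -1) continue;
        const name = part.slice(0, eq).trim().toLowerCase();
        let value = part.slice(eq + 1).trim();
        if (value.length >= 2 && value.startsWith('"') && value.endsWith('"')) {
            value = value.slice(1, -1);
        }
        params.set(name, value);
    }

    // filename*=<charset>'<language>'<percent-encoded value>
    const extended = params.get("filename*");
    if (extended) {
        const m = extended.match(/^([^']*)'[^']*'(.*)$/);
        if (m && m[1].toLowerCase() === "utf-8") {
            try {
                return decodeURIComponent(m[2]);
            } catch {
                // Malformed percent-encoding: fall back to filename=
            }
        }
    }
    return params.get("filename") ?? null;
}
```

For the header quoted above, this returns `_圖片_🖼_image_.png` rather than the `?`-substituted `filename=` value.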

@AlttiRi
Author

AlttiRi commented Nov 5, 2022

Here is what I use when parsing Content-Disposition:

/**
 * @param {string} header
 * @param {Object} opts
 * @param {boolean} [opts.decode=false] In rare cases `filename=` may also be URI-component encoded
 * @param {boolean} [opts.isBinary=true] The header value is usually a binary string.
 * @return {string}
 */
export function getContentDispositionFilename(header, opts = {}) {
    const {
        decode, isBinary
    } = Object.assign({decode: false, isBinary: true}, opts);

    if (isBinary) {
        header = binaryStringToString(header);
    }

    // RFC 5987:
    //     [1] inline; filename="file.jpg"; filename*=UTF-8''file.jpg
    // The charset name is case-insensitive, hence the `i` flag.
    const encodedFilename = header.match(/(?<=filename\*=UTF-8'')[^;]+(?=;?$)/i)?.[0]; // [1]
    if (encodedFilename) {
        return decodeURIComponent(encodedFilename);
    }

    // Quoted:
    //     [2] inline; filename="file.jpg"
    // Without quotes:
    //     [3] attachment; filename=file.jpg
    // Falls back to an empty string (instead of throwing) if no filename is present.
    const filename = header.match(/(?<=filename=").+(?="$)/)?.[0] // [2]
                  || header.match(/(?<=filename=).+$/)?.[0]       // [3]
                  || "";
    if (decode) {
        return decodeURIComponent(filename);
    }
    return filename;
}

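// Reinterpret a "binary string" (one byte per char code, as fetch() returns
// header values) as UTF-8 text.
function binaryStringToString(bString) {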
    return new TextDecoder().decode(binaryStringToArrayBuffer(bString));
}
function binaryStringToArrayBuffer(binaryString) {
    const u8Array = new Uint8Array(binaryString.length);
    for (let i = 0; i < binaryString.length; i++) {
        u8Array[i] = binaryString.charCodeAt(i);
    }
    return u8Array;
}

@AlttiRi
Author

AlttiRi commented Nov 5, 2022

After b7a83ac, the date is still missing, while the filename is present.
However, both can be taken from the HTTP headers.

Storing the date in the filename is really convenient, since it's always visible: there's no need to check the file's properties or switch to a table view to see the file's mtime. It also persists after sending the file over the Internet or extracting it from an archive.

mikf added a commit that referenced this issue Nov 5, 2022
@mikf
Owner

mikf commented Nov 5, 2022

> However, both things may be taken from HTTP headers.

Not entirely true. In #3162 (comment) I meant "Last-Modified headers are not present for HEAD requests, but only when downloading a file". The data extraction code cannot send a GET/POST/whatever since that would download a file twice.

To compensate for that, I added an http-metadata option in 870e6a4, which allows using HTTP header data in filenames / as metadata, but it is very fiddly at the moment.

For example

{
    "http-metadata": "http",
    "filename": "{http[date]:?//%Y%m%d}"
}

edit: This data is only available during/after a file download. A skipped file, for example, does not have an http field.
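As I understand the format string above, the `?//` conditional makes the field produce an empty string when `http[date]` is missing (for example for a skipped file), instead of failing like the earlier `NoneType.__format__` error. An illustrative JavaScript analogue, not gallery-dl code; `formatOptionalDate` and the `{ http: { date } }` metadata shape are assumptions:

```javascript
// Illustrative analogue of a conditional date field: format the date as
// YYYYMMDD (UTC) when present, otherwise return an empty string.
// Assumption: metadata looks like { http: { date: Date } }.
function formatOptionalDate(metadata) {
    const date = metadata.http?.date;
    if (!(date instanceof Date) || Number.isNaN(date.getTime())) {
        return "";
    }
    const pad = (n) => String(n).padStart(2, "0");
    return `${date.getUTCFullYear()}${pad(date.getUTCMonth() + 1)}${pad(date.getUTCDate())}`;
}

formatOptionalDate({ http: { date: new Date(Date.UTC(2022, 7, 10)) } }); // "20220810"
formatOptionalDate({});                                                   // "" (no http field)
```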

@AlttiRi
Author

AlttiRi commented Nov 5, 2022

let resp = await fetch("https://uploadir.com/u/3v5j70zd", {
    method: "head"
});
console.log(resp.headers.get("content-disposition"));
console.log(resp.headers.get("last-modified"));
await resp.blob();

// inline; filename="tifa.mp4"; filename*=UTF-8''tifa.mp4
// Wed, 10 Aug 2022 14:02:05 GMT
// Blob {size: 0, type: "video/mp4"}

Maybe I'm testing something wrong, but Last-Modified is present even with a HEAD request.


BTW, another approach is to start downloading the file, generate the filename (using information from the headers) once the headers are received, and then abort the request if the file has already been downloaded.
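That approach relies on `fetch()` resolving as soon as the status line and headers arrive, before the body is read. A minimal sketch; `downloadUnlessExists`, the `exists(name)` callback, and the simple `filename="..."` extraction are hypothetical, and `fetchImpl` is injectable only so the logic can be exercised without network access:

```javascript
// Sketch: start the request, derive a name from the headers, and cancel
// the transfer if that file is already present.
// Assumptions: `exists` is a caller-supplied check (disk or download history);
// the name comes from a plain quoted filename= parameter.
async function downloadUnlessExists(url, exists, fetchImpl = fetch) {
    const controller = new AbortController();
    // Resolves once the response headers are received.
    const resp = await fetchImpl(url, { signal: controller.signal });

    const disposition = resp.headers.get("content-disposition") || "";
    const match = disposition.match(/filename="([^"]+)"/);
    const name = match ? match[1] : null;
    const lastModified = resp.headers.get("last-modified");

    if (name && (await exists(name))) {
        // Stop before the body is transferred.
        controller.abort();
        return { skipped: true, name, lastModified };
    }
    const body = await resp.arrayBuffer();
    return { skipped: false, name, lastModified, size: body.byteLength };
}
```

The skip decision happens after the headers arrive, so date and filename from Last-Modified and Content-Disposition are available even for files that end up skipped.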


This workaround works fine.

        "uploadir": {
            "filename": "[{category}] {http[date]:?//%Y.%m.%d}—{id}—{filename}.{extension}",
            "http-metadata": "http"
        },

mikf referenced this issue Nov 11, 2022
or at least attempt to.
@mikf mikf closed this as completed Nov 20, 2022
@AlttiRi AlttiRi changed the title [Site Request] uploadir.com (the site is closing) [Site Request] uploadir.com ~(the site is closing)~ Jan 25, 2023
@AlttiRi AlttiRi changed the title [Site Request] uploadir.com ~(the site is closing)~ [Site Request] uploadir.com Jan 25, 2023
@AlttiRi
Author

AlttiRi commented Jan 25, 2023

Fortunately, the site has not been closed.

2 participants