[Site Request] uploadir.com #3162

AlttiRi · 2022-11-05T02:10:31Z

~~First, I need to say:~~

~~# Uploadir is shutting down 30th of December 2022~~

https://uploadir.com/

So it's time to download any content you still need from this site, if you have not done so yet.

In fact, that is the main reason I created this issue:
just to warn the people who lurk here.

OK, about the site:

Most URLs point directly to a file:

A URL to an archive leads to a web page:

A short archive URL redirects to a long version:
https://uploadir.com/u/5ps2qm6z
->
https://uploadir.com/uploads/5ps2qm6z/downloads/new

A long non-archive URL redirects to a /uploads/ version:
https://uploadir.com/uploads/rd3t46ry/downloads/new
->
https://uploadir.com/uploads/rd3t46ry

Some users may accidentally post a URL that looks like this:
https://uploadir.com/user/uploads/rd3t46ry
It should be fixed by replacing /user/uploads/ with /u/.
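
The URL forms listed above could be handled with a single pattern. This is only a sketch: the helper names `upload_id` and `canonical_url` are hypothetical, and the assumption that an upload ID is everything up to the next `/`, `?` or `#` is mine, not something the site documents.

```python
import re

# Matches the three URL shapes described above:
#   /u/<id>, /uploads/<id>[/downloads/new], /user/uploads/<id>
# Assumption: the ID runs up to the next "/", "?" or "#".
_UPLOAD_URL = re.compile(
    r"^https?://uploadir\.com/(?:user/uploads|uploads|u)/([^/?#]+)"
)


def upload_id(url):
    """Return the upload ID from an uploadir.com URL, or None if it does not match."""
    match = _UPLOAD_URL.match(url)
    return match.group(1) if match else None


def canonical_url(url):
    """Rewrite any recognized form (including /user/uploads/) to the short /u/ form."""
    uid = upload_id(url)
    return f"https://uploadir.com/u/{uid}" if uid else None


print(canonical_url("https://uploadir.com/user/uploads/rd3t46ry"))
# -> https://uploadir.com/u/rd3t46ry
```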

The original filename is available in the Content-Disposition header.
The Last-Modified header is also present.

mikf · 2022-11-05T13:37:13Z

Support added with commit ccb80f1.

Last-Modified headers are only present for GET requests, so they can't be retrieved before the actual download and metadata from HTTP headers during downloads is not yet supported.

AlttiRi · 2022-11-05T16:39:39Z

For example, https://uploadir.com/uploads/gxe8ti9v/downloads/new:

Clicking the download button sends a request to:
https://uploadir.com/uploads/gxe8ti9v/downloads (Status Code: 302)
that redirects to
https://uploadir.com/uploads/gxe8ti9v
The response has a Last-Modified header.

let token = document.querySelector(`form > input[name="authenticity_token"]`).value;
let resp = await fetch("https://uploadir.com/uploads/gxe8ti9v/downloads", {
  "headers": {
    "content-type": "application/x-www-form-urlencoded",
  },
  "body": "authenticity_token=" + encodeURIComponent(token) + "&upload_id=gxe8ti9v&commit=Download",
  "method": "POST",
});
console.log(resp.headers.get("content-disposition"));
console.log(resp.headers.get("last-modified"));
await resp.blob();

// attachment; filename="NYAN-Mods-Pack#1.zip"; filename*=UTF-8''NYAN-Mods-Pack#1.zip
// Tue, 04 Jan 2022 14:12:48 GMT
// Blob {size: 3405665, type: "application/zip"}

Technically, there is no difference between GET and POST requests when it comes to reading the headers.
An HTTP response always delivers the headers first and only then (optionally) a body.

I have checked it now: it correctly sets the file's mtime, but I can't use date in the filename:

        "uploadir": {
            "filename": "[{category}] {date:%Y.%m.%d}—{id|_http_data[upload_id]}—{filename}.{extension}"
        },

uploadir: FilenameFormatError: Applying filename format string failed (TypeError: unsupported format string passed to NoneType.__format__)

The error is present for both archives and images/videos.
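
For context, the TypeError quoted above is what Python itself raises when a strftime-style format spec is applied to `None`, which suggests the `date` field is simply missing at that point. The sketch below only reproduces that behavior in plain Python. `format_date` is a hypothetical helper, not gallery-dl's formatter.

```python
from datetime import datetime


def format_date(value, spec="%Y.%m.%d", fallback=""):
    """Format a datetime with a strftime spec; return `fallback` when the value is None."""
    if value is None:
        return fallback
    return format(value, spec)


# Reproduce the failure: None does not accept a non-empty format spec.
try:
    format(None, "%Y.%m.%d")
except TypeError as exc:
    error_message = str(exc)

print(error_message)
# -> unsupported format string passed to NoneType.__format__
print(format_date(datetime(2022, 1, 4, 14, 12, 48)))
# -> 2022.01.04
```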

Images/videos additionally do not have an id key (_http_data[upload_id]).

I think id is the most important key.

It's required for the download history.
It is also needed in the filename to prevent a file from being wrongly skipped when different files (different upload IDs) have the same filename in the content-disposition header.
The current default filename pattern has this issue.

AlttiRi · 2022-11-05T17:32:00Z

Also, gallery-dl should prefer the filename*=UTF-8'' value (when present) when parsing a filename from the content-disposition header.

https://uploadir.com/u/fllda6xl

produces ______image_.bmp while it should be _圖片_🖼_image_.png

(bmp? Well, acceptable. It's possibly the real format.)

Since:
content-disposition: inline; filename="_%3F%3F_%3F_image_.png"; filename*=UTF-8''_%E5%9C%96%E7%89%87_%F0%9F%96%BC_image_.png

(Usually filename= is a ByteString (UTF-8 bytes represented as Latin-1 characters within a string), but not in this case. In any case, filename*=UTF-8'' should be preferred since it can always be decoded without problems.)
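
In Python, the language gallery-dl is written in, that preference could be sketched roughly as follows. The function names are hypothetical and this is not gallery-dl's actual parser. It only handles the `UTF-8''` form of `filename*` seen on this site, plus a helper for the ByteString case mentioned above.

```python
import re
from urllib.parse import unquote

# Extended parameter: filename*=UTF-8''<percent-encoded UTF-8>
_EXTENDED = re.compile(r"filename\*\s*=\s*UTF-8''([^;]+)", re.IGNORECASE)
# Plain parameter: filename="quoted" or filename=token
_PLAIN = re.compile(r'filename\s*=\s*(?:"([^"]*)"|([^;]+))', re.IGNORECASE)


def filename_from_content_disposition(header):
    """Return the filename from a Content-Disposition value, preferring filename*."""
    match = _EXTENDED.search(header)
    if match:
        return unquote(match.group(1).strip(), encoding="utf-8")
    match = _PLAIN.search(header)
    if match:
        quoted, token = match.groups()
        return quoted if quoted is not None else token.strip()
    return None


def decode_byte_string(value):
    """Reinterpret UTF-8 bytes that were stored as Latin-1 characters, if possible."""
    try:
        return value.encode("latin-1").decode("utf-8")
    except (UnicodeEncodeError, UnicodeDecodeError):
        return value  # not a ByteString in that sense; leave it untouched


header = ("inline; filename=\"_%3F%3F_%3F_image_.png\"; "
          "filename*=UTF-8''_%E5%9C%96%E7%89%87_%F0%9F%96%BC_image_.png")
print(filename_from_content_disposition(header))
# -> _圖片_🖼_image_.png
```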

AlttiRi · 2022-11-05T17:40:30Z

Here is what I use when parsing Content-Disposition:

/**
 * @param {string} header
 * @param {Object} opts
 * @param {Boolean} [opts.decode=false] `filename=` in rare cases may be also encoded as URIComponent
 * @param {Boolean} [opts.isBinary=true] The header usually is a binary string.
 * @return {string}
 */
export function getContentDispositionFilename(header, opts = {}) {
    const {
        decode, isBinary
    } = Object.assign({decode: false, isBinary: true}, opts);

    if (isBinary) {
        header = binaryStringToString(header);
    }

    // RFC 5987:
    //     [1] inline; filename="file.jpg"; filename*=UTF-8''file.jpg
    const encodedFilename = header.match(/(?<=filename\*=UTF-8'')[^;]+(?=;?$)/)?.[0]; // [1]
    if (encodedFilename) {
        return decodeURIComponent(encodedFilename);
    }

    // Quoted:
    //     [2] inline; filename="file.jpg"
    // Without quotes:
    //     [3] attachment; filename=file.jpg
    let filename = header.match(/(?<=filename=").+(?="$)/)?.[0] // [2]
                || header.match(/(?<=filename=).+$/)?.[0];      // [3] (`?.` avoids a TypeError when nothing matches)
    if (decode && filename) {
        return decodeURIComponent(filename);
    }
    return filename;
}

function binaryStringToString(bString) {
    return new TextDecoder().decode(binaryStringToArrayBuffer(bString));
}
function binaryStringToArrayBuffer(binaryString) {
    const u8Array = new Uint8Array(binaryString.length);
    for (let i = 0; i < binaryString.length; i++) {
        u8Array[i] = binaryString.charCodeAt(i);
    }
    return u8Array;
}

AlttiRi · 2022-11-05T17:57:02Z

After b7a83ac, date is still not present while filename is.
However, both things may be taken from HTTP headers.

Storing the date in a filename is really convenient since it's always visible: there is no need to check the file's properties or enable a table view to see the file's mtime. It also persists after the file is sent over the Internet or extracted from an archive.

mikf · 2022-11-05T18:05:54Z

> However, both things may be taken from HTTP headers.

Not entirely true. In #3162 (comment) I meant "Last-Modified headers are not present for HEAD requests, but only when downloading a file". The data extraction code cannot send a GET/POST/whatever since that would download a file twice.

To compensate for that, I added an http-metadata option in 870e6a4, which allows using HTTP header data in filenames / as metadata, but it is very fiddly at the moment.

For example

{
    "http-metadata": "http",
    "filename": "{http[date]:?//%Y%m%d}"
}

edit: This data is only available during/after a file download. A skipped file, for example, does not have an http field.

AlttiRi · 2022-11-05T18:23:15Z

let resp = await fetch("https://uploadir.com/u/3v5j70zd", {
    method: "head"
});
console.log(resp.headers.get("content-disposition"));
console.log(resp.headers.get("last-modified"));
await resp.blob();

// inline; filename="tifa.mp4"; filename*=UTF-8''tifa.mp4
// Wed, 10 Aug 2022 14:02:05 GMT
// Blob {size: 0, type: "video/mp4"}

Maybe I am testing something wrong, but last-modified is present even with a HEAD request.

BTW, another approach is to start downloading a file, generate the filename (using the information from the headers) on the "headers received" event, and then simply abort the request if the file has already been downloaded.

This workaround works fine.

        "uploadir": {
            "filename": "[{category}] {http[date]:?//%Y.%m.%d}—{id}—{filename}.{extension}",
            "http-metadata": "http"
        },
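
The other approach mentioned above (start the download, build the filename once the headers arrive, abort if the file already exists) could look roughly like this in Python. It is a sketch under assumptions: `download_unless_present`, `name_from_headers` and the `opener` parameter are hypothetical names, and this is not how gallery-dl's downloader is structured. It relies on `urllib.request.urlopen` returning a response whose headers are already parsed while the body has not yet been read.

```python
import os
import shutil
import urllib.request


def download_unless_present(url, directory, name_from_headers,
                            opener=urllib.request.urlopen):
    """Open `url`, derive the filename from the response headers, and only
    read the body if no file with that name exists yet.

    Returns (path, downloaded) where `downloaded` is False when skipped.
    """
    with opener(url) as response:
        # At this point only the status line and headers have been consumed.
        path = os.path.join(directory, name_from_headers(response.headers))
        if os.path.exists(path):
            # Leaving the `with` block closes the connection without
            # reading the body (a few buffered bytes may still have arrived).
            return path, False
        with open(path, "wb") as fp:
            shutil.copyfileobj(response, fp)
    return path, True
```

A `name_from_headers` callback could, for example, read `headers["Last-Modified"]` and `headers["Content-Disposition"]` and combine them into a dated filename.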

AlttiRi · 2023-01-25T19:31:18Z

Fortunately, the site has not been closed.

mikf added the site:support label Nov 5, 2022

mikf added a commit that referenced this issue Nov 5, 2022

[uploadir] add support for 'uploadir.com' (#3162)

ccb80f1

mikf added a commit that referenced this issue Nov 5, 2022

[uploadir] update (#3162)

b7a83ac

- prevent extra HTTP request from redirects
- add 'id' metadata field
- set 'filename_fmt' and 'archive_fmt'

mikf added a commit that referenced this issue Nov 5, 2022

[uploadir] use utf-8 filenames (#3162)

93e6bd6

mikf referenced this issue Nov 11, 2022

implement 'http-metadata' option

870e6a4

or at least attempt to.

mikf closed this as completed Nov 20, 2022

AlttiRi changed the title ~~[Site Request] uploadir.com (the site is closing)~~ [Site Request] uploadir.com ~(the site is closing)~ Jan 25, 2023

AlttiRi changed the title ~~[Site Request] uploadir.com ~(the site is closing)~~~ [Site Request] uploadir.com Jan 25, 2023

AlttiRi mentioned this issue Jan 25, 2023

[catbox] Direct links #3570

Closed
