Weibo download incomplete #4168

Open
kylieeeeeee opened this issue Jun 11, 2023 · 8 comments
@kylieeeeeee

kylieeeeeee commented Jun 11, 2023

No error occurred; the download just stopped at some point.
For example, the first post from this account is from 2016/10/21, but the download stopped at 2017/05/17 without any notice.

...
# .\gallery-dl\weibo\吴磊工作室\2017-06-14 10_36_38_1.mp4
# .\gallery-dl\weibo\吴磊工作室\2017-06-14 10_36_38_2.mp4
# .\gallery-dl\weibo\吴磊工作室\2017-06-14 10_36_38_3.mp4
# .\gallery-dl\weibo\吴磊工作室\2017-06-14 10_36_38_4.mp4
# .\gallery-dl\weibo\吴磊工作室\2017-06-14 10_36_38_5.mp4
# .\gallery-dl\weibo\吴磊工作室\2017-06-14 10_36_38_6.mp4
# .\gallery-dl\weibo\吴磊工作室\2017-06-14 10_36_38_7.mp4
# .\gallery-dl\weibo\吴磊工作室\2017-06-14 10_36_38_8.mp4
# .\gallery-dl\weibo\吴磊工作室\2017-06-14 10_36_38_9.mp4
# .\gallery-dl\weibo\吴磊工作室\2017-06-13 13_33_14_1.jpg
# .\gallery-dl\weibo\吴磊工作室\2017-06-13 13_33_14_2.jpg
# .\gallery-dl\weibo\吴磊工作室\2017-06-13 13_33_14_3.jpg
# .\gallery-dl\weibo\吴磊工作室\2017-06-13 13_33_14_4.jpg
# .\gallery-dl\weibo\吴磊工作室\2017-06-13 13_33_14_5.jpg
# .\gallery-dl\weibo\吴磊工作室\2017-06-13 13_33_14_6.jpg
# .\gallery-dl\weibo\吴磊工作室\2017-06-12 08_33_02_1.jpg
# .\gallery-dl\weibo\吴磊工作室\2017-06-09 11_33_07_1.jpg
# .\gallery-dl\weibo\吴磊工作室\2017-06-09 11_33_07_2.jpg
# .\gallery-dl\weibo\吴磊工作室\2017-06-09 11_33_07_3.jpg
# .\gallery-dl\weibo\吴磊工作室\2017-06-09 11_33_07_4.jpg
# .\gallery-dl\weibo\吴磊工作室\2017-06-09 11_33_07_5.jpg
# .\gallery-dl\weibo\吴磊工作室\2017-06-09 11_33_07_6.jpg
# .\gallery-dl\weibo\吴磊工作室\2017-06-09 11_33_07_7.jpg
# .\gallery-dl\weibo\吴磊工作室\2017-06-09 11_33_07_8.jpg
# .\gallery-dl\weibo\吴磊工作室\2017-06-09 10_33_13_1.jpg
# .\gallery-dl\weibo\吴磊工作室\2017-06-09 10_33_13_2.jpg
# .\gallery-dl\weibo\吴磊工作室\2017-06-09 10_33_13_3.jpg
# .\gallery-dl\weibo\吴磊工作室\2017-06-09 10_33_13_4.jpg
# .\gallery-dl\weibo\吴磊工作室\2017-06-06 09_33_15_1.jpg
# .\gallery-dl\weibo\吴磊工作室\2017-06-05 05_36_44_1.jpg
# .\gallery-dl\weibo\吴磊工作室\2017-06-03 15_47_00_1.jpg
# .\gallery-dl\weibo\吴磊工作室\2017-06-03 12_33_14_1.jpg
# .\gallery-dl\weibo\吴磊工作室\2017-06-03 12_33_14_2.jpg
# .\gallery-dl\weibo\吴磊工作室\2017-06-02 15_58_21_1.mp4
# .\gallery-dl\weibo\吴磊工作室\2017-06-02 12_17_24_1.jpg
# .\gallery-dl\weibo\吴磊工作室\2017-06-02 12_17_24_2.jpg
# .\gallery-dl\weibo\吴磊工作室\2017-06-02 12_17_24_3.jpg
# .\gallery-dl\weibo\吴磊工作室\2017-06-02 12_17_24_4.jpg
# .\gallery-dl\weibo\吴磊工作室\2017-06-02 12_17_24_5.jpg
# .\gallery-dl\weibo\吴磊工作室\2017-06-02 12_17_24_6.jpg
# .\gallery-dl\weibo\吴磊工作室\2017-06-02 12_17_24_7.jpg
# .\gallery-dl\weibo\吴磊工作室\2017-06-02 12_17_24_8.jpg
# .\gallery-dl\weibo\吴磊工作室\2017-06-02 12_17_24_9.jpg
# .\gallery-dl\weibo\吴磊工作室\2017-06-02 11_33_07_1.jpg
# .\gallery-dl\weibo\吴磊工作室\2017-06-01 12_33_07_1.jpg
# .\gallery-dl\weibo\吴磊工作室\2017-06-01 12_33_07_2.jpg
# .\gallery-dl\weibo\吴磊工作室\2017-06-01 12_33_07_3.jpg
# .\gallery-dl\weibo\吴磊工作室\2017-06-01 06_33_08_1.jpg
# .\gallery-dl\weibo\吴磊工作室\2017-05-30 10_34_03_1.jpg
# .\gallery-dl\weibo\吴磊工作室\2017-05-30 10_34_03_2.jpg
# .\gallery-dl\weibo\吴磊工作室\2017-05-30 10_34_03_3.jpg
# .\gallery-dl\weibo\吴磊工作室\2017-05-30 10_34_03_4.jpg
# .\gallery-dl\weibo\吴磊工作室\2017-05-26 11_33_17_1.jpg
# .\gallery-dl\weibo\吴磊工作室\2017-05-26 11_33_17_2.jpg
# .\gallery-dl\weibo\吴磊工作室\2017-05-26 11_33_17_3.jpg
# .\gallery-dl\weibo\吴磊工作室\2017-05-26 11_33_17_4.jpg
# .\gallery-dl\weibo\吴磊工作室\2017-05-26 09_33_21_1.mp4
# .\gallery-dl\weibo\吴磊工作室\2017-05-26 09_33_21_2.mp4
# .\gallery-dl\weibo\吴磊工作室\2017-05-26 09_33_21_3.mp4
# .\gallery-dl\weibo\吴磊工作室\2017-05-26 09_33_21_4.mp4
# .\gallery-dl\weibo\吴磊工作室\2017-05-26 09_33_21_5.mp4
# .\gallery-dl\weibo\吴磊工作室\2017-05-26 09_33_21_6.mp4
# .\gallery-dl\weibo\吴磊工作室\2017-05-26 09_33_21_7.mp4
# .\gallery-dl\weibo\吴磊工作室\2017-05-26 09_33_21_8.mp4
# .\gallery-dl\weibo\吴磊工作室\2017-05-26 09_33_21_9.mp4
# .\gallery-dl\weibo\吴磊工作室\2017-05-23 14_33_17_1.jpg
# .\gallery-dl\weibo\吴磊工作室\2017-05-23 14_33_17_2.jpg
# .\gallery-dl\weibo\吴磊工作室\2017-05-21 11_33_13_1.png
# .\gallery-dl\weibo\吴磊工作室\2017-05-21 09_33_25_1.jpg
# .\gallery-dl\weibo\吴磊工作室\2017-05-21 09_33_25_2.jpg
# .\gallery-dl\weibo\吴磊工作室\2017-05-21 09_33_25_3.jpg
# .\gallery-dl\weibo\吴磊工作室\2017-05-21 09_33_25_4.jpg
# .\gallery-dl\weibo\吴磊工作室\2017-05-21 09_33_25_5.jpg
# .\gallery-dl\weibo\吴磊工作室\2017-05-21 09_33_25_6.jpg
# .\gallery-dl\weibo\吴磊工作室\2017-05-21 09_33_25_7.jpg
# .\gallery-dl\weibo\吴磊工作室\2017-05-21 09_33_25_8.jpg
# .\gallery-dl\weibo\吴磊工作室\2017-05-21 09_33_25_9.jpg
# .\gallery-dl\weibo\吴磊工作室\2017-05-20 11_33_05_1.jpg
# .\gallery-dl\weibo\吴磊工作室\2017-05-19 11_33_05_1.jpg
# .\gallery-dl\weibo\吴磊工作室\2017-05-19 11_33_05_2.jpg
# .\gallery-dl\weibo\吴磊工作室\2017-05-19 11_33_05_3.jpg
# .\gallery-dl\weibo\吴磊工作室\2017-05-19 11_33_05_4.jpg
# .\gallery-dl\weibo\吴磊工作室\2017-05-19 10_33_06_1.jpg
# .\gallery-dl\weibo\吴磊工作室\2017-05-17 07_33_28_1.jpg
# .\gallery-dl\weibo\吴磊工作室\2017-05-17 07_33_28_2.jpg
# .\gallery-dl\weibo\吴磊工作室\2017-05-17 07_33_28_3.jpg
PS C:\Users\User>
@mikf mikf added the site:bug label Jun 13, 2023
@mikf
Owner

mikf commented Jun 13, 2023

I only got till 2022-07-20 07:54:16 before it stopped with this account.

It appears that Weibo sometimes sends an empty response even though it shouldn't and gallery-dl currently interprets that as the end of the timeline. Retrying the same request a couple of times should help.
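A minimal sketch of that retry idea, not gallery-dl's actual code: `fetch` is a hypothetical callable that performs one request and returns the decoded JSON, and the `data` / `list` layout is assumed from the response samples quoted later in this thread.

```python
import time
from typing import Callable


def fetch_with_retry(
    fetch: Callable[[], dict],
    retries: int = 3,
    delay: float = 1.0,
    sleep: Callable[[float], None] = time.sleep,
) -> dict:
    """Repeat the same request while the returned 'list' is empty.

    'fetch' is a hypothetical helper (an assumption for this sketch).
    At most 'retries' extra attempts are made; the last response is
    returned either way, so the caller still sees a genuinely empty page.
    """
    page = fetch()
    for _ in range(retries):
        if page.get("data", {}).get("list"):
            break  # got statuses, no need to retry
        sleep(delay)  # brief pause before asking again
        page = fetch()
    return page
```

The `sleep` parameter exists only so the pause can be swapped out, for example in tests.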

@kylieeeeeee
Author

I've tried about ten times; it didn't work at all.

@mikf
Owner

mikf commented Jun 15, 2023

I wasn't talking about you retrying, but the gallery-dl code ...

Anyway, it seems that there is a hard limit on how far back Weibo allows one to go, at least on the timeline for tabtype=feed.

Post 4108417012636584 from 2017/05/17 appears to be the last one accessible through this API endpoint:
https://weibo.com/ajax/statuses/mymblog?uid=6019229199&feature=0&since_id=4108417012636584

You could try all the other tabtype=… timelines (home, video, album).
Maybe some of them go further back than feed.


How did you get a link to this user's first post?

@kylieeeeeee
Author

kylieeeeeee commented Jun 20, 2023

Trying the different tabtypes did work, but I still can't get back to their first post.

You can search this account's posts with the time filter on their home page. There is a limit to how far down you can scroll, which is the time the account was created.

Attachment: Screenshot_20230620_114948_Weibo.png

@YuanGYao

YuanGYao commented Mar 8, 2024

Hello, I also ran into this problem when using gallery-dl to download images from Weibo.
I checked your code, and it seems that the loop stops when the list returned by Weibo is empty.
Is my understanding correct?

Update:
I tested on the Weibo web page and found that Weibo sometimes returns an empty list while there are still images that have not been loaded, yet since_id still has a normal value.
Something like this:

{
  "data": {
    "since_id": "4839329857536226_4839738068701694|1034:4839735135502387_20221130_-1",
    "list": []
  },
  "bottom_tips_visible": false,
  "bottom_tips_text": "",
  "ok": 1
}

If you send the request again with the since_id, Weibo will still return the remaining data.

I tested other accounts: once all of an account's images have actually been loaded, the returned list is not necessarily empty, but its since_id is 0.
Response when all images have been loaded:

{
  "data": {
    "since_id": 0,
    "list": [
      {
        "pid": "008rbDxXgy1h4nz35ottoj31900u0wq4",
        "mid": "4796635823476885",
        "is_paid": false,
        "timeline_month": "",
        "timeline_year": "",
        "object_id": "1042018:34ae58ca49fcd09a44397045013a39d9",
        "type": "pic"
      },
      {
        "pid": "008rbDxXgy1h4o043fk2pj30ku13in3h",
        "mid": "4796635823476885",
        "is_paid": false,
        "timeline_month": "",
        "timeline_year": "",
        "object_id": "1042018:2b8a854c07aae880e1dbca34d3928468",
        "type": "pic"
      },
      {
        "pid": "008rbDxXgy1h3ydky1a9kj30wn0dwjwd",
        "mid": "4788655920776472",
        "is_paid": false,
        "timeline_month": "",
        "timeline_year": "",
        "object_id": "1042018:97587a8e4b8490382bcb1d0a9b53c6a1",
        "type": "pic"
      },
      {
        "pid": "008rbDxXgy1h3nr6ejgzej31hc0u0tln",
        "mid": "4785303568780117",
        "is_paid": false,
        "timeline_month": "06",
        "timeline_year": "2022",
        "object_id": "1042018:e5baff22a86e619c35bb138e7f548fce",
        "type": "pic"
      },
      {
        "pid": "008rbDxXgy1h3i6yvb03cj30wn0dwdoq",
        "mid": "4784730057213232",
        "is_paid": false,
        "timeline_month": "06",
        "timeline_year": "2022",
        "object_id": "1042018:9940fc72b75958fa243be500d480ab46",
        "type": "pic"
      },
      {
        "pid": "008rbDxXgy1h3hyyjl2rcj30wn0dwdoq",
        "mid": "4783491886089040",
        "is_paid": false,
        "timeline_month": "06",
        "timeline_year": "2022",
        "object_id": "1042018:196d01f2a5af1c30a730dce1806c0b0a",
        "type": "pic"
      },
      {
        "pid": "008rbDxXgy1h3a4h0312wj30wn0dwdoq",
        "mid": "4781009785325408",
        "is_paid": false,
        "timeline_month": "",
        "timeline_year": "",
        "object_id": "1042018:a978607db0912e53155a9515a8e3637d",
        "type": "pic"
      },
      {
        "pid": "008rbDxXgy1h2turdmfznj31hc0u0kj1",
        "mid": "4775883259250379",
        "is_paid": false,
        "timeline_month": "",
        "timeline_year": "",
        "object_id": "1042018:363552624947aa7a83d8f787464cfdf2",
        "type": "pic"
      },
      {
        "pid": "008rbDxXgy1h2turg76ctj31hc0u0np2",
        "mid": "4775883259250379",
        "is_paid": false,
        "timeline_month": "",
        "timeline_year": "",
        "object_id": "1042018:a839fd918faa4c3564c3651b3b6355f2",
        "type": "pic"
      },
      {
        "pid": "008rbDxXgy1h2ktizbfmnj31hc0u0kjl",
        "mid": "4773078053162341",
        "is_paid": false,
        "timeline_month": "",
        "timeline_year": "",
        "object_id": "1042018:0c2aee6ac109547117ecd7a73f481d22",
        "type": "pic"
      }
    ]
  },
  "bottom_tips_visible": false,
  "bottom_tips_text": "",
  "ok": 1
}

The above is what I observed on Weibo's album page.

Therefore, I think we cannot decide to stop sending requests based only on whether the list is empty; the value of since_id should also be considered.
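A rough sketch of the stop condition proposed here, under stated assumptions: `fetch_page` is a hypothetical helper that takes a cursor (or `None` for the first page) and returns the decoded JSON in the shape shown above, and `max_empty` is an extra safety guard added for this sketch. It is not the code from the actual fix.

```python
from typing import Callable, Iterator, Optional, Union

Cursor = Optional[Union[str, int]]


def iter_items(
    fetch_page: Callable[[Cursor], dict],
    max_empty: int = 5,
) -> Iterator[dict]:
    """Yield every entry of 'list' across pages, following 'since_id'.

    An empty list alone does not end the loop; pagination ends only
    when 'since_id' is falsy (0, "", or missing). 'max_empty' bounds
    the number of consecutive empty pages so the loop cannot spin
    forever if the cursor never reaches 0.
    """
    since_id: Cursor = None
    empty_streak = 0
    while True:
        data = fetch_page(since_id).get("data") or {}
        items = data.get("list") or []
        yield from items

        since_id = data.get("since_id")
        if not since_id:
            return  # a cursor of 0 marks the real end of the timeline

        empty_streak = 0 if items else empty_streak + 1
        if empty_streak >= max_empty:
            return  # hypothetical guard, not observed Weibo behaviour
```

With this shape, the empty-list response quoted above simply leads to another request with the returned since_id, while the `"since_id": 0` response yields its items and then ends the iteration.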

mikf added a commit that referenced this issue Mar 14, 2024
don't automatically stop when receiving an empty status list

shouldn't improve 'tabtype=feed' results, but at least 'tabtype=album'
ones and others using cursors won't end prematurely
@mikf
Owner

mikf commented Mar 14, 2024

@YuanGYao should be fixed in 5158cbb, at least for album pages.

@YuanGYao

@mikf I currently have gallery-dl installed through scoop. If I clone this repository to get the latest code, how do I run it?

@Hrxn
Contributor

Hrxn commented Mar 14, 2024

@YuanGYao python.exe C:\Path\to\gallery-dl-master\gallery_dl

4 participants