LivePhoto is not crawled even though a cookie has been added #448

Open
hwangzhun opened this issue Jul 11, 2024 · 9 comments

@hwangzhun
Contributor

No description provided.

@hwangzhun
Contributor Author

2024-07-11 19:44:02,082 - ERROR - 'large'
Traceback (most recent call last):
  File "C:\Users\huang\Desktop\weibo-crawler-master\weibo.py", line 884, in get_one_weibo
    weibo = self.get_long_weibo(weibo_id)
            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\Users\huang\Desktop\weibo-crawler-master\weibo.py", line 443, in get_long_weibo
    weibo = self.parse_weibo(weibo_info)
            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\Users\huang\Desktop\weibo-crawler-master\weibo.py", line 785, in parse_weibo
    weibo["pics"] = self.get_pics(weibo_info)
                    ^^^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\Users\huang\Desktop\weibo-crawler-master\weibo.py", line 451, in get_pics
    pic_list = [pic["large"]["url"] for pic in pic_info]
                ~~~^^^^^^^^^
KeyError: 'large'

Judging from the log, this seems to be where the problem is?

@dataabc
Owner

dataabc commented Jul 11, 2024

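It looks like the `large` data was not retrieved. That should be related to images, not to LivePhoto. I can't debug right now, so I'm not sure what is happening. You could try replacing the cookie; it may have expired, but I'm not certain.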
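The traceback shows the list comprehension in `get_pics` indexing `pic["large"]` unconditionally, so one entry without that key aborts the whole post. Below is a minimal sketch of a more tolerant extraction. It assumes each entry in `pics` is a dict that may or may not carry `large`, and that falling back to the entry's own `url` is acceptable. The function name `get_pic_urls`, the fallback policy, and the sample URLs are illustrative and are not taken from weibo.py.

```python
def get_pic_urls(weibo_info):
    """Collect image URLs, skipping entries that have no usable URL."""
    urls = []
    for pic in weibo_info.get("pics") or []:
        if not isinstance(pic, dict):
            continue  # unexpected shape: ignore it instead of raising
        large = pic.get("large")
        # Prefer the large rendition when it is present.
        url = large.get("url") if isinstance(large, dict) else None
        # Assumed fallback: use the entry's own URL if there is no large one.
        url = url or pic.get("url")
        if url:
            urls.append(url)
    return urls


# Made-up sample data covering the three cases handled above.
sample = {
    "pics": [
        {"url": "https://example.invalid/small/a.jpg",
         "large": {"url": "https://example.invalid/large/a.jpg"}},
        {"url": "https://example.invalid/small/b.jpg"},  # no "large" key
        {},  # no URL at all
    ]
}
print(get_pic_urls(sample))
```

This only avoids the crash; it does not explain why `large` was missing from the response in the first place.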

@hwangzhun
Contributor Author

hwangzhun commented Jul 12, 2024

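I tried replacing the cookie and also verified that it is valid, but the livephoto still cannot be downloaded, so I don't think the cookie is the problem. I printed the data received in the get_pics function in the source code and found that the livephoto links are being crawled; the links are valid, and opening them shows the livephoto. (I'm a beginner, so I'd appreciate some pointers. Thank you.)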

The printed data:
{
"visible": {
"type": 0,
"list_id": 0
},
"mark": "followtopweibo",
"created_at": "Mon May 24 20:26:57 +0800 2021",
"id": "4640476257583704",
"mid": "4640476257583704",
"can_edit": false,
"photoTag": [
{
"picid": "008gLOvsly1gqtsew42bvj31qz33y4qu",
"hastag": true,
"taginfo": {
"code": "100000",
"msg": "",
"data": {
"4640476262170935": {
"pic_object_id": "1042018:85eccedb7467629dcfb61725e28a599f",
"photo_id": 4640476262170935,
"mid": 4640476257583704,
"pid": "008gLOvsly1gqtsew42bvj31qz33y4qu",
"uid": 7576879598,
"pic_tags": [
{
"tag_uid": 7576879598,
"tag_id": "1022: 2315222e5145e3750ea7d718df96d397e48f3d",
"tag_name": "万达广场",
"tag_type": "search_topic",
"pos_x": "0.74931506849315",
"pos_y": "0.39719900744417",
"dir": 1,
"url": "https://s.weibo.com/pic/%23%E4%B8%87%E8%BE%BE%E5%B9%BF%E5%9C%BA%23",
"mobile_url": "sinaweibo://searchall?containerid=231522&q=%23%E4%B8%87%E8%BE%BE%E5%B9%BF%E5%9C%BA%23&isnewpage=1"
}
]
}
}
}
},
{
"picid": "008gLOvsly1gqtsb9tt57j322i2xye8b",
"hastag": true,
"taginfo": {
"code": "100000",
"msg": "",
"data": {
"4640476262433251": {
"pic_object_id": "1042018: 77c74d742c7de259e8c6176cb5532cd8",
"photo_id": 4640476262433251,
"mid": 4640476257583704,
"pid": "008gLOvsly1gqtsb9tt57j322i2xye8b",
"uid": 7576879598,
"pic_tags": [
{
"tag_uid": 7576879598,
"tag_id": "1022: 231522ab45d20b9d1f082f439923af4210ea0a",
"tag_name": "一番街",
"tag_type": "search_topic",
"pos_x": "0.027397260273973",
"pos_y": "0.30979226423294",
"dir": 2,
"url": "https://s.weibo.com/pic/%23%E4%B8%80%E7%95%AA%E8%A1%97%23",
"mobile_url": "sinaweibo://searchall?containerid=231522&q=%23%E4%B8%80%E7%95%AA%E8%A1%97%23"
}
]
}
}
}
}
],
"text": "先变成自己喜欢的样子,再去遇见无需取悦的人 <a href="https://m.weibo.cn/search?containerid=231522type%3D1%26t%3D10%26q%3D%23%E9%9A%8F%E6%89%8B%E6%8B%8D%23\" data-hide=""><span class="surl-text">#随手拍# <a href="https://m.weibo.cn/search?containerid=231522type%3D1%26t%3D10%26q%3D%23%E4%BB%8A%E5%A4%A9%E7%A9%BF%E4%BB%80%E4%B9%88%23&isnewpage=1\" data-hide=""><span class="surl-text">#今天穿什么# <a href="https://m.weibo.cn/search?containerid=231522type%3D1%26t%3D10%26q%3D%23%E5%A4%8F%E5%A4%A9%23&isnewpage=1\" data-hide=""><span class="surl-text">#夏天# <a href="http://weibo.com/p/100101B2094550D56AABFA499A\" data-hide=""><span class="surl-text">肇庆·广东理工学院鼎湖校区 ",
"textLength": 88,
"source": "iPhone客户端",
"favorited": false,
"pic_ids": [
"008gLOvsly1gqts81gy3ej324c2rungw",
"008gLOvsly1gqts8dp4fuj31pw2hr1kz",
"008gLOvsly1gqtsbcpfh8j32502upe1l",
"008gLOvsly1gqtsb9tt57j322i2xye8b",
"008gLOvsly1gqtsa9c9gqj328y31che8",
"008gLOvsly1gqtsew42bvj31qz33y4qu"
],
"thumbnail_pic": "https://wx1.sinaimg.cn/thumbnail/008gLOvsly1gqts81gy3ej324c2rungw.jpg",
"bmiddle_pic": "http://wx1.sinaimg.cn/bmiddle/008gLOvsly1gqts81gy3ej324c2rungw.jpg",
"original_pic": "https://wx1.sinaimg.cn/large/008gLOvsly1gqts81gy3ej324c2rungw.jpg",
"is_paid": false,
"mblog_vip_type": 0,
"user": {
"id": 7576879598,
"screen_name": "小Miki喵",
"profile_image_url": "https://tvax4.sinaimg.cn/crop.0.0.1080.1080.180/008gLOvsly8hqwc8k66b5j30u00u0n0q.jpg?KID=imgbed,tva&Expires=1720766130&ssig=LudkCXVJHV",
"profile_url": "https://m.weibo.cn/u/7576879598?",
"close_blue_v": false,
"description": "抖y:小MIKI",
"follow_me": false,
"following": true,
"follow_count": 103,
"followers_count": "1627",
"cover_image_phone": "https://tva1.sinaimg.cn/crop.0.0.640.640.640/549d0121tw1egm1kjly3jj20hs0hsq4f.jpg",
"avatar_hd": "https://wx4.sinaimg.cn/orj480/008gLOvsly8hqwc8k66b5j30u00u0n0q.jpg",
"badge": {
"user_name_certificate": 1,
"city_university": 19
},
"statuses_count": 358,
"verified": false,
"verified_type": -1,
"gender": "f",
"mbtype": 11,
"svip": 1,
"urank": 0,
"mbrank": 1,
"followers_count_str": "1627",
"verified_reason": "",
"like": false,
"like_me": false,
"special_follow": false
},
"can_remark": true,
"reposts_count": 0,
"comments_count": 8,
"reprint_cmt_count": 0,
"attitudes_count": 14,
"mixed_count": 0,
"pending_approval_count": 0,
"isLongText": false,
"show_mlevel": 0,
"expire_time": 1624071669,
"ad_state": 1,
"darwin_tags": [],
"ad_marked": false,
"mblogtype": 1,
"item_category": "status",
"rid": "5_0_50_162659327855539511_0_0_0",
"extern_safe": 0,
"number_display_strategy": {
"apply_scenario_flag": 19,
"display_text_min_number": 1000000,
"display_text": "100万+"
},
"content_auth": 0,
"is_show_mixed": false,
"comment_manage_info": {
"comment_permission_type": -1,
"approval_comment_type": 0,
"comment_sort_type": 0
},
"pic_num": 6,
"mlevel": 0,
"mblog_menu_new_style": 0,
"page_info": {
"type": "place",
"icon": "https://h5.sinaimg.cn/upload/2016/03/15/196/timeline_icon_location_default.png",
"page_pic": {
"url": "https://wx2.sinaimg.cn/wap180/82ef82cbly1fjihaypsj2j205k05kwg1.jpg",
"width": "88",
"height": "88"
},
"page_url": "https://m.weibo.cn/p/index?containerid=1008089b01eaf54b1a61c904fbb7e053cf64e0_-_lbs&lcardid=frompoi&extparam=frompoi",
"page_title": "肇庆·广东理工学院鼎湖校区",
"content1": "坑口金鼎路(庆云大道)",
"content2": "2801人来过 8152条微博 6185张图片"
},
"pics": [
{
"pid": "008gLOvsly1gqts81gy3ej324c2rungw",
"url": "https://wx1.sinaimg.cn/orj360/008gLOvsly1gqts81gy3ej324c2rungw.jpg",
"size": "orj360",
"geo": {
"width": 360,
"height": 470,
"croped": false
},
"large": {
"size": "large",
"url": "https://wx1.sinaimg.cn/large/008gLOvsly1gqts81gy3ej324c2rungw.jpg",
"geo": {
"width": 2048,
"height": 2678,
"croped": false
}
},
"videoSrc": "https://video.weibo.com/media/play?livephoto=https%3A%2F%2Flivephoto.us.sinaimg.cn%2F003db0tbjx07MULutRao0f0f0100dBXr0k01.mov",
"type": "livephoto"
},
{
"pid": "008gLOvsly1gqts8dp4fuj31pw2hr1kz",
"url": "https://wx3.sinaimg.cn/orj360/008gLOvsly1gqts8dp4fuj31pw2hr1kz.jpg",
"size": "orj360",
"geo": {
"width": 360,
"height": 521,
"croped": false
},
"large": {
"size": "large",
"url": "https://wx3.sinaimg.cn/large/008gLOvsly1gqts8dp4fuj31pw2hr1kz.jpg",
"geo": {
"width": 2048,
"height": 2969,
"croped": false
}
}
},
{
"pid": "008gLOvsly1gqtsbcpfh8j32502upe1l",
"url": "https://wx2.sinaimg.cn/orj360/008gLOvsly1gqtsbcpfh8j32502upe1l.jpg",
"size": "orj360",
"geo": {
"width": 360,
"height": 480,
"croped": false
},
"large": {
"size": "large",
"url": "https://wx2.sinaimg.cn/large/008gLOvsly1gqtsbcpfh8j32502upe1l.jpg",
"geo": {
"width": 2048,
"height": 2731,
"croped": false
}
},
"videoSrc": "https://video.weibo.com/media/play?livephoto=https%3A%2F%2Flivephoto.us.sinaimg.cn%2F003EY6fWjx07MULts9680f0f01008pOo0k01.mov",
"type": "livephoto"
},
{
"pid": "008gLOvsly1gqtsb9tt57j322i2xye8b",
"url": "https://wx2.sinaimg.cn/orj360/008gLOvsly1gqtsb9tt57j322i2xye8b.jpg",
"size": "orj360",
"geo": {
"width": 360,
"height": 511,
"croped": false
},
"large": {
"size": "large",
"url": "https://wx2.sinaimg.cn/large/008gLOvsly1gqtsb9tt57j322i2xye8b.jpg",
"geo": {
"width": 2048,
"height": 2912,
"croped": false
}
}
},
{
"pid": "008gLOvsly1gqtsa9c9gqj328y31che8",
"url": "https://wx1.sinaimg.cn/orj360/008gLOvsly1gqtsa9c9gqj328y31che8.jpg",
"size": "orj360",
"geo": {
"width": 360,
"height": 486,
"croped": false
},
"large": {
"size": "large",
"url": "https://wx1.sinaimg.cn/large/008gLOvsly1gqtsa9c9gqj328y31che8.jpg",
"geo": {
"width": 2048,
"height": 2766,
"croped": false
}
}
},
{
"pid": "008gLOvsly1gqtsew42bvj31qz33y4qu",
"url": "https://wx3.sinaimg.cn/orj360/008gLOvsly1gqtsew42bvj31qz33y4qu.jpg",
"size": "orj360",
"geo": {
"width": 360,
"height": 639,
"croped": false
},
"large": {
"size": "large",
"url": "https://wx3.sinaimg.cn/large/008gLOvsly1gqtsew42bvj31qz33y4qu.jpg",
"geo": {
"width": 2048,
"height": 3640,
"croped": false
}
}
}
],
"live_photo": [
"https://video.weibo.com/media/play?livephoto=https%3A%2F%2Flivephoto.us.sinaimg.cn%2F003db0tbjx07MULutRao0f0f0100dBXr0k01.mov",
"https://video.weibo.com/media/play?livephoto=https%3A%2F%2Flivephoto.us.sinaimg.cn%2F003EY6fWjx07MULts9680f0f01008pOo0k01.mov"
],
"bid": "KgYYhvORO",
"pic_list": [
"https://wx1.sinaimg.cn/large/008gLOvsly1gqts81gy3ej324c2rungw.jpg",
"https://wx3.sinaimg.cn/large/008gLOvsly1gqts8dp4fuj31pw2hr1kz.jpg",
"https://wx2.sinaimg.cn/large/008gLOvsly1gqtsbcpfh8j32502upe1l.jpg",
"https://wx2.sinaimg.cn/large/008gLOvsly1gqtsb9tt57j322i2xye8b.jpg",
"https://wx1.sinaimg.cn/large/008gLOvsly1gqtsa9c9gqj328y31che8.jpg",
"https://wx3.sinaimg.cn/large/008gLOvsly1gqtsew42bvj31qz33y4qu.jpg"
]
}
It seems the image URLs returned here are wrong? Opening them returns 403.

@dataabc
Owner

dataabc commented Jul 12, 2024

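Looking at the content above, the live photo information is under the `live_photo` key, so the get_live_photo method in weibo.py should be changed to read the content under `live_photo`. For the image problem, see dataabc/weibo-search#473.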

@hwangzhun
Contributor Author

Thank you very much.
I changed the get_live_photo method to the following, and the download succeeded:

    def get_live_photo(self, weibo_info):
        """Get the video URLs of the live photos"""
        live_photo_list = weibo_info.get("live_photo", [])
        return live_photo_list
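
To see the change in isolation, here is a small sketch that applies the same lookup to a trimmed copy of the data printed earlier in this thread. It also shows one way to recover the direct `.mov` address from the `livephoto` query parameter of each link. Whether weibo.py needs that extra step is not stated in the thread, and the helper name `direct_mov_url` is hypothetical.

```python
from urllib.parse import parse_qs, urlparse


def get_live_photo(weibo_info):
    """Return the live photo video URLs listed under "live_photo"."""
    return weibo_info.get("live_photo", [])


def direct_mov_url(play_url):
    """Hypothetical helper: decode the `livephoto` query parameter, if any."""
    values = parse_qs(urlparse(play_url).query).get("livephoto")
    # Leave the link unchanged when it has no such parameter.
    return values[0] if values else play_url


# Trimmed copy of the response printed in this thread.
weibo_info = {
    "live_photo": [
        "https://video.weibo.com/media/play?livephoto=https%3A%2F%2Flivephoto.us.sinaimg.cn%2F003db0tbjx07MULutRao0f0f0100dBXr0k01.mov",
    ]
}

for link in get_live_photo(weibo_info):
    print(direct_mov_url(link))
```

Using `.get("live_photo", [])` means posts without live photos yield an empty list rather than raising `KeyError`.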

@xiaomeng758

Do I just replace the whole function so that it contains only these three lines?

@hwangzhun
Contributor Author

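> Do I just replace the whole function so that it contains only these three lines?

The `def` line defines the function and must not be deleted. Replace the code below the function definition with these two lines.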

@xiaomeng758

[Image: ac0fa2e5e77b8443441bfecf7de0970f]
Like this?

@hwangzhun
Contributor Author

> [Image: ac0fa2e5e77b8443441bfecf7de0970f] Like this?

Yes, that's right.
