ATBS Web Scrap - Request .py
#! python3
import requests
rs = requests.get('http://automatetheboringstuff.com/files/rj.txt')
# Check that the download succeeded; raise_for_status() raises an exception only on an error response.
rs.raise_for_status()
# Print the length of the downloaded text. len() counts characters, not words.
print('The article is ' + str(len(rs.text)) + ' characters long.')
# Print the text itself, from the start up to character 500.
#print(rs.text[:500])
# To write the response to a file, loop over the iter_content() method.
# Its argument is the chunk size in bytes; 100000 is generally a good size.
with open('RJ.txt', 'wb') as pyFile:  # 'wb' means write binary
    for chunk in rs.iter_content(100000):
        pyFile.write(chunk)
'''
The for loop and iter_content() may seem complicated compared to the open()/write()/close() workflow used for writing text files,
but writing in chunks keeps memory use low when you download massive files.
Note that requests.get() reads the whole response into memory unless it is called with stream=True, so the saving applies only to streamed requests.
You can learn about the requests module's other features from http://requests.readthedocs.org/.
'''
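
# A minimal sketch of the chunked download described above, written as reusable
# functions. The names save_chunks() and download(), and the 30-second timeout,
# are assumptions made for illustration; they are not part of the original script.
# stream=True is added so that the body is fetched chunk by chunk instead of all at once.

```python
def save_chunks(chunks, filename):
    """Write an iterable of byte chunks to filename and return the bytes written."""
    total = 0
    with open(filename, 'wb') as out_file:  # 'wb' means write binary
        for chunk in chunks:
            # In binary mode, write() returns the number of bytes written.
            total += out_file.write(chunk)
    return total


def download(url, filename, chunk_size=100000):
    """Stream url to filename in chunk_size-byte pieces (hypothetical helper)."""
    # Imported here so save_chunks() stays usable without requests installed.
    import requests

    # stream=True defers downloading the body until iter_content() is consumed.
    with requests.get(url, stream=True, timeout=30) as rs:
        rs.raise_for_status()  # raise an exception on an error response
        return save_chunks(rs.iter_content(chunk_size), filename)
```

# Usage, assuming the same URL as the script above:
# download('http://automatetheboringstuff.com/files/rj.txt', 'RJ.txt')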