Using a proxy IP pool when crawling is an effective way to keep your IP from being banned.
How to call it:
proxies = getip()
requests.post('https://...', headers=header1, proxies=proxies)  # fill in the target URL
The functions:
import random
import requests
from lxml import etree

# Example request headers; swap in your own User-Agent as needed.
header1 = {'User-Agent': 'Mozilla/5.0'}

ips = []

# Scrape the free proxy lists on www.66ip.cn, pages 1-49.
for q in range(1, 50):
    url = 'http://www.66ip.cn/' + str(q) + '.html'
    html = requests.get(url, verify=False, headers=header1).content
    selector = etree.HTML(html)
    content_field = selector.xpath('//table[@width="100%"]')
    n = 0
    ip = ''
    # Each data row has five cells (IP, port, location, anonymity, verify time);
    # the slice [7:] skips the header cells before the first data row.
    for i in content_field[0].xpath('//td')[7:]:
        if n % 5 == 0:              # first cell of a row: the IP address
            ip = i.text
        if n % 5 == 1:              # second cell of a row: the port
            ip += ':' + i.text
            ips.append(ip)          # store as "ip:port"
            ip = ''
        n += 1
def getip():
    # Pick random proxies from the pool until one passes a quick health check.
    while ips:
        ipprot = random.choice(ips)
        url = 'https://www.baidu.com'
        # Register the proxy for both http and https targets.
        proxies = {"http": "http://" + ipprot, "https": "http://" + ipprot}
        try:
            request = requests.get(url, proxies=proxies, verify=False, timeout=5)
            if request.status_code == 200:
                return proxies
        except requests.RequestException:
            pass
        # The proxy is dead or too slow; drop it from the pool and try another.
        ips.remove(ipprot)
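Putting the two pieces together, here is a minimal sketch of one way to drive an actual crawl with getip(): fetch a page through the returned proxy and fall back to a fresh proxy if the request fails. The fetch_with_proxy helper and the test URL https://httpbin.org/ip (which simply echoes the IP it sees) are illustrative assumptions, not part of the original code.

def fetch_with_proxy(url, retries=3):
    # Illustrative helper (not in the original post): try up to `retries`
    # different proxies from the pool before giving up.
    for _ in range(retries):
        proxies = getip()
        if proxies is None:          # pool exhausted
            break
        try:
            resp = requests.get(url, headers=header1, proxies=proxies,
                                verify=False, timeout=10)
            if resp.status_code == 200:
                return resp
        except requests.RequestException:
            continue                 # this proxy failed mid-request; try another
    return None

# If the proxy is applied, the response body should show the proxy's address
# rather than your own IP.
resp = fetch_with_proxy('https://httpbin.org/ip')
if resp is not None:
    print(resp.text)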