Using a Proxy Pool in a Python Crawler

Published: 2020-11-16

Sending crawler requests through a pool of proxy IPs greatly reduces the risk of your own IP being banned.

Usage:

proxies = getip()
requests.post('https://', headers=header1, proxies=proxies)
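
A proxy that passed the check in getip() can still die before the real request is sent. The following is a minimal sketch of retrying with a fresh proxy on failure. The names `fetch_with_rotation`, `get_proxy`, and `send` are hypothetical and not part of the original post; `send` stands for any callable that performs the request, so the sketch does not depend on a particular HTTP library.

```python
def fetch_with_rotation(url, get_proxy, send, max_attempts=3):
    """Call send(url, proxies) with a fresh proxy on each attempt.

    get_proxy and send are hypothetical callables supplied by the caller,
    for example getip and a small wrapper around requests.post.
    """
    last_error = None
    for _ in range(max_attempts):
        proxies = get_proxy()  # ask the pool for a new proxy every time
        try:
            return send(url, proxies)
        except Exception as exc:  # in real code, narrow this to the library's error type
            last_error = exc
    raise RuntimeError(f"all {max_attempts} attempts failed") from last_error
```

With the names used in this post, a call might look like `fetch_with_rotation(url, getip, lambda u, p: requests.post(u, headers=header1, proxies=p, timeout=10))`.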

Functions:

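import random

import requests
from lxml import etree

# Assumed placeholder: the original does not show how header1 is defined.
header1 = {'User-Agent': 'Mozilla/5.0'}

# Collect "ip:port" strings from the free proxy list on 66ip.cn (pages 1 to 49).
ips = []
for q in range(1, 50):
    url = 'http://www.66ip.cn/' + str(q) + '.html'
    html = requests.get(url, verify=False, headers=header1).content
    selector = etree.HTML(html)
    content_field = selector.xpath('//table[@width="100%"]')
    n = 0
    ip = ''
    # '//td' is an absolute path, so it matches every <td> in the document.
    # Skip the first 7 cells, then read the rest in groups of 5:
    # cell 0 of each group is the IP address, cell 1 is the port.
    for i in content_field[0].xpath('//td')[7:]:
        if n % 5 == 0:
            ip = i.text
        if n % 5 == 1:
            ip += ':' + i.text
            ips.append(ip)
            ip = ''
        n += 1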
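
The loop above trusts every table cell: a cell with no text (i.text is None) or a header cell would either raise an error or put a malformed entry into ips. As a hedged sketch, a small validation helper could be applied before appending. The name `clean_proxy` is hypothetical, and the rule that only IPv4 addresses with ports 1 to 65535 are accepted is an assumption, not something stated in the original post.

```python
import ipaddress


def clean_proxy(ip_text, port_text):
    """Return 'ip:port' if both cells look valid, otherwise None."""
    if ip_text is None or port_text is None:
        return None
    ip_text, port_text = ip_text.strip(), port_text.strip()
    try:
        ipaddress.IPv4Address(ip_text)  # raises ValueError for non-IPv4 text
    except ValueError:
        return None
    if not port_text.isdecimal() or not 1 <= int(port_text) <= 65535:
        return None
    return f"{ip_text}:{port_text}"
```

Entries for which the helper returns None would simply be skipped instead of being added to the pool.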



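def getip():
    # Pick random proxies until one works; drop dead ones from the pool.
    while ips:
        ipprot = random.choice(ips)
        url = 'https://www.baidu.com'
        # requests only applies a proxy whose key matches the URL scheme,
        # so register the proxy for both http and https.
        proxies = {"http": "http://" + ipprot, "https": "http://" + ipprot}
        try:
            response = requests.get(url, proxies=proxies, verify=False, timeout=5)
            if response.status_code == 200:
                return proxies
        except requests.RequestException:
            pass  # proxy is unreachable or too slow
        ips.remove(ipprot)
    raise RuntimeError('no working proxy left in ips')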