from bs4 import BeautifulSoup
import requests
import os
import shutil
# Browser-like request headers sent with every page fetch.  The site
# rejects obviously non-browser clients, so we mimic Chrome on Windows
# (User-Agent) and provide a plausible Referer and cookies.
headers = {
"Accept": "text/html,application/xhtml+xml,application/xml;q=0.9,image/webp,*/*;q=0.8",
"Accept-Language": "zh-CN,zh;q=0.8",
"Connection": "close",
"Cookie": "_gauges_unique_hour=1; _gauges_unique_day=1; _gauges_unique_month=1; _gauges_unique_year=1; _gauges_unique=1",
"Referer": "http://www.infoq.com",
"Upgrade-Insecure-Requests": "1",
"User-Agent": "Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/57.0.2987.98 Safari/537.36 LBBROWSER"
}
# url = 'https://www.infoq.com/presentations/'
def download_jpg(imageurl, image_location_path):
    """Download the image at *imageurl* and save it to *image_location_path*.

    The response body is streamed straight to disk via
    ``shutil.copyfileobj`` so large images are never held fully in
    memory.  Nothing is written when the server does not answer 200.
    """
    response = requests.get(imageurl, stream=True)
    try:
        if response.status_code == 200:
            # Tell urllib3 to undo any gzip/deflate transfer encoding so
            # the raw stream yields actual image bytes.  The original
            # code misspelled this as ``deconde_content``, which set an
            # unrelated attribute and silently did nothing.
            response.raw.decode_content = True
            with open(image_location_path, 'wb') as f:
                shutil.copyfileobj(response.raw, f)
    finally:
        # Release the connection back to the pool even on non-200 or error.
        response.close()
def craw3(url):
    """Fetch *url*, locate every ``<img>`` inside ``div.card__content``
    cards, and download each image into ``./download_pic``.

    The target directory is created on first use.  Images are saved
    under the basename of their ``src`` URL.
    """
    response = requests.get(url, headers=headers)
    soup = BeautifulSoup(response.text, 'lxml')
    # exist_ok replaces the original racy exists()-then-mkdir check.
    os.makedirs('./download_pic', exist_ok=True)
    # Loop-invariant; also renamed from ``dir``, which shadowed the builtin.
    target_dir = os.path.abspath('./download_pic')
    for card in soup.find_all('div', class_='card__content'):
        print(card.find_all('img'))
        for pic in card.find_all('img'):
            imgurl = pic.get('src')
            # Skip <img> tags without a src attribute; the original would
            # crash in os.path.basename(None) with a TypeError.
            if not imgurl:
                continue
            filename = os.path.basename(imgurl)
            imgpath = os.path.join(target_dir, filename)
            print('开始下载 %s' % imgurl)
            download_jpg(imgurl, imgpath)
# Crawl the paginated presentation listing: pages 12, 24 and 36
# (the site appears to paginate in steps of twelve items).
# NOTE(review): there is no '/' between 'presentations' and the page
# number in the generated URL — confirm the target site accepts this form.
for page in (12, 24, 36):
    craw3('http://www.infoq.com/cn/presentations' + str(page))