Python Web Crawler Example

admin, November 26, 2023, 04:14

Python Web Crawler Example: From Beginner to Proficient

I. Overview

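A Python web crawler is an automated program, written in Python, that fetches data from the internet. Crawlers serve many purposes, such as data mining, competitive intelligence, public-opinion monitoring, and digital advertising. This article walks through a concrete example of writing a crawler in Python and scraping data from a website.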

II. Background

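Suppose we need to scrape product information from an e-commerce site, including each product's name, price, and image, so that we can compare and analyze prices. We will use Python's requests library and the BeautifulSoup library for this task.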

III. Implementing the Crawler

1. Import the required libraries

import requests
from bs4 import BeautifulSoup

2. Define the crawler function

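def crawler(url):
    # Fetch the page; the timeout keeps a stalled server from hanging the script
    response = requests.get(url, timeout=10)
    # Raise an exception on 4xx/5xx responses instead of parsing an error page
    response.raise_for_status()
    # Parse the HTML with Python's built-in parser
    soup = BeautifulSoup(response.text, 'html.parser')
    return soup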
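
When many pages are fetched, a slightly more defensive setup for the request step can help. The sketch below is one possible arrangement, not part of the original example: the function name make_session, the User-Agent string, and the retry settings are all illustrative assumptions. It reuses a single requests.Session and retries transient server errors with backoff.

```python
import requests
from requests.adapters import HTTPAdapter
from urllib3.util.retry import Retry


def make_session(user_agent="example-crawler/0.1", retries=3):
    """Build a Session with a custom User-Agent and automatic retries.

    The default values here are illustrative assumptions.
    """
    session = requests.Session()
    session.headers.update({"User-Agent": user_agent})
    # Retry rate-limit and 5xx responses with exponential backoff
    retry = Retry(total=retries, backoff_factor=0.5,
                  status_forcelist=[429, 500, 502, 503, 504])
    adapter = HTTPAdapter(max_retries=retry)
    session.mount("http://", adapter)
    session.mount("https://", adapter)
    return session


session = make_session()
# A page could then be fetched with, for example:
# response = session.get("https://www.example.com/products", timeout=10)
```

Reusing one session also keeps the underlying connection open between requests to the same host, which tends to matter once the crawler visits more than a handful of pages.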

3. Parse the page data

Assuming the product information we want sits inside a table element on the page, we can use BeautifulSoup to parse it as follows:

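def parse_data(soup):
    # Locate the product table by its CSS class
    table = soup.find('table', {'class': 'product-table'})
    if table is None:
        return  # no product table on this page
    for row in table.find_all('tr'):
        cols = row.find_all('td')
        # Three cells are needed (name, price, image); header rows using <th> are skipped
        if len(cols) > 2:
            title = cols[0].text.strip()
            price = cols[1].text.strip()
            img = cols[2].find('img')
            # Tolerate a missing <img> tag or a missing src attribute
            img_url = img.get('src') if img else None
            yield (title, price, img_url)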
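
To see the parsing step in isolation, the sketch below runs against a small inline HTML string instead of a live page. The markup is an assumed layout (a product-table with name, price, and image cells), and parse_with_selectors is a hypothetical variant that uses BeautifulSoup's CSS selector support (select) rather than find and find_all.

```python
from bs4 import BeautifulSoup

# Assumed sample markup; a real site's structure will differ
SAMPLE_HTML = """
<table class="product-table">
  <tr><th>Name</th><th>Price</th><th>Image</th></tr>
  <tr><td> Widget </td><td>$9.99</td><td><img src="/img/widget.jpg"></td></tr>
  <tr><td>Gadget</td><td>$19.50</td><td><img src="/img/gadget.jpg"></td></tr>
</table>
"""


def parse_with_selectors(soup):
    """Yield (title, price, img_url) tuples using CSS selectors."""
    for row in soup.select("table.product-table tr"):
        cols = row.find_all("td")
        if len(cols) < 3:
            continue  # header row or incomplete row
        img = cols[2].find("img")
        yield (cols[0].get_text(strip=True),
               cols[1].get_text(strip=True),
               img.get("src") if img else None)


products = list(parse_with_selectors(BeautifulSoup(SAMPLE_HTML, "html.parser")))
print(products)
```

Testing the parser on a saved or handwritten snippet like this makes it easier to adjust selectors without requesting the site repeatedly.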

4. Call the crawler function and print the results

url = 'https://www.example.com/products'  # assume this is the e-commerce site's product page URL
soup = crawler(url)
for title, price, img_url in parse_data(soup):
    print(f"Title: {title}\nPrice: {price}\nImage URL: {img_url}\n")
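
Because the stated goal is price comparison, the scraped strings usually need some cleanup before analysis. The following is a minimal sketch using only the standard library. The helper name normalize is hypothetical, and the price format (a currency symbol followed by a decimal number, optionally with thousands separators) is an assumption about the site.

```python
import re
from decimal import Decimal
from urllib.parse import urljoin

BASE_URL = "https://www.example.com/products"  # page the data came from


def normalize(title, price, img_url, base_url=BASE_URL):
    """Convert a price string to Decimal and make the image URL absolute."""
    # Keep digits and the decimal point; drops currency symbols and commas
    digits = re.sub(r"[^\d.]", "", price)
    amount = Decimal(digits) if digits else None
    # Resolve relative paths such as "/img/a.jpg" against the page URL
    full_url = urljoin(base_url, img_url) if img_url else None
    return title, amount, full_url


record = normalize("Widget", "$1,299.00", "/img/widget.jpg")
print(record)
```

Decimal is used here rather than float so that prices compare exactly, without binary rounding artifacts.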