Python Crawler Tutorial: Storing Scraped Data in a Database

百变鹏仔, 4 months ago (01-16), #Python


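Storing crawled data in a database from Python takes five steps: open a database connection, prepare the SQL INSERT statement, execute the insert, commit the transaction, and close the connection.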


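Introduction

Storing crawler output means saving the scraped data to a database such as MySQL or MongoDB. This step is essential for downstream work such as data analysis, machine learning, and data visualization. This tutorial walks step by step through storing crawled data from Python, using MySQL through the mysql.connector driver.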

Database setup


Python crawler setup

Storing the data

1. Establish a database connection

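    import mysql.connector as mysql

    # Open a connection to the MySQL server
    db = mysql.connect(
        host="localhost",
        user="root",
        password="rootpassword",  # replace with your own password
        database="my_database",
    )
    # The cursor is the object that executes SQL statements
    cursor = db.cursor()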

2. Prepare the SQL INSERT statement

sql = "INSERT INTO my_table (field1, field2, field3) VALUES (%s, %s, %s)"
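The `%s` markers in the statement above are driver placeholders, not Python string formatting: the values are passed separately, which is why scraped text containing quotes can be stored safely. The sketch below illustrates the idea with the standard-library `sqlite3` module as a stand-in so that it runs without a MySQL server. That substitution is an assumption made for illustration, and `sqlite3` writes placeholders as `?` where `mysql.connector` uses `%s`. The in-memory database and the sample values are hypothetical.

```python
import sqlite3

# Stand-in for the MySQL connection: an in-memory SQLite database (assumption).
db = sqlite3.connect(":memory:")
cursor = db.cursor()
cursor.execute("CREATE TABLE my_table (field1 TEXT, field2 TEXT, field3 TEXT)")

# sqlite3 uses "?" placeholders; mysql.connector uses "%s" in the same positions.
sql = "INSERT INTO my_table (field1, field2, field3) VALUES (?, ?, ?)"

# Scraped text often contains quotes or SQL-looking fragments.
data = ("O'Reilly", "x'); DROP TABLE my_table; --", "value3")
cursor.execute(sql, data)
db.commit()

# The values come back exactly as they went in; nothing was executed as SQL.
cursor.execute("SELECT field1, field2, field3 FROM my_table")
rows = cursor.fetchall()
print(rows)

cursor.close()
db.close()
```

Building the statement with string concatenation or an f-string instead would leave the quoting to you and open the door to SQL injection from page content.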

3. Execute the insert

    data = ("value1", "value2", "value3")
    cursor.execute(sql, data)

4. Commit the transaction

db.commit()
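Nothing is stored permanently until `commit()` is called, so a batch that fails halfway can be undone with `rollback()`. The following is a minimal sketch of that pattern, using `sqlite3` as a stand-in for MySQL (an assumption for illustration; `mysql.connector` connections also provide `commit()` and `rollback()`). The helper `save_rows` and the table layout are hypothetical.

```python
import sqlite3

def save_rows(db, rows):
    """Insert all rows in one transaction; undo everything if any row fails."""
    cursor = db.cursor()
    try:
        cursor.executemany(
            "INSERT INTO my_table (field1, field2) VALUES (?, ?)", rows
        )
        db.commit()
        return True
    except sqlite3.Error:
        db.rollback()
        return False
    finally:
        cursor.close()

db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE my_table (field1 TEXT NOT NULL, field2 TEXT)")
db.commit()

ok = save_rows(db, [("a", "1"), ("b", "2")])
# The second batch violates NOT NULL on its last row, so none of it is kept.
failed = save_rows(db, [("c", "3"), (None, "4")])
count = db.execute("SELECT COUNT(*) FROM my_table").fetchone()[0]
print(ok, failed, count)
db.close()
```

Committing once per batch rather than once per row keeps a partially scraped page from leaving half of its records in the table.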

5. Close the connection

    cursor.close()
    db.close()
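If an exception is raised somewhere between connecting and closing, explicit `close()` calls at the end of a script are never reached. One way to guarantee cleanup is `contextlib.closing` from the standard library, which calls `close()` when the block exits. The sketch below again uses `sqlite3` as a stand-in (an assumption for illustration); the same wrapper should work for any object that has a `close()` method.

```python
import sqlite3
from contextlib import closing

# closing() calls .close() on exit, even if the body raises.
with closing(sqlite3.connect(":memory:")) as db:
    with closing(db.cursor()) as cursor:
        cursor.execute("CREATE TABLE my_table (field1 TEXT)")
        cursor.execute("INSERT INTO my_table (field1) VALUES (?)", ("value1",))
        db.commit()
        cursor.execute("SELECT field1 FROM my_table")
        stored = cursor.fetchall()

# Using the connection after the block fails because it has been closed.
try:
    db.execute("SELECT 1")
    still_open = True
except sqlite3.ProgrammingError:
    still_open = False
print(stored, still_open)
```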

Example

The following example uses Requests and BeautifulSoup to scrape a web page and store the results in a MySQL database:

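    import requests
    from bs4 import BeautifulSoup
    import mysql.connector as mysql

    # Fetch the page (the URL needs a scheme, otherwise requests raises MissingSchema)
    url = "https://example.com"
    response = requests.get(url, timeout=10)
    response.raise_for_status()  # stop early on HTTP errors
    soup = BeautifulSoup(response.text, "html.parser")

    # Extract the data and prepare the SQL INSERT statement
    sql = "INSERT INTO my_table (title, content) VALUES (%s, %s)"
    data = []
    for article in soup.find_all("article"):
        title = article.find("h1")
        content = article.find("p")
        if title is None or content is None:
            continue  # skip articles that lack a heading or a paragraph
        data.append((title.text, content.text))

    # Connect to the database and insert all rows in one batch
    db = mysql.connect(...)  # same arguments as in step 1
    cursor = db.cursor()
    cursor.executemany(sql, data)
    db.commit()

    # Close the connection
    cursor.close()
    db.close()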
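Real pages rarely have every element in place, so it can help to normalize the extracted pairs before handing them to `executemany`. The sketch below shows one possible way to do that; the helper `clean_rows` and the sample values are hypothetical, and `sqlite3` stands in for MySQL (an assumption for illustration, with `?` placeholders instead of `%s`).

```python
import sqlite3

def clean_rows(raw_rows):
    """Strip whitespace and drop rows whose title or content is missing or empty."""
    cleaned = []
    for title, content in raw_rows:
        if title is None or content is None:
            continue
        title, content = title.strip(), content.strip()
        if title and content:
            cleaned.append((title, content))
    return cleaned

# Hypothetical values as they might come out of the extraction loop.
raw = [
    ("  First post \n", "Hello world"),
    ("No body", None),
    ("", "orphan paragraph"),
    ("Second post", "  More text  "),
]
data = clean_rows(raw)

db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE my_table (title TEXT, content TEXT)")
db.executemany("INSERT INTO my_table (title, content) VALUES (?, ?)", data)
db.commit()
stored = db.execute("SELECT title, content FROM my_table ORDER BY rowid").fetchall()
print(stored)
db.close()
```

Keeping the cleaning step separate from the database code makes it easy to test the extraction logic without a database connection.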
