5 Tricks for Handling PDFs Easily with Python
5 practical ways to automate PDF processing with Python
Method 1: Extract text from a PDF
Both PyPDF2 and pdfplumber can extract text from a PDF. PyPDF2 is adequate for simple text extraction, while pdfplumber copes better with complex layouts.
import PyPDF2
with open("example.pdf", "rb") as file:
    reader = PyPDF2.PdfReader(file)
    text = ""
    for page in reader.pages:
        # Fall back to "" in case a page yields no text
        text += page.extract_text() or ""
print(text)
pdfplumber offers more precise text extraction and can also extract tables:
import pdfplumber
with pdfplumber.open("example.pdf") as pdf:
    for page in pdf.pages:
        print(page.extract_text())
Method 2: Merge multiple PDF files
PyPDF2 makes it easy to merge several PDF files, which is useful when processing documents in batches.
from PyPDF2 import PdfMerger
merger = PdfMerger()
pdf_files = ["file1.pdf", "file2.pdf"]
for pdf in pdf_files:
    merger.append(pdf)  # files are appended in list order
merger.write("merged.pdf")
merger.close()
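For batch work the file list usually comes from a folder rather than being typed by hand. The sketch below shows one possible way to do that: collect_pdfs and merge_folder are hypothetical helper names, and sorting by file name is an assumed ordering rule that may not match the order you need.

```python
from pathlib import Path


def collect_pdfs(folder):
    """Return the PDF files directly inside `folder`, sorted by file name."""
    return sorted(
        (p for p in Path(folder).iterdir() if p.is_file() and p.suffix.lower() == ".pdf"),
        key=lambda p: p.name,
    )


def merge_folder(folder, output_path):
    """Merge every PDF in `folder` into `output_path` using PdfMerger."""
    # Imported here so that collect_pdfs stays usable without PyPDF2 installed.
    from PyPDF2 import PdfMerger

    merger = PdfMerger()
    try:
        for pdf_path in collect_pdfs(folder):
            merger.append(str(pdf_path))
        merger.write(str(output_path))
    finally:
        merger.close()
```

A call such as merge_folder("reports", "merged.pdf") would merge everything in a folder named reports (a made-up name). Writing the output outside the source folder avoids picking up an earlier merged file on the next run.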
Method 3: Split a PDF file
PyPDF2 can split a PDF by page, which is useful for pulling out a specific part of a document.
from PyPDF2 import PdfReader, PdfWriter
reader = PdfReader("example.pdf")
writer = PdfWriter()
for page_num in range(5, 10):  # zero-based indices 5-9, i.e. pages 6-10
    writer.add_page(reader.pages[page_num])
with open("split.pdf", "wb") as output:
    writer.write(output)
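Because reader.pages is indexed from zero while people count pages from one, page ranges are an easy place for off-by-one mistakes. The following sketch converts an inclusive, 1-based page range into the zero-based indices the loop needs. The helper name page_indices is hypothetical, and the 12-page total in the demo is an arbitrary example value.

```python
def page_indices(first_page, last_page, total_pages):
    """Map an inclusive 1-based page range to zero-based indices."""
    if not 1 <= first_page <= last_page <= total_pages:
        raise ValueError(
            f"invalid range {first_page}-{last_page} for a {total_pages}-page document"
        )
    return range(first_page - 1, last_page)


# Pages 5 through 9 of a 12-page document:
print(list(page_indices(5, 9, 12)))  # [4, 5, 6, 7, 8]
```

In the splitting loop this could be used as for i in page_indices(5, 9, len(reader.pages)), followed by writer.add_page(reader.pages[i]).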
Method 4: Encrypt or decrypt a PDF
PyPDF2 can add password protection to a PDF and can open files that are already encrypted.
from PyPDF2 import PdfWriter, PdfReader
writer = PdfWriter()
reader = PdfReader("example.pdf")
for page in reader.pages:
    writer.add_page(page)  # copy every page into the new document
writer.encrypt("password")  # set the password
with open("encrypted.pdf", "wb") as file:
    writer.write(file)
Decrypting a PDF requires the password:
from PyPDF2 import PdfReader
reader = PdfReader("encrypted.pdf")
if reader.is_encrypted:
    reader.decrypt("password")  # decrypt with the password
text = reader.pages[0].extract_text()
print(text)
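Method 5: Extract table data from a PDF
tabula-py and camelot are libraries built specifically for extracting tables from PDFs, which makes them a good fit for data analysis work.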
import tabula
tables = tabula.read_pdf("data.pdf", pages="all")
for table in tables:
    print(table)  # each table is a DataFrame
camelot can give more precise table extraction:
import camelot
tables = camelot.read_pdf("data.pdf", flavor="stream")  # reads page 1 unless pages= is given
tables[0].to_csv("table.csv")  # export the first table as CSV
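The camelot example saves only the first table. To save every table that was found, something like the sketch below could be used. export_tables is a hypothetical helper, and the file naming scheme (table_1.csv, table_2.csv, ...) is an assumption. It relies only on each table having a to_csv(path) method, which both camelot tables and the DataFrames returned by tabula-py provide.

```python
from pathlib import Path


def export_tables(tables, out_dir, prefix="table"):
    """Write each table to <out_dir>/<prefix>_<n>.csv and return the paths."""
    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    written = []
    for number, table in enumerate(tables, start=1):
        path = out_dir / f"{prefix}_{number}.csv"
        table.to_csv(str(path))  # works for any object with to_csv(path)
        written.append(path)
    return written
```

For example, export_tables(tables, "csv_out") would write one CSV file per extracted table into a folder named csv_out (a made-up name).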
These methods cover the most common PDF processing needs; choose the tool that fits your particular scenario.