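What Is a Web Crawler? A Complete Guide from Technical Principles to Real-World Applications (Part V)
21. Cloud-Native Crawler Architecture Design
21.1 Serverless Crawlers (AWS Lambda)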
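As a warm-up for the listing in this section, here is a minimal sketch of the handler shape that AWS Lambda invokes for Python functions: a callable taking `event` and `context` and returning a JSON-serializable result. It is an illustration under stated assumptions, not the article's implementation. It swaps `requests` and BeautifulSoup for the standard library so it runs without bundling third-party packages. The `HeadlineParser` class, the `fetch` helper, the `fetcher` parameter, and the `"url"` event key are all hypothetical names introduced for this example.

```python
import json
from html.parser import HTMLParser
from urllib.request import Request, urlopen


class HeadlineParser(HTMLParser):
    """Collects the text of every <h2> element (stdlib stand-in for BeautifulSoup)."""

    def __init__(self):
        super().__init__()
        self.titles = []
        self._in_h2 = False
        self._buffer = []

    def handle_starttag(self, tag, attrs):
        if tag == "h2":
            self._in_h2 = True
            self._buffer = []

    def handle_data(self, data):
        if self._in_h2:
            self._buffer.append(data)

    def handle_endtag(self, tag):
        if tag == "h2" and self._in_h2:
            self.titles.append("".join(self._buffer).strip())
            self._in_h2 = False


def fetch(url):
    """Download a page as text, sending the same User-Agent as the listing."""
    request = Request(url, headers={"User-Agent": "AWS-Lambda-Crawler/1.0"})
    with urlopen(request, timeout=10) as response:
        return response.read().decode("utf-8", errors="replace")


def lambda_handler(event, context, fetcher=fetch):
    # "url" is a hypothetical event key chosen for this sketch.
    url = event.get("url", "https://news.example.com/latest")
    parser = HeadlineParser()
    parser.feed(fetcher(url))
    parser.close()
    # Return a JSON-serializable summary of what was scraped.
    return {"statusCode": 200, "body": json.dumps({"titles": parser.titles})}
```

Making the fetch function injectable is a design choice for this sketch only: it lets the handler be exercised locally with canned HTML, for example `lambda_handler({}, None, fetcher=lambda url: "<h2>Hello</h2>")`, before anything is deployed.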
# lambda_function.py
import boto3
import requests
from bs4 import BeautifulSoup

s3 = boto3.client('s3')


def lambda_handler(event, context):
    # Fetch the target page
    headers = {'User-Agent': 'AWS-Lambda-Crawler/1.0'}
    response = requests.get('https://news.example.com/latest', headers=headers)

    # Parse the content
    soup = BeautifulSoup(response.text, 'html.parser')
    articles = []
    for item in soup.select('.news-item'):
        articles.append({'title': item.select_one('h2').