Table of contents
- What is Firecrawl?
- Local deployment
- Verification
- MCP installation
- Playground
🔥 Get started with Firecrawl in 5 minutes
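What is Firecrawl?
In one sentence:
An open-source "web crawler + content-cleaning engine"
- Automatically converts any web page into structured Markdown / JSON
- Supports recursive full-site crawling, JavaScript rendering, PDF parsing, and automatic image alt-text generation
- Provides a REST API, with official LangChain / LlamaIndex integrations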
Official website
You can try it out in the hosted playground. Click Get Code to obtain template code for calling the API:
# Install with: pip install firecrawl-py
import asyncio
from firecrawl import AsyncFirecrawlApp


async def main():
    # Replace the placeholder with your own API key; never publish a real key.
    app = AsyncFirecrawlApp(api_key='fc-YOUR_API_KEY')
    response = await app.scrape_url(
        url='https://blog.csdn.net/u012399690/article/details/149668148',
        formats=['markdown'],
        only_main_content=True,
        parse_pdf=True,
        max_age=14400000,
    )
    print(response)


asyncio.run(main())
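Local deployment
The hosted service offers 500 free credits. If you scrape frequently or have strict privacy requirements, you can deploy Firecrawl locally instead.
Step 1: Clone the code
git clone https://github.com/mendableai/firecrawl.git
cd firecrawl
Step 2: Edit the configuration
cp apps/api/.env.example .env
Adjust the file as needed. To keep things simple, you can turn off authentication.
Minimal configuration: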
NUM_WORKERS_PER_QUEUE=4
PORT=3002
HOST=0.0.0.0
REDIS_URL=redis://redis:6379
REDIS_RATE_LIMIT_URL=redis://redis:6379
PLAYWRIGHT_MICROSERVICE_URL=http://playwright-service:3000/html
USE_DB_AUTHENTICATION=false
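Before starting the containers, it can help to confirm that the .env file really contains the keys listed above. The following is a minimal sketch, not part of Firecrawl: `parse_env`, `missing_keys`, and the `REQUIRED_KEYS` set are hypothetical helpers, and the set simply assumes the seven keys of the minimal configuration are the ones you want to check.

```python
# Sanity-check a .env file against the minimal configuration shown above.
# REQUIRED_KEYS is an assumption taken from this article, not an official list.
REQUIRED_KEYS = {
    "NUM_WORKERS_PER_QUEUE",
    "PORT",
    "HOST",
    "REDIS_URL",
    "REDIS_RATE_LIMIT_URL",
    "PLAYWRIGHT_MICROSERVICE_URL",
    "USE_DB_AUTHENTICATION",
}

MINIMAL_ENV = """\
NUM_WORKERS_PER_QUEUE=4
PORT=3002
HOST=0.0.0.0
REDIS_URL=redis://redis:6379
REDIS_RATE_LIMIT_URL=redis://redis:6379
PLAYWRIGHT_MICROSERVICE_URL=http://playwright-service:3000/html
USE_DB_AUTHENTICATION=false
"""


def parse_env(text: str) -> dict:
    """Parse KEY=VALUE lines, skipping blank lines and # comments."""
    env = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        env[key.strip()] = value.strip()
    return env


def missing_keys(env: dict, required=REQUIRED_KEYS) -> list:
    """Return the required keys that are absent or empty, sorted by name."""
    return sorted(k for k in required if not env.get(k))


# Usage (reads the file created in step 2):
#   with open(".env", encoding="utf-8") as fh:
#       print(missing_keys(parse_env(fh.read())))
```

An empty list means every key from the minimal configuration is present and non-empty.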
🐳 Start
docker compose build   # build the images on first run
docker compose up -d   # run in the background
Access:
- API:
http://localhost:3002
- Queue admin:
http://localhost:3002/admin/@/queues
Verification
A cURL command gives a quick check from the terminal:
curl -X POST http://localhost:3002/v0/scrape \
  -H 'Content-Type: application/json' \
  -d '{
    "url": "https://www.ithome.com/0/871/372.htm",
    "formats": ["markdown"],
    "onlyMainContent": true,
    "parsePDF": true,
    "maxAge": 14400000
  }'
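The same request can be sent from Python using only the standard library. This is a sketch rather than an official client: `build_scrape_request` and `scrape` are hypothetical helper names, the payload mirrors the cURL command above, and the Bearer header is only an assumption about how a key is passed when authentication is enabled.

```python
import json
import urllib.request


def build_scrape_request(base_url, target_url, api_key=None):
    """Build (but do not send) the POST request used in the cURL example."""
    payload = {
        "url": target_url,
        "formats": ["markdown"],
        "onlyMainContent": True,
        "parsePDF": True,
        "maxAge": 14400000,
    }
    headers = {"Content-Type": "application/json"}
    if api_key:
        # Assumption: the key is sent as a Bearer token when auth is enabled.
        headers["Authorization"] = f"Bearer {api_key}"
    return urllib.request.Request(
        f"{base_url.rstrip('/')}/v0/scrape",
        data=json.dumps(payload).encode("utf-8"),
        headers=headers,
        method="POST",
    )


def scrape(base_url, target_url, api_key=None, timeout=60):
    """Send the request and return the decoded JSON response."""
    req = build_scrape_request(base_url, target_url, api_key)
    with urllib.request.urlopen(req, timeout=timeout) as resp:
        return json.loads(resp.read().decode("utf-8"))


# Usage (requires the local service to be running):
#   result = scrape("http://localhost:3002", "https://www.ithome.com/0/871/372.htm")
#   print(result.get("success"))
```

Separating request construction from sending makes it possible to inspect the URL, headers, and body without touching the network.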
Example response:
{
  "success": true,
  "data": {
    "content": "xxx",
    "markdown": "xxx",
    "linksOnPage": [
      "https://www.ithome.com/0/871/372.htm#",
      "https://m.ithome.com/"
    ],
    "metadata": {
      "ogImage": "https://img.ithome.com/m/images/logo.png",
      "language": "zh",
      "viewport": "width=device-width, initial-scale=1.0, minimum-scale=1.0, maximum-scale=1.0, user-scalable=no",
      "description": "智譜發布新一代旗艦模型GLM-4.5,專為智能體應用打造,綜合能力達到開源SOTA,實測國內最佳。采用混合專家架構,提供兩種模式,高速低成本。API已上線開放平臺BigModel.cn,也可在智譜清言和z.ai免費體驗。#AI大模型# #智譜GLM4.5#",
      "og:image": "https://img.ithome.com/m/images/logo.png",
      "format-detection": "telephone=no",
      "keywords": "智譜,GLM4.5,智能時代,人工智能",
      "apple-itunes-app": "app-id=570610859, app-argument=ithome://news?id=871372&type=news",
      "title": "智譜發布新一代旗艦開源模型 GLM-4.5,專為智能體應用打造 - IT之家",
      "apple-mobile-web-app-status-bar-style": "white",
      "apple-mobile-web-app-capable": "yes",
      "theme-color": "#fff",
      "favicon": "https://m.ithome.com/favicon.ico",
      "scrapeId": "07988df7-f880-4d8e-85ee-c434a2a931c3",
      "sourceURL": "https://www.ithome.com/0/871/372.htm",
      "url": "https://www.ithome.com/0/871/372.htm",
      "contentType": "text/html; charset=utf-8",
      "proxyUsed": "basic",
      "pageStatusCode": 200
    }
  },
  "returnCode": 200
}
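Once the response has been decoded, the fields of interest are usually the Markdown body and a few metadata entries. The sketch below assumes the response has the shape of the example response in this section; `summarize_scrape` is a hypothetical helper, and the fallback from `markdown` to `content` is an illustrative choice rather than documented behaviour.

```python
def summarize_scrape(response: dict) -> dict:
    """Pull the commonly used fields out of a scrape response."""
    if not response.get("success"):
        raise ValueError(f"scrape failed: {response}")
    data = response.get("data") or {}
    meta = data.get("metadata") or {}
    return {
        "title": meta.get("title"),
        "source_url": meta.get("sourceURL"),
        "status_code": meta.get("pageStatusCode"),
        # Prefer the cleaned Markdown; fall back to the raw content field.
        "markdown": data.get("markdown") or data.get("content") or "",
        "link_count": len(data.get("linksOnPage") or []),
    }


sample = {
    "success": True,
    "data": {
        "content": "xxx",
        "markdown": "xxx",
        "linksOnPage": [
            "https://www.ithome.com/0/871/372.htm#",
            "https://m.ithome.com/",
        ],
        "metadata": {
            "title": "example title",
            "sourceURL": "https://www.ithome.com/0/871/372.htm",
            "pageStatusCode": 200,
        },
    },
    "returnCode": 200,
}

summary = summarize_scrape(sample)
print(summary["source_url"], summary["status_code"], summary["link_count"])
```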
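MCP installation
With an MCP client, Firecrawl can work alongside an AI assistant. This example uses Cherry Studio.
Copy the configuration below, or set it up in an MCP marketplace such as ModelScope and sync it with one click. The main thing to change is the API key.
{
  "mcpServers": {
    "mcp-server-firecrawl": {
      "command": "npx",
      "args": ["-y", "firecrawl-mcp"],
      "env": {
        "FIRECRAWL_API_KEY": "YOUR_API_KEY_HERE"
      }
    }
  }
}
To point the MCP server at a self-hosted service instead:
{
  "mcpServers": {
    "mcp-server-firecrawl": {
      "command": "npx",
      "args": ["-y", "firecrawl-mcp"],
      "env": {
        "FIRECRAWL_API_URL": "http://localhost:3002",
        "FIRECRAWL_API_KEY": "optional-if-you-enable-auth"
      }
    }
  }
}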
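If you switch between the hosted and the self-hosted service often, the two configurations can be generated instead of edited by hand. This is a small sketch: `mcp_config` is a hypothetical helper, and it only reproduces the keys shown in the configurations above.

```python
import json


def mcp_config(api_key=None, api_url=None):
    """Build the mcpServers entry for firecrawl-mcp.

    Pass api_url for a self-hosted service; api_key may be omitted
    when authentication is disabled on that service.
    """
    env = {}
    if api_url:
        env["FIRECRAWL_API_URL"] = api_url
    if api_key:
        env["FIRECRAWL_API_KEY"] = api_key
    return {
        "mcpServers": {
            "mcp-server-firecrawl": {
                "command": "npx",
                "args": ["-y", "firecrawl-mcp"],
                "env": env,
            }
        }
    }


# Print the self-hosted variant, ready to paste into the MCP client.
print(json.dumps(mcp_config(api_url="http://localhost:3002"), indent=2))
```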
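Calling it from Cherry Studio
Playground
The open-source version does not ship a playground, so it can only be called through the API or MCP. Here is a simple HTML page to fill that gap.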
<!DOCTYPE html>
<html lang="en">
<head>
  <meta charset="UTF-8" />
  <title>Firecrawl self-hosted UI</title>
  <meta name="viewport" content="width=device-width,initial-scale=1" />
  <link href="https://cdn.jsdelivr.net/npm/bootstrap@5.3.3/dist/css/bootstrap.min.css" rel="stylesheet" />
  <link href="https://cdn.jsdelivr.net/npm/bootstrap-icons@1.11.3/font/bootstrap-icons.css" rel="stylesheet" />
  <style>
    body {
      padding-top: 70px;
      background: #f8f9fa;
    }
    .card {
      box-shadow: 0 0.125rem 0.25rem rgba(0, 0, 0, 0.075);
    }
    .result-area {
      max-height: 400px;
      overflow-y: auto;
      font-family: SFMono-Regular, Menlo, Monaco, Consolas, "Liberation Mono", "Courier New", monospace;
      font-size: 0.8rem;
    }
    .config-panel {
      transition: all 0.3s ease;
    }
    .collapse:not(.show) {
      display: none;
    }
  </style>
</head>
<body>
  <nav class="navbar navbar-expand navbar-dark bg-primary fixed-top">
    <div class="container-fluid">
      <a class="navbar-brand fw-bold" href="#"><i class="bi bi-fire"></i> Firecrawl UI</a>
      <button class="btn btn-outline-light btn-sm" data-bs-toggle="modal" data-bs-target="#configModal"><i class="bi bi-gear"></i> Settings</button>
    </div>
  </nav>

  <div class="container">
    <!-- Actions -->
    <div class="card mb-3">
      <div class="card-header">
        <ul class="nav nav-tabs card-header-tabs" id="mainTabs" role="tablist">
          <li class="nav-item" role="presentation">
            <button class="nav-link active" id="scrape-tab" data-bs-toggle="tab" data-bs-target="#scrape-pane"
              type="button" role="tab" aria-controls="scrape-pane" aria-selected="true">📥 Single-page scrape</button>
          </li>
          <li class="nav-item" role="presentation">
            <button class="nav-link" id="crawl-tab" data-bs-toggle="tab" data-bs-target="#crawl-pane"
              type="button" role="tab" aria-controls="crawl-pane" aria-selected="false">🕸️ Full-site crawl</button>
          </li>
        </ul>
      </div>
      <div class="card-body">
        <div class="tab-content" id="mainTabContent">
          <!-- Single-page scrape pane -->
          <div class="tab-pane fade show active" id="scrape-pane" role="tabpanel" aria-labelledby="scrape-tab">
            <div class="mb-3">
              <label for="scrapeUrl" class="form-label">Page URL</label>
              <input type="url" class="form-control" id="scrapeUrl" placeholder="https://docs.firecrawl.dev" />
              <div class="form-text">Enter the URL of the single page to scrape</div>
            </div>
            <button class="btn btn-primary" id="scrapeBtn" onclick="handleScrape()"><i class="bi bi-download"></i> Scrape now</button>
          </div>
          <!-- Full-site crawl pane -->
          <div class="tab-pane fade" id="crawl-pane" role="tabpanel" aria-labelledby="crawl-tab">
            <div class="mb-3">
              <label for="crawlUrl" class="form-label">Site URL</label>
              <input type="url" class="form-control" id="crawlUrl" placeholder="https://docs.firecrawl.dev" />
              <div class="form-text">Enter the root URL of the site to crawl</div>
            </div>
            <div class="mb-3">
              <label for="maxPages" class="form-label">Max pages</label>
              <input type="number" class="form-control" id="maxPages" placeholder="10" min="1" max="100" value="10" />
              <div class="form-text">Maximum number of pages to crawl (1-100)</div>
            </div>
            <button class="btn btn-warning" id="crawlBtn" onclick="handleCrawl()"><i class="bi bi-globe"></i> Start crawl</button>
          </div>
        </div>
      </div>
    </div>

    <!-- Results -->
    <div class="card mb-3">
      <div class="card-header d-flex justify-content-between align-items-center">
        <span>📝 Result preview</span>
        <button class="btn btn-sm btn-outline-secondary d-none" id="copyBtn" onclick="copyResult()"><i class="bi bi-clipboard"></i> Copy</button>
      </div>
      <div class="card-body">
        <pre class="result-area border p-2 bg-light" id="result">Waiting for results...</pre>
      </div>
    </div>
  </div>

  <!-- Settings modal -->
  <div class="modal fade" id="configModal" tabindex="-1" aria-labelledby="configModalLabel" aria-hidden="true">
    <div class="modal-dialog">
      <div class="modal-content">
        <div class="modal-header">
          <h5 class="modal-title" id="configModalLabel"><i class="bi bi-gear"></i> Service settings</h5>
          <button type="button" class="btn-close" data-bs-dismiss="modal" aria-label="Close"></button>
        </div>
        <div class="modal-body">
          <div class="mb-3">
            <label for="baseUrl" class="form-label">Base URL</label>
            <input type="url" class="form-control" id="baseUrl" placeholder="http://localhost:3002" value="http://localhost:3002" />
            <div class="form-text">Base address of the Firecrawl service</div>
          </div>
          <div class="mb-3">
            <label for="apiKey" class="form-label">API Key</label>
            <input type="password" class="form-control" id="apiKey" placeholder="Optional; leave empty when auth is disabled" />
            <div class="form-text">Enter an API key if the service requires authentication</div>
          </div>
        </div>
        <div class="modal-footer">
          <button type="button" class="btn btn-secondary" data-bs-dismiss="modal">Cancel</button>
          <button type="button" class="btn btn-primary" onclick="saveConfig()" data-bs-dismiss="modal">Save settings</button>
        </div>
      </div>
    </div>
  </div>

  <script src="https://cdn.jsdelivr.net/npm/bootstrap@5.3.3/dist/js/bootstrap.bundle.min.js"></script>
  <script>
    const $ = (id) => document.getElementById(id);
    const base = () => $("baseUrl").value.replace(/\/$/, "");
    const key = () => $("apiKey").value;

    // Load saved settings once the DOM is ready
    document.addEventListener('DOMContentLoaded', function () {
      loadConfig();
    });

    function loadConfig() {
      const savedBaseUrl = localStorage.getItem('firecrawl_baseUrl');
      const savedApiKey = localStorage.getItem('firecrawl_apiKey');
      if (savedBaseUrl) $("baseUrl").value = savedBaseUrl;
      if (savedApiKey) $("apiKey").value = savedApiKey;
    }

    // Show a green toast in the top-right corner and remove it after 3 seconds
    function showToast(message) {
      const toast = document.createElement('div');
      toast.className = 'toast align-items-center text-white bg-success border-0 position-fixed top-0 end-0 m-3';
      toast.style.zIndex = '9999';
      toast.innerHTML = `<div class="d-flex"><div class="toast-body"><i class="bi bi-check-circle"></i> ${message}</div><button type="button" class="btn-close btn-close-white me-2 m-auto" data-bs-dismiss="toast"></button></div>`;
      document.body.appendChild(toast);
      const bsToast = new bootstrap.Toast(toast);
      bsToast.show();
      setTimeout(() => {
        if (toast.parentNode) {
          toast.parentNode.removeChild(toast);
        }
      }, 3000);
    }

    function saveConfig() {
      localStorage.setItem('firecrawl_baseUrl', $("baseUrl").value);
      localStorage.setItem('firecrawl_apiKey', $("apiKey").value);
      showToast('Settings saved');
    }

    // Build request headers, adding the Bearer token only when a key is set
    function buildHeaders() {
      const headers = { "Content-Type": "application/json" };
      if (key()) headers["Authorization"] = `Bearer ${key()}`;
      return headers;
    }

    async function request(path, body) {
      return fetch(`${base()}${path}`, {
        method: "POST",
        headers: buildHeaders(),
        body: JSON.stringify(body),
      }).then((r) => r.json());
    }

    async function handleScrape() {
      const url = $("scrapeUrl").value;
      if (!url) return alert("Please enter a URL");
      const scrapeBtn = $("scrapeBtn");
      // Disable the button while the request is in flight
      scrapeBtn.disabled = true;
      $("result").textContent = "Scraping...";
      $("copyBtn").classList.add("d-none");
      try {
        const res = await request("/v0/scrape", {
          url,
          pageOptions: { onlyMainContent: true },
        });
        $("result").textContent = res.data?.markdown || JSON.stringify(res, null, 2);
        $("copyBtn").classList.remove("d-none");
        window.lastResult = res;
      } catch (error) {
        $("result").textContent = `Scrape failed: ${error.message}`;
      } finally {
        // Re-enable the button
        scrapeBtn.disabled = false;
      }
    }

    async function handleCrawl() {
      const url = $("crawlUrl").value;
      const limit = parseInt($("maxPages").value, 10) || 10;
      if (!url) return alert("Please enter a URL");
      const crawlBtn = $("crawlBtn");
      // Disable the button while the crawl is running
      crawlBtn.disabled = true;
      $("result").textContent = "Crawling the site, please wait...";
      $("copyBtn").classList.add("d-none");
      try {
        const job = await request("/v0/crawl", { url, limit });
        if (!job.jobId) {
          $("result").textContent = JSON.stringify(job, null, 2);
          crawlBtn.disabled = false;
          return;
        }
        // Poll the job status every 2 seconds until it completes or fails
        const poll = setInterval(async () => {
          const response = await fetch(`${base()}/v0/crawl/status/${job.jobId}`, {
            method: "GET",
            headers: buildHeaders(),
          });
          const status = await response.json();
          $("result").textContent = JSON.stringify(status, null, 2);
          if (status.status === "completed") {
            clearInterval(poll);
            window.lastResult = status;
            $("copyBtn").classList.remove("d-none");
            crawlBtn.disabled = false;
          }
          if (status.status === "failed") {
            clearInterval(poll);
            crawlBtn.disabled = false;
          }
        }, 2000);
      } catch (error) {
        $("result").textContent = `Crawl failed: ${error.message}`;
        crawlBtn.disabled = false;
      }
    }

    async function copyResult() {
      try {
        const dataStr = JSON.stringify(window.lastResult, null, 2);
        await navigator.clipboard.writeText(dataStr);
        showToast('Result copied to clipboard');
      } catch (error) {
        // Fall back to a temporary textarea if the Clipboard API is unavailable
        const textArea = document.createElement('textarea');
        textArea.value = JSON.stringify(window.lastResult, null, 2);
        document.body.appendChild(textArea);
        textArea.select();
        document.execCommand('copy');
        document.body.removeChild(textArea);
        alert('Result copied to clipboard');
      }
    }
  </script>
</body>
</html>
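The crawl flow the page uses (submit a job, then poll its status every two seconds) can also be scripted outside the browser. The sketch below mirrors the endpoints and field names from the page's JavaScript (`/v0/crawl`, `/v0/crawl/status/<jobId>`, `jobId`, `status`). The helper names `http_json` and `crawl`, the `max_polls` cap, and the injectable `send`/`sleep` parameters are assumptions added for illustration.

```python
import json
import time
import urllib.request


def http_json(method, url, body=None, api_key=None):
    """Send a JSON request and return the decoded JSON response."""
    headers = {"Content-Type": "application/json"}
    if api_key:
        headers["Authorization"] = f"Bearer {api_key}"
    data = json.dumps(body).encode("utf-8") if body is not None else None
    req = urllib.request.Request(url, data=data, headers=headers, method=method)
    with urllib.request.urlopen(req, timeout=60) as resp:
        return json.loads(resp.read().decode("utf-8"))


def crawl(base_url, target_url, limit=10, api_key=None,
          interval=2.0, max_polls=150, send=http_json, sleep=time.sleep):
    """Start a crawl job and poll until it is completed or failed."""
    base = base_url.rstrip("/")
    job = send("POST", f"{base}/v0/crawl", {"url": target_url, "limit": limit}, api_key)
    job_id = job.get("jobId")
    if not job_id:
        raise RuntimeError(f"crawl was not accepted: {job}")
    for _ in range(max_polls):
        status = send("GET", f"{base}/v0/crawl/status/{job_id}", None, api_key)
        if status.get("status") in ("completed", "failed"):
            return status
        sleep(interval)
    raise TimeoutError(f"crawl {job_id} did not finish after {max_polls} polls")


# Usage (requires the local service to be running):
#   result = crawl("http://localhost:3002", "https://docs.firecrawl.dev", limit=10)
#   print(result["status"])
```

Unlike the in-page `setInterval`, this loop waits for each status response before scheduling the next poll, so requests never overlap, and the poll cap keeps a stuck job from looping forever.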