Chapter 13

Web Scraping & APIs

HTTP Basics

HTTP Request Fundamentals

Common HTTP Methods

  • GET — retrieve a resource
  • POST — submit data
  • PUT — update a resource
  • DELETE — delete a resource

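These methods map directly onto functions in the requests library. As an illustrative sketch (the endpoint https://api.example.com/items is hypothetical), each kind of request can be built and inspected without actually sending it:

```python
import requests

# Build (but do not send) one request per HTTP method.
base = "https://api.example.com/items"   # hypothetical endpoint

get_req    = requests.Request("GET",    base, params={"page": 1}).prepare()
post_req   = requests.Request("POST",   base, json={"name": "pen"}).prepare()
put_req    = requests.Request("PUT",    f"{base}/42", json={"name": "pencil"}).prepare()
delete_req = requests.Request("DELETE", f"{base}/42").prepare()

print(get_req.method, get_req.url)   # GET https://api.example.com/items?page=1
print(post_req.method, put_req.method, delete_req.method)
```

Preparing a request this way shows exactly what requests would put on the wire (method, URL, encoded parameters) before any network traffic happens.
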
Common Status Codes

  • 200 — OK
  • 201 — Created
  • 400 — Bad Request
  • 401 — Unauthorized
  • 404 — Not Found
  • 500 — Internal Server Error
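
The leading digit of a status code tells you which family it belongs to, which is often all you need for a quick check. A minimal helper illustrating the grouping:

```python
def status_category(code):
    """Classify an HTTP status code by its leading digit."""
    return {
        2: "success",
        3: "redirect",
        4: "client error",
        5: "server error",
    }.get(code // 100, "other")

print(status_category(200))  # success
print(status_category(404))  # client error
print(status_category(500))  # server error
```
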
requests

requests — Sending HTTP Requests

pip install requests
import requests

# GET request
response = requests.get("https://api.github.com/users/octocat")
print(response.status_code)   # 200
data = response.json()
print(data["login"])           # octocat
print(data["public_repos"])    # number of public repos

# GET with query parameters
params = {"q": "python", "sort": "stars"}
r = requests.get("https://api.github.com/search/repositories", params=params)
repos = r.json()["items"]
for repo in repos[:3]:
    print(repo["full_name"], repo["stargazers_count"])
requests

POST Requests and Headers

import requests

# POST request (e.g. a login API)
response = requests.post(
    "https://api.example.com/login",
    json={"username": "ian", "password": "secret"}
)
token = response.json()["token"]

# With an Authorization header
headers = {"Authorization": f"Bearer {token}"}
r = requests.get("https://api.example.com/profile", headers=headers)
print(r.json())

# Error handling
try:
    r = requests.get("https://api.example.com/data", timeout=5)
    r.raise_for_status()   # raises HTTPError for 4xx/5xx responses
except requests.exceptions.RequestException as e:
    print(f"Request failed: {e}")
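
For transient failures such as 500 or 503, a requests Session can retry automatically using urllib3's Retry via an HTTPAdapter. A sketch of one way to configure this (the retry count and backoff values are illustrative, not recommendations):

```python
import requests
from requests.adapters import HTTPAdapter
from urllib3.util.retry import Retry

# Retry up to 3 times on common transient server errors,
# with exponential backoff between attempts.
retry = Retry(total=3, backoff_factor=0.5,
              status_forcelist=[500, 502, 503, 504])
session = requests.Session()
adapter = HTTPAdapter(max_retries=retry)
session.mount("https://", adapter)
session.mount("http://", adapter)

# session.get(...) / session.post(...) now retry automatically
# on the listed status codes.
```
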
BeautifulSoup

Web Scraping — BeautifulSoup

pip install requests beautifulsoup4
import requests
from bs4 import BeautifulSoup

url = "https://news.ycombinator.com"
r = requests.get(url)
soup = BeautifulSoup(r.text, "html.parser")

# Find all story titles
titles = soup.find_all("span", class_="titleline")
for i, title in enumerate(titles[:5], 1):
    a = title.find("a")
    print(f"{i}. {a.text}")
    print(f"   {a.get('href')}")
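
The same find_all pattern can be practiced offline against an inline HTML string, which is handy for experimenting with selectors before pointing them at a live site (the snippet below is made up for illustration):

```python
from bs4 import BeautifulSoup

# A small inline HTML snippet, so the selectors can be tried
# without fetching any live page.
html = """
<ul>
  <li class="item"><a href="/a">First</a></li>
  <li class="item"><a href="/b">Second</a></li>
</ul>
"""
soup = BeautifulSoup(html, "html.parser")

# Collect (text, href) pairs from every matching <li>
links = [(li.find("a").text, li.find("a").get("href"))
         for li in soup.find_all("li", class_="item")]
for text, href in links:
    print(text, href)
# First /a
# Second /b
```
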
Scraping Etiquette

Before scraping, check the site's robots.txt rules, avoid overloading the server, and comply with the site's terms of service.
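
The standard library's urllib.robotparser can evaluate robots.txt rules for you. A sketch using inline rules (the site and paths are illustrative; normally you would fetch https://example.com/robots.txt first):

```python
from urllib.robotparser import RobotFileParser

# Parse robots.txt rules given as text.
rules = """
User-agent: *
Disallow: /private/
Allow: /
""".splitlines()

rp = RobotFileParser()
rp.parse(rules)

# Check whether a given URL may be fetched by any crawler ("*").
print(rp.can_fetch("*", "https://example.com/public/page"))    # True
print(rp.can_fetch("*", "https://example.com/private/data"))   # False
```
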

JSON API

Free API Example

import requests

# Weather API (Open-Meteo, free, no API key required)
def get_weather(lat, lon):
    url = "https://api.open-meteo.com/v1/forecast"
    params = {
        "latitude": lat,
        "longitude": lon,
        "current_weather": True
    }
    r = requests.get(url, params=params)
    weather = r.json()["current_weather"]
    return weather

# Coordinates for Taipei
weather = get_weather(25.0330, 121.5654)
print(f"Temperature: {weather['temperature']}°C")
print(f"Wind speed: {weather['windspeed']} km/h")
Hands-on Exercise

Try It Yourself

import requests

def get_exchange_rate(base="USD"):
    """取得匯率資料(免費 API)"""
    url = f"https://api.exchangerate-api.com/v4/latest/{base}"
    try:
        r = requests.get(url, timeout=5)
        r.raise_for_status()
        data = r.json()
        return data["rates"]
    except Exception as e:
        print(f"Failed to fetch exchange rates: {e}")
        return {}

rates = get_exchange_rate("USD")
if rates:
    print(f"1 USD = {rates.get('TWD', 'N/A')} TWD")
    print(f"1 USD = {rates.get('JPY', 'N/A')} JPY")
    print(f"1 USD = {rates.get('EUR', 'N/A')} EUR")

Chapter 13 complete!

You've learned requests, web scraping with BeautifulSoup, and calling REST APIs.