Chapter 13

Web Scraping & APIs

HTTP Basics

HTTP Request Fundamentals

Common HTTP Methods

  • GET — retrieve a resource
  • POST — submit data
  • PUT — update a resource
  • DELETE — delete a resource

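These methods map directly onto functions in the requests library. As an illustrative sketch (the endpoint https://api.example.com/items is hypothetical), each kind of request can be built and inspected without actually sending it:

```python
import requests

# Build (but do not send) one request per HTTP method.
base = "https://api.example.com/items"   # hypothetical endpoint

get_req    = requests.Request("GET",    base, params={"page": 1}).prepare()
post_req   = requests.Request("POST",   base, json={"name": "pen"}).prepare()
put_req    = requests.Request("PUT",    f"{base}/42", json={"name": "pencil"}).prepare()
delete_req = requests.Request("DELETE", f"{base}/42").prepare()

print(get_req.method, get_req.url)   # GET https://api.example.com/items?page=1
print(post_req.method, put_req.method, delete_req.method)
```

Preparing a request this way shows exactly what requests would put on the wire (method, URL, encoded parameters) before any network traffic happens.
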
Common Status Codes

  • 200 — OK
  • 201 — Created
  • 400 — Bad Request
  • 401 — Unauthorized
  • 404 — Not Found
  • 500 — Internal Server Error
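
The leading digit of a status code tells you which family it belongs to, which is often all you need for a quick check. A minimal helper illustrating the grouping:

```python
def status_category(code):
    """Classify an HTTP status code by its leading digit."""
    return {
        2: "success",
        3: "redirect",
        4: "client error",
        5: "server error",
    }.get(code // 100, "other")

print(status_category(200))  # success
print(status_category(404))  # client error
print(status_category(500))  # server error
```
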
requests

requests — Sending HTTP Requests

pip install requests
import requests

# GET request
response = requests.get("https://api.github.com/users/octocat")
print(response.status_code)   # 200
data = response.json()
print(data["login"])           # octocat
print(data["public_repos"])    # number of public repos

# GET with query parameters
params = {"q": "python", "sort": "stars"}
r = requests.get("https://api.github.com/search/repositories", params=params)
repos = r.json()["items"]
for repo in repos[:3]:
    print(repo["full_name"], repo["stargazers_count"])
requests

POST Requests and Headers

import requests

# POST request (e.g. a login API)
response = requests.post(
    "https://api.example.com/login",
    json={"username": "ian", "password": "secret"}
)
token = response.json()["token"]

# With an Authorization header
headers = {"Authorization": f"Bearer {token}"}
r = requests.get("https://api.example.com/profile", headers=headers)
print(r.json())

# Error handling
try:
    r = requests.get("https://api.example.com/data", timeout=5)
    r.raise_for_status()   # raises HTTPError for 4xx/5xx responses
except requests.exceptions.RequestException as e:
    print(f"Request failed: {e}")
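
For transient failures such as 500 or 503, a requests Session can retry automatically using urllib3's Retry via an HTTPAdapter. A sketch of one way to configure this (the retry count and backoff values are illustrative, not recommendations):

```python
import requests
from requests.adapters import HTTPAdapter
from urllib3.util.retry import Retry

# Retry up to 3 times on common transient server errors,
# with exponential backoff between attempts.
retry = Retry(total=3, backoff_factor=0.5,
              status_forcelist=[500, 502, 503, 504])
session = requests.Session()
adapter = HTTPAdapter(max_retries=retry)
session.mount("https://", adapter)
session.mount("http://", adapter)

# session.get(...) / session.post(...) now retry automatically
# on the listed status codes.
```
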
BeautifulSoup

Web Scraping — BeautifulSoup

pip install requests beautifulsoup4
import requests
from bs4 import BeautifulSoup

url = "https://news.ycombinator.com"
r = requests.get(url)
soup = BeautifulSoup(r.text, "html.parser")

# Find all story titles
titles = soup.find_all("span", class_="titleline")
for i, title in enumerate(titles[:5], 1):
    a = title.find("a")
    print(f"{i}. {a.text}")
    print(f"   {a.get('href')}")
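
The same find_all pattern can be practiced offline against an inline HTML string, which is handy for experimenting with selectors before pointing them at a live site (the snippet below is made up for illustration):

```python
from bs4 import BeautifulSoup

# A small inline HTML snippet, so the selectors can be tried
# without fetching any live page.
html = """
<ul>
  <li class="item"><a href="/a">First</a></li>
  <li class="item"><a href="/b">Second</a></li>
</ul>
"""
soup = BeautifulSoup(html, "html.parser")

# Collect (text, href) pairs from every matching <li>
links = [(li.find("a").text, li.find("a").get("href"))
         for li in soup.find_all("li", class_="item")]
for text, href in links:
    print(text, href)
# First /a
# Second /b
```
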
Scraping Etiquette

Before scraping, check the site's robots.txt rules, avoid overloading the server, and comply with the site's terms of service.
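
The standard library's urllib.robotparser can evaluate robots.txt rules for you. A sketch using inline rules (the site and paths are illustrative; normally you would fetch https://example.com/robots.txt first):

```python
from urllib.robotparser import RobotFileParser

# Parse robots.txt rules given as text.
rules = """
User-agent: *
Disallow: /private/
Allow: /
""".splitlines()

rp = RobotFileParser()
rp.parse(rules)

# Check whether a given URL may be fetched by any crawler ("*").
print(rp.can_fetch("*", "https://example.com/public/page"))    # True
print(rp.can_fetch("*", "https://example.com/private/data"))   # False
```
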

JSON API

Free API Example

import requests

# Weather API (Open-Meteo, free, no API key required)
def get_weather(lat, lon):
    url = "https://api.open-meteo.com/v1/forecast"
    params = {
        "latitude": lat,
        "longitude": lon,
        "current_weather": True
    }
    r = requests.get(url, params=params)
    weather = r.json()["current_weather"]
    return weather

# Coordinates for Taipei
weather = get_weather(25.0330, 121.5654)
print(f"Temperature: {weather['temperature']}°C")
print(f"Wind speed: {weather['windspeed']} km/h")
Hands-on Exercise

Try It Yourself

import requests

def get_exchange_rate(base="USD"):
    """取得匯率資料(免費 API)"""
    url = f"https://api.exchangerate-api.com/v4/latest/{base}"
    try:
        r = requests.get(url, timeout=5)
        r.raise_for_status()
        data = r.json()
        return data["rates"]
    except Exception as e:
        print(f"Failed to fetch exchange rates: {e}")
        return {}

rates = get_exchange_rate("USD")
if rates:
    print(f"1 USD = {rates.get('TWD', 'N/A')} TWD")
    print(f"1 USD = {rates.get('JPY', 'N/A')} JPY")
    print(f"1 USD = {rates.get('EUR', 'N/A')} EUR")

Chapter 13 complete!

You've learned requests, web scraping with BeautifulSoup, and calling REST APIs.