How can Python get the status code of an HTTP request (200, 404, etc.) without fetching the full source of the page? Downloading the whole page just to read the status is a waste of resources. For example:
Input: segmentfault.com Output: 200
Input: segmentfault.com/nonexistant Output: 404
HTTP is not limited to the GET method, which fetches the headers plus the body. There is also the HEAD method, which fetches only the headers.
import http.client  # Python 3 name of the old httplib module

def get_status_code(host, path="/"):
    """Retrieve the status code of a website by sending a HEAD request
    to the host, so that only the headers are transferred.
    If the host cannot be reached or something else goes wrong, return
    None instead.
    """
    try:
        conn = http.client.HTTPConnection(host, timeout=10)  # plain HTTP on port 80
        try:
            conn.request("HEAD", path)
            return conn.getresponse().status
        finally:
            conn.close()  # always release the socket
    except (OSError, http.client.HTTPException):
        return None

print(get_status_code("segmentfault.com"))  # e.g. 200
print(get_status_code("segmentfault.com", "/nonexistant"))  # e.g. 404
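The function above talks plain HTTP to a bare host name. Many sites now redirect HTTP to HTTPS, in which case a single HEAD request on port 80 reports 301 rather than the status of the final page. A minimal sketch of one way to handle that with the standard library follows, assuming Python 3. The name `head_status`, the `max_redirects` and `timeout` parameters, and the list of redirect codes are illustrative choices, not part of the original answer.

```python
import http.client
from urllib.parse import urljoin, urlsplit

# Status codes treated as redirects in this sketch.
REDIRECT_CODES = (301, 302, 303, 307, 308)

# Map URL schemes to the matching http.client connection class.
CONNECTION_TYPES = {
    "http": http.client.HTTPConnection,
    "https": http.client.HTTPSConnection,
}


def head_status(url, max_redirects=5, timeout=10):
    """Return the final status code for url using HEAD requests only.

    Returns None if the URL is unsupported, the host cannot be reached,
    or more than max_redirects redirects are encountered.
    """
    try:
        for _ in range(max_redirects + 1):
            parts = urlsplit(url)
            conn_cls = CONNECTION_TYPES.get(parts.scheme)
            if conn_cls is None or not parts.hostname:
                return None  # missing or unsupported scheme/host
            target = parts.path or "/"
            if parts.query:
                target += "?" + parts.query
            conn = conn_cls(parts.hostname, parts.port, timeout=timeout)
            try:
                conn.request("HEAD", target)
                resp = conn.getresponse()
                status = resp.status
                location = resp.getheader("Location")
            finally:
                conn.close()
            if status in REDIRECT_CODES and location:
                # Location may be relative, so resolve it against the current URL.
                url = urljoin(url, location)
                continue
            return status
    except (OSError, http.client.HTTPException, ValueError):
        return None
    return None  # redirect limit exceeded
```

Calling `head_status("http://segmentfault.com")` would then report the status of whatever page the redirects end on. Some servers do not implement HEAD and answer with 405, so a caller may still need to fall back to GET in that case.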
A GET request fetches the whole response, headers plus body. Try the HEAD method instead, which fetches only the headers:
import requests

resp = requests.head('http://segmentfault.com')  # HEAD request: fetch only the headers
print(resp.status_code)  # status code
resp = requests.head('http://segmentfault.com/nonexistant')  # requests needs the full URL, including the scheme
print(resp.status_code)  # status code
# Output:
200
404
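To see the difference between GET and HEAD without depending on a remote site, the sketch below starts a throwaway server on localhost and sends both methods to it. Everything in it (`DemoHandler`, `fetch`, `compare_get_and_head`, the 10,000-byte `BODY`) is a hypothetical test setup, assuming Python 3 and only the standard library.

```python
import http.client
import threading
from http.server import BaseHTTPRequestHandler, HTTPServer

BODY = b"x" * 10_000  # stand-in for a large page


class DemoHandler(BaseHTTPRequestHandler):
    def _send_headers(self):
        self.send_response(200)
        self.send_header("Content-Type", "text/html")
        self.send_header("Content-Length", str(len(BODY)))
        self.end_headers()

    def do_GET(self):
        self._send_headers()
        self.wfile.write(BODY)  # GET: headers followed by the body

    def do_HEAD(self):
        self._send_headers()  # HEAD: the same headers, no body

    def log_message(self, *args):
        pass  # keep the demo quiet


def fetch(method, port):
    """Return (status, Content-Length header, bytes actually received)."""
    conn = http.client.HTTPConnection("127.0.0.1", port, timeout=5)
    try:
        conn.request(method, "/")
        resp = conn.getresponse()
        return resp.status, resp.getheader("Content-Length"), len(resp.read())
    finally:
        conn.close()


def compare_get_and_head():
    """Run one GET and one HEAD against a local server and return both results."""
    server = HTTPServer(("127.0.0.1", 0), DemoHandler)  # port 0: the OS picks a free port
    threading.Thread(target=server.serve_forever, daemon=True).start()
    try:
        port = server.server_address[1]
        return fetch("GET", port), fetch("HEAD", port)
    finally:
        server.shutdown()
        server.server_close()
```

`compare_get_and_head()` returns `((200, '10000', 10000), (200, '10000', 0))`: both methods report the same status and the same `Content-Length` header, but only GET actually transfers the 10,000 bytes.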