You were close. This adds a new column containing the match ID.
import pandas as pd
import re

url = 'https://www.espncricinfo.com/records/year/team-match-results/2005-2005/twenty20-internationals-3'
base_url = 'https://www.espncricinfo.com'

def match(row):
    # With extract_links="body", each cell is a (text, href) tuple,
    # so row[1] is the link behind the Scorecard cell.
    match_id = re.findall(r't20i-(\d*)/', row[1])
    return match_id[0]

# extract_links requires pandas 1.5 or later.
table = pd.read_html(url, extract_links="body")[0]
table['match'] = table['Scorecard'].apply(match)
print(table)
Output:
                 Team 1  ...   match
0   (New Zealand, None)  ...  211048
1       (England, None)  ...  211028
2  (South Africa, None)  ...  222678

[3 rows x 8 columns]
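If some rows might have no link, or a link without a match ID, the `match` function above would raise an error (`row[1]` would be `None`, or `match_id[0]` would index an empty list). Here is a minimal sketch of a more defensive variant. The name `match_id_from_cell` and the sample href are hypothetical and only illustrate the `(text, href)` shape; they are not taken from the real page.

```python
import re
from typing import Optional, Tuple

# Compiled once; the raw string keeps \d intact for the regex engine.
MATCH_ID_RE = re.compile(r"t20i-(\d+)/")


def match_id_from_cell(cell: Tuple[str, Optional[str]]) -> Optional[str]:
    """Return the match ID from a (text, href) cell, or None if absent."""
    _text, href = cell
    if href is None:
        # The cell had no link at all.
        return None
    found = MATCH_ID_RE.search(href)
    return found.group(1) if found else None


# Hypothetical cell, shaped like what extract_links="body" produces.
sample_cell = ("T20I # 1", "/series/example-series/team-a-vs-team-b-only-t20i-211048/full-scorecard")
print(match_id_from_cell(sample_cell))
```

It drops in the same way as before: `table['match'] = table['Scorecard'].apply(match_id_from_cell)`. Rows without a usable link then get `None` instead of stopping the script.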