国产av日韩一区二区三区精品,成人性爱视频在线观看,国产,欧美,日韩,一区,www.成色av久久成人,2222eeee成人天堂

Community

Learn

Tools Library

AI Tools

Leisure

English

python - 如何爬取URL不變的網(wǎng)站內(nèi)容

伊謝爾倫 2017-04-18 10:13:25

1744

<a href="javascript:__doPostBack('AspNetPager1','3')" class="Pager" title="轉(zhuǎn)到第3頁" style="margin-right:5px;">[3]</a>
function __doPostBack(eventTarget, eventArgument) {
    if (!theForm.onsubmit || (theForm.onsubmit() != false)) {
        theForm.__EVENTTARGET.value = eventTarget;
        theForm.__EVENTARGUMENT.value = eventArgument;
        theForm.submit();
    }

對于這種翻頁方式，怎么用爬蟲爬取呢？網(wǎng)站翻頁后URL沒有發(fā)生改變。我之前使用bs4和selenium模擬翻頁操作再爬取，可是數(shù)據(jù)量太大，這種方法速度太慢。80%的時間都浪費在翻頁上。

伊謝爾倫

小伙看你根骨奇佳，潛力無限，來學(xué)PHP伐。

reply all(2)

小葫蘆2017-04-18 10:15:25 2 floor

This problem needs to be analyzed specifically on the website. Different websites will have different handling methods.
Now assume that in a more common situation, this method can be used:

Turn on browser debugging mode
Click the next page to view the Response of the corresponding network request. This response is usually the URL of the next page
View the request headers and request parameters of the request, analyze and find the pattern
Use python to simulate HTTP requests to get URLs in batches
Crawling information, recommend LXML for HTML parsing

As for how to simulate HTTP requests, please refer to python to simulate HTTP requests