1. Fetching a file that has no access control
<?php
$ch = curl_init();
curl_setopt($ch, CURLOPT_URL, "http://localhost/mytest/phpinfo.php");
curl_setopt($ch, CURLOPT_HEADER, false);
// Return the response as a string. If this line is commented out,
// curl_exec() prints the response directly instead.
curl_setopt($ch, CURLOPT_RETURNTRANSFER, 1);
$result = curl_exec($ch);
curl_close($ch);
?>
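curl_exec() returns false when a transfer fails, so it can help to check the result before using it. The following is a minimal sketch and not part of the original example: the helper name fetch_page and the two timeout values are assumptions chosen for illustration.

```php
<?php
/**
 * Fetch a URL and return the response body as a string.
 * fetch_page() is a hypothetical helper name, not a curl function.
 */
function fetch_page(string $url): string
{
    $ch = curl_init();
    curl_setopt($ch, CURLOPT_URL, $url);
    curl_setopt($ch, CURLOPT_HEADER, false);
    curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
    curl_setopt($ch, CURLOPT_CONNECTTIMEOUT, 2); // assumed value, in seconds
    curl_setopt($ch, CURLOPT_TIMEOUT, 5);        // assumed value, in seconds

    $result = curl_exec($ch);
    if ($result === false) {
        // curl_errno() and curl_error() describe why the transfer failed.
        throw new RuntimeException('curl error ' . curl_errno($ch) . ': ' . curl_error($ch));
    }
    // The handle is released when $ch goes out of scope.
    return $result;
}
```

Calling fetch_page("http://localhost/mytest/phpinfo.php") inside a try/catch block then separates a failed transfer from an empty page.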
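2. Fetching through a proxy
Why fetch through a proxy? Take Google as an example: if you scrape Google's data too frequently within a short period, your requests stop succeeding because Google restricts your IP address. At that point you can switch to a proxy and fetch again.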
<?php
$ch = curl_init();
curl_setopt($ch, CURLOPT_URL, "http://blog.51yip.com");
curl_setopt($ch, CURLOPT_HEADER, false);
curl_setopt($ch, CURLOPT_RETURNTRANSFER, 1);
curl_setopt($ch, CURLOPT_HTTPPROXYTUNNEL, TRUE);
// The proxy address must be passed as a quoted string.
curl_setopt($ch, CURLOPT_PROXY, "125.21.23.6:8080");
// If the proxy requires a password, add this line:
//curl_setopt($ch, CURLOPT_PROXYUSERPWD, 'user:password');
$result = curl_exec($ch);
curl_close($ch);
?>
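Switching proxies means repeating the same options with a different address. The sketch below collects the proxy options in one place and tries each proxy in turn until one succeeds. The function names proxy_options and fetch_with_proxies, the timeout value, and any proxy addresses you pass in are assumptions for illustration, not part of the original example.

```php
<?php
/**
 * Build the curl options for one proxy.
 * proxy_options() is a hypothetical helper name.
 */
function proxy_options(string $proxy, ?string $userpwd = null): array
{
    $options = array(
        CURLOPT_HTTPPROXYTUNNEL => true,
        CURLOPT_PROXY           => $proxy,
    );
    if ($userpwd !== null) {
        // Only needed when the proxy asks for credentials ("user:password").
        $options[CURLOPT_PROXYUSERPWD] = $userpwd;
    }
    return $options;
}

/**
 * Try each proxy in order and return the first successful response,
 * or null if every proxy fails. fetch_with_proxies() is hypothetical.
 */
function fetch_with_proxies(string $url, array $proxies): ?string
{
    foreach ($proxies as $proxy) {
        $ch = curl_init();
        curl_setopt($ch, CURLOPT_URL, $url);
        curl_setopt($ch, CURLOPT_HEADER, false);
        curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
        curl_setopt($ch, CURLOPT_TIMEOUT, 10); // assumed value, in seconds
        curl_setopt_array($ch, proxy_options($proxy));

        $result = curl_exec($ch);
        if ($result !== false) {
            return $result;
        }
    }
    return null;
}
```

For instance, fetch_with_proxies("http://blog.51yip.com", array("125.21.23.6:8080")) would use the proxy from the example above; adding more addresses to the array gives the loop something to fall back on.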
3. Fetching data after a POST
Submitting data deserves its own section: when you use curl you often need to exchange data with the server, so this part is fairly important.
<?php
$ch = curl_init();
/* Note that the data to submit cannot be a two-dimensional (or deeper) array.
 * For example, this works:
 *   array('name'=>serialize(array('tank','zhang')),'sex'=>1,'birth'=>'20101010')
 * but this raises an error:
 *   array('name'=>array('tank','zhang'),'sex'=>1,'birth'=>'20101010')
 */
$data = array('name'=>'test', 'sex'=>1, 'birth'=>'20101010');
curl_setopt($ch, CURLOPT_URL, 'http://localhost/mytest/curl/upload.php');
curl_setopt($ch, CURLOPT_POST, 1);
curl_setopt($ch, CURLOPT_POSTFIELDS, $data);
// CURLOPT_RETURNTRANSFER is not set, so the response is printed directly.
curl_exec($ch);
curl_close($ch);
?>
In upload.php, call print_r($_POST); curl then captures whatever upload.php outputs:
Array ( [name] => test [sex] => 1 [birth] => 20101010 )
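One way around the restriction on nested arrays, offered here as an assumption rather than as the author's method, is to encode the data with PHP's http_build_query() and pass the resulting string to CURLOPT_POSTFIELDS. A string body is sent as application/x-www-form-urlencoded, whereas an array is sent as multipart/form-data. The helper name encode_post_fields is hypothetical.

```php
<?php
/**
 * Encode (possibly nested) form data as a URL-encoded string.
 * encode_post_fields() is a hypothetical helper name.
 */
function encode_post_fields(array $data): string
{
    // Pass '&' explicitly so the result does not depend on php.ini settings.
    return http_build_query($data, '', '&');
}

$data = array('name' => array('tank', 'zhang'), 'sex' => 1, 'birth' => '20101010');
echo encode_post_fields($data), "\n";
// prints: name%5B0%5D=tank&name%5B1%5D=zhang&sex=1&birth=20101010
```

With curl_setopt($ch, CURLOPT_POSTFIELDS, encode_post_fields($data)), PHP on the receiving side decodes the bracketed keys, so $_POST['name'] arrives as an array again.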
4. Fetching pages protected by access control
I previously wrote an article on three methods of page access control; take a look if you are interested.
Fetching such a page with the methods above produces the following error:
You are not authorized to view this page
You do not have permission to view this directory or page using the credentials that you supplied because your Web browser is sending a WWW-Authenticate header field that the Web server is not configured to accept.
In this case we need to authenticate with CURLOPT_USERPWD.
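<?php
$ch = curl_init();
curl_setopt($ch, CURLOPT_URL, "http://club-china");
/* CURLOPT_USERPWD is mainly used to get through page access control,
 * for example the protection we usually set up with htpasswd.
 * Uncomment the next line and supply your own 'user:password'. */
//curl_setopt($ch, CURLOPT_USERPWD, '231144:2091XTAjmd=');
curl_setopt($ch, CURLOPT_HTTPGET, 1);
curl_setopt($ch, CURLOPT_REFERER, "http://club-china");
curl_setopt($ch, CURLOPT_HEADER, 0);
// CURLOPT_RETURNTRANSFER is not set, so the page is printed directly
// and $result only holds true or false.
$result = curl_exec($ch);
curl_close($ch);
?>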
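For HTTP Basic authentication, which is what an htpasswd-protected page typically uses, the credentials given to CURLOPT_USERPWD are sent in an Authorization header as the base64 encoding of user:password. The sketch below only illustrates that encoding; curl builds the header itself. The function name basic_auth_header and the sample credentials are hypothetical.

```php
<?php
/**
 * Build the header line that HTTP Basic authentication sends.
 * basic_auth_header() is a hypothetical helper name.
 */
function basic_auth_header(string $user, string $password): string
{
    return 'Authorization: Basic ' . base64_encode($user . ':' . $password);
}

echo basic_auth_header('user', 'password'), "\n";
// prints: Authorization: Basic dXNlcjpwYXNzd29yZA==
```

Because base64 is an encoding and not encryption, these credentials are readable by anyone who can see the request unless the connection uses HTTPS.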
5. Simulating a login to sina
The data we want to fetch may only be available after logging in; in that case we use curl to simulate the login.
<?php
// Returns 1 if the login succeeded, 0 otherwise.
function checklogin($user, $password)
{
    if (empty($user) || empty($password)) {
        return 0;
    }
    $ch = curl_init();
    curl_setopt($ch, CURLOPT_REFERER, "http://mail.sina.com.cn/index.html");
    // Include the response headers so the Location header can be inspected.
    curl_setopt($ch, CURLOPT_HEADER, true);
    curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
    curl_setopt($ch, CURLOPT_USERAGENT, USERAGENT);
    // Cookies received during the login are written to this file on curl_close().
    curl_setopt($ch, CURLOPT_COOKIEJAR, COOKIEJAR);
    curl_setopt($ch, CURLOPT_TIMEOUT, TIMEOUT);
    curl_setopt($ch, CURLOPT_URL, "http://mail.sina.com.cn/cgi-bin/login.cgi");
    curl_setopt($ch, CURLOPT_POST, true);
    // URL-encode both values so special characters survive the form encoding.
    curl_setopt($ch, CURLOPT_POSTFIELDS, "&logintype=uid&u=".urlencode($user)."&psw=".urlencode($password));
    $contents = curl_exec($ch);
    curl_close($ch);
    // A successful login redirects to /cgi/index.php?check_time=...
    if (!preg_match("/Location: (.*)\\/cgi\\/index\\.php\\?check_time=(.*)\n/", $contents, $matches)) {
        return 0;
    } else {
        return 1;
    }
}
define("USERAGENT", $_SERVER['HTTP_USER_AGENT']);
define("COOKIEJAR", tempnam("/tmp", "cookie"));
define("TIMEOUT", 500);
echo checklogin("zhangying215", "xtaj227");
?>
Open the cookie file under /tmp and take a look:
# Netscape HTTP Cookie File
# http://curl.haxx.se/rfc/cookie_spec.html
# This file was generated by libcurl! Edit at your own risk.

mail.sina.com.cn	FALSE	/	FALSE	0	SINAMAIL-WEBFACE-SESSID	65223c4bd8900284ed463d2a3e1ac182
#HttpOnly_.sina.com.cn	TRUE	/	FALSE	0	SUE	es%3D8d96db0820c6c79922ad57d422f575e8%26ev%3Dv0%26es2%3Dcddfb8400dc5ca95902367ddcd7f57dd
.sina.com.cn	TRUE	/	FALSE	0	SUP	cv%3D1%26bt%3D1286900433%26et%3D1286986833%26lt%3D1%26uid%3D1445632344%26user%3D%25E5%25BC%25A0%25E6%2598%25A02001%26ag%3D2%26name%3Dzhangying20015%2540sina.com%26nick%3D%25E5%25BC%25A0%25E6%2598%25A02001%26sex%3D1%26ps%3D0%26email%3Dzhangying20015%2540sina.com%26dob%3D1982-07-18
#HttpOnly_.sina.com.cn	TRUE	/	FALSE	0	SID	BihcallomxMx-QZxzGrOlcSQx%2F0B%2F0cmr.NyQ%2F0B%2FcmGGalmarlmcHrcGlSmrmxmfxal_CBZ%2F_afugCmmGirBYHm0Bc%40fr5ciZiGG5i
#HttpOnly_.sina.com.cn	TRUE	/	FALSE	0	SPRIAL	bfb4102951fd5892a3fd5b42d442cd26
#HttpOnly_.sina.com.cn	TRUE	/	FALSE	0	SINA_USER	%D5%C5%D2001
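libcurl writes the cookie jar in the Netscape format: one cookie per line with seven tab-separated fields (domain, include-subdomains flag, path, secure flag, expiry timestamp, name, value), and HttpOnly cookies carry a #HttpOnly_ prefix on the domain. The sketch below reads such a file into an array, which can help when checking what a login actually stored. The function name parse_cookie_file and the array keys are assumptions for illustration.

```php
<?php
/**
 * Parse the contents of a Netscape-format cookie file.
 * parse_cookie_file() is a hypothetical helper name.
 */
function parse_cookie_file(string $contents): array
{
    $cookies = array();
    foreach (preg_split('/\r\n|\r|\n/', $contents) as $line) {
        $httpOnly = false;
        if (strpos($line, '#HttpOnly_') === 0) {
            // Not a comment: an HttpOnly cookie with a prefixed domain.
            $httpOnly = true;
            $line = substr($line, strlen('#HttpOnly_'));
        } elseif ($line === '' || $line[0] === '#') {
            continue; // blank line or real comment
        }
        $fields = explode("\t", $line);
        if (count($fields) !== 7) {
            continue; // skip lines that do not match the format
        }
        list($domain, $subdomains, $path, $secure, $expires, $name, $value) = $fields;
        $cookies[] = array(
            'domain'     => $domain,
            'subdomains' => $subdomains === 'TRUE',
            'path'       => $path,
            'secure'     => $secure === 'TRUE',
            'expires'    => (int) $expires, // 0 means a session cookie
            'name'       => $name,
            'value'      => $value,
            'httponly'   => $httpOnly,
        );
    }
    return $cookies;
}
```

After the login call, parse_cookie_file(file_get_contents(COOKIEJAR)) would return one entry per stored cookie, using the COOKIEJAR constant defined in the example above.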
