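<?php
header("Content-Type: text/html; charset=UTF-8");
require("phpQuery.php");
// Note: the QueryList class used below must be loaded as well;
// how it is included depends on your installation.

// Collect the headline text of every ".unit h1" element on the page.
$hj = QueryList::Query('http://mobile.csdn.net/', array(
    "title" => array('.unit h1', 'text')
));
// print_r($hj->data);

// Collect the src attribute of every image on the page.
$data = QueryList::Query('http://cms.querylist.cc/bizhi/453.html', array(
    'image' => array('img', 'src')
))->data;

// Collect the href attribute of every link on the page.
$data = QueryList::Query('http://cms.querylist.cc/google/list_1.html', array(
    'link' => array('a', 'href')
))->data;

// Collect an article: title, date, and body with its images saved locally.
$page = 'http://cms.querylist.cc/news/566.html';
$reg = array(
    'title' => array('h1', 'text'),
    // Drop the span and a tags, then keep only the first space-separated token.
    'date' => array('.pt_info', 'text', '-span -a', function ($content) {
        $arr = explode(' ', $content);
        return $arr[0];
    }),
    'content' => array('.post_content', 'html', 'a -.content_copyright -script', function ($content) {
        $doc = phpQuery::newDocumentHTML($content);
        $imgs = pq($doc)->find('img');
        foreach ($imgs as $img) {
            // Download each image into the w/ directory (it must already exist)
            // and point the img tag at the local copy.
            $src = 'http://cms.querylist.cc' . pq($img)->attr('src');
            $localSrc = 'w/' . md5($src) . '.jpg';
            $stream = file_get_contents($src);
            file_put_contents($localSrc, $stream);
            pq($img)->attr('src', $localSrc);
        }
        return $doc->htmlOuter();
    })
);
// Restrict matching to the ".content" region of the page.
$rang = '.content';
$ql = QueryList::Query($page, $reg, $rang);
$data = $ql->getData();
// The original called dump(), which is not a PHP built-in; print_r() is used instead.
print_r($data);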
This library supports crawling websites and scraping their page content. It is a server-side open source project written in PHP that lets PHP developers process DOM document content easily, for example to extract the headlines from a news website. It also borrows the ideas of jQuery: you select and process page content much as you would with jQuery to get the information you want.
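To make the "select, then extract" idea concrete without installing anything, here is a minimal sketch that uses PHP's built-in DOMDocument and DOMXPath rather than the library itself. It is a swapped-in technique for illustration only: the sample markup, the helper name `extractByXPath`, and the XPath expressions are assumptions, not part of the library's API.

```php
<?php
// Hypothetical sample markup standing in for a fetched news page.
$html = <<<HTML
<html><body>
  <div class="unit"><h1>First headline</h1></div>
  <div class="unit"><h1>Second headline</h1></div>
  <img src="/uploads/a.jpg">
</body></html>
HTML;

/**
 * Hypothetical helper: return the text (or a named attribute) of every
 * node matched by an XPath expression.
 */
function extractByXPath(string $html, string $xpath, string $attr = 'text'): array
{
    $doc = new DOMDocument();
    // Collect parser warnings internally; real-world HTML is rarely valid.
    $previous = libxml_use_internal_errors(true);
    $doc->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    $results = [];
    foreach ((new DOMXPath($doc))->query($xpath) as $node) {
        $results[] = $attr === 'text'
            ? trim($node->textContent)
            : $node->getAttribute($attr);
    }
    return $results;
}

// Rough XPath equivalent of the CSS selector ".unit h1".
$titles = extractByXPath(
    $html,
    '//div[contains(concat(" ", normalize-space(@class), " "), " unit ")]//h1'
);
// Rough equivalent of selecting "img" and reading its src attribute.
$images = extractByXPath($html, '//img', 'src');

print_r($titles);
print_r($images);
```

The jQuery-style selectors in the library express the same two steps (pick nodes, then read text or an attribute) far more compactly than raw XPath, which is the main convenience the description refers to.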