霜天部落 | 专注PHP研发,研究LAMP高性能架构部署与优化

使用phpQuery解析DOM

我们知道jQuery是一个很简单的DOM解析器,它能够在浏览器端很方便的解析网页DOM结构。那么如何在php中解析网页DOM呢?phpQuery诞生了,它是一个利用php语言在服务器端实现了网页DOM解析的API。利用phpQuery可以像jQuery一样方便的解析网页DOM,而且phpQuery与jQuery语法和函数基本一致,也就是说如果你会了jQuery的语法,那么你也就会了phpQuery的用法!有了phpQuery,如果你要制作php采集程序,那就很简单了。

下面是phpQuery的基本用法

phpQuery::newDocumentFileXHTML(‘my-xhtml.html’)->find(‘p’);
$ul = pq(‘ul’);

创建phpQuery对象有以下方法可供选择:

* phpQuery::newDocument($html, $contentType = null) Creates new document from markup. If no $contentType, autodetection is made (based on markup). If it fails, text/html in utf-8 is used.
* phpQuery::newDocumentFile($file, $contentType = null) Creates new document from file. Works like newDocument()
* phpQuery::newDocumentHTML($html, $charset = ‘utf-8′)
* phpQuery::newDocumentXHTML($html, $charset = ‘utf-8′)
* phpQuery::newDocumentXML($html, $charset = ‘utf-8′)
* phpQuery::newDocumentPHP($html, $contentType = null) Read more about it on PHPSupport page
* phpQuery::newDocumentFileHTML($file, $charset = ‘utf-8′)
* phpQuery::newDocumentFileXHTML($file, $charset = ‘utf-8′)
* phpQuery::newDocumentFileXML($file, $charset = ‘utf-8′)
* phpQuery::newDocumentFilePHP($file, $contentType) Read more about it on PHPSupport page

pq 函数说明

pq($param, $context = null);

pq(); function is equivalent of jQuery’s $();. It’s used for 3 type of things:

1. Importing markup

// Import into selected document:
// doesn’t accept text nodes at beginning of input string
pq(‘<div/>’)
// Import into document with ID from $pq->getDocumentID():
pq(‘<div/>’, $pq->getDocumentID())
// Import into same document as DOMNode belongs to:
pq(‘<div/>’, DOMNode)
// Import into document from phpQuery object:
pq(‘<div/>’, $pq)

2. Running queries

// Run query on last selected document:
pq(‘div.myClass’)
// Run query on document with ID from $pq->getDocumentID():
pq(‘div.myClass’, $pq->getDocumentID())
// Run query on same document as DOMNode belongs to and use node(s)as root for query:
pq(‘div.myClass’, DOMNode)
// Run query on document from phpQuery object
// and use object’s stack as root node(s) for query:
pq(‘div.myClass’, $pq)

3. Wrapping DOMNodes with phpQuery objects
foreach(pq(‘li’) as $li)
// $li is pure DOMNode, change it to phpQuery object
pq($li);

jQuery官方地址:http://jquery.com/
phpQuery下载地址:http://code.google.com/p/phpquery/
phpQuery手册:http://code.google.com/p/phpquery/wiki/Manual