ajax, dom, google, html, php, phpQuery, scraping
In phpQuery on 05.10.2008 at 3:05
phpQuery can connect to Google Code’s wiki editing form, authorize itself with a Google account, replace the page contents and finally submit the form.
<?php
// load phpQuery (adjust the path to wherever the library is installed)
require_once 'phpQuery/phpQuery.php';
// declare main variable
$pq = null;
// create dummy document as start point
phpQuery::newDocument('<div>')
// authorize your google account
->script('google_login')
// redirect authorized XHR component to googlecode's wiki form
->location('http://code.google.com/p/phpquery/w/edit/test')
// save result as $pq; we could continue the chain,
// but it would break in case of an error...
->toReference($pq);
if ($pq) {
// read about CallbackReference later in this post...
$pq->WebBrowser(new CallbackReference($pq))
->find('textarea:first')
->val('lorem ipsum')
->parents('form')
// the first submit button is Preview, so fire up the second one
->find(':submit:eq(1)')
// triggering the submit event through input[type=submit]:click
// is a way to choose which submit button is sent
->click();
if ($pq) {
// print without tags
print $pq->script('safe_print');
}
}
Notice the hot new feature: new CallbackReference($pq). This callback assigns its first parameter to the passed variable, by reference. The pattern works with all methods that accept callbacks, so we can use plain if statements instead of callback functions. In the example above, the CallbackReference object is called when the click event is triggered.
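To make the by-reference idea concrete, here is a minimal stand-alone sketch in plain PHP. It is not phpQuery's CallbackReference implementation: the class RefCallbackSketch and the function fetchSomething are hypothetical names made up for illustration, and the sketch assumes the callback receives the result as its first argument (and null on failure, which is an assumption, not documented behaviour).

```php
<?php
// Hypothetical stand-in for the CallbackReference idea (NOT phpQuery's class):
// an invokable object that writes its first argument into a variable
// that was handed to the constructor by reference.
class RefCallbackSketch
{
    private $ref;

    public function __construct(&$var)
    {
        // Bind the property to the caller's variable by reference.
        $this->ref = &$var;
    }

    public function __invoke($value)
    {
        // Assigning through the reference overwrites the caller's variable.
        $this->ref = $value;
    }
}

// Hypothetical API that reports its result through a callback;
// here it passes null to simulate a failure.
function fetchSomething(callable $callback, bool $succeed): void
{
    $callback($succeed ? 'document' : null);
}

$result = 'initial';
fetchSomething(new RefCallbackSketch($result), true);
echo $result, "\n"; // document

fetchSomething(new RefCallbackSketch($result), false);
var_dump($result); // NULL

// Because the callback only fills a variable, plain if statements
// can replace nested callback functions.
if ($result) {
    echo "got a result\n";
}
```

The design point is that the caller never writes a closure: it checks the variable afterwards, exactly like the if ($pq) checks in the snippet above.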
The code snippet above makes use of the new Script plugin, particularly its google_login script.
Next, this could be combined with a script that automatically turns the XML documentation into wiki pages, but that may have to wait for jQuery 1.3…
cli, dom, html, php, phpQuery, scraping
In Web Scraping, phpQuery on 05.10.2008 at 3:05
I’ve finally done something I had been thinking about for some time: a terminal-friendly phpQuery CLI interface. It took about 10 minutes of coding… It works like this:
phpquery http://code.google.com/p/phpquery/downloads/list --find '.vt.col_4 a:first' --contents
This returns the download count of the latest phpQuery release file. Notice there is no need to quote the URL in any way. I was very happy with this, so I’ve added callback support to the text() and htmlOuter() methods, like so:
phpquery http://code.google.com/p/phpquery/downloads/list --find '.vt.col_4 a:first' --text strip_tags trim
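As a rough illustration of how such flags could map onto a method chain, here is a hypothetical sketch. parseChain and applyCallbacks are made-up helpers, not part of phpQuery or its CLI script, and the sketch assumes that the callbacks listed after --text are applied left to right to the extracted text.

```php
<?php
// Hypothetical sketch (not the real phpquery script): turn
// "--method arg1 arg2 ..." flags into an ordered list of method calls.
function parseChain(array $args): array
{
    $chain = [];
    foreach ($args as $arg) {
        if (strncmp($arg, '--', 2) === 0) {
            // A new flag starts a new method call with no arguments yet.
            $chain[] = [substr($arg, 2), []];
        } elseif ($chain) {
            // Plain tokens become arguments of the most recent method.
            $chain[count($chain) - 1][1][] = $arg;
        }
        // Tokens before the first flag (such as the URL) are ignored here.
    }
    return $chain;
}

// Apply callback names (e.g. strip_tags, trim) to a string, left to right.
function applyCallbacks(string $value, array $callbacks): string
{
    foreach ($callbacks as $callback) {
        $value = $callback($value);
    }
    return $value;
}

$chain = parseChain(['--find', '.vt.col_4 a:first', '--text', 'strip_tags', 'trim']);
print_r($chain);

// Made-up sample content of the matched element.
echo applyCallbacks("  <b>1234</b>\n", $chain[1][1]), "\n"; // 1234
```

Keeping the chain as plain data means each flag can be replayed as a method call in order, so adding a new CLI option costs nothing beyond the method already existing.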
Once everything was working, I used it straight away to scrape the forum and category lists from an old IPB v1.x board. I piped the phpQuery output through sed to filter the final result.
# Fetch categories
./phpquery http://forum.wiadomosc.info/ --find '.maintitle a' | sed -r 's/^.*?c=([0-9]+).+?>(.+?)<[^>]*>([^<]*)<.*$/\1: \/\/ \2/g'
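For comparison, a similar extraction can be done in PHP itself. This is a simplified, hypothetical alternative to the sed filter, not the command used above: extractCategories is a made-up helper, the sample markup is invented, and real IPB 1.x output may differ. It relies on negated character classes such as [^>]* rather than lazy quantifiers like .*?, because sed's -r (POSIX extended) syntax has no lazy quantifiers.

```php
<?php
// Hypothetical helper: pull the category id (the c= query parameter)
// and the link text out of anchor markup, one "id: name" line per link.
function extractCategories(string $html): array
{
    // [?&;] also covers "&amp;c=" inside escaped query strings;
    // [^>]* and [^<]* stop at the next tag boundary without lazy matching.
    preg_match_all('/[?&;]c=(\d+)[^>]*>([^<]*)</', $html, $matches, PREG_SET_ORDER);

    $lines = [];
    foreach ($matches as $m) {
        // $m[1] is the category id, $m[2] the link text.
        $lines[] = $m[1] . ': ' . trim($m[2]);
    }
    return $lines;
}

// Invented sample markup, standing in for phpQuery output.
$sample = '<a href="index.php?c=3">General</a>' . "\n"
        . '<a href="index.php?s=abc&amp;c=7">Off-topic</a>';

// prints "3: General" and "7: Off-topic" on separate lines
echo implode("\n", extractCategories($sample)), "\n";
```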