Tobiasz Cudnik

Posts Tagged ‘cli’

Web Scraping with the CLI version of phpQuery piped through sed

In Web Scraping, phpQuery on 05.10.2008 at 3:05

I've finally built something I had been thinking about for some time: a terminal-friendly phpQuery CLI. It took about 10 minutes of coding and works like this:

phpquery http://code.google.com/p/phpquery/downloads/list --find '.vt.col_4 a:first' --contents

This returns the download count of the latest phpQuery release file. Notice that the URL doesn't need to be quoted in any way. I was so happy with this that I added callback support to the text() and htmlOuter() methods, like so:

phpquery http://code.google.com/p/phpquery/downloads/list --find '.vt.col_4 a:first' --text strip_tags trim
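phpQuery's own CLI code isn't shown in this post, so the following is only a hypothetical sketch of how arguments like the ones above might be mapped to method calls, and how a callback list such as `strip_tags trim` could be applied to a string. `parseMethodChain()` and `applyCallbacks()` are made-up names, not part of phpQuery. The sketch assumes PHP 7.4+ for the arrow function.

```php
<?php
// Hypothetical sketch (not phpQuery's implementation): turn CLI-style tokens like
//   --find '.vt.col_4 a:first' --text strip_tags trim
// into an ordered list of [method, arguments] pairs.
function parseMethodChain(array $args): array
{
    $chain = [];
    foreach ($args as $arg) {
        if (strncmp($arg, '--', 2) === 0) {
            // "--method" starts a new call with no arguments yet.
            $chain[] = [substr($arg, 2), []];
        } elseif ($chain) {
            // Any other token becomes an argument of the most recent method.
            $chain[count($chain) - 1][1][] = $arg;
        }
        // Tokens before the first --method (such as the URL) are ignored here.
    }
    return $chain;
}

// Hypothetical helper: pass a string through named callbacks in order,
// so ['strip_tags', 'trim'] means trim(strip_tags($text)).
function applyCallbacks(string $text, array $callbacks): string
{
    return array_reduce(
        $callbacks,
        fn (string $carry, string $fn): string => $fn($carry),
        $text
    );
}

$chain = parseMethodChain(['--find', '.vt.col_4 a:first', '--text', 'strip_tags', 'trim']);
echo applyCallbacks("  <b>1234</b>  ", $chain[1][1]), "\n"; // prints "1234"
```

Applying callbacks left to right keeps the command line readable: the order of names after `--text` is the order in which they run.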

Once everything was working, I used it straight away to scrape forum and category lists from an old IPB v1.x (Invision Power Board) forum, piping phpQuery's output through sed to filter the final result.

# Fetch categories (sed has no lazy quantifiers like .+?, so negated classes are used instead)
./phpquery http://forum.wiadomosc.info/ --find '.maintitle a' | sed -E 's|^.*c=([0-9]+)[^>]*>([^<]*)<.*$|\1: // \2|'
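The same filtering can be done in PHP itself. This is a swapped-in technique (preg_match instead of sed), sketched under the assumption that each line contains a category link like `<a href="index.php?c=3">News</a>`, where `c=` carries the category id. The `id: // name` output format is a guess at what the sed replacement was meant to produce, and `extractCategory()` is a hypothetical name.

```php
<?php
// Hypothetical sketch: pull the category id (the c= parameter) and the
// link text out of one line of IPB-style HTML, formatted as "id: // name".
function extractCategory(string $line): ?string
{
    // [^>]* and [^<]* keep each part of the match inside a single tag or text node.
    if (preg_match('/c=([0-9]+)[^>]*>([^<]*)</', $line, $m) !== 1) {
        return null; // no category link on this line
    }
    return $m[1] . ': // ' . $m[2];
}

var_dump(extractCategory('<a href="index.php?c=3">News</a>'));
```

PCRE does support lazy quantifiers such as `.+?`, but negated character classes behave the same in sed and PHP, which makes the pattern easy to move between the two.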

PHP in CLI using $argv

In Snippets on 05.10.2008 at 3:05

Just an example showing how easy it is to add a command-line interface to PHP scripts. The sandbox.php file:

#!/usr/bin/php
<?php
var_dump($argv);
?>

chmod +x sandbox.php
./sandbox.php param1 --param2 param3 param4 --param5 -p 6 -fbi

array(9) {
  [0]=>
  string(13) "./sandbox.php"
  [1]=>
  string(6) "param1"
  [2]=>
  string(8) "--param2"
  [3]=>
  string(6) "param3"
  [4]=>
  string(6) "param4"
  [5]=>
  string(8) "--param5"
  [6]=>
  string(2) "-p"
  [7]=>
  string(1) "6"
  [8]=>
  string(4) "-fbi"
}
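As the dump shows, $argv is just a flat list of strings, so any interpretation of options is up to the script. Below is a minimal hypothetical sketch (`parseArgs()` is an illustrative name) that sorts those tokens into positional arguments, long options (`--name` or `--name=value`) and short flags (`-p`, `-fbi`). Short flags are stored as booleans, so `-p 6` yields `p => true` plus a positional `6`. Pairing values with short flags is deliberately left out. For the common cases PHP also ships a built-in getopt() function.

```php
<?php
// Hypothetical sketch: classify $argv-style tokens into positional
// arguments, long options and short flags. Not a full getopt replacement.
function parseArgs(array $argv): array
{
    $result = ['args' => [], 'long' => [], 'short' => []];
    foreach (array_slice($argv, 1) as $token) { // skip the script name
        if (strncmp($token, '--', 2) === 0) {
            // "--name=value" stores the value, a bare "--name" stores true.
            $parts = explode('=', substr($token, 2), 2);
            $result['long'][$parts[0]] = $parts[1] ?? true;
        } elseif (strlen($token) > 1 && $token[0] === '-') {
            // "-fbi" is treated as three separate flags: f, b and i.
            foreach (str_split(substr($token, 1)) as $flag) {
                $result['short'][$flag] = true;
            }
        } else {
            $result['args'][] = $token;
        }
    }
    return $result;
}

print_r(parseArgs($argv ?? []));
```

Run with the same arguments as sandbox.php, this would put param1, param3, param4 and 6 into `args`, param2 and param5 into `long`, and p, f, b and i into `short`.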