dom, html, php, phpQuery, scraping, test cases, web
In Ideas, phpQuery on 13.01.2009 at 23:09
Using phpQuery and a unit test framework (SimpleTest in this example), you can automatically test a web page for the presence of a specific part and its position. A set of such tests can save a lot of time during website development, and even more afterwards.
You can do more than simple tests like “are articles visible”: with the WebBrowser plugin you can test a whole process, say a user registration. I would like to present an example of such a test. It includes the following steps:
- Enter main page and follow the registration link
- Fill registration form and submit it
- Check if result is expected
As I said before, SimpleTest is the framework of choice, but that doesn’t matter much. What’s important:
- WebBrowser needs callbacks
- Callbacks should be declared as functions (for PHP < 5.3)
- Inside callbacks, $this variable is unavailable
- First step of the test should check if last step has succeeded
Full code below:
<?php
require('simpletest/autorun.php');
require('phpQuery/phpQuery.php');

class CustomerTest extends UnitTestCase {
    // Test case instance, exposed to the static callbacks (no $this in there)
    public static $_this;
    // State shared between the callbacks
    public static $registration = array(
        'username' => null,
        'success' => false
    );

    function testRegistration() {
        self::$_this = $this;
        phpQuery::browserGet('http://localhost/tested-site/',
            array('CustomerTest', '_testRegistrationLink')
        );
        // Set by the last callback once the whole chain has run
        $this->assertTrue(
            self::$registration['success'], "Registration unsuccessful"
        );
    }

    // Callbacks are invoked as array('CustomerTest', 'method'), hence static
    static function _testRegistrationLink($browser) {
        $registrationLink = null;
        // "rejestracja" is Polish for "registration"
        $browser->find('a:contains(rejestracja)')
            ->WebBrowser(array('CustomerTest', '_testRegistrationForm'))
            ->toReference($registrationLink)
            // jump to _testRegistrationForm
            ->click();
        self::$_this->assertTrue(
            $registrationLink->length, "Registration link missing"
        );
    }

    static function _testRegistrationForm($browser) {
        $registrationForm = null;
        // Random, unique username for every run
        $username = md5(microtime());
        $browser['.customers.form form']
            ->toReference($registrationForm)
            ->WebBrowser(array('CustomerTest', '_testRegistrationResult'))
            ->find('input[name*=login]')->val($username)->end()
            ->find('input[name*=email]')->val($username.'@test.com')->end()
            // jump to _testRegistrationResult
            ->submit();
        self::$_this->assertTrue(
            $registrationForm->length, "Registration form missing"
        );
        self::$registration['username'] = $username;
    }

    static function _testRegistrationResult($browser) {
        // "Logowanie" is Polish for "Login"
        $loginForm = $browser->find('h2:contains(Logowanie)');
        self::$_this->assertTrue($loginForm->length, "Login form missing");
        if ($loginForm->length)
            self::$registration['success'] = true;
    }
}
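The constraints listed before the test (callbacks passed as array('Class', 'method'), no $this inside them, state handed over between steps) can be tried out without phpQuery at all. Below is a minimal sketch in plain PHP; the StepChain class, runStep() and the step names are hypothetical and only mimic how the test shares state through static properties.

```php
<?php
// Hypothetical stand-in for a chain of browser callbacks; no phpQuery involved.
class StepChain {
    // Shared state is static because callbacks invoked as
    // array('Class', 'method') run without an instance, so $this is unavailable.
    public static $log = array();
    public static $success = false;

    // Invokes a callback the way a callback-driven API would.
    public static function runStep($callback, $payload) {
        return call_user_func($callback, $payload);
    }

    public static function stepOne($payload) {
        self::$log[] = 'one:' . $payload;
        // Hand over to the next step, as click() does in the test above.
        self::runStep(array('StepChain', 'stepTwo'), $payload . '-next');
    }

    public static function stepTwo($payload) {
        self::$log[] = 'two:' . $payload;
        // The last step records the final outcome for the caller to assert on.
        self::$success = true;
    }
}

StepChain::runStep(array('StepChain', 'stepOne'), 'start');
```

After the chain has run, the caller only needs to look at StepChain::$success, which is what testRegistration() does with self::$registration['success'].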
WebBrowser doesn’t support AJAX, so not all sites can be tested like this (although you can do it with AHAH after some work), but cookies and HTTP authentication should satisfy most needs.
Of course that’s nothing new; projects accomplishing a similar goal have existed for quite some time, e.g. jWebUnit, but none of them has jQuery under the hood ;)
bash, dom, php, phpQuery, scraping
In Web Scraping on 05.11.2008 at 17:11
When a project doesn’t use SVN or any other version-control system (or you can’t use it), you have to download things manually. Needless to say, nobody wants to do this, so how can you avoid it? You can simulate yourself doing it…
The example below downloads the latest release of the madwifi branch with the new HAL (which I need for my WiFi adapter).
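#!/usr/bin/php
<?php
require('phpQuery/phpQuery.php');
// Placeholder URL: substitute the directory listing page of the project you track.
// 'downloadLatest' is an illustrative callback name.
phpQuery::browserGet('http://example.com/releases/', 'downloadLatest');
function downloadLatest($browser) {
    // second-to-last table row of the listing
    $browser->find('table tr')->slice(-2, -1)->downloadTo('/target/local/path');
}
?>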
Now, to get the latest release, all I need to do is run the above script from the command line. One missing thing is checking whether anything has changed, but I leave that to you to resolve ;)
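One possible way to do that check, as a minimal sketch: remember the name of the last downloaded file in a small state file and compare it on every run. The function name releaseHasChanged(), the state file and the file names are all made up for illustration, and the sketch assumes you have already extracted the latest file name as a string.

```php
<?php
// Returns true when $latest differs from the value stored in $stateFile,
// and records $latest so the next run compares against it.
function releaseHasChanged($stateFile, $latest) {
    $previous = is_file($stateFile) ? trim(file_get_contents($stateFile)) : null;
    if ($previous === $latest) {
        return false;
    }
    file_put_contents($stateFile, $latest);
    return true;
}

// Usage with a temporary state file and made-up file names.
$stateFile = tempnam(sys_get_temp_dir(), 'rel');
$first  = releaseHasChanged($stateFile, 'snapshot-r100.tar.gz');
$second = releaseHasChanged($stateFile, 'snapshot-r100.tar.gz');
$third  = releaseHasChanged($stateFile, 'snapshot-r101.tar.gz');
unlink($stateFile);
```

The download step would then run only when the function returns true.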
For files whose names don’t change, you can just use wget, like so:
wget 'http://host.net/somefile.zip' -O new-name.zip
ajax, dom, google, html, php, phpQuery, scraping
In phpQuery on 05.10.2008 at 3:05
phpQuery can connect to Google Code’s wiki editing form, authorize itself, replace page contents and finally submit the form.
// declare main variable
$pq = null;
// create a dummy document as the starting point
phpQuery::newDocument('<div/>')
    // authorize your Google account
    ->script('google_login')
    // redirect the authorized XHR component to Google Code's wiki form
    ->location('http://code.google.com/p/phpquery/w/edit/test')
    // save the result as $pq; we could continue the chain,
    // but it would break in case of an error...
    ->toReference($pq);
if ($pq) {
    // read about CallbackReference later in this post...
    $pq->WebBrowser(new CallbackReference($pq))
        ->find('textarea:first')
            ->val('lorem ipsum')
        ->parents('form')
            // the first submit is Preview, so fire the second one
            ->find(':submit:eq(1)')
                // triggering the submit event through input[type=submit]:click
                // is a way to choose which submit is sent
                ->click();
    if ($pq) {
        // print without tags
        print $pq->script('safe_print');
    }
}
Note the hot new feature: new CallbackReference($pq). Such a callback assigns its first parameter to the passed variable, by reference. This pattern works with all methods accepting callbacks. Thanks to that, we can use if statements instead of function callbacks. In the example above, the CallbackReference object is called when the click event is triggered.
The presented snippet makes use of the new Script plugin, particularly google_login.
Now it can be combined with an automated XML-documentation-to-wiki script, but that is perhaps something for jQuery 1.3…
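To make the idea concrete, here is a minimal sketch of how such a by-reference callback could be built. It is not phpQuery's actual CallbackReference class: the names RefCallback and fetchAndNotify() are hypothetical, and the sketch relies on __invoke(), which needs PHP 5.3 or newer.

```php
<?php
// Hypothetical illustration of the idea, not phpQuery's actual class.
class RefCallback {
    private $target;

    // Store a reference to the caller's variable.
    public function __construct(&$target) {
        $this->target = &$target;
    }

    // When used as a callback, write the first argument into that variable.
    public function __invoke($value) {
        $this->target = $value;
    }
}

// A made-up function that reports its result only through a callback.
function fetchAndNotify($callback) {
    call_user_func($callback, 'response body');
}

$result = null;
fetchAndNotify(new RefCallback($result));
```

Once fetchAndNotify() returns, $result holds whatever was passed to the callback, so the calling code can continue with a plain if ($result) check instead of moving its logic into a separate function.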
cli, dom, html, php, phpQuery, scraping
In Web Scraping, phpQuery on 05.10.2008 at 3:05
I’ve done what I had been thinking about for some time: a terminal-friendly phpQuery CLI interface. It took about 10 minutes of coding… It works like this:
phpquery http://code.google.com/p/phpquery/downloads/list --find '.vt.col_4 a:first' --contents
This will return the number of downloads of the latest phpQuery release file. Notice there is no need to quote the URL in any way. I was very happy with this, so I’ve added callback support to the text() and htmlOuter() methods, like so:
phpquery http://code.google.com/p/phpquery/downloads/list --find '.vt.col_4 a:first' --text strip_tags trim
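Reading the command above as "apply strip_tags, then trim, to the extracted text" (an assumption about how the callbacks are chained, not something spelled out here), the behaviour can be sketched in plain PHP. The helper name applyCallbacks() and the sample string are made up.

```php
<?php
// Applies each named callback in order to $text; the function name is hypothetical.
function applyCallbacks($text, array $callbacks) {
    foreach ($callbacks as $callback) {
        $text = call_user_func($callback, $text);
    }
    return $text;
}

// Made-up sample input resembling a table cell with markup and whitespace.
$clean = applyCallbacks("  <b>123</b> downloads \n", array('strip_tags', 'trim'));
```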
Once I had everything working, I used it straight away to scrape the forum and category lists from an old IPB v1.x. I piped the phpQuery output through sed to filter the final output.
// Fetch categories
./phpquery http://forum.wiadomosc.info/ --find '.maintitle a' | sed -r 's/^.*?c=([0-9]+).+?>(.+?)<[^>]*>([^<]*)<.*$/\1: \/\/ \2/g'
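The same kind of filtering can also be done in PHP instead of sed. The sketch below swaps in PHP's built-in DOM extension plus a small regular expression; the sample markup is invented and only assumes that each category link carries its id in a c= query parameter.

```php
<?php
// Made-up sample of what the '.maintitle a' selection might print.
$html = '<a href="index.php?c=3">News</a>' . "\n"
      . '<a href="index.php?c=7">Off-topic</a>';

$doc = new DOMDocument();
$doc->loadHTML('<div>' . $html . '</div>');

$categories = array();
foreach ($doc->getElementsByTagName('a') as $link) {
    // Pull the numeric category id out of the c= query parameter.
    if (preg_match('/[?&]c=([0-9]+)/', $link->getAttribute('href'), $m)) {
        $categories[(int) $m[1]] = trim($link->textContent);
    }
}
```

The result is an array mapping category ids to their names, which is easier to reuse in a script than text filtered on the command line.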