Tobiasz Cudnik

Archive for the ‘phpQuery’ Category

Personal torrent stream (part 1)

In Web Scraping, phpQuery on 19.03.2009 at 2:52

I really don’t like reading movie reviews before actually seeing the movies. For this reason, I found the new nanocrowd.com service quite useful. It claims to provide very specific movie recommendations based on the opinions of anonymous masses. Although its accounts are unfortunately still in private beta, we can test recommendations based on one movie.

For a long time I’ve been thinking about downloading movies without actually downloading them myself. Discovering nanocrowd convinced me to do it. Of course I’ve used phpQuery’s WebBrowser plugin. The scenario is as follows:

  1. Connect to torrentz.net
  2. Fill search box and submit
  3. Filter results
  4. Print results and ask which to download
  5. Get result sites and find supported one
  6. Download to local filesystem
  7. Optionally, send via SSH to another machine

To be practically useful, the script supports the following filters:

--min-size
--max-size
--min-peers

So getting the first matching movie would, in most common cases, look like this:

echo 1 | torrent-download.php "Movie Name" --min-peers=50 --max-size=1500 --min-size=500
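
The three filters boil down to simple comparisons on each search result. Below is a minimal sketch of that step, not the actual script: the function name, the `name`, `size` and `peers` keys, and the assumption that sizes are given in MB are all made up for illustration.

```php
<?php
// Hypothetical helper: keep only results that pass the given limits.
// The 'size' (assumed MB) and 'peers' keys are assumptions for this example.
function filterResults(array $results, array $opts) {
	// missing options fall back to "no limit"
	$opts += array('min-size' => 0, 'max-size' => PHP_INT_MAX, 'min-peers' => 0);
	$matched = array();
	foreach ($results as $result) {
		if ($result['size'] < $opts['min-size']) continue;
		if ($result['size'] > $opts['max-size']) continue;
		if ($result['peers'] < $opts['min-peers']) continue;
		$matched[] = $result;
	}
	return $matched;
}

// made-up search results
$results = array(
	array('name' => 'Movie Name CD1', 'size' => 700, 'peers' => 120),
	array('name' => 'Movie Name HD', 'size' => 4400, 'peers' => 300),
	array('name' => 'Movie Name sample', 'size' => 40, 'peers' => 80),
	array('name' => 'Movie Name rip', 'size' => 900, 'peers' => 12),
);
$matched = filterResults($results, array(
	'min-peers' => 50, 'max-size' => 1500, 'min-size' => 500,
));
// numbered list, ready for the "which one to download" question
foreach ($matched as $i => $result)
	print ($i + 1).'. '.$result['name']."\n";
```

Piping `echo 1` into the script then simply picks the first entry of such a filtered list.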

Nowadays almost every decent torrent downloading app supports autoloading torrent files found in a specified directory, so the only thing you need to do is type the movie name. As I mentioned before, nanocrowd is still not open for registration. When it is, I will post an update (or maybe someone has a free invitation?). Anyway, don’t use this code to download illegal movies…

QueryTemplates – template’s logic and markup in one file

In QueryTemplates, phpQuery on 25.01.2009 at 15:41

Using the latest phpQuery revisions you can easily keep a template’s markup and logic in one place with 2 extra lines of code. This heavily breaks the general concept, but it may be useful in some cases. The idea is based on support for callbacks as template sources. First you start dumping the markup with 1 line.


<?php ob_start(); ?>

After that comes the markup:


<html>
  <body>
    <div>Hello world</div>
  </body>
</html>

And at the end comes the whole logic. The markup is captured in $markup using the new CallbackReturnValue, which takes a value as its parameter. When the callback is called, it just returns that value.


<?php
require('../src/QueryTemplates.php');
$markup = new CallbackReturnValue(ob_get_clean());
require template('test')->parse($markup)
	->find('div')
		->text('Hello template!')
;
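
To make the mechanism less magical, here is a minimal sketch of the two ingredients: output buffering and a callback that only returns a stored value. The `ReturnValueCallback` class is a hypothetical stand-in written for this example (it relies on `__invoke()`, so it needs PHP 5.3+); it is not the QueryTemplates class.

```php
<?php
// Hypothetical stand-in for the described behaviour, not QueryTemplates code:
// a callable object that just returns the value it was created with.
class ReturnValueCallback {
	private $value;
	public function __construct($value) {
		$this->value = $value;
	}
	public function __invoke() {
		return $this->value;
	}
}

// start capturing everything that would normally be printed
ob_start();
?>
<div>Hello world</div>
<?php
// ob_get_clean() returns the captured markup and stops buffering
$markup = new ReturnValueCallback(trim(ob_get_clean()));
// any code expecting a callback as a template source can now call it
print call_user_func($markup)."\n";
```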

This approach could also be used for getting markup with requestAction in CakePHP views.

toReference() inside chains

In QueryTemplates, phpQuery on 25.01.2009 at 15:26

There is one extremely useful and not so widely known method: toReference(). It saves the currently matched elements in a variable, by reference (as the name says). It can be used in cases such as:

1. Preserving chain


// declare our variables (this is VERY important)
$deepClass = $section = null;
$template['div.main div.otherclass .deep-class:first']
  ->toReference($deepClass)
//  ->...  // do something
  ->find('.section')
    ->toReference($section)
//  ->...  // do something
  ->end()
  ->next()
    // highlighter but; only one empty()
    ->empty()
    ->append($deepClass->contents())
    ->eq(0)->add($section)->eq(1)
//      ->...  // do something, stack is [$section]
    ->end()->end()->end()
//      ->...  // stack is same as after ->next()
;

2. Splitting chains


// declare our variables (this is VERY important)
$row = $titleBody = null;
$template
  ->find('ul:first > li')
    ->loopOne('posts', 'postNum', 'r')
      // add dynamic class
      ->addClassPHP('if (! $postNum) print "first"')
      ->find('> .title, > .body')
        ->varsToStack('r["Post"]', $postFields)
        // save our $titleBody variable
        ->toReference($titleBody)
      ->end()
      // save our $row variable
      ->toReference($row)
//    ->...  // lots of code ;)
;
// just continue your work
$row->find('h3:first, .comments')
// ->...
;
// anywhere...
doSomethingOnFields($titleBody);
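
The trick behind toReference() is plain PHP pass-by-reference. The sketch below shows it on a hypothetical miniature `Chain` class (PHP 5.3+ because of the closure); it only illustrates the idea and is not how phpQuery implements it.

```php
<?php
// Hypothetical miniature of a chainable object; only the by-reference
// trick is the point here, this is not phpQuery's implementation.
class Chain {
	public $items;
	public function __construct(array $items) {
		$this->items = $items;
	}
	// returns a NEW object, like find() or filter() would
	public function filter($callback) {
		return new Chain(array_values(array_filter($this->items, $callback)));
	}
	// stores the current object in the caller's variable and keeps the chain going
	public function toReference(&$var) {
		$var = $this;
		return $this;
	}
}

// declare the variables first, as in the examples above
$all = $even = null;
$chain = new Chain(array(1, 2, 3, 4));
$chain
	->toReference($all)
	->filter(function($n) { return $n % 2 == 0; })
		->toReference($even);
// both intermediate results are available after the chain has ended
print count($all->items).' '.count($even->items)."\n";
```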

In the SVN version of QueryTemplates there is a fix for a cache issue. The workaround for beta2 is to check "if ($ref)" first, or to turn off the cache with "$cacheTimeout = -1".

Extending QueryTemplates with closures

In QueryTemplates, phpQuery on 25.01.2009 at 14:24

The latest changes in phpQuery allow us to extend it much more easily. One of the ways to do this is exactly the same as in jQuery: using the phpQuery::extend() method. When you combine this with the upcoming closures in PHP 5.3, you will feel almost like in a browser. Take a look at this:


$source = dirname(__FILE__).'/table.html';
$rows = array(
	array(
		'field-1' => 'foo1',
		'field-2' => 'foo2',
	),
	array(
		'field-1' => 'bar1',
		'field-2' => 'bar2',
	),
);
// this will work in PHP 5.3
// this code will evaluate on onLoad event, thus is cache friendly
$onload = function() {
  $tableFirstRowClass = function($self, $var, $classname = 'first') {
    return $self->addClassPHP("
      if ($$var == 0)
        print '$classname';
    ");
  };
  // here we pass new closure into extend method, just like in jQuery
  // compact() creates array for us
  phpQuery::extend('phpQueryObject', compact('tableFirstRowClass'));
};
require template('table')->parse($source)
  // just bind it
  ->bind('onload', $onload)
  // just fire it up ;)
  ->trigger('onload')
  ->find('table > tr, table > * > tr')
    ->loopOne('rows', 'i', 'row')
      ->varsToSelector('row', $rows[0])
      // now you can use this just-like-this
      ->tableFirstRowClass('i')
;

But for now, to extend phpQuery on the fly we have to use create_function or delegate to an existing function.
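
For readers wondering how a class can gain methods at runtime at all, here is one possible mechanism, sketched on a hypothetical `Extendable` class: callbacks are kept in a static registry and dispatched through `__call()`. This is an assumption about the general technique, not a copy of phpQuery::extend() internals.

```php
<?php
// Hypothetical class showing one way runtime extension can work:
// callbacks are kept in a static registry and dispatched through __call().
class Extendable {
	private static $extensions = array();
	public $classes = array();
	public static function extend(array $callbacks) {
		self::$extensions = array_merge(self::$extensions, $callbacks);
	}
	public function __call($name, $args) {
		if (! isset(self::$extensions[$name]))
			throw new BadMethodCallException("Method $name doesn't exist");
		// the object itself goes first, like $self in the example above
		array_unshift($args, $this);
		return call_user_func_array(self::$extensions[$name], $args);
	}
}

$markFirst = function($self, $index, $classname = 'first') {
	if ($index == 0)
		$self->classes[] = $classname;
	return $self;
};
// compact() turns the variable into array('markFirst' => $markFirst)
Extendable::extend(compact('markFirst'));

$row = new Extendable;
$row->markFirst(0)->markFirst(1, 'ignored');
print implode(' ', $row->classes)."\n";
```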

We can also use the new Scripts plugin feature, which does almost the same. Just like in all of the Scripts plugin’s script files, the newly attached function will have all the usual variables available. The example below works in PHP < 5.3:


function onload() {
  // in PHP 5.3, this can be done simply like this
//  phpQuery::script(...
  // but for now we have to use fake phpQuery::$plugins namespace
  phpQuery::$plugins->script('tableFirstRowClass', create_function(
    '$self, $params, &$return, $config', '
    $className = ! $params[1] ? "first" : $params[1];
    $self->addClassPHP("
      if ($$params[0] == 0) print \'$className\';
    ");
  '));
};
$onload = 'onload';

One major difference between extending Scripts and phpQuery itself is the way of using the new functions. With our new script, we have to make a small change to the main code.


// using phpQuery::extend()
//->tableFirstRowClass('i')
// using Scripts extend
->script('tableFirstRowClass', 'i')

Test cases for websites using phpQuery and SimpleTest

In Ideas, phpQuery on 13.01.2009 at 23:09

Using phpQuery and a unit testing framework (SimpleTest in this example) you can automatically test a web page for the presence of a specific part and its position. A set of such tests can save a lot of time during website development, and even more after it.

Not only can you do simple tests like “are articles visible”, but using the WebBrowser plugin you can test a whole process, let’s say a user registration. I would like to present an example of such a test. It will include the following steps:

  1. Enter main page and follow the registration link
  2. Fill registration form and submit it
  3. Check if result is expected

Like I said before, SimpleTest is the framework of choice, but it doesn’t matter that much. What’s important:

  • WebBrowser needs callbacks
  • Callbacks should be declared as functions (for PHP < 5.3)
  • Inside callbacks, $this variable is unavailable
  • First step of the test should check if last step has succeeded

Full code below:


require('simpletest/autorun.php');
require('phpQuery/phpQuery.php');

class CustomerTest extends UnitTestCase {
	public static $_this;
	public static $registration = array(
		'username' => null,
		'success' => false
	);
	function testRegistration() {
		self::$_this = $this;
		phpQuery::browserGet('http://localhost/tested-site/',
			array('CustomerTest', '_testRegistrationLink')
		);
		$this->assertTrue(
			self::$registration['success'], "Registration unsuccessful"
		);
	}
	function _testRegistrationLink($browser) {
		$registrationLink = null;
		$browser->find('a:contains(rejestracja)')
			->WebBrowser(array('CustomerTest', '_testRegistrationForm'))
			->toReference($registrationLink)
				// jump to _testRegistrationForm
				->click();
		self::$_this->assertTrue(
			$registrationLink->length, "Registration link missing"
		);
	}
	function _testRegistrationForm($browser) {
		$registrationForm = null;
		$username = md5(microtime());
		$browser['.customers.form form']
			->toReference($registrationForm)
			->WebBrowser(array('CustomerTest', '_testRegistrationResult'))
			->find('input[name*=login]')->val($username)->end()
			->find('input[name*=email]')->val($username.'@test.com')->end()
			// jump to _testRegistrationResult
			->submit();
		self::$_this->assertTrue(
			$registrationForm->length, "Registration form missing"
		);
		self::$registration['username'] = $username;
	}
	function _testRegistrationResult($browser) {
		$loginForm = $browser->find('h2:text(Logowanie)');
		self::$_this->assertTrue($loginForm->length, "Login form missing");
		if ($loginForm->length)
			self::$registration['success'] = true;
	}
}

WebBrowser doesn’t support AJAX, so not all sites can be tested like this (although you can do it with AHAH after some work), but cookies and HTTP authentication should satisfy most needs.

Of course that’s nothing new; projects accomplishing a similar goal have existed for quite some time now, e.g. jWebUnit, but none of them has jQuery under the hood  ;)

Having fun using PHP 5.3 closures with phpQuery

In phpQuery on 08.01.2009 at 16:34

Some time ago I wrote about the new PHP 5.3 closures feature. Today I would like to show it to you in action with phpQuery. If you’re using jQuery you will feel right at home :)

The first example illustrates a classic inline function, which is used to iterate over li nodes, incrementing each one’s content.


$markup = '
<ul>
	<li>1</li>
	<li>2</li>
	<li>3</li>
</ul>
';
$doc = phpQuery::newDocument($markup);
$doc['li']->each(function($node){
	pq($node)->text(
		pq($node)->text()+1
	);
});
print $doc;

The result will be something like this (something, because I’ve corrected the indentation manually).


<ul>
	<li>2</li>
	<li>3</li>
	<li>4</li>
</ul>

Now for more complicated stuff: scope inheritance. Scope inheritance means nothing more than the ability to use variables declared outside a code block (an inline function in this case) inside that block. In JavaScript we have full inheritance right away. In PHP 5.3 we have to explicitly declare which variables we would like to inherit. It’s done with the use keyword.


$markup = '
<div>
	<span>1</span>
	<span>2</span>
	<span>3</span></div>
<div>
	<span>1</span>
	<span>2</span>
	<span>3</span></div>
';
$doc = phpQuery::newDocument($markup);
$doc['div']->each(function($div){
	$div = pq($div);
	$div['span']->each(function($span) use ($div){
		pq($span)->insertBefore($div);
	});
});
print $doc;

We’ve nested one closure inside another. The inner one inherits the $div variable from the outer one. The lack of such a feature in create_function was really problematic. Below you can see the result. Both divs are now empty.


<span>1</span>
<span>2</span>
<span>3</span>
<div></div>
<span>1</span>
<span>2</span>
<span>3</span>
<div></div>
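
One detail of the use keyword is worth a separate, phpQuery-free sketch: inherited variables are copied when the closure is defined, unless they are inherited by reference with &. Objects such as $div above are passed around as handles, so the nested closure still operates on the same object. The variable names below are made up for illustration.

```php
<?php
$counter = 0;
// inherited by value: the closure works on its own copy
$byValue = function() use ($counter) {
	$counter++;
	return $counter;
};
// inherited by reference: changes are visible outside
$byReference = function() use (&$counter) {
	$counter++;
	return $counter;
};
$byValue();
$afterByValue = $counter;   // the outer variable is untouched
$byReference();
$byReference();
print "after by-value call: $afterByValue, after by-reference calls: $counter\n";
```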

The third example I want to show is about assigning an inline function to a variable. It’s a handy technique to avoid namespace collisions and, of course, it makes it easy to pass a closure through parts of the code.


$markup = '
<div>1</div>
<div>2</div>
<div>3</div>
';
$callback = function($node){
	$node = pq($node);
	$node->text(
		'Callbacked: '.$node->text()
	);
};
print phpQuery::newDocument($markup)
	->find('div')->each($callback)->end();

Just as you’d expect, every node’s content will be prefixed with “Callbacked: ”.


<div>Callbacked: 1</div>
<div>Callbacked: 2</div>
<div>Callbacked: 3</div>

I think that this post clearly illustrates how closures work and that they are important for PHP as a web development language.

If you would like to test the code from this post and PHP 5.3 in general, download the Windows build from php.net or compile the source yourself for another platform. The CLI version will be enough, which means no struggling with the Apache module.

phpQuery edits its own wiki

In phpQuery on 05.10.2008 at 3:05

phpQuery can connect to Google Code’s wiki editing form, authorize itself, replace page contents and finally submit the form.

// declare main variable
$pq = null;
// create dummy document as start point
phpQuery::newDocument('<div>')
// authorize your google account
	->script('google_login')
// redirect authorized XHR component to googlecode's wiki form
	->location('http://code.google.com/p/phpquery/w/edit/test')
// save result as $pq, although we could continue the chain
// but it would break in case of error...
		->toReference($pq);
if ($pq) {
// read about CallbackReference later in this post...
	$pq->WebBrowser(new CallbackReference($pq))
		->find('textarea:first')
			->val('lorem ipsum')
			->parents('form')
// first submit is Preview, so fire up second
				->find(':submit:eq(1)')
// triggering submit event through input[type=submit]:click
// is a way to choose which submit is sent
					->click();
	if ($pq) {
// print without  tags
		print $pq->script('safe_print');
	}
}

You may notice a hot new feature: new CallbackReference($pq). Such a callback stores its first parameter in the passed variable, by reference. This pattern works with all methods accepting callbacks. Thanks to that, we can use if statements instead of function callbacks. In the above example, the CallbackReference object is called when the click event is triggered.
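
The idea can be sketched in a few lines of plain PHP. The `ReferenceCallback` class and the `fetchPage()` function below are hypothetical, written only to show the pattern (and using `__invoke()`, so PHP 5.3+); they are not phpQuery’s own code.

```php
<?php
// Hypothetical re-creation of the idea, not the phpQuery class itself:
// when called, the object copies its first argument into the bound variable.
class ReferenceCallback {
	private $reference;
	public function __construct(&$reference) {
		// keep a reference to the caller's variable, not a copy
		$this->reference =& $reference;
	}
	public function __invoke($value) {
		$this->reference = $value;
	}
}

// stand-in for any method accepting a callback
function fetchPage($url, $callback) {
	call_user_func($callback, "content of $url");
}

$page = null;
fetchPage('http://example.com/', new ReferenceCallback($page));
// a plain if statement replaces a separate callback function
if ($page)
	print $page."\n";
```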

The presented code snippet makes use of the new Scripts plugin, particularly google_login.

Now it can be combined with the automated XML-documentation-to-wiki script, but that may be something for jQuery 1.3.

Scripts plugin for phpQuery

In phpQuery on 05.10.2008 at 3:05

The Scripts plugin is an easy file includer. Each script file has 4 variables in scope:

  • $self Represents $this
  • $params Represents parameters passed to script() method (without script name)
  • $return If not null, will be used as method result
  • $config Content of __config.php file

Possible use cases

  • Authorizations
  • Custom Chains
  • Simple phpQuery plugins
  • Sort functions
  • Website APIs
  • Event binding groups

Example

$return = $self->find($params[0]);

Currently there is only one script, which is Google Login. It doesn’t support GMail yet, unfortunately.
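
How an included file ends up with those four variables in scope can be shown with a small sketch. The `runScript()` function, the temporary script file and the fallback of returning $self when $return stays null are all assumptions made for this example, not the plugin’s actual code.

```php
<?php
// Hypothetical includer illustrating the described scope; the function
// name and the temporary script are made up for this example.
function runScript($file, $self, array $params, $config = array()) {
	$return = null;
	// the included file sees $self, $params, $return and $config
	include $file;
	// assumption: fall back to $self so a chain could continue
	return $return !== null ? $return : $self;
}

// a throw-away script file, so the example is self-contained
$file = tempnam(sys_get_temp_dir(), 'script');
file_put_contents($file, '<?php $return = strtoupper($params[0]).$config["suffix"];');

$result = runScript($file, new stdClass, array('hello'), array('suffix' => '!'));
unlink($file);
print $result."\n";
```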

Web Scraping with cli version of phpquery piped with sed

In Web Scraping, phpQuery on 05.10.2008 at 3:05

I’ve done what I had been thinking about for some time: a terminal-friendly phpQuery CLI interface. It took about 10 minutes of coding… It works like this:


phpquery http://code.google.com/p/phpquery/downloads/list --find '.vt.col_4 a:first' --contents

This will return the number of downloads of the latest phpQuery release file. Notice there is no need to quote the URL in any way. I was very happy with this, so I’ve added callback support in the text() and htmlOuter() methods, like so:


phpquery http://code.google.com/p/phpquery/downloads/list --find '.vt.col_4 a:first' --text strip_tags trim

When I had it all working, I used it straight away to scrape forum and category lists from an old IPB v1.x. I’ve piped the phpQuery result through sed, filtering the final output.

// Fetch categories
./phpquery http://forum.wiadomosc.info/ --find '.maintitle a' | sed -r 's/^.*?c=([0-9]+).+?>(.+?)]*>([^<]*)<.*$/1: // 2/g'
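
The same kind of post-filtering can also be done in PHP itself. The sketch below is not a translation of the sed expression; it assumes, purely for illustration, that category links look like `<a href="index.php?c=3">Title</a>` and prints them as "id: title". Real IPB markup may differ.

```php
<?php
// Sample input is made up; real IPB markup may differ.
$html = '
<a href="index.php?c=3">General</a>
<a href="index.php?c=12">Off topic</a>
';
// capture the category id from the c= parameter and the link text
preg_match_all(
	'/<a[^>]*[?&]c=([0-9]+)[^>]*>([^<]*)<\/a>/',
	$html, $matches, PREG_SET_ORDER
);
$categories = array();
foreach ($matches as $match)
	$categories[$match[1]] = trim($match[2]);
foreach ($categories as $id => $title)
	print "$id: $title\n";
```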

phpQuery – a jQuery port to PHP

In phpQuery on 07.07.2007 at 10:56

This post is deprecated and some of the information below is outdated.

phpQuery is a PHP port of jQuery, a well-known and great web 2.0 JS library.

It’s something different from jQPie, which is a form of JS code generator and server-client layer.

For example, you can do something like this:


print _('file.htm')
    ->find('body div.cls1.cls2 ul > li:first')
        ->parent()
            ->prepend('
	<li>my new first LI</li>
')
            ->parents('.myClass')
                ->remove()
                ->end()
            ->appendTo('body')
            ->parents('html')
                ->html();

The code above will find the first LI inside a specific UL, then move the pointer to its parent (UL), then prepend (add at the beginning) a new LI. Next, the pointer will move to the parent element with class .myClass, which will be removed, and the pointer will go back to the UL (with the end() method). Then the UL will be appended to BODY (moved, not copied). After all these operations the parent with tag HTML will be searched for and its content will be returned to the print statement.

phpQuery acts almost like jQuery: it returns a new instance from certain methods and allows reverting the stack.

It works on the DOM extension and is designed for PHP5 only.

There are almost no docs yet, so please refer to jQuery’s (DOM section).

Differences from jQuery

phpQuery differs in some cases from jQuery:

  1. Iteration
  2. Callbacks
  3. No DOM nodes
  4. In some method names (PHP reserved words)
  5. PHP specific addons
  6. Other addons

Iteration

phpQuery makes use of PHP’s SPL Iterator interface, so you can do:


foreach(_('ul>*') as $_li) {
	$_li->prepend('new beginning of every LI');
}
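
What makes an object usable in foreach can be shown without phpQuery. The sketch below uses a hypothetical `NodeList` class and, for brevity, implements IteratorAggregate (handing out an ArrayIterator) instead of implementing Iterator method by method. The return type declaration requires PHP 7 or later.

```php
<?php
// Hypothetical wrapper, only to show the mechanism: implementing
// IteratorAggregate is enough to make an object usable in foreach.
class NodeList implements IteratorAggregate {
	private $nodes;
	public function __construct(array $nodes) {
		$this->nodes = $nodes;
	}
	// foreach asks for this iterator behind the scenes
	public function getIterator(): Traversable {
		return new ArrayIterator($this->nodes);
	}
}

$list = new NodeList(array('first LI', 'second LI'));
$seen = array();
foreach ($list as $index => $node)
	$seen[] = "$index: $node";
print implode("\n", $seen)."\n";
```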

Callbacks

PHP doesn’t have closures, but you can still use callbacks, either direct ones or ones created with create_function(), like so:


function imTheCallback($_node){
	$_node->html("i'm changed content");
}
class imTheClass {
	static function imTheStaticCallback($_node){
		$_node->html("i'm changed content v2");
	}
	function imTheCallbackToo($_node){
		$_node->html("i'm changed content v3");
	}
}
$class = new imTheClass;
_('ul>*')
	->each('imTheCallback')
	->each(array('imTheClass', 'imTheStaticCallback'))
	->each(array($class, 'imTheCallbackToo'))
	->each(create_function('$_node', '
		$_node->html("i\'m changed content v4");'
	));
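
The callback forms used above are standard PHP and can be tried without phpQuery. In the sketch below all function and class names are made up; create_function() is left out on purpose, because it was removed in PHP 8.0.

```php
<?php
// All names here are hypothetical; only the callback forms matter.
function shout($text) {
	return strtoupper($text);
}
class Formatter {
	public static function quote($text) {
		return '"'.$text.'"';
	}
	public function bracket($text) {
		return '['.$text.']';
	}
}
$formatter = new Formatter;
$callbacks = array(
	'shout',                       // plain function name
	array('Formatter', 'quote'),   // static method
	array($formatter, 'bracket'),  // method of an existing object
);
$text = 'li';
// each callback receives the result of the previous one
foreach ($callbacks as $callback)
	$text = call_user_func($callback, $text);
print $text."\n";
```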

No DOM nodes

Every node passed to a callback or used inside an iteration is a phpQuery object, not a DOM node. Also, there isn’t a get() method.

Method names

There are several methods in jQuery’s interface with names which couldn’t be used as PHP class methods, or which were changed to preserve a consistent naming convention.

All those methods have been prefixed with an underscore (_). Here’s the list:

  • _clone
  • _next
  • _prev
  • _empty

PHP specific addons

There are a couple of PHP-specific addons in phpQuery for easier development:

  • appendPHP($code) – equals to append(<?php $code ?>)
  • prependPHP($code) – equals to prepend(<?php $code ?>)
  • beforePHP($code) – equals to before(<?php $code ?>)
  • afterPHP($code) – equals to after(<?php $code ?>)
  • attrPHP($attr, $code) – equals to attr($attr, <?php $code ?>)
  • php($code) – equals to html(<?php $code ?>)
  • phpPrint($code) – equals to html(<?php print $code ?>)
  • phpMeta($selector, $code) – equals to find($selector)->php($code)->end()
  • __toString() – equals to htmlWithTag()

Other addons

There are (or will be) several methods not present in standard jQuery, which I use (with jQuery) in my projects. More about this later.

Development status

Currently phpQuery seems to be quite stable and is the main part of the plainTemplates lib, which powers this blog.

There are still a couple of things to be done, though:

  • Dedicated docs (copy jQuery’s one, add PHP specific, generate phpdoc)
  • Missing methods (css, val)

Download and links

Here are the links which could be helpful when dealing with phpQuery: