Tobiasz Cudnik

Personal torrent stream (part 1)

In Web Scraping, phpQuery on 19.03.2009 at 2:52

I really don’t like reading movie reviews before actually seeing the movies. For this reason, I found the new nanocrowd.com service quite useful. It claims to provide very specific movie recommendations based on the opinions of anonymous masses. Although its accounts are unfortunately still in private beta, you can already test recommendations based on a single movie.

For a long time I’ve been thinking about downloading movies without actually downloading them, i.e. without doing the searching and clicking myself. Discovering nanocrowd convinced me to finally do it. Of course, I used phpQuery’s WebBrowser plugin. The scenario is as follows:

  1. Connect to torrentz.net
  2. Fill in the search box and submit it
  3. Filter the results
  4. Print the results and ask which one to download
  5. Open the chosen result and find a supported download site
  6. Download the .torrent file to the local filesystem
  7. Optionally, send it via SSH to another machine
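Steps 3 and 4 do not depend on the browser part at all. Here is a minimal sketch of them over plain arrays, assuming each search result has already been reduced to a hypothetical row with `name`, `size`, `seeds` and `leechers` keys (the real script reads these values from the result markup instead). The `filterResults()` and `formatResults()` helpers are illustrative names, not part of phpQuery.

```php
<?php
// Keep the rows for which $keep returns true. array_values() renumbers the
// keys, so the user's choice N maps to index N - 1.
function filterResults(array $rows, $keep)
{
    return array_values(array_filter($rows, $keep));
}

// Format rows the way the script prints them: "N. name<TAB>size (seeds/leechers)".
function formatResults(array $rows)
{
    $lines = array();
    foreach ($rows as $i => $row) {
        $lines[] = ($i + 1) . '. ' . $row['name'] . "\t" . $row['size']
            . ' (' . $row['seeds'] . '/' . $row['leechers'] . ')';
    }
    return implode("\n", $lines) . "\n";
}

// Hypothetical rows standing in for the scraped search results.
$rows = array(
    array('name' => 'Movie A', 'size' => '700 MB', 'seeds' => 40, 'leechers' => 10),
    array('name' => 'Movie B', 'size' => '350 MB', 'seeds' => 2, 'leechers' => 1),
);
$matches = filterResults($rows, function ($row) {
    return $row['seeds'] >= 10;
});
print "Available results:\n" . formatResults($matches);
```

Keeping filtering and printing as plain functions makes them easy to check without hitting the network.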

To be practically useful, the script supports the following filters:

--min-size=MB
--max-size=MB
--min-peers=COUNT (seeds are counted as 2 peers)
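The three filters boil down to one predicate. The sketch below mirrors the script's rules: peers are weighted as 2 × seeds + leechers, and sizes are compared in megabytes. The script itself only takes the number before the first space of the size text, so it implicitly assumes sizes are listed in MB; the hypothetical `parseSizeMb()` helper here also converts a GB or KB suffix, which is an assumption about the listing format and not something the original handles. `weightedPeers()` and `passesFilters()` are likewise illustrative names.

```php
<?php
// Convert a size string such as "700 MB" or "1.6 GB" to megabytes.
// Assumption: the listing uses "<number> <unit>" with KB, MB or GB units
// (binary multiples of 1024). A missing unit is treated as MB.
function parseSizeMb($text)
{
    $parts = explode(' ', trim($text));
    $value = (float) $parts[0];
    $unit = isset($parts[1]) ? strtoupper($parts[1]) : 'MB';
    if ($unit === 'GB') {
        return $value * 1024;
    }
    if ($unit === 'KB') {
        return $value / 1024;
    }
    return $value;
}

// Seeds count as 2 peers, as in the script's --min-peers filter.
function weightedPeers($seeds, $leechers)
{
    return 2 * (int) $seeds + (int) $leechers;
}

// $options uses the same keys the script gets from --min-peers etc.
function passesFilters($sizeText, $seeds, $leechers, array $options)
{
    $size = parseSizeMb($sizeText);
    if (isset($options['min-peers'])
        && weightedPeers($seeds, $leechers) < $options['min-peers']) {
        return false;
    }
    if (isset($options['min-size']) && $size < $options['min-size']) {
        return false;
    }
    if (isset($options['max-size']) && $size > $options['max-size']) {
        return false;
    }
    return true;
}

// Option values arrive from the command line as strings.
$options = array('min-peers' => '50', 'min-size' => '500', 'max-size' => '1500');
$ok = passesFilters('700 MB', 20, 15, $options);
```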

So, in the most common case, getting the first matching movie looks like this:

echo 1 | torrent-download.php "Movie Name" --min-peers=50 --max-size=1500 --min-size=500
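The `echo 1 |` prefix works because the script reads the result number from standard input with fgets(STDIN). The hypothetical `readChoice()` helper below is a slightly stricter take on that step: it reads one line from any stream and accepts only a number between 1 and the number of results, returning null otherwise.

```php
<?php
// Hypothetical helper: read one line from $stream and return the chosen
// 1-based result number, or null if the line is missing, not a positive
// integer, or larger than $resultCount.
function readChoice($stream, $resultCount)
{
    $line = fgets($stream);
    if ($line === false) {
        return null;
    }
    $line = trim($line);
    if (!preg_match('/^[0-9]+$/', $line)) {
        return null;
    }
    $choice = (int) $line;
    return ($choice >= 1 && $choice <= $resultCount) ? $choice : null;
}

// Simulate `echo 1 | torrent-download.php ...` with an in-memory stream.
$input = fopen('php://memory', 'r+');
fwrite($input, "1\n");
rewind($input);
$choice = readChoice($input, 3);
fclose($input);
```

In the script itself the call would be readChoice(STDIN, $matches->size()).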

Almost every decent torrent client nowadays can automatically load torrent files that appear in a specified directory, so as of today the only thing you need to do is type the movie name. As I mentioned before, nanocrowd is still not open to registration. When it opens, I will post an update (or maybe someone has a free invitation?). Anyway, don’t use this code to download illegal movies…
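Since the client picks up whatever shows up in its watch directory, it helps if a .torrent file only appears there once it is complete. The sketch below is an addition, not what the script does (the script calls the WebBrowser plugin's downloadTo()): the hypothetical `saveTorrent()` helper writes the bytes to a temporary file in the same directory and then rename()s it to its final name, which is atomic on POSIX filesystems when source and target share a directory. It assumes the target directory exists and is writable; otherwise tempnam() falls back to the system temp dir and the rename is no longer atomic. The file name is sanitized the same way the script does it, by replacing '/' with '-'.

```php
<?php
// Hypothetical helper: store $bytes as "$title.torrent" inside $dir.
// The data goes to a temporary file in the same directory first and is then
// renamed, so a client watching $dir for *.torrent files never picks up a
// half-written file. Returns the final path, or false on failure.
function saveTorrent($dir, $title, $bytes)
{
    // Same sanitizing as the script: a '/' in the title would point into a subdirectory.
    $name = trim(str_replace('/', '-', $title));
    $final = rtrim($dir, '/') . "/$name.torrent";
    // tempnam() creates a uniquely named file without the .torrent suffix.
    $tmp = tempnam($dir, 'part');
    if ($tmp === false) {
        return false;
    }
    if (file_put_contents($tmp, $bytes) === false) {
        unlink($tmp);
        return false;
    }
    // tempnam() uses mode 0600; loosen it in case the client runs as another user.
    chmod($tmp, 0644);
    if (!rename($tmp, $final)) {
        unlink($tmp);
        return false;
    }
    return $final;
}

$dir = sys_get_temp_dir();
$path = saveTorrent($dir, 'Some/Movie 2009', 'placeholder bytes');
```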

torrent-download.php

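#!/usr/bin/env php
<?php
// Requires PHP 5.3+ for closures. If the php in your PATH is older, point the
// shebang at a 5.3 binary; note that '~' is not expanded in a shebang line.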
$defaultDownloadDir = '/dir/path';
$defaultSshHost = "user@host:/dir/path";

$args = arguments($argv);

if (! isset($args['arguments'][0]) || ! $args['arguments'][0])
	die("Usage: torrent-download.php QUERY
Options:
  --target=DIR
  --min-size=MB
  --max-size=MB
  --min-peers=COUNT
    Seeds are counted as 2 peers
  --ssh-target=HOST
    Ex user@host:/target/dir
");
if (! isset($args['options']['target']))
	$args['options']['target'] = $defaultDownloadDir;
if (! isset($args['options']['ssh-target']))
	$args['options']['ssh-target'] = $defaultSshHost;
// Adjust this path to your phpQuery checkout.
require('/home/bob/workspace/phpQuery/phpQuery/phpQuery.php');
phpQuery::plugin('WebBrowser');
#phpQuery::$debug = 1;
$url = 'http://torrentz.com';
$sites = array(
	'mininova.org' => 'h2 a',
	'thepiratebay.org' => '.download a:first',
	'btjunkie.org' => 'h1 a',
);
$scope = array(
	'args' => $args,
	'sites' => $sites,
);
$browser = phpQuery::browserGet($url, function($browser) use ($scope) {
	extract($scope);
	$browser['#thesearchbox']->
		val($args['arguments'][0])->
		parents('form')->
			trigger('submit', array(function($browser) use ($scope) {
				extract($scope);
				$matches = $browser['dt a'];
				if (isset($args['options']['min-peers']))
					$matches = $matches->filter(function($i, $node) use ($scope) {
						extract($scope);
						$spans = pq($node)->parent()->next()->find('span');
						$peers = 2*$spans->eq(2)->text()+$spans->eq(3)->text();
						if ($peers < $args['options']['min-peers'])
							return false;
					});
				if (isset($args['options']['min-size']))
					$matches = $matches->filter(function($i, $node) use ($scope) {
						extract($scope);
						$size = pq($node)->parent()->next()->find('span:eq(1)')->text();
						list($size, ) = explode(' ', $size);
						if ($size < $args['options']['min-size'])
							return false;
					});
				if (isset($args['options']['max-size']))
					$matches = $matches->filter(function($i, $node) use ($scope) {
						extract($scope);
						$size = pq($node)->parent()->next()->find('span:eq(1)')->text();
						list($size, ) = explode(' ', $size);
						if ($size > $args['options']['max-size'])
							return false;
					});
				print "Available results:\n";
				$i = 1;
				foreach($matches as $node) {
					$spans = pq($node)->parent()->next()->find('span');
					print ($i++).'. '.pq($node)->text()."\t"
						.$spans->eq(1)->text()." ("
							.$spans->eq(2)->text()."/".$spans->eq(3)->text()
						.")\n";
				}
				print "Choose result number: ";
				$choice = (int) trim(fgets(STDIN));
				// stop on an empty, non-numeric or out-of-range answer
				if ($choice < 1 || $choice > $matches->size())
					return;
				$result = $matches->eq($choice-1);
				$scope['name'] = trim(str_replace('/', '-', $result->text()));
				$result->WebBrowser()->trigger('click', array(function($browser) use ($scope) {
					extract($scope);
					$siteFound = false;
					foreach($browser['.download .u'] as $node) {
						$site = trim(pq($node)->text());
						if (isset($sites[$site])) {
							$scope['site'] = $site;
							print "Requesting $site...\n";
							phpQuery::ajaxAllowHost($site, "www.$site");
							pq($node)->parent()->
								WebBrowser()->
								trigger('click', array(function($browser) use ($scope) {
									extract($scope);
									// TODO torrent filename
									$browser[$sites[$site]]->downloadTo(
										$args['options']['target'], "$name.torrent"
									);
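									$path = $args['options']['target']."/$name.torrent";
									if (! file_exists($path) || ! filesize($path)) {
										print "Download failed\n";
										return;
									}
									print "Got it\n";
									if ($args['options']['ssh-target']) {
										print "Sending via ssh...\n";
										// escape both arguments: the name comes from a scraped web page
										exec('scp '.escapeshellarg($path).' '.escapeshellarg($args['options']['ssh-target']));
									}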
								}));
							$siteFound = true;
							break;
						}
					}
					if (! $siteFound)
						print "No supported site available\n";
				}));
			}));
});

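/**
 * Parses command-line arguments into an array with the keys:
 *   'exec'      => script name,
 *   'options'   => --name=value pairs (a bare --name gives true),
 *   'flags'     => single letters from -abc style switches,
 *   'arguments' => everything else.
 * @link http://pl.php.net/features.commandline
 */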
function arguments($args ) {
    $ret = array(
        'exec'      => '',
        'options'   => array(),
        'flags'     => array(),
        'arguments' => array(),
    );

    $ret['exec'] = array_shift( $args );

    // !== so that an empty-string argument doesn't end the loop early
    while (($arg = array_shift($args)) !== NULL) {
        // Is it an option? (prefixed with --)
        if ( substr($arg, 0, 2) === '--' ) {
            $option = substr($arg, 2);

            // is it the syntax '--option=argument'?
            if (strpos($option, '=') !== FALSE) {
                $t = explode('=', $option, 2);
                $ret['options'][$t[0]] = $t[1];
            } else {
                $ret['options'][$option] = true;
            }

            continue;
        }

        // Is it a flag or a serial of flags? (prefixed with -)
        if ( substr( $arg, 0, 1 ) === '-' ) {
            for ($i = 1; isset($arg[$i]) ; $i++)
                $ret['flags'][] = $arg[$i];

            continue;
        }

        // finally, it is neither an option nor a flag
        $ret['arguments'][] = $arg;
        continue;
    }
    return $ret;
}//function arguments
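The scp step builds a shell command from a torrent title scraped off a web page, so quoting matters: a title containing a double quote or `$(...)` would break the command, or worse. A minimal sketch, assuming the same `$path` and `--ssh-target` values as in the script, passes both arguments through PHP's escapeshellarg() and checks scp's exit status through exec()'s third parameter. `buildScpCommand()` and `sendViaScp()` are hypothetical helper names.

```php
<?php
// Build an scp command line with both arguments shell-escaped.
function buildScpCommand($path, $sshTarget)
{
    return 'scp ' . escapeshellarg($path) . ' ' . escapeshellarg($sshTarget);
}

// Run scp and report whether it exited with status 0.
function sendViaScp($path, $sshTarget)
{
    $output = array();
    $status = 1;
    exec(buildScpCommand($path, $sshTarget), $output, $status);
    return $status === 0;
}

// Only builds the command; nothing is executed here.
$cmd = buildScpCommand('/dir/path/Movie "2009".torrent', 'user@host:/dir/path');
```

Checking the exit status lets the script report a failed transfer instead of silently assuming it worked.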

The script uses PHP 5.3 closures, so make sure you are running at least that version. This was a real test of closures’ functionality, and in my opinion they wouldn’t be nearly as useful without extract($scope). To write the selectors I used the handy SelectorGadget bookmarklet, although it wasn’t perfect.
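The reason extract($scope) is needed at all is that PHP closures only see the variables listed in `use`, and they capture them by value. The sketch below isolates the pattern the script relies on: an outer callback adds a key to its own copy of $scope, and a nested closure created afterwards captures that modified copy, while the caller's $scope stays unchanged. The variable names are illustrative only.

```php
<?php
$scope = array('args' => array('target' => '/dir/path'));

$outer = function ($title) use ($scope) {
    // Modifies the closure's own copy of $scope, not the caller's.
    $scope['name'] = $title;
    // The nested closure captures that copy as it is now, 'name' included.
    return function () use ($scope) {
        extract($scope); // creates local $args and $name
        return $args['target'] . "/$name.torrent";
    };
};

$inner = $outer('Movie');
$path = $inner();
```

Capturing with `use (&$scope)` instead would make such writes visible to the caller as well.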