mashup, php, torrent, webbrowser
In Web Scraping, phpQuery on 19.03.2009 at 2:52
I really don’t like reading movie reviews before actually seeing the movies. For this reason, I found the new nanocrowd.com service quite useful. It claims to provide very specific movie recommendations based on the opinions of the anonymous masses. Although its accounts are unfortunately still in private beta, we can test recommendations based on a single movie.
For a long time I’ve been thinking about downloading movies without actually downloading them. Finding nanocrowd convinced me to do it. Of course, I used phpQuery’s WebBrowser plugin. The scenario is as follows:
- Connect to torrentz.net
- Fill search box and submit
- Filter results
- Print results and ask which to download
- Get result sites and find supported one
- Download to local filesystem
- Optionally, send via SSH to another machine
To be practically useful, the script supports the following filters:
--min-size
--max-size
--min-peers
So, in the most common cases, getting the first matching movie would be (echo 1 answers the prompt asking which result to download):
echo 1 | torrent-download.php "Movie Name" --min-peers=50 --max-size=1500 --min-size=500
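The filtering step can be sketched in plain PHP. This is only an illustration, not the script’s actual code: the function name filterResults, the row keys (name, size, peers) and the sample rows are all made up, and sizes are assumed to be in megabytes, which the filters above do not state.

```php
<?php
/**
 * Keep only the results that satisfy every filter that was given.
 * Filter names mirror the CLI options; a missing filter means "no limit".
 * (Hypothetical helper; sizes are assumed to be in MB.)
 */
function filterResults(array $results, array $filters): array
{
    $minSize  = $filters['min-size']  ?? null;
    $maxSize  = $filters['max-size']  ?? null;
    $minPeers = $filters['min-peers'] ?? null;

    $kept = array_filter(
        $results,
        function (array $row) use ($minSize, $maxSize, $minPeers): bool {
            if ($minSize !== null && $row['size'] < $minSize) {
                return false; // too small
            }
            if ($maxSize !== null && $row['size'] > $maxSize) {
                return false; // too big
            }
            if ($minPeers !== null && $row['peers'] < $minPeers) {
                return false; // not enough peers
            }
            return true;
        }
    );

    // Re-index so the first match is always at position 0.
    return array_values($kept);
}

// Made-up search results, standing in for rows scraped from the results page.
$results = [
    ['name' => 'Movie Name CAM',    'size' => 350,  'peers' => 900],
    ['name' => 'Movie Name DVDRip', 'size' => 700,  'peers' => 120],
    ['name' => 'Movie Name 1080p',  'size' => 4300, 'peers' => 60],
    ['name' => 'Movie Name XviD',   'size' => 1400, 'peers' => 12],
];

$matches = filterResults($results, [
    'min-peers' => 50,
    'max-size'  => 1500,
    'min-size'  => 500,
]);

foreach ($matches as $i => $row) {
    printf("%d) %s (%d MB, %d peers)\n", $i + 1, $row['name'], $row['size'], $row['peers']);
}
```

With these sample rows only the 700 MB entry passes all three filters, so answering 1 at the prompt would pick it.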
Now almost every decent torrent downloading app supports autoloading torrent files found in a specified directory. As of today, the only thing you need to do is type the movie name. As I mentioned before, nanocrowd is still not open to registration. When it is, I will post an update (or maybe someone has a free invitation?). Anyway, don’t use this code to download illegal movies…
bash, dom, php, phpQuery, scraping
In Web Scraping on 05.11.2008 at 17:11
When a project doesn’t use SVN or any other version-control system (or you can’t use it), you have to download things manually. I don’t have to say that nobody wants to do this, so what can you do to avoid it? You can simulate yourself doing it…
The example below downloads the latest release of the madwifi branch with the new HAL (which I need for my WiFi adapter).
#!/usr/bin/php
find('table tr')->slice(-2, -1)->downloadTo('/target/local/path');
}
?>
Now, to get the latest release, all I need to do is run the above script from the command line. One missing thing is checking whether anything has changed, but I leave that to you to resolve ;)
For files whose names don’t change you can just use wget, like so:
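One possible way to handle the missing “has anything changed” check is to remember the name of the last downloaded file and compare against it on the next run. The sketch below is an assumption, not part of the original script: hasChanged, the state file location and the file names are all hypothetical.

```php
<?php
/**
 * Returns true when $latest differs from the value remembered in $stateFile,
 * and remembers the new value. A missing state file counts as "changed".
 * (Hypothetical helper, not part of phpQuery.)
 */
function hasChanged(string $latest, string $stateFile): bool
{
    $previous = is_file($stateFile)
        ? trim((string) file_get_contents($stateFile))
        : null;

    if ($previous === $latest) {
        return false; // same release as last time, nothing to do
    }

    file_put_contents($stateFile, $latest);
    return true;
}

// Throwaway state file for the demo; a real script would use a fixed path.
$stateFile = sys_get_temp_dir() . '/latest-release-' . uniqid() . '.txt';

// Made-up file names standing in for whatever the scraper finds.
var_dump(hasChanged('release-a.tar.gz', $stateFile)); // bool(true): first run
var_dump(hasChanged('release-a.tar.gz', $stateFile)); // bool(false): unchanged
var_dump(hasChanged('release-b.tar.gz', $stateFile)); // bool(true): new release

unlink($stateFile);
```

The download would then run only when hasChanged() returns true. Comparing names works only if every release gets a new file name; otherwise a checksum of the downloaded file would be a safer key.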
wget 'http://host.net/somefile.zip' -O new-name.zip
cli, dom, html, php, phpQuery, scraping
In Web Scraping, phpQuery on 05.10.2008 at 3:05
I’ve done what I had been thinking about for some time: a terminal-friendly phpQuery CLI interface. It took about 10 minutes of coding… It works like this:
phpquery http://code.google.com/p/phpquery/downloads/list --find '.vt.col_4 a:first' --contents
This will return the number of downloads of the latest phpQuery release file. Notice there is no need to quote the URL in any way. I was very happy with this, so I’ve added callback support to the text() and htmlOuter() methods, like so:
phpquery http://code.google.com/p/phpquery/downloads/list --find '.vt.col_4 a:first' --text strip_tags trim
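To show what passing strip_tags and trim as callbacks amounts to, here is a minimal sketch in plain PHP. It assumes the callbacks are applied to the value one after another, left to right; applyCallbacks is a hypothetical helper and not phpQuery’s own implementation.

```php
<?php
/**
 * Run a value through a list of callbacks, feeding each result to the next.
 * (Hypothetical helper; the left-to-right order is an assumption.)
 */
function applyCallbacks(string $value, array $callbacks): string
{
    foreach ($callbacks as $callback) {
        if (!is_callable($callback)) {
            throw new InvalidArgumentException(
                'Not callable: ' . var_export($callback, true)
            );
        }
        $value = (string) call_user_func($callback, $value);
    }
    return $value;
}

// Made-up input resembling a cell's markup with stray whitespace.
echo applyCallbacks("  <b>1234</b>\n", ['strip_tags', 'trim']), "\n"; // prints 1234
```

Any PHP function that takes a string and returns a string can be chained this way.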
Once I had everything working, I used it straight away to scrape the forum and category lists from an old IPB v1.x. I piped the phpQuery output through sed to filter the final output.
// Fetch categories
./phpquery http://forum.wiadomosc.info/ --find '.maintitle a' | sed -r 's/^.*?c=([0-9]+).+?>(.+?)]*>([^<]*)<.*$/1: // 2/g'
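The same kind of extraction can also be sketched in plain PHP instead of sed. This assumes, as the c=([0-9]+) part of the pattern suggests, that each category link carries its id in a c= query parameter; the function name extractCategories and the sample links are made up, and the real forum markup may differ.

```php
<?php
/**
 * Map category id => category title from a chunk of <a> elements.
 * (Hypothetical helper; links without a numeric c= parameter are ignored.)
 */
function extractCategories(string $html): array
{
    // Group 1: digits after c= inside href. Group 2: the link's inner HTML.
    $pattern = '~<a\b[^>]*\bhref="[^"]*[?&;]c=(\d+)[^"]*"[^>]*>(.*?)</a>~is';
    preg_match_all($pattern, $html, $matches, PREG_SET_ORDER);

    $categories = [];
    foreach ($matches as $m) {
        $title = html_entity_decode(strip_tags($m[2]), ENT_QUOTES, 'UTF-8');
        $categories[(int) $m[1]] = trim($title);
    }
    return $categories;
}

// Made-up links, standing in for the output of --find '.maintitle a'.
$html = <<<HTML
<a href="index.php?c=3">News</a>
<a href="index.php?act=SC&amp;c=12"><b>Off-topic</b></a>
<a href="index.php?showforum=5">Not a category</a>
HTML;

foreach (extractCategories($html) as $id => $title) {
    printf("%d: %s\n", $id, $title);
}
```

For the sample links this prints `3: News` and `12: Off-topic`; the third link is skipped because it has no c= parameter.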