With the rhetorical stuff safely put away in the previous post, I can now comfortably get a bit greasy and describe how Planet DCS@UofT was set up.
First, a call-out to the community: if you can think of a better name for the aggregator (e.g. one with more vowels), please let me know. Also, if you have creativity to spare and you know CSS, you are welcome to contribute.
The Planet uses …planet, an “awesome ‘river of news’ feed reader”. Planet is essentially an elaborate Python script that “downloads news feeds published by web sites and aggregates their content together into a single combined feed, latest news first. It uses Mark Pilgrim’s Universal Feed Parser to read from RDF, RSS and Atom feeds; and Tomas Styblo’s templating engine to output static files in any format you can dream up.” It “was originally developed for Planet GNOME and Planet Debian.” Our Planet runs the stable 2.0 release of planet.
Following Alan Rosenthal’s sound advice, the Planet now lives on a user-managed webserver on the main CSLab server. User-managed webservers are a neat idea in themselves, and by following the instructions in the User-Managed Webserver HOWTO, setting one up was straightforward. The only point where things diverged from the HOWTO was that I was not asked for a prefix path when running the makesite script, so I ended up with the default folder named “site”. Apart from that, I should note that planet does not require MySQL (which seemingly involves a bit more fuss) and that it only needs one Apache module, mod_python, which comes enabled by default.
Alan also made sure that the user-managed webserver was proxied from http://www.cs.toronto.edu:40098 to http://www.cs.toronto.edu/~famelis/planet.
The instructions in the INSTALL file of the planet distribution are also pretty straightforward. I will only write about the points that I think need some clarification.
I extracted the tarball in ~/site/var/www/planet/
As mentioned in the installation instructions, I created a copy of the examples folder which I named “dcs”, and therefore now had ~/site/var/www/planet/dcs/
I chose the “fancy” version (laziness is a virtue), so the planet is configured by editing ~/site/var/www/planet/dcs/fancy/config.ini
Some important points in config.ini:
link = http://www.cs.toronto.edu/~famelis/planet/dcs
(This is important so that the Atom feed output by the aggregator works properly)
cache_directory = dcs/cache
template_files = dcs/fancy/index.html.tmpl dcs/atom.xml.tmpl dcs/rss20.xml.tmpl dcs/rss10.xml.tmpl dcs/opml.xml.tmpl dcs/foafroll.xml.tmpl
output_dir = /u/famelis/site/var/www/
At the bottom of the file live the links to the various feeds that are syndicated in the Planet. To add or remove a feed, I simply add or remove its entry there.
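To make the layout of the file a bit more concrete, here is a minimal Python sketch that reads a config.ini shaped like the one above and lists the subscribed feeds. It assumes the Planet 2.0 convention of one section per feed, with the feed URL as the section name and a "name" option for the display name. The two feed URLs and names are made up for illustration, and list_feeds is a hypothetical helper, not part of planet itself.

```python
import configparser

# Sample shaped like dcs/fancy/config.ini. The link and cache_directory values
# echo the post; the two feed sections are hypothetical placeholders.
SAMPLE_CONFIG = """
[Planet]
name = Planet DCS@UofT
link = http://www.cs.toronto.edu/~famelis/planet/dcs
cache_directory = dcs/cache

[http://example.org/alice/feed.atom]
name = Alice Example

[http://example.org/bob/rss.xml]
name = Bob Example
"""


def list_feeds(config_text):
    """Return (feed_url, display_name) pairs for every non-[Planet] section."""
    # interpolation=None keeps any '%' in URLs or values from being expanded.
    parser = configparser.ConfigParser(interpolation=None)
    parser.read_string(config_text)
    return [
        (section, parser.get(section, "name", fallback=section))
        for section in parser.sections()
        if section != "Planet"
    ]


if __name__ == "__main__":
    for url, name in list_feeds(SAMPLE_CONFIG):
        print(f"{name}: {url}")
```

Under that assumption, adding or removing a feed amounts to adding or removing one such section.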
The output_dir setting is important because it is the directory where planet writes its generated files. It must therefore be the directory that the user-managed server serves, and also the place where the stylesheet and the images folder live. The stylesheet is referenced in ~/site/var/www/planet/dcs/fancy/index.html.tmpl where (again on Alan’s sound advice) I set it as
rel="stylesheet" href="planet/planet.css"
thereby fixing a trailing slash problem that surfaced when the planet was proxied to http://www.cs.toronto.edu/~famelis/planet. The file index.html.tmpl can also be edited to taste (e.g. to add the phrase “If the feed of your blog is not syndicated here and you think it should, or if the feed of your blog is syndicated here and you think it shouldn’t, please contact me” to the sidebar).
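The trailing slash problem comes down to how relative URLs are resolved: the last path segment of the page address is dropped unless the address ends in a slash. The short Python sketch below illustrates this with urllib.parse.urljoin, which applies the standard resolution rules. The bare "planet.css" href is an assumption used only for comparison, and resolve is a hypothetical helper name.

```python
from urllib.parse import urljoin

# Public address of the Planet as given in the post, with and without a
# trailing slash.
BASE_NO_SLASH = "http://www.cs.toronto.edu/~famelis/planet"
BASE_WITH_SLASH = BASE_NO_SLASH + "/"


def resolve(page_url, href):
    """Resolve a relative href against the address of the page that contains it."""
    return urljoin(page_url, href)


if __name__ == "__main__":
    # "planet.css" is a hypothetical bare href; "planet/planet.css" is the
    # value set in index.html.tmpl.
    for href in ("planet.css", "planet/planet.css"):
        for base in (BASE_NO_SLASH, BASE_WITH_SLASH):
            print(f"{base} + {href} -> {resolve(base, href)}")
```

When the page address has no trailing slash, a bare planet.css resolves to http://www.cs.toronto.edu/~famelis/planet.css, which falls outside the proxied path, whereas planet/planet.css resolves to http://www.cs.toronto.edu/~famelis/planet/planet.css.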
All that said, planet is essentially a script that has to be invoked each time we want the aggregator to fetch feeds. I therefore created this trivial shell script at ~/bin/update-planet.sh:
#!/bin/bash
cd ~/site/var/www/planet
./planet.py dcs/fancy/config.ini
After making it executable, I added the following line to crontab (with “crontab -e”) so that the Planet is refreshed every 10 minutes.
*/10 * * * * /u/famelis/bin/update-planet.sh
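The cd step in the script matters because the paths in config.ini (such as dcs/cache) are relative, and cron normally starts jobs from the home directory rather than from the planet folder. Purely as a sketch, the same update step could be written in Python as below. run_planet and its parameters are hypothetical names, and the interpreter used to launch planet.py is an assumption: Planet 2.0 predates Python 3, so it should be pointed at whichever Python the release actually runs under.

```python
import subprocess
import sys
from pathlib import Path


def run_planet(planet_dir, config="dcs/fancy/config.ini", interpreter=sys.executable):
    """Run planet.py from inside planet_dir so relative config paths resolve.

    Returns the exit status of the planet process.
    """
    planet_dir = Path(planet_dir).expanduser()
    # cwd= plays the role of the `cd` line in update-planet.sh.
    completed = subprocess.run(
        [interpreter, "planet.py", config],
        cwd=planet_dir,
    )
    return completed.returncode
```

A call such as run_planet("~/site/var/www/planet") would then stand in for the two commands of the shell script, and the returned exit status could be checked before trusting the refreshed output.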
And that’s all 🙂
One more thing to note is that I initially had planet set up on my personal computer (which is inaccessible to the general internet by CSLab policy, and with good reason). The setup I did there was from the appropriate Ubuntu package. The major difference between that installation and the one on the user-managed webserver is that instead of “config.ini” you get /etc/planet.conf.
*boggled*
I’ve gone through the hassle of setting up the user-managed webserver (for the csgsbs website), and I know what a pain it can be (keep an eye on it cause the servers will randomly go down from time to time).
Have you ever heard of Yahoo Pipes? The aggregation of blogs would take about (3sec x feed)…and feed aggregation is just the simplest of all things Pipes can do:
Query the New York Times headlines on Flickr and display a feed of photos:
http://pipes.yahoo.com/pipes/pipe.info?_id=vvW1cD212xGMiR9aqu5lkA
Feed of apartment listings near a specified place:
http://pipes.yahoo.com/pipes/pipe.info?_id=5bd39564344cffbc9c9fabbeecec1576
…etc…
The possibilities are endless once you figure out the power you’ve got under the hood, and I’ve never seen anything as simple to use that does the same thing.
Cheers
I hope that it won’t be much pain 🙂
I know about Pipes; they are really cool, and I really like their “visual programming” smell. But for this I wanted something more concrete than a pipe (and I’ve wanted to set up a planet for some time anyway).
Pipes are great, but they are slower than planets and, as plagal said above, a planet feels “more concrete”.