From the command line¶
Once installed, you can start using storytracker’s command-line tools immediately, like storytracker.archive().
$ storytracker-archive http://www.latimes.com
That should pour out a scary looking stream of data to your console. That is the content of the page you requested compressed using gzip. If you’d prefer to see the raw HTML, add the --do-not-compress option.
$ storytracker-archive http://www.latimes.com --do-not-compress
You could save that yourself using a standard UNIX pipeline.
$ storytracker-archive http://www.latimes.com --do-not-compress > archive.html
But why do that when storytracker.create_archive_filename() will work behind the scenes to automatically come up with a tidy name that includes both the URL and a timestamp?
$ storytracker-archive http://www.latimes.com --do-not-compress --output-dir="./"
Run that and you’ll see the file right away in your current directory.
# Try opening the file you spot here with your browser $ ls | grep .html
UNIX-like systems typically come equipped with a built in method for scheduling tasks known as cron. To utilize it with storytracker, one approach is to write a Python script that retrieves a series of sites each time it is run.
import storytracker SITE_LIST = [ # A list of the sites to archive 'http://www.latimes.com', 'http://www.nytimes.com', 'http://www.kansascity.com', 'http://www.knoxnews.com', 'http://www.indiatimes.com', ] # The place on the filesystem where you want to save the files OUTPUT_DIR = "/path/to/my/directory/" # Runs when the script is called with the python interpreter # ala "$ python cron.py" if __name__ == "__main__": # Loop through the site list for s in SITE_LIST: # Spit out what you're doing print "Archiving %s" % s try: # Attempt to archive each site at the output directory # defined above storytracker.archive(s, output_dir=OUTPUT_DIR) except Exception as e: # And just move along and keep rolling if it fails. print e
Scheduling with cron¶
Then edit the cron file from the command line.
$ crontab -e
And use cron’s custom expressions to schedule the job however you’d like. This example would schedule the script to run a file like the one above at the top of every hour. Though it assumes that storytracker is available to your global Python installation at /usr/bin/python. If you are using a virtualenv or different Python configuration, you should begin the line with a path leading to that particular python executable.
0 * * * * /usr/bin/python /path/to/my/script/cron.py