Command-line interfaces¶
storytracker-archive¶
Usage: storytracker-archive [URL]... [OPTIONS]
Archive the HTML from the provided URLs
Options:
-h, --help show this help message and exit
-v, --do-not-verify Skip verification that HTML is in the response's
content-type header
-m, --do-not-minify Skip minification of HTML response
-e, --do-not-extend-urls
Do not extend relative urls discovered in the HTML
response
-c, --do-not-compress
Skip compression of the HTML response
-d OUTPUT_DIR, --output-dir=OUTPUT_DIR
Provide a directory for the archived data to be stored
Example usage:
# This will pipe out gzipped content of the page to stdout
$ storytracker-archive http://www.latimes.com
# You can save it to an automatically named file a directory you provide
$ storytracker-archive http://www.latimes.com -d ./
# If you'd prefer to have the HTML without compression
$ storytracker-archive http://www.latimes.com -c
# Which of course can be piped into other commands like anything else
$ storytracker-archive http://www.latimes.com -cm | grep lakers
storytracker-get¶
Usage: storytracker-get [URL]... [OPTIONS]
Retrieves HTML from the provided URLs
Options:
-h, --help show this help message and exit
-v, --do-not-verify Skip verification that HTML is in the response's
content-type header
Example usage:
# Download an url like this
$ storytracker-get http://www.latimes.com
# Or two like this
$ storytracker-get http://www.latimes.com http://www.columbiamissourian.com
storytracker-links2csv¶
Usage: storytracker-links2csv [ARCHIVE PATHS OR DIRECTORIES]...
Extracts hyperlinks from archived files or streams and outputs them as comma-
delimited values
Options:
-h, --help show this help message and exit
Example usage:
# Extract from an archived file
$ storytracker-links2csv /path/to/my/directory/http!www.cnn.com!!!!@2014-07-22T04:18:21.751802+00:00.html
# Extract from a directory filled with archived file
$ storytracker-links2csv /path/to/my/directory/