"3:16 All scripture is given by a procedural argument to instantiate."
(kingjamesprogramming.tumblr.com)
This script downloads web serials. It follows the Next or Next Chapter link on each page (no 'Table of Contents' page is required), formats and cleans up the retrieved HTML, and writes all chapters to one large HTML file.
ChapterChainer is heavily commented and uses descriptive variable names to make adding new serials fairly easy. Non-story pages (such as 'Author's Notes') can optionally be skipped or appended to the story.
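The link-following idea can be illustrated with a minimal sketch. This is not ChapterChainer's actual code: the names `NextLinkFinder`, `find_next_url`, `chain_chapters` and the `NEXT_LABELS` set are hypothetical, and the sketch uses only the standard library's `html.parser` instead of BeautifulSoup so that it runs without extra dependencies. The download function is passed in as `fetch`, so the loop can be tried without network access.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin

# Assumed link texts; real serials may label their links differently.
NEXT_LABELS = {"next", "next chapter"}


class NextLinkFinder(HTMLParser):
    """Record the href of the first <a> whose text is a 'Next' label."""

    def __init__(self):
        super().__init__()
        self.next_href = None
        self._href = None   # href of the <a> currently open, if any
        self._text = []     # text pieces collected inside that <a>

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            self._href = dict(attrs).get("href")
            self._text = []

    def handle_data(self, data):
        if self._href is not None:
            self._text.append(data)

    def handle_endtag(self, tag):
        if tag == "a" and self._href is not None:
            # Normalize whitespace and case before comparing labels.
            label = " ".join("".join(self._text).split()).lower()
            if self.next_href is None and label in NEXT_LABELS:
                self.next_href = self._href
            self._href = None


def find_next_url(html, page_url):
    """Return the absolute URL of the page's Next link, or None."""
    finder = NextLinkFinder()
    finder.feed(html)
    finder.close()
    if finder.next_href is None:
        return None
    return urljoin(page_url, finder.next_href)


def chain_chapters(start_url, fetch, max_pages=1000):
    """Follow Next links from start_url; fetch(url) must return HTML text."""
    pages, seen, url = [], set(), start_url
    # Stop on a missing link, a loop back to a visited page, or the page cap.
    while url and url not in seen and len(pages) < max_pages:
        seen.add(url)
        html = fetch(url)
        pages.append((url, html))
        url = find_next_url(html, url)
    return pages
```

With a real downloader, `fetch` could be a small wrapper around `urllib.request.urlopen`. The `seen` set and `max_pages` cap are a precaution against Next links that loop back on themselves.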
- Abelson, Sussman & Sussman: Structure and Interpretation of Computer Programs, 2nd edition ('SICP')
- Alexander, Scott: Unsong (Author’s Notes optional, non-story announcements/greetings omitted)
- Walter: The Fifth Defiance ('T5D')
Python 3, BeautifulSoup4, lxml, html5lib
ChapterChainer.py Title [option] [URL]
ChapterChainer.py URL
Invoke the script with one of the built-in titles (SICP, T5D, Unsong), one of the switches if applicable (see below), and your start URL if you don't want to start at the serial's first page.
Alternatively, just state the URL where you want to start downloading.
All arguments are case sensitive.
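As an illustration of the two invocation forms, here is a minimal sketch of how such a command line could be interpreted. It is an assumption about the logic, not the script's actual parser: `parse_command_line`, `looks_like_url`, `BUILTIN_TITLES` and `SWITCHES` are hypothetical names, and the sketch accepts the arguments in any order and does not check that a switch is only used with 'Unsong'.

```python
from urllib.parse import urlparse

# Built-in titles and switches as listed in this README (case sensitive).
BUILTIN_TITLES = {"SICP", "T5D", "Unsong"}
SWITCHES = {
    "--omit": "omit",
    "--append": "append",
    "--chrono": "chrono",
    "--chronological": "chrono",
}


def looks_like_url(arg):
    """True for absolute http(s) URLs such as https://example.com/ch1."""
    parts = urlparse(arg)
    return parts.scheme in ("http", "https") and bool(parts.netloc)


def parse_command_line(args):
    """Return (title, mode, url) for the arguments after the script name."""
    title, mode, url = None, "chrono", None  # chronological is the default
    for arg in args:
        if arg in BUILTIN_TITLES and title is None:
            title = arg
        elif arg in SWITCHES:
            mode = SWITCHES[arg]
        elif looks_like_url(arg):
            url = arg
        else:
            raise ValueError(f"unrecognized argument: {arg!r}")
    if title is None and url is None:
        raise ValueError("give a built-in title, a start URL, or both")
    return title, mode, url
```

Because matching is exact, a lowercase `unsong` is rejected instead of being silently treated as the built-in title.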
Optional switches for pages that are not part of the story (e.g., Author's Notes, Greetings, Postscript); currently only for 'Unsong':
[--omit | --append | --chrono[logical]]
--omit skips these pages, --append collects them and puts them after the story, and the default, --chronological (or --chrono), keeps them interspersed between chapters in order of publication.
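The effect of the three switches on page order can be sketched as follows. This is an illustrative assumption, not the script's real data model: `arrange_pages` is a hypothetical helper, and each page is represented as a `(title, is_story)` tuple listed in order of publication.

```python
def arrange_pages(pages, mode="chrono"):
    """Order pages for output.

    pages: list of (title, is_story) tuples in publication order.
    mode:  "omit", "append" or "chrono" (the default).
    """
    if mode == "chrono":
        # Keep everything exactly in publication order.
        return list(pages)
    story = [page for page in pages if page[1]]
    if mode == "omit":
        # Drop non-story pages entirely.
        return story
    if mode == "append":
        # Story first, then the non-story pages in their original order.
        extras = [page for page in pages if not page[1]]
        return story + extras
    raise ValueError(f"unknown mode: {mode!r}")
```

For example, with pages `[("Chapter 1", True), ("Author's Note", False), ("Chapter 2", True)]`, the "append" mode moves the Author's Note to the end, while "omit" leaves only the two chapters.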
ChapterChainer.py Unsong --omit downloads Unsong without the non-story pages to the working directory.
Pages not published at the time of this script update may not be found if the 'Next' link has been changed. Links from a story to epilogue, afterword, author's blog, next story, etc. are not followed.
Please donate to the authors for their writing! Using this script can deny them some much-needed income from advertising. Easy donation options are usually on their sites. And you can vote daily on topwebfiction.com if you enjoy reading.
- Structure and Interpretation of Computer Programs:
- ???
- The Fifth Defiance:
- Unsong:
This script must not be used to publish or circulate a serial without its author's permission. Doing so would severely curtail their chances of selling the manuscript, and with no money to be made they may give up writing for the web altogether. Also, few could afford the punitive damages for a lost film series deal. Sorry for the moralizing. Wildbow's works have been deleted from the built-in serials because he does not endorse scraping.
This project contains code originally (c) 2014 JordanSekky (https://github.com/JordanSekky/BookWorm).