Or how to read HN offline.
- Uses the official HN API
- No configuration files
- CLI
- Compatible w/ cron jobs
- MIME multipart/alternative mails w/ html & txt portions
- Mostly stateless
- Read-only
- No up-voting support or score updates
Requires:

- nodejs 0.10.3x (doesn't work w/ node 0.12 or iojs due to the request dependency)
- rnews, a CL util from the INN package
- w3m browser

In Fedora 21:

# yum install w3m inn
Add this to sudoers (replacing alex w/ your user name):

alex ALL = (news) NOPASSWD: /bin/rnews
Then in the cloned repo:
$ make
or just
# npm install -g hackernews2nntp
# /usr/libexec/news/ctlinnd newgroup news.ycombinator
This must not raise an error.
Then
$ hackernews2nntp-get exact 8874 -v | hackernews2nntp-convert -v -f mbox > 1.mbox
will download a HN comment & convert it to mbox format. If you have mutt installed, you can view it via mutt -f 1.mbox.
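Each converted item is a MIME multipart/alternative mail in classic mbox format (see mbox(5)). The hand-built miniature below (all headers, dates & the boundary are illustrative, not the converter's exact output) shows the shape & a quick way to count messages in a resulting file:

```shell
# a 1-message mbox whose message is multipart/alternative (illustrative values)
cat > /tmp/demo.mbox <<'EOF'
From hn Thu Apr 12 01:20:16 2007
Subject: Example HN comment
MIME-Version: 1.0
Content-Type: multipart/alternative; boundary="b1"

--b1
Content-Type: text/plain; charset=utf-8

plain-text portion
--b1
Content-Type: text/html; charset=utf-8

<p>html portion</p>
--b1--
EOF

# every message in an mbox begins w/ a "From " separator line:
grep -c '^From ' /tmp/demo.mbox   # -> 1
```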
$ hackernews2nntp-get exact 8874 -v | hackernews2nntp-convert -v | sudo rnews -N
will post the same comment to the news.ycombinator group. If the message doesn't appear (because Apr 2007 is too old for the default INN settings), check the logs:

$ journalctl /bin/rnews
$ journalctl -u innd
- Get the top 100 stories & all comments for them, then exit:

$ hackernews2nntp-get top100 -v | hackernews2nntp-convert -v | sudo rnews -N
If you get an EPIPE error, don't pipe to rnews; invoke hackernews2nntp-convert w/ the --fork option instead:

$ hackernews2nntp-get top100 -v | hackernews2nntp-convert -v --fork

(It will call sudo rnews -N internally for each article.)

- Get the last 200 stories/comments, then exit:

$ hackernews2nntp-get last 200 -v --nokids | hackernews2nntp-convert -v | sudo rnews -N
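(Aside: EPIPE simply means the writer outlived its reader. A minimal reproduction w/ standard tools, unrelated to hackernews2nntp itself:)

```shell
# head exits after printing 1 line; on its next write, yes gets SIGPIPE/EPIPE
yes | head -n 1   # -> y
```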
- Don't post anything to an NNTP server but create 1 big .mbox file:

$ rm 1.mbox
$ hackernews2nntp-get top100 -v | hackernews2nntp-convert -v -f mbox >> 1.mbox
- Get stories/comments in the range from 8,000,000 to 8,000,100:

$ hackernews2nntp-get -v --nokids range 8000000 8000100 | hackernews2nntp-convert -v | sudo rnews -N
- Get stories/comments from 8859730 up to the most current one & save the last (highest) item id in /tmp/last-item.txt:

$ hackernews2nntp-get -v --maxitem-save /tmp/last-item.txt --nokids range 8859730 | hackernews2nntp-convert -v | sudo rnews -N
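Since the tool is cron-compatible, a run like the above can be wrapped into a script that resumes from the saved id. The wrapper below is a hypothetical sketch (the default start id & the choice to re-fetch the saved item itself are assumptions, not part of the package); it skips the pipeline silently if the tools aren't installed:

```shell
#!/bin/sh
# resume from the id saved by --maxitem-save, or from 8859730 on the 1st run
start=$(cat /tmp/last-item.txt 2>/dev/null || echo 8859730)
echo "resuming from item $start"
if command -v hackernews2nntp-get >/dev/null 2>&1; then
    hackernews2nntp-get -v --maxitem-save /tmp/last-item.txt --nokids \
        range "$start" | hackernews2nntp-convert -v | sudo rnews -N
fi
```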
- I have a problem w/ rnews.

Please don't ask me any questions about INN; I have only a vague idea of how it works. I chose rnews because it (a) can read articles from stdin in batch mode, (b) doesn't modify the incoming article, (c) is fast, (d) comes w/ INN. Unfortunately, it's not possible to know whether an article was posted or not w/o reading the INN logs.
- Can hackernews2nntp run as a daemon?

No.
- What does the hackernews2nntp-convert warning: json validation failed message mean?

Usually it means that a HN post was deleted & there was no useful data in the json payload. For example,

$ hackernews2nntp-get exact 126809 | json -g -c 'this.deleted'
[
  {
    "deleted": true,
    "id": 127217
  }
]

vs.

$ hackernews2nntp-get exact 126809 | json -g \
  -c '!this.kids && this.by == "pg" && this.type == "comment"' | json 0
{
  "by": "pg",
  "id": 126816,
  "parent": 126809,
  "text": "As you can see, we do. You can read more [...]",
  "time": 1204404016,
  "type": "comment"
}
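The json filter above is the json(1) tool from npm; if it isn't installed, the same "is it deleted?" check can be sketched w/ python3 (sample payload inlined, ids illustrative):

```shell
# filter out only the deleted items from a json array on stdin
echo '[{"deleted": true, "id": 127217}, {"id": 1, "type": "story"}]' |
python3 -c 'import json,sys; print([x for x in json.load(sys.stdin) if x.get("deleted")])'
# -> [{'deleted': True, 'id': 127217}]
```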
- Barely tested, on Fedora 21 only.
- Supports only UTF-8 locales.
- Doesn't follow the 'parent' property, e.g. if it gets a comment, it tries to download all its 'kids' but ignores the 'parent'.
- hackernews2nntp-get can pause the node 0.10.x process if you're not using the --nokids option.
- src/crawler2.coffee is too long.
See also: rnews(1), w3m(1), mbox(5), sudoers(5)
- hackernews2nntp-get: fix a crash in json validation.
- hackernews2nntp-get: totally rewrite Crawler; throttle the max number of http requests to 20/s (see the --conn-per-sec CLO).
- hackernews2nntp-get: add the range mode & the --maxitem-save CLO; always print statistics on exit w/ the -v or -s CLOs.
- hackernews2nntp-convert: add the --template-dir CLO; fix a bug in the mbox header w/ missing leading zeros.
Many thanks to John Magolske for suggestions for the hackernews2nntp-get range mode & the --maxitem-save CLO, & also for reporting bugs.
MIT.