Python Readability - Strip Webpage of Junk Content - Linux CLI

readability-lxml
Given an HTML document, it pulls out the main body text and cleans it up.

Node.js version: readability-cli
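readability-lxml's own entry point is its `Document` class: `Document(html).summary()` returns the cleaned article HTML and `Document(html).title()` returns the page title. To show the underlying idea without that dependency, here is a deliberately simplified, stdlib-only sketch. It is *not* readability-lxml's scoring algorithm. It only captures the core intuition that the main article is the container holding the most paragraph text. The skip-tag list and the names `ParagraphCollector` and `extract_main_text` are hypothetical.

```python
from collections import defaultdict
from html.parser import HTMLParser

# Simplified assumptions (not readability-lxml's rules): text inside these tags
# is never article content, and void tags never get pushed onto the stack.
SKIP_TAGS = {"script", "style", "nav", "header", "footer", "aside"}
VOID_TAGS = {"area", "br", "col", "hr", "img", "input", "link", "meta", "source", "wbr"}


class ParagraphCollector(HTMLParser):
    """Collects <p> text, grouped by the element that directly contains each <p>."""

    def __init__(self):
        super().__init__()
        self.stack = []                   # open elements as (tag, element_id)
        self.next_id = 0
        self.skip_depth = 0               # > 0 while inside a SKIP_TAGS element
        self.p_text = None                # text chunks of the <p> being read, if any
        self.p_parent = None              # element_id of that <p>'s parent
        self.groups = defaultdict(list)   # parent element_id -> paragraph strings

    def handle_starttag(self, tag, attrs):
        if tag in VOID_TAGS:
            return
        if tag in SKIP_TAGS:
            self.skip_depth += 1
        if tag == "p" and self.skip_depth == 0:
            self.p_parent = self.stack[-1][1] if self.stack else -1
            self.p_text = []
        self.next_id += 1
        self.stack.append((tag, self.next_id))

    def handle_endtag(self, tag):
        if not any(t == tag for t, _ in self.stack):
            return  # stray closing tag: ignore it
        # Pop up to and including the matching tag (tolerates unclosed inline tags).
        while self.stack:
            open_tag, _ = self.stack.pop()
            if open_tag in SKIP_TAGS:
                self.skip_depth -= 1
            if open_tag == "p" and self.p_text is not None:
                text = " ".join("".join(self.p_text).split())  # collapse whitespace
                if text:
                    self.groups[self.p_parent].append(text)
                self.p_text = None
            if open_tag == tag:
                break

    def handle_data(self, data):
        if self.p_text is not None and self.skip_depth == 0:
            self.p_text.append(data)


def extract_main_text(html: str) -> str:
    """Return the paragraphs of the container with the most paragraph text."""
    collector = ParagraphCollector()
    collector.feed(html)
    collector.close()
    if not collector.groups:
        return ""
    best = max(collector.groups.values(), key=lambda paras: sum(len(p) for p in paras))
    return "\n\n".join(best)


page = """
<html><body>
  <nav><p>Home</p><p>About</p></nav>
  <div id="sidebar"><p>Buy our stuff!</p></div>
  <article>
    <p>Readability-style tools look for the block with the most prose.</p>
    <p>Navigation, ads and   footers are  left behind.</p>
  </article>
  <footer><p>Copyright</p></footer>
</body></html>
"""
print(extract_main_text(page))
```

Real extractors weigh far more signals (link density, class and id names, commas, and so on), which is why a maintained library is the better choice in practice.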
Comments

Excellent information! Installed immediately.

GertBoers

Thanks for this video! Take care of yourself, man!

cyberdram

Really cool, thanks man! I wish this were a standard feature in modern browsers.

fabianfi

In my experience, Mercury Parser is much better at extracting site content, though it's a bit more involved to set up. I'm using it in conjunction with my RSS reader.

MrBlaskat

This is great, dude. I have the original JavaScript (non-Node.js) Readability version that I can use as a bookmarklet, but the code is obfuscated, so I can't customize or update it. I've been wanting to make a script that takes a list of URLs and turns it into a "magazine" optimized for e-ink (Kindle). I think the Python script you show here will be very useful for that.

dubbeltumme
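The "magazine" idea in this comment could end with a step that stitches already-cleaned articles into one HTML file with a table of contents. Here is a minimal sketch of only that assembly step, assuming fetching and extraction happen upstream (for example with readability-lxml's `Document(...).summary()`). The function name `build_magazine`, the anchor scheme, and the e-ink CSS are hypothetical choices, not part of any tool shown in the video.

```python
import html
from typing import Iterable, Tuple

# Assumed e-ink-friendly styling: black on white, serif text, no images,
# and each article starting on a new page.
EINK_CSS = """
body { font-family: serif; color: #000; background: #fff; line-height: 1.5; }
h2 { page-break-before: always; }
img { display: none; }
"""


def build_magazine(issue_title: str, articles: Iterable[Tuple[str, str]]) -> str:
    """Combine (title, cleaned_body_html) pairs into one HTML 'magazine'.

    Bodies are assumed to be already-cleaned article HTML, so they are inserted
    as-is; only the titles are escaped.
    """
    toc_items = []
    sections = []
    for index, (title, body_html) in enumerate(articles, start=1):
        anchor = f"article-{index}"
        safe_title = html.escape(title)
        toc_items.append(f'<li><a href="#{anchor}">{safe_title}</a></li>')
        sections.append(f'<h2 id="{anchor}">{safe_title}</h2>\n{body_html}')
    return "\n".join([
        "<!DOCTYPE html>",
        '<html><head><meta charset="utf-8">',
        f"<title>{html.escape(issue_title)}</title>",
        f"<style>{EINK_CSS}</style></head><body>",
        f"<h1>{html.escape(issue_title)}</h1>",
        "<ol>", *toc_items, "</ol>",
        *sections,
        "</body></html>",
    ])


magazine = build_magazine("Weekend reading", [
    ("First article", "<p>Body one.</p>"),
    ("Q&A <special>", "<p>Body two.</p>"),
])
```

The result is a single self-contained HTML file, which can be saved and then converted to an e-reader format with whatever converter you already use.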

@gotbletu I've improved your w3m keymap so you don't have to press OK to open the cleaned page: use READ_SHELL instead of READ in the command. I also send standard error to /dev/null to suppress errors in the terminal.

I made a video about the w3m keymap using READ_SHELL

w3m Browser - Reader view, strip junk from webpages and show only the article

Here's the w3m keymap using READ_SHELL and sending standard error to /dev/null

Keep up the good work!

NapoleonWilsn
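The keymap itself isn't reproduced here, but the stderr trick it relies on carries over when calling the extractor from a Python script instead of w3m. Passing `stderr=subprocess.DEVNULL` to `subprocess.run` plays the role of appending `2>/dev/null` in the shell. Here is a minimal sketch; `run_quietly` is a hypothetical helper, and the demo command is a stand-in program that writes to both streams, not the actual readability command.

```python
import subprocess
import sys
from typing import Sequence


def run_quietly(cmd: Sequence[str]) -> str:
    """Run cmd and return its stdout, discarding stderr (like `2>/dev/null`)."""
    result = subprocess.run(
        cmd,
        stdout=subprocess.PIPE,
        stderr=subprocess.DEVNULL,  # warnings and tracebacks never reach the terminal
        text=True,
        check=False,                # a non-zero exit status is not raised as an error
    )
    return result.stdout


# Stand-in command: a tiny Python program that prints to stdout and stderr.
demo = [
    sys.executable,
    "-c",
    "import sys; print('article text'); print('noisy warning', file=sys.stderr)",
]
output = run_quietly(demo)
print(output, end="")
```

Discarding stderr hides genuine failures too, so it suits interactive use like a browser keymap better than batch jobs where you want to see errors.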