html2text is handy for making HTML files readable in the CLI

Hello, I'm incompetent.
For about a year now, I've been checking the posts of a former Japanese bond dealer at the following link:
Young Wisdom
However, I can't go back to past articles, because the author appears to edit the HTML files directly. I'd like to fetch the page regularly so that I can later read whatever I couldn't keep up with. Raw HTML reads fine in a browser, but it's hard to follow when you cat it in a terminal.
So I wondered whether there was a tool that could convert it into something Markdown-like that even a lazy person like me could read, and it turns out there is.

Install html2text

sudo pacman -Sy artix-archlinux-support
sudo pacman -S html2text

Pipe curl to html2text

This makes it easier to read.

curl "https://soulminingrig.com/" | html2text 

For a text-based site, this could work quite well for archiving. Images and the like, however, need their paths set correctly, and that has to be handled separately.
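
As a rough sketch of the archiving idea, the same curl and html2text pipeline can be wrapped in a small function that saves each fetch under a dated file name. The function name archive_page, the default archive directory, and the .md extension are assumptions made for illustration and are not part of html2text. Images are not handled here.

```shell
# Hypothetical helper: fetch a page and save its text version as <dir>/<YYYY-MM-DD>.md
archive_page() {
    url=$1
    out_dir=${2:-archive}                      # assumed default directory name
    out_file="$out_dir/$(date +%Y-%m-%d).md"
    mkdir -p "$out_dir" || return 1
    tmp=$(mktemp) || return 1
    # Download first, so a failed fetch does not leave an empty archive file.
    if curl -fsSL "$url" -o "$tmp"; then
        html2text < "$tmp" > "$out_file"
        status=$?
    else
        status=1
    fi
    rm -f "$tmp"
    # Print the path of the saved file on success.
    [ "$status" -eq 0 ] && printf '%s\n' "$out_file"
    return "$status"
}
```

Calling archive_page "https://soulminingrig.com/" from cron or a systemd timer would then leave one text file per day. A second fetch on the same day overwrites that day's file.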

Above all, the advantage of having it as plain text is that it's easy to grep. And once it's Markdown, it's easy to hand the conversion back to HTML over to an SSG and set up a local site for my own reference.
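
To illustrate the grep point, here is a minimal sketch that searches a directory of converted files. The function name search_archive and the archive directory are hypothetical. It relies on grep's -r option, which GNU, BSD, and BusyBox grep all provide.

```shell
# Hypothetical helper: case-insensitive search across archived text files,
# printing file name and line number for each match.
search_archive() {
    pattern=$1
    dir=${2:-archive}                          # assumed default directory name
    grep -rin -- "$pattern" "$dir"
}
```

For example, search_archive "yield" lists every archived line that mentions the word, along with the file it came from.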
I'm certainly not a Markdown absolutist, but I don't dislike Markdown as much as the Scrapbox developers do.
Markup Languages - Toshiyuki Masui

One of the recently popular markup languages is Markdown. It seems to have been developed to write HTML more concisely, and it has become popular among engineers because it's standard on GitHub and elsewhere, but frankly, it's too much trouble. I even sometimes think it's easier to write raw HTML. I wish it would go extinct quickly, but it's problematic that many engineers who are half-heartedly accustomed to Markdown misunderstand it as 'Markdown is the best!' Scrapbox's markup notation was adopted after experiencing Wiki, HTML, Scribe, TeX, roff, markdown, and all others, so I would appreciate it if discussions could be held with that in mind. (If there's an even better, supreme notation, I'd gladly adopt it.)

I have no strong preference about notation as long as it's versatile and not a hassle to write, so for now I'm sticking with Markdown.
