HTTP Range Requests - SOULMINIGRIG

Hello, Munou.

I've always thought that HTTP GET requests fetch the entire page every time, which never struck me as eco-friendly.

If I only need part of a page, say just the contents of <head>, I should be able to fetch only that part, and fall back to fetching the whole page and extracting it only when that doesn't work.

If this were possible, both the network transfer and the CPU work of parsing tags such as those in <head> would be lighter and faster, which seems like a win all around.

Besides, since HTTP runs over TCP, I figured there was no way it couldn't do this, so I looked it up, and sure enough, it can.

HTTP Range Requests

Test with curl

You can limit what comes back by adding a Range header with curl's -H option, like this:

$ curl -H "Range: bytes=0-1024" https://soulminingrig.com/
<!DOCTYPE html>
<html lang="ja"><head><meta charset="utf-8"><meta content="width=device-width,initial-scale=1.0" name="viewport"><title>Home - SOULMINIGRIG</title><meta content="light dark" name="supported-color-schemes"><meta content="hsl(220, 20%, 100%)" media="(prefers-color-scheme: light)" name="theme-color"><meta content="hsl(220, 20%, 10%)" media="(prefers-color-scheme: dark)" name="theme-color"><link href="/pagefind/pagefind-ui.css" rel="stylesheet"><link href="/styles.css" rel="stylesheet"><link href="/feed.xml" rel="alternate" title="SOULMINIGRIG" type="application/atom+xml"><link href="/feed.json" rel="alternate" title="SOULMINIGRIG" type="application/json"><link href="/favicon.png" rel="icon" sizes="32x32" type="image/png"><link href="https://soulminingrig.com/" rel="canonical"><script src="/js/main.js" type="module"></script><style>.page-title{background:var(--color-highlight);padding:.5em;font-size:1.2em}</style><script data-website-id="6031aa47-e715-4f87-a99a-9e3046e5dcdc" defer="" src="https://n

Since it's a range, it doesn't have to start at the beginning:

$ curl -H "Range: bytes=1025-1025" https://soulminingrig.com/
o

Even a single byte comes back correctly.
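Both ends of a byte range are inclusive, so bytes=0-1024 asks for 1,025 bytes and bytes=1025-1025 for exactly one. Here is a minimal Python sketch of that arithmetic; range_header and range_length are hypothetical helper names, not part of any library.

```python
def range_header(first: int, last: int) -> str:
    """Build a Range header value; both offsets are inclusive."""
    if first < 0 or last < first:
        raise ValueError("need 0 <= first <= last")
    return f"bytes={first}-{last}"


def range_length(first: int, last: int) -> int:
    """Number of bytes covered by the inclusive range first..last."""
    return last - first + 1


print(range_header(0, 1024), range_length(0, 1024))        # bytes=0-1024 1025
print(range_header(1025, 1025), range_length(1025, 1025))  # bytes=1025-1025 1
```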

In practice

Because the limit is just a request header, this works in basically any language or HTTP client, and above all it lets you use network bandwidth economically, something people rarely think about.
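As a rough illustration of "any language", here is a sketch using only Python's standard urllib. The function names build_range_request and fetch_range are my own, and the 10-second timeout is an arbitrary assumption.

```python
import urllib.request
from typing import Tuple


def build_range_request(url: str, first: int, last: int) -> urllib.request.Request:
    """Create a GET request that asks only for bytes first..last (inclusive)."""
    return urllib.request.Request(url, headers={"Range": f"bytes={first}-{last}"})


def fetch_range(url: str, first: int, last: int) -> Tuple[int, bytes]:
    """Send the request and return (status code, body bytes).

    A 206 status means the server applied the range; 200 means it ignored
    the header and sent the whole resource.
    """
    req = build_range_request(url, first, last)
    with urllib.request.urlopen(req, timeout=10) as resp:
        return resp.status, resp.read()


# Example (needs network access):
#   status, body = fetch_range("https://soulminingrig.com/", 0, 1024)
```

urlopen treats any 2xx status, including 206, as success, so the caller gets the status back and can check whether the range was actually applied. A 416 Range Not Satisfiable would instead surface as urllib.error.HTTPError.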

If you only want the contents of the HTML <head>, you can request Range: bytes=0-4095 (the first 4,096 bytes, since both ends are inclusive). If </head> isn't in that chunk, you can either fetch the whole page or, to be even more economical, keep sending GET requests for the next 4,096-byte range until it shows up.
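The chunk-by-chunk approach could look roughly like this. It's a sketch under a few assumptions: the fetch callback (any function that takes inclusive first/last offsets and returns a status code and body) is something you supply, 4,096 bytes per chunk and a 64 KiB give-up limit are arbitrary choices, and fetch_until_head_end is a made-up name. Searching the accumulated buffer rather than each chunk on its own means a </head> split across two chunks is still found.

```python
from typing import Callable, Optional, Tuple

# A fetcher takes inclusive offsets (first, last) and returns (status, body).
Fetcher = Callable[[int, int], Tuple[int, bytes]]


def fetch_until_head_end(fetch: Fetcher, chunk: int = 4096,
                         max_bytes: int = 64 * 1024) -> Optional[bytes]:
    """Request `chunk`-byte ranges until </head> appears; return bytes up to it."""
    buf = b""
    start = 0
    while start < max_bytes:
        status, body = fetch(start, start + chunk - 1)
        if status not in (200, 206):
            return None  # unexpected response; give up
        if status == 200:
            # Server ignored Range and sent the full page: search that instead.
            buf = body
        else:
            buf += body
        end = buf.lower().find(b"</head>")
        if end != -1:
            return buf[: end + len(b"</head>")]
        if status == 200 or len(body) < chunk:
            return None  # reached the end of the document without </head>
        start += chunk
    return None  # gave up after max_bytes
```

A real fetcher built on urllib would also need to catch urllib.error.HTTPError, since a range that starts past the end of the file can come back as 416.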

In the end, this shortens the time downstream parsing libraries spend extracting HTML tags, and it is also kinder to the target server.

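However, as MDN's page on range requests points out under "Checking if a server supports partial requests", you shouldn't assume that every server supports them these days. To do this strictly, decide from the response headers: a server that honors the range answers 206 Partial Content with a Content-Range header, while one that ignores it answers 200 OK with the entire body. An Accept-Ranges: bytes header in a response also signals support.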
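A strict check along those lines might look like the following Python sketch. The helper names are hypothetical, and headers is assumed to be a mapping with case-insensitive lookup, such as the headers object on a urllib response (a plain dict works only if the keys are spelled exactly as below).

```python
import re
from typing import Mapping, Optional, Tuple

# Matches e.g. "bytes 0-1024/34567" or "bytes 0-1024/*" (total unknown).
_CONTENT_RANGE = re.compile(r"bytes (\d+)-(\d+)/(\d+|\*)")


def parse_content_range(value: str) -> Optional[Tuple[int, int, Optional[int]]]:
    """Parse 'bytes first-last/total' into (first, last, total); total is None for '*'."""
    m = _CONTENT_RANGE.fullmatch(value.strip())
    if not m:
        return None
    first, last, total = m.groups()
    return int(first), int(last), None if total == "*" else int(total)


def range_was_honored(status: int, headers: Mapping[str, str]) -> bool:
    """True only if the server answered 206 with a parsable Content-Range."""
    if status != 206:
        return False  # e.g. 200 means the full body came back
    value = headers.get("Content-Range")
    return value is not None and parse_content_range(value) is not None


def advertises_byte_ranges(headers: Mapping[str, str]) -> bool:
    """True if Accept-Ranges lists 'bytes' (a value of 'none' means no support)."""
    value = headers.get("Accept-Ranges", "")
    return "bytes" in (token.strip().lower() for token in value.split(","))
```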

However, a new emperor of the internet...

If the target site is behind Cloudflare, whose ticker symbol NET seems to declare it the new king of the internet, requests appear to be rate-limited per unit of time. So even if each fetch gets faster, sending many small range requests risks tripping its DDoS protection partway through, and in the end the benefit doesn't feel that large.

On the other hand, if you run a bot on a VPS, for example, that makes frequent external requests and eats a lot of bandwidth (though probably nobody runs one that heavily...), partial requests like these might pay off.

Also, since a fetched HTTP response is typically held in memory, fetching less of it may reduce memory usage somewhat. So if you're building a tool that retrieves information from websites, it's worth adding this behavior where it makes sense; it makes the tool a little kinder to the sites it visits.

This was a part of HTTP I thought I knew, but actually didn't.
