When using cURL or other tools, sometimes we need a proper HTTP request header to avoid being identified as robots. Here are some examples.
Chrome on Linux (x64):
curl '$URL' \
-H 'cache-control: max-age=0' \
-H 'dnt: 1' \
-H 'upgrade-insecure-requests: 1' \
-H 'user-agent: Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/87.0.4280.141 Safari/537.36' \
-H 'accept: text/html,application/xhtml+xml,application/xml;q=0.9,image/avif,image/webp,image/apng,*/*;q=0.8,application/signed-exchange;v=b3;q=0.9' \
-H 'accept-language: zh-CN,zh;q=0.9' \
--compressed
Firefox with privacy.resistFingerprinting
setting to true:
curl '$URL' \
-H 'User-Agent: Mozilla/5.0 (Windows NT 10.0; rv:78.0) Gecko/20100101 Firefox/78.0' \
-H 'Accept: text/html,application/xhtml+xml,application/xml;q=0.9,image/webp,*/*;q=0.8' \
-H 'Accept-Language: zh-CN,zh;q=0.8,zh-TW;q=0.7,zh-HK;q=0.5,en-US;q=0.3,en;q=0.2' \
--compressed \
-H 'DNT: 1' \
-H 'Connection: keep-alive' \
-H 'Upgrade-Insecure-Requests: 1' \
-H 'Cache-Control: max-age=0' \
-H 'TE: Trailers'
Besides, the User-Agent
of Chrome on Windows 10 (x64) should be like this:
Mozilla/5.0 (Windows NT 10.0; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/87.0.4280.141 Safari/537.36
The Accept-Language
of visitors from the US should be like this:
en-US,en;q=0.5
And Cache-Control
can be set to no-store
to bypass cache.