Running .sh files that scrape information with curl and grep in Git Bash on Windows
Douyin: scraping a given creator's IP location (IP属地)
douyin.sh
curl 'https://www.douyin.com/user/MS4wLjABAAAAzzmS2TgIEvxGftMpWD13Ty8k5HmsjlGsLJ1yBUEm2Ew' \
-H 'authority: www.douyin.com' \
-H 'accept: text/html,application/xhtml+xml,application/xml;q=0.9,image/avif,image/webp,image/apng,*/*;q=0.8,application/signed-exchange;v=b3;q=0.9' \
-H 'accept-language: en' \
-H 'cache-control: max-age=0' \
-H 'cookie: __ac_nonce=063dca5970090fe858f5a; __ac_signature=_02B4Z6wo00f01NIFEqwAAIDBsQ.Sxa5tYWjSJRYAAFdjd4; __ac_referer=__ac_blank' \
-H 'referer: https://www.douyin.com/user/MS4wLjABAAAAzzmS2TgIEvxGftMpWD13Ty8k5HmsjlGsLJ1yBUEm2Ew' \
-H 'sec-ch-ua: "Not?A_Brand";v="8", "Chromium";v="108", "Google Chrome";v="108"' \
-H 'sec-ch-ua-mobile: ?0' \
-H 'sec-ch-ua-platform: "Windows"' \
-H 'sec-fetch-dest: document' \
-H 'sec-fetch-mode: navigate' \
-H 'sec-fetch-site: same-origin' \
-H 'upgrade-insecure-requests: 1' \
-H 'user-agent: Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/108.0.0.0 Safari/537.36' \
-s --compressed | grep -Po "IP属地:[\p{Han}]+"  # -o: print only the match; -P: Perl regex, \p{Han} = Chinese characters
exec /bin/bash
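On Windows, .sh files are set to open with git-bash by default.
"exec /bin/bash" replaces the finished script with an interactive bash shell, so the window does not close as soon as the command completes.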
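The extraction step can be tried offline before sending any request. This is a minimal sketch that feeds a made-up HTML fragment (not real Douyin markup) through the same grep pattern. The helper name extract_ip_location is hypothetical, and the sketch assumes GNU grep built with -P support plus a UTF-8 locale (C.UTF-8 here), since \p{Han} only matches Chinese characters in UTF-8 mode.

```shell
#!/usr/bin/env bash
# Offline check of the pattern used in douyin.sh (no network request).
# extract_ip_location is a hypothetical helper name used only in this sketch.
extract_ip_location() {
  # -o prints only the matched text, -P enables Perl-compatible regex;
  # \p{Han} matches Chinese characters, which requires a UTF-8 locale.
  LC_ALL=C.UTF-8 grep -Po "IP属地:[\p{Han}]+"
}

# Made-up HTML fragment standing in for the downloaded profile page.
sample='<div><span>IP属地:广东</span><span>作品</span></div>'
printf '%s\n' "$sample" | extract_ip_location   # prints: IP属地:广东
```

In douyin.sh the curl output simply takes the place of the sample text.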
IMDb: scraping the rating of The Wandering Earth II (流浪地球2)
imdb.sh
curl 'https://www.imdb.com/title/tt13539646/' \
-H 'authority: www.imdb.com' \
-H 'accept: text/html,application/xhtml+xml,application/xml;q=0.9,image/avif,image/webp,image/apng,*/*;q=0.8,application/signed-exchange;v=b3;q=0.9' \
-H 'accept-language: en,zh-CN;q=0.9,zh;q=0.8,fr;q=0.7,zh-TW;q=0.6,ja;q=0.5' \
-H 'cookie: session-id=139-2334100-2972150; session-id-time=2082787201l; ubid-main=135-5881782-9815538; session-token=Cyab6j6SHZaTPdq0yojXPUaaNt9O1hksOOJ4pWtEWA8TH9WWTKTW0aVzfxdKZqYbLWzAKBIAU4Mvex4Aa4chyneCgRbXtsqrKBC52kH9t/ZVeRYwdrWOkCie+bRpzuxyC4RVzheR2M/4Hk43xXYZ8XOGnvPzibPUiDo8GB83D/i3+73GwFHvvSxvR55eqUZGbKwpBEfYFE2vRe0zxItE9w==; csm-hit=tb:88M9APYHDF5Y9A8317N4+b-FN1KDEPABZDN620D4DM8|1675488312573&t:1675488312573&adb:adblk_no' \
-H 'sec-ch-ua: "Not?A_Brand";v="8", "Chromium";v="108", "Google Chrome";v="108"' \
-H 'sec-ch-ua-mobile: ?0' \
-H 'sec-ch-ua-platform: "Windows"' \
-H 'sec-fetch-dest: document' \
-H 'sec-fetch-mode: navigate' \
-H 'sec-fetch-site: none' \
-H 'sec-fetch-user: ?1' \
-H 'upgrade-insecure-requests: 1' \
-H 'user-agent: Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/108.0.0.0 Safari/537.36' \
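-s --compressed |
  grep -oP "<span class=\"sc-7ab21ed2-1 eUYAaq\">(\d\.\d)</span>|<div class=\"sc-7ab21ed2-3 iDwwZL\">((\d+|\d+\.\d)K)</div>" |  # rating <span> and "...K" count <div>
  head -2 |  # keep only the first two matches
  grep -oP "\d+K|\d+\.\dK|(\d\.\d)"  # strip the tags, leaving only the numbers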
exec /bin/bash
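When the first grep captures more than one match, head -2 keeps only the first two of them (head -1 would keep only the first).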
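The extraction stage of imdb.sh can be checked the same way without a network request. In this sketch the HTML lines and the numbers in them (7.5, 6.2K, 99K) are placeholders, not real IMDb data. Only the class names are copied from the script, and the live page may well use different ones. extract_rating is a hypothetical helper name.

```shell
#!/usr/bin/env bash
# Offline sketch of the extraction stage of imdb.sh (no network request).
# extract_rating is a hypothetical helper name; the class names are copied
# from imdb.sh and may not match the live page.
extract_rating() {
  # Stage 1: grab the whole <span>/<div> elements that hold the numbers.
  grep -oP "<span class=\"sc-7ab21ed2-1 eUYAaq\">(\d\.\d)</span>|<div class=\"sc-7ab21ed2-3 iDwwZL\">((\d+|\d+\.\d)K)</div>" |
    head -2 |  # Stage 2: keep only the first two matches
    grep -oP "\d+K|\d+\.\dK|(\d\.\d)"  # Stage 3: drop the tags, keep the numbers
}

# Placeholder HTML; the values are made up, not real IMDb data.
sample='<span class="sc-7ab21ed2-1 eUYAaq">7.5</span>
<div class="sc-7ab21ed2-3 iDwwZL">6.2K</div>
<div class="sc-7ab21ed2-3 iDwwZL">99K</div>'
printf '%s\n' "$sample" | extract_rating
# prints:
# 7.5
# 6.2K
```

The third line ("99K") is matched by the first grep but dropped by head -2.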
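The |-separated alternatives in the grep -oP regex are tried in order: at each position in the text, PCRE uses the first alternative that matches rather than the longest one. With the current order "\d+K|\d+\.\dK|(\d\.\d)", the string "6.2K" is matched in full: \d+K fails at the ".", so \d+\.\dK is tried next and matches "6.2K". If the order were changed to "(\d\.\d)|\d+K|\d+\.\dK", the first alternative would already match "6.2", and the output would be "6.2" instead of "6.2K".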
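The effect of alternative order can be reproduced on the string "6.2K" alone. This is a minimal sketch assuming GNU grep with -P support; the variable names are illustrative.

```shell
#!/usr/bin/env bash
# Same input, two alternative orders (assumes GNU grep with -P support).
input='6.2K'

# At each position PCRE tries the alternatives left to right and keeps
# the first one that matches, not the longest one.
current=$(printf '%s\n' "$input" | grep -oP '\d+K|\d+\.\dK|(\d\.\d)')
reordered=$(printf '%s\n' "$input" | grep -oP '(\d\.\d)|\d+K|\d+\.\dK')

echo "current order: $current"    # current order: 6.2K
echo "reordered:     $reordered"  # reordered:     6.2
```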