User-Agent Strings (Or, Don’t Make Me Come After You)
A very long time ago (read: ten years ago), we were in-between the so-called First and Second Browser Wars. Microsoft had killed Netscape Navigator by taking advantage of its desktop monopoly and Scrooge McDuck-like financial reserves to install a free copy of Internet Explorer on every single computer in the world (basically). Internet Explorer 6 was the dominant browser, and Netscape as a company was over.
Netscape, before their demise, had embarked on a project to totally rewrite their web browser. Their new code was open-sourced and given to the Mozilla foundation. In hindsight, this was a stunningly successful move, with the ever-awesome Mozilla Foundation going from strength to strength now, nearly ten years after its foundation.
The second browser war was initially a festering cold war between the reborn Netscape Navigator (now entitled Mozilla Firefox) and the dormant Internet Explorer 6 (eventually updated to IE 7 after a 6 year development freeze). Later, other parties like Google Chrome joined the party. Oh, and Safari and Opera were kinda floating around in this war too, but honestly they’re not that important to the story I’m trying to tell.
Anyway, long story still kinda long, as part of these two browser wars, browsers felt the need to compete with each other on features. However, to use these features you needed to get web developers to build web sites that used them. The problem is that your new feature would only work on your browser. This meant that, when some poor soul came along trying to view your super-awesome ActiveX powered web page, and they had the misfortune to be using Netscape Navigator, your website would at best look awful, and at worst explode in several mysterious ways.
These people would then go away and tell their friends about your crappy website that wouldn’t even render properly! And they’d say that their friends should use your competitor’s website, even though your competitor can’t even spell ActiveX! And you’d go out of business and your children would have to go to a state school, and it would just be horrible.
So you needed some way to tell what features a browser had. There was a way to do that, of course: JavaScript. Unfortunately, some features couldn’t be easily detected in JavaScript, and writing JavaScript was, well, weird, and JavaScript was slow, and so lots of websites didn’t want to do that (or didn’t know they should). What would they do instead?
Well, RFC 1945 and RFC 2616 (the HTTP 1.0 and HTTP 1.1 specifications) stated that all browsers, web crawlers and other tools that interacted with web servers should identify themselves using a special header in the HTTP they send: the User-Agent header. This header should be (as much as possible) unique to a specific type of agent. This means that Internet Explorer should send a User-Agent header that is different to all other browsers and to all other versions of IE.
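To make that concrete, here is a rough sketch of what an agent identifying itself looks like on the wire. The build_get_request helper and the ExampleBot/1.0 agent name are made up for illustration; only the header layout comes from the HTTP specifications.

```python
def build_get_request(host, path, user_agent):
    """Return the raw bytes of a minimal HTTP/1.1 GET request.

    Hypothetical helper for illustration: real clients add more headers.
    """
    lines = [
        f"GET {path} HTTP/1.1",
        f"Host: {host}",
        f"User-Agent: {user_agent}",  # the header this whole story is about
        "Connection: close",
        "",  # blank line terminates the header block
        "",
    ]
    return "\r\n".join(lines).encode("ascii")


if __name__ == "__main__":
    # ExampleBot/1.0 is an invented agent name, not a real product.
    print(build_get_request("example.com", "/", "ExampleBot/1.0").decode("ascii"))
```

The server sees the User-Agent line as plain text supplied by the client, which is worth keeping in mind for everything that follows.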
“Perfect!” cry the web developers. “Our servers can check for this string.” And so begins the trouble.
The Trouble
You see, the problem with using the User-Agent string to check for features is that the User-Agent string tells you nothing about what features a given User-Agent has. After all, that’s not what it’s for! So you, naïve late-1990s web programmer, might write your site when only Mozilla Firefox has support for the hot new Twiddlor feature (note: not a real feature). So you only serve Twiddlor-enabled pages to people whose User-Agent strings identify them as being a version of Firefox.
The problem is, six months later the guys in Redmond get around to adding Twiddlor support to Internet Explorer. But all their users are still complaining that none of their favourite websites will let them use Twiddlor, instead claiming that the website is “Best used in Mozilla Firefox” or some such nonsense.
How does Microsoft get you to show them the Twiddlor-enabled page? Simple: they change their User-Agent string! Sadly, I’m not even joking: this is actually what happened. To prove it, I’m going to show you a few modern browser UA strings.
Here’s the UA string sent by Google Chrome version 27.0.1453.47 beta (yeah), running on my Mac:
Mozilla/5.0 (Macintosh; Intel Mac OS X 10_8_2) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/27.0.1453.47 Safari/537.36
“What is all that crap?”, I hear you ask, quite rightly. Why does it say it’s Mozilla? It’s not Mozilla! You’re quite right. But enough people have tested for Firefox by just checking that the word ‘Mozilla’ is in the UA string that everyone puts it there. And I mean everyone. Check out Safari, also on my Mac:
Notice that both Safari and Chrome claim to be versions of Safari. That’s pretty damn weird.
What about Internet Explorer 10, on my Windows machine?
Mozilla/5.0 (compatible; MSIE 10.0; Windows NT 6.2; WOW64; Trident/6.0)
At least it’s not claiming to be Safari! In fact, this is the best UA string I’ve seen, being a fairly honest representation of the browser.
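To see why substring sniffing goes wrong, here is a small sketch that runs two deliberately naive checks against the Chrome and Internet Explorer 10 strings quoted above. The function names are invented for illustration; they stand in for the sort of test a 1990s server script might have done, not for any real library.

```python
# The two UA strings quoted in the text above.
CHROME_UA = (
    "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_8_2) AppleWebKit/537.36 "
    "(KHTML, like Gecko) Chrome/27.0.1453.47 Safari/537.36"
)
IE10_UA = "Mozilla/5.0 (compatible; MSIE 10.0; Windows NT 6.2; WOW64; Trident/6.0)"


def naive_is_firefox(user_agent):
    """Hypothetical sniffing check: 'it says Mozilla, so it must be Firefox'."""
    return "Mozilla" in user_agent


def naive_is_safari(user_agent):
    """Hypothetical sniffing check: 'it says Safari, so it must be Safari'."""
    return "Safari" in user_agent


if __name__ == "__main__":
    for name, ua in [("Chrome", CHROME_UA), ("IE 10", IE10_UA)]:
        print(name, naive_is_firefox(ua), naive_is_safari(ua))
    # Chrome True True
    # IE 10 True False
```

Neither browser is Firefox, and only one check out of four gives an answer you could act on.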
Finally, let’s check Firefox, also on my Windows box.
What Should A User-Agent String Look Like?
To see an example of how these were supposed to look when the standard was originally proposed, we can see what Requests sends.
python-requests/1.2.0 CPython/2.7.2 Darwin/12.2.0
Short and to the point. The ‘browser’ and its version, the ‘platform’ and its version, and the OS (sort of) and its version.
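A string in this shape is also trivial to take apart. The sketch below splits it into (product, version) pairs; parse_product_tokens is a hypothetical helper, and it assumes a simple space-separated string like the Requests one, so it would not cope with the parenthesised comments in the browser strings shown earlier.

```python
def parse_product_tokens(user_agent):
    """Split a simple 'name/version name/version ...' UA string into pairs.

    Illustrative only: assumes no parenthesised comments in the string.
    """
    tokens = []
    for part in user_agent.split():
        # partition() leaves version empty if a token has no slash.
        name, _, version = part.partition("/")
        tokens.append((name, version))
    return tokens


if __name__ == "__main__":
    ua = "python-requests/1.2.0 CPython/2.7.2 Darwin/12.2.0"
    for name, version in parse_product_tokens(ua):
        print(name, version)
```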
Why Does This Matter?
In principle, the new JavaScript-heavy world should have cured us of this problem. People should write JS that tests for features and then uses them, and serves a less interesting version of the web page if you don’t support them. And, mostly, this is what happens! Libraries like jQuery have taken a lot of the hard work out of doing this, so most websites you’ll encounter nowadays do the right thing.
The problem is, sometimes they don’t. And when they don’t, you can encounter strange and confusing bugs. These bugs then tie up developer time and generally make everyone’s life worse. To provide an example, I’m going to briefly walk you through a bug that appeared on the Requests GitHub page a few days ago.
An Example
A user reported that, when he accessed a specific web page by doing a simple GET with no complicated stuff, he was getting an httplib.IncompleteRead exception thrown into his face.
This was odd in itself. This exception is most often seen when either the user or the remote server is using chunked encoding, but the user reported that he didn’t think either party was doing so. He also kindly provided the URL, so that I could reproduce the bug locally. (This is excellent practice, by the way: I’m far more likely to help out if I can easily reproduce your bug on my machine.)
When I made the same request, I also got the IncompleteRead exception thrown in my face. Further investigation showed that the web server claimed to be serving using chunked encoding, but in fact was just sending the page as normal. This is pretty bad, and there’s not much Requests can do about this: the web server is simply doing the wrong thing. First note for website developers: do NOT claim to be using chunked encoding when you are not!
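For anyone unfamiliar with chunked encoding, here is a minimal sketch of a decoder, assuming the whole response body is already in memory. decode_chunked is an illustrative helper, not what httplib does internally, and it ignores trailers and other corner cases. It shows why a server that promises chunks and then sends an ordinary page leaves the client with nothing sensible to parse.

```python
def decode_chunked(raw):
    """Decode an HTTP chunked body held entirely in memory.

    Each chunk is '<hex size>\\r\\n<data>\\r\\n'; a zero-size chunk ends the body.
    Raises ValueError if the data does not follow that framing.
    """
    body = b""
    pos = 0
    while True:
        line_end = raw.find(b"\r\n", pos)
        if line_end == -1:
            raise ValueError("missing chunk-size line")
        # Drop any chunk extension after ';' before reading the hex size.
        size_field = raw[pos:line_end].split(b";", 1)[0].strip()
        size = int(size_field.decode("ascii"), 16)  # ValueError if not hex
        pos = line_end + 2
        if size == 0:
            return body  # trailers, if any, are ignored in this sketch
        chunk = raw[pos:pos + size]
        if len(chunk) < size:
            raise ValueError("chunk shorter than its declared size")
        body += chunk
        pos += size + 2  # skip the chunk data and its trailing CRLF


if __name__ == "__main__":
    print(decode_chunked(b"4\r\nWiki\r\n5\r\npedia\r\n0\r\n\r\n"))
    try:
        # A plain page sent under a 'chunked' label: '<html>' is not a hex size.
        decode_chunked(b"<html>\r\n<body>Hello</body>\r\n</html>\r\n")
    except ValueError as exc:
        print("cannot decode:", exc)
```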
I was interested to see if we could get the page data anyway, so I patched my local copy of the standard library to see what we got when I returned the data instead of throwing an exception. What I saw was the second unpleasant thing this web site had done. The HTML for this page was about 20 lines long. All it did was embed, at full size, a frame containing another page, or a warning if your browser doesn’t support frames.
This is pretty obnoxious: why not just serve the other page? Why require frames? You aren’t even doing anything with them, you’re just using them for the sake of using them! Second note for website developers: do not use frames when you don’t need them! They are awkward for anything that isn’t a browser.
In an attempt to be helpful, I pulled the URL being framed out of the HTML and suggested the user hit that instead. Out of sheer curiosity, I then did a Requests GET on the URL.
Requests threw an exception again.
I was pretty surprised here: the page rendered fine in my browser. So I looked at the exception. Connection Reset By Peer, read the socket error text. For those who don’t know their network protocols, this indicates that the TCP connection to the web server was closed while we were expecting data on it.
This is very odd. Requests sent a totally compliant, basic HTTP GET request, and the remote server was shutting the connection in response to it. Doing this is totally against the HTTP specification. Any compliant server is required to respond with an HTTP error code and a Connection: close header if it wants to tear the connection down. Additionally, why did it work fine in Chrome but fail in Requests?
There’s really only one obvious thing to do. I grabbed Chrome’s User-Agent string and got Requests to send that instead of its own UA string. (For those who want to spoof their UA string, Requests allows you to pass it as a header. We only set one ourselves if you don’t provide one for us.)
Success! The page rendered and returned to us.
For those who want a summary, what was happening here is that the remote site was sniffing the User-Agent header. Instead of checking for features, however, what it was doing was using the header as a gatekeeper! If you don’t have the right User-Agent, you don’t just get a less feature-filled site: you get nothing. Not even an HTTP error page.
This is probably the worst example of User-Agent sniffing I’ve ever seen. This was a website developer using a bad practice to violate the HTTP specification. In addition to simply being rude, this is also a genuine cost for many developers. And crap like this leads to stupid UA strings like the ones I showed above.
This is also the third note for website developers: always send HTTP error codes, don’t just close connections.
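The whole episode can be reproduced locally with a toy server. The sketch below uses only the Python standard library; GatekeeperHandler, its polite flag, fetch and run_demo are all invented for illustration, and the real site’s logic is unknown to me beyond what it did on the wire. In rude mode the server hangs up on any client whose User-Agent doesn’t mention Chrome. In polite mode it at least sends a 403.

```python
import http.client
import http.server
import threading


class GatekeeperHandler(http.server.BaseHTTPRequestHandler):
    polite = False  # hypothetical switch: send an error instead of hanging up

    def do_GET(self):
        user_agent = self.headers.get("User-Agent", "")
        if "Chrome" in user_agent:
            self._reply(200, b"<html>Welcome!</html>")
        elif self.polite:
            self._reply(403, b"Forbidden: unsupported User-Agent")
        else:
            # Rude: drop the connection without writing any response.
            self.close_connection = True

    def _reply(self, status, body):
        self.send_response(status)
        self.send_header("Content-Length", str(len(body)))
        self.send_header("Connection", "close")
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, format, *args):
        pass  # keep the demo output quiet


def fetch(port, user_agent):
    """Return the HTTP status code, or None if no response came back."""
    conn = http.client.HTTPConnection("127.0.0.1", port, timeout=5)
    try:
        conn.request("GET", "/", headers={"User-Agent": user_agent})
        return conn.getresponse().status
    except (http.client.HTTPException, OSError):
        return None
    finally:
        conn.close()


def run_demo(polite):
    """Start a throwaway server and fetch once as Requests, once as Chrome."""
    handler = type("Handler", (GatekeeperHandler,), {"polite": polite})
    server = http.server.HTTPServer(("127.0.0.1", 0), handler)
    threading.Thread(target=server.serve_forever, daemon=True).start()
    try:
        port = server.server_address[1]
        return (fetch(port, "python-requests/1.2.0"),
                fetch(port, "Chrome/27.0.1453.47"))
    finally:
        server.shutdown()
        server.server_close()


if __name__ == "__main__":
    print("rude:  ", run_demo(polite=False))
    print("polite:", run_demo(polite=True))
```

With Requests itself, the spoofing side of this is just a headers={"User-Agent": ...} argument on the request call.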
The Moral Of The Story
The most important lesson, however, is this.
Ignore the User-Agent string unless you absolutely have to.
Detecting browser features is not what the User Agent string is for, so please don’t use it for that. And if you do (which I’m sure you will, because no-one listens to me anyway), make sure that you don’t refuse service based on the User-Agent. If you want to render a slightly different page, fine, I get that. But don’t refuse to render it at all. It’s obnoxious, it’s brittle, and it’s so 1990s. And besides, as I showed above, all modern User-Agents can lie in their User-Agent string! You can set it in Firefox, and in Chrome, and (probably) in IE, Safari and Opera as well. So not only are you mis-using it, you’re not even getting accurate information!
Translated from: https://www.pybloggers.com/2013/04/user-agent-strings-or-dont-make-me-come-after-you/