Anonymous Schanonymous
Everybody loves a fad. You can pinpoint someone’s generation better than carbon dating by asking them what their favorite toys and gadgets were as a kid. Tamagotchi and pogs? You were born around 1988, weren’t you? Coleco Electronic Quarterback and Garanimals? Well well, an early X-er. A fad is cultural currency and social lubricant at the same time: even if you don’t have the thing itself, it’s a shared reference point that helps locate you as part of a particular time and place. Paradoxically, fads also help identify when a concept has gone stale, depending on who does it.
Fads happen in business, too. From corporate retreats to themed attire days (back in the olden times when we went to retreats, offices or, you know, anywhere) or the more recent mandatory fun on Zoom, enterprises are no less susceptible to fads, especially when they involve technology. Part of it is a desire to seem cutting edge, but a large part of it, we think, is simple misunderstanding. Without a good grasp of new systems and tools or the concepts that underlie them, it’s hard to tell the difference between a fad and a future.
Guess Who?!
Case in point: anonymization. Although the concept of masking identity or erasing identifiable features has long been a component of data science, it was not a widespread topic of discussion in industry in the US until the late 2000s and, really, just before GDPR came into effect and fears of 4% penalties kicked in. Hundreds of vendors promise services that allow you to “anonymize” user data in an effort to find safe harbors or avoid liability, but most businesses have only a vague understanding of what the concept of anonymized data really is and how to do it.
To unpack anonymous data, it’s important to clear up a few terms so that we don’t run into confusion. First, what is anonymized? Anonymous data is data that does not relate to an identified or identifiable natural person, or data modified such that the data subject is not or no longer identifiable.
That is an extremely vague definition for a concept that is so important, so let’s dive into it a little more, because this is a game of definitions (every lawyer’s favorite game). If data, on its own or combined with other data, can identify you, it’s personal data. We don’t talk about personally identifiable information any more; that fad has passed. These days, you only talk about personal data.

There are ways to make data less useful in identifying a person, but that does not mean that it is anonymous. Instead, there are varying degrees of data obfuscation (hiding attributes to make reidentification more difficult) on the way to actual anonymization. Here are the two most important kinds.
Masked Data
Masked data is information modified to hide (or “mask”) the underlying, true data. This is a common practice in business, and it is most effective against unauthorized internal review (and pilfering) of valuable business/customer data and against external actors learning important details about clients and vendors. A simplified example is a customer list of first and last names, ages, addresses, and amounts spent, in which surnames are changed to dummy names, ages are shifted, and amounts spent are reallocated randomly. Much of the derivative analytic data remains the same (total amount spent, total number of customers, locations of accounts, etc.), but it is difficult to reidentify any individual user.
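To make the customer-list example concrete, here is a minimal Python sketch of that kind of masking. The field names, the `mask_customers` helper, the three-year age shift, and the sample records are all assumptions made up for illustration; real masking tools are considerably more careful.

```python
import random


def mask_customers(customers, seed=0):
    """Return a masked copy: dummy surnames, shifted ages, shuffled amounts."""
    rng = random.Random(seed)

    # Reallocate amounts by shuffling them across records.
    # The overall total is unchanged; only who spent what is scrambled.
    amounts = [c["spent"] for c in customers]
    rng.shuffle(amounts)

    masked = []
    for i, (c, amount) in enumerate(zip(customers, amounts)):
        masked.append({
            "first": c["first"],
            "last": f"Customer{i:04d}",             # dummy surname
            "age": c["age"] + rng.randint(-3, 3),   # small random shift
            "address": c["address"],
            "spent": amount,
        })
    return masked


# Hypothetical sample data (amounts in whole dollars).
customers = [
    {"first": "Ana", "last": "Reyes", "age": 34, "address": "12 Elm St", "spent": 120},
    {"first": "Ben", "last": "Okafor", "age": 51, "address": "9 Oak Ave", "spent": 75},
    {"first": "Cy", "last": "Lindqvist", "age": 27, "address": "4 Pine Rd", "spent": 310},
]

masked = mask_customers(customers)
print(sum(c["spent"] for c in masked) == sum(c["spent"] for c in customers))
```

The totals survive the masking, which is why the analytics still work. The unmasked `customers` list is also still sitting there in memory, untouched.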
What it Isn’t
Having a list where the names and identifiers are shifted is a great business approach, but it usually falls short of anonymous in the real world. Why? Because usable data is accurate data, and being able to run the kind of analytics you want means being able to easily mix and match the true underlying information. As such, having the master list (the non-masked data) available means that you will always hold onto the original information, which means you’re still holding personal data, which means you’re not protected by the anonymity safe harbor. Thanks for playing.

Pseudonymized Data
Pseudonymous data is data that has the most important identifiers removed: names, email addresses, social security numbers, etc. Pseudonymous data still identifies a person, but it isn’t obvious on its face who that person is. Think back to school when they would post grades outside of a classroom but only use student numbers on the chart. In the Mad-Max rush to the sheet of paper to see your grades, it wasn’t possible to see anyone else’s name, and so you only were able to know what your outcome was. This is a good example of pseudonymization and a good example of why it’s used: to protect the rights of individuals from unnecessary exposure of their personal details, including a devastatingly embarrassing failed geometry test in ninth grade.
The more attributes you remove from a dataset, the thinking goes, the more pseudonymized the data becomes, and the closer it gets to full anonymization, at which point you’re in the clear.
What it Isn’t
A panacea, or, honestly, nearly as useful as it might sound. Pseudonymization in practice is often something like this:
- We have an Excel spreadsheet with names, addresses, account numbers, customer spend, and profile data.
- We delete the customer name.
- Presto, pseudonymized data!
Of course, that might technically count as pseudonymization, but it’s virtually useless: you still have every other identifier for an individual, which means that not only is it not difficult to re-identify the person at issue, you haven’t even de-identified them to begin with. Think about it from a data perspective, rather than a human perspective: Column A contains alphanumeric characters used to identify an individual account, and so does Column B. If they both do the same thing, what difference does it make if you delete Column A (where the alphanumeric characters are organized into what humans recognize as names) and keep Column B (where the alphanumeric characters are organized into what humans think of as an “account ID number”)? Under the law, it’s all the same, and the database or algorithm analyzing the data will carry on just as it did before the deletion.
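The Column A versus Column B point can be sketched in a few lines of Python. Everything here is hypothetical: the `drop_names` and `reidentify` helpers, the field names, and the second `billing` table standing in for any other system that still maps account IDs to names.

```python
def drop_names(rows):
    """'Pseudonymize' by deleting only the name column."""
    return [{k: v for k, v in row.items() if k != "name"} for row in rows]


def reidentify(pseudonymized, lookup):
    """Re-attach names using the account ID that was left in place."""
    by_account = {row["account_id"]: row["name"] for row in lookup}
    return [
        {**row, "name": by_account.get(row["account_id"])}
        for row in pseudonymized
    ]


# Hypothetical spreadsheet rows.
customers = [
    {"name": "Ana Reyes", "account_id": "AC-1001", "spend": 120},
    {"name": "Ben Okafor", "account_id": "AC-1002", "spend": 75},
]

# Hypothetical second table that still links account IDs to names.
billing = [
    {"account_id": "AC-1001", "name": "Ana Reyes"},
    {"account_id": "AC-1002", "name": "Ben Okafor"},
]

pseudonymized = drop_names(customers)
restored = reidentify(pseudonymized, billing)
print([row["name"] for row in restored])
```

One dictionary lookup undoes the "pseudonymization", because the account ID was doing the same job as the name all along.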
“Fine!” you shout, annoyed, “why don’t we just delete names, addresses, account numbers, and credit card information and only keep the more vague data attributes!” A great idea, and it’s the thought process behind GDPR’s approach to anonymization: if you delete enough data and remove enough identifiers, eventually you’ll get to a place where you don’t have personal data any more and the rights of natural persons are protected.
Except not really.
If you’re keeping any data at all, and especially if you’re keeping multiple data points and attributes, the likelihood is that you’re going to wind up capable of reidentifying an individual. A very important study in Nature Communications reviewed a variety of “anonymized” datasets and came to a pretty striking conclusion:
Using our model, we find that 99.98% of Americans would be correctly re-identified in any dataset using 15 demographic attributes. Our results suggest that even heavily sampled anonymized datasets are unlikely to satisfy the modern standards for anonymization set forth by GDPR and seriously challenge the technical and legal adequacy of the de-identification release-and-forget model
In other words, if you have enough data attributes, even “anonymous” data is nothing of the sort, which means that GDPR’s approach to anonymization (followed around the world) has a fatal flaw in the underlying thought process, and the Get-Out-Of-Brussels-Free Card that data companies thought would protect them is actually fairly useless.
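A toy illustration of why more attributes mean more reidentification risk: count how many records become unique as attributes are combined. This is not the statistical model used in the study, only a simple counting sketch. The `unique_fraction` helper and the four sample records are invented for the example.

```python
from collections import Counter


def unique_fraction(records, attributes):
    """Fraction of records whose values on `attributes` match no other record."""
    keys = [tuple(r[a] for a in attributes) for r in records]
    counts = Counter(keys)
    return sum(1 for k in keys if counts[k] == 1) / len(records)


# Hypothetical "anonymized" records: no names, only demographic attributes.
records = [
    {"zip": "02139", "birth_year": 1988, "gender": "F"},
    {"zip": "02139", "birth_year": 1988, "gender": "M"},
    {"zip": "02139", "birth_year": 1975, "gender": "F"},
    {"zip": "10001", "birth_year": 1988, "gender": "F"},
]

print(unique_fraction(records, ["zip"]))                          # 0.25
print(unique_fraction(records, ["zip", "birth_year"]))            # 0.5
print(unique_fraction(records, ["zip", "birth_year", "gender"]))  # 1.0
```

With one attribute, a quarter of the records stand alone. With three, every record does, and a record that stands alone is a record that can be matched to a person.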
A Newer, Better Fad
This is usually the point in our blogs where we say “the good news is that there is another option” and lay out how to approach things differently. But today, we’re actually going to suggest following an older strategy to avoid some of this anonymization difficulty.
Step 1: Get rid of all the data you don’t need to fulfill your core purposes tied to the data.
Step 2: Then, once the core purpose is fulfilled, aggregate all of the data you need to run your analytics.
Step 3: Now delete the rest of the underlying data. Yes, all of it.
You may be thinking that you’ve just deleted all of the data and you’d be right. That’s often the best answer: you can’t be held liable or responsible for data you no longer own. Get rid of it! Aggregated data is, in our view, the only truly anonymous data out there, because it’s not possible to walk the process back and reidentify an individual from aggregated statistics.
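The three steps can be sketched roughly as follows, in Python. The `aggregate_spend` helper, the field names, and the sample rows are assumptions for illustration. Clearing an in-memory list stands in for actually deleting records from wherever they are stored. The sketch also assumes every group holds more than one customer, since a group of one would not hide anybody.

```python
from collections import defaultdict


def aggregate_spend(rows):
    """Summarize spend per region as a customer count and a total."""
    totals = defaultdict(lambda: {"customers": 0, "total_spend": 0})
    for row in rows:
        bucket = totals[row["region"]]
        bucket["customers"] += 1
        bucket["total_spend"] += row["spend"]
    return dict(totals)


# Hypothetical raw customer records.
raw = [
    {"name": "Ana Reyes", "account_id": "AC-1001", "region": "north", "spend": 120},
    {"name": "Ben Okafor", "account_id": "AC-1002", "region": "north", "spend": 75},
    {"name": "Cy Lindqvist", "account_id": "AC-1003", "region": "south", "spend": 310},
    {"name": "Dee Marsh", "account_id": "AC-1004", "region": "south", "spend": 40},
]

# Step 1: keep only the fields the analytics actually need.
needed = [{"region": r["region"], "spend": r["spend"]} for r in raw]

# Step 2: aggregate.
summary = aggregate_spend(needed)

# Step 3: delete the underlying data. Yes, all of it.
raw.clear()
needed.clear()

print(summary)
```

What remains is a pair of counts and totals per region, with no per-person rows left to walk back to.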
Now, will this work for everyone and for every dataset? Of course not. Sometimes you need the data for business purposes or for regulatory reasons. But in those cases, anonymization wasn’t appropriate anyway, because you have ongoing duties to protect data based on usage. Put another way, the problem with the anonymization fad is that it encourages shortcut thinking about data: “If we pseudonymize well enough, we can just do whatever we want with the data!” Except no, you can’t, and the data protection authorities are very touchy about what qualifies as properly pseudonymous or anonymized.
Is it possible to truly anonymize data? Yes. Is it the answer to all of your data concerns? Probably not, because the most important aspect to your data is how you use it, how you learn from it, and how you leverage it to grow. Anonymized data is stripped of much of its usefulness in favor of a flimsy sense of getting out of regulatory oversight. In the end, it’s a far better plan to protect the data you want, delete the data you don’t, create anonymous data only if it fits certain limited parameters, and leave the fads to the other folks. This approach gives you more time, resources, and money — and they never go out of fashion.

Originally published at https://wardpllc.com on September 1, 2020.
Translated from: https://medium.com/swlh/anonymous-schanonymous-b6f6db9156bb