Anonymous Schanonymous

Everybody loves a fad. You can pinpoint someone’s generation better than carbon dating by asking them what their favorite toys and gadgets were as a kid. Tamagotchi and pogs? You were born around 1988, weren’t you? Coleco Electronic Quarterback and Garanimals? Well well, an early X-er. A fad is cultural currency and social lubricant at the same time: even if you don’t have the thing itself, it’s a shared reference point that helps locate you as part of a particular time and place. Paradoxically, fads also help identify when a concept has gone stale, depending on who does it.


Fads happen in business, too. From corporate retreats to themed attire days (back in the olden times when we went to retreats, offices or, you know, anywhere) or the more recent mandatory fun on Zoom, enterprises are no less susceptible to fads, especially when they involve technology. Part of it is a desire to seem cutting edge, but a large part of it, we think, is simple misunderstanding. Without a good grasp of new systems and tools or the concepts that underlie them, it’s hard to tell the difference between a fad and a future.


Guess Who?!


Case in point: anonymization. Although the concept of masking identity or erasing identifiable features has long been a component of data science, it was not a widespread topic of discussion in US industry until the late 2000s and, really, not until just before GDPR came into effect and fears of penalties of up to 4% of annual global turnover kicked in. Hundreds of vendors promise services that allow you to “anonymize” user data in an effort to find safe harbors or avoid liability, but most businesses have only a vague understanding of what anonymized data really is and how to produce it.

To unpack anonymous data, it’s important to clear up a few terms so that we don’t run into confusion. First, what is anonymized? Anonymous data is data that does not relate to an identified or identifiable natural person, or data modified such that the data subject is not or no longer identifiable.


That is an extremely vague definition for a concept that is so important, so let’s dive into it a little more, because this is a game of definitions (every lawyer’s favorite game). If data, on its own or with other data, can identify you, it’s personal data. We don’t talk about personally identifiable information any more; that fad has passed. These days, you only talk about personal data.

“PII? Are you kidding me?”

There are ways to make data less useful in identifying a person, but that does not mean that it is anonymous. Instead, there are varying degrees of data obfuscation (hiding attributes to make reidentification more difficult) on the way to actual anonymization. Here are the two most important kinds.

Masked Data


Masked data is information modified to hide (or “mask”) the underlying, true data. This is a common practice in business, and it is most effective against unauthorized internal review (and pilfering) of valuable business/customer data and against external actors learning important details about clients and vendors. A simplified example of masked data is a customer list that details first and last name, age, address, and amount spent, with surnames changed to dummy names, ages shifted, and amounts spent reallocated randomly among customers. Much of the derivative analytic data remains the same (total amount spent, total number of customers, locations of accounts, etc.), but it is difficult to reidentify any individual user.
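To make the customer-list example concrete, here is a minimal Python sketch of that kind of masking. Everything in it is hypothetical: the field names, the `mask_customers` helper, the sample rows, and the choice of a plus-or-minus three year age shift are assumptions made for illustration, not a description of any particular vendor's tool.

```python
import random

def mask_customers(customers, seed=0):
    """Return a masked copy: dummy surnames, shifted ages, shuffled spend.

    Hypothetical helper; the field names are assumptions for this sketch.
    """
    rng = random.Random(seed)
    amounts = [c["spent"] for c in customers]
    rng.shuffle(amounts)  # reallocate amounts spent randomly across rows
    masked = []
    for i, (c, amount) in enumerate(zip(customers, amounts)):
        masked.append({
            "first_name": c["first_name"],
            "surname": f"Customer{i:03d}",         # dummy surname
            "age": c["age"] + rng.randint(-3, 3),  # age shifted by up to 3 years
            "city": c["city"],                     # account location kept as-is
            "spent": amount,
        })
    return masked

# Made-up sample data.
customers = [
    {"first_name": "Ana", "surname": "Ortiz", "age": 34, "city": "Austin", "spent": 120},
    {"first_name": "Ben", "surname": "Kowalski", "age": 51, "city": "Boston", "spent": 45},
    {"first_name": "Chloe", "surname": "Nguyen", "age": 27, "city": "Austin", "spent": 80},
]

masked = mask_customers(customers)

# Aggregate analytics survive the masking even though individual rows change.
print(sum(c["spent"] for c in customers), sum(m["spent"] for m in masked))
print(len(customers), len(masked))
```

Note that `customers`, the unmasked master list, is still sitting in memory next to `masked`, which is the situation the next section describes.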

What it Isn’t


Having a list where the names and identifiers are shifted is a great business approach, but it usually falls short of anonymous in the real world. Why? Because usable data is accurate data, and being able to run the kind of analytics you want means being able to easily mix and match the true underlying information. As such, having the master list (the non-masked data) available means that you will always hold onto the original information, which means you’re still holding personal data, which means you’re not protected by the anonymity safe harbor. Thanks for playing.


I…I didn’t realize we were playing a game?

Pseudonymized Data


Pseudonymous data is data that has the most important identifiers removed: names, email addresses, social security numbers, etc. Pseudonymous data still identifies a person, but it isn’t obvious on its face who that person is. Think back to school, when they would post grades outside of a classroom but only use student numbers on the chart. In the Mad-Max rush to the sheet of paper to see your grades, it wasn’t possible to see anyone else’s name, and so you were only able to learn your own result. This is a good example of pseudonymization and a good example of why it’s used: to protect individuals from unnecessary exposure of their personal details, including a devastatingly embarrassing failed geometry test in ninth grade.

The more attributes you remove from a dataset, the thinking goes, the more pseudonymized the data becomes, and the closer it gets to full anonymization, at which point you’re in the clear.


What it Isn’t


A panacea, or, honestly, nearly as useful as it might sound. Pseudonymization in practice is often something like this:


  1. We have an Excel spreadsheet with names, addresses, account numbers, customer spend, and profile data.
  2. We delete the customer name.
  3. Presto, pseudonymized data!

Of course, that might technically count as pseudonymization, but it’s virtually useless: you still have every other identifier for an individual, which means that not only is it not difficult to re-identify the person at issue, you haven’t even de-identified them to begin with. Think about it from a data perspective, rather than a human perspective: Column A contains alphanumeric characters used to identify an individual account, and so does Column B. If they both do the same thing, what difference does it make if you delete Column A (where the alphanumeric characters are organized into what humans recognize as names) and keep Column B (where the alphanumeric characters are organized into what humans think of as an “account ID number”)? Under the law, it’s all the same, and the database or algorithm analyzing the data won’t have any problem carrying on as it did before the deletion.
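The Column A versus Column B point can be shown in a few lines of Python. This is only a sketch on made-up rows; `drop_column` and `is_unique_key` are hypothetical helper names, and the account IDs are invented.

```python
def drop_column(rows, column):
    """Return copies of the rows without the given column."""
    return [{k: v for k, v in row.items() if k != column} for row in rows]

def is_unique_key(rows, column):
    """True if every row has a distinct value in the given column."""
    values = [row[column] for row in rows]
    return len(set(values)) == len(values)

# Made-up spreadsheet rows.
rows = [
    {"name": "Ana Ortiz", "account_id": "AC-1001", "spend": 120},
    {"name": "Ben Kowalski", "account_id": "AC-1002", "spend": 45},
    {"name": "Chloe Nguyen", "account_id": "AC-1003", "spend": 80},
]

pseudonymized = drop_column(rows, "name")

# The name column is gone, but the account ID still picks out exactly one
# person per row, so nothing has really been de-identified.
print(is_unique_key(pseudonymized, "account_id"))
```

As far as the code is concerned, `account_id` does the same job `name` did: anyone who can join on it can still look up the person behind each row.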

“Can’t tell the difference don’t care lol”

“Fine!” you shout, annoyed, “why don’t we just delete names, addresses, account numbers, and credit card information and only keep the more vague data attributes!” A great idea, and it’s the thought process behind GDPR’s approach to anonymization: if you delete enough data and remove enough identifiers, eventually you’ll get to a place where you don’t have personal data any more and the rights of natural persons are protected.


Except not really.


If you’re keeping any data at all, and especially if you’re keeping multiple data points and attributes, the likelihood is that you’re going to wind up capable of reidentifying an individual. A very important 2019 study in Nature Communications (by Rocher, Hendrickx, and de Montjoye) reviewed a variety of “anonymized” datasets and came to a pretty striking conclusion:

“Using our model, we find that 99.98% of Americans would be correctly re-identified in any dataset using 15 demographic attributes. Our results suggest that even heavily sampled anonymized datasets are unlikely to satisfy the modern standards for anonymization set forth by GDPR and seriously challenge the technical and legal adequacy of the de-identification release-and-forget model.”

In other words, if you have enough data attributes, even “anonymous” data is nothing of the sort, which means that GDPR’s approach to anonymization (followed around the world) has a fatal flaw in the underlying thought process, and the Get-Out-Of-Brussels-Free Card that data companies thought would protect them is actually fairly useless.

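You can get a feel for why more attributes means more reidentification with a toy calculation. The sketch below is not the study’s statistical model; it simply counts how many records in a small, made-up dataset become unique as attributes are combined. The `unique_fraction` helper and the field names are assumptions for illustration.

```python
from collections import Counter

def unique_fraction(records, attributes):
    """Fraction of records whose combination of attribute values is unique."""
    keys = [tuple(record[a] for a in attributes) for record in records]
    counts = Counter(keys)
    return sum(1 for key in keys if counts[key] == 1) / len(keys)

# Made-up "anonymized" records: no names, no account numbers.
records = [
    {"zip": "10001", "birth_year": 1988, "gender": "F"},
    {"zip": "10001", "birth_year": 1988, "gender": "M"},
    {"zip": "10001", "birth_year": 1975, "gender": "F"},
    {"zip": "10002", "birth_year": 1988, "gender": "F"},
    {"zip": "10002", "birth_year": 1988, "gender": "F"},
    {"zip": "10002", "birth_year": 1990, "gender": "M"},
]

# Each added attribute can only keep or increase the share of unique records.
for attrs in (["zip"], ["zip", "birth_year"], ["zip", "birth_year", "gender"]):
    print(attrs, unique_fraction(records, attrs))
```

Even in six rows, three vague attributes are enough to single out most of the people in the table, and real datasets usually keep far more than three.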

“But I traded you Ventnor Avenue for it!”

A Newer, Better Fad


This is usually the point in our blogs where we say “the good news is that there is another option” and lay out how to approach things differently. But today, we’re actually going to suggest following an older strategy to avoid some of this anonymization difficulty.


Step 1: Get rid of all the data you don’t need to fulfill your core purposes tied to the data.

Step 2: Then, once the core purpose is fulfilled, aggregate all of the data you need to run your analytics.

Step 3: Now delete the rest of the underlying data. Yes, all of it.

This is crazy talk.

You may be thinking that you’ve just deleted all of the data, and you’d be right. That’s often the best answer: you can’t be held liable or responsible for data you no longer hold. Get rid of it! Aggregated data is, in our view, the only truly anonymous data out there, because it’s not possible to walk the process back and reidentify an individual from aggregated statistics.
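The three steps can be sketched in Python as follows. This is an illustration only: the `aggregate_by` helper, the field names, and the sample rows are all hypothetical, and the sketch assumes every group contains more than one person (a group of one would still expose that person’s figure).

```python
from collections import defaultdict

def aggregate_by(rows, group_key, value_key):
    """Return {group: {"count": n, "total": sum}} for the given rows."""
    totals = defaultdict(lambda: {"count": 0, "total": 0})
    for row in rows:
        bucket = totals[row[group_key]]
        bucket["count"] += 1
        bucket["total"] += row[value_key]
    return dict(totals)

# Step 1 (assumed already done): only the fields needed for analytics remain.
rows = [
    {"city": "Austin", "spend": 120},
    {"city": "Austin", "spend": 80},
    {"city": "Boston", "spend": 45},
    {"city": "Boston", "spend": 60},
    {"city": "Boston", "spend": 15},
]

# Step 2: aggregate what the analytics need.
summary = aggregate_by(rows, "city", "spend")

# Step 3: delete the underlying rows. Yes, all of them.
rows.clear()

print(summary)
```

In a real system, Step 3 means deleting the source tables, exports, and backups as well, not just emptying a list in memory.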

Now, will this work for everyone and for every dataset? Of course not. Sometimes you need the data for business purposes or for regulatory reasons. But in those cases, anonymization wasn’t appropriate anyway, because you have ongoing duties to protect data based on usage. Put another way, the problem with the anonymization fad is that it encourages shortcut thinking about data: “If we pseudonymize well enough, we can just do whatever we want with the data!” Except no, you can’t, and the data protection authorities are very touchy about what qualifies as properly pseudonymous or anonymized.


Is it possible to truly anonymize data? Yes. Is it the answer to all of your data concerns? Probably not, because the most important aspect of your data is how you use it, how you learn from it, and how you leverage it to grow. Anonymized data is stripped of much of its usefulness in exchange for a flimsy sense of having escaped regulatory oversight. In the end, it’s a far better plan to protect the data you want, delete the data you don’t, create anonymous data only if it fits certain limited parameters, and leave the fads to the other folks. This approach gives you more time, resources, and money, and they never go out of fashion.

The best things in life never do.

Originally published at https://wardpllc.com on September 1, 2020.


Source: https://medium.com/swlh/anonymous-schanonymous-b6f6db9156bb
