海量數據尋找最頻繁的數據
Some activities are instinctive. A baby doesn’t need to be taught how to suckle. Most people can use an escalator, operate an elevator, and open a door instinctively. The same isn’t true of playing a guitar, driving a car, or analyzing data. Once you get comfortable with what to look for in a data set, you’ll find data analysis can be as much fun as playing a guitar or driving a car.
有些活動是本能的。 不需要教嬰兒如何哺乳。 大多數人可以本能地使用自動扶梯,操作電梯和開門。 彈吉他,開汽車或分析數據并非如此。 一旦您對數據集中的內容感到滿意,就會發現數據分析和彈吉他或開車一樣有趣。
目的 (Objective)
When faced with new data, the first thing to consider is the objective you, your boss, or your client have in analyzing the dataset. Consider these four possibilities, three are comparatively easy and one is a relative challenge.
面對新數據時,首先要考慮的是您,您的老板或客戶在分析數據集時要達到的目標。 考慮這四種可能性 ,三種相對容易,一種相對挑戰。
Conduct a Specific Analysis — Your client only wants you to conduct a specific analysis, perhaps like descriptive statistics or a statistical test between two groups. No problem, just conduct the analysis. There’s no need to go further. That’s easy.
進行特定分析 -您的客戶只希望您進行特定分析,例如描述性統計或兩組之間的統計檢驗。 沒問題,只需進行分析即可。 無需進一步。 這很簡單。
Answer a Specific Question — Some clients only want one thing — answer a specific question. Maybe it’s something like “is my water safe to drink” or “is traffic on my street worse on Wednesdays.” This will require more thought and perhaps some experience, but again, you have a specific direction to go in. That makes it easier.
回答一個特定的問題 -有些客戶只想要一件事-回答一個特定的問題。 可能是“我的水可以安全飲用”或“星期三街道上的交通情況是否更糟”。 這將需要更多的思考和也許的一些經驗,但是同樣,您有一個特定的方向可以進入。這使它更容易。
Address a General Need — Projects with general goals often involve model building. You’ll have to establish whether they need a single forecast, map or model, or a tool that can be used again in the future. This will require quite a bit of thought and experience but at least you know what you need to do and where you need to end up. Not easy but straightforward.
解決一般需求 -具有一般目標的項目通常涉及模型構建。 您必須確定他們是否需要單個預測,地圖或模型,或者需要將來可以再次使用的工具。 這將需要大量的思想和經驗,但是至少您知道您需要做些什么以及最終需要去哪里。 不容易,但直接。
Explore the Unknown — Every once in a while, a client will have nothing specific in mind, but will want to know whatever can be determined from the dataset. This is a challenge because there’s no guidance for where to start or where to finish. This blog will help you address this objective.
探索未知 -每隔一段時間,客戶就不會有什么特別的主意,但希望知道可以從數據集中確定的內容。 這是一個挑戰,因為沒有關于從哪里開始或從哪里結束的指導。 該博客將幫助您解決此目標。
If your client is not clear about their objective, start at the very end. Ask what decisions will need to be made based on the results of your analysis. Ask what kind of outputs would be appropriate — a report, an infographic, a spreadsheet file, a presentation, or an application. If they have no expectations, it’s time to explore.
如果您的客戶端沒有明確自己的目標,開始在最后 。 詢問根據分析結果需要做出哪些決定。 詢問哪種輸出是合適的-報告,信息圖,電子表格文件,演示文稿或應用程序。 如果他們沒有期望,那就該去探索了。
有數據嗎? (Got data?)
Scrubbing your data will make you familiar with what you have. That’s why it’s a good idea to know your objective first. There are many things you can do to scrub your data but the first thing is to put it into a matrix. Statistical analyses all begin with matrices. The form of the matrix isn’t always the same, but most commonly, the matrix has columns that represent variables (e.g., metrics, measurements) and rows that represent observations (e.g., individuals, students, patients, sample units, or dates). Data on the variables for each observation go into the cells. Usually, this is done with spreadsheet software.
整理數據將使您熟悉所擁有的內容。 這就是為什么首先了解您的目標是一個好主意。 您可以執行許多操作來清理數據,但首先要將其放入矩陣中。 統計分析都是從矩陣開始的。 矩陣的形式并不總是相同的,但是最常見的是,矩陣具有代表變量(例如度量,度量)的列和代表觀察值的行(例如個人,學生,患者,樣本單位或日期) 。 每個觀察變量的數據都進入單元格。 通常,這是通過電子表格軟件完成的。
Data scrubbing can be cursory or exhaustive. Assuming the data are already available in electronic form, you’ll still have to achieve two goals — getting the numbers right and getting the right numbers.
數據清理可能是粗略的或詳盡的。 假設數據已經以電子形式提供,您仍然必須實現兩個目標-正確地編號和正確地編號。
Getting the numbers right requires correcting at least three types of data errors:
正確計算數字要求至少糾正三種類型的數據錯誤 :
Alphanumeric substitution, which involves mixing letters and numbers (e.g., 0 and o or O, 1 and l, 5 and S, 6 and b), dropped or added digits, spelling mistakes in text fields that will be sorted or filtered, and random errors.
字母數字替換 ,包括字母和數字的混合(例如0和o或O,1和l,5和S,6和b),數字的掉落或增加,文本字段中的拼寫錯誤(將被排序或過濾)以及隨機錯誤。
Specification errors involve bad data generation, perhaps attributable to recording mistakes, uncalibrated equipment, lab mistakes, or incorrect sample IDs and aliases.
規范錯誤涉及不良的數據生成,可能歸因于記錄錯誤,未校準的設備,實驗室錯誤或不正確的樣品ID和別名。
Inappropriate Data Formats, such as extra columns and rows, inconsistent use of ND, NA, or NR flags, and the inappropriate presence of 0s versus blanks.
不適當的數據格式 ,例如多余的列和行,ND,NA或NR標志的使用不一致,以及0與空白之間的不適當存在。
Getting the right numbers requires addressing a variety of data issues:
獲取正確的數字需要解決各種數據問題:
Variables and phenomenon. Are the variables sufficient to explore the phenomena in question?
變量和現象 。 這些變量是否足以探索所討論的現象 ?
Variable scales. Review the measurement scales of the variables so you know what analyses might be applicable to the data. Also, look for nominal and ordinal scale variables to consider how you might segment the data.
可變比例尺 。 查看變量的度量范圍 ,以了解哪些分析可能適用于數據。 另外,查找名義和次序比例變量以考慮如何分割數據。
Representative sample. Considering the population being explored, does the sample appear to be representative.
代表性樣品 。 考慮到正在探索的種群,樣本是否具有代表性。
Replicates. If there are replicate or other quality control samples, they should be removed from the analysis appropriately.
復制 。 如果有重復樣品或其他質量控制樣品 ,則應將其從分析中適當除去。
Censored data. If you have censored data (i.e., unquantified data above or below some limit), you can recode the data as some fraction of the limit, but not zero.
審查數據 。 如果您檢查了數據(即,超出或低于某個限制的未量化數據),則可以將數據重新編碼為限制的一部分,但不能為零。
Missing data. If you have missing data, they should be recoded as blanks or use another accepted procedure for treating missing data.
缺少數據 。 如果您有丟失的數據,應將它們重新編碼為空白或使用其他可接受的過程來處理丟失的數據。
Data scrubbing can consume a substantial amount of time, even more than the statistical calculations.
數據清理會消耗大量時間,甚至比統計計算還要多。
要找什么 (What To Look For)
If you’re new to applied statistics, you might wonder where to start looking at a dataset. Here are five places to consider looking.
如果您不熟悉應用統計信息,則可能想知道從哪里開始查看數據集。 這里有五個要考慮的地方。

- Snapshot 快照
- Population or Sample Characteristics 總體或樣本特征
- Change 更改
- Trends and Patterns 趨勢與模式
- Anomalies 異常現象
Start with the entire dataset. Don’t divide the data into groups based on categoral variables. Divide and aggregate groupings later after you have a feel for the global situation. The reason for this is that the number of possible combinations of variables and levels of grouping variables can be large, overwhelming, each one being an analysis in itself. Like peeling an onion, explore one layer of data at a time until you get to the core.
從整個數據集開始。 不要根據類別變量將數據分為幾類。 在對全球形勢有所了解之后,請對分組進行分組和匯總。 這樣做的原因是,變量的可能組合和分組變量級別的數量可能很大,令人不知所措,每個變量本身就是一項分析。 就像剝洋蔥一樣,一次瀏覽一層數據,直到到達核心為止。
快照 (Snapshot)
What does the data look like at one point. Usually it’s at the same point in time but it could also be some common conditions, like after a specific business activity, or at a certain temperature and pressure.
數據在某一點是什么樣的。 通常是在同一時間點,但也可能是某些常見條件,例如在進行特定業務活動之后,或在一定溫度和壓力下。
Snapshots aren’t difficult to analyze. You just decide where you want a snapshot and record all the variable values at that point. There are no descriptive statistics, graphs, or tests unless you decide to subdivide the data later. The only challenge is deciding whether taking a snapshot makes any sense for exploring the data.
快照并不難分析。 您只需確定要快照的位置,然后記錄所有變量值。 除非您決定稍后再細分數據,否則沒有描述性的統計信息,圖表或測試。 唯一的挑戰是確定拍攝快照是否對瀏覽數據有意義。
The only thing you look for in a snapshot is something unexpected or unusual that might direct further analysis. It can also be used as a baseline to evaluate change.
您在快照中唯一需要查找的是意外或異常情況,可能會導致進一步的分析。 它也可以用作評估變化的基準。
人口特征 (Population Characteristics)
It’s always a good idea to know everything you can about the populations you are exploring. The approach is straightfoward; calculate descriptive statistics. Here’s a summary of what you might look at. It’s based on the measurement scale of the variable you are assessing.
了解您所探索的人群的一切都是一個好主意。 這種方法是直截了當的; 計算描述統計 。 這是您可能會看到的摘要。 它基于您正在評估的變量的度量范圍。

For grouping (nominal scale) variables, look at the frequencies of the groups. You’ll want to know if there are enough observations in each group to break them out for further analysis. For progression (continuous) scales, look at the median and the mean. If they’re close, the frequency distribution is probably symmetrical. You can confirm this by looking at a histogram or the skewness. If the standard-deviation-divided-by-the-mean (called the coefficient of variation) is over 1, the distribution may be lognormal, or at least, asymmetrical. Quartiles and deciles will support this finding. Look at the measures of central tendency and dispersion. If the dispersion is relatively large, statistical testing may be problematical.
對于分組(標稱比例)變量,請查看組的頻率。 您可能想知道每個組中是否有足夠的觀測值可以將其分解以進行進一步的分析。 對于進展(連續)量表,請查看中位數和均值。 如果它們很接近,則頻率分布可能是對稱的。 您可以通過查看直方圖或偏度來確認這一點。 如果按均值劃分的標準偏差(稱為變異系數 )超過1,則分布可能是對數正態分布,或者至少是不對稱分布。 四分位數和十分位數將支持這一發現。 看一下集中趨勢和分散性的度量。 如果離散度相對較大,則統計測試可能會出現問題。
Graphs are also a good way, and in my mind the best way, to explore population characteristics. Never calculate a statistic without looking at its visual representation in a graph. There are many types of graphs that will let you do that.
圖也是探索人口特征的一種好方法,也是我認為最好的方法。 在不查看圖形的直觀表示的情況下,切勿計算統計信息。 有許多類型的圖形可以幫助您做到這一點。

What you look for in a graph depends on what the graph is supposed to show — distribution, mixtures, properties, or relationships. There are other things you might look for but here are a few things to start with.
您在圖表中尋找的內容取決于圖表應顯示的內容-分布,混合,屬性或關系。 您可能還會尋找其他東西,但是這里有一些開始的事情。
For distribution graphs (box plots, histograms, dot plots, stem-leaf diagrams, Q-Q plots, rose diagrams, and probability plots), look for symmetry. That will separate many theoretical distributions, say a normal distribution (symmetrical) from a lognormal distribution (asymmetrical). This will be useful information if you do any statistical testing later.
對于分布圖(箱形圖,直方圖,點圖,莖葉圖,QQ圖,玫瑰圖和概率圖),請尋找對稱性 。 這會將許多理論分布(例如,正態分布(對稱)和對數正態分布(不對稱))分開。 如果以后進行任何統計測試,這將是有用的信息。
For mixture graphs (pie charts, rose diagrams, and ternary plots), look for imbalance. If you have some segments that are very large and others very small, there may be common and unique themes to the mix to explore. Maybe the unique segments can be combined. This will be useful information if you break out subgroups later.
對于混合圖(餅圖,玫瑰圖和三元圖),請查找不平衡度 。 如果您的某些細分受眾群很大,而其他細分受眾群很小,那么可能會有一些共同而獨特的主題可供探索。 也許可以組合獨特的細分。 如果以后再細分子組,這將是有用的信息。
For properties graphs (bar charts, area charts, line charts, candlestick charts, control charts, means plots, deviation plots, spread plots, matrix plots, maps, block diagrams, and rose diagrams), look for the unexpected. Are the central tendency and dispersion what you might expect? Where are big deviations?
對于特性圖(條形圖,面積圖,折線圖,燭臺圖,控制圖,均值圖,偏差圖,散布圖,矩陣圖,地圖,框圖和玫瑰圖),請查找意外的 。 您所期望的主要趨勢和分散是嗎? 大的偏差在哪里?
For relationship graphs (icon plots, 2D scatter plots, contour plots, bubble plots, 3D scatter plots, surface plots, and multivariable plots), look for trends and patterns. You might find linear or curvilinear trends, repeating cycles, one-time shifts, continuing steps, periodic shocks, or just random points. This is the prelude for looking for more detailed patterns.
對于關系圖(圖標圖,2D散點圖,輪廓圖,氣泡圖,3D散點圖,表面圖和多變量圖),請查找趨勢和模式 。 您可能會發現線性或曲線趨勢,重復周期,一次移位,連續步驟,周期性沖擊或只是隨機點。 這是尋找更詳細模式的序幕。
更改 (Change)
Change usually refers to differences between time periods but, like snapshots, it could also refer to some common conditions. Change can be difficult, or at least complicated, to analyze because you must first calculate the changes you want to explore. When calculating changes, be sure the intervals of the change are consistent. But after that, here’s what might you do.
更改通常是指時間段之間的差異,但是,像快照一樣,它也可以指某些常見情況。 因為您必須首先計算要探索的變更,所以變更可能很難分析,或者至少很復雜。 計算更改時,請確保更改間隔一致。 但是之后,這就是您可能會做的。
First, look for very large, negative or positive changes. Are the percentages of change consistent for all variables? What might be some reasons for the changes.
首先,尋找非常大的,消極的或積極的變化。 所有變量的變化百分比是否一致? 進行更改可能是某些原因。
Calculate the mean and median changes. If the indicators of central tendency for the changes are not near zero, you might have a trend. Verify the possibility by plotting the change data. You might even consider conducting a statistical test to confirm that the change is different from zero. If you do think you have a pattern, trend, or anomaly, graphs are always the best place to look.
計算均值和中位數變化。 如果變化的主要趨勢指標不接近于零,則可能具有趨勢。 通過繪制更改數據來驗證可能性。 您甚至可以考慮進行統計測試,以確認更改不為零。 如果您確實認為自己有模式,趨勢或異常,則圖形始終是最佳的查看位置。
趨勢與模式 (Trends and Patterns)
There are at least ten types of data relationships — direct, feedback, common, mediated, stimulated, suppressed, inverse, threshold, and complex — and of course spurious relationships. They can all produce different patterns and trends, or no recognizable arrangement at all.
至少有十種類型的數據關系 -直接,反饋,公共,中介,刺激,抑制,逆向,閾值和復雜-當然是虛假關系。 它們都可以產生不同的模式和趨勢,或者根本沒有可識別的安排。

There are four patterns to look for:
有四種模式可尋找:
- Shocks 電擊
- Steps 腳步
- Shifts 轉變
- Cycles. 周期。
Shocks are seemingly random excursions far from the main body of data. They are outliers but they often reoccur, sometimes in a similar way suggesting a common, though sporadic cause. Some shocks may be attributed to an intermittent malfunction in the measurement instrument. Sometimes they occur in pairs, one in the positive direction and another of similar size in the negative direction. This is often seen when reporting dates for business data are missed.
沖擊似乎是遠離數據主體的隨機漂移。 它們是異常值,但它們經常重復出現,有時以類似的方式暗示了一個常見的零星原因。 某些沖擊可能歸因于測量儀器的間歇性故障。 有時它們成對出現,一個在正方向,另一個在大小相似,在負方向。 當錯過業務數據的報告日期時,通常會看到這種情況。
Steps are periodic increases or decreases in the body of the data. Steps progress in the same direction because they reflect a progressive change in conditions. If the steps are small enough, they can appear to be, and be analyzed as, a linear trend.
步驟是數據主體中的周期性增加或減少。 步驟沿同一方向前進,因為它們反映了條件的逐步變化。 如果步長足夠小,則它們看起來可能是線性趨勢,并且被分析為線性趨勢。
Shifts are increases and/or decreases in the body of the data like steps, but shifts tend to be longer than steps and don’t necessarily progress in the same direction. Shifts reflect occasional changes in conditions. The changes may remain or revert to the previous conditions, making them more difficult to analyze with linear models.
移位是數據主體(如步長)中的增加和/或減少,但移位往往比步長,并且不一定沿相同方向進行。 變動反映了情況的偶然變化。 這些更改可能會保留或恢復為先前的條件,從而使使用線性模型進行分析變得更加困難。
Cycles are increases and decreases in the body of the data that usually appear as a waveform having fairly consistent amplitudes and frequencies. Cycles reflect periodic changes in conditions, often associated with time, such as daily or seasonal cycles. Cycles cannot be analyzed effectively with linear models. Sometimes different cycles add together making them more difficult to recognize and analyze.
周期是數據主體中的增加和減少,通常以具有相當一致的幅度和頻率的波形形式出現。 周期反映出條件的周期性變化,通常與時間相關,例如每日或季節性周期。 使用線性模型無法有效地分析周期。 有時,不同的循環加在一起會使它們更加難以識別和分析。
Trends are often easy to identify because they are more familiar to most data analysts. Again, graphs are the best place to look for trends.
趨勢通常很容易識別,因為大多數數據分析人員對趨勢更為熟悉。 同樣, 圖形是尋找趨勢的最佳位置。

Linear trends are easy to see; the data form a line. Curvilinear trends can be more difficult to recognize because they don’t necessarily follow a set path. With some experience and intuition, however, they can be identified. Nonlinear trends look similar to curvilinear trends but they require more complicated nonlinear models to analyze. Curvilinear trends can be analyzed with linear models with the use of transformations.
線性趨勢很容易看到; 數據排成一行。 曲線趨勢可能更難以識別,因為它們不一定遵循設定的路徑。 但是,憑著一些經驗和直覺,就可以確定它們。 非線性趨勢看起來與曲線趨勢相似,但是它們需要更復雜的非線性模型進行分析。 曲線趨勢可以通過使用變換的線性模型進行分析。

There are also more complex trends involving different dimensions, including:
還有涉及不同方面的更復雜的趨勢,包括:
- Temporal 顳
- Spatial 空間空間
- Categorical 分類的
- Hidden 隱
- Multivariate 多變量
Temporal Trends can be more difficult to identify because Time-series data can be combinations of shocks, steps, shifts, cycles, and linear and curvilinear trends. The effects may be seasonal, superimposed on each other within a given time period, or spread over many different time periods. Confounded effects are often impossible to separate, especially if the data record is short or the sampled intervals are irregular or too large.
時間趨勢可能更難識別,因為時間序列數據可以是沖擊,階躍,移位,周期以及線性和曲線趨勢的組合。 這些影響可以是季節性的,也可以在給定的時間段內相互疊加,也可以分布在許多不同的時間段內。 混淆的效果通常是無法分離的,尤其是在數據記錄較短或采樣間隔不規則或太大的情況下。

Spatial Trends present a different twist. Time is one-dimensional (at least as we now know it); distance can be one-, two-, or three-dimensional. Distance can be in a straight line (“as the crow flies”) or along a path (such as driving distance). Defining the location of a unique point on a two-dimensional surface (i.e., a plane) requires at least two variables. The variables can represent coordinates (northing/easting, latitude/longitude) or distance and direction from a fixed starting point. At least three variables are needed to define a unique point location in a three-dimensional volume, so a variable for depth (or height) must be added to the location coordinates. Looking for spatial patterns involves interpolation of geographic data using one of several available algorithms, like moving averages, inverse distances, or geostatistics.
空間趨勢呈現出不同的變化。 時間是一維的(至少我們現在知道)。 距離可以是一維,二維或三維。 距離可以是直線(“烏鴉飛翔”)或沿路徑(例如行駛距離)。 在二維表面(即平面)上定義唯一點的位置至少需要兩個變量。 變量可以表示坐標(北/東,緯度/經度)或距固定起點的距離和方向。 至少需要三個變量來定義三維體積中的唯一點位置,因此必須將深度(或高度)變量添加到位置坐標中。 尋找空間模式涉及使用幾種可用算法之一對地理數據進行插值,例如移動平均值,反距離或地統計學 。

Categorical Trends are no more difficult to identify than any trend except you have to break out categories to do it, which can be a lot of work. One thing you might see when analyzing categories is Simpson’s paradox. The paradox occurs when trends appear in categories that are different from the overall group. Hidden Trends are trends that appear only in categories and not the overall group. You may be able to detect linear trends in categories without graphs if you have enough data in the categories to calculate correlation coefficients within each.
分類趨勢比任何趨勢都更容易識別,除了您必須細分類別來進行,這可能需要很多工作。 分析類別時,您可能會看到的一件事是Simpson的悖論 。 當趨勢出現在與整個組不同的類別中時,就會發生自相矛盾。 隱藏趨勢是僅顯示在類別中而不顯示在整個組中的趨勢。 如果您在類別中有足夠的數據來計算每個類別中的相關系數,則可以在沒有圖形的情況下檢測類別中的線性趨勢。
Multivariate Trends add a layer of complexity to most trends, which are bivariate. Still, you look for the same things, patterns and trends, only you have to examine at least one additional dimension. The extra dimension may be an additional axis or some other way of representing data, like icon type, size, or color.
多元趨勢為大多數是雙變量的趨勢增加了一層復雜性。 盡管如此,您仍在尋找相同的事物,模式和趨勢,只需要檢查至少一個額外的維度。 額外的維度可以是額外的軸或其他表示數據的方式,例如圖標類型,大小或顏色。
異常現象 (Anomalies)
Sometimes the most interesting revelations you can garner from a dataset are the ways that it doesn’t fit expectations. Three things to look for are:
有時,您可以從數據集中獲得的最有趣的啟示是它不符合預期的方式。 要尋找的三件事是:

- Censoring 審查制度
- Heteroskedasticity 異方差
- Outliers. 離群值。
Censoring is when a measurement is recorded as a <value or as a >value, indicating that the measurement instrument was unable to quantify the real value. For example, the real value may be outside the range of a meter, or counts can’t be approximated because there are too many or too few, or a time can only be estimated as before or after. Censoring is easy to detect in a dataset because they should be qualified with < or >.
刪減是指將測量記錄為<值或>值,表示測量儀器無法量化實際值。 例如,實際值可能超出了儀表的范圍, 或者由于數量太多或太少而無法近似計數, 或者只能估計之前或之后的時間。 審查在數據集中很容易檢測,因為它們應使用<或>進行限定。

Heteroskedasticity is when the variability in a variable is not uniform across its range. This is important because homo-scedasticity (the opposite of heteroskedasicity) is assumed by parametric statistics. Look for differing thicknesses in plotted data. This is often seen in automated measurements when a measurement instrument is upgraded to one with a greater precision.
異是當在一個變量中的變化是不是在其整個范圍內均勻。 這很重要,因為參數統計量假定為均方差性(與異方差性相反)。 在繪圖數據中查找不同的厚度。 當自動將測量儀器升級為更高精度的儀器時,通常會看到這種情況。
Influential observations and outliers are the data points that don’t fit the overall trends and patterns. Finding anomalies isn’t that difficult; deciding why they are anomalous and what to do with them are the really tough parts. Here are some examples of the types of outliers to look for.
有影響力的觀察結果和離群值是與總體趨勢和模式不符的數據點。 查找異常并不困難; 決定它們為什么異常以及如何處理它們是真正困難的部分。 以下是一些要查找的異常值類型的示例。

如何看待 (How and Where to Look)
That’s a lot of information to take in and remember, so here’s a summary you can refer to in the future if you ever need it.
需要記住的很多信息,因此,如果需要,這里是您將來可以參考的摘要。

And when you’re done, be sure to document your results so others can follow what you did.
完成后,請務必記錄您的結果,以便其他人可以照做。
Originally published at http://statswithcats.net on January 21, 2019.
最初于 2019年1月21日 發布在 http://statswithcats.net 上。
翻譯自: https://medium.com/@charliekufs/what-to-look-for-in-data-e63209bb9c30
海量數據尋找最頻繁的數據
本文來自互聯網用戶投稿,該文觀點僅代表作者本人,不代表本站立場。本站僅提供信息存儲空間服務,不擁有所有權,不承擔相關法律責任。 如若轉載,請注明出處:http://www.pswp.cn/news/389803.shtml 繁體地址,請注明出處:http://hk.pswp.cn/news/389803.shtml 英文地址,請注明出處:http://en.pswp.cn/news/389803.shtml
如若內容造成侵權/違法違規/事實不符,請聯系多彩編程網進行投訴反饋email:809451989@qq.com,一經查實,立即刪除!