Clustering Using Convex Hulls

I recently came across the article High-dimensional data clustering by using local affine/convex hulls by Hakan Cevikalp in Pattern Recognition Letters. It proposes a novel algorithm that clusters high-dimensional data using local affine/convex hulls. Inspired by the idea of using convex hulls for clustering, I wanted to try implementing my own simple clustering approach based on convex hulls. In this article, I will walk you through that implementation. Before we get into the code, let’s see what a convex hull is.

Convex Hull

According to Wikipedia, a convex hull is defined as follows.

In geometry, the convex hull or convex envelope or convex closure of a shape is the smallest convex set that contains it.

Let us consider a simple analogy. Assume that a few nails are hammered half-way into a plank of wood, as shown in Figure 1. You take a rubber band, stretch it so that it encloses all the nails, and let it go. It will snap tight around the outermost nails (shown in blue) and take the shape that minimizes its length. The area enclosed by the rubber band is called the convex hull of the set of nails.

In 2-dimensional space, this convex hull (shown in Figure 1) is a convex polygon, in which all interior angles are less than 180°. In 3-dimensional space the convex hull is a convex polyhedron, and in higher-dimensional spaces it is a convex polytope.
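To make the rubber-band picture concrete, here is a minimal sketch using scipy.spatial.ConvexHull on a handful of 2-D points. The coordinates are made up for illustration: the four corners play the role of the outermost nails and the point in the middle is an inner nail.

```python
import numpy as np
from scipy.spatial import ConvexHull

# Hypothetical "nails": four corners of a square plus one nail inside it
nails = np.array([[0.0, 0.0], [2.0, 0.0], [2.0, 2.0], [0.0, 2.0], [1.0, 1.0]])

hull = ConvexHull(nails)

# hull.vertices holds the indices of the points that lie on the hull boundary
outer_nails = sorted(int(i) for i in hull.vertices)
print(outer_nails)   # indices of the nails touched by the rubber band
print(hull.volume)   # for 2-D input, `volume` is the enclosed area
```

The inner nail (index 4) does not appear among the hull vertices, which is exactly what the rubber band does when it skips the nails in the middle.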

There are several algorithms that can determine the convex hull of a given set of points. Some famous algorithms are the gift wrapping algorithm and the Graham scan algorithm.

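As a rough illustration of the gift wrapping idea in 2-D, the sketch below starts at the leftmost point and repeatedly picks the next point such that no other point lies to its right. The function names and the sample points are hypothetical, and this is not the code used later in this article, which relies on SciPy.

```python
def cross(o, a, b):
    """Z-component of (a - o) x (b - o); negative means b is to the right of o->a."""
    return (a[0] - o[0]) * (b[1] - o[1]) - (a[1] - o[1]) * (b[0] - o[0])


def dist2(a, b):
    """Squared Euclidean distance between two 2-D points."""
    return (a[0] - b[0]) ** 2 + (a[1] - b[1]) ** 2


def gift_wrapping(points):
    """Return the hull vertices of 2-D points in counter-clockwise order."""
    pts = sorted(set(points))
    if len(pts) < 3:
        return pts
    start = pts[0]  # the leftmost (then lowest) point is always on the hull
    hull = []
    current = start
    while True:
        hull.append(current)
        # Any other point serves as the first candidate for the next vertex
        candidate = pts[0] if pts[0] != current else pts[1]
        for p in pts:
            if p == current:
                continue
            turn = cross(current, candidate, p)
            # Switch to p if it lies to the right of the current candidate,
            # or if it is collinear with it but farther away
            if turn < 0 or (turn == 0 and dist2(current, p) > dist2(current, candidate)):
                candidate = p
        current = candidate
        if current == start:
            break
    return hull


nails = [(0, 0), (2, 0), (2, 2), (0, 2), (1, 1), (1, 0)]
print(gift_wrapping(nails))  # [(0, 0), (2, 0), (2, 2), (0, 2)]
```

The inner point (1, 1) and the point (1, 0) on an edge are both skipped, so only the corners remain.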

Since a convex hull encloses a set of points, it can act as a cluster boundary, allowing us to determine points within a cluster. Hence, we can make use of convex hulls and perform clustering. Let’s get into the code.

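One way to check whether a point falls within such a boundary is to test it against the facet equations that SciPy stores for a hull. The sketch below assumes a hypothetical helper named in_hull and uses the corners of a unit cube as a stand-in cluster.

```python
import numpy as np
from scipy.spatial import ConvexHull


def in_hull(point, hull, tol=1e-12):
    """True if `point` satisfies every facet inequality normal . x + offset <= 0."""
    normals = hull.equations[:, :-1]
    offsets = hull.equations[:, -1]
    return bool(np.all(normals @ point + offsets <= tol))


# Hypothetical cluster: the eight corners of the unit cube
corners = np.array([[x, y, z] for x in (0.0, 1.0) for y in (0.0, 1.0) for z in (0.0, 1.0)])
cube = ConvexHull(corners)

inside = in_hull(np.array([0.5, 0.5, 0.5]), cube)   # centre of the cube
outside = in_hull(np.array([1.5, 0.5, 0.5]), cube)  # beyond the x = 1 face
print(inside, outside)
```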

A Simple Example

I will be using Python for this example. Before getting started, we need the following Python libraries.

sklearn
numpy
matplotlib
mpl_toolkits
itertools
scipy
quadprog

Dataset

To create our sample dataset, I will use the make_blobs function from the scikit-learn library. I will make 3 clusters.

import numpy as np
from sklearn.datasets import make_blobs

centers = [[0, 1, 0], [1.5, 1.5, 1], [1, 1, 1]]
stds = [0.13, 0.12, 0.12]

X, labels_true = make_blobs(n_samples=1000, centers=centers, cluster_std=stds, random_state=0)
point_indices = np.arange(1000)

Since this is a dataset of 3-dimensional points, I will draw a 3D plot to show our ground truth clusters. Figure 2 shows the scatter plot of the dataset with coloured clusters.

import matplotlib.pyplot as plt
from mpl_toolkits.mplot3d import Axes3D  # registers the 3D projection on older Matplotlib versions

x = X[:,0]
y = X[:,1]
z = X[:,2]

# Creating figure
fig = plt.figure(figsize = (15, 10))
ax = plt.axes(projection ="3d")

# Add gridlines (the flag is passed positionally because the `b` keyword
# was renamed to `visible` in newer Matplotlib versions)
ax.grid(True, color ='grey',
        linestyle ='-.', linewidth = 0.3,
        alpha = 0.2)

# Creating color map
mycolours = ["red", "green", "blue"]
col = [mycolours[i] for i in labels_true]

# Creating plot
sctt = ax.scatter3D(x, y, z, c = col, marker ='o')

plt.title("3D scatter plot of the data\n")
ax.set_xlabel('X-axis', fontweight ='bold')
ax.set_ylabel('Y-axis', fontweight ='bold')
ax.set_zlabel('Z-axis', fontweight ='bold')

# show plot
plt.show()

Fig 2. Initial scatter plot of the dataset

Obtaining an Initial Clustering

First, we need to break our dataset into 2 parts. One part will be used as seeds to obtain an initial clustering using K-means. The points in the other part will be assigned to clusters based on the initial clustering.

from sklearn.model_selection import train_test_split

X_seeds, X_rest, y_seeds, y_rest, id_seeds, id_rest = train_test_split(X, labels_true, point_indices, test_size=0.33, random_state=42)

Now we perform K-means clustering on the seed points.

from sklearn.cluster import KMeans

kmeans = KMeans(n_clusters=3, random_state=9).fit(X_seeds)
initial_result = kmeans.labels_
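
A natural baseline for the hull-based assignment that comes later is to assign new points to the nearest centroid with KMeans.predict. Whether the K-means figure reported near the end of this article was computed this way is not stated, so treat the following as a sketch on made-up data only.

```python
import numpy as np
from sklearn.cluster import KMeans

# Made-up seed points forming two well-separated groups
seeds = np.array([[0.0, 0.0], [0.1, 0.0], [0.0, 0.1],
                  [5.0, 5.0], [5.1, 5.0], [5.0, 5.1]])
toy_kmeans = KMeans(n_clusters=2, n_init=10, random_state=0).fit(seeds)

# predict() assigns each new point to the cluster with the nearest centroid
new_points = np.array([[0.2, 0.2], [4.8, 4.9]])
new_labels = toy_kmeans.predict(new_points)
print(new_labels)
```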

Since the resulting labels may not be the same as the ground truth labels, we have to map the two sets of labels. For this, we can use the following function.

from itertools import permutations

# Source: https://stackoverflow.com/questions/11683785/how-can-i-match-up-cluster-labels-to-my-ground-truth-labels-in-matlab
def remap_labels(pred_labels, true_labels):
    pred_labels, true_labels = np.array(pred_labels), np.array(true_labels)
    assert pred_labels.ndim == 1 == true_labels.ndim
    assert len(pred_labels) == len(true_labels)
    cluster_names = np.unique(pred_labels)
    accuracy = 0

    # Every possible way of renaming the predicted clusters
    perms = np.array(list(permutations(np.unique(true_labels))))

    remapped_labels = true_labels
    for perm in perms:
        flipped_labels = np.zeros(len(true_labels))
        for label_index, label in enumerate(cluster_names):
            flipped_labels[pred_labels == label] = perm[label_index]

        # Keep the renaming that agrees best with the ground truth
        testAcc = np.sum(flipped_labels == true_labels) / len(true_labels)
        if testAcc > accuracy:
            accuracy = testAcc
            remapped_labels = flipped_labels

    return accuracy, remapped_labels

We can get the accuracy and the mapped initial labels from the above function.

initial_accuracy, remapped_initial_result = remap_labels(initial_result, y_seeds)
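
Trying every permutation is fine for 3 clusters, but the number of permutations grows factorially with the number of clusters. A possible alternative, sketched below, matches labels with scipy.optimize.linear_sum_assignment instead. This is a swapped-in technique rather than the function used in this article; remap_labels_hungarian is a hypothetical name, and the sketch assumes both labelings contain the same number of distinct labels.

```python
import numpy as np
from scipy.optimize import linear_sum_assignment


def remap_labels_hungarian(pred_labels, true_labels):
    """Hypothetical alternative to remap_labels using an optimal assignment."""
    pred_labels, true_labels = np.asarray(pred_labels), np.asarray(true_labels)
    pred_names = np.unique(pred_labels)
    true_names = np.unique(true_labels)
    # overlap[i, j] = number of points with predicted label i and true label j
    overlap = np.array([[np.sum((pred_labels == p) & (true_labels == t))
                         for t in true_names] for p in pred_names])
    # Maximise the total overlap (linear_sum_assignment minimises, hence the minus)
    rows, cols = linear_sum_assignment(-overlap)
    mapping = {pred_names[r]: true_names[c] for r, c in zip(rows, cols)}
    remapped = np.array([mapping[p] for p in pred_labels])
    accuracy = float(np.mean(remapped == true_labels))
    return accuracy, remapped


acc, remapped = remap_labels_hungarian([1, 1, 0, 0, 2, 2], [0, 0, 1, 1, 2, 2])
print(acc, remapped)
```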

Figure 3 shows the initial clustering of the seed points.

Fig 3. Initial clustering of the seed points using K-means

Get Convex Hulls of the Initial Clustering

Once we have obtained an initial clustering, we can get the convex hulls for each cluster. First, we have to get the indices of each data point in the clusters.

# Get the indices of the data points belonging to each cluster
indices = {}
for i in range(len(id_seeds)):
    if int(remapped_initial_result[i]) not in indices:
        indices[int(remapped_initial_result[i])] = [i]
    else:
        indices[int(remapped_initial_result[i])].append(i)
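
The same grouping can also be written with NumPy boolean masks. The sketch below uses a short made-up label array in place of remapped_initial_result; the names toy_labels and toy_indices are only for illustration.

```python
import numpy as np

# Made-up labels standing in for remapped_initial_result (floats, as np.zeros produces)
toy_labels = np.array([0.0, 2.0, 1.0, 0.0, 2.0])

# One array of row positions per cluster label
toy_indices = {int(k): np.flatnonzero(toy_labels == k) for k in np.unique(toy_labels)}
print(toy_indices)
```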

Now we can obtain the convex hulls from each cluster.

from scipy.spatial import ConvexHull

# Get convex hulls for each cluster
hulls = {}
for i in indices:
    hull = ConvexHull(X_seeds[indices[i]])
    hulls[i] = hull

Figure 4 shows the convex hulls representing each of the 3 clusters.

Fig 4. Convex hulls of each cluster

Assign Remaining Points to the Cluster of the Closest Convex Hull

Now that we have the convex hulls of the initial clusters, we can assign the remaining points to the cluster of the closest convex hull. First, we have to get the projection of the data point on to a convex hull. To do so, we can use the following function.

from quadprog import solve_qp

# Source: https://stackoverflow.com/questions/42248202/find-the-projection-of-a-point-on-the-convex-hull-with-scipy
def proj2hull(z, equations):
    # Each row of `equations` is [normal, offset] for one facet of the hull
    G = np.eye(len(z), dtype=float)
    a = np.array(z, dtype=float)
    C = np.array(-equations[:, :-1], dtype=float)
    b = np.array(equations[:, -1], dtype=float)
    x, f, xu, itr, lag, act = solve_qp(G, a, C.T, b, meq=0, factorized=True)
    return x  # the point of the hull closest to z

The problem of finding the projection of a point on a convex hull can be solved using quadratic programming. The above function makes use of the quadprog module. You can install the quadprog module using conda or pip.

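For readers who want the gist without following the links, the projection can be written as the small quadratic program below. Here z is the point to project, and each row of the hull's equations array is read as [A | b_0], so that points inside the hull satisfy A x + b_0 ≤ 0. The symbols A and b_0 are notation introduced here, not names from the code.

```latex
\begin{aligned}
\min_{x}\quad & \tfrac{1}{2}\lVert x - z\rVert^{2}
  \;=\; \tfrac{1}{2}\,x^{\top}x \;-\; z^{\top}x \;+\; \tfrac{1}{2}\,z^{\top}z \\
\text{s.t.}\quad & A x + b_{0} \le 0
  \quad\Longleftrightarrow\quad (-A)\,x \ge b_{0}
\end{aligned}
```

The term ½ zᵀz is constant and does not change the minimiser. As far as I understand the quadprog interface, solve_qp minimises ½ xᵀGx − aᵀx subject to Cᵀx ≥ b, which matches the problem above with G = I, a = z, Cᵀ = −A and b = b_0. This is what proj2hull passes in, and because G is the identity matrix, setting factorized=True does not change the result.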

conda install -c omnia quadprog
OR
pip install quadprog

I won’t go into details about how to solve this problem using quadratic programming. If you are interested, you can read more from here and here.

Fig 5. The distance from a point to its projection on to a convex hull

Once you have obtained the projection on the convex hull, you can calculate the distance from the point to the convex hull as shown in Figure 5. Based on this distance, now let’s assign the remaining data points to the cluster of the closest convex hull.

I will consider the Euclidean distance from the data point to its projection on the convex hull. Then the data point will be assigned to the cluster with the convex hull having the shortest distance from that data point. If a point lies within the convex hull, then the distance will be 0.

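As a cross-check that does not need quadprog, the same distance can be sketched with scipy.optimize.minimize and the SLSQP solver. This is a swapped-in solver rather than the method used in this article; distance_to_hull and the unit-square data are hypothetical.

```python
import numpy as np
from scipy.optimize import minimize
from scipy.spatial import ConvexHull


def distance_to_hull(z, hull):
    """Distance from z to its projection on the hull (0 if z is inside)."""
    A, b0 = hull.equations[:, :-1], hull.equations[:, -1]
    # Feasible region: A @ x + b0 <= 0, written as -(A @ x + b0) >= 0 for SLSQP
    constraint = {"type": "ineq", "fun": lambda x: -(A @ x + b0), "jac": lambda x: -A}
    x0 = hull.points[hull.vertices].mean(axis=0)  # a starting point inside the hull
    result = minimize(lambda x: 0.5 * np.sum((x - z) ** 2), x0,
                      jac=lambda x: x - z, constraints=[constraint], method="SLSQP")
    return float(np.linalg.norm(z - result.x))


square = ConvexHull(np.array([[0.0, 0.0], [1.0, 0.0], [1.0, 1.0], [0.0, 1.0]]))
d_inside = distance_to_hull(np.array([0.5, 0.5]), square)   # point inside the square
d_outside = distance_to_hull(np.array([2.0, 0.5]), square)  # one unit to the right of it
print(round(d_inside, 6), round(d_outside, 6))
```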

prediction = []
for z1 in X_rest:
    min_cluster_distance = 100000
    min_distance_point = ""
    min_cluster_distance_hull = ""

    for i in indices:
        p = proj2hull(z1, hulls[i].equations)
        dist = np.linalg.norm(z1-p)
        if dist < min_cluster_distance:
            min_cluster_distance = dist
            min_distance_point = p
            min_cluster_distance_hull = i

    prediction.append(min_cluster_distance_hull)

prediction = np.array(prediction)

Figure 6 shows the final clustering result.

Fig 6. Final result with convex hulls

Evaluate the Final Result

Let’s evaluate our result to see how accurate it is.

from sklearn.metrics import accuracy_score

Y_pred = np.concatenate((remapped_initial_result, prediction))
Y_real = np.concatenate((y_seeds, y_rest))
print(accuracy_score(Y_real, Y_pred))

I got an accuracy of 1.0 (100%)! Awesome and exciting right? 😊

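Accuracy only makes sense here because the cluster labels were remapped to the ground truth labels first. A score that ignores label names, such as scikit-learn's adjusted_rand_score, is one possible alternative. The tiny label lists below are made up to show the difference.

```python
from sklearn.metrics import accuracy_score, adjusted_rand_score

truth = [0, 0, 1, 1, 2, 2]
same_groups_other_names = [2, 2, 0, 0, 1, 1]  # identical grouping, permuted label names

acc_val = accuracy_score(truth, same_groups_other_names)
ari_val = adjusted_rand_score(truth, same_groups_other_names)
print(acc_val)  # 0.0: no label name matches
print(ari_val)  # 1.0: the grouping itself is identical
```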

If you want to know more about evaluating clustering results, you can check out my previous article Evaluating Clustering Results.

I have used a very simple dataset. You can try this method with more complex datasets and see what happens.

High-dimensional data

I also tried my convex hull method on a dataset whose data points have 8 dimensions. You can find the Jupyter notebook showing the code and results. The final results are as follows.

Accuracy of K-means method: 0.866
Accuracy of Convex Hull method: 0.867

There is a slight improvement in my convex hull method over K-means.

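If you try higher dimensions yourself, it may help to look at how large the hull becomes, since the projection step uses one inequality per facet. The sketch below builds a hull from made-up random 8-dimensional points; it is not the dataset from the notebook.

```python
import numpy as np
from scipy.spatial import ConvexHull

rng = np.random.default_rng(0)
points_8d = rng.normal(size=(40, 8))  # made-up data: 40 points in 8 dimensions

hull_8d = ConvexHull(points_8d)

# Each facet equation has 8 normal components plus 1 offset
print(hull_8d.equations.shape[1])
# The number of facets tends to grow quickly with the dimension
print(len(hull_8d.simplices))
```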

Final Thoughts

The article High-dimensional data clustering by using local affine/convex hulls by Hakan Cevikalp shows that the proposed convex hull-based method avoids the “hole artefacts” problem (sparse and irregular distributions in high-dimensional spaces can make nearest-neighbour distances unreliable) and improves accuracy on high-dimensional datasets over other state-of-the-art subspace clustering methods.

You can find the Jupyter notebook containing the code used for this article.

Hope this article was interesting and useful.

Cheers! 😃

Original article: https://towardsdatascience.com/clustering-using-convex-hulls-fddafeaa963c
