這篇文章主要介紹了Python聚類算法之DBSACN,結合實例形式詳細分析了DBSACN算法的原理與具體實現技巧,具有一定參考借鑒價值,需要的朋友可以參考下
本文實例講述了Python聚類算法之DBSACN。分享給大家供大家參考,具體如下:
DBSCAN:
是一種簡單的,基于密度的聚類算法。本次實現中,DBSCAN使用了基于中心的方法。在基于中心的方法中,每個數據點的密度通過對以該點為中心以邊長為2*EPs的網格(鄰域)內的其他數據點的個數來度量。根據數據點的密度分為三類點:
核心點:該點在鄰域內的密度超過給定的閥值MinPs。
邊界點:該點不是核心點,但是其鄰域內包含至少一個核心點。
噪音點:不是核心點,也不是邊界點。
有了以上對數據點的劃分,聚合可以這樣進行:各個核心點與其鄰域內的所有核心點放在同一個簇中,把邊界點跟其鄰域內的某個核心點放在同一個簇中。
- # scoding=utf-8
- import pylab as pl
- from collections import defaultdict,Counter
- points = [[int(eachpoint.split("#")[0]), int(eachpoint.split("#")[1])] for eachpoint in open("points","r")]
- # 計算每個數據點相鄰的數據點,鄰域定義為以該點為中心以邊長為2*EPs的網格
- Eps = 10
- surroundPoints = defaultdict(list)
- for idx1,point1 in enumerate(points):
- for idx2,point2 in enumerate(points):
- if (idx1 < idx2):
- if(abs(point1[0]-point2[0])<=Eps and abs(point1[1]-point2[1])<=Eps):
- surroundPoints[idx1].append(idx2)
- surroundPoints[idx2].append(idx1)
- # 定義鄰域內相鄰的數據點的個數大于4的為核心點
- MinPts = 5
- corePointIdx = [pointIdx for pointIdx,surPointIdxs in surroundPoints.iteritems() if len(surPointIdxs)>=MinPts]
- # 鄰域內包含某個核心點的非核心點,定義為邊界點
- borderPointIdx = []
- for pointIdx,surPointIdxs in surroundPoints.iteritems():
- if (pointIdx not in corePointIdx):
- for onesurPointIdx in surPointIdxs:
- if onesurPointIdx in corePointIdx:
- borderPointIdx.append(pointIdx)
- break
- # 噪音點既不是邊界點也不是核心點
- noisePointIdx = [pointIdx for pointIdx in range(len(points)) if pointIdx not in corePointIdx and pointIdx not in borderPointIdx]
- corePoint = [points[pointIdx] for pointIdx in corePointIdx]
- borderPoint = [points[pointIdx] for pointIdx in borderPointIdx]
- noisePoint = [points[pointIdx] for pointIdx in noisePointIdx]
- # pl.plot([eachpoint[0] for eachpoint in corePoint], [eachpoint[1] for eachpoint in corePoint], 'or')
- # pl.plot([eachpoint[0] for eachpoint in borderPoint], [eachpoint[1] for eachpoint in borderPoint], 'oy')
- # pl.plot([eachpoint[0] for eachpoint in noisePoint], [eachpoint[1] for eachpoint in noisePoint], 'ok')
- groups = [idx for idx in range(len(points))]
- # 各個核心點與其鄰域內的所有核心點放在同一個簇中
- for pointidx,surroundIdxs in surroundPoints.iteritems():
- for oneSurroundIdx in surroundIdxs:
- if (pointidx in corePointIdx and oneSurroundIdx in corePointIdx and pointidx < oneSurroundIdx):
- for idx in range(len(groups)):
- if groups[idx] == groups[oneSurroundIdx]:
- groups[idx] = groups[pointidx]
- # 邊界點跟其鄰域內的某個核心點放在同一個簇中
- for pointidx,surroundIdxs in surroundPoints.iteritems():
- for oneSurroundIdx in surroundIdxs:
- if (pointidx in borderPointIdx and oneSurroundIdx in corePointIdx):
- groups[pointidx] = groups[oneSurroundIdx]
- break
- # 取簇規模最大的5個簇
- wantGroupNum = 3
- finalGroup = Counter(groups).most_common(3)
- finalGroup = [onecount[0] for onecount in finalGroup]
- group1 = [points[idx] for idx in xrange(len(points)) if groups[idx]==finalGroup[0]]
- group2 = [points[idx] for idx in xrange(len(points)) if groups[idx]==finalGroup[1]]
- group3 = [points[idx] for idx in xrange(len(points)) if groups[idx]==finalGroup[2]]
- pl.plot([eachpoint[0] for eachpoint in group1], [eachpoint[1] for eachpoint in group1], 'or')
- pl.plot([eachpoint[0] for eachpoint in group2], [eachpoint[1] for eachpoint in group2], 'oy')
- pl.plot([eachpoint[0] for eachpoint in group3], [eachpoint[1] for eachpoint in group3], 'og')
- # 打印噪音點,黑色
- pl.plot([eachpoint[0] for eachpoint in noisePoint], [eachpoint[1] for eachpoint in noisePoint], 'ok')
- pl.show()
運行效果截圖如下:
希望本文所述對大家Python程序設計有所幫助。
新聞熱點
疑難解答