Implementing Association Rules in Python

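In the code, Ci denotes the candidate i-itemsets and Li denotes the frequent i-itemsets, that is, the candidates that meet the minimum support.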
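What separates Ci from Li is the support of each itemset: the fraction of transactions that contain it. As a minimal sketch (the helper name `support` is an illustration and does not appear in the script; the transactions are the sample data used in the script's `__main__` block):

```python
# Sample transactions, the same data the script uses in its __main__ block.
transactions = [{1, 3, 4}, {2, 3, 5}, {1, 2, 3, 5}, {2, 5}]


def support(itemset, transactions):
    """Return the fraction of transactions that contain every item in itemset."""
    itemset = frozenset(itemset)
    hits = sum(1 for t in transactions if itemset <= t)
    return hits / len(transactions)


print(support({2, 5}, transactions))  # 0.75, frequent at a 0.5 threshold
print(support({4}, transactions))     # 0.25, stays a candidate only
```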
　　
# coding=utf-8

def createC1(dataSet):  # build the list of all candidate 1-itemsets
    C1 = []
    for transaction in dataSet:
        for item in transaction:
            if [item] not in C1:
                C1.append([item])  # C1 holds one single-item list per distinct item, e.g. [[1], [2], [3], [4], [5]]
    # print('C1:', C1)
    return list(map(frozenset, C1))  # frozenset is immutable and hashable, so each itemset can be used as a dict key later
　　
# Filter the candidate itemsets down to the itemsets L that meet the minimum support.
# Parameters: the dataset, the list of candidate itemsets, and the minimum support.
# For example:
# C3: [frozenset({1, 2, 3}), frozenset({1, 3, 5}), frozenset({2, 3, 5})]
# L3: [frozenset({2, 3, 5})]
def scanD(D, Ck, minSupport):
    ssCnt = {}
    for tid in D:  # for each transaction in the dataset
        for can in Ck:  # for each candidate itemset can
            if can.issubset(tid):  # count can when it is a subset of the transaction
                # get() returns 0 the first time can is seen, so one line covers both cases
                ssCnt[can] = ssCnt.get(can, 0) + 1
    numItems = float(len(D))
    retList = []
    supportData = {}
    for key in ssCnt:
        support = ssCnt[key] / numItems  # support = count / total number of transactions
        if support >= minSupport:
            retList.insert(0, key)  # keep the itemsets that reach the minimum support
        supportData[key] = support  # record the support of every counted candidate, frequent or not
    return retList, supportData
　　
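# Generate the candidate k-itemsets Ck from the frequent (k-1)-itemsets passed in as Lk.
# For example, from L2: [frozenset({3, 5}), frozenset({2, 5}), frozenset({2, 3}), frozenset({1, 3})]
# this generates
# C3: [frozenset({1, 2, 3}), frozenset({1, 3, 5}), frozenset({2, 3, 5})]
def aprioriGen(Lk, k):
    retList = []
    lenLk = len(Lk)
    for i in range(lenLk):
        for j in range(i + 1, lenLk):
            if len(Lk[i] | Lk[j]) == k:  # the union of the two itemsets has exactly k items
                retList.append(Lk[i] | Lk[j])
    return list(set(retList))  # set() removes duplicate candidates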
　　
# Generate all frequent itemsets.
def apriori(dataSet, minSupport=0.5):
    C1 = createC1(dataSet)
    D = list(map(set, dataSet))
    L1, supportData = scanD(D, C1, minSupport)
    L = [L1]  # L collects the frequent n-itemsets that pass the minimum support; start with the frequent 1-itemsets
    k = 2
    # Starting at k=2, build the frequent k-itemsets from the frequent (k-1)-itemsets
    # until the most recently generated list is empty.
    while len(L[k - 2]) > 0:
        Ck = aprioriGen(L[k - 2], k)
        Lk, supK = scanD(D, Ck, minSupport)
        supportData.update(supK)  # supportData maps each itemset to its support; merge in the new supK
        L.append(Lk)
        k += 1
    return L, supportData
　　
if __name__ == "__main__":
    dataSet = [[1, 3, 4], [2, 3, 5], [1, 2, 3, 5], [2, 5]]
    D = list(map(set, dataSet))
    L, suppData = apriori(dataSet)
    print('L:', L)
    print('suppData:', suppData)
　　
    '''
    C1 = createC1(dataSet)
    L1, supportData1 = scanD(D, C1, 0.5)
    print('C1:', C1)
    print('L1:', L1)
    print('supportData1:', supportData1)
    C2 = aprioriGen(L1, 2)
    L2, supportData2 = scanD(D, C2, 0.5)
    print('C2:', C2)
    print('L2:', L2)
    print('supportData2:', supportData2)
    C3 = aprioriGen(L2, 3)
    L3, supportData3 = scanD(D, C3, 0.5)
    print('C3:', C3)
    print('L3:', L3)
    print('supportData3:', supportData3)
    '''
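aprioriGen joins any two itemsets whose union has k items and leaves all filtering to scanD. The classic Apriori formulation also prunes before counting: a candidate can be dropped if any of its (k-1)-subsets is not frequent. A minimal sketch of that step, assuming a hypothetical helper `prune_candidates` that is not part of the script above:

```python
from itertools import combinations


def prune_candidates(Ck, Lk_prev):
    """Keep only the candidates whose every (k-1)-subset appears in Lk_prev.

    Hypothetical helper for illustration; the script above does not define it.
    """
    frequent = set(Lk_prev)
    kept = []
    for cand in Ck:
        k = len(cand)
        if all(frozenset(sub) in frequent for sub in combinations(cand, k - 1)):
            kept.append(cand)
    return kept


# L2 and C3 as listed in the comments above aprioriGen.
L2 = [frozenset({3, 5}), frozenset({2, 5}), frozenset({2, 3}), frozenset({1, 3})]
C3 = [frozenset({1, 2, 3}), frozenset({1, 3, 5}), frozenset({2, 3, 5})]
print(prune_candidates(C3, L2))  # [frozenset({2, 3, 5})]
```

With the sample data, {1, 2, 3} and {1, 3, 5} are dropped without scanning D, because {1, 2} and {1, 5} are not in L2.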
　　
Running the script prints suppData, the support of every itemset that was counted. Itemsets with support of at least 0.5 are frequent; those at 0.25 were counted as candidates but fall below the threshold:

    frozenset({1}): 0.5,
    frozenset({3}): 0.75,
    frozenset({4}): 0.25,
    frozenset({2}): 0.75,
    frozenset({5}): 0.75,
    frozenset({1, 3}): 0.5,
    frozenset({2, 3}): 0.5,
    frozenset({2, 5}): 0.75,
    frozenset({3, 5}): 0.5,
    frozenset({1, 2}): 0.25,
    frozenset({1, 5}): 0.25,
    frozenset({2, 3, 5}): 0.5,
    frozenset({1, 2, 3}): 0.25,
    frozenset({1, 3, 5}): 0.25
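The script stops at frequent itemsets. Turning them into association rules usually means computing, for each rule A -> B, the confidence support(A ∪ B) / support(A). The following is a sketch under stated assumptions, not part of the original code: the function name `generate_rules` and the 0.7 confidence threshold are chosen here for illustration, and the support values are copied from the frequent entries in the list above.

```python
from itertools import combinations

# Supports of the frequent itemsets (support >= 0.5), copied from the output above.
support_data = {
    frozenset({1}): 0.5, frozenset({2}): 0.75, frozenset({3}): 0.75,
    frozenset({5}): 0.75, frozenset({1, 3}): 0.5, frozenset({2, 3}): 0.5,
    frozenset({2, 5}): 0.75, frozenset({3, 5}): 0.5,
    frozenset({2, 3, 5}): 0.5,
}


def generate_rules(support_data, min_confidence=0.7):
    """Return (antecedent, consequent, confidence) for rules meeting min_confidence.

    Hypothetical helper; assumes support_data contains every subset of each
    frequent itemset, which holds because subsets of frequent itemsets are frequent.
    """
    rules = []
    for itemset, sup in support_data.items():
        if len(itemset) < 2:
            continue  # a rule needs a non-empty antecedent and consequent
        for r in range(1, len(itemset)):
            for antecedent in map(frozenset, combinations(sorted(itemset), r)):
                confidence = sup / support_data[antecedent]
                if confidence >= min_confidence:
                    rules.append((antecedent, itemset - antecedent, confidence))
    return rules


for antecedent, consequent, confidence in generate_rules(support_data):
    print(set(antecedent), '->', set(consequent), 'confidence:', confidence)
```

At this threshold the sample data yields five rules, each with confidence 1.0: {1} -> {3}, {2} -> {5}, {5} -> {2}, {2, 3} -> {5}, and {3, 5} -> {2}.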

Reposted from: https://www.cnblogs.com/qwangxiao/p/10121889.html
