
Understanding Machine Learning and Using Libraries (6) Unsupervised Learning

ny:D 2024. 6. 11. 23:19

240611 Today I Learn

๋น„์ง€๋„ ํ•™์Šต 

๐Ÿ’ก ๋น„์ง€๋„ํ•™์Šต
๋‹ต์„ ์•Œ๋ ค์ฃผ์ง€ ์•Š๊ณ  ๊ณต๋ถ€์‹œํ‚ค๋Š” ๋ฐฉ๋ฒ•
- ์—ฐ๊ด€๊ทœ์น™
- ๊ตฐ์ง‘ํ™”

 K-Means Clustering

💡 K-Means Clustering
An algorithm that groups the given data into k clusters by minimizing the variance of the distances between each point and the center of its cluster.
  • Advantage: widely used and easy to apply
  • Disadvantages
    • Closeness is measured by distance, so accuracy drops as the number of dimensions grows
    • The more iterations it needs, the longer it takes to run
    • Choosing the number of clusters (K) is subjective
    • Centers are computed as means, so the result is sensitive to outliers
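The assign-then-average loop described above can be sketched in a few lines of plain Python. This is only a toy illustration, not scikit-learn's implementation: the function names (`simple_kmeans`, `mean_point`, `squared_distance`), the toy data, and the naive "first k points" initialization are all assumptions made for the example.

```python
def squared_distance(p, q):
    """Squared Euclidean distance between two points given as tuples."""
    return sum((a - b) ** 2 for a, b in zip(p, q))


def mean_point(points):
    """Coordinate-wise mean of a non-empty list of points."""
    n = len(points)
    return tuple(sum(coords) / n for coords in zip(*points))


def simple_kmeans(points, k, max_iter=100):
    """Toy K-Means returning (centers, labels)."""
    centers = list(points[:k])  # naive init: first k points (not k-means++)
    labels = []
    for _ in range(max_iter):
        # Assignment step: each point goes to its nearest center.
        labels = [
            min(range(k), key=lambda c: squared_distance(p, centers[c]))
            for p in points
        ]
        # Update step: each center moves to the mean of its members.
        new_centers = []
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            new_centers.append(mean_point(members) if members else centers[c])
        if new_centers == centers:  # converged: no center moved
            break
        centers = new_centers
    return centers, labels


# Hypothetical toy data: two well-separated groups of 2-D points, interleaved.
toy_points = [(1.0, 1.0), (8.0, 8.0), (1.2, 0.8), (8.2, 7.9), (0.8, 1.1), (7.9, 8.3)]
centers, labels = simple_kmeans(toy_points, k=2)
print(labels)  # [0, 1, 0, 1, 0, 1]
```

Because the update step is a plain mean, a single extreme point in `toy_points` would drag its center toward itself, which is the outlier sensitivity listed among the disadvantages.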

์ข‹์€ ๊ตฐ์ง‘ํ™”๋ž€?

  • ์‹ค๋ฃจ์—ฃ ๊ฐ’์ด ๋†’์„์ˆ˜๋ก(1์— ๊ฐ€๊นŒ์›€)
  • ๊ฐœ๋ณ„ ๊ตฐ์ง‘์˜ ํ‰๊ท  ๊ฐ’์˜ ํŽธ์ฐจ๊ฐ€ ํฌ์ง€ ์•Š์„ ์ˆ˜๋ก ์ข‹์€ ๊ตฐ์ง‘ํ™”์ด๋‹ค.

Clustering practice - iris

  • Load the data and libraries
# Basic libraries
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns

# Load the iris data from seaborn
iris = sns.load_dataset('iris')

# Make a DataFrame without the label (species here) -> used for clustering
iris2 = iris.copy()
iris2 = iris2.drop('species',axis=1)
  • K-Means Clustering
sklearn.cluster.KMeans
  • Parameters
    • n_clusters: number of clusters → chosen by the user (keep experimenting with different values)
    • max_iter: maximum number of iterations → how many times the clustering loop may repeat
  • Attributes (available after fitting)
    • labels_: the cluster label assigned to each data point
    • cluster_centers_: the coordinates of each cluster center
# Import the library
from sklearn.cluster import KMeans

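# KMeans(n_clusters = number of clusters, init = initialization method or an array of initial centers)
kmeans = KMeans(n_clusters = 3, init = 'k-means++', max_iter = 300, random_state= 42)

# fitting
kmeans.fit(iris2)

# Store each row's cluster label; the scatter plot colors points by this 'cluster' column
iris2['cluster'] = kmeans.labels_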
  • Original vs. K-Means

original vs. clustering

plt.figure(figsize = (12,6))
plt.subplot(1,2,1)
sns.scatterplot(data=iris, x='sepal_length', y='sepal_width', hue='species', palette = 'husl')
plt.title('Original')

plt.subplot(1,2,2)
sns.scatterplot(data = iris2, x = 'sepal_length', y = 'sepal_width', hue = 'cluster', palette= 'pastel')
plt.title('Clustering')
plt.show()

→ The clustering comes out largely similar to the original species labels.
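How similar the two plots are can also be checked by counting how often each species lands in each cluster. The snippet below is a minimal sketch of such a contingency count; the two label lists are hypothetical and are not the actual iris result.

```python
from collections import Counter

# Hypothetical labels for illustration only (not the real iris output).
species = ["setosa", "setosa", "versicolor", "versicolor",
           "virginica", "virginica", "virginica"]
cluster = [1, 1, 0, 0, 2, 0, 2]

# Count each (species, cluster) pair, like a small contingency table.
pair_counts = Counter(zip(species, cluster))

for (name, cluster_id), count in sorted(pair_counts.items()):
    print(f"{name:<10} cluster {cluster_id}: {count}")
```

With the DataFrames from this post, `pd.crosstab(iris['species'], iris2['cluster'])` should give the same kind of table in one call. Cluster ids are arbitrary, so only the grouping matters, not which number a species receives.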

Clustering evaluation metric

💡 Silhouette coefficient
Computed from the (Euclidean) distances between each data point and the points around it, it is used to judge a clustering: whether the points inside a cluster are packed closely together, and whether the clusters are well separated from one another.


  • The closer the silhouette coefficient is to 1, the farther the point is from the neighboring cluster;
  • the closer it is to 0, the closer the point is to the neighboring cluster.
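For a single point the silhouette value is s = (b - a) / max(a, b), where a is the mean distance to the other points of its own cluster and b is the smallest mean distance to the points of any other cluster. The sketch below computes it in plain Python; the helper name `silhouette_of_point` and the 1-D toy data are assumptions for illustration, and the code expects at least two clusters.

```python
import math


def silhouette_of_point(index, points, labels):
    """Silhouette value s = (b - a) / max(a, b) for one point."""
    own = labels[index]
    # Collect distances from the chosen point, grouped by the other point's cluster.
    dists_by_cluster = {}
    for j, (p, lab) in enumerate(zip(points, labels)):
        if j == index:
            continue
        dists_by_cluster.setdefault(lab, []).append(math.dist(points[index], p))
    if own not in dists_by_cluster:
        return 0.0  # usual convention for a cluster with a single member
    a = sum(dists_by_cluster[own]) / len(dists_by_cluster[own])
    b = min(sum(d) / len(d) for lab, d in dists_by_cluster.items() if lab != own)
    return (b - a) / max(a, b)


# Hypothetical 1-D toy data: two tight, well-separated clusters.
points = [(0.0,), (1.0,), (10.0,), (11.0,)]
labels = [0, 0, 1, 1]
scores = [silhouette_of_point(i, points, labels) for i in range(len(points))]
print([round(s, 3) for s in scores])  # [0.905, 0.895, 0.895, 0.905]
```

Averaging these per-point values gives the single number that `silhouette_score` reports. Values can also be negative, which suggests a point sits closer to another cluster than to its own.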

Practice - RFM customer segmentation

💾 Dataset used

retail = pd.read_excel('Online Retail.xlsx')
retail.head()

๋ฐ์ดํ„ฐ ์ „์ฒ˜๋ฆฌ

๊ฒฐ์ธก์น˜ & ์ด์ƒ์น˜ ์ฒ˜๋ฆฌ

# ๋ฐ์ดํ„ฐ ์ „์ฒ˜๋ฆฌ
# customer id ๊ฒฐ์ธก์น˜ ์‚ญ์ œ
cond1 = (retail['CustomerID'].notnull())

# invoice๊ฐ€ c๋กœ ์‹œ์ž‘๋˜๊ฑฐ๋‚˜ 
cond2 = (retail['InvoiceNo'].astype(str).str[0] != 'C')

# quantity๊ฐ€ ์Œ์ˆ˜์ด๊ฑฐ๋‚˜
cond3 = retail['Quantity']>0

# unit price๊ฐ€ ์Œ์ˆ˜์ธ ๊ฒƒ์€ ๋ชจ๋‘ ์‚ญ์ œ
cond4 = retail['UnitPrice']>0

retail2 = retail[cond1 & cond2 & cond3 & cond4]
  • CustomerID ๊ฒฐ์ธก์น˜ ์‚ญ์ œ
  • invoice๊ฐ€ c๋กœ ์‹œ์ž‘๋˜๊ฑฐ๋‚˜, quantity๊ฐ€ ์Œ์ˆ˜์ด๊ฑฐ๋‚˜, unit price๊ฐ€ ์Œ์ˆ˜์ธ ๊ฒƒ์€ ๋ชจ๋‘ ์‚ญ์ œ

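Select UK data only

[Figure: output of retail2['Country'].value_counts()[:10] → output of retail2.value_counts() after applying cond5]

cond5 = (retail2['Country']=='United Kingdom')
retail2 = retail2[cond5].copy()  # copy so new columns can be added without a SettingWithCopyWarning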

RFM segmentation

 

Related post: Let's Play with Statistics (4) Supervised vs. Unsupervised Learning (archivenyc.tistory.com)

 

Preparing for the RFM calculation

# Compute Recency
import datetime as dt

# Subtract each invoice date from the reference date 2011-12-10, then add 1
# Later, the minimum Period per CustomerID becomes that customer's Recency
# e.g. customer 1 bought 100, 20 and 5 days ago -> Recency = 5

retail2['Period'] = (dt.datetime(2011,12,10) - retail2['InvoiceDate']).apply(lambda x: x.days+1)

# Amount : Quantity * UnitPrice
retail2['Amount'] = retail2['Quantity'] * retail2['UnitPrice']

# Convert Amount to integer (truncates the fractional part)
retail2['Amount'] = retail2['Amount'].astype('int')
  • To compute Recency (how recently a customer last purchased), compute Period relative to the reference date (2011/12/10)
  • To compute Monetary (how much a customer spent), compute the amount spent per invoice line and convert it to an integer
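The Period-to-Recency step can be traced by hand for one customer. The sketch below uses only the standard library and mirrors the "100, 20 and 5 days ago" example from the code comment; the three invoice timestamps are hypothetical.

```python
import datetime as dt

reference_date = dt.datetime(2011, 12, 10)  # reference date used in this post

# Hypothetical invoice timestamps for a single customer.
invoice_dates = [
    dt.datetime(2011, 9, 1, 10, 30),
    dt.datetime(2011, 11, 20, 9, 0),
    dt.datetime(2011, 12, 5, 15, 45),
]

# Period: whole days between each purchase and the reference date, plus one.
periods = [(reference_date - d).days + 1 for d in invoice_dates]

# Recency: the smallest Period, i.e. the most recent purchase.
recency = min(periods)
print(periods, recency)  # [100, 20, 5] 5
```

The `+ 1` compensates for `.days` dropping the partial day, so a purchase made during 9 December counts as 1 day ago rather than 0.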

RFM ๋ฐ์ดํ„ฐ ํ”„๋ ˆ์ž„ 

# customerId ๊ธฐ์ค€ RFM df ์ƒ์„ฑ
rfm_retail = retail2.groupby('CustomerID').agg({
                                                'Period': 'min',
                                                'InvoiceNo':'count',
                                                'Amount':'sum'
                                                })
# Rename the columns to Recency, Frequency, Monetary
rfm_retail.columns = ['Recency','Frequency','Monetary']

 

Plotting each RFM feature

[Figure: histograms of Recency, Frequency and Monetary; the distributions differ, but all three features are right-skewed]

plt.figure(figsize = (18,6))
plt.subplot(1,3,1)
sns.histplot(rfm_retail['Recency'])
plt.title('Recency')

plt.subplot(1,3,2)
sns.histplot(rfm_retail['Frequency'])
plt.title('Frequency')

plt.subplot(1,3,3)
sns.histplot(rfm_retail['Monetary'])
plt.title('Monetary')
  • ๋ถ„ํฌ๊ฐ€ ๋‹ค๋ฅด๋‚˜ ์„ธ feature ๋ชจ๋‘ right skewed๋˜์–ด์žˆ๋Š” ๊ฒƒ์„ ํ™•์ธํ•  ์ˆ˜ ์žˆ๋‹ค.
  • ๋”ฐ๋ผ์„œ ์•„๋ž˜์™€ ๊ฐ™์ด ๋ฐ์ดํ„ฐ๋ฅผ ์ •๊ทœํ™” ํ–ˆ๋‹ค.
# ๋ฐ์ดํ„ฐ์ •๊ทœํ™”
from sklearn.preprocessing import StandardScaler
sc = StandardScaler()
X_features = sc.fit_transform(rfm_retail[['Recency','Frequency','Monetary']])
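
What the scaling step does to one column can be sketched without scikit-learn. The helper name `standardize` and the sample values are hypothetical; the population standard deviation is used because that is what `StandardScaler` divides by.

```python
import statistics


def standardize(values):
    """Z-score each value: (x - mean) / population standard deviation."""
    mean = statistics.fmean(values)
    std = statistics.pstdev(values)  # population std (ddof=0)
    return [(x - mean) / std for x in values]


# Hypothetical right-skewed Monetary values with one extreme customer.
monetary = [10, 20, 30, 40, 900]
scaled = standardize(monetary)
print([round(z, 2) for z in scaled])
```

Standardization puts the three features on a comparable scale, but it only shifts and rescales: the extreme value stays far from the rest, so the skew itself remains.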

 

Evaluating with the silhouette coefficient

from sklearn.cluster import KMeans
from sklearn.metrics import silhouette_score

kmeans = KMeans(n_clusters = 3, random_state = 42)
labels = kmeans.fit_predict(X_features)
rfm_retail['label'] = labels

silhouette_score(X_features, labels)
## 0.592575402996014

→ To judge whether this silhouette coefficient is a good value, let's pass several cluster counts as a list and visualize the silhouette values for each count as an area plot.

### Function that takes several cluster counts as a list and draws the silhouette values for each count as an area plot
def visualize_silhouette(cluster_lists, X_features): 
    
    from sklearn.cluster import KMeans
    from sklearn.metrics import silhouette_samples, silhouette_score

    import matplotlib.pyplot as plt
    import matplotlib.cm as cm
    import numpy as np
    
    # Take the cluster counts as a list, run clustering for each count, and compute the silhouette coefficient
    n_cols = len(cluster_lists)
    
    # Use plt.subplots() to create axs with one subplot per cluster count in the list
    fig, axs = plt.subplots(figsize=(4*n_cols, 4), nrows=1, ncols=n_cols)
    
    # Iterate over the cluster counts in the list and visualize the silhouette coefficient for each
    for ind, n_cluster in enumerate(cluster_lists):
        
        # Run KMeans, then compute the overall silhouette score and the per-sample silhouette values
        clusterer = KMeans(n_clusters = n_cluster, max_iter=500, random_state=0)
        cluster_labels = clusterer.fit_predict(X_features)
        
        sil_avg = silhouette_score(X_features, cluster_labels)
        sil_values = silhouette_samples(X_features, cluster_labels)
        
        y_lower = 10
        axs[ind].set_title('Number of Cluster : '+ str(n_cluster)+'\n' \
                          'Silhouette Score :' + str(round(sil_avg,3)) )
        axs[ind].set_xlabel("The silhouette coefficient values")
        axs[ind].set_ylabel("Cluster label")
        axs[ind].set_xlim([-0.1, 1])
        axs[ind].set_ylim([0, len(X_features) + (n_cluster + 1) * 10])
        axs[ind].set_yticks([])  # Clear the yaxis labels / ticks
        axs[ind].set_xticks([0, 0.2, 0.4, 0.6, 0.8, 1])
        
        # Draw one filled band per cluster with fill_betweenx()
        for i in range(n_cluster):
            ith_cluster_sil_values = sil_values[cluster_labels==i]
            ith_cluster_sil_values.sort()
            
            size_cluster_i = ith_cluster_sil_values.shape[0]
            y_upper = y_lower + size_cluster_i
            
            color = cm.nipy_spectral(float(i) / n_cluster)
            axs[ind].fill_betweenx(np.arange(y_lower, y_upper), 0, ith_cluster_sil_values, \
                                facecolor=color, edgecolor=color, alpha=0.7)
            axs[ind].text(-0.05, y_lower + 0.5 * size_cluster_i, str(i))
            y_lower = y_upper + 10
            
        axs[ind].axvline(x=sil_avg, color="red", linestyle="--")

from kmeans_visaul import visualize_silhouette
visualize_silhouette([2,3,4,5,6], X_features)
  • cluster์˜ ์ˆ˜๊ฐ€ 2์ผ๋•Œ๋ณด๋‹ค 5์ผ๋•Œ ์‹ค๋ฃจ์—ฃ ๊ณ„์ˆ˜๊ฐ€ ํฌ๋‹ค.
  • ๊ทธ๋Ÿฌ๋‚˜ ์•„๋ž˜ ๋ฉด์  ๊ทธ๋ž˜ํ”„๋ฅผ ๋ณด๋ฉด cluster๊ฐ€ 5์ผ๋•Œ best๊ฐ€ ์•„๋‹ˆ๋ผ๋Š” ๊ฒƒ์„ ์•Œ ์ˆ˜ ์žˆ๋‹ค.
    • cluster = 5์ธ ๊ฒฝ์šฐ 5๊ฐœ ๊ตฐ์ง‘์˜ ๋ฉด์ ์ด ๊ณ ๋ฅด๊ฒŒ ๋ถ„ํฌ๋˜์ง€ ์•Š์•˜๋‹ค.
    • ์ƒ์œ„ 3๊ฐœ์˜ ๊ตฐ์ง‘์˜ ๊ฒฝ์šฐ ์•„์ฃผ ์ ์€ ๋ฉด์ ์„ ์ฐจ์ง€ํ•˜๊ณ  ์žˆ์œผ๋ฉฐ, ์ด๋Š” ์ผ๋ถ€ ์ด์ƒ์น˜์— ์˜ํ•ด ๊ฒฐ๊ณผ๊ฐ€ ์™œ๊ณก๋˜๊ณ  ์žˆ์Œ์„ ๋‚˜ํƒ€๋‚ด๋Š” ๊ฒƒ์ด๋‹ค.
    • ๋”ฐ๋ผ์„œ ์•„๋ž˜์™€ ๊ฐ™์ด log ์Šค์ผ€์ผ์„ ์ด์šฉํ•ด ์ถ”๊ฐ€ ์ „์ฒ˜๋ฆฌ ๊ณผ์ •์„ ์ง„ํ–‰ํ–ˆ๋‹ค.

# Additional preprocessing with a log scale
import numpy as np

# rfm_retail is the RFM DataFrame built above
rfm_retail['Recency_log'] = np.log1p(rfm_retail['Recency'])
rfm_retail['Frequency_log'] = np.log1p(rfm_retail['Frequency'])
rfm_retail['Monetary_log'] = np.log1p(rfm_retail['Monetary'])

X_features2 = rfm_retail[['Recency_log','Frequency_log','Monetary_log']]
sc2 = StandardScaler()
X_features2_sc = sc2.fit_transform(X_features2)

visualize_silhouette([2,3,4,5,6], X_features2_sc)
  • Compared with standardization alone, the area is split much more evenly across the clusters.
  • The silhouette coefficient went down, but the clusters are divided more evenly.
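
Why the log transform evens out the clusters can be seen on a handful of numbers. The sketch below applies `math.log1p` to hypothetical Monetary values and compares the spread before and after.

```python
import math

# Hypothetical right-skewed Monetary values with one extreme customer.
monetary = [10, 20, 30, 40, 900]

# log1p(x) = log(1 + x): defined at x = 0 and compresses large values.
monetary_log = [math.log1p(x) for x in monetary]

# Ratio of the largest to the smallest value, before and after the transform.
raw_ratio = max(monetary) / min(monetary)
log_ratio = max(monetary_log) / min(monetary_log)
print(round(raw_ratio, 1), round(log_ratio, 2))
```

After the transform the extreme customer is only a few times larger than the smallest one instead of ninety times, so it no longer dominates the Euclidean distances that K-Means relies on.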