
Understanding Machine Learning and Using Libraries (3) Logistic Regression

ny:D 2024. 6. 4. 21:29

240604 Today I Learn

Logistic Regression Theory

💡 Logistic regression
A statistical technique that uses a linear combination of the independent variables to predict how likely an event is to occur. Once the weights are known, it computes the probability P that the event occurs for a given X. For classification, 0.5 is used as the threshold: if the probability is above 0.5 the event is judged to occur (Y = 1), otherwise it is judged not to occur (Y = 0).

  • The logit (log-odds) can take any real value, and whatever value it takes, converting it back always gives a probability between 0 and 1 for the event.
  • The logit is interpreted as follows.
    → When X increases by 1, the odds are multiplied by e^w1 (that is, the odds ratio is e to the power w1).
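
The two points above can be checked with a small sketch using only the standard library. The weights `w0` and `w1` are hypothetical values picked for illustration (they are not fitted from the Titanic data), and the helper names `sigmoid`, `odds`, and `predict_probability` are my own.

```python
import math

def sigmoid(z):
    """Map a logit (log-odds) z to a probability in (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))

def odds(p):
    """Odds of an event that has probability p."""
    return p / (1.0 - p)

# Hypothetical weights, for illustration only
w0, w1 = -1.0, 0.5

def predict_probability(x):
    # logit = w0 + w1 * x, then converted to a probability
    return sigmoid(w0 + w1 * x)

p_at_2 = predict_probability(2.0)   # logit = 0.0
p_at_3 = predict_probability(3.0)   # logit = 0.5

# Increasing x by 1 multiplies the odds by e**w1
odds_ratio = odds(p_at_3) / odds(p_at_2)
print(p_at_2, p_at_3)
print(odds_ratio, math.exp(w1))

# Classification with the 0.5 threshold
label_at_3 = 1 if p_at_3 > 0.5 else 0
print(label_at_3)
```

The two numbers on the second printed line should agree up to floating-point error, which is the "odds are multiplied by e^w1" reading of the coefficient.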

Classification Evaluation Metrics

confusion matrix

💡 Accuracy
The share of all cases in which the prediction matches the actual value (true predicted as true, false predicted as false → TP and TN), i.e. (TP + TN) / (TP + TN + FP + FN)
  • Does not work properly when the Y values are imbalanced
  • This is compensated for by balancing the ratio of the Y classes or by using the F1 score as the evaluation metric
  • Use `from sklearn.metrics import accuracy_score`
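
A minimal sketch of why accuracy misleads on imbalanced Y. The labels are made up for illustration: 95 negatives, 5 positives, and a "model" that always predicts the majority class. `zero_division=0` is passed so that the undefined precision is reported as 0 without a warning.

```python
from sklearn.metrics import accuracy_score, f1_score

# Made-up labels: 95 negatives and 5 positives
y_true = [0] * 95 + [1] * 5
# A "model" that always predicts the majority class
y_pred = [0] * 100

acc = accuracy_score(y_true, y_pred)
f1 = f1_score(y_true, y_pred, zero_division=0)

print(f'accuracy : {acc:.4f}')
print(f'f1-score : {f1:.4f}')
```

Accuracy looks high because 95 of the 100 labels are negative, while the F1 score is 0 because not a single positive case was found.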
💡 F1-score
The harmonic mean of precision and recall
The highest possible value is 1.0, which means perfect precision and recall; the lowest possible value is 0, which occurs when either precision or recall is 0.
  • Treats precision and recall symmetrically within a single metric
  • Computed with `from sklearn.metrics import f1_score`
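
The definitions above can be worked through by hand. This is a sketch with hypothetical confusion-matrix counts (`tp`, `fp`, `fn`, `tn` are invented numbers, not results from the Titanic models).

```python
# Hypothetical confusion-matrix counts, for illustration only
tp, fp, fn, tn = 30, 10, 20, 40

accuracy = (tp + tn) / (tp + fp + fn + tn)
precision = tp / (tp + fp)   # of the predicted positives, how many were right
recall = tp / (tp + fn)      # of the actual positives, how many were found

# F1 is the harmonic mean of precision and recall
f1 = 2 * precision * recall / (precision + recall)

print(f'accuracy  : {accuracy:.4f}')
print(f'precision : {precision:.4f}')
print(f'recall    : {recall:.4f}')
print(f'f1-score  : {f1:.4f}')
```

Because it is a harmonic mean, F1 is pulled toward the smaller of precision and recall, so a model cannot score well by being good at only one of them.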

Logistic Regression Practice - Titanic Survival Problem

🧮 Functions used

from sklearn.metrics import accuracy_score, f1_score

# Print the attributes of a fitted model
def get_att(x):
    # x: a fitted model
    print(f'classes : {x.classes_}')
    print(f'number of independent variables : {x.n_features_in_}')
    print(f'names of the independent variables (X) : {x.feature_names_in_}')
    print(f'coef : {x.coef_}')
    print(f'bias : {x.intercept_}')

# Model evaluation from existing predictions
def print_metrics(true, pred):
    print(f'accuracy : {accuracy_score(true, pred):.4f}')
    print(f'f1-score : {f1_score(true, pred):.4f}')


# A function I made myself..? Fits the model, predicts on the same X, then prints the scores
def get_metrics(model, X, y_true):
    model.fit(X, y_true)
    pred = model.predict(X)
    print_metrics(y_true, pred)

Model 1: Fare

# Import the class and create the model
from sklearn.linear_model import LogisticRegression
model_lor = LogisticRegression()

# Select the variables (assumes `titanic` is an already loaded pandas DataFrame)
# X variable: Fare, Y variable: Survived
X1 = titanic[['Fare']]
y_true = titanic['Survived']

# Fit the model
model_lor.fit(X1, y_true)

Model 2: Pclass, Sex, Fare

# Create the model
model_lor_2 = LogisticRegression()

# Y (Survived): survival (0 = died, 1 = survived)
# X (numeric): Fare
# X (categorical): Pclass (ticket class), Sex

# Sex is a categorical variable stored as text, so it has to be encoded as a dummy variable
# string -> dummy variable
def get_sex(x):
    if x == 'female':
        return 0
    else:
        return 1
titanic['Sex_enc'] = titanic['Sex'].apply(get_sex)

# Select the variables
X2 = titanic[['Pclass', 'Sex_enc', 'Fare']]
y_true2 = titanic['Survived']

# Fit the model
model_lor_2.fit(X2, y_true2)
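
As an alternative sketch for the same female → 0, male → 1 encoding, pandas `Series.map` with a dictionary does it in one line. The tiny `df` below is a made-up stand-in for the Titanic data. One difference to be aware of: with `map`, any value missing from the dictionary becomes NaN, whereas `get_sex` sends everything that is not 'female' to 1.

```python
import pandas as pd

# Tiny made-up frame standing in for the Titanic data
df = pd.DataFrame({'Sex': ['male', 'female', 'female', 'male']})

# Dictionary-based encoding: female -> 0, male -> 1
df['Sex_enc'] = df['Sex'].map({'female': 0, 'male': 1})

print(df)
```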

Model 1 vs. Model 2

  • Model 2 has noticeably higher accuracy and F1-score than Model 1 (both scored on the data they were fit on) → Model 2 is the better of the two models.
# X variable: Fare
get_att(model_lor)
get_metrics(model_lor, X1, y_true)

# X variables: Fare, Pclass, Sex
get_att(model_lor_2)
get_metrics(model_lor_2, X2, y_true2)

predict vs. predict_proba

predict / predict_proba

  • predict returns the predicted class from the fitted model, while predict_proba returns the values of P(Y=0) and P(Y=1) from the fitted model. In other words, predict_proba outputs two values per row, and the Y value with the higher probability is the one that predict outputs.
# Get the predicted class from the fitted model
model_lor_2.predict(X2)

# Get the probabilities for each row: column 0 is P(Y=0), column 1 is P(Y=1) (probability of survival)
model_lor_2.predict_proba(X2)
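
The relationship between the two methods can be checked directly. This is a sketch on a small synthetic data set (random numbers, not the Titanic data); the variable names are my own.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

# Small synthetic data set (not the Titanic data)
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 2))
noise = rng.normal(scale=0.5, size=200)
y = (X[:, 0] + 0.5 * X[:, 1] + noise > 0).astype(int)

model = LogisticRegression().fit(X, y)

proba = model.predict_proba(X)   # shape (200, 2), columns ordered as model.classes_
pred = model.predict(X)

# Each row of predict_proba sums to 1
row_sums = proba.sum(axis=1)

# predict returns the class with the larger probability
pred_from_proba = model.classes_[np.argmax(proba, axis=1)]

# For a binary problem this is the same as thresholding P(Y=1) at 0.5
pred_from_threshold = (proba[:, 1] > 0.5).astype(int)

print(np.array_equal(pred, pred_from_proba))
print(np.array_equal(pred, pred_from_threshold))
```

To keep only the survival probability, as in the Titanic example, take column 1 of the `predict_proba` output.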

 

Linear Regression vs. Logistic Regression

Similarities (both models)
1. The model is easy to build
2. Easy to interpret through the weights (or regression coefficients)
3. Both categorical and numeric variables can be used as X variables

| | Linear regression (prediction) | Logistic regression (classification) |
|---|---|---|
| Y (dependent variable) | Numeric | Categorical |
| Evaluation metrics | Mean Squared Error, R-squared | Accuracy, F1-score |
| sklearn model class | `sklearn.linear_model.LinearRegression` | `sklearn.linear_model.LogisticRegression` |
| sklearn metric functions | `sklearn.metrics.mean_squared_error`, `sklearn.metrics.r2_score` | `sklearn.metrics.accuracy_score`, `sklearn.metrics.f1_score` |
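
The comparison can be put side by side in code. This is a sketch on synthetic data (random numbers generated here, not the Titanic data), with scores computed on the same data the models were fit on; all variable names are my own.

```python
import numpy as np
from sklearn.linear_model import LinearRegression, LogisticRegression
from sklearn.metrics import (accuracy_score, f1_score,
                             mean_squared_error, r2_score)

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 1))

# Numeric Y -> linear regression, scored with MSE and R-squared
y_num = 2.0 * X[:, 0] + 1.0 + rng.normal(scale=0.1, size=200)
lin_model = LinearRegression().fit(X, y_num)
mse = mean_squared_error(y_num, lin_model.predict(X))
r2 = r2_score(y_num, lin_model.predict(X))

# Categorical Y -> logistic regression, scored with accuracy and F1
y_cat = (X[:, 0] > 0).astype(int)
log_model = LogisticRegression().fit(X, y_cat)
acc = accuracy_score(y_cat, log_model.predict(X))
f1 = f1_score(y_cat, log_model.predict(X))

print(f'linear   : mse={mse:.4f}, r2={r2:.4f}')
print(f'logistic : accuracy={acc:.4f}, f1-score={f1:.4f}')
```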