๐Ÿ“’ Today I Learn/๐Ÿ Python

๋จธ์‹ ๋Ÿฌ๋‹ ํŠน๊ฐ• #1 ๋ถ„๋ฅ˜(Clasification)

ny:D 2024. 6. 12. 17:27

240612 Today I Learn

๋จธ์‹ ๋Ÿฌ๋‹

๋จธ์‹ ๋Ÿฌ๋‹ ์•Œ๊ณ ๋ฆฌ์ฆ˜์˜ ์ข…๋ฅ˜

  1. ์ง€๋„ ํ•™์Šต (Supervised Learning)
  2. ๋น„์ง€๋„ ํ•™์Šต (Unservised Learning)
  3. ๊ฐ•ํ™” ํ•™์Šต (Reinforcement Learning)

๋จธ์‹ ๋Ÿฌ๋‹ ๋ชจ๋ธ  ์„ฑ๋Šฅ ํ‰๊ฐ€ ์ง€ํ‘œ

  • ์ •ํ™•๋„ Accuracy = (True Positive +True Negative)/Total
Accuracy ๊ฐ€ ๋งŒ๋Šฅ์ผ ์ˆ˜ ์—†๋Š” ์ด์œ 
์–ด๋–ค ํšŒ์‚ฌ์—์„œ 100๋ช…์ค‘ 2๋ช…์„ ์•”ํ™˜์ž๋กœ ์˜ˆ์ธกํ•˜๋Š” ๋ชจ๋ธ์„ ๋งŒ๋“ค๊ณ  ์‹ถ์„ ๋•Œ, accuracy๋ฅผ ๊ฐ€์žฅ ๋†’๊ฒŒ ํ•  ์ˆ˜ ์žˆ๋Š” ๋ฐฉ๋ฒ•์€ ๋ฌด์—‡์ผ๊นŒ?
๋ฐ”๋กœ 100๋ช…์˜ ํ™˜์ž๋ฅผ ๋ชจ๋‘ ์•”ํ™˜์ž๋ผ๊ณ  ๋ถ„๋ฅ˜ํ•˜๋Š” ๋ชจ๋ธ์„ ๋งŒ๋“œ๋Š” ๊ฒƒ์ด๋‹ค. 100๋ช…์˜ ํ™˜์ž๋ฅผ ๋ชจ๋‘ ์•”ํ™˜์ž๋กœ ๋ถ„๋ฅ˜ํ•˜๋Š” ๋ชจ๋ธ์˜ ์ •ํ™•๋„๋Š” ๋ฌด๋ ค 98%๊ฐ€ ๋œ๋‹ค. ๊ทธ๋ ‡๋‹ค๋ฉด ์ด ํšŒ์‚ฌ๋Š” '์ €ํฌ ๋ชจ๋ธ์€ 98%์˜ ์ •ํ™•๋„๋กœ ์•”ํ™˜์ž๋ฅผ ์˜ˆ์ธกํ•ฉ๋‹ˆ๋‹ค.'๋ผ๊ณ  ์ด์•ผ๊ธฐ ํ•  ์ˆ˜ ์žˆ์„๊นŒ?
98%๋ผ๋Š” ์ˆ˜์น˜๋งŒ ๋ณด๋ฉด ์ด ํšŒ์‚ฌ๊ฐ€ ์—„์ฒญ๋‚œ ๋ชจ๋ธ์„ ๊ฐœ๋ฐœํ•œ ๊ฒƒ ๊ฐ™์ง€๋งŒ, ๋ชจ๋“  ํ™˜์ž๋ฅผ ์•”ํ™˜์ž๋ผ๊ณ  ํŒ์ •ํ•˜๋Š” ๋ชจ๋ธ์ด ๊ณผ์—ฐ ์•”ํ™˜์ž์ธ์ง€ ์•„๋‹Œ์ง€ ๊ตฌ๋ถ„ํ•  ์ˆ˜ ์žˆ๋Š” ๋ชจ๋ธ์ธ๊ฐ€? ๊ทธ๋ ‡์ง€ ์•Š๋‹ค. 
  • ๋ฏผ๊ฐ๋„ Precision
    • (positive) = (True Positive) / (Predicted Positive) = TP/(TP+FP)
    • (negative) = (True Negative) / (Predicted Negative) = TN/(TN+FN)
  • ์žฌํ˜„์œจ Positive Recall (Sensitivity) = True Positive Rate = Sensitivity = TP/(TP+FN)
  • ํŠน์ด๋„ Negative Recall (Specificity) = True Negative Rate = Specificity = TN/(TN+FP)
  • F ์Šค์ฝ”์–ด F-measure (positive) = 2 x Precision x Recall = 2 x TP Precision + Recall 2 x TP + FP + FN
๐Ÿ’ก ๊ต์ฐจ ๊ฒ€์ฆ(Cross Validation)
๋ฐ์ดํ„ฐ๋ฅผ ํ•™์Šต ์„ธํŠธ์™€ ํ…Œ์ŠคํŠธ ์„ธํŠธ๋กœ ๋‚˜๋ˆ„๊ณ , ํ•™์Šต ์„ธํŠธ์—์„œ ๋ชจ๋ธ์„ ํ•™์Šต์‹œํ‚จ ๋‹ค์Œ, ํ…Œ์ŠคํŠธ ์„ธํŠธ์—์„œ ๋ชจ๋ธ์„ ํ‰๊ฐ€ํ•˜๋Š” ๊ณผ์ •
  • ๋จธ์‹ ๋Ÿฌ๋‹ ๋ชจ๋ธ ์„ ํƒํ•˜๊ธฐ
    • ์„ฑ๋Šฅ์ด ์ข‹์€ ๋ชจ๋ธ์ด๋ฉด์„œ
    • ๋ชจ๋ธ์˜ ํ•ด์„ ๊ฐ€๋Šฅ์„ฑ, ๋ณต์žก์„ฑ ๋ฐ ๊ณ„์‚ฐ ๋น„์šฉ๊ณผ ๊ฐ™์€ ๋‹ค๋ฅธ ์š”์†Œ๋„ ๊ณ ๋ คํ•ด์•ผ ํ•œ๋‹ค.

๋จธ์‹ ๋Ÿฌ๋‹ ์•Œ๊ณ ๋ฆฌ์ฆ˜

White box vs. Gray box vs. Black box ๋ชจ๋ธ
  • White box   : ํ•ด์„ ๊ฐ€๋Šฅํ•œ ํ”ผ์ณ๋“ค์„ ์ด์šฉํ•ด ๋” ๋งŽ์€ ์ธ์‚ฌ์ดํŠธ๋ฅผ ์–ป์–ด ๋ชจ๋ธ์—์„œ ์–ด๋–ค ์ผ๋“ค์ด ๋ฒŒ์–ด์ง€๊ณ  ์žˆ๋Š”์ง€ ์‰ฝ๊ฒŒ ์ดํ•ดํ•  ์ˆ˜ ์žˆ์Œ.
  • Gray box (white + black) :  ๋จธ์‹ ๋Ÿฌ๋‹ ์•Œ๊ณ ๋ฆฌ์ฆ˜์˜ ๋‚ด๋ถ€ ์ž‘๋™์„ ๋ถ€๋ถ„์ ์œผ๋กœ ๊ด€์ฐฐ๊ฐ€๋Šฅ
  • Black box: ๋ชจ๋ธ์ด ๋‚ด๋ถ€์ ์œผ๋กœ ์–ด๋–ป๊ฒŒ ์ž‘๋™ํ•˜๋Š”์ง€ ๊ด€์ฐฐํ•˜๊ฑฐ๋‚˜ ์ดํ•ดํ•˜๊ธฐ ์–ด๋ ค์šด ๋ฌธ์ œ

 

KNN (K-Nearest Neighbors)

💡 KNN
Predicts the class of a data point from the classes of its K nearest neighbors. It finds the K data points most similar to the new point, using a distance measure such as Euclidean or Manhattan distance.

  • ์ž‘๋™ ๋‹จ๊ณ„
    1. ๋ฐ์ดํ„ฐ ์ค€๋น„
    2. ๊ฑฐ๋ฆฌ ๊ณ„์‚ฐ
      • ์œ ํด๋ฆฌ๋“œ ๊ฑฐ๋ฆฌ - ๋‘ ์  ์‚ฌ์ด์˜ ์ง์„  ๊ฑฐ๋ฆฌ๋ฅผ ์ธก์ •ํ•˜๋Š” ๋ฐฉ๋ฒ•์œผ๋กœ, ํ‰๋ฉด์ด๋‚˜ ๊ณต๊ฐ„์—์„œ ๋‘ ์  ์‚ฌ์ด์˜ ๊ฐ€์žฅ ์งง์€ ๊ฒฝ๋กœ๋ฅผ ๋‚˜ํƒ€๋‚ธ๋‹ค.
      • ๋งจํ•˜ํƒ„ ๊ฑฐ๋ฆฌ - ์ขŒํ‘œ์— ํ‘œ์‹œ๋œ ๋‘ ์  ์‚ฌ์ด์˜ ๊ฑฐ๋ฆฌ(์ ˆ๋Œ“๊ฐ’)์˜ ์ฐจ์ด์— ๋”ฐ๋ฅธ ์ƒˆ๋กœ์šด ๊ฑฐ๋ฆฌ
    3. ๊ณ„์‚ฐ๋œ ๊ฑฐ๋ฆฌ ์ค‘ ๊ฐ€์žฅ ๊ฐ€๊นŒ์šด K๊ฐœ์˜ ์ด์›ƒ์„ ์„ ํƒ
    4. ์„ ํƒ๋œ K๊ฐœ์˜ ์ด์›ƒ ์ค‘ ๊ฐ€์žฅ ๋งŽ์€ ๋ฒ”์ฃผ๋ฅผ ์ƒˆ๋กœ์šด ๋ฐ์ดํ„ฐ์˜ ๋ฒ”์ฃผ๋กœ ํ• ๋‹น
  • ์‚ฌ์šฉ ๋ผ์ด๋ธŒ๋Ÿฌ๋ฆฌ - sklearn.neighbors.KNeighborsClassifier

๋‚˜์ด๋ธŒ ๋ฒ ์ด์ฆˆ (Naive Bayes)

๐Ÿ’ก ๋‚˜์ด๋ธŒ ๋ฒ ์ด์ฆˆ (Naive Bayes)
ํ™•๋ฅ  ๊ธฐ๋ฐ˜ ๋ถ„๋ฅ˜ ๊ธฐ๋ฒ•์œผ๋กœ, ๋…๋ฆฝ ๋ณ€์ˆ˜ ๊ฐ„์˜ ๋…๋ฆฝ์„ฑ์„ ๊ฐ€์ •ํ•˜์—ฌ ํ™•๋ฅ ์„ ๊ณ„์‚ฐํ•˜๋Š” ๋ถ„๋ฅ˜ ๊ธฐ๋ฒ•. ๋ฒ ์ด์ฆˆ ์ •๋ฆฌ(Bayes' Theorem)*๋ฅผ ๊ธฐ๋ฐ˜์œผ๋กœ ํ•œ๋‹ค.
  • ๋ฒ ์ด์ฆˆ ์ •๋ฆฌ (Bayes' Theorem): ์กฐ๊ฑด๋ถ€ ํ™•๋ฅ ์„ ๊ณ„์‚ฐํ•˜๋Š” ์ •๋ฆฌ.
  • ์‚ฌ์ „ ํ™•๋ฅ  (Prior Probability): ํŠน์ • ํด๋ž˜์Šค๊ฐ€ ๋‚˜ํƒ€๋‚  ์‚ฌ์ „ ํ™•๋ฅ .

  • ๋‚˜์ด๋ธŒ ๋ฒ ์ด์ฆˆ์˜ ๊ธฐ๋ณธ ๊ฐ€์ • - ํŠน์ง•๋“ค์ด ์„œ๋กœ ๋…๋ฆฝ์ด๋‹ค.
    • ์‹ค์ œ๋กœ๋Š” ๋Œ€๋ถ€๋ถ„์˜ ๊ฒฝ์šฐ์—์„œ ์™„์ „ํžˆ ๋…๋ฆฝ์ ์ธ ํŠน์ง•์„ ๊ฐ€์ง€๋Š” ๋ฐ์ดํ„ฐ๋Š” ๊ฑฐ์˜ ์—†๋‹ค.
    • ๊ทธ๋Ÿผ์—๋„ ๋ถˆ๊ตฌํ•˜๊ณ  ๋‚˜์ด๋ธŒ ๋ฒ ์ด์ฆˆ๋Š” ๊ฝค ์ข‹์€ ์„ฑ๋Šฅ์„ ๋ณด์ธ๋‹ค.
  • ์‚ฌ์šฉ ๋ผ์ด๋ธŒ๋Ÿฌ๋ฆฌ - sklearn.naive_bayes.MultinomialNB

๋กœ์ง€์Šคํ‹ฑ ํšŒ๊ท€ (Logistic Regression)

๐Ÿ’ก ๋กœ์ง€์Šคํ‹ฑ ํšŒ๊ท€ (Logistic Regression)
์ด์ง„ ๋ถ„๋ฅ˜ ๋ฌธ์ œ(Binary Classfication)๋ฅผ ํ•ด๊ฒฐ
ํ•˜๋Š” ๋ฐ ์ฃผ๋กœ ์‚ฌ์šฉ๋˜๋Š” ๋ถ„์„ ๊ธฐ๋ฒ•. ๋ฐ์ดํ„ฐ๊ฐ€ ์–ด๋–ค ๋ฒ”์ฃผ์— ์†ํ•  ํ™•๋ฅ ์„ 0์—์„œ 1 ์‚ฌ์ด์˜ ๊ฐ’์œผ๋กœ ์˜ˆ์ธกํ•˜๊ณ  ๊ทธ ํ™•๋ฅ ์— ๋”ฐ๋ผ ๊ฐ€๋Šฅ์„ฑ์ด ๋” ๋†’์€ ๋ฒ”์ฃผ์— ์†ํ•˜๋Š” ๊ฒƒ์œผ๋กœ ๋ถ„๋ฅ˜ํ•œ๋‹ค.์ฆ‰ ์ฃผ์–ด์ง„ ๋ฐ์ดํ„ฐ๊ฐ€ ํŠน์ • ๋ฒ”์ฃผ์— ์†ํ•  ํ™•๋ฅ ์„ ์˜ˆ์ธกํ•˜์—ฌ 0๊ณผ 1๋กœ ๋‚˜๋ˆŒ์ˆ˜ ์žˆ๋‹ค.
  • ๋กœ์ง“ ํ•จ์ˆ˜ (Logit Function): ๋กœ๊ทธ ํ™•๋ฅ ์„ ๊ณ„์‚ฐํ•˜๋Š” ํ•จ์ˆ˜.
  • ์˜ค์ฆˆ (Odds): ํŠน์ • ์‚ฌ๊ฑด์ด ๋ฐœ์ƒํ•  ํ™•๋ฅ ๊ณผ ๋ฐœ์ƒํ•˜์ง€ ์•Š์„ ํ™•๋ฅ ์˜ ๋น„์œจ.
  • ์‹œ๊ทธ๋ชจ์ด๋“œ ํ•จ์ˆ˜ (Sigmoid Function): ํ™•๋ฅ ์„ 0๊ณผ 1 ์‚ฌ์ด๋กœ ๋ณ€ํ™˜ํ•˜๋Š” ํ•จ์ˆ˜.
  • ํ•™์Šต ์•Œ๊ณ ๋ฆฌ์ฆ˜ - ๊ฒฝ์‚ฌ ํ•˜๊ฐ•๋ฒ•(Gradient Descent)
  • ์ ์šฉ ์‚ฌ๋ก€: ์ŠคํŒธ ๋ฉ”์ผ ํ•„ํ„ฐ๋ง, ์งˆ๋ณ‘ ์—ฌ๋ถ€ ์ง„๋‹จ, ๊ณ ๊ฐ ์ดํƒˆ ์—ฌ๋ถ€
 

๋จธ์‹ ๋Ÿฌ๋‹์˜ ์ดํ•ด์™€ ๋ผ์ด๋ธŒ๋Ÿฌ๋ฆฌ ํ™œ์šฉ (3) ๋กœ์ง€์Šคํ‹ฑ ํšŒ๊ท€

240604 Today I Learn๋กœ์ง€์Šคํ‹ฑ ํšŒ๊ท€ ์ด๋ก ๐Ÿ’ก ๋กœ์ง€์Šคํ‹ฑ ํšŒ๊ท€๋…๋ฆฝ ๋ณ€์ˆ˜์˜ ์„ ํ˜• ๊ฒฐํ•ฉ์„ ์ด์šฉํ•˜์—ฌ ์‚ฌ๊ฑด์˜ ๋ฐœ์ƒ ๊ฐ€๋Šฅ์„ฑ์„ ์˜ˆ์ธกํ•˜๋Š” ๋ฐ ์‚ฌ์šฉ๋˜๋Š” ํ†ต๊ณ„ ๊ธฐ๋ฒ•์œผ๋กœ ๊ฐ€์ค‘์น˜ ๊ฐ’์„ ์•ˆ๋‹ค๋ฉด X๊ฐ’์ด ์ฃผ์–ด์กŒ์„ ๋•Œ ํ•ด๋‹น

archivenyc.tistory.com

 

์„œํฌํŠธ ๋ฒกํ„ฐ ๋จธ์‹  (Support Vector Machine)

๐Ÿ’ก ์„œํฌํŠธ ๋ฒกํ„ฐ ๋จธ์‹ (SVM)
๋‹ค์ฐจ์› ๊ณต๊ฐ„์—์„œ ๋ฐ์ดํ„ฐ๋ฅผ ๋ถ„๋ฆฌํ•˜๊ธฐ ์œ„ํ•œ ์ดˆํ‰๋ฉด(Hyperplane)์„ ์ฐพ๋Š” ๊ธฐ๋ฒ•. ๋น„์„ ํ˜• ๋ฌธ์ œ๋„ ํ•ด๊ฒฐํ•  ์ˆ˜ ์žˆ์Œ.
๋ฐ์ดํ„ฐ๋ฅผ ๊ณ ์ฐจ์› ๊ณต๊ฐ„์œผ๋กœ ๋งคํ•‘ํ•˜์—ฌ ์ตœ์ ์˜ ๊ฒฐ์ • ๊ฒฝ๊ณ„๋ฅผ ์ฐพ๋Š” ์•Œ๊ณ ๋ฆฌ์ฆ˜์œผ๋กœ, ์ปค๋„ ํŠธ๋ฆญ์„ ์‚ฌ์šฉํ•˜์—ฌ ๋น„์„ ํ˜• ๋ถ„๋ฅ˜ ๋ฌธ์ œ๋ฅผ ํ•ด๊ฒฐํ•œ๋‹ค.
  • ์ดˆํ‰๋ฉด (Hyperplane): ๋ฐ์ดํ„ฐ๋ฅผ ๋‘ ๊ฐœ์˜ ํด๋ž˜์Šค๋กœ ๋‚˜๋ˆ„๋Š” ๊ฒฝ๊ณ„.  (2์ฐจ์›์ผ ๋•Œ ์„ , 3์ฐจ์›์ผ ๋•Œ ํ‰๋ฉด, n์ฐจ์›์ผ ๋•Œ๋Š” ์ดˆํ‰๋ฉด)
  • ์„œํฌํŠธ ๋ฒกํ„ฐ (Support Vectors): ์ดˆํ‰๋ฉด์— ๊ฐ€์žฅ ๊ฐ€๊นŒ์ด ์œ„์น˜ํ•œ ๋ฐ์ดํ„ฐ ํฌ์ธํŠธ.
  • ๋งˆ์ง„ (Margin): ์„œํฌํŠธ ๋ฒกํ„ฐ์™€ ์ดˆํ‰๋ฉด ์‚ฌ์ด์˜ ๊ฑฐ๋ฆฌ.

  • SVM์˜ ์›๋ฆฌ - SVM์€ ์ตœ๋Œ€ ๋งˆ์ง„์„ ์ฐพ๊ธฐ ์œ„ํ•ด ์ตœ์ ํ™” ๋ฌธ์ œ๋ฅผ ํ•ด๊ฒฐ
    • ํ•˜๋“œ ๋งˆ์ง„ SVM: ๋ชจ๋“  ๋ฐ์ดํ„ฐ ํฌ์ธํŠธ๊ฐ€ ๋งˆ์ง„ ์™ธ๋ถ€์— ์œ„์น˜ํ•˜๋„๋ก ๊ฒฐ์ • ์ดˆํ‰๋ฉด์„ ์„ค์ •ํ•ฉ๋‹ˆ๋‹ค. (์žก์Œ์ด ์—†๋Š” ๋ฐ์ดํ„ฐ์— ์ ํ•ฉ)
    • ์†Œํ”„ํŠธ ๋งˆ์ง„ SVM: ์ผ๋ถ€ ๋ฐ์ดํ„ฐ ํฌ์ธํŠธ๊ฐ€ ๋งˆ์ง„ ๋‚ด๋ถ€์— ์œ„์น˜ํ•  ์ˆ˜ ์žˆ๋„๋ก ํ—ˆ์šฉํ•˜๋ฉฐ, ์ด๋ฅผ ์œ„ํ•ด ๋ฒŒ์  ๋ณ€์ˆ˜๋ฅผ ๋„์ž…ํ•ฉ๋‹ˆ๋‹ค. (์žก์Œ์ด ์žˆ๋Š” ๋ฐ์ดํ„ฐ์— ์ ํ•ฉ)
  • ์žฅ๋‹จ์ 
์žฅ์  ๋‹จ์ 
  • ๊ณ ์ฐจ์› ๊ณต๊ฐ„์—์„œ๋„ ํšจ๊ณผ์ ์œผ๋กœ ์ž‘๋™ํ•ฉ๋‹ˆ๋‹ค.
  • ๋‹ค์–‘ํ•œ ์ปค๋„์„ ํ†ตํ•ด ๋น„์„ ํ˜• ๋ถ„๋ฅ˜๋„ ๊ฐ€๋Šฅํ•ฉ๋‹ˆ๋‹ค.
  • ๋Œ€๊ทœ๋ชจ ๋ฐ์ดํ„ฐ์…‹์—์„œ๋Š” ๊ณ„์‚ฐ ์†๋„๊ฐ€ ๋А๋ ค์ง„๋‹ค.
  • ๋ชจ๋ธ ํŠœ๋‹ ๋ฐฉ๋ฒ•์ด ๋‹ค์–‘ํ•œ ๋งŒํผ ์ตœ์ ํ™”๋œ ํ•˜์ดํผ ํŒŒ๋ผ๋ฏธํ„ฐ๋ฅผ ์ž˜ ์„ ํƒํ•ด์•ผ ํ•จ.
  •
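A short soft-margin SVM sketch using scikit-learn's SVC; the two-moons data and the RBF kernel are illustrative choices, with C controlling how soft the margin is.

```python
from sklearn.datasets import make_moons
from sklearn.svm import SVC

# Two interleaved half-circles: not linearly separable, so the RBF kernel
# (kernel trick) is needed for a nonlinear decision boundary.
X, y = make_moons(n_samples=200, noise=0.2, random_state=0)

# C sets margin softness: small C tolerates more points inside the margin
# (softer), large C approaches a hard margin.
clf = SVC(kernel="rbf", C=1.0).fit(X, y)

# The fitted model exposes its support vectors, the points nearest the boundary.
print(len(clf.support_vectors_))
```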