๊ธฐ์ดˆ ํ”„๋กœ์ ํŠธ : ์€ํ–‰ ๊ณ ๊ฐ๋ฐ์ดํ„ฐ๋ฅผ ์ด์šฉํ•œ ์„œ๋น„์Šค ๋ถ„์„ (3)

2024. 5. 27. 00:16ยท ๐Ÿ“Š Data Analysis/๐ŸŽฏ Project
๋ชฉ์ฐจ
  1. ๊ธฐ์ดˆ ํ”„๋กœ์ ํŠธ : ์€ํ–‰ ๊ณ ๊ฐ๋ฐ์ดํ„ฐ๋ฅผ ์ด์šฉํ•œ ์„œ๋น„์Šค ๋ถ„์„
  2. ๐Ÿงน ๊ฒฐ์ธก์น˜ ์ฒ˜๋ฆฌํ•˜๊ธฐ
  3. ๐Ÿ—‘๏ธ ์ด์ƒ์น˜ ์ฒ˜๋ฆฌํ•˜๊ธฐ

๊ธฐ์ดˆ ํ”„๋กœ์ ํŠธ : ์€ํ–‰ ๊ณ ๊ฐ๋ฐ์ดํ„ฐ๋ฅผ ์ด์šฉํ•œ ์„œ๋น„์Šค ๋ถ„์„

 

 

๊ธฐ์ดˆ ํ”„๋กœ์ ํŠธ : ์€ํ–‰ ๊ณ ๊ฐ๋ฐ์ดํ„ฐ๋ฅผ ์ด์šฉํ•œ ์„œ๋น„์Šค ๋ถ„์„ (2)

๊ธฐ์ดˆ ํ”„๋กœ์ ํŠธ : ์€ํ–‰ ๊ณ ๊ฐ๋ฐ์ดํ„ฐ๋ฅผ ์ด์šฉํ•œ ์„œ๋น„์Šค ๋ถ„์„  ๊ธฐ์ดˆ ํ”„๋กœ์ ํŠธ : ์€ํ–‰ ๊ณ ๊ฐ๋ฐ์ดํ„ฐ๋ฅผ ์ด์šฉํ•œ ์„œ๋น„์Šค ๋ถ„์„ (1)๊ธฐ์ดˆ ํ”„๋กœ์ ํŠธ : ์€ํ–‰ ๊ณ ๊ฐ๋ฐ์ดํ„ฐ๋ฅผ ์ด์šฉํ•œ ์„œ๋น„์Šค ๋ถ„์„ํ”„๋กœ์ ํŠธ

archivenyc.tistory.com

 

์ž๊พธ ๋ฐ์ดํ„ฐ ์ „์ฒ˜๋ฆฌ ๋‹จ๊ณ„๋กœ ๋Œ์•„๊ฐ€๋Š” ์ค‘....ใ… ์ด์ง€๋งŒ ์ค‘๊บพ๋งˆ ๋งˆ์ธ๋“œ๋กœ ์ผ๋‹จ ํ•ด๋ณด๋Š” ์ค‘์ด๋‹ค.!

 

๐Ÿงน ๊ฒฐ์ธก์น˜ ์ฒ˜๋ฆฌํ•˜๊ธฐ

๊ฒฐ์ธก๊ฐ’ ์ฐพ๊ธฐ

bank.isna().sum()์„ ํ™œ์šฉํ•ด ์ „์ฒด ์ปฌ๋Ÿผ ์ค‘ ๊ฒฐ์ธก์น˜๊ฐ€ ์žˆ๋Š” ์ปฌ๋Ÿผ์„ ์ฐพ์•˜๋‹ค. ๊ฒฐ์ธก๊ฐ’์ด ๋ฐœ๊ฒฌ๋œ ์ปฌ๋Ÿผ์€ Type of Loan, Num_of_Delayed_Payment, Credit_History_Age, Amount_invested_monthly, Monthly_Balance ์ด๋ ‡๊ฒŒ 5๊ฐœ์˜ ์ปฌ๋Ÿผ์ด์—ˆ๋‹ค. ๊ฐ๊ฐ์˜ ์ปฌ๋Ÿผ์— ๋Œ€ํ•ด ์•Œ๋งž๋Š” ๊ฒฐ์ธก๊ฐ’์„ ๊ณ„์‚ฐํ•ด ์ฑ„์›Œ๋ณด๋„๋ก ํ•˜๊ฒ ๋‹ค.

bank.isna().sum()

## ID                             0
...
## Type_of_Loan                5704
## Delay_from_due_date            0
## Num_of_Delayed_Payment      3498
## Credit_Mix                     0
## Outstanding_Debt               0
## Credit_Utilization_Ratio       0
## Credit_History_Age          4470
## Payment_of_Min_Amount          0
## Total_EMI_per_month            0
## Amount_invested_monthly     2271
## Payment_Behaviour              0
## Monthly_Balance              562
## dtype: int64

Type of Loan : 'none'

๋Œ€์ถœ์„ ๋ฐ›์€ ๊ณ ๊ฐ์˜ ๊ฒฝ์šฐ Type of Loan ์ปฌ๋Ÿผ์— '์–ด๋–ค ๋Œ€์ถœ์„ ๋ฐ›์•˜๋Š”์ง€' ๋Œ€์ถœ์˜ ์ข…๋ฅ˜๊ฐ€ ๋ช…๊ธฐ๋˜์–ด์žˆ๋‹ค. ํ™•์ธ ๊ฒฐ๊ณผ, ๊ฒฐ์ธก๊ฐ’์ด ์กด์žฌํ•˜๋Š” ํ–‰๋“ค์€ ๋Œ€์ถœ์˜ ๊ฐœ์ˆ˜๋ฅผ ์„ธ๋Š” 'num_of_loan'์ปฌ๋Ÿผ์— ๊ฐ’์ด ์—†๋Š”๊ฒƒ์„ ํ™•์ธํ–ˆ๊ณ , ๋”ฐ๋ผ์„œ ๋Œ€์ถœ์„ ๋ฐ›์ง€ ์•Š์€๊ณ ๊ฐ์œผ๋กœ ๊ฐ„์ฃผ, 'none'์œผ๋กœ null๊ฐ’์„ ์ฑ„์› ๋‹ค.

 

bank['Type_of_Loan'] = bank['Type_of_Loan'].fillna('none')

Num_of_Delayed_Payment : ํ‰๊ท 

delayed payment๊ฐ€ ๋ฐœ์ƒํ–ˆ๋Š”์ง€, ๋ฐœ์ƒํ•˜์ง€ ์•Š์•˜๋Š”์ง€ ํ™•์ธํ•  ์ˆ˜ ์—†๊ธฐ ๋•Œ๋ฌธ์—  ์ด์ƒ์น˜๋ฅผ ์ œ์™ธ(0~30๋ฒ”์œ„ ์™ธ ๊ฐ’)ํ•ด ๊ณ„์‚ฐํ•œ ํ‰๊ท (mean_delayed_payment)๋กœ ๊ฒฐ์ธก์น˜๋ฅผ ์ฑ„์›Œ๋„ฃ์—ˆ๋‹ค.

# Num_of_Delayed_Payment
cond = (bank['Num_of_Delayed_Payment']>=0) & (bank['Num_of_Delayed_Payment']<=30)
mean_delayed_payment = bank.loc[cond,'Num_of_Delayed_Payment'].mean()
mean_delayed_payment #13.417525098210389

# ๊ฒฐ์ธก์น˜ ์ฑ„์šฐ๊ธฐ
bank['Num_of_Delayed_Payment'] = bank['Num_of_Delayed_Payment'].fillna(mean_delayed_payment)

Credit_History_Age : 0

Credit History Age์— ๋Œ€ํ•œ ์ •๋ณด๊ฐ€ ์—†๋Š” ๊ฒฝ์šฐ, ์€ํ–‰์— ๋Œ€ํ•œ ์‹ ์šฉ ๊ธฐ๋ก์ด ์—†๋‹ค๊ณ  ๊ฐ„์ฃผ 0์œผ๋กœ ๊ฐ’์„ ์ฑ„์›Œ๋„ฃ์—ˆ๋‹ค. (์ „์ฒ˜๋ฆฌ ๋‹จ๊ณ„์—์„œ year๋‹จ์œ„๋งŒ ์ถ”์ถœํ•˜๊ธฐ ์œ„ํ•ด extract_years๋ฅผ ์‚ฌ์šฉํ–ˆ๋Š”๋ฐ, ์ด ๊ณผ์ •์—์„œ pd.isnull์„ ํ™œ์šฉํ•ด ๊ฒฐ์ธก์น˜๋ฅผ ์ฑ„์› ๋‹ค.)

def extract_years(data):
  if pd.isnull(data):
    return 0
  result = data.split()[0]
  return result

bank['Credit_History_Age'] = bank['Credit_History_Age'].apply(extract_years)

 

Amount_invested_monthly : customer_id๋ณ„ ํ‰๊ท 

Amount_invested_monthly์˜ ๊ฒฝ์šฐ ๋ฐ˜๋ณต์ ์œผ๋กœ __10000__๊ฐ’์ด ๋“ฑ์žฅํ•ด ์–ธ๋”๋ฐ” ์ œ๊ฑฐ์— ํฌํ•จํ•˜์ง€ ์•Š์•˜๋‹ค. (1. __10000__์ด๋ผ๋Š” ๊ฐ™์€ ๊ฐ’๋“ค์ด 10ํšŒ ์ด์ƒ ๋ฐœ๊ฒฌ 2. customer_id๋ณ„๋กœ ์ƒ˜ํ”Œ์„ ๋ฝ‘์•„ ํ™•์ธํ•ด๋ณด์•˜์„ ๋–„ ๋„ค๋‹ฌ๊ฐ„์˜ ๋ฐ์ดํ„ฐ ์ค‘ ์–ธ๋”๋ฐ”๋ฅผ ์ œ๊ฑฐํ•œ ์ƒํƒœ์ธ 10000์ด ์ง€๋‚˜์น˜๊ฒŒ ํฐ ๊ฐ’์ด๋ผ๋Š” ์‚ฌ์‹ค์„ ํ™•์ธํ•จ)

๋”ฐ๋ผ์„œ __10000__๊ณผ ๊ณต๋ฐฑ๊ฐ’์„ ๋ชจ๋‘ error, ๊ฒฐ์ธก์น˜๋กœ ํŒ๋‹จํ•˜๊ณ  customer_id๋ณ„ ํ‰๊ท ๊ฐ’์„ ๊ณ„์‚ฐํ•ด ์ด๋ฅผ ์ฑ„์›Œ๋„ฃ์—ˆ๋‹ค. 1์ฐจ์ ์œผ๋กœ ๊ฐ’์„ ์ฑ„์›Œ๋„ฃ์€ ๊ฒฐ๊ณผ 4๊ฐœ์˜ ๊ณต๋ฐฑ๊ฐ’์ด ๋ฐœ์ƒํ–ˆ๊ณ , ์ด๋Š” ์•ž์„œ ์ด์ƒ์น˜๋ฅผ ์‚ญ์ œํ•˜์˜€๊ธฐ ๋•Œ๋ฌธ์ด์—ˆ๋‹ค. (์ด๋Ÿฌํ•œ ์‚ฌ์œ ๋•Œ๋ฌธ์— ๊ฒฐ์ธก์น˜ ์ฒ˜๋ฆฌ๋ฅผ ๋จผ์ € ํ•ด์•ผํ•œ๋‹ค๋Š” ์‚ฌ์‹ค์„ ๊นจ๋‹ซ๊ฒŒ ๋˜์—ˆ๋‹ค..!) 4๊ฐœ์˜ ๊ณต๋ฐฑ๊ฐ’์˜ ๊ฒฝ์šฐ ์ฑ„์›Œ๋„ฃ์„ ๊ฐ’์— ๋Œ€ํ•œ ์ •๋ณด๋ฅผ ์•Œ ์ˆ˜ ์—†์—ˆ๊ธฐ ๋•Œ๋ฌธ์— 0์œผ๋กœ ๊ฐ’์„ ์ฑ„์›Œ๋„ฃ์—ˆ๋‹ค.

# ๋ฐ์ดํ„ฐ ํƒ€์ž…์„ numeric์œผ๋กœ ๋ณ€ํ™˜ํ•˜๊ธฐ : __10000__๊ฐ’ ๋˜๋Š” ๊ณต๋ฐฑ๊ฐ’์€ ๋ชจ๋‘ error๋กœ ๋ฐ˜ํ™˜
bank['Amount_invested_monthly'] = pd.to_numeric(bank['Amount_invested_monthly'], errors='coerce')

# customer_id๋ณ„ ํ‰๊ท ๊ฐ’์œผ๋กœ error๋‚œ ๊ณณ ์ฑ„์›Œ๋„ฃ๊ธฐ
bank['Amount_invested_monthly'] = bank.groupby('Customer_ID')['Amount_invested_monthly'].transform(lambda x: x.fillna(x.mean()))

# null๊ฐ’ ์„ธ๊ธฐ
bank['Amount_invested_monthly'].isna().sum()  ## ๊ฒฐ๊ณผ 4

# customer_id๊ฐ€ 'CUS_0x311a'์ธ ๊ณ ๊ฐ์˜ ๊ฒฝ์šฐ amount_invested_monthly ๊ฐ’์ด ๋ชจ๋‘ ์—†์Œ -> 0์œผ๋กœ ์ฑ„์šฐ๊ธฐ
bank['Amount_invested_monthly'] = bank['Amount_invested_monthly'].fillna(0)

Monthly_Balance : ํ‰๊ท (mean)

Monthly_balance์˜ ๊ฒฝ์šฐ ์—ญ์‹œ 562๊ฐœ์˜ ๊ฒฐ์ธก์น˜๊ฐ€ ๋ฐœ์ƒํ•˜์˜€๋Š”๋ฐ, ์ด์ƒ์น˜์ธ 0์„ ์ œ์™ธํ•œ ๋‚˜๋จธ์ง€ ๊ฐ’๋“ค์˜ ํ‰๊ท ์œผ๋กœ ๊ฒฐ์ธก๊ฐ’์„ ์ฑ„์›Œ๋„ฃ์—ˆ๋‹ค.

# monthly balance ๊ฒฐ์ธก์น˜ (ํ‰๊ท ์œผ๋กœ ์ฑ„์šฐ๊ธฐ)
mean_balance = bank[bank['Monthly_Balance']>0]['Monthly_Balance'].mean()
bank['Monthly_Balance'] = bank['Monthly_Balance'].fillna(mean_balance)

๐Ÿ—‘๏ธ ์ด์ƒ์น˜ ์ฒ˜๋ฆฌํ•˜๊ธฐ

์ปฌ๋Ÿผ ์ •๋ณด๊ฐ€ ๋ถˆ๋ช…ํ™•ํ–ˆ์Œ์—๋„, ์ƒ์‹์ด๋‚˜ ๊ด€๋ จ ์กฐ์‚ฌ ๊ฒฐ๊ณผ ๋ง์ด ๋˜์ง€ ์•Š๋Š” ์ด์ƒ์น˜๋“ค์ด ๋ฐœ๊ฒฌ๋˜์—ˆ๊ณ , ๋‹ค์Œ๊ณผ ๊ฐ™์ด ์ฒ˜๋ฆฌํ•˜์˜€๋‹ค.

Age : 0 ~ 109

bank = bank[((bank['Age']>0) & (bank['Age']<=109))]
bank['Customer_ID'].count() ##48599

Interest_Rate : 30์ดํ•˜

bank = bank[bank['Interest_Rate']<30]
bank['Interest_Rate'].count()  #43846

Num_of_Delayed_Payment 0์ด์ƒ 30 

cond = (bank['Num_of_Delayed_Payment'] >= 0) & (bank['Num_of_Delayed_Payment'] <= 30)
bank = bank[cond]
bank['Num_of_Delayed_Payment'].count() ##43,226

 

Monthly Balance, Delay_from_due_date ์Œ์ˆ˜์‚ญ์ œ

bank = bank[bank['Monthly_Balance']>0]
bank['Monthly_Balance'].count() ##43220

cond = (bank['Delay_from_due_date']) >= 0 & (bank['Delay_from_due_date'] <= 54)
bank = bank[cond]
bank['Delay_from_due_date'].count()##42942

 

'๐Ÿ“Š Data Analysis > ๐ŸŽฏ Project' ์นดํ…Œ๊ณ ๋ฆฌ์˜ ๋‹ค๋ฅธ ๊ธ€

๊ธฐ์ดˆ ํ”„๋กœ์ ํŠธ : ์€ํ–‰ ๊ณ ๊ฐ๋ฐ์ดํ„ฐ๋ฅผ ์ด์šฉํ•œ ์„œ๋น„์Šค ๋ถ„์„(5)  (0) 2024.05.27
๊ธฐ์ดˆ ํ”„๋กœ์ ํŠธ : ์€ํ–‰ ๊ณ ๊ฐ๋ฐ์ดํ„ฐ๋ฅผ ์ด์šฉํ•œ ์„œ๋น„์Šค ๋ถ„์„(4)  (0) 2024.05.27
๊ธฐ์ดˆ ํ”„๋กœ์ ํŠธ : ์€ํ–‰ ๊ณ ๊ฐ๋ฐ์ดํ„ฐ๋ฅผ ์ด์šฉํ•œ ์„œ๋น„์Šค ๋ถ„์„ (2)  (0) 2024.05.20
๊ธฐ์ดˆ ํ”„๋กœ์ ํŠธ : ์€ํ–‰ ๊ณ ๊ฐ๋ฐ์ดํ„ฐ๋ฅผ ์ด์šฉํ•œ ์„œ๋น„์Šค ๋ถ„์„ (1)  (0) 2024.05.17
๋ฏธ๋‹ˆ ํ”„๋กœ์ ํŠธ : ์‚ฌํšŒ ์ดˆ๋…„์ƒ์˜ ์ „์…‹์ง‘ ๊ตฌํ•˜๊ธฐ (5)  (0) 2024.05.04
  1. ๊ธฐ์ดˆ ํ”„๋กœ์ ํŠธ : ์€ํ–‰ ๊ณ ๊ฐ๋ฐ์ดํ„ฐ๋ฅผ ์ด์šฉํ•œ ์„œ๋น„์Šค ๋ถ„์„
  2. ๐Ÿงน ๊ฒฐ์ธก์น˜ ์ฒ˜๋ฆฌํ•˜๊ธฐ
  3. ๐Ÿ—‘๏ธ ์ด์ƒ์น˜ ์ฒ˜๋ฆฌํ•˜๊ธฐ
'๐Ÿ“Š Data Analysis/๐ŸŽฏ Project' ์นดํ…Œ๊ณ ๋ฆฌ์˜ ๋‹ค๋ฅธ ๊ธ€
  • ๊ธฐ์ดˆ ํ”„๋กœ์ ํŠธ : ์€ํ–‰ ๊ณ ๊ฐ๋ฐ์ดํ„ฐ๋ฅผ ์ด์šฉํ•œ ์„œ๋น„์Šค ๋ถ„์„(5)
  • ๊ธฐ์ดˆ ํ”„๋กœ์ ํŠธ : ์€ํ–‰ ๊ณ ๊ฐ๋ฐ์ดํ„ฐ๋ฅผ ์ด์šฉํ•œ ์„œ๋น„์Šค ๋ถ„์„(4)
  • ๊ธฐ์ดˆ ํ”„๋กœ์ ํŠธ : ์€ํ–‰ ๊ณ ๊ฐ๋ฐ์ดํ„ฐ๋ฅผ ์ด์šฉํ•œ ์„œ๋น„์Šค ๋ถ„์„ (2)
  • ๊ธฐ์ดˆ ํ”„๋กœ์ ํŠธ : ์€ํ–‰ ๊ณ ๊ฐ๋ฐ์ดํ„ฐ๋ฅผ ์ด์šฉํ•œ ์„œ๋น„์Šค ๋ถ„์„ (1)
ny:D
ny:D
ny:D
n.archive
ny:D
์ „์ฒด
์˜ค๋Š˜
์–ด์ œ
  • ๋ถ„๋ฅ˜ ์ „์ฒด๋ณด๊ธฐ (163)
    • ๐Ÿ“Š Data Analysis (27)
      • ๐ŸŽฏ Project (21)
      • ๐Ÿ—‚๏ธ Note (6)
    • ๐Ÿ“’ Today I Learn (72)
      • ๐Ÿ˜Š My SQL (9)
      • ๐Ÿ Python (44)
      • ๐Ÿ“Š Tableau (9)
      • ๐Ÿ‘พ Error (5)
    • โŒจ๏ธ Code Taka (64)
      • MySQL (39)
      • Python (25)

๋ธ”๋กœ๊ทธ ๋ฉ”๋‰ด

  • ํ™ˆ
  • ํƒœ๊ทธ
  • ๋ฐฉ๋ช…๋ก

๊ณต์ง€์‚ฌํ•ญ

์ธ๊ธฐ ๊ธ€

ํƒœ๊ทธ

  • ์•Œ๊ณ ๋ฆฌ์ฆ˜
  • Eda
  • ๋ฐ์ดํ„ฐ์ˆ˜์ง‘
  • seaborn
  • Project
  • ๋ฐ์ดํ„ฐ ์ „์ฒ˜๋ฆฌ
  • wil
  • ML
  • error
  • ํ”ผ๋ฒ—ํ…Œ์ด๋ธ”
  • ๋ฐ์ดํ„ฐ๋ฆฌํ„ฐ๋Ÿฌ์‹œ
  • ํŒŒ์ด์ฌ
  • ์‹œ๊ฐํ™”
  • python
  • matplotlib
  • ํ”„๋กœ๊ทธ๋ž˜๋จธ์Šค
  • DBeaver
  • BI
  • pandas
  • ํ”„๋กœ์ ํŠธ
  • spark
  • SQL
  • API
  • ํฌ๋กค๋ง
  • Tableau
  • p
  • MYSQL
  • selenium
  • Crawling

์ตœ๊ทผ ๋Œ“๊ธ€

์ตœ๊ทผ ๊ธ€

hELLO ยท Designed By ์ •์ƒ์šฐ.v4.3.0
ny:D
๊ธฐ์ดˆ ํ”„๋กœ์ ํŠธ : ์€ํ–‰ ๊ณ ๊ฐ๋ฐ์ดํ„ฐ๋ฅผ ์ด์šฉํ•œ ์„œ๋น„์Šค ๋ถ„์„ (3)
์ƒ๋‹จ์œผ๋กœ

ํ‹ฐ์Šคํ† ๋ฆฌํˆด๋ฐ”

๋‹จ์ถ•ํ‚ค

๋‚ด ๋ธ”๋กœ๊ทธ

๋‚ด ๋ธ”๋กœ๊ทธ - ๊ด€๋ฆฌ์ž ํ™ˆ ์ „ํ™˜
Q
Q
์ƒˆ ๊ธ€ ์“ฐ๊ธฐ
W
W

๋ธ”๋กœ๊ทธ ๊ฒŒ์‹œ๊ธ€

๊ธ€ ์ˆ˜์ • (๊ถŒํ•œ ์žˆ๋Š” ๊ฒฝ์šฐ)
E
E
๋Œ“๊ธ€ ์˜์—ญ์œผ๋กœ ์ด๋™
C
C

๋ชจ๋“  ์˜์—ญ

์ด ํŽ˜์ด์ง€์˜ URL ๋ณต์‚ฌ
S
S
๋งจ ์œ„๋กœ ์ด๋™
T
T
ํ‹ฐ์Šคํ† ๋ฆฌ ํ™ˆ ์ด๋™
H
H
๋‹จ์ถ•ํ‚ค ์•ˆ๋‚ด
Shift + /
โ‡ง + /

* ๋‹จ์ถ•ํ‚ค๋Š” ํ•œ๊ธ€/์˜๋ฌธ ๋Œ€์†Œ๋ฌธ์ž๋กœ ์ด์šฉ ๊ฐ€๋Šฅํ•˜๋ฉฐ, ํ‹ฐ์Šคํ† ๋ฆฌ ๊ธฐ๋ณธ ๋„๋ฉ”์ธ์—์„œ๋งŒ ๋™์ž‘ํ•ฉ๋‹ˆ๋‹ค.