
Basic Project: Service Analysis Using Bank Customer Data (5)

ny:D 2024. 5. 27. 01:26

Basic Project: Service Analysis Using Bank Customer Data

📊 Visualization - Catch the high-income (VIP) customers!

3. Understanding the deposit status of VIP customers

Let's find out whether VIP customers tend to invest heavily or to keep their money on deposit.

✅ Do VIP customers invest more when their income is higher?

# ์‚ฌ์ด์ฆˆ ์ง€์ •
plt.figure(figsize=(16,9))

# ์‚ฐ์ ๋„ ๊ทธ๋ฆฌ๊ธฐ
sns.scatterplot(data = stat, x='Monthly_Income', y='Amount_invested_monthly', hue = 'age_group', palette = green_palette2, alpha= 0.5)

# ์ œ๋ชฉ ๋ถ™์ด๊ธฐ
plt.title('Regression Analysis of Monthly Income - Amount of Monthly Investment(VIP)')

# age_group๋ณ„ ํšŒ๊ท€์‹ ํ‘œ์‹œํ•˜๊ธฐ
grouped = stat.groupby('age_group')
for age_group, group_data in grouped:
    z = np.polyfit(group_data['Monthly_Income'], group_data['Amount_invested_monthly'], 1)

    slope = round(z[0], 4)
    intercept = round(z[1], 2)

    plt.text(1000000, 1300 - age_group*5, f"For age_group {age_group}: y = {slope}x + {intercept}")
  • The fitted slope is very small, close to 0, in every age group, so in this data there is no strong linear relationship between monthly income and the monthly amount invested. Within the VIP group, monthly income and the monthly amount invested are only weakly related.
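One caveat on reading the slopes: the size of a slope depends on the units of both axes, so a value near 0 does not by itself measure how strong the linear relationship is. The Pearson correlation coefficient r is unit-free and is the more direct check. Below is a minimal sketch, assuming the same column names as above; the helper `slope_and_r` and the toy data are hypothetical and are not part of the original analysis.

```python
import numpy as np
import pandas as pd

def slope_and_r(df, x="Monthly_Income", y="Amount_invested_monthly", by="age_group"):
    """Return one row per group with the fitted slope, intercept, Pearson r, and row count."""
    rows = []
    for key, g in df.groupby(by):
        g = g[[x, y]].dropna()  # np.polyfit cannot handle NaN values
        slope, intercept = np.polyfit(g[x], g[y], 1)
        r = np.corrcoef(g[x], g[y])[0, 1]
        rows.append({by: key, "slope": slope, "intercept": intercept, "r": r, "n": len(g)})
    return pd.DataFrame(rows).set_index(by)

# Toy data (hypothetical): both groups have tiny slopes but perfectly linear points
toy = pd.DataFrame({
    "age_group": [20, 20, 20, 30, 30, 30],
    "Monthly_Income": [1000.0, 2000.0, 3000.0, 1000.0, 2000.0, 3000.0],
    "Amount_invested_monthly": [10.0, 20.0, 30.0, 30.0, 20.0, 10.0],
})
print(slope_and_r(toy))
```

In the toy data the slopes are only 0.01 and -0.01, yet the points lie exactly on a line, so r is 1 and -1. Calling `slope_and_r(stat)` on the real data would show whether the small slopes there also come with small r values.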

# Keep only the 'else' salary group in stat
stat = investment_stat[investment_stat['salary_group']=='else']

# Start a new figure so this plot is not drawn over the previous one
plt.figure(figsize=(16, 9))

# Draw the scatter plot
sns.scatterplot(data=stat, x='Monthly_Income', y='Amount_invested_monthly', hue='age_group', palette=green_palette2, alpha=0.5)

# Add a title
plt.title('Regression Analysis of Monthly Income - Amount of Monthly Investment(else)')

# Show the fitted regression equation for each age_group on the plot
grouped = stat.groupby('age_group')
for age_group, group_data in grouped:
    # A degree-1 polyfit returns [slope, intercept]
    z = np.polyfit(group_data['Monthly_Income'], group_data['Amount_invested_monthly'], 1)

    slope = round(z[0], 4)
    intercept = round(z[1], 2)

    # Offset each label vertically by age group (requires a numeric age_group)
    plt.text(2500, 1300 - age_group*5, f"For age_group {age_group}: y = {slope}x + {intercept}")
  • What stands out, however, is the contrast between the two groups: in the else group, monthly income and the monthly amount invested are more linearly related than among VIP customers, and the relationship is positive, whereas among VIP customers the linearity is weak but the relationship is negative.
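The VIP versus else comparison is easier to read when the slopes sit side by side in one table. Here is a rough sketch of that idea, assuming a frame shaped like `investment_stat` with `salary_group` and `age_group` columns. The label 'VIP' is an assumption (only 'else' appears in the code above), and `fit_slope`, `slope_table`, and the toy data are hypothetical.

```python
import numpy as np
import pandas as pd

def fit_slope(g, x="Monthly_Income", y="Amount_invested_monthly"):
    """Slope of the least-squares line of y on x for one group."""
    g = g[[x, y]].dropna()  # np.polyfit cannot handle NaN values
    return np.polyfit(g[x], g[y], 1)[0]

def slope_table(df):
    """Rows are age groups, columns are salary groups, cells are fitted slopes."""
    slopes = {}
    for (salary, age), g in df.groupby(["salary_group", "age_group"]):
        slopes[(salary, age)] = fit_slope(g)
    s = pd.Series(slopes)
    s.index.names = ["salary_group", "age_group"]
    return s.unstack("salary_group")

# Toy data (hypothetical): a negative slope for 'VIP', a positive one for 'else'
toy = pd.DataFrame({
    "salary_group": ["VIP"] * 3 + ["else"] * 3,
    "age_group": [20] * 6,
    "Monthly_Income": [10000.0, 20000.0, 30000.0, 1000.0, 2000.0, 3000.0],
    "Amount_invested_monthly": [300.0, 200.0, 100.0, 10.0, 20.0, 30.0],
})
table = slope_table(toy)
print(table)
```

Reading across a row then shows at a glance whether the sign of the slope flips between the two salary groups for a given age group.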

✅ Customers in their 50s hold the highest monthly deposit balance.

# Build the age_group order sorted by median Monthly_Balance, highest first.
median_order = stat.groupby('age_group')['Monthly_Balance'].median().sort_values(ascending=False).index

# Draw the boxplot with the groups in that median-sorted order.
sns.boxplot(data=stat, x='age_group', y='Monthly_Balance', palette=green_palette2, order=median_order)
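The claim in the heading can be checked without the plot, since the first entry of the median-sorted order is the group with the highest median balance. A small sketch follows, using a hypothetical helper `median_order` and toy numbers that are not from the real data. Note also that `stat` was last assigned to the else subset above, so which customers the boxplot covers depends on what `stat` holds when this cell runs.

```python
import pandas as pd

def median_order(df, by="age_group", value="Monthly_Balance"):
    """Group labels sorted by their median of `value`, highest first."""
    medians = df.groupby(by)[value].median()
    return medians.sort_values(ascending=False).index.tolist()

# Toy data (hypothetical): medians are 150 (20s), 350 (30s), 600 (50s)
toy = pd.DataFrame({
    "age_group": [20, 20, 50, 50, 30, 30],
    "Monthly_Balance": [100.0, 200.0, 500.0, 700.0, 300.0, 400.0],
})
order = median_order(toy)
print(order)     # age groups from highest to lowest median balance
print(order[0])  # the group with the highest median balance
```

Passing the resulting list as `order=` to `sns.boxplot` gives the same left-to-right arrangement as the plot above.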

 

๐Ÿ–ฑ๏ธ PPT ๋งŒ๋“ค๊ธฐ