320x100
흐음... 정말 집중력 어떻게 안되나....
호흡이 이렇게 긴 강의를 들을때는 그냥 속수 무책이다.
통계관련 강의가 나오기도 해서 대부분은 알아듣고, 이해가 안되는 부분도 있지만
근본적으로는 흥미를 잃는 느낌이랄까
한편으로는 흥미가 만땅이지만 한편으로는 흥미가 없달까..
거참...
내가 작성했던 코드도 있는데,
결과적으로 봤을때 그건 무의미할 정도로 호흡이 너무 길다.
따라서 내가 썼던 코드는 생략하도록 하겠다.
# Explanatory Data Analysis Challenge (Olympic Games)
## Data Import and Inspection
Import the Datasets Summer (__summer.csv__), Winter (__winter.csv__) and dictionary (__dictionary.csv__) and Inspect!
import pandas as pd
summer = pd.read_csv('summer.csv')
winter = pd.read_csv('winter.csv')
dic = pd.read_csv('dictionary.csv')
## Merging and Concatenating
# 1. __Merge__ Summer and Winter (one row for each Medal awarded in any Olympic Games) and save the merged DataFrame in __olympics__.
# 2. An __additional column__ (e.g. "Edition") shall indicate the Edition -> __Summer or Winter__.
# 3. Add the __full Country name__ from dictionary to olympics (e.g. France for FRA).
pd.concat([summer, winter], axis = 0, keys = ['Summer', 'Winter'],
names = ['Edition']).reset_index().drop(columns = 'level_1')
'''
Edition Year City Sport Discipline Athlete Country Gender Event Medal
0 Summer 1896 Athens Aquatics Swimming HAJOS, Alfred HUN Men 100M Freestyle Gold
1 Summer 1896 Athens Aquatics Swimming HERSCHMANN, Otto AUT Men 100M Freestyle Silver
2 Summer 1896 Athens Aquatics Swimming DRIVAS, Dimitrios GRE Men 100M Freestyle For Sailors Bronze
3 Summer 1896 Athens Aquatics Swimming MALOKINIS, Ioannis GRE Men 100M Freestyle For Sailors Gold
4 Summer 1896 Athens Aquatics Swimming CHASAPIS, Spiridon GRE Men 100M Freestyle For Sailors Silver
... ... ... ... ... ... ... ... ... ... ...
36930 Winter 2014 Sochi Skiing Snowboard JONES, Jenny GBR Women Slopestyle Bronze
36931 Winter 2014 Sochi Skiing Snowboard ANDERSON, Jamie USA Women Slopestyle Gold
36932 Winter 2014 Sochi Skiing Snowboard MALTAIS, Dominique CAN Women Snowboard Cross Silver
36933 Winter 2014 Sochi Skiing Snowboard SAMKOVA, Eva CZE Women Snowboard Cross Gold
36934 Winter 2014 Sochi Skiing Snowboard TRESPEUCH, Chloe FRA Women Snowboard Cross Bronze
36935 rows × 10 columns
'''
# axis 가 0이면 행을 기준으로 움직인다.
# 멀티인덱스로써 keys를 지정해줬다.
# 해당 멀티인덱스의 이름은 names다
pd.concat([summer, winter], axis = 0, keys = ['Summer', 'Winter'],
names = ['Edition'])
'''
Year City Sport Discipline Athlete Country Gender Event Medal
Edition
Summer 0 1896 Athens Aquatics Swimming HAJOS, Alfred HUN Men 100M Freestyle Gold
1 1896 Athens Aquatics Swimming HERSCHMANN, Otto AUT Men 100M Freestyle Silver
2 1896 Athens Aquatics Swimming DRIVAS, Dimitrios GRE Men 100M Freestyle For Sailors Bronze
3 1896 Athens Aquatics Swimming MALOKINIS, Ioannis GRE Men 100M Freestyle For Sailors Gold
4 1896 Athens Aquatics Swimming CHASAPIS, Spiridon GRE Men 100M Freestyle For Sailors Silver
... ... ... ... ... ... ... ... ... ... ...
Winter 5765 2014 Sochi Skiing Snowboard JONES, Jenny GBR Women Slopestyle Bronze
5766 2014 Sochi Skiing Snowboard ANDERSON, Jamie USA Women Slopestyle Gold
5767 2014 Sochi Skiing Snowboard MALTAIS, Dominique CAN Women Snowboard Cross Silver
5768 2014 Sochi Skiing Snowboard SAMKOVA, Eva CZE Women Snowboard Cross Gold
5769 2014 Sochi Skiing Snowboard TRESPEUCH, Chloe FRA Women Snowboard Cross Bronze
36935 rows × 9 columns
'''
# reset_index를 해주면 멀티인덱스로 잡아놨던것들이 자동으로 다 풀린다.
# 아직은 drop(columns)에 쓰인 level_1이 어디서 비롯되었는지 잘 모르겠다.
olympics = pd.concat([summer, winter], axis = 0, keys= ['Summer', 'Winter'],
names = ['Edition']).reset_index().drop(columns = 'level_1')
olympics
dic.head()
'''
Country Code Population GDP per Capita
0 Afghanistan AFG 32526562.0 594.323081
1 Albania ALB 2889167.0 3945.217582
2 Algeria ALG 39666519.0 4206.031232
3 American Samoa* ASA 55538.0 NaN
4 Andorra AND 70473.0 NaN
'''
# Country_x의 값과 Code의 값이 같았다.
# 굳이 2개의 데이터가 필요 없으므로, Code 열의 값을 모두 drop시킨다.
olympics = olympics.merge(dic.iloc[:,:2], how = 'left',
left_on = 'Country', right_on = 'Code').drop(columns = ['Code'])
## Data Cleaning (Part 1)
# 1. If you haven´t done it yet: Assign appropriate __Column Headers__ to Country Codes (e.g. __"Code"__) and full Country Names (e.g. __"Country"__).
# 2. __Remove Spaces__ from column headers in olympics and dictionary.
# 3. For some Country Codes, there is no corresponding __full Country Name__ available (e.g. for "URS") -> __missing values__ in olympics. Identify these Country Codes and search the Web for the full Country Names. __Replace__ missing values! (Alternatively, you can find a Solution for this at the end of this Notebook!)
# 컬럼 이름 바꾸기
olympics.rename(columns = {'Country_x':'Code', 'Country_y':'Country'}, inplace = True)
# 컬럼 이름 바꾸기
dic.rename(columns = {'GDP per Capita':'GDP'}, inplace = True)
dic
'''
Country Code Population GDP
0 Afghanistan AFG 32526562.0 594.323081
1 Albania ALB 2889167.0 3945.217582
2 Algeria ALG 39666519.0 4206.031232
3 American Samoa* ASA 55538.0 NaN
4 Andorra AND 70473.0 NaN
... ... ... ... ...
196 Vietnam VIE 91703800.0 2111.138024
197 Virgin Islands* ISV 103574.0 NaN
198 Yemen YEM 26832215.0 1406.291651
199 Zambia ZAM 16211767.0 1304.879014
200 Zimbabwe ZIM 15602751.0 924.143819
201 rows × 4 columns
'''
olympics.info()
'''
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 36935 entries, 0 to 36934
Data columns (total 10 columns):
# Column Non-Null Count Dtype
--- ------ -------------- -----
0 Edition 36935 non-null object
1 Year 36935 non-null int64
2 City 36935 non-null object
3 Sport 36935 non-null object
4 Discipline 36935 non-null object
5 Athlete 36935 non-null object
6 Country 36931 non-null object
7 Gender 36935 non-null object
8 Event 36935 non-null object
9 Medal 36935 non-null object
dtypes: int64(1), object(9)
memory usage: 2.8+ MB
'''
olympics.loc[olympics.Country.isnull()].Code.value_counts()
'''
URS 2489
GDR 987
ROU 642
FRG 584
TCH 487
YUG 442
EUN 283
EUA 281
ZZX 48
SRB 31
ANZ 29
RU1 17
MNE 14
TTO 10
BOH 7
BWI 5
SGP 4
IOP 3
Name: Code, dtype: int64
'''
old_c = olympics.loc[olympics.Country.isnull(), :].Code.value_counts().index
old_c
'''
Index(['URS', 'GDR', 'ROU', 'FRG', 'TCH', 'YUG', 'EUN', 'EUA', 'ZZX', 'SRB',
'ANZ', 'RU1', 'MNE', 'TTO', 'BOH', 'BWI', 'SGP', 'IOP'],
dtype='object')
'''
data = ["Soviet Union", "East Germany", "Romania", "West Germany", "Czechoslovakia",
"Yugoslavia", "Unified Team", "Unified Team of Germany", "Mixed teams", "Serbia",
"Australasia", "Russian Empire", "Montenegro", "Trinidad and Tobago", "Bohemia",
"West Indies Federation", "Singapore", "Independent Olympic Participants"]
mapper = pd.Series(index = old_c, name = 'Country', data = data)
'''
URS Soviet Union
GDR East Germany
ROU Romania
FRG West Germany
TCH Czechoslovakia
YUG Yugoslavia
EUN Unified Team
EUA Unified Team of Germany
ZZX Mixed teams
SRB Serbia
ANZ Australasia
RU1 Russian Empire
MNE Montenegro
TTO Trinidad and Tobago
BOH Bohemia
BWI West Indies Federation
SGP Singapore
IOP Independent Olympic Participants
Name: Country, dtype: object
'''
# Country 컬럼에 해당하는 필드 값중에서
# olympics 변수의 Country 컬럼에 결측값이 있는지 확인
olympics.Country.isnull()
# 위의 결측값이 Country 컬럼에 있는 행들 중에서
# Code 행에 대한 값을 더해주자.
olympics.loc[olympics.Country.isnull()].Code.value_counts()
'''
URS 2489
GDR 987
ROU 642
FRG 584
TCH 487
YUG 442
EUN 283
EUA 281
ZZX 48
SRB 31
ANZ 29
RU1 17
MNE 14
TTO 10
BOH 7
BWI 5
SGP 4
IOP 3
Name: Code, dtype: int64
'''
# 이 국가들에 결측값이 많은 이유는, 더이상 존재하지 않는 나라인 확률이 높기 때문이다.
# 내코드
# 내 코드는 쓰임의 모습은 맞는것같은데, 기능측면에 있어서 완전 틀렸다.
# 그 증거로 전체 인덱스값이 조회된다.
# olympics.Country.isnull()
# 해답코드
missing_C_index = olympics.loc[olympics.Country.isnull()].index
olympics.loc[missing_C_index, 'Code'].map(mapper)
'''
132 Mixed teams
133 Mixed teams
134 Mixed teams
135 Mixed teams
136 Mixed teams
...
33961 Unified Team
33977 Czechoslovakia
33978 Czechoslovakia
33979 Czechoslovakia
33980 Czechoslovakia
Name: Code, Length: 6367, dtype: object
'''
olympics.Country.fillna(olympics.Code.map(mapper), inplace = True)
'''
0 Hungary
1 Austria
2 Greece
3 Greece
4 Greece
...
36930 United Kingdom
36931 United States
36932 Canada
36933 Czech Republic
36934 France
Name: Country, Length: 36935, dtype: object
'''
# 결과적으로 missing_C_index에는 결측값이 Country에 들어있던 값들의 인덱스 모음집이였다.
# 그리고 위의 fillna 작업등을 통해서 결측값을 채워주었다.
olympics.loc[missing_C_index]
'''
Edition Year City Sport Discipline Athlete Code Gender Event Medal Country
132 Summer 1896 Athens Tennis Tennis FLACK, Edwin ZZX Men Doubles Bronze Mixed teams
133 Summer 1896 Athens Tennis Tennis ROBERTSON, George Stuart ZZX Men Doubles Bronze Mixed teams
134 Summer 1896 Athens Tennis Tennis BOLAND, John ZZX Men Doubles Gold Mixed teams
135 Summer 1896 Athens Tennis Tennis TRAUN, Friedrich ZZX Men Doubles Gold Mixed teams
136 Summer 1896 Athens Tennis Tennis KASDAGLIS, Dionysios ZZX Men Doubles Silver Mixed teams
... ... ... ... ... ... ... ... ... ... ... ...
33961 Winter 1992 Albertville Skiing Freestyle Skiing KOZHEVNIKOVA, Yelizaveta EUN Women Moguls Silver Unified Team
33977 Winter 1992 Albertville Skiing Ski Jumping GODER, Tomas TCH Men K120 Team (90M) Bronze Czechoslovakia
33978 Winter 1992 Albertville Skiing Ski Jumping JEZ, Frantisek TCH Men K120 Team (90M) Bronze Czechoslovakia
33979 Winter 1992 Albertville Skiing Ski Jumping PARMA, Jiri TCH Men K120 Team (90M) Bronze Czechoslovakia
33980 Winter 1992 Albertville Skiing Ski Jumping SAKALA, Jaroslav TCH Men K120 Team (90M) Bronze Czechoslovakia
6367 rows × 11 columns
'''
## Data Cleaning (Part 2)
olympics.info()
'''
<class 'pandas.core.frame.DataFrame'>
Int64Index: 36935 entries, 0 to 36934
Data columns (total 11 columns):
# Column Non-Null Count Dtype
--- ------ -------------- -----
0 Edition 36935 non-null object
1 Year 36935 non-null int64
2 City 36935 non-null object
3 Sport 36935 non-null object
4 Discipline 36935 non-null object
5 Athlete 36935 non-null object
6 Code 36931 non-null object
7 Gender 36935 non-null object
8 Event 36935 non-null object
9 Medal 36935 non-null object
10 Country 36931 non-null object
dtypes: int64(1), object(10)
memory usage: 4.4+ MB
'''
# Code와 Country 컬럼에 각 4개씩의 결측값이 있다.
# 왜 그런지 필터링으로 찾아보자.
# 결측값 찾기 필터 마스크
olympics.loc[olympics.Code.isna()]
'''
Edition Year City Sport Discipline Athlete Code Gender Event Medal Country
29603 Summer 2012 London Athletics Athletics Pending NaN Women 1500M Gold NaN
31072 Summer 2012 London Weightlifting Weightlifting Pending NaN Women 63KG Gold NaN
31091 Summer 2012 London Weightlifting Weightlifting Pending NaN Men 94KG Silver NaN
31110 Summer 2012 London Wrestling Wrestling Freestyle KUDUKHOV, Besik NaN Men Wf 60 KG Silver NaN
'''
# 결측값이 있는 행을 버리자.
olympics.dropna(subset = ['Code'], inplace = True)
olympics.info()
'''
<class 'pandas.core.frame.DataFrame'>
Int64Index: 36931 entries, 0 to 36934
Data columns (total 11 columns):
# Column Non-Null Count Dtype
--- ------ -------------- -----
0 Edition 36931 non-null object
1 Year 36931 non-null int64
2 City 36931 non-null object
3 Sport 36931 non-null object
4 Discipline 36931 non-null object
5 Athlete 36931 non-null object
6 Code 36931 non-null object
7 Gender 36931 non-null object
8 Event 36931 non-null object
9 Medal 36931 non-null object
10 Country 36931 non-null object
dtypes: int64(1), object(10)
memory usage: 3.4+ MB
'''
# 정말 신기하다. 결측값이 있던 컬럼의 4개 행을 버리니
# 행의 일부만 버려진게 아니라 해당 행은 모두 버려졌다.
# 값이 추가되거나 버려지고나서는 반드시 reset을 해주어야 한다.
olympics.reset_index(drop = True, inplace = True)
olympics.nunique()
'''
Edition 2
Year 33
City 41
Sport 48
Discipline 80
Athlete 26494
Code 148
Gender 2
Event 733
Medal 3
Country 145
dtype: int64
1
'''
olympics.Medal = olympics.Medal.astype('category')
olympics.info()
'''
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 36931 entries, 0 to 36930
Data columns (total 11 columns):
# Column Non-Null Count Dtype
--- ------ -------------- -----
0 Edition 36931 non-null object
1 Year 36931 non-null int64
2 City 36931 non-null object
3 Sport 36931 non-null object
4 Discipline 36931 non-null object
5 Athlete 36931 non-null object
6 Code 36931 non-null object
7 Gender 36931 non-null object
8 Event 36931 non-null object
9 Medal 36931 non-null category
10 Country 36931 non-null object
dtypes: category(1), int64(1), object(9)
memory usage: 2.9+ MB
'''
olympics.Medal.sort_values()
'''
36930 Bronze
25565 Bronze
25562 Bronze
25559 Bronze
11084 Bronze
...
10811 Silver
10810 Silver
10809 Silver
25834 Silver
18465 Silver
Name: Medal, Length: 36931, dtype: category
Categories (3, object): ['Bronze', 'Gold', 'Silver']
'''
olympics.Medal.cat.set_categories(['Bronze', 'Silver', 'Gold'], ordered = True, inplace = True)
olympics.Medal.sort_values()
'''
36930 Bronze
8916 Bronze
15339 Bronze
27698 Bronze
8919 Bronze
...
25833 Gold
10807 Gold
10806 Gold
10804 Gold
0 Gold
Name: Medal, Length: 36931, dtype: category
Categories (3, object): ['Bronze' < 'Silver' < 'Gold']
'''
## What are the most successful countries of all times?
import matplotlib.pyplot as plt
import seaborn as sns
olympics
'''
Edition Year City Sport Discipline Athlete Code Gender Event Medal Country
0 Summer 1896 Athens Aquatics Swimming HAJOS, Alfred HUN Men 100M Freestyle Gold Hungary
1 Summer 1896 Athens Aquatics Swimming HERSCHMANN, Otto AUT Men 100M Freestyle Silver Austria
2 Summer 1896 Athens Aquatics Swimming DRIVAS, Dimitrios GRE Men 100M Freestyle For Sailors Bronze Greece
3 Summer 1896 Athens Aquatics Swimming MALOKINIS, Ioannis GRE Men 100M Freestyle For Sailors Gold Greece
4 Summer 1896 Athens Aquatics Swimming CHASAPIS, Spiridon GRE Men 100M Freestyle For Sailors Silver Greece
... ... ... ... ... ... ... ... ... ... ... ...
36926 Winter 2014 Sochi Skiing Snowboard JONES, Jenny GBR Women Slopestyle Bronze United Kingdom
36927 Winter 2014 Sochi Skiing Snowboard ANDERSON, Jamie USA Women Slopestyle Gold United States
36928 Winter 2014 Sochi Skiing Snowboard MALTAIS, Dominique CAN Women Snowboard Cross Silver Canada
36929 Winter 2014 Sochi Skiing Snowboard SAMKOVA, Eva CZE Women Snowboard Cross Gold Czech Republic
36930 Winter 2014 Sochi Skiing Snowboard TRESPEUCH, Chloe FRA Women Snowboard Cross Bronze France
36931 rows × 11 columns
'''
olympics.Country.value_counts()
'''
United States 5238
Soviet Union 2489
United Kingdom 1799
Germany 1665
France 1548
...
Virgin Islands* 1
Guyana 1
Netherlands Antilles* 1
Iraq 1
Bermuda* 1
Name: Country, Length: 145, dtype: int64
'''
# 빈도가 높은 국가를 순위화 해서 10개 rank화
top_10 = olympics.Country.value_counts().head(10)
'''
United States 5238
Soviet Union 2489
United Kingdom 1799
Germany 1665
France 1548
Italy 1488
Sweden 1477
Canada 1274
Australia 1204
Hungary 1091
Name: Country, dtype: int64
'''
top_10.plot(kind = 'bar', fontsize = 15, figsize = (14,10))
plt.title('Top 10 Countries by Medals', fontsize = 15)
plt.ylabel('Medals', fontsize =14)
plt.show()
olympics_10 = olympics.loc[olympics.Country.isin(top_10.index)]
olympics.loc[olympics.Country.isin(top_10.index)]
'''
Edition Year City Sport Discipline Athlete Code Gender Event Medal Country
0 Summer 1896 Athens Aquatics Swimming HAJOS, Alfred HUN Men 100M Freestyle Gold Hungary
6 Summer 1896 Athens Aquatics Swimming HAJOS, Alfred HUN Men 1200M Freestyle Gold Hungary
11 Summer 1896 Athens Athletics Athletics LANE, Francis USA Men 100M Bronze United States
12 Summer 1896 Athens Athletics Athletics SZOKOLYI, Alajos HUN Men 100M Bronze Hungary
13 Summer 1896 Athens Athletics Athletics BURKE, Thomas USA Men 100M Gold United States
... ... ... ... ... ... ... ... ... ... ... ...
36924 Winter 2014 Sochi Skiing Snowboard KOBER, Amelie GER Women Parallel Slalom Bronze Germany
36926 Winter 2014 Sochi Skiing Snowboard JONES, Jenny GBR Women Slopestyle Bronze United Kingdom
36927 Winter 2014 Sochi Skiing Snowboard ANDERSON, Jamie USA Women Slopestyle Gold United States
36928 Winter 2014 Sochi Skiing Snowboard MALTAIS, Dominique CAN Women Snowboard Cross Silver Canada
36930 Winter 2014 Sochi Skiing Snowboard TRESPEUCH, Chloe FRA Women Snowboard Cross Bronze France
19273 rows × 11 columns
'''
plt.figure(figsize = (20,10))
sns.set(font_scale = 1.5, palette = 'dark')
# order에 rank를 먹여줌으로 rank별로 정렬이 되었다.
sns.countplot(data = olympics_10, x = 'Country',order = top_10.index)
plt.title('Top 10 Countries by Medals', fontsize = 20)
plt.show()
plt.figure(figsize = (20,10))
sns.set(font_scale = 1.5, palette = 'dark')
sns.countplot(data = olympics_10, x = 'Country', hue = 'Edition', order = top_10.index)
plt.title('Top 10 Countries by Medals', fontsize = 20)
plt.show()
plt.figure(figsize = (20,10))
sns.set(font_scale = 1.5, palette = 'dark')
sns.countplot(data = olympics_10, x = 'Edition', hue = 'Country', hue_order = top_10.index)
plt.title('Top 10 Countries by Medals', fontsize = 20)
plt.show()
plt.figure(figsize= (20,10))
sns.set(font_scale = 1.5, palette = 'dark')
sns.countplot(data = olympics_10, x = 'Country', hue = 'Medal', order = top_10.index,
hue_order = ['Gold', 'Silver', 'Bronze'], palette = ['gold', 'silver', 'brown'])
plt.title('Top 10 Countries by Medals', fontsize = 20)
plt.show()
plt.figure(figsize = (20,10))
sns.set(font_scale = 1.5, palette = 'dark')
sns.countplot(data = olympics_10, x = 'Medal', hue = 'Country',
order = ['Gold', 'Silver', 'Bronze'], hue_order = top_10.index)
plt.title('Top 10 Countries by Medals', fontsize = 20)
plt.show()
## Do GDP, Population and Politics matter?
olympics
'''
Edition Year City Sport Discipline Athlete Code Gender Event Medal Country
0 Summer 1896 Athens Aquatics Swimming HAJOS, Alfred HUN Men 100M Freestyle Gold Hungary
1 Summer 1896 Athens Aquatics Swimming HERSCHMANN, Otto AUT Men 100M Freestyle Silver Austria
2 Summer 1896 Athens Aquatics Swimming DRIVAS, Dimitrios GRE Men 100M Freestyle For Sailors Bronze Greece
3 Summer 1896 Athens Aquatics Swimming MALOKINIS, Ioannis GRE Men 100M Freestyle For Sailors Gold Greece
4 Summer 1896 Athens Aquatics Swimming CHASAPIS, Spiridon GRE Men 100M Freestyle For Sailors Silver Greece
... ... ... ... ... ... ... ... ... ... ... ...
36926 Winter 2014 Sochi Skiing Snowboard JONES, Jenny GBR Women Slopestyle Bronze United Kingdom
36927 Winter 2014 Sochi Skiing Snowboard ANDERSON, Jamie USA Women Slopestyle Gold United States
36928 Winter 2014 Sochi Skiing Snowboard MALTAIS, Dominique CAN Women Snowboard Cross Silver Canada
36929 Winter 2014 Sochi Skiing Snowboard SAMKOVA, Eva CZE Women Snowboard Cross Gold Czech Republic
36930 Winter 2014 Sochi Skiing Snowboard TRESPEUCH, Chloe FRA Women Snowboard Cross Bronze France
36931 rows × 11 columns
'''
# 기본적으로 margins = True를 해주면
# 컬럼기준으로, 행 기준으로 Total이 생성된다.
medals_per_country = pd.crosstab(index = olympics.Country, columns = olympics.Medal,
margins = True, margins_name = 'Total')
'''
Medal Bronze Silver Gold Total
Country
Afghanistan 2 0 0 2
Algeria 8 2 5 15
Argentina 91 99 69 259
Armenia 8 2 1 11
Australasia 5 4 20 29
... ... ... ... ...
West Indies Federation 5 0 0 5
Yugoslavia 119 180 143 442
Zambia 1 1 0 2
Zimbabwe 1 4 18 23
Total 12288 12238 12405 36931
146 rows × 4 columns
'''
medals_per_country.drop('Total', axis = 0, inplace = True)
medals_per_country
'''
Medal Bronze Silver Gold Total
Country
Afghanistan 2 0 0 2
Algeria 8 2 5 15
Argentina 91 99 69 259
Armenia 8 2 1 11
Australasia 5 4 20 29
... ... ... ... ...
West Germany 227 191 166 584
West Indies Federation 5 0 0 5
Yugoslavia 119 180 143 442
Zambia 1 1 0 2
Zimbabwe 1 4 18 23
145 rows × 4 columns
'''
medals_per_contry = medals_per_country[['Total', 'Gold', 'Silver', 'Bronze']].sort_values(by = ['Total', 'Gold', 'Silver', 'Bronze'], ascending= False)
'''
Medal Total Gold Silver Bronze
Country
United States 5238 2402 1571 1265
Soviet Union 2489 1088 724 677
United Kingdom 1799 580 632 587
Germany 1665 589 504 572
France 1548 444 526 578
... ... ... ... ...
Iraq 1 0 0 1
Macedonia 1 0 0 1
Mauritius 1 0 0 1
Niger 1 0 0 1
Togo 1 0 0 1
145 rows × 4 columns
'''
medals_per_country.head()
'''
Medal Bronze Silver Gold Total
Country
Afghanistan 2 0 0 2
Algeria 8 2 5 15
Argentina 91 99 69 259
Armenia 8 2 1 11
Australasia 5 4 20 29
'''
dic
'''
Country Code Population GDP
0 Afghanistan AFG 32526562.0 594.323081
1 Albania ALB 2889167.0 3945.217582
2 Algeria ALG 39666519.0 4206.031232
3 American Samoa* ASA 55538.0 NaN
4 Andorra AND 70473.0 NaN
... ... ... ... ...
196 Vietnam VIE 91703800.0 2111.138024
197 Virgin Islands* ISV 103574.0 NaN
198 Yemen YEM 26832215.0 1406.291651
199 Zambia ZAM 16211767.0 1304.879014
200 Zimbabwe ZIM 15602751.0 924.143819
201 rows × 4 columns
'''
medals_per_country = medals_per_country.merge(dic, how = 'left', left_index = True,
right_on = 'Country').drop(columns = ['Code']).set_index('Country')
'''
Bronze Silver Gold Total Population GDP
Country
Afghanistan 2 0 0 2 32526562.0 594.323081
Algeria 8 2 5 15 39666519.0 4206.031232
Argentina 91 99 69 259 43416755.0 13431.878340
Armenia 8 2 1 11 3017712.0 3489.127690
Australasia 5 4 20 29 NaN NaN
... ... ... ... ... ... ...
West Germany 227 191 166 584 NaN NaN
West Indies Federation 5 0 0 5 NaN NaN
Yugoslavia 119 180 143 442 NaN NaN
Zambia 1 1 0 2 16211767.0 1304.879014
Zimbabwe 1 4 18 23 15602751.0 924.143819
145 rows × 6 columns
'''
olympics.nunique()
'''
Edition 2
Year 33
City 41
Sport 48
Discipline 80
Athlete 26494
Code 148
Gender 2
Event 733
Medal 3
Country 145
dtype: int64
'''
# lambda를 이용한 컬럼간 필드값 scala 방식으로 합치기
# 매우 중요
olympics['Games'] = olympics.apply(lambda x: str(x.Year) + ' ' + x.City, axis = 1)
'''
0 1896 Athens
1 1896 Athens
2 1896 Athens
3 1896 Athens
4 1896 Athens
...
36926 2014 Sochi
36927 2014 Sochi
36928 2014 Sochi
36929 2014 Sochi
36930 2014 Sochi
Length: 36931, dtype: object
'''
# 각 컬럼별로 고유한 필드값이 몇개인가.
olympics.nunique()
'''
Edition 2
Year 33
City 41
Sport 48
Discipline 80
Athlete 26494
Code 148
Gender 2
Event 733
Medal 3
Country 145
Games 49
dtype: int64
'''
olympics.Games.nunique()
# 49
olympics.Games.value_counts()
'''
2008 Beijing 2042
2000 Sydney 2015
2004 Athens 1998
2012 London 1945
1996 Atlanta 1859
1992 Barcelona 1705
1988 Seoul 1546
1984 Los Angeles 1459
1980 Moscow 1387
1976 Montreal 1305
1920 Antwerp 1298
1972 Munich 1185
1968 Mexico 1031
1964 Tokyo 1010
1952 Helsinki 889
1956 Melbourne / Stockholm 885
1912 Stockholm 885
1924 Paris 884
1960 Rome 882
1936 Berlin 875
1948 London 814
1908 London 804
1928 Amsterdam 710
1932 Los Angeles 615
2014 Sochi 612
2006 Turin 531
2010 Vancouver 529
1900 Paris 512
2002 Salt Lake City 481
1904 St Louis 470
1998 Nagano 447
1994 Lillehammer 343
1992 Albertville 325
1988 Calgary 264
1984 Sarajevo 222
1980 Lake Placid 218
1976 Innsbruck 210
1972 Sapporo 200
1968 Grenoble 199
1964 Innsbruck 185
1896 Athens 151
1956 Cortina d\'Ampezzo 150
1960 Squaw Valley 147
1948 St.Moritz 140
1952 Oslo 136
1924 Chamonix 118
1932 Lake Placid 116
1936 Garmisch Partenkirchen 108
1928 St.Moritz 89
Name: Games, dtype: int64
'''
olympics.groupby('Country').apply(lambda x: x.Games.nunique())
'''
Country
Afghanistan 2
Algeria 6
Argentina 18
Armenia 4
Australasia 2
..
West Germany 12
West Indies Federation 1
Yugoslavia 18
Zambia 2
Zimbabwe 3
Length: 145, dtype: int64
'''
medals_per_country['Total_Games'] = olympics.groupby('Country').apply(lambda x: x.Games.nunique())
medals_per_country
'''
Bronze Silver Gold Total Population GDP Total_Games
Country
Afghanistan 2 0 0 2 32526562.0 594.323081 2
Algeria 8 2 5 15 39666519.0 4206.031232 6
Argentina 91 99 69 259 43416755.0 13431.878340 18
Armenia 8 2 1 11 3017712.0 3489.127690 4
Australasia 5 4 20 29 NaN NaN 2
... ... ... ... ... ... ... ...
West Germany 227 191 166 584 NaN NaN 12
West Indies Federation 5 0 0 5 NaN NaN 1
Yugoslavia 119 180 143 442 NaN NaN 18
Zambia 1 1 0 2 16211767.0 1304.879014 2
Zimbabwe 1 4 18 23 15602751.0 924.143819 3
145 rows × 7 columns
'''
medals_per_country.rank(ascending= False)
'''
Bronze Silver Gold Total Population GDP Total_Games
Country
Afghanistan 100.0 135.0 122.5 117.0 37.0 115.0 107.5
Algeria 78.0 94.5 71.0 83.5 31.0 79.0 73.0
Argentina 34.0 33.0 35.0 34.0 29.0 44.0 28.0
Armenia 78.0 94.5 91.5 87.0 101.0 89.0 84.0
Australasia 86.5 82.0 51.5 67.0 NaN NaN 107.5
... ... ... ... ... ... ... ...
West Germany 21.0 24.0 19.0 21.0 NaN NaN 40.0
West Indies Federation 86.5 135.0 122.5 101.0 NaN NaN 131.0
Yugoslavia 29.0 26.0 24.0 26.0 NaN NaN 28.0
Zambia 114.5 113.0 122.5 117.0 57.0 105.0 107.5
Zimbabwe 114.5 82.0 54.0 74.5 59.0 109.0 93.0
145 rows × 7 columns
'''
## Statistical Analysis and Hypothesis Testing with scipy
medals_per_country
'''
Bronze Silver Gold Total Population GDP Total_Games
Country
Afghanistan 2 0 0 2 32526562.0 594.323081 2
Algeria 8 2 5 15 39666519.0 4206.031232 6
Argentina 91 99 69 259 43416755.0 13431.878340 18
Armenia 8 2 1 11 3017712.0 3489.127690 4
Australasia 5 4 20 29 NaN NaN 2
... ... ... ... ... ... ... ...
West Germany 227 191 166 584 NaN NaN 12
West Indies Federation 5 0 0 5 NaN NaN 1
Yugoslavia 119 180 143 442 NaN NaN 18
Zambia 1 1 0 2 16211767.0 1304.879014 2
Zimbabwe 1 4 18 23 15602751.0 924.143819 3
145 rows × 7 columns
'''
medals_per_country.drop(columns = ['Gold', 'Silver', 'Bronze'], inplace = True)
medals_per_country
'''
Total Population GDP Total_Games
Country
Afghanistan 2 32526562.0 594.323081 2
Algeria 15 39666519.0 4206.031232 6
Argentina 259 43416755.0 13431.878340 18
Armenia 11 3017712.0 3489.127690 4
Australasia 29 NaN NaN 2
... ... ... ... ...
West Germany 584 NaN NaN 12
West Indies Federation 5 NaN NaN 1
Yugoslavia 442 NaN NaN 18
Zambia 2 16211767.0 1304.879014 2
Zimbabwe 23 15602751.0 924.143819 3
145 rows × 4 columns
'''
medals_per_country
'''
Total Population GDP Total_Games
Country
Afghanistan 2 32526562.0 594.323081 2
Algeria 15 39666519.0 4206.031232 6
Argentina 259 43416755.0 13431.878340 18
Armenia 11 3017712.0 3489.127690 4
Australasia 29 NaN NaN 2
... ... ... ... ...
West Germany 584 NaN NaN 12
West Indies Federation 5 NaN NaN 1
Yugoslavia 442 NaN NaN 18
Zambia 2 16211767.0 1304.879014 2
Zimbabwe 23 15602751.0 924.143819 3
145 rows × 4 columns
'''
# corr는 상관계수 표현이다.
# +1은 완전한 양의 상관관계
# -1은 완전한 음의 상관관계
# 0은 완전히 상관관계가 없음을 의미한다.
medals_per_country.corr(method = 'pearson')
'''
Total Population GDP Total_Games
Total 1.000000 0.206606 0.437131 0.712917
Population 0.206606 1.000000 -0.089976 0.123700
GDP 0.437131 -0.089976 1.000000 0.563540
Total_Games 0.712917 0.123700 0.563540 1.000000
'''
# pearson method는 '선형 관계'만 파악한다.
# 따라서 상관관계 판단에 제한적이다.
# corr 상관계수
# 따라서 spearman method를 써줌으로써 선형인지 여부와 관계없이 통계적 테스트가 정규 분포를 가정하지 않는다.
# 덜 제한적이다. 라고 할 수 있다.
medals_per_country.corr(method = 'spearman')
'''
Total Population GDP Total_Games
Total 1.000000 0.419755 0.458478 0.927611
Population 0.419755 1.000000 -0.239097 0.402098
GDP 0.458478 -0.239097 1.000000 0.498056
Total_Games 0.927611 0.402098 0.498056 1.000000
'''
medals_per_country.rank(ascending=False).corr(method = 'pearson')
'''
Total Population GDP Total_Games
Total 1.000000 0.419634 0.458508 0.927611
Population 0.419634 1.000000 -0.239567 0.401448
GDP 0.458508 -0.239567 1.000000 0.497109
Total_Games 0.927611 0.401448 0.497109 1.000000
'''
# 가설검정 하기
import scipy.stats as stats
#### Hypothesis 1: There is no relationship between Total Medals and Population
stats.spearmanr(medals_per_country.Total, medals_per_country.Population,
nan_policy = 'omit')
# SpearmanrResult(correlation=0.41975527887063924, pvalue=8.132240868116897e-07)
r, p_value = stats.spearmanr(medals_per_country.Total, medals_per_country.Population,
nan_policy = 'omit')
r
# 0.41975527887063924
#### Reject Hypothesis 1 -> There is a significant (positive) relationship between Total Medals and Population
---
### Hypothesis 2: There is no relationship between Total Medals and GDP per capita
r, p_value = stats.spearmanr(medals_per_country.Total, medals_per_country.GDP,
nan_policy = 'omit')
r
# 0.45847787983426685
p_value
# 1.5758169913239602e-07
#### Reject Hypothesis 2 -> There is a significant (positive) relationship between Total Medals and GDP per Capita
---
#### Hypothesis 3: There is no relationship between Total Medals and Participations
medals_per_country.head()
'''
Total Population GDP Total_Games
Country
Afghanistan 2 32526562.0 594.323081 2
Algeria 15 39666519.0 4206.031232 6
Argentina 259 43416755.0 13431.878340 18
Armenia 11 3017712.0 3489.127690 4
Australasia 29 NaN NaN 2
'''
r, p_value = stats.spearmanr(medals_per_country.head(50).Total,
medals_per_country.head(50).Total_Games)
r
# 0.9642475115792023
p_value
# 2.4562642832742088e-29
## Aggregating and Ranking
sum_vs_win = pd.crosstab(olympics.Country, olympics.Edition)
'''
Edition Summer Winter
Country
Afghanistan 2 0
Algeria 15 0
Argentina 259 0
Armenia 11 0
Australasia 29 0
... ... ...
West Germany 490 94
West Indies Federation 5 0
Yugoslavia 435 7
Zambia 2 0
Zimbabwe 23 0
145 rows × 2 columns
'''
men_vs_women = pd.crosstab(olympics.Country, olympics.Gender)
'''
Gender Men Women
Country
Afghanistan 2 0
Algeria 12 3
Argentina 187 72
Armenia 11 0
Australasia 27 2
... ... ...
West Germany 450 134
West Indies Federation 5 0
Yugoslavia 379 63
Zambia 2 0
Zimbabwe 0 23
145 rows × 2 columns
'''
medals_by_cats = pd.concat([sum_vs_win, men_vs_women], axis = 1)
'''
Summer Winter Men Women
Country
Afghanistan 2 0 2 0
Algeria 15 0 12 3
Argentina 259 0 187 72
Armenia 11 0 11 0
Australasia 29 0 27 2
... ... ... ... ...
West Germany 490 94 450 134
West Indies Federation 5 0 5 0
Yugoslavia 435 7 379 63
Zambia 2 0 2 0
Zimbabwe 23 0 0 23
145 rows × 4 columns
'''
medals_by_cats['Total'] = medals_by_cats.Summer + medals_by_cats.Winter
medals_by_cats
'''
Summer Winter Men Women Total
Country
Afghanistan 2 0 2 0 2
Algeria 15 0 12 3 15
Argentina 259 0 187 72 259
Armenia 11 0 11 0 11
Australasia 29 0 27 2 29
... ... ... ... ... ...
West Germany 490 94 450 134 584
West Indies Federation 5 0 5 0 5
Yugoslavia 435 7 379 63 442
Zambia 2 0 2 0 2
Zimbabwe 23 0 0 23 23
145 rows × 5 columns
'''
medals_by_cats = pd.concat([sum_vs_win, men_vs_women], axis = 1)
'''
Summer Winter Men Women
Country
Afghanistan 2 0 2 0
Algeria 15 0 12 3
Argentina 259 0 187 72
Armenia 11 0 11 0
Australasia 29 0 27 2
... ... ... ... ...
West Germany 490 94 450 134
West Indies Federation 5 0 5 0
Yugoslavia 435 7 379 63
Zambia 2 0 2 0
Zimbabwe 23 0 0 23
145 rows × 4 columns
'''
medals_by_cats['Total'] = medals_by_cats.Summer + medals_by_cats.Winter
medals_by_cats.sort_values('Total', ascending=False, inplace = True)
medals_by_cats
'''
Summer Winter Men Women Total
Country
United States 4585 653 3618 1620 5238
Soviet Union 2049 440 1807 682 2489
United Kingdom 1720 79 1470 329 1799
Germany 1305 360 1119 546 1665
France 1396 152 1351 197 1548
... ... ... ... ... ...
Iraq 1 0 1 0 1
Senegal 1 0 1 0 1
Macedonia 1 0 1 0 1
Mauritius 1 0 1 0 1
Guatemala 1 0 1 0 1
145 rows × 5 columns
'''
medals_by_cats.sort_values('Total', ascending = False, inplace = True)
medals_by_cats
'''
Summer Winter Men Women Total
Country
United States 4585 653 3618 1620 5238
Soviet Union 2049 440 1807 682 2489
United Kingdom 1720 79 1470 329 1799
Germany 1305 360 1119 546 1665
France 1396 152 1351 197 1548
... ... ... ... ... ...
Barbados 1 0 1 0 1
Cyprus 1 0 1 0 1
Virgin Islands* 1 0 1 0 1
Bermuda* 1 0 1 0 1
Guatemala 1 0 1 0 1
145 rows × 5 columns
'''
ranks = medals_by_cats.rank(ascending=False, method = 'average')
ranks
'''
Summer Winter Men Women Total
Country
United States 1.0 1.0 1.0 1.0 1.0
Soviet Union 2.0 4.0 2.0 2.0 2.0
United Kingdom 3.0 19.0 3.0 11.0 3.0
Germany 5.0 7.0 7.0 4.0 4.0
France 4.0 14.0 4.0 16.5 5.0
... ... ... ... ... ...
Barbados 132.5 95.5 127.0 120.0 133.5
Cyprus 132.5 95.5 127.0 120.0 133.5
Virgin Islands* 132.5 95.5 127.0 120.0 133.5
Bermuda* 132.5 95.5 127.0 120.0 133.5
Guatemala 132.5 95.5 127.0 120.0 133.5
145 rows × 5 columns
'''
# 총 메달수 상위 50개 국을 다루는 게 좋은데
# 이유는 하위 그룹에서는 한두명의 특출나게 잘 하는 선수에 의해서 순위가 왜곡될 수 있기 때문이다.
top_50 = ranks.head(50)
plt.figure(figsize = (50,10))
sns.heatmap(top_50.T, cmap = 'RdYlGn_r', annot = True, fmt = '2.0f')
plt.show()
## Summer Games vs. Winter Games - does Geographical Location matter?
top_50.sort_values('Summer')
top_50.sort_values('Winter')
plt.figure(figsize = (50,10))
sns.heatmap(top_50.sort_values(by = 'Summer').T, cmap = 'RdYlGn_r', annot = True, fmt = '2.0f')
plt.show()
plt.figure(figsize = (50,10))
sns.heatmap(top_50.sort_values(by = 'Winter').T, cmap = 'RdYlGn_r',annot = True, fmt = '2.0f')
plt.show()
top_50.Summer.sub(top_50.Winter).sort_values()
'''
Country
Brazil -70.5
Cuba -68.5
Argentina -63.5
India -60.5
Greece -58.5
Jamaica -56.5
Pakistan -55.5
South Africa -52.0
Mexico -52.0
Kenya -50.5
Turkey -49.5
Nigeria -48.5
Uruguay -47.5
Iran -46.5
Romania -25.5
Hungary -22.0
Australia -20.0
Denmark -18.5
Spain -18.5
United Kingdom -16.0
Yugoslavia -11.0
New Zealand -10.5
France -10.0
Japan -8.0
Bulgaria -8.0
China -6.0
Netherlands -5.0
Italy -5.0
Poland -4.0
Belgium -3.0
Soviet Union -2.0
Germany -2.0
East Germany -1.0
United States 0.0
Korea, South 1.0
Sweden 3.0
Russia 4.0
Ukraine 4.5
West Germany 5.0
Unified Team of Germany 7.0
Croatia 9.5
Unified Team 11.0
Canada 13.0
Norway 14.0
Belarus 15.0
Finland 17.0
Czechoslovakia 17.0
Switzerland 20.0
Austria 29.0
Czech Republic 31.0
dtype: float64
'''
rank_diff= top_50.Summer.sub(top_50.Winter).sort_values().to_frame()
rank_diff
'''
0
Country
Brazil -70.5
Cuba -68.5
Argentina -63.5
India -60.5
Greece -58.5
Jamaica -56.5
Pakistan -55.5
South Africa -52.0
Mexico -52.0
Kenya -50.5
Turkey -49.5
Nigeria -48.5
Uruguay -47.5
Iran -46.5
Romania -25.5
Hungary -22.0
Australia -20.0
Denmark -18.5
Spain -18.5
United Kingdom -16.0
Yugoslavia -11.0
New Zealand -10.5
France -10.0
Japan -8.0
Bulgaria -8.0
China -6.0
Netherlands -5.0
Italy -5.0
Poland -4.0
Belgium -3.0
Soviet Union -2.0
Germany -2.0
East Germany -1.0
United States 0.0
Korea, South 1.0
Sweden 3.0
Russia 4.0
Ukraine 4.5
West Germany 5.0
Unified Team of Germany 7.0
Croatia 9.5
Unified Team 11.0
Canada 13.0
Norway 14.0
Belarus 15.0
Finland 17.0
Czechoslovakia 17.0
Switzerland 20.0
Austria 29.0
Czech Republic 31.0
'''
plt.figure(figsize = (35,5))
sns.heatmap(rank_diff.T, cmap = 'RdBu', annot = True, fmt = '2.0f', center = 0)
plt.show()
## Men vs. Women - does Culture & Religion matter?
plt.figure(figsize = (50, 10))
sns.heatmap(top_50.sort_values(by = 'Men').T, cmap = 'RdYlGn_r', annot = True, fmt = '2.0f')
plt.show()
plt.figure(figsize = (50,10))
sns.heatmap(top_50.sort_values(by = 'Women').T, cmap = 'RdYlGn_r', annot = True, fmt = '2.0f')
plt.show()
rank_diff2 = top_50.Men.sub(top_50.Women).sort_values().to_frame()
'''
0
Country
Pakistan -83.0
Uruguay -74.0
Iran -72.0
India -37.5
Belgium -22.0
Croatia -15.5
Turkey -13.5
Yugoslavia -13.0
Italy -13.0
Switzerland -13.0
France -12.5
Finland -12.0
Sweden -10.5
Denmark -9.5
Czechoslovakia -9.0
United Kingdom -8.0
Kenya -7.5
Hungary -6.0
Mexico -5.0
Poland -5.0
Norway -5.0
Unified Team of Germany -4.0
Austria -4.0
New Zealand -3.0
Spain -1.0
West Germany -1.0
United States 0.0
Soviet Union 0.0
Japan 0.0
South Africa 1.0
Cuba 1.0
Argentina 1.0
Canada 2.0
Czech Republic 2.0
Greece 2.0
Nigeria 2.5
Germany 3.0
Netherlands 5.0
Australia 7.0
East Germany 9.0
Brazil 9.0
Russia 10.0
Bulgaria 11.0
Unified Team 13.0
Korea, South 13.5
Ukraine 15.0
Belarus 15.5
Jamaica 16.5
Romania 18.0
China 27.0
'''
plt.figure(figsize = (35,5))
sns.heatmap(rank_diff2.T, cmap = 'RdBu_r', annot = True, fmt = '2.0f', center = 0)
plt.show()
## Do Traditions matter?
olympics.Sport.value_counts()
'''
Aquatics 4170
Athletics 3637
Rowing 2667
Gymnastics 2307
Skiing 1781
Fencing 1613
Ice Hockey 1563
Football 1497
Hockey 1422
Skating 1256
Wrestling 1211
Shooting 1150
Sailing 1109
Cycling 1105
Basketball 1012
Canoe / Kayak 1002
Volleyball 994
Handball 973
Equestrian 939
Boxing 894
Weightlifting 591
Judo 491
Bobsleigh 452
Biathlon 420
Baseball 335
Archery 329
Tennis 296
Rugby 192
Softball 180
Luge 180
Modern Pentathlon 180
Curling 172
Badminton 144
Table Tennis 144
Taekwondo 112
Tug of War 94
Canoe 81
Polo 66
Lacrosse 59
Golf 30
Triathlon 24
Cricket 24
Rackets 10
Croquet 8
Water Motorsports 5
Basque Pelota 4
Roque 3
Jeu de paume 3
Name: Sport, dtype: int64
'''
sports = olympics.Sport.value_counts().index
olympics.Country.value_counts().head(20)
'''
United States 5238
Soviet Union 2489
United Kingdom 1799
Germany 1665
France 1548
Italy 1488
Sweden 1477
Canada 1274
Australia 1204
Hungary 1091
Russia 1031
Norway 1011
East Germany 987
Netherlands 973
Finland 890
China 889
Japan 851
Switzerland 665
Romania 642
Korea, South 616
Name: Country, dtype: int64
'''
top_20 = olympics.Country.value_counts().head(20).index
pd.crosstab(olympics.Country, olympics.Sport)
'''
Sport Aquatics Archery Athletics Badminton Baseball Basketball Basque Pelota Biathlon Bobsleigh Boxing ... Softball Table Tennis Taekwondo Tennis Triathlon Tug of War Volleyball Water Motorsports Weightlifting Wrestling
Country
Afghanistan 0 0 0 0 0 0 0 0 0 0 ... 0 0 2 0 0 0 0 0 0 0
Algeria 0 0 7 0 0 0 0 0 0 6 ... 0 0 0 0 0 0 0 0 0 0
Argentina 3 0 5 0 0 24 0 0 0 24 ... 0 0 1 6 0 0 12 0 2 0
Armenia 0 0 0 0 0 0 0 0 0 1 ... 0 0 0 0 0 0 0 0 4 6
Australasia 11 0 1 0 0 0 0 0 0 1 ... 0 0 0 1 0 0 0 0 0 0
... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ...
West Germany 62 0 67 0 0 0 0 14 22 6 ... 0 0 0 3 0 0 0 0 7 9
West Indies Federation 0 0 5 0 0 0 0 0 0 0 ... 0 0 0 0 0 0 0 0 0 0
Yugoslavia 91 0 2 0 0 96 0 0 0 11 ... 0 4 0 0 0 0 24 0 0 16
Zambia 0 0 1 0 0 0 0 0 0 1 ... 0 0 0 0 0 0 0 0 0 0
Zimbabwe 7 0 0 0 0 0 0 0 0 0 ... 0 0 0 0 0 0 0 0 0 0
145 rows × 48 columns
'''
by_sport = pd.crosstab(olympics.Country, olympics.Sport)
by_sport = by_sport.rank(ascending=False, method = 'average')
plt.figure(figsize = (30,10))
sns.heatmap(by_sport, cmap = 'RdYlGn_r', vmin = 1, vmax = 6, linewidth = 1)
plt.show()
300x250
'개발일지 > 임시카테고리' 카테고리의 다른 글
SQL 기초9 수치함수 ceiling, floor, rand, power, sqrt 등 (0) | 2022.08.23 |
---|---|
python 파이썬 기초1 난수(random) 과 seed 활용, 정규분포 (0) | 2022.08.22 |
SQL 기초8 기초중의 기초 **** SQL 구문의 실행 순서 ***** (0) | 2022.08.20 |
SQL 기초7 문자열 연결 연산자 (concat 포함), IS NULL (0) | 2022.08.20 |
SQL 기초 6 문자열형 리스트 (0) | 2022.08.20 |