LoL Optimization
June 28th, 2023
In the game, two teams of five players battle in player-versus-player combat, each team occupying and defending their half of the map. Each of the ten players controls a character, known as a "champion", with unique abilities and differing styles of play. During a match, champions become more powerful by collecting experience points, earning gold, and purchasing items to defeat the opposing team. In League's main mode, Summoner's Rift, a team wins by pushing through to the enemy base and destroying their "Nexus", a large structure located within. (wikipedia)
The two teams are the red team and the blue team.
In League there are different "tiers" of players called "leagues". New and unskilled players (sorry) are in Bronze league while relatively skilled players are in Diamond league.
*We will refer to a player in Diamond league as a diamond player and a player in Bronze league as a bronze player.
Investigate the differences between bronze players and diamond players. Presumably, bronze players and diamond players will play the game somewhat differently due to their relative skill difference. However, we don't expect huge differences, since bronze players may try to imitate diamond players in order to get better at the game.
We want to know:
Are there any systematic differences in the way bronze players and diamond players approach the game?
Do bronze league games and diamond league games play out the same?
There are 8 provided datasets. Each row of each dataset records information from one match up to a certain point in time (15, 20, 25, or 30 minutes).
For example
timeline_DIAMOND_15.csv contains match data up to 15 minutes into the game for diamond players.
timeline_BRONZE_30.csv contains match data up to 30 minutes into the game for bronze players.
We will be using all datasets to look at differences between bronze and diamond players over the course of the game.
Using the provided data we will compare the gameplay of bronze and diamond players at four stages of the game.
Stage 1 is the 15 minute mark
Stage 2 is the 20 minute mark
Stage 3 is the 25 minute mark
Stage 4 is the 30 minute mark
We will try to discover how features differ between the two groups and which features are important to predicting the winner of the game.
# standard imports
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
%matplotlib inline
import warnings
warnings.filterwarnings('ignore')
from sklearn import metrics
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LogisticRegression
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis as LDA
from sklearn.discriminant_analysis import QuadraticDiscriminantAnalysis as QDA
from sklearn.ensemble import RandomForestRegressor, RandomForestClassifier
from sklearn.tree import DecisionTreeRegressor, DecisionTreeClassifier
from sklearn.metrics import roc_curve, roc_auc_score
# score classifiers
from sklearn.preprocessing import OneHotEncoder
def brier_score(targets, probs):
    # mean squared distance between the predicted probabilities and the one-hot encoded targets
    enc = OneHotEncoder()
    target_enc = enc.fit_transform(np.array(targets).reshape(-1, 1)).toarray()
    return np.mean(np.sum((probs - target_enc)**2, axis=1))

def log_score(targets, probs):
    # negative mean log-likelihood of the true class (the small constant avoids log(0))
    enc = OneHotEncoder()
    target_enc = enc.fit_transform(np.array(targets).reshape(-1, 1)).toarray()
    return -np.mean(np.sum(target_enc * np.log(probs + 1e-32), axis=1))
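As a quick sanity check on these scoring functions, here is a tiny example with made-up targets and probabilities (not from the match data): a perfect probabilistic prediction scores 0 on both measures, while a 50/50 guess on a binary target scores 0.5 on the Brier score and about 0.693 on the log score.
# toy example: perfect predictions vs. 50/50 guesses (made-up data, not from the datasets)
toy_targets = [0, 1, 1, 0]
perfect = np.array([[1., 0.], [0., 1.], [0., 1.], [1., 0.]])  # columns: P(class 0), P(class 1)
uniform = np.full((4, 2), 0.5)
print(brier_score(toy_targets, perfect), brier_score(toy_targets, uniform))  # 0.0, 0.5
print(log_score(toy_targets, perfect), log_score(toy_targets, uniform))      # ~0.0, ~0.693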
Let's assume we don't know anything about League. We have never played the game, never seen anyone play it, and maybe we haven't even heard of it. That's okay; we can still analyze the data and try to learn something.
First things first, we want to do some exploratory analysis. This will help us identify consistent trends between bronze and diamond players and become familiar with the data.
We will focus on three team-level features.
The amount of "xp" earned by each team. A team with more "xp" is typically stronger than a team with less "xp". Basically more xp is better.
The amount of "gold" earned by each team. A team with more gold typically has better equipment than a team with less gold. Basically more gold is better.
The number of "wards" placed by each team. A team with more wards can react to the opposing team better. Basically more wards is better.
Using the data available up to 15 minutes in the game, compare the distribution of xp, gold, and wards for bronze and diamond players.
Create 9 subfigures in a 3x3 grid (done for you). Fill in the provided template with the appropriate variables.
The column names (variables) you will need are: blue_gold, red_gold, gold_diff, blue_xp, red_xp, xp_diff, blue_ward_placed, red_ward_placed, ward_placed_diff. These variables record the amount of gold, xp, and wards each team (red or blue) has and the difference.
In each subfigure plot two histograms: one for diamond players and one for bronze players.
#bronze15 contains bronze league match data up to 15 minutes into the game
bronze15 = pd.read_csv('data/timeline_BRONZE_15.csv', index_col = 0)
#diamond15 contains diamond league match data up to 15 minutes into the game
diamond15 = pd.read_csv('data/timeline_DIAMOND_15.csv', index_col = 0)
# check column names (should be the same for both datasets)
print(bronze15.columns)
print(diamond15.columns)
fig, ax = plt.subplots(3, 3, constrained_layout = True, figsize = (12, 8))
# gold histograms
ax[0,0].set_title('Blue team gold', fontsize = 15)
ax[0,0].hist(diamond15["blue_gold"],alpha=0.5, label='diamond')
ax[0,0].hist(bronze15["blue_gold"],alpha=0.5, label='bronze')
ax[0,0].legend()
ax[0,1].set_title('Red team gold', fontsize = 15)
ax[0,1].hist(diamond15["red_gold"],alpha=0.5, label='diamond')
ax[0,1].hist(bronze15["red_gold"],alpha=0.5, label='bronze')
ax[0,1].legend()
ax[0,2].set_title('Gold difference', fontsize = 15)
ax[0,2].hist(diamond15["gold_diff"],alpha=0.5, label='diamond')
ax[0,2].hist(bronze15["gold_diff"],alpha=0.5, label='bronze')
ax[0,2].legend()
# XP histograms
ax[1,0].set_title('Blue team XP', fontsize = 15)
ax[1,0].hist(diamond15["blue_xp"],alpha=0.5, label='diamond')
ax[1,0].hist(bronze15["blue_xp"],alpha=0.5, label='bronze')
ax[1,0].legend()
ax[1,1].set_title('Red team XP', fontsize = 15)
ax[1,1].hist(diamond15["red_xp"],alpha=0.5, label='diamond')
ax[1,1].hist(bronze15["red_xp"],alpha=0.5, label='bronze')
ax[1,1].legend()
ax[1,2].set_title('XP difference', fontsize = 15)
ax[1,2].hist(diamond15["xp_diff"],alpha=0.5, label='diamond')
ax[1,2].hist(bronze15["xp_diff"],alpha=0.5, label='bronze')
ax[1,2].legend()
# Ward histograms
ax[2,0].set_title('Blue team wards', fontsize = 15)
ax[2,0].hist(diamond15["blue_ward_placed"],alpha=0.5, label='diamond')
ax[2,0].hist(bronze15["blue_ward_placed"],alpha=0.5, label='bronze')
ax[2,0].legend()
ax[2,1].set_title('Red team wards', fontsize = 15)
ax[2,1].hist(diamond15["red_ward_placed"],alpha=0.5, label='diamond')
ax[2,1].hist(bronze15["red_ward_placed"],alpha=0.5, label='bronze')
ax[2,1].legend()
ax[2,2].set_title('Ward difference', fontsize = 15)
ax[2,2].hist(diamond15["ward_placed_diff"],alpha=0.5, label='diamond')
ax[2,2].hist(bronze15["ward_placed_diff"],alpha=0.5, label='bronze')
ax[2,2].legend()
plt.show()
import statistics as st
# mean of each variable for the diamond and bronze 15-minute data, rounded to 3 decimals
metricCols = ["blue_gold", "red_gold", "gold_diff",
              "blue_xp", "red_xp", "xp_diff",
              "blue_ward_placed", "red_ward_placed", "ward_placed_diff"]
averageMetrics = pd.DataFrame(index=["diamond15", "bronze15"], columns=metricCols)
averageMetrics.loc["diamond15"] = diamond15[metricCols].mean().round(3).values
averageMetrics.loc["bronze15"] = bronze15[metricCols].mean().round(3).values
averageMetrics
Overall, diamond players outperform bronze players in every category. You can see this in each plot simply by looking at the histograms, or you can observe it through the differences in means.
For example, the average blue_gold for diamond players is almost 3000 higher than for bronze players. The same holds for blue_xp (over 1500 higher) and blue_ward_placed (over 2 more wards on average).
Within each rank there is no clear difference between the red and blue teams: blue averages slightly more gold while red averages slightly more xp (at least for diamond players), but these gaps are small relative to the spread of the distributions; a rough check follows below.
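As a rough check of the red-versus-blue claim, a two-sample t-test can compare the team columns within each rank. This is only a sketch: it treats matches as independent and uses scipy, which is an extra dependency not imported at the top of the notebook.
# rough red-vs-blue comparison within each rank (assumes independent matches; scipy is an extra import)
from scipy import stats
for name, df in [("diamond15", diamond15), ("bronze15", bronze15)]:
    _, p_gold = stats.ttest_ind(df["blue_gold"], df["red_gold"])
    _, p_xp = stats.ttest_ind(df["blue_xp"], df["red_xp"])
    print(f"{name}: gold p-value = {p_gold:.3f}, xp p-value = {p_xp:.3f}")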
Using the data available up to 15, 20, 25 and 30 minutes into the game, compare the distribution of gold for bronze and diamond players.
Create 12 subfigures in a 4x3 grid (done for you). Fill in the provided template with the appropriate variables.
The column names (variables) you will need are: blue_gold, red_gold, gold_diff
In each subfigure plot two histograms: one for diamond players and one for bronze players. Make each histogram transparent (alpha = 0.5) since they will overlap. Label the histograms.
Create a table that clearly displays the mean of each histogram for each of the twelve subfigures. The table should be 12x3: each row is a subfigure, column 1 contains the title of the subfigure, column 2 contains the bronze mean, and column 3 contains the diamond mean.
diamond15 = pd.read_csv('data/timeline_DIAMOND_15.csv', index_col = 0)
diamond20 = pd.read_csv('data/timeline_DIAMOND_20.csv', index_col = 0)
diamond25 = pd.read_csv('data/timeline_DIAMOND_25.csv', index_col = 0)
diamond30 = pd.read_csv('data/timeline_DIAMOND_30.csv', index_col = 0)
bronze15 = pd.read_csv('data/timeline_BRONZE_15.csv', index_col = 0)
bronze20 = pd.read_csv('data/timeline_BRONZE_20.csv', index_col = 0)
bronze25 = pd.read_csv('data/timeline_BRONZE_25.csv', index_col = 0)
bronze30 = pd.read_csv('data/timeline_BRONZE_30.csv', index_col = 0)
fig, ax = plt.subplots(4, 3, constrained_layout = True, figsize = (12, 10))
averageGoldMetrics = pd.DataFrame(columns=["subfigure","bronze mean","diamond mean"],index=['0','1','2','3','4','5','6','7','8','9','10','11'])
# Gold histograms (15 min)
ax[0,0].set_title('Blue gold (15 min.)')
ax[0,0].hist(diamond15['blue_gold'], alpha=0.5, label='diamond')
ax[0,0].hist(bronze15['blue_gold'], alpha=0.5, label='bronze')
ax[0,0].legend()
averageGoldMetrics.iloc[0,0] = "Blue gold (15 min.)"
averageGoldMetrics.iloc[0,1] = np.round(st.mean(bronze15["blue_gold"]),3)
averageGoldMetrics.iloc[0,2] = np.round(st.mean(diamond15["blue_gold"]),3)
ax[0,1].set_title('Red gold (15 min.)')
ax[0,1].hist(diamond15['red_gold'], alpha=0.5, label='diamond')
ax[0,1].hist(bronze15['red_gold'], alpha=0.5, label='bronze')
ax[0,1].legend()
averageGoldMetrics.iloc[1,0] = "Red gold (15 min.)"
averageGoldMetrics.iloc[1,1] = np.round(st.mean(bronze15["red_gold"]),3)
averageGoldMetrics.iloc[1,2] = np.round(st.mean(diamond15["red_gold"]),3)
ax[0,2].set_title('Gold diff (15 min.)')
ax[0,2].hist(diamond15['gold_diff'], alpha=0.5, label='diamond')
ax[0,2].hist(bronze15['gold_diff'], alpha=0.5, label='bronze')
ax[0,2].legend()
averageGoldMetrics.iloc[2,0] = "Gold diff (15 min.)"
averageGoldMetrics.iloc[2,1] = np.round(st.mean(bronze15["gold_diff"]),3)
averageGoldMetrics.iloc[2,2] = np.round(st.mean(diamond15["gold_diff"]),3)
# Gold histograms (20 min)
ax[1,0].set_title('Blue gold (20 min.)')
ax[1,0].hist(diamond20['blue_gold'], alpha=0.5, label='diamond')
ax[1,0].hist(bronze20['blue_gold'], alpha=0.5, label='bronze')
ax[1,0].legend()
averageGoldMetrics.iloc[3,0] = "Blue gold (20 min.)"
averageGoldMetrics.iloc[3,1] = np.round(st.mean(bronze20["blue_gold"]),3)
averageGoldMetrics.iloc[3,2] = np.round(st.mean(diamond20["blue_gold"]),3)
ax[1,1].set_title('Red gold (20 min.)')
ax[1,1].hist(diamond20['red_gold'], alpha=0.5, label='diamond')
ax[1,1].hist(bronze20['red_gold'], alpha=0.5, label='bronze')
ax[1,1].legend()
averageGoldMetrics.iloc[4,0] = "Red gold (20 min.)"
averageGoldMetrics.iloc[4,1] = np.round(st.mean(bronze20["red_gold"]),3)
averageGoldMetrics.iloc[4,2] = np.round(st.mean(diamond20["red_gold"]),3)
ax[1,2].set_title('Gold diff (20 min.)')
ax[1,2].hist(diamond20['gold_diff'], alpha=0.5, label='diamond')
ax[1,2].hist(bronze20['gold_diff'], alpha=0.5, label='bronze')
ax[1,2].legend()
averageGoldMetrics.iloc[5,0] = "Gold diff (20 min.)"
averageGoldMetrics.iloc[5,1] = np.round(st.mean(bronze20["gold_diff"]),3)
averageGoldMetrics.iloc[5,2] = np.round(st.mean(diamond20["gold_diff"]),3)
# Gold histograms (25 min)
ax[2,0].set_title('Blue gold (25 min.)')
ax[2,0].hist(diamond25['blue_gold'], alpha=0.5, label='diamond')
ax[2,0].hist(bronze25['blue_gold'], alpha=0.5, label='bronze')
ax[2,0].legend()
averageGoldMetrics.iloc[6,0] = "Blue gold (25 min.)"
averageGoldMetrics.iloc[6,1] = np.round(st.mean(bronze25["blue_gold"]),3)
averageGoldMetrics.iloc[6,2] = np.round(st.mean(diamond25["blue_gold"]),3)
ax[2,1].set_title('Red gold (25 min.)')
ax[2,1].hist(diamond25['red_gold'], alpha=0.5, label='diamond')
ax[2,1].hist(bronze25['red_gold'], alpha=0.5, label='bronze')
ax[2,1].legend()
averageGoldMetrics.iloc[7,0] = "Red gold (25 min.)"
averageGoldMetrics.iloc[7,1] = np.round(st.mean(bronze25["red_gold"]),3)
averageGoldMetrics.iloc[7,2] = np.round(st.mean(diamond25["red_gold"]),3)
ax[2,2].set_title('Gold diff (25 min.)')
ax[2,2].hist(diamond25['gold_diff'], alpha=0.5, label='diamond')
ax[2,2].hist(bronze25['gold_diff'], alpha=0.5, label='bronze')
ax[2,2].legend()
averageGoldMetrics.iloc[8,0] = "Gold diff (25 min.)"
averageGoldMetrics.iloc[8,1] = np.round(st.mean(bronze25["gold_diff"]),3)
averageGoldMetrics.iloc[8,2] = np.round(st.mean(diamond25["gold_diff"]),3)
# Gold histograms (30 min)
ax[3,0].set_title('Blue gold (30 min.)')
ax[3,0].hist(diamond30['blue_gold'], alpha=0.5, label='diamond')
ax[3,0].hist(bronze30['blue_gold'], alpha=0.5, label='bronze')
ax[3,0].legend()
averageGoldMetrics.iloc[9,0] = "Blue gold (30 min.)"
averageGoldMetrics.iloc[9,1] = np.round(st.mean(bronze30["blue_gold"]),3)
averageGoldMetrics.iloc[9,2] = np.round(st.mean(diamond30["blue_gold"]),3)
ax[3,1].set_title('Red gold (30 min.)')
ax[3,1].hist(diamond30['red_gold'], alpha=0.5, label='diamond')
ax[3,1].hist(bronze30['red_gold'], alpha=0.5, label='bronze')
ax[3,1].legend()
averageGoldMetrics.iloc[10,0] = "Red gold (30 min.)"
averageGoldMetrics.iloc[10,1] = np.round(st.mean(bronze30["red_gold"]),3)
averageGoldMetrics.iloc[10,2] = np.round(st.mean(diamond30["red_gold"]),3)
ax[3,2].set_title('Gold diff (30 min.)')
ax[3,2].hist(diamond30['gold_diff'], alpha=0.5, label='diamond')
ax[3,2].hist(bronze30['gold_diff'], alpha=0.5, label='bronze')
ax[3,2].legend()
averageGoldMetrics.iloc[11,0] = "Gold diff (30 min.)"
averageGoldMetrics.iloc[11,1] = np.round(st.mean(bronze30["gold_diff"]),3)
averageGoldMetrics.iloc[11,2] = np.round(st.mean(diamond30["gold_diff"]),3)
plt.show()
# create table
averageGoldMetrics
Most of the histograms show a fairly consistent relationship between bronze and diamond players: diamond players average more gold regardless of the time point or team color. Interestingly, the gap between bronze and diamond gold appears to shrink the longer the game goes on. The blue team seems to gather slightly more gold at the shorter durations, but this is within the noise; for example, the bronze means at 15 minutes are within about 100 gold of each other for blue and red (24741 and 24619 respectively). The sketch below quantifies how the rank gap changes over time.
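To make the "shrinking gap" observation concrete, the sketch below (reusing the dataframes loaded above) expresses the diamond-minus-bronze difference in mean blue gold as a percentage of the bronze mean at each time point.
# how the diamond-vs-bronze blue gold gap changes over time, as a percent of the bronze mean
timePairs = [(15, diamond15, bronze15), (20, diamond20, bronze20),
             (25, diamond25, bronze25), (30, diamond30, bronze30)]
for minutes, dia, bro in timePairs:
    gap = dia["blue_gold"].mean() - bro["blue_gold"].mean()
    print(f"{minutes} min: diamond leads by {gap:.0f} gold ({100 * gap / bro['blue_gold'].mean():.1f}%)")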
Create 8 subfigures in a 2x4 grid (done for you). Fill in the provided template with the appropriate variables.
The column names (variables) you will need are: blue_win, blue_gold, and gold_diff
In each subfigure plot two histograms: one for diamond players and one for bronze players only in the cases where blue team won (blue_win == 1). Make each histogram transparent (alpha = 0.5) since they will overlap. Label the histograms.
Create a table that clearly displays the mean of each histogram for each of the eight subfigures. The table should be 8x3: each row is a subfigure, column 1 contains the title of the subfigure, column 2 contains the bronze mean, and column 3 contains the diamond mean.
diamond15BlueWin = diamond15[diamond15['blue_win']==1]
diamond20BlueWin = diamond20[diamond20['blue_win']==1]
diamond25BlueWin = diamond25[diamond25['blue_win']==1]
diamond30BlueWin = diamond30[diamond30['blue_win']==1]
bronze15BlueWin = bronze15[bronze15['blue_win']==1]
bronze20BlueWin = bronze20[bronze20['blue_win']==1]
bronze25BlueWin = bronze25[bronze25['blue_win']==1]
bronze30BlueWin = bronze30[bronze30['blue_win']==1]
fig, ax = plt.subplots(2, 4, constrained_layout = True, figsize = (14, 5))
averageBlueWinGoldMetrics = pd.DataFrame(columns=["subfigure","bronze mean","diamond mean"],index=['0','1','2','3','4','5','6','7'])
# blue gold distribution (winners only)
ax[0,0].set_title('Winning Blue gold (15 min)')
ax[0,0].hist(diamond15BlueWin['blue_gold'], alpha=0.5, label='diamond')
ax[0,0].hist(bronze15BlueWin['blue_gold'], alpha=0.5, label='bronze')
ax[0,0].legend()
averageBlueWinGoldMetrics.iloc[0,0] = "Winning Blue gold (15 min)"
averageBlueWinGoldMetrics.iloc[0,1] = np.round(st.mean(bronze15BlueWin["blue_gold"]),3)
averageBlueWinGoldMetrics.iloc[0,2] = np.round(st.mean(diamond15BlueWin["blue_gold"]),3)
ax[0,1].set_title('Winning Blue gold (20 min)')
ax[0,1].hist(diamond20BlueWin['blue_gold'], alpha=0.5, label='diamond')
ax[0,1].hist(bronze20BlueWin['blue_gold'], alpha=0.5, label='bronze')
ax[0,1].legend()
averageBlueWinGoldMetrics.iloc[1,0] = "Winning Blue gold (20 min)"
averageBlueWinGoldMetrics.iloc[1,1] = np.round(st.mean(bronze20BlueWin["blue_gold"]),3)
averageBlueWinGoldMetrics.iloc[1,2] = np.round(st.mean(diamond20BlueWin["blue_gold"]),3)
ax[0,2].set_title('Winning Blue gold (25 min)')
ax[0,2].hist(diamond25BlueWin['blue_gold'], alpha=0.5, label='diamond')
ax[0,2].hist(bronze25BlueWin['blue_gold'], alpha=0.5, label='bronze')
ax[0,2].legend()
averageBlueWinGoldMetrics.iloc[2,0] = "Winning Blue gold (25 min)"
averageBlueWinGoldMetrics.iloc[2,1] = np.round(st.mean(bronze25BlueWin["blue_gold"]),3)
averageBlueWinGoldMetrics.iloc[2,2] = np.round(st.mean(diamond25BlueWin["blue_gold"]),3)
ax[0,3].set_title('Winning Blue gold (30 min)')
ax[0,3].hist(diamond30BlueWin['blue_gold'], alpha=0.5, label='diamond')
ax[0,3].hist(bronze30BlueWin['blue_gold'], alpha=0.5, label='bronze')
ax[0,3].legend()
averageBlueWinGoldMetrics.iloc[3,0] = "Winning Blue gold (30 min)"
averageBlueWinGoldMetrics.iloc[3,1] = np.round(st.mean(bronze30BlueWin["blue_gold"]),3)
averageBlueWinGoldMetrics.iloc[3,2] = np.round(st.mean(diamond30BlueWin["blue_gold"]),3)
# gold diff distribution (winners only)
ax[1,0].set_title('Winning Gold diff (15 min)')
ax[1,0].hist(diamond15BlueWin['gold_diff'], alpha=0.5, label='diamond')
ax[1,0].hist(bronze15BlueWin['gold_diff'], alpha=0.5, label='bronze')
ax[1,0].legend()
averageBlueWinGoldMetrics.iloc[4,0] = "Winning Gold diff (15 min)"
averageBlueWinGoldMetrics.iloc[4,1] = np.round(st.mean(bronze15BlueWin["gold_diff"]),3)
averageBlueWinGoldMetrics.iloc[4,2] = np.round(st.mean(diamond15BlueWin["gold_diff"]),3)
ax[1,1].set_title('Winning Gold diff (20 min)')
ax[1,1].hist(diamond20BlueWin['gold_diff'], alpha=0.5, label='diamond')
ax[1,1].hist(bronze20BlueWin['gold_diff'], alpha=0.5, label='bronze')
ax[1,1].legend()
averageBlueWinGoldMetrics.iloc[5,0] = "Winning Gold diff (20 min)"
averageBlueWinGoldMetrics.iloc[5,1] = np.round(st.mean(bronze20BlueWin["gold_diff"]),3)
averageBlueWinGoldMetrics.iloc[5,2] = np.round(st.mean(diamond20BlueWin["gold_diff"]),3)
ax[1,2].set_title('Winning Gold diff (25 min)')
ax[1,2].hist(diamond25BlueWin['gold_diff'], alpha=0.5, label='diamond')
ax[1,2].hist(bronze25BlueWin['gold_diff'], alpha=0.5, label='bronze')
ax[1,2].legend()
averageBlueWinGoldMetrics.iloc[6,0] = "Winning Gold diff (25 min)"
averageBlueWinGoldMetrics.iloc[6,1] = np.round(st.mean(bronze25BlueWin["gold_diff"]),3)
averageBlueWinGoldMetrics.iloc[6,2] = np.round(st.mean(diamond25BlueWin["gold_diff"]),3)
ax[1,3].set_title('Winning Gold diff (30 min)')
ax[1,3].hist(diamond30BlueWin['gold_diff'], alpha=0.5, label='diamond')
ax[1,3].hist(bronze30BlueWin['gold_diff'], alpha=0.5, label='bronze')
ax[1,3].legend()
averageBlueWinGoldMetrics.iloc[7,0] = "Winning Gold diff (30 min)"
averageBlueWinGoldMetrics.iloc[7,1] = np.round(st.mean(bronze30BlueWin["gold_diff"]),3)
averageBlueWinGoldMetrics.iloc[7,2] = np.round(st.mean(diamond30BlueWin["gold_diff"]),3)
plt.show()
# create table
averageBlueWinGoldMetrics
Once again, diamond players collect more gold on average, even though here we are only looking at games the blue team won. Interestingly, the rank gap in gold_diff behaves differently from the total-gold gap seen before: it is already small, with the mean Winning Gold diff (20 min) at 3776.497 for bronze versus 3807.515 for diamond, hardly any difference at all.
For each stage of the game (15, 20, 25, 30 minutes) and for both diamond and bronze players compute the win percentage of the blue team (fraction of times that blue_win == 1).
Plot two lines (label them) indicating the percent of the time blue wins. X-axis is match time (15, 20, 25, 30 minutes) and y-axis is win percentage of blue team.
One line shows the win percentage of blue for diamond players
The other line shows the win percentage of blue for bronze players
Label the axes and title the plot appropriately.
diamondBlueWinRate = [0] * 4
diamondBlueWinRate[0] = np.round(len(diamond15BlueWin)/len(diamond15)*100,2)
diamondBlueWinRate[1] = np.round(len(diamond20BlueWin)/len(diamond20)*100,2)
diamondBlueWinRate[2] = np.round(len(diamond25BlueWin)/len(diamond25)*100,2)
diamondBlueWinRate[3] = np.round(len(diamond30BlueWin)/len(diamond30)*100,2)
bronzeBlueWinRate = [0] * 4
bronzeBlueWinRate[0] = np.round(len(bronze15BlueWin)/len(bronze15)*100,2)
bronzeBlueWinRate[1] = np.round(len(bronze20BlueWin)/len(bronze20)*100,2)
bronzeBlueWinRate[2] = np.round(len(bronze25BlueWin)/len(bronze25)*100,2)
bronzeBlueWinRate[3] = np.round(len(bronze30BlueWin)/len(bronze30)*100,2)
matchTimes = [15,20,25,30]
# plot goes here
plt.plot(matchTimes,diamondBlueWinRate, label='diamond')
plt.plot(matchTimes,bronzeBlueWinRate, label='bronze')
plt.title("Blue Team Win %")
plt.xlabel("Match Times")
plt.ylabel("Blue Win %")
plt.xticks([15,20,25,30])
plt.legend()
plt.show()
Visually, the bronze and diamond win percentages follow a similar shape; both decrease by a percentage point or so over time. In bronze, the blue team has a slight advantage that shrinks in longer matches. In diamond, blue also wins slightly more often in shorter matches but falls behind red in the longest ones. Blue's edge in bronze could simply be random noise, or perhaps something like color perception (seeing red supposedly makes players more aggressive) matters more at lower ranks; the rough check below gives a sense of how large the noise is.
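To get a feel for how much of blue's edge could be noise, the sketch below computes a normal-approximation 95% interval for the blue win proportion at 15 minutes (it assumes the matches are independent, which is an approximation).
# rough 95% interval for blue's win rate at 15 minutes (normal approximation, independent matches assumed)
for name, df in [("bronze15", bronze15), ("diamond15", diamond15)]:
    p = df["blue_win"].mean()                 # observed blue win proportion
    se = np.sqrt(p * (1 - p) / len(df))       # standard error of a proportion
    print(f"{name}: blue wins {100 * p:.1f}% +/- {100 * 1.96 * se:.1f}%")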
Now we want to further investigate how bronze and diamond players differ. Since bronze and diamond players have different skill levels, we think their games might be played differently. For example, maybe "xp" and gold are more important to bronze players and "wards" are more important for diamond players.
Let's build some models to predict the winner of a match using the provided match information. We will investigate a few different questions:
Is it easier to predict the outcome of bronze or diamond league matches?
Do different features determine the winner between bronze and diamond league players?
Let's start with the diamond players. We want to classify whether the blue team will win (blue_win == 1) given match information like xp, gold, and wards at the 15, 20, 25, and 30 minute marks, i.e. we need 4 classification models.
Import the diamond player data (done for you)
Separate the target variable (blue_win) and the feature matrix (everything else) (done for 15 minutes data, you do the rest).
Fit any classification model you like to each dataset.
Make sure your model has an out-of-sample Brier score < 0.4 and an accuracy > 0.65 on the 15 minute data. We don't want to use bad models!
You need a variable importance measure, so maybe don't choose nearest neighbors.
For logistic regression use the absolute value of the coefficients as variable importance.
For decision trees or random forest use the feature_importance_ score.
diamond15 = pd.read_csv('data/timeline_DIAMOND_15.csv', index_col = 0)
diamond20 = pd.read_csv('data/timeline_DIAMOND_20.csv', index_col = 0)
diamond25 = pd.read_csv('data/timeline_DIAMOND_25.csv', index_col = 0)
diamond30 = pd.read_csv('data/timeline_DIAMOND_30.csv', index_col = 0)
# 15 minutes
x15 = diamond15.drop(['blue_win'], axis=1)
y15 = diamond15.loc[:,['blue_win']]
x15_train, x15_test, y15_train, y15_test = train_test_split(x15, y15, test_size=0.33, random_state=42)
x15_train = np.array(x15_train)
y15_train = np.array(y15_train)
x15_test = np.array(x15_test)
y15_test = np.array(y15_test)
# recommend keeping a consistent naming scheme
# 20 minutes
x20 = diamond20.drop(['blue_win'], axis=1)
y20 = diamond20.loc[:,['blue_win']]
x20_train, x20_test, y20_train, y20_test = train_test_split(x20, y20, test_size=0.33, random_state=42)
x20_train = np.array(x20_train)
y20_train = np.array(y20_train)
x20_test = np.array(x20_test)
y20_test = np.array(y20_test)
# 25 minutes
x25 = diamond25.drop(['blue_win'], axis=1)
y25 = diamond25.loc[:,['blue_win']]
x25_train, x25_test, y25_train, y25_test = train_test_split(x25, y25, test_size=0.33, random_state=42)
x25_train = np.array(x25_train)
y25_train = np.array(y25_train)
x25_test = np.array(x25_test)
y25_test = np.array(y25_test)
# 30 minutes
x30 = diamond30.drop(['blue_win'], axis=1)
y30 = diamond30.loc[:,['blue_win']]
x30_train, x30_test, y30_train, y30_test = train_test_split(x30, y30, test_size=0.33, random_state=42)
x30_train = np.array(x30_train)
y30_train = np.array(y30_train)
x30_test = np.array(x30_test)
y30_test = np.array(y30_test)
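The four nearly identical blocks above could be collapsed into a small helper. This is only a sketch (it assumes every timeline dataframe has a blue_win column and keeps the same test_size and random_state); the longhand version above is what the rest of the notebook uses.
# optional helper: split any timeline dataframe into train/test arrays with the settings used above
def split_timeline(df, target="blue_win", test_size=0.33, random_state=42):
    X = df.drop(columns=[target])
    y = df[target]
    X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=test_size, random_state=random_state)
    return np.array(X_tr), np.array(X_te), np.array(y_tr), np.array(y_te)

# example usage: x15_train, x15_test, y15_train, y15_test = split_timeline(diamond15)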
# define and fit models
from sklearn.metrics import accuracy_score
# 15 minutes
lda15 = LDA()
lda15.fit(x15_train, y15_train)
# 20 minutes
lda20 = LDA()
lda20.fit(x20_train, y20_train)
# 25 minutes
lda25 = LDA()
lda25.fit(x25_train, y25_train)
# 30 minutes
lda30 = LDA()
lda30.fit(x30_train, y30_train)
y15_hat = lda15.predict_proba(x15_test)
lda15Brier = np.round(brier_score(y15_test, y15_hat),3)
print("diamond15 LDA Brier Score:", lda15Brier)
p15_hat = lda15.predict(x15_test)
lda15Acc = np.round(accuracy_score(y15_test,p15_hat),3)
print("diamond15 LDA Accuracy Score:", lda15Acc)
diamond15 LDA Brier Score: 0.322
diamond15 LDA Accuracy Score: 0.761
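A quick assertion (a sketch against the thresholds stated earlier) makes the "no bad models" requirement explicit; it would raise an error if the 15 minute model were too weak.
# sanity check: the 15-minute model must clear the required thresholds (Brier < 0.4, accuracy > 0.65)
assert lda15Brier < 0.4 and lda15Acc > 0.65, "15-minute diamond model does not meet the Brier/accuracy requirements"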
Compute and print the Brier score and accuracy of each model (round to 3 decimal places).
Display the brier score and accuracy in a single table. The table should be 4x3. Each row is a time point (15, 20, 25, or 30 minutes). Column 1 is the time point as a string, column 2 is the brier score, column 3 is the accuracy. For example row 1 might look like 15 Minutes, 0.351, 0.825.
We might naively expect that it's easier to predict the winner the longer the match goes on. Comment on: Which time period is the hardest to predict? Which time period is the easiest to predict? Do matches become more predictable (better scores) over time? Use the computed Brier and accuracy scores to inform your response.
# predict if blue wins
y15_hat = lda15.predict_proba(x15_test)
lda15Brier = np.round(brier_score(y15_test, y15_hat),3)
print("diamond15 LDA Brier Score:", lda15Brier)
y20_hat = lda20.predict_proba(x20_test)
lda20Brier = np.round(brier_score(y20_test, y20_hat),3)
print("diamond20 LDA Brier Score:", lda20Brier)
y25_hat = lda25.predict_proba(x25_test)
lda25Brier = np.round(brier_score(y25_test, y25_hat),3)
print("diamond25 LDA Brier Score:", lda25Brier)
y30_hat = lda30.predict_proba(x30_test)
lda30Brier = np.round(brier_score(y30_test, y30_hat),3)
print("diamond30 LDA Brier Score:", lda30Brier,'\n')
# predict probabilities of blue winning and losing
p15_hat = lda15.predict(x15_test)
lda15Acc = np.round(accuracy_score(y15_test,p15_hat),3)
print("diamond15 LDA Accuracy Score:", lda15Acc)
p20_hat = lda20.predict(x20_test)
lda20Acc = np.round(accuracy_score(y20_test,p20_hat),3)
print("diamond20 LDA Accuracy Score:", lda20Acc)
p25_hat = lda25.predict(x25_test)
lda25Acc = np.round(accuracy_score(y25_test,p25_hat),3)
print("diamond25 LDA Accuracy Score:", lda25Acc)
p30_hat = lda30.predict(x30_test)
lda30Acc = np.round(accuracy_score(y30_test,p30_hat),3)
print("diamond30 LDA Accuracy Score:", lda30Acc)
# create brier and accuracy table
brierAccScores = pd.DataFrame(columns=["Time Point","Brier Score","Accuracy"],index=['0','1','2','3'])
brierAccScores.iloc[0,0] = "15 minutes"
brierAccScores.iloc[1,0] = "20 minutes"
brierAccScores.iloc[2,0] = "25 minutes"
brierAccScores.iloc[3,0] = "30 minutes"
brierAccScores.iloc[0,1] = lda15Brier
brierAccScores.iloc[1,1] = lda20Brier
brierAccScores.iloc[2,1] = lda25Brier
brierAccScores.iloc[3,1] = lda30Brier
brierAccScores.iloc[0,2] = lda15Acc
brierAccScores.iloc[1,2] = lda20Acc
brierAccScores.iloc[2,2] = lda25Acc
brierAccScores.iloc[3,2] = lda30Acc
brierAccScores
diamond15 LDA Brier Score: 0.322
diamond20 LDA Brier Score: 0.276
diamond25 LDA Brier Score: 0.245
diamond30 LDA Brier Score: 0.267
diamond15 LDA Accuracy Score: 0.761
diamond20 LDA Accuracy Score: 0.8
diamond25 LDA Accuracy Score: 0.826
diamond30 LDA Accuracy Score: 0.809
According to the scores, the hardest time period to predict is 15 minutes, which has the highest Brier score and lowest accuracy. The 25 minute games are the easiest to predict, with the best scores on both measures. Matches tend to become easier to predict over time, with the exception of 30 minute games, which are slightly harder to predict than 25 minute games.
Now plot the ROC curve for each model in a single figure. Make sure each line is appropriately labeled.
Create a single ROC curve plot
Compute and print the AUC values for each model.
# compute ROC curves and AUC values
fpr_lda15, tpr_lda15, thresholds = roc_curve(y15_test, y15_hat[:,1])
fpr_lda20, tpr_lda20, thresholds = roc_curve(y20_test, y20_hat[:,1])
fpr_lda25, tpr_lda25, thresholds = roc_curve(y25_test, y25_hat[:,1])
fpr_lda30, tpr_lda30, thresholds = roc_curve(y30_test, y30_hat[:,1])
print('15 Minutes:', np.round(roc_auc_score(y15_test, y15_hat[:,1]), 3))
print('20 Minutes:', np.round(roc_auc_score(y20_test, y20_hat[:,1]), 3))
print('25 Minutes:', np.round(roc_auc_score(y25_test, y25_hat[:,1]), 3))
print('30 Minutes:', np.round(roc_auc_score(y30_test, y30_hat[:,1]), 3))
# plot ROC curves
plt.plot(fpr_lda15, tpr_lda15, label = '15')
plt.plot(fpr_lda20, tpr_lda20, label = '20')
plt.plot(fpr_lda25, tpr_lda25, label = '25')
plt.plot(fpr_lda30, tpr_lda30, label = '30')
plt.title("ROC Curves")
plt.xlabel('False Positive Rate', fontsize = 15)
plt.ylabel('True Positive Rate', fontsize = 15)
plt.legend()
plt.show()
15 Minutes: 0.844
20 Minutes: 0.885
25 Minutes: 0.909
30 Minutes: 0.892
Based on the AUC values, 15 minutes is the hardest time point to predict while 25 minutes is the easiest. This agrees with the Brier and accuracy scores above: matches become more predictable over time, apart from a slight dip at the 30 minute mark.
Print the feature importance of each feature in a table format. Each row should include the feature name and the importance score for each model. Sort this table by the feature importances for the 15 minute mark model.
If you used logistic regression use the coefficients (coefs_) as the importance measure
If you used decision trees or random forests use the feature importance score (feature_importances_) as the importance measure
Comment on: What are the top 5 most important features for predicting the winner of the game at the 15, 20, 25, and 30 minute marks of the match. Are these variables the same? Do any features become more or less important over time? Briefly argue these points, a simple "yes" or "no" is insufficient.
# create table here
featureImportance = pd.DataFrame(columns=["Feature Name", "15 - Importance", "20 - Importance", "25 - Importance", "30 - Importance"])
featureImportance["Feature Name"] = x15.columns
featureImportance["15 - Importance"] = abs(lda15.coef_[0])
featureImportance["20 - Importance"] = abs(lda20.coef_[0])
featureImportance["25 - Importance"] = abs(lda25.coef_[0])
featureImportance["30 - Importance"] = abs(lda30.coef_[0])
featureImportance = featureImportance.sort_values('15 - Importance', ascending=False)  # most important features first
featureImportance
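To read off the top five features per time point programmatically (a small sketch over the table just built), nlargest can be applied to each importance column.
# print the five largest importance values for each model
for col in ["15 - Importance", "20 - Importance", "25 - Importance", "30 - Importance"]:
    print(f"\nTop 5 features at {col.split(' ')[0]} minutes:")
    print(featureImportance.nlargest(5, col)[["Feature Name", col]].to_string(index=False))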
The 5 most important features for a 15 minute game are:
first_inhibitor, blue_hextech, first_turret, red_earth, and red_fire.
For the 20 minute game:
blue_inhibitors, first_inhibitor, inhibtor_diff, red_inhibitors, and red_earth
25 minute game:
first_inhibitor, water, red_water, red_earth, and blue_earth
And lastly for the 30 minute time period:
first_inhibitor, blue_earth, red_water, hextech, blue_fire
These features are not the same across every model: first_inhibitor and the earth features are consistently in the top 5, while features like hextech and the turret/inhibitor counts move in and out of the top 5 depending on the time point.
Now let's build models for the bronze players. Use the same type of model as you did for the diamond players so that the results are comparable; by "same" I mean that if you used logistic regression before, use it again. We'll skip through a bit this time. You of course have to refit the models to the bronze data, so you should again have 4 models.
Compute the brier score and accuracy of each model (4 models in total) on the test set.
Display the brier score and accuracy in a single table. The table should be 4x3. Each row is a time point (15, 20, 25, or 30 minutes). Column 1 is the time point as a string, column 2 is the brier score, column 3 is the accuracy.
Comment on: Do matches get easier to predict over time? Are the Brier scores for bronze players lower or higher than for diamond players on average, i.e. is it easier to predict the outcome of a bronze game or a diamond game?
bronze15 = pd.read_csv('data/timeline_BRONZE_15.csv', index_col = 0)
bronze20 = pd.read_csv('data/timeline_BRONZE_20.csv', index_col = 0)
bronze25 = pd.read_csv('data/timeline_BRONZE_25.csv', index_col = 0)
bronze30 = pd.read_csv('data/timeline_BRONZE_30.csv', index_col = 0)
# 15 minutes
x15 = bronze15.drop(['blue_win'], axis=1)
y15 = bronze15.loc[:,['blue_win']]
x15_train, x15_test, y15_train, y15_test = train_test_split(x15, y15, test_size=0.33, random_state=42)
x15_train = np.array(x15_train)
y15_train = np.array(y15_train)
x15_test = np.array(x15_test)
y15_test = np.array(y15_test)
# recommend keeping a consistent naming scheme
# 20 minutes
x20 = bronze20.drop(['blue_win'], axis=1)
y20 = bronze20.loc[:,['blue_win']]
x20_train, x20_test, y20_train, y20_test = train_test_split(x20, y20, test_size=0.33, random_state=42)
x20_train = np.array(x20_train)
y20_train = np.array(y20_train)
x20_test = np.array(x20_test)
y20_test = np.array(y20_test)
# 25 minutes
x25 = bronze25.drop(['blue_win'], axis=1)
y25 = bronze25.loc[:,['blue_win']]
x25_train, x25_test, y25_train, y25_test = train_test_split(x25, y25, test_size=0.33, random_state=42)
x25_train = np.array(x25_train)
y25_train = np.array(y25_train)
x25_test = np.array(x25_test)
y25_test = np.array(y25_test)
# 30 minutes
x30 = bronze30.drop(['blue_win'], axis=1)
y30 = bronze30.loc[:,['blue_win']]
x30_train, x30_test, y30_train, y30_test = train_test_split(x30, y30, test_size=0.33, random_state=42)
x30_train = np.array(x30_train)
y30_train = np.array(y30_train)
x30_test = np.array(x30_test)
y30_test = np.array(y30_test)
# define and fit models
# 15 minutes
lda15 = LDA()
lda15.fit(x15_train, y15_train)
# 20 minutes
lda20 = LDA()
lda20.fit(x20_train, y20_train)
# 25 minutes
lda25 = LDA()
lda25.fit(x25_train, y25_train)
# 30 minutes
lda30 = LDA()
lda30.fit(x30_train, y30_train)
# predict if blue wins
y15_hat = lda15.predict_proba(x15_test)
lda15Brier = np.round(brier_score(y15_test, y15_hat),3)
print("bronze15 LDA Brier Score:", lda15Brier)
y20_hat = lda20.predict_proba(x20_test)
lda20Brier = np.round(brier_score(y20_test, y20_hat),3)
print("bronze20 LDA Brier Score:", lda20Brier)
y25_hat = lda25.predict_proba(x25_test)
lda25Brier = np.round(brier_score(y25_test, y25_hat),3)
print("bronze25 LDA Brier Score:", lda25Brier)
y30_hat = lda30.predict_proba(x30_test)
lda30Brier = np.round(brier_score(y30_test, y30_hat),3)
print("bronze30 LDA Brier Score:", lda30Brier,'\n')
# predict probabilities of blue winning and losing
p15_hat = lda15.predict(x15_test)
lda15Acc = np.round(accuracy_score(y15_test,p15_hat),3)
print("bronze15 LDA Accuracy Score:", lda15Acc)
p20_hat = lda20.predict(x20_test)
lda20Acc = np.round(accuracy_score(y20_test,p20_hat),3)
print("bronze20 LDA Accuracy Score:", lda20Acc)
p25_hat = lda25.predict(x25_test)
lda25Acc = np.round(accuracy_score(y25_test,p25_hat),3)
print("bronze25 LDA Accuracy Score:", lda25Acc)
p30_hat = lda30.predict(x30_test)
lda30Acc = np.round(accuracy_score(y30_test,p30_hat),3)
print("bronze30 LDA Accuracy Score:", lda30Acc)
# create brier and accuracy table
brierAccScores = pd.DataFrame(columns=["Time Point","Brier Score","Accuracy"],index=['0','1','2','3'])
brierAccScores.iloc[0,0] = "15 minutes"
brierAccScores.iloc[1,0] = "20 minutes"
brierAccScores.iloc[2,0] = "25 minutes"
brierAccScores.iloc[3,0] = "30 minutes"
brierAccScores.iloc[0,1] = lda15Brier
brierAccScores.iloc[1,1] = lda20Brier
brierAccScores.iloc[2,1] = lda25Brier
brierAccScores.iloc[3,1] = lda30Brier
brierAccScores.iloc[0,2] = lda15Acc
brierAccScores.iloc[1,2] = lda20Acc
brierAccScores.iloc[2,2] = lda25Acc
brierAccScores.iloc[3,2] = lda30Acc
brierAccScores
bronze15 LDA Brier Score: 0.321
bronze20 LDA Brier Score: 0.277
bronze25 LDA Brier Score: 0.261
bronze30 LDA Brier Score: 0.254
bronze15 LDA Accuracy Score: 0.761
bronze20 LDA Accuracy Score: 0.801
bronze25 LDA Accuracy Score: 0.81
bronze30 LDA Accuracy Score: 0.816
The matches get easier to predict as the match time increases. The average test-set Brier scores for the two ranks, however, are nearly identical (both come out around 0.278; see the quick check below), so there is no clear evidence that a bronze game is easier or harder to predict than a diamond game.
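Using the printed test-set Brier scores (hard-coded here because the diamond variables were overwritten when the bronze models were fit), the averages for the two ranks come out almost identical.
# average test-set Brier score per rank, copied from the printed results above
diamondBrier = [0.322, 0.276, 0.245, 0.267]
bronzeBrier = [0.321, 0.277, 0.261, 0.254]
print("diamond mean Brier:", np.round(np.mean(diamondBrier), 3))  # ~0.278
print("bronze mean Brier:", np.round(np.mean(bronzeBrier), 3))    # ~0.278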
# create table here
featureImportance = pd.DataFrame(columns=["Feature Name", "15 - Importance", "20 - Importance", "25 - Importance", "30 - Importance"])
featureImportance["Feature Name"] = x15.columns
featureImportance["15 - Importance"] = abs(lda15.coef_[0])
featureImportance["20 - Importance"] = abs(lda20.coef_[0])
featureImportance["25 - Importance"] = abs(lda25.coef_[0])
featureImportance["30 - Importance"] = abs(lda30.coef_[0])
featureImportance = featureImportance.sort_values('15 - Importance', ascending=False)  # most important features first
featureImportance
The 5 most important features for a 15 minute game are:
first_inhibitor, red_inhibitors, inhibitors_diff, blue_inhibitors, red_heralds
For the 20 minute game:
first_inhibitor, red_inhibitors, inhibitors_diff, blue_inhibitors, red_fire
25 minute game:
earth, fire, first_inhibitor, blue_fire, hextech
And lastly for the 30 minute time period:
fire, first_inhibitor, earth, water, air
These features are not the same across every model: first_inhibitor is consistently in the top 5, while the individual inhibitor-count features fade out at the later time points.
Some of the important features are shared between diamond and bronze players, but there are differences: the earth and fire features play a larger role for bronze players, while the inhibitor features matter for both ranks. The clearest similarity is that first_inhibitor is prominent for both ranks. A sketch comparing the 15 minute importances side by side follows.
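Because the diamond models and their importance table were overwritten when the bronze models were fit, a direct side-by-side comparison needs one refit. The sketch below refits the diamond 15-minute LDA (assuming diamond15 is still loaded and its columns match the bronze data, as checked earlier) and lines up the absolute coefficients of the two ranks.
# side-by-side 15-minute importances: refit the diamond model, reuse the bronze one (lda15)
xd = diamond15.drop(columns=["blue_win"])
yd = diamond15["blue_win"]
xd_train, xd_test, yd_train, yd_test = train_test_split(xd, yd, test_size=0.33, random_state=42)
ldaDiamond15 = LDA().fit(xd_train, yd_train)

compare15 = pd.DataFrame({
    "Feature Name": xd.columns,
    "diamond |coef|": np.abs(ldaDiamond15.coef_[0]),
    "bronze |coef|": np.abs(lda15.coef_[0]),  # lda15 currently holds the bronze 15-minute model
})
compare15.sort_values("diamond |coef|", ascending=False).head(10)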