Luck Causes 61% of Rank Variance in Mario Kart 8 Races
So then, how many races does it take for you to know you’re better than someone else?
This post is structured into three segments. In Part 1, I provide the results of my statistical analysis of 11 CPUs across 48 Mario Kart 8 games and discuss the significance and limitations of my findings. In Part 2, I provide step-by-step instructions for readers to replicate my analysis in Mario Kart or any other game where luck plays a significant role in success and scores are easy to read off. In Part 3, I explain my understanding of the cool math that allows this analysis to work. Feel free to only read the parts that interest you, and give it a like and hop off when you’re done.
Part 1: Statistics, Significance, Limitations, and TL;DR
My interest in Mario Kart began when, at a hangout with some friends, I suddenly found myself getting hit by three consecutive red shells and catapulting backward from second place to ninth. I was frustrated by my bad luck, but I knew that it should be hard to distinguish between success based on luck and success based on skill. After rethinking what happened, I decided I shouldn’t attribute significant meaning to the outcome of just a few races.
Then, after only 4 races, one of my friends made the bold claim that he was the best, and he didn’t need to play any more because he had already “proven everything he had to prove” with his victory. While his superiority sounded somewhat plausible, I was skeptical of how much he had proven. Can a person really say, with a reasonable level of confidence, that they are the best, after no more than 4 races? How luck-based is Mario Kart, anyway?
Later, while reading Thinking, Fast and Slow, I was reminded of the fact that when we have enough data, we can calculate what percent of success in a domain is caused by luck. For example, some activities might be very luck-based, like poker, so that skill only accounts for a tiny bit of what allows a player to succeed. Meanwhile, other games, such as chess, are extremely skill-based, so luck plays very little part in causing a player to succeed. Perhaps, if I gathered enough data and learned the statistical methods, I could see how luck-based Mario Kart was! So, as any normal teenager would in their last summer before college, I set out to spend hours gathering data and studying math to see if my friend did “prove” everything he claimed to have proven.1
What I Did
To determine how luck-based Mario Kart is, I watched two recordings of 48 consecutive races (one race for each of the 48 courses). Each recording had its own set of 11 CPUs, which stayed the same for all 48 races, on 200cc with all items enabled. For both recordings, after each race, I recorded the place each CPU received in a spreadsheet2. Then, I calculated the intraclass correlation coefficient (ICC) of each set of games to determine the percentage of rank variance caused by skill, rather than luck. Finally, after averaging my two ICCs, I calculated my margin of error and used my data to determine how many races would be needed to have varying degrees of certainty that the drivers’ final placements reflected their skill level.
Results
Amongst the games analyzed, 39.1%3 of rank variance4 was between drivers5, while factors that were random from game to game, such as track-specific benefits, items, and interactions with other CPUs, accounted for the remaining 60.9% of rank variance. I also found that the accuracy of rankings, y, after x games could be calculated with the following equations:
The second equation is the margin of error bounds; the first equation is the median. Graphs for the equations will appear in the next section.
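Written out with the rounded median ICC of 0.391, the two equations look roughly like the following. This is a reconstruction from the formula given in Part 2, y=((ICC)x/(1+(x-1)(ICC))). Treating the bounds as that same curve evaluated at ICC ± 0.15 (the margin of error stated in the footnotes) is an assumption, and the names y_low and y_high are only labels for this sketch.

```latex
% Median accuracy after x races (ICC = 0.391)
y = \frac{0.391\,x}{1 + 0.391\,(x - 1)}

% Assumed margin-of-error bounds: the same curve at ICC = 0.391 \pm 0.15
y_{\text{low}} = \frac{0.241\,x}{1 + 0.241\,(x - 1)}, \qquad
y_{\text{high}} = \frac{0.541\,x}{1 + 0.541\,(x - 1)}
```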
I also found that the chance that one player is better than another, given the difference between the two players’ mean ranks, d, after n races, is equal to the standard normal cumulative distribution function evaluated at the following expression:
From here, I calculated that my friend only had about a 76% chance of actually being the best, given the data.
I’m confused. Rank variance? Between drivers? Within drivers? What does any of this mean?
I’m glad you asked! In general, variance is a way of understanding how consistent results are. High variance means results are inconsistent; low variance means results are consistent. In this context, when variance is between drivers, it means that part of the differences between their places can be explained by things that stay the same in each driver from game to game, such as their kart and skill level. When variance is within drivers, it means that those differences can be explained by things that change from game to game, such as luck with power-ups, interactions with the course, and interactions with other CPUs. Because our ICC value was 0.391, only 39.1% of the variance is between drivers, meaning that 60.9% of the variance was luck-based.
This information gives us a pretty solid understanding of how many games it takes to know with y degree of certainty that you are better than someone else at Mario Kart (under these specific settings). Now that we know how much of the variation in rank is caused by factors outside of the player, we can calculate the probability that one player is better than another player based on the difference between their average ranks across n games.
Below is a graph of the degree of certainty, y, that you get from x races, as it pertains to all players:
After a single race, you would expect 39.1% of rankings to be reflective of the relative players’ skills. In practical application, this means that on average, only 39.1% of players are going to be within one place6 of the spot that accurately ranks them relative to the other players within the game. After 4 races, you would expect 71.9% of the final rankings to be (within one spot) reflective of the players’ skills. It is only after about 15 races that you would expect at least 90% of the final rankings to be reflective of the players’ relative skill levels.
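If you would rather compute these numbers than read them off a graph, here is a minimal Python sketch of the accuracy formula from Part 2 (it has the same form as the Spearman-Brown formula from test theory). The function names are made up for this example, and 0.391 is the rounded ICC, so the last digit can differ slightly from the figures quoted above.

```python
def ranking_accuracy(icc: float, races: int) -> float:
    """Accuracy of rankings averaged over `races` races: icc*x / (1 + (x-1)*icc)."""
    return icc * races / (1 + (races - 1) * icc)


def races_needed(icc: float, target: float) -> int:
    """Smallest number of races at which the accuracy reaches `target`."""
    if not (0 < icc <= 1) or not (0 < target < 1):
        raise ValueError("need 0 < icc <= 1 and 0 < target < 1")
    races = 1
    while ranking_accuracy(icc, races) < target:
        races += 1
    return races


ICC = 0.391  # rounded median estimate from this post

# Print the accuracy after a few different race counts
for n in (1, 4, 8, 16, 48):
    print(n, round(ranking_accuracy(ICC, n), 3))
```

Calling races_needed(ICC, 0.9) returns the first race count at which the curve crosses 90%, and swapping in 0.241 or 0.541 for ICC shows how much the answer moves across the margin of error.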
This is a cool graph to know about, but as a competitor looking for the best player, it’s not very helpful. After all, it doesn’t tell you which players are the good ones; it only gives you a vague sense of the accuracy of the scorecard. So, I present my rainbow road: a graph of how certain you can be that one player is better than another, given the number of races that have been run and the difference between their two mean ranks:
Note: Mario Kart’s ranking system tells you players’ final ranks in whole numbers, but this will not give you an accurate picture of the mean difference between player ranks. For example, the game will tell you that the difference between someone who is in first after 4 races and someone who is in second after 4 races is 1 rank, but if one player got first three times and second once (mean: 1.25), and the other player got second three times and first once (mean: 1.75), the difference between their mean ranks will be 0.5, rather than 1. This means there will only be about a 59% chance that the higher-ranked player is better, rather than a 68% chance.
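Here is a small Python sketch of that calculation, following the prior-times-likelihood setup explained in Part 3. The function name is made up, and the two variance values are an ASSUMPTION: I split a total rank variance of 11 (the sample variance of eleven distinct places) into 39.1% skill and 60.9% luck, which may not match the exact variances in my spreadsheet. The gap d is counted as positive when Player 1 has the better mean place.

```python
from math import sqrt
from statistics import NormalDist


def prob_better(d: float, n: int, var_skill: float, var_luck: float) -> float:
    """P(Player 1 is truly better) given a mean-place gap d after n races."""
    v0 = 2 * var_skill               # prior variance of the true skill gap
    vn = 2 * var_luck / n            # noise variance of the observed gap d
    v_post = 1 / (1 / v0 + 1 / vn)   # posterior variance of the skill gap
    m_post = v_post * d / vn         # posterior mean (the prior mean is 0)
    return NormalDist().cdf(m_post / sqrt(v_post))


# ASSUMPTION: total rank variance of 11, split using the post's ICC of 0.391.
TOTAL_VAR = 11.0
VAR_SKILL = 0.391 * TOTAL_VAR
VAR_LUCK = 0.609 * TOTAL_VAR

print(round(prob_better(1.0, 4, VAR_SKILL, VAR_LUCK), 2))  # gap of 1 place
print(round(prob_better(0.5, 4, VAR_SKILL, VAR_LUCK), 2))  # gap of 0.5 places
```

Under that assumption, the two cases land close to the 68% and 59% quoted in the note above. A different total variance would shift both numbers, so treat the output as illustrative.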
This information is helpful when you’re raging at your friends and want to show that you still plausibly could be better than them, even if you got unlucky in the first game. This information is also helpful toward disproving your friend who claims that they only lost because they were consistently unlucky, even though your overall score across 16 races is still significantly better than theirs.
Finally, this information tells me that there was less than an 80% chance that my friend who had “proven everything he needed to prove” was better than his competitors. Perhaps he was the best,7 but it seems like a stretch to say anything was “proven” that day.
Limitations
While my data is helpful under certain circumstances, it has certain limitations. I will discuss them in order of what I consider to be the least problematic to the most problematic.
This data uses ordinal rank to calculate variance, rather than the built-in Mario Kart scoring system.
For those of you who are unaware, Mario Kart determines its final place by using a point scoring system. The first-place player receives 15 points, the second-place player receives 12 points, the third-place player receives 10, and each consecutive rank receives one less point than the previous one after that, so that the twelfth-place player only receives a single point. I chose to calculate the variance using ordinal rank, rather than score, because I thought calculating the variance in score would make the ICC less generalizable to all players and generally worse. Additionally, the score system only predicts what ending ranks the game will give people, not their true skill level. Because I’m trying to measure the effect of skill on within-race rank, rather than the effect of skill on the game’s shaky attempt to measure your skill, I figured I should measure the ordinal rank. However, I may be wrong in choosing to measure it this way, so if you are good at statistics, let me know if this was a mistake.
This data is collected from CPUs, so the results only generalize to other CPUs.
While the data is collected from CPUs, I expect it to generalize pretty well to human players. CPUs are still affected by opponents’ power-ups in the same way that human players are, except for the fact that humans might use mushrooms and bananas in slightly more clever ways.8 Because power-ups are the main source of randomness within games, the fact that the data was collected from CPUs should not present a significant problem. The primary difference I would expect to see between humans and CPUs would be that humans are not nearly as consistent in their performance level as CPUs are. If humans are inconsistent, then we can view their skill as a range of values that is randomly selected from9 each time they play a game. For a more inconsistent player, the potential skill range is larger (and probably lower) than it would be for a consistent player, meaning that their performance is more based on luck10 than my data would suggest.
CPU data is determined by their position when the final human crosses the finish line, meaning they don’t get to play a full game.
This is true, but it usually cuts less than 20 seconds off the CPUs’ races. As a race goes on for longer, players have more opportunities both to gain and to lose places based on their skill and their luck. It seems fairly unlikely that allowing the CPUs to play out the final 20 seconds of each game would have significantly altered my ICC, even if it would have changed within-game results, so I expect my results to be about the same for real human players.
These results primarily generalize to 200cc; conclusions about different game modes need more data to be solidified.
This is true. I have no idea how these results would generalize to games of different speeds. However, I imagine that they would not be hugely different. If you are interested in calculating the ICC in other game modes, you can read my instructions in Part 2.
Item draws are affected by place in the race, and because there was a human player in first for every round in my data, item draws were slightly shifted.
Yep, this is true, but I have no idea how it will affect the outcome. I think it would make the results look slightly more luck-based (a slightly lower ICC) than they naturally should be.
The “39.1%” value doesn’t just measure skill; it measures everything consistent across all 48 races. That means, in addition to skill, it also measures advantages and disadvantages from kart/character selection.
This is a problem only if players randomly select their character and kart. If this were true, correcting this problem would likely lower the ICC (meaning that Mario Kart is less skill-based than currently measured). However, to correct for it, I would need to let the exact same CPUs play as different characters and karts for thousands of additional races, so I could control for the confounding variable. I can’t do this for 2 reasons: As far as I’m aware, the CPUs change from game to game, and I would rather do something else with my life than dedicate it to refining a relatively unimportant Mario Kart statistic. However, I would say that there is skill involved in choosing a strong kart and character, so this doesn’t matter.
The best and worst players are not affected by luck in the same way that mediocre players are, as shown by players who consistently get first. Additionally, power-up draws change based on the place, meaning that luck for everyone doesn’t work the same way.
Someone who doesn’t know how to play the game will almost always finish last, and someone significantly better than everyone else at the game will almost always finish first. In both cases, the players are still somewhat affected by luck. The bad player may get a sequence of Bullet Bill power-ups, and the best player might be hit by a sequence of blue shells. However, because both of the players are away from the fray of random fireballs, shells, boomerangs, and stars, they are not as affected by power-up randomness as mediocre players are. If I had to guess, I would say that the last and (especially) the first-place players are less affected by luck than the middle players, meaning a true ICC would be much higher for them. This is consistent with the variance in results between players I found, but the sample size for this graph feels smaller than it ideally should be.
Additionally, rank is indeed going to be more luck-based in lower places. This is reflected by the variance based on place graph from above. Of course, this graph is partially a statistical artifact, rather than a meaningful observation: for a player whose true ranking is 6th, there’s more room to vary in either direction than a player whose true ranking is 2nd. We should naturally expect the graph to look like an arch, but the graph’s asymmetry indicates that varying power-up randomness has some effect on luck.
ICC assumes interval scale measurements, but Mario Kart places are ordinal measurements.
This somewhat distorts the results. They could be improved by measuring the exact finishing time of each CPU relative to the finishing player or something similar. Unfortunately, I don’t have that data.
These results only generalize to Mario Kart 8 Deluxe.
Yup. They should generalize pretty well to regular Mario Kart 8, but they don’t say much about Mario Kart Wii or Mario Kart 64.
These results are only accurate when all power-ups are enabled.
Yup. When fewer offensive power-ups are available, the game requires more skill, and it becomes less reliant on luck.
I lost focus at some point during the last section, so can you tell me exactly what to tell my friends about this in a way that won’t misinterpret the results?
I appreciate your dedication to accuracy! When you play some casual (all items enabled) Mario Kart 8, playing four races (one Grand Prix) will make it so that only about 72% of players will be within one place of the place that reflects their true skill level. To get above 90% accuracy, you should play sixteen races (four Grand Prixes). If someone consistently holds a far-up first place while someone else is around fourth, you don’t need nearly as many games to determine who’s better. If you want to figure out your certainty that one player is better than another, you can use my rainbow road graph. This calculation isn’t perfect, but it’s pretty good. Also, the game is more luck-dependent for the middle places than it is for the top and bottom places.
If I had to guess, I would say that Mario Kart is slightly more luck-based than my median estimate, but figuring out that number requires a lot more replications and rigorous experimentation.
Part 2: How to Replicate My Analysis
Estimated completion time: a little less than 2 hours to find the ICC.
You can make any modifications you think are necessary. My spreadsheet with all my data can be viewed here if you want to see it. If you have any questions, your best bet is probably to explain your situation to ChatGPT and ask for help. If you replicate this (or perform a similar analysis) and post something about your replication, DM me and I’ll restack whatever you post!
My step-by-step guide will be tailored toward Mario Kart, but the system I used will work for any game where you receive a score level for multiple players across different games. In Mario Kart, this was my relative ranking per race.
Step 1: Gather Your Data
To gather your data, find any consecutive sequence of 48 races.11 You can play these games yourself, or you can find a recording of any speedrunner’s gameplay on YouTube. If you need help doing this, o3 does a pretty good job of finding videos based on your specifications.
After each race, take a picture of the first scoreboard that shows up, the one that shows what place each participant got within the race that just happened. You do NOT need to take a picture of the scoreboard that shows the final places for each CPU, the one that represents the sum of all games played so far.
Once you have all 48 of your pictures, create a new spreadsheet on Google Sheets. In the first column, starting on the second row and moving down, write the name of each of the CPUs in the game. Across the top, label the race number. Doing this will make it much easier to salvage any mistakes you might make while entering the data. Then, for each CPU, record the place they received in each game. After doing this, your data table should look something like this:
In your 50th column, calculate the average place of that row’s player. You can do this easily by typing a command into the cell: “=AVERAGE(cell with game 1 place for that player:cell with game 48 place for that player)”.12 By completing the top cell first, clicking on it, and using the circle icon to drag and select the lower cells, the lower cells will be automatically filled in.
Step 2: (Optional? You should probably do it, just in case.) Transformation
Here’s something I did with my data, but I’m not sure it’s essential. For each cell on the sheet so far, create an equivalent mirror cell, about 15 cells lower. For each of these, use the command =(13-name of original cell), then drag the circle (like you did earlier) to mirror all of the cells from before. If done correctly, it should look like this:
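For what it’s worth, subtracting every place from 13 cannot change any variance; it only flips the direction so that a higher number means a better finish. Here is a quick Python check of that fact, using made-up places for a single racer (not data from my spreadsheet):

```python
from statistics import variance

places = [2, 6, 4, 3, 7]               # hypothetical finishing places for one racer
mirrored = [13 - p for p in places]    # the Step 2 transformation

# Both lists have the same sample variance; only the direction is flipped
print(variance(places), variance(mirrored))
```

So the ICC comes out the same either way. The mirrored values mainly matter when you look at differences between players, because a positive difference then means the first player did better.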
Step 3: Do Some Math
To the right of all of your columns, using only your mirrored values, calculate the variance of each row. You can do this by using the command “=VAR(cell with game 1 place for that player:cell with game 48 place for that player)”
It should look something like this (ignore the red box):
Next, you’re going to need to calculate the mean square between groups (MSB) and the mean square within groups (MSW). The MSB is equal to the variance of the players’ averages, multiplied by the number of races, 48. The MSW is equal to the average of all of the within-group variances. Your MSB should be much larger than your MSW.
Then, you can calculate the variance due to skill (σα²). That’s equal to (MSB-MSW)/(# of races (48)). Then, you can calculate the variance due to chance (σε²), but that’s an easy calculation because it’s just equal to MSW.
Your ICC is equal to σα²/(σα²+σε²). This number, times 100, equals the percentage of rank variance that is due to factors that are consistent across games (such as skill and kart).
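If you would rather script Step 3 than build it in a spreadsheet, the same arithmetic can be sketched in Python. The function name and the tiny 3-racer, 4-race table are hypothetical and only there to show the shape of the data: one row per racer, one column per race.

```python
from statistics import mean, variance


def icc_one_way(places_by_racer: list[list[float]]) -> float:
    """ICC from a racers x races table, using the MSB/MSW steps above."""
    k = len(places_by_racer[0])                            # races per racer
    racer_means = [mean(row) for row in places_by_racer]
    msb = k * variance(racer_means)                        # mean square between racers
    msw = mean([variance(row) for row in places_by_racer]) # mean square within racers
    var_skill = (msb - msw) / k                            # variance due to skill
    var_luck = msw                                         # variance due to chance
    return var_skill / (var_skill + var_luck)


# Hypothetical table (NOT data from the post): 3 racers, 4 races each
table = [
    [2, 3, 2, 4],
    [5, 4, 6, 5],
    [9, 7, 8, 10],
]
print(round(icc_one_way(table), 3))
```

This toy table gives a very high ICC because its three racers barely overlap. Real Mario Kart data is much noisier, and the sketch assumes every racer has the same number of races.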
(Optional) Step 4: Make Graphs
You can plug your numbers into Desmos or any other capable graphing software to see how your equation works in practice. To see how accurate placements are based on the number of games, use the equation y=((ICC)x/(1+(x-1)(ICC))). Seeing the probability that one player is better than another, given the mean distance between places and the number of games, is trickier. You use this equation:
The mean distance between players ((mean place of Player 1)-(mean place of Player 2)) is d. The number of games is n.
And to make Desmos calculate the standard normal cumulative distribution function (that’s what ϕ is here), you can input this (using your own numbers):
ϕ(z)=the first equation, and z is the second equation. The number of games, n, can be changed with the slider, so you can see how accuracy changes based on the number of games played.
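In LaTeX, those two inputs could be written roughly as follows. This is a sketch reconstructed from the derivation in Part 3, so the layout may differ from what you type into your graphing tool. The first line is the standard identity for the normal CDF in terms of the error function; the second uses the σα² and σε² you calculated in Step 3.

```latex
% Standard normal CDF (called \phi in this post) via the error function
\phi(z) = \frac{1}{2}\left[1 + \operatorname{erf}\!\left(\frac{z}{\sqrt{2}}\right)\right]

% The value to plug in: d = gap in mean places, n = number of games
z = \frac{d\,n\,\sigma_\alpha}{\sigma_\varepsilon\,\sqrt{2\left(n\,\sigma_\alpha^{2} + \sigma_\varepsilon^{2}\right)}}
```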
Part 3: How the Math Works
ICC and One-Way Random Effects ANOVA
Everything in this post revolves around calculating the intraclass correlation coefficient, or ICC, of Mario Kart racers, and the corresponding between-group and within-group variances. The idea is that how the racers are spread out across the places in each race is caused by two types of variance: within-group variance and between-group variance. In this case, a group is the set of all of a single racer’s places, like 2nd, then 6th, then 4th, then 3rd… then 7th.
The within-group, or within-racer, variance comes from things that change from race to race: the effects of power-ups, lucky draws, and courses that are difficult for the racer to handle. These are the things that change a player’s rank that are beyond their control: their luck.
The between-groups variance, or between-racers variance, comes from differences between racers that stay constant throughout the games. For example, a player’s skill level and their kart settings are both expected to stay constant from game to game.
The ICC is a measurement of how much of the total variance is between groups (skill levels and kart settings), as opposed to the variance within a group (luck). If a game’s variance were entirely dependent on the skill level of its players (the game is 100% skill13), its ICC value would be 1, it would only take a single race to determine each player’s definitive rankings, and there would be no luck involved in determining the winner.14 If a game’s variance were entirely dependent on luck (like Rock, Paper, Scissors,15 for example), its ICC value would be 0, no amount of games could determine how skilled each of the players truly is, and the winner would be randomly selected.
But how do we get to our ICC? We start out by doing a one-way random effects analysis of variance (ANOVA). We do that by first splitting our total sum of squared deviations (SST) into 2 parts: the sum of squares16 between subjects (SSB), and the sum of squares within subjects (SSW). That can be done with the following equation, where n is the number of subjects, k is the number of games, and each cell is in the ith row, jth column:
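A standard way to write this decomposition is shown below. The symbols are assumptions for this sketch: Y_ij is the place of racer i in game j, the bar with a dot means an average over the dotted index, and two dots mean the grand average over every cell.

```latex
\underbrace{\sum_{i=1}^{n}\sum_{j=1}^{k}\left(Y_{ij} - \bar{Y}_{\cdot\cdot}\right)^{2}}_{SST}
=
\underbrace{k\sum_{i=1}^{n}\left(\bar{Y}_{i\cdot} - \bar{Y}_{\cdot\cdot}\right)^{2}}_{SSB}
+
\underbrace{\sum_{i=1}^{n}\sum_{j=1}^{k}\left(Y_{ij} - \bar{Y}_{i\cdot}\right)^{2}}_{SSW}
```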
The squares in the SSB are multiplied by k to account for the number of times that variation occurs.
Then, we need to calculate the mean values of the SSB and the SSW, which will be represented as MSB and MSW. To calculate these new values, the SSB is divided by n-1, and the SSW is divided by n(k-1).
At this point, if you’re paying very close attention, you might realize that the MSB is actually just equal to the variance between each of the averages times the number of games (k), and the MSW is equal to the average of the variances of each subject. That’s why, instead of having you calculate the SSB and the SSW in Part 2, I had you calculate the MSB and the MSW directly. It’s much easier to do that on Google Sheets.
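If you want to convince yourself of that shortcut, here is a small Python check on a made-up table (3 racers, 4 games; not data from my spreadsheet). It computes the MSB and MSW the long way from the sums of squares and compares them with the spreadsheet shortcuts. The shortcut relies on every racer having the same number of games.

```python
from math import isclose
from statistics import mean, variance

# Hypothetical table: n = 3 racers (rows), k = 4 games (columns)
table = [
    [2, 3, 2, 4],
    [5, 4, 6, 5],
    [9, 7, 8, 10],
]
n, k = len(table), len(table[0])

row_means = [mean(row) for row in table]
grand_mean = mean([x for row in table for x in row])

# Long way: sums of squares, then divide by the degrees of freedom
ssb = k * sum((m - grand_mean) ** 2 for m in row_means)
ssw = sum((x - m) ** 2 for row, m in zip(table, row_means) for x in row)
msb = ssb / (n - 1)
msw = ssw / (n * (k - 1))

# Shortcut from Part 2: variance of the averages times k, and average variance
msb_shortcut = k * variance(row_means)
msw_shortcut = mean([variance(row) for row in table])

print(isclose(msb, msb_shortcut), isclose(msw, msw_shortcut))
```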
Now we need to turn our MSB and MSW into our σα² and our σε², which represent our between-groups variance (skill) and our within-groups variance (luck). Our MSW is actually already equal to our σε², but we need to do some work for our σα². So, how do we do this?
Each subject mean (the mean of Yi.) contains the true subject effect αi (which has variance σα²) plus the averaged noise, written as εi. with a line above it. Mathematically, this is represented like this:
μ is the grand mean, and the εi. with the line above it equals εi./k, where εi. is the sum of racer i’s noise terms across all k games. This makes sense, because as we play more games, k, the averaged noise tends to shrink, and we get closer to seeing our true αi.
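In LaTeX, the model described above can be sketched like this. The notation is a reconstruction from the surrounding definitions, and the last line assumes the noise terms are independent from game to game.

```latex
\bar{Y}_{i\cdot} = \mu + \alpha_i + \bar{\varepsilon}_{i\cdot},
\qquad
\bar{\varepsilon}_{i\cdot} = \frac{1}{k}\sum_{j=1}^{k}\varepsilon_{ij}

% Variance of a racer's average over k games
\operatorname{Var}\!\left(\bar{Y}_{i\cdot}\right) = \sigma_\alpha^{2} + \frac{\sigma_\varepsilon^{2}}{k}
```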
Because we are working in broadly applicable, rather than row-specific, terms, we first need to start out using our variables for variance, our σε² and our σα². Our variance between our averages should be equal to σε²/k plus σα², but our MSB isn’t just our variance between our averages. It’s our variance between our averages times k. So, to correct for this, we multiply the other side of the equation by k, and we get this:
From there, we can easily solve for σα² by substituting in the variables we already have, and we get this equation:
σε² is there as a reminder because I love you <3
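Putting the last two steps into LaTeX gives roughly the following. This is reconstructed from the description above; strictly speaking the first line holds on average (in expectation), which is why we then plug in the measured MSB and MSW as estimates.

```latex
% Variance between averages, times k
MSB \approx k\left(\sigma_\alpha^{2} + \frac{\sigma_\varepsilon^{2}}{k}\right)
    = k\,\sigma_\alpha^{2} + \sigma_\varepsilon^{2}

% Solving for the skill variance, with the luck variance as a reminder
\sigma_\alpha^{2} = \frac{MSB - MSW}{k},
\qquad
\sigma_\varepsilon^{2} = MSW
```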
Now that we’ve calculated the between-groups variance and the within-groups variance, we just need to calculate our ICC. This is just the between-groups variance divided by the total variance (between-groups variance plus within-groups variance). We use this equation because we’re calculating what percent of the total variance is between groups. Ta da! We’ve done it! We now have our ICC!
Graphs
But how do the equations for our graphs work? Our first graph, the accuracy of placements based on the number of games, is simple. You can use either side of the following equation, where ρ equals the ICC:
And that will be equal to the accuracy of the data after k games. If you want to check and see why these two equations are equal to each other, you can do the math yourself (it’s a lot easier if you start on the right side and substitute in the σ²s).
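For anyone who wants to see that check written out, here is one way to do it in LaTeX, reconstructed from the definitions in this section. It starts from the variance form, multiplies the top and bottom by k, and then divides the top and bottom by the total variance so that ρ appears.

```latex
\frac{\sigma_\alpha^{2}}{\sigma_\alpha^{2} + \sigma_\varepsilon^{2}/k}
= \frac{k\,\sigma_\alpha^{2}}{k\,\sigma_\alpha^{2} + \sigma_\varepsilon^{2}}
= \frac{k\,\rho}{k\,\rho + (1 - \rho)}
= \frac{k\,\rho}{1 + (k - 1)\,\rho},
\qquad
\rho = \frac{\sigma_\alpha^{2}}{\sigma_\alpha^{2} + \sigma_\varepsilon^{2}}
```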
This works because we’re dividing the between-groups variance by the total variance of a player’s average placement over k games. In that average, the noise (within-groups variance) is divided by k, so it averages itself out and approaches zero as more games are played.
Now it’s time for us to learn how creating my mathematical rainbow road works:
The equation for this graph is a form of Bayesian inference. Here, we’re calculating the probability that one player is better than another, given the distance between their mean ranks and the number of races played (P(Player 1 is better|Distance between mean ranks and number of races)).
Let’s say that each player has a true skill value that we’ll call S1 and S2.17 Let’s call their mean ranks r1 and r2, and their distance between mean ranks (r1-r2), d. Let’s call the number of races n. Let’s call the difference between the players’ true skill (S1-S2) values Δ. Using these new terms, we can say that we’re calculating P(Δ>0|data), where our data is our d and our n.
Our prior for Δ is distributed as a normal (Gaussian) distribution with a mean of zero and a variance of 2σα². It’s multiplied by 2 because Δ=S1-S2, and if two independent variables are normally distributed, the variance of their difference is the sum of their variances. Both of their variances are σα², so the sum of their variances is 2σα².
Our likelihood of d, how probable our observed difference in mean ranks is under a hypothetical assumption of Δ, is distributed normally with a mean of Δ and a variance of 2σε²/n. As more games are played, we get closer to the mean, the actual skill difference.
The probability of a certain skill gap being Δ, given the data, is distributed normally with the mean being our best guess of the true skill gap and the variance being our uncertainty about the skill gap, after observing the data. We will call these Mpost and Vpost, respectively. We can calculate them using the variance of the likelihood (Vn) and the variance of the prior (V0), and plugging them into their formula like this:
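For a normal prior and a normal likelihood, the standard update formulas look like this in LaTeX. The layout is a reconstruction, with V0 and Vn taken from the prior and likelihood described above and a prior mean of zero.

```latex
V_0 = 2\,\sigma_\alpha^{2},
\qquad
V_n = \frac{2\,\sigma_\varepsilon^{2}}{n}

V_{post} = \left(\frac{1}{V_0} + \frac{1}{V_n}\right)^{-1} = \frac{V_0\,V_n}{V_0 + V_n}

M_{post} = V_{post}\left(\frac{0}{V_0} + \frac{d}{V_n}\right) = \frac{d\,V_0}{V_0 + V_n}
```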
Our probability of Player 1 being better than Player 2, given our data, is going to be equal to the cumulative distribution function (ϕ) evaluated at Mpost divided by the square root of Vpost (aka the standard deviation of the posterior distribution). Mathematically, that’s P(Δ>0|data)=ϕ(Mpost/sqrt(Vpost)). Before moving on, let’s take a minute to talk about the cumulative distribution function.
The standard normal distribution, whose density is also known as the bell curve, looks like a bell and represents how we expect certain things to be randomly distributed. It has a mean of 0 and a variance of 1. The cumulative distribution function, ϕ, is the integral of the bell curve from negative infinity to z, the value we evaluate ϕ at, as in ϕ(z). For comparison, here is what the two graphs look like:
As you can see, ϕ(0)=0.5, or 50%. If there were no gap between the 2 players’ mean ranks, we would expect them to have equal probabilities of being the better player, which makes sense. If d is negative, our z value (Mpost/sqrt(Vpost)) will be negative as well, meaning that Player 1 will have a less than 50% chance of being better than their opponent. If σα² were really high and σε² were really low, even a low d value could yield a high certainty that one player was better than the other. This equation breaks down if there is no variance due to luck or no variance due to skill: in the first case we would have absolute certainty that one player is better, and in the second no number of games could possibly give us any indication of which player was better.
Hopefully, these examples show some insights into how these equations function.18
After plugging our Mpost and Vpost into the equation and simplifying, we get this:
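Here is one way the simplification can go, written in LaTeX. It is a reconstruction that assumes the usual normal-normal update (Mpost = d·V0/(V0+Vn) and Vpost = V0·Vn/(V0+Vn)) with V0 = 2σα² and Vn = 2σε²/n, so the final layout may not match my original graph input exactly.

```latex
\frac{M_{post}}{\sqrt{V_{post}}}
= \frac{d\,V_0}{V_0 + V_n}\,\sqrt{\frac{V_0 + V_n}{V_0\,V_n}}
= d\,\sqrt{\frac{V_0}{V_n\left(V_0 + V_n\right)}}

% Substituting V_0 = 2\sigma_\alpha^2 and V_n = 2\sigma_\varepsilon^2 / n
P(\Delta > 0 \mid \text{data})
= \phi\!\left(\frac{d\,n\,\sigma_\alpha}{\sigma_\varepsilon\,\sqrt{2\left(n\,\sigma_\alpha^{2} + \sigma_\varepsilon^{2}\right)}}\right)
```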
That’s it! That’s everything! That’s how the math behind my graphs works. If you read this and followed along the whole way, fantastic work! It took me like 5 pages of notes to do this, so if you followed along in your head, you’re pretty smart. Go ahead and give this post a like, maybe share it around.
Part 4: Bonus Note
This post was inspired by Daniel Kahneman’s book, Thinking, Fast and Slow. It was what first introduced me to the idea of regression to the mean and made me realize I could calculate skill and luck levels in Mario Kart. Unfortunately, even though I’ve taken some pretty advanced math classes and am good at learning about math, I’ve never taken a real stats class before. So, to do all of this, I had ChatGPT teach me a whole bunch of math.
This was fairly tricky (like learning math usually is). However, it was really helpful to be able to ask about anything at any time or receive a framework for how to complete a problem. Of course, I didn’t want to risk having some terrible hallucinations ruin my math, so I constantly had o3 fact-check itself. Additionally, I consulted some outside sources to make sure it was doing a good job. Most of the time, it did really well! The only issue it had was consistently giving me an equation for the accuracy graph that approached the ICC as it approached infinity, rather than 1. When I asked about it, o3 would perform calculations using the correct graph, but the graph it displayed would be incorrect. This was more funny than anything, but it left me paranoid about hallucinations for the rest of the project.
I included Part 3 because of my weaknesses in completing this project. As previously stated, I have not formally learned the math for this, and I’m afraid of making silly mistakes. By explaining how I completed my math in detail, I figured that I would make my process more open to criticism and corrections, ultimately making it much easier to show the world the truth of Mario Kart rank variance. Additionally, by explaining how all of the math worked behind this project, I ensured that I understood what was going on, rather than blindly copying GPT’s work. It was a really great learning experience!
In total, this project took me about 20 hours to complete. This was about 4 hours dedicated to calculating ICCs and gathering data, 6 hours dedicated to understanding the math used, 2 hours making (and fixing) graphs, and 8 hours dedicated to slowly translating my whole process into writing form (followed by editing, of course). If you thought this post was neat in any way, like it, share it, restack it, or subscribe to me! I’m trying out this thing where I only make posts that have something interesting to say, which means that by subscribing to me, you’re going to expose yourself to a low quantity of high-quality content. This is basically the best thing you could hope for on a platform where everyone constantly receives the advice to develop every tiny thought they have into 2000 words.
1. I’m not actually this petty; I’m mostly just really interested in statistics and thought it would be funny to calculate this. This just makes for better storytelling.
2. I recorded only the place of the CPUs, and not the place of the real player, because I wanted to see how players in the middle of the game are affected by power-ups. The significance of this will be discussed later.
3. If you want to generalize across other games, the margin of error on this is ±0.15 at a 95% CI.
4. Amongst CPUs in Mario Kart 8 Deluxe in 200cc with all items enabled in any place below a far-up first place.
5. This is the variance that is caused by the pre-set CPU skill level, kart stats, and anything else that remains constant in drivers across games.
6. Why within one place? Because this is what is represented in the data table. Additionally, because we are measuring with ordinal rank, rather than an interval-based measurement to get scores, the importance of tiny differences between players will be exacerbated on game-to-game scorecards.
7. Although we don’t know that for sure.
8. But there is still luck involved in drawing the skill-based power-ups at the right time.
9. This skill range might not have an equal distribution; a player might tend toward their median performance level by an unknown amount. This doesn’t matter very much, though.
10. I know some people who may take issue with me referring to a person’s single-game performance as luck. I think the objection is understandable: we typically think of luck as random circumstances that are completely beyond our control, but we can improve our skill range with practice. However, I still think that because our performance is inconsistent, could be better or worse than the mean, and isn’t fully in our control, our performance level in any given race will be partially the result of luck. This means that we can improve our luck by improving our skill range. This makes sense in practice; a worse player might beat a better player if the better player gets unlucky and has a bad day, even in a game that’s mostly skill-based.
11. The more consecutive races, the better. They must be consecutive so that the AIs do not change between games.
12. For example, =AVERAGE(B2:AW2)
13. This would only be true when a player’s performance level due to skill stays constant between games, but this is not the case in real life. People have good games and bad games, good days and bad days, and minor mistakes, even when an activity has “no luck” involved. I don’t think human performance could ever be given an ICC value of 1 because of these inconsistencies. Something like a Pinewood Derby race would be much closer to achieving a perfect ICC value.
14. There are a few exceptions to this. If, at the beginning of Mario Kart, before starting any races, each character was randomly given a power-up to have in all of their races, this would be measured as part of the between-racers variance. For example, one player might always start with a star, while another player might always start with a banana. Even though this hypothetical power-up is luck-based, because it stays constant from race to race, it would be measured as part of the between-groups variance, unless you restarted the game after every race.
15. Assuming nobody cheats…
16. Here, squares is just shorthand for “squared deviations from the mean.”
17. In reality, they would probably have a skill range, but S1 and S2 would just be the median value in that range, so this doesn’t matter that much.
18. Get it? Function?? Hahaha