Unveiling the Career Trajectory of Baseball Legend Hank Aaron: A Deeper Dive into OPS and Age
Hammerin' Hank
Hank Aaron was one of the greatest ever to play the game. If you ask a biased Braves fan like me, or anyone who doesn't care for Barry Bonds, you’ll hear us call him “The True Home Run King.” Aaron hit 755 home runs in his career. His 715th, which moved him past Babe Ruth as the all-time leader, is one of the most iconic moments in American sports history. It broke a record that had stood for decades and belonged to arguably one of the greatest baseball players of all time. Henry Aaron was 40 years old when he broke the Great Bambino’s home run record, and he retired two years later, after the 1976 season. He had not led the league in any offensive category since 1971, when he was tops in SLG, OPS, OPS+, and SF. Mind you, he was 37 years old when he accomplished this feat. If you look at his career stats, you will notice that he led the league in at least one category as early as his second year in the big leagues, as a 21-year-old. But was there a single year where Aaron peaked in performance?
Aaron was known for his durability throughout his career. Outside of his rookie season, he did not play fewer than 140 games until the previously mentioned 1971 season, when he played in 139. Looking at his career stats, you’ll notice that he performed at a high level throughout, but there’s no single year or age we can point to and say this is exactly where he peaked as a major league hitter. So, when did Hammerin’ Hank peak? Luckily, there is a way to take age and OPS and build a model that will help answer this question.
To determine when Aaron peaked, we can analyze his age and OPS (on-base plus slugging percentage) using a model described in Chapter 8 of Analyzing Baseball Data with R (see Sources). While it's commonly believed that MLB players peak in their late 20s, the data shows that this is not always the case. By plotting Aaron's age and OPS on a scatterplot, we can observe trends in his performance. His OPS fluctuates over the years, with a slight drop at age 24, a significant increase at age 25, and subsequent fluctuations until his mid-30s. Surprisingly, his OPS at age 35 is similar to his numbers in his late 20s. In his late 30s, his numbers take drastic swings: he reaches his highest OPS ever, drops, and then shoots back up at age 39 before declining again in his 40s. It's worth noting that even his "bad years" are seasons most players would envy, with only 6 of his 23 seasons in the big leagues ending with an OPS below .900. Overall, Aaron's career OPS is .928, and he holds the MLB record for total bases with 6,856. Below is the code to create the scatterplot.
library(ggplot2)  # provides ggplot(), aes(), and geom_point()
ggplot(Aaron, aes(Age, OPS)) + geom_point()
We can take this scatterplot and model the relationship using a smooth curve. Once we have a curve, it will be easier to compare Aaron to other batters with similar numbers. This is where the quadratic function comes in: OPS = A + B(Age - 30) + C(Age - 30)^2. Now, I have not thought about quadratic functions in years, so I will try to explain it as best I can. A, B, and C are constants chosen so that the curve best matches the points on the scatterplot. The A constant is the OPS the model predicts when the player is 30 years old.
The peak age, 30 - B / (2C), shows where the model believes the player peaked in batting performance in their career. In the fitted model, (Age - 30) and (Age - 30)^2 are the predictors.
The maximum value of the curve, A - B^2 / (4C), estimates the highest OPS of their career.
The C constant, which is typically negative, describes the curvature of the curve. The larger its magnitude, the more rapidly a player reaches their peak and the more rapidly they decline afterward. Specifically, C is the change in OPS from the player's peak age to one year later.
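For anyone who wants to see where the peak age and maximum value come from, here is a short derivation. It assumes the quadratic form used in the lm() call in this post, and the shorthand x = Age - 30 is my own notation rather than something taken from the book.

```latex
\begin{align*}
f(x) &= A + Bx + Cx^2, \qquad x = \mathrm{Age} - 30 \\
f'(x) &= B + 2Cx = 0 \;\Longrightarrow\; x^* = -\frac{B}{2C} \\
\text{peak age} &= 30 + x^* = 30 - \frac{B}{2C} \\
f(x^*) &= A - \frac{B^2}{2C} + \frac{B^2}{4C} = A - \frac{B^2}{4C} \\
f(x^* + 1) - f(x^*) &= B + C\,(2x^* + 1) = B + C\left(1 - \frac{B}{C}\right) = C
\end{align*}
```

The stationary point is a maximum only when C is negative, which is the usual case for a career that rises and then falls. The last line is why C can be read as the one-year change in OPS right after the peak.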
In the R programming language, there are functions available to assist with the formulas and the model fitting process. The linear model function, lm(), can fit the quadratic curve because the model is linear in the constants A, B, and C. The resulting maximum value and peak age, as mentioned earlier, will be stored in the variables Max and Age.max, respectively.
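> fit_model <- function(d) {
+ # Fit OPS = A + B(Age - 30) + C(Age - 30)^2 by least squares
+ fit <- lm(OPS ~ I(Age - 30) + I((Age - 30)^2), data = d)
+ b <- coef(fit)  # b[1] = A, b[2] = B, b[3] = C
+ Age.max <- 30 - b[2] / b[3] / 2  # peak age: 30 - B / (2C)
+ Max <- b[1] - b[2] ^ 2 / b[3] / 4  # peak OPS: A - B^2 / (4C)
+ list(fit = fit, Age.max = Age.max, Max = Max)
+ }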
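Before pointing the function at real data, it can help to check it against a curve where the answer is known in advance. The sketch below is only an illustration: the data frame toy and its numbers are made up (not real statistics), built so that OPS follows an exact parabola that tops out at .950 at age 28. The function is repeated so the snippet runs on its own.

```r
# Same helper as in the post, repeated so this sketch is self-contained.
fit_model <- function(d) {
  fit <- lm(OPS ~ I(Age - 30) + I((Age - 30)^2), data = d)
  b <- coef(fit)
  Age.max <- 30 - b[2] / b[3] / 2
  Max <- b[1] - b[2]^2 / b[3] / 4
  list(fit = fit, Age.max = Age.max, Max = Max)
}

# Hypothetical player (invented numbers): an exact parabola peaking
# at an OPS of .950 at age 28, with curvature -0.002.
toy <- data.frame(Age = 21:40)
toy$OPS <- 0.950 - 0.002 * (toy$Age - 28)^2

check <- fit_model(toy)
unname(check$Age.max)        # should come back as (very nearly) 28
unname(check$Max)            # should come back as (very nearly) 0.950
unname(coef(check$fit)[3])   # curvature C, (very nearly) -0.002
```

If the function hands back the peak that was built into the toy data, we can be more comfortable trusting what it says about a real player.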
Now it’s time to get our coefficients and intercept. “Aaron” is where the Hank Aaron data is stored.
> F2 <- fit_model(Aaron)
> coef(F2$fit)
(Intercept) I(Age - 30) I((Age - 30)^2)
0.9873457928 -0.0005464657 -0.0014949006
> c(F2$Age.max, F2$Max)
I(Age - 30) (Intercept)
29.8172234 0.9873957
Here’s the easy part. We take the outputs from above and plug them into the formulas from earlier.
Using a quadratic function, we can model the relationship between age and OPS and estimate when Aaron peaked. The coefficients and intercept obtained from fitting the quadratic curve reveal that Aaron peaked at the age of 30 (rounded up from 29.8) with a maximum OPS of 0.987. The curvature parameter suggests a small decrease of 0.00149 in OPS between his peak age and one year older. While this decrease may seem insignificant, it provides insights into Aaron's performance trajectory.
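The plug-in arithmetic behind those numbers can be written out as follows, using the coefficients printed by coef(F2$fit) as A, B, and C. The intermediate values are rounded for display.

```latex
\begin{align*}
\text{peak age} &= 30 - \frac{B}{2C}
  = 30 - \frac{-0.0005464657}{2\,(-0.0014949006)}
  \approx 30 - 0.1828 \approx 29.82 \\
\text{max OPS} &= A - \frac{B^2}{4C}
  = 0.9873457928 - \frac{(-0.0005464657)^2}{4\,(-0.0014949006)}
  \approx 0.98735 + 0.00005 \approx 0.9874
\end{align*}
```

Both results agree with the Age.max and Max values R returned above.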
The assumption that most players peak in their late 20s seems to apply to Hank Aaron as well. He was still putting up Hall of Fame numbers past age 30, but it is interesting to dive into the numbers and analyze them from a different angle.
We’ve seen Hank Aaron’s career trajectory but how does he compare to other players? Well, let’s go back to R and pull some similar player data and see. These names shouldn’t surprise you if you’re a baseball fan. These are the estimated career trajectories for players similar to Hank Aaron.
Comparing Aaron's career trajectory to other players, such as Albert Pujols, Stan Musial, Willie Mays, Frank Robinson, and Barry Bonds, we can observe similarities and differences. Most of these players had strong OPS numbers early in their careers, with Musial's OPS not dropping below .900 until his late 30s. Aaron, Musial, Robinson, and Mays share similar career trajectories, while Pujols and Bonds stand out with their unique patterns. Pujols experienced a decline in his early 30s, while Bonds' career trajectory defied expectations, peaking after the age of 35. The reasons behind Bonds' late-career peak are a topic of debate among baseball fans.
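The post does not show the code behind this comparison, so here is one possible way to fit the same quadratic to several players at once. Everything in it is an assumption for illustration: the data frame players, its column names, and the two invented players "Player A" and "Player B" are made up and are not real statistics.

```r
# Same helper as in the post, repeated so this sketch is self-contained.
fit_model <- function(d) {
  fit <- lm(OPS ~ I(Age - 30) + I((Age - 30)^2), data = d)
  b <- coef(fit)
  Age.max <- 30 - b[2] / b[3] / 2
  Max <- b[1] - b[2]^2 / b[3] / 4
  list(fit = fit, Age.max = Age.max, Max = Max)
}

# Made-up data for two hypothetical players (not real statistics).
ages <- 21:40
players <- rbind(
  data.frame(Player = "Player A", Age = ages,
             OPS = 0.950 - 0.002 * (ages - 28)^2),
  data.frame(Player = "Player B", Age = ages,
             OPS = 0.900 - 0.001 * (ages - 33)^2)
)

# Fit one quadratic per player and collect the peak age and peak OPS.
peaks <- do.call(rbind, lapply(split(players, players$Player), function(d) {
  f <- fit_model(d)
  data.frame(Player = d$Player[1],
             Age.max = unname(f$Age.max),
             Max = unname(f$Max))
}))
peaks
```

With real season-by-season data in the same three columns, the peaks table would put each player's estimated peak age and peak OPS side by side, which makes an early peak or a late one like Bonds' easy to spot.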
Hank Aaron had a legendary career and he performed at a high level for a very long time. It’s always great to take a deeper look and see just how good these legends actually were.
Sources
https://www.baseball-reference.com/players/a/aaronha01.shtml
Analyzing Baseball Data with R | Exploring Baseball Data with R (wordpress.com)