Wednesday, July 23, 2008

Is pitches seen per plate appearance over-emphasized?

A statistic increasingly cited as an indicator of a "good" batter is the number of pitches he sees per plate appearance (P/PA). The logic goes that the more patient a batter is, the more likely he is to draw a base on balls or get a pitch to hit. In addition, the more pitches a batter takes, the more the pitcher has to throw, which wears him out more quickly and lets the batter's team face the weaker bullpen arms more often, resulting in more runs scored. While the logic is there, do the numbers back it up? Let's find out.

Using the stats from 2007, here are some numbers from the top 10 players in P/PA:

Player P/PA OBP SLG OPS VORP
Reggie Willits 4.44 .391 .344 .735 15.2
Jack Cust 4.40 .408 .504 .912 32.6
Bobby Abreu 4.38 .369 .445 .814 27.9
Todd Helton 4.34 .434 .494 .928 51.9
Kevin Millar 4.32 .365 .420 .785 14.4
Johnny Damon 4.30 .351 .396 .747 17.8
Kevin Youkilis 4.27 .390 .453 .843 31.1
Nick Swisher 4.25 .381 .455 .836 31.5
Brandon Inge 4.23 .312 .376 .688 -3.3
Pat Burrell 4.22 .400 .439 .839 34.5
AVERAGE (rounded to 3 digits) 4.32 .380 .433 .813 25.4


and the 10 players nearest the middle of the P/PA rankings (Aaron Hill was the 50th):

Player P/PA OBP SLG OPS VORP
Jorge Posada 3.83 .426 .543 .970 73.4
Edgar Renteria 3.83 .390 .470 .860 47.5
Austin Kearns 3.83 .355 .411 .765 12.6
Adrian Gonzalez 3.83 .347 .502 .849 38.4
Aaron Hill 3.82 .333 .459 .792 27.1
Troy Tulowitzki 3.82 .359 .479 .838 37.8
Chipper Jones 3.81 .425 .604 1.029 76.0
Dustin Pedroia 3.80 .380 .442 .823 35.9
Mike Lowell 3.80 .378 .501 .879 46.5
Alex Rodriguez 3.80 .422 .645 1.067 96.6
AVERAGE (rounded to 3 digits) 3.82 .382 .506 .887 49.2


and the 10 players with the lowest P/PA:

Player P/PA OBP SLG OPS VORP
Juan Pierre 3.40 .331 .353 .685 16.2
Orlando Cabrera 3.39 .345 .397 .742 31.7
Kenji Johjima 3.39 .322 .433 .755 22.2
Freddy Sanchez 3.38 .343 .442 .784 27.5
Torii Hunter 3.37 .334 .505 .839 39.2
Pedro Feliz 3.30 .290 .418 .708 -2.7
Tony Pena Jr. 3.23 .284 .356 .640 -7.6
Vladimir Guerrero 3.23 .403 .547 .950 62.6
Yuniesky Betancourt 3.19 .308 .418 .725 16.2
Corey Patterson 3.15 .304 .386 .690 8.4
AVERAGE (rounded to 3 digits) 3.30 .326 .426 .752 21.4


While these selections obviously don't tell the whole story, it's pretty revealing that some of the better bats of 2007 - ARod, Chipper, Jorge - are all dead center when it comes to pitches seen, while the top 10 is filled out mostly by scrubs and merely above-average fellows. Granted, we have some pretty poor players in the bottom 10 - Juan Pierre, Pedro Feliz, and the truly awful Tony Pena Jr. - but also clearly valuable players like Vlad and Torii Hunter.

So clearly many of the least valuable players in the league, the guys who rarely walk, get on base, or hit for power, are the same players who are impatient at the plate. However, simply seeing more pitches does not appear to be a big factor in being a valuable player: the guys in the middle of the pack average a nearly identical OBP to the P/PA leaders (.382 vs. .380), slug at least as well, and manage a much higher VORP (49.2 vs. 25.4).
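To sanity-check that comparison, the group averages can be recomputed straight from the rows in the three tables. The sketch below is a minimal Python version; the names GROUPS, COLUMNS and group_averages are my own and exist only for illustration.

```python
from statistics import mean

COLUMNS = ("P/PA", "OBP", "SLG", "OPS", "VORP")

# Rows copied from the three 2007 tables above, in table order.
GROUPS = {
    "top 10": [
        (4.44, .391, .344, .735, 15.2),   # Reggie Willits
        (4.40, .408, .504, .912, 32.6),   # Jack Cust
        (4.38, .369, .445, .814, 27.9),   # Bobby Abreu
        (4.34, .434, .494, .928, 51.9),   # Todd Helton
        (4.32, .365, .420, .785, 14.4),   # Kevin Millar
        (4.30, .351, .396, .747, 17.8),   # Johnny Damon
        (4.27, .390, .453, .843, 31.1),   # Kevin Youkilis
        (4.25, .381, .455, .836, 31.5),   # Nick Swisher
        (4.23, .312, .376, .688, -3.3),   # Brandon Inge
        (4.22, .400, .439, .839, 34.5),   # Pat Burrell
    ],
    "middle 10": [
        (3.83, .426, .543, .970, 73.4),   # Jorge Posada
        (3.83, .390, .470, .860, 47.5),   # Edgar Renteria
        (3.83, .355, .411, .765, 12.6),   # Austin Kearns
        (3.83, .347, .502, .849, 38.4),   # Adrian Gonzalez
        (3.82, .333, .459, .792, 27.1),   # Aaron Hill
        (3.82, .359, .479, .838, 37.8),   # Troy Tulowitzki
        (3.81, .425, .604, 1.029, 76.0),  # Chipper Jones
        (3.80, .380, .442, .823, 35.9),   # Dustin Pedroia
        (3.80, .378, .501, .879, 46.5),   # Mike Lowell
        (3.80, .422, .645, 1.067, 96.6),  # Alex Rodriguez
    ],
    "bottom 10": [
        (3.40, .331, .353, .685, 16.2),   # Juan Pierre
        (3.39, .345, .397, .742, 31.7),   # Orlando Cabrera
        (3.39, .322, .433, .755, 22.2),   # Kenji Johjima
        (3.38, .343, .442, .784, 27.5),   # Freddy Sanchez
        (3.37, .334, .505, .839, 39.2),   # Torii Hunter
        (3.30, .290, .418, .708, -2.7),   # Pedro Feliz
        (3.23, .284, .356, .640, -7.6),   # Tony Pena Jr.
        (3.23, .403, .547, .950, 62.6),   # Vladimir Guerrero
        (3.19, .308, .418, .725, 16.2),   # Yuniesky Betancourt
        (3.15, .304, .386, .690, 8.4),    # Corey Patterson
    ],
}


def group_averages(rows):
    """Return the mean of each stat column for one group of players."""
    return {name: mean(column) for name, column in zip(COLUMNS, zip(*rows))}


averages = {group: group_averages(rows) for group, rows in GROUPS.items()}
for group, stats in averages.items():
    print(group, {name: round(value, 3) for name, value in stats.items()})
```

Running it prints one line of averages per group, which makes it easy to compare the patient hitters against the middle of the pack without trusting hand arithmetic.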

Additionally, the assertion that a batter who sees more pitches will wear a starting pitcher down more quickly is pretty ridiculous. If the most patient batters are seeing about 1 more P/PA than the least patient (4.32 vs. 3.30 in the tables above), that is a theoretical difference of about 5 pitches per game for a batter who comes up 5 times - and that's over 9 innings. Most starting pitchers are going to face a single batter only 3-4 times per game, and it's difficult to claim that 3-4 more pitches from a starter will really affect how well he pitches to the batters who follow, or get him out of the game any faster.
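The arithmetic behind that estimate is simple enough to sketch in a few lines of Python. The two averages come from the top-10 and bottom-10 tables; the function name extra_pitches and the choice of 3, 4 and 5 plate appearances are my own illustration, not anything official.

```python
# Average P/PA of the top-10 and bottom-10 groups, taken from the tables above.
TOP_10_P_PA = 4.32
BOTTOM_10_P_PA = 3.30


def extra_pitches(plate_appearances, gap=TOP_10_P_PA - BOTTOM_10_P_PA):
    """Extra pitches a patient batter sees compared with an impatient one."""
    return gap * plate_appearances


# 3-4 trips against the starter, or roughly 5 over a full game.
for pa in (3, 4, 5):
    print(f"{pa} PA: about {extra_pitches(pa):.1f} extra pitches")
```

Even at 5 plate appearances the gap works out to only a handful of pitches for that one batter.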

So it's fairly apparent that poor batters show the tendency to be free swingers, but patience at the plate does not a good hitter make. Players like ARod and Chipper Jones are going to swing when they get a pitch to hit - no matter when it comes. It's safe to say that a batter averaging under, say, 3.5 P/PA is doing something wrong, but I believe it's incorrect to look at the stat as some sort of larger indicator of offensive prowess.

Monday, July 21, 2008

Is Cliff Lee overperforming?

For my first post here on SABR Rattling, I will focus on my favorite player of the 2008 season, Cliff Lee. Early in the 2008 season, Lee was throwing absolutely ridiculous baseball. In his first 4 starts he gave up 2 ER, walked 2, and struck out 29, over 33 1/3 IP. Obviously, people responded to this outrageous start by saying that he would quickly fall to earth and regress back to his career averages, or at least close to them. After a few more starts, it became apparent that Lee was going to continue to throw the ball far better than he has in the past, at least based on the traditional stats of Wins, Strikeouts, and ERA. However, the more inquisitive or statistically minded fan may say "Wins are meaningless, and ERA is fuzzy, as it is influenced by relief pitchers and the players in the field. Cliff Lee is nowhere near as good as his numbers are." Is that true?

Well, let's look a bit deeper. The only way to truly analyze how a pitcher is performing is to remove as much luck and team defensive influence as possible. Anyone bothering to read this post already knows about DIPS, but here's a quick summary: In 1999, Voros McCracken came to the startling conclusion that nearly anything that happens to the ball once it leaves the bat is essentially luck as far as the pitcher is concerned. He created a formula for analyzing the individual performance of a pitcher, and called it DIPS: Defense Independent Pitching Statistics. The sabermetric community, of course, responded by analyzing his conclusion and ultimately agreed, and it proved to be a breakthrough method of analyzing pitchers' performances. DIPS looks at all the things that are within a pitcher's control - strikeouts, walks, hit batters, and HR allowed - and produces dERA, an "ERA" that reflects the pitcher's own performance far more accurately than the traditional measurement.

So let's look at Cliff Lee: after 19 starts (roughly 2/3 of his season), he is posting an ERA of 2.29, the 3rd best in MLB among starting pitchers. As mentioned previously, however, this number isn't ideal - we want to look at his dERA, which is an MLB-best 2.57. Along with this, his BABIP (Batting Average on Balls In Play) is a fairly high .283, which puts him 67th in MLB among SPs. What does this mean? It basically means Lee isn't getting lucky. When batters put the ball in play against Lee, they are getting hits more frequently than against 50% of the other SPs in the game. So what is Lee doing that makes him so effective, even if he is less lucky than most in terms of BABIP?

Most importantly, perhaps, he isn't walking anyone. He has 20 walks in his 19 starts, which ties him for 3rd fewest in MLB. A number of pitchers have similarly low walk totals, however, and only a few have been nearly as effective as Lee; what unites those few is a high number of strikeouts. In addition, Lee has allowed only 5 home runs all season, tying him for 2nd fewest. Together, these individual metrics explain why Lee's overall numbers are so superior.

It might prove interesting to look at some of the top pitchers by traditional metrics alongside the more interesting metrics that we used to analyze Lee:

Pitcher Wins ERA dERA BABIP BB K HR
Cliff Lee 13 2.29 2.57 .283 20 110 5
Justin Duchscherer 10 1.87 3.63 .202 26 70 7
Rich Harden 5 2.19 2.78 .282 34 102 5
Edinson Volquez 12 2.49 3.39 .278 59 129 6
Tim Lincecum 12 2.79 3.15 .298 51 143 9


Consistent themes here (out of an obviously small set): low numbers of HR allowed, high numbers of strikeouts, fairly low numbers of walks, and a consistently average or above average BABIP. This leads to ERAs that are close to their dERA, or "real" ERA. Do you notice the outlier? Duchscherer, obviously. His dERA is nearly 2 full runs higher than his current ERA, most likely because of his fairly modest K:BB ratio and extremely low BABIP (the lowest in the majors, even). This indicates that Duchscherer is quite likely to regress to an ERA of 3 or above. There hasn't been a single starting pitcher in the 2000s to throw a full season with a BABIP as low as .202 - for example, the best in 2007 was Orlando Hernandez's .214.
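For anyone who wants to see the outlier jump out, here is a rough Python sketch that computes each pitcher's gap between dERA and ERA and his K:BB ratio from the table above. The names PITCHERS and summarize are made up for this example.

```python
# (ERA, dERA, BB, K) copied from the table above.
PITCHERS = {
    "Cliff Lee": (2.29, 2.57, 20, 110),
    "Justin Duchscherer": (1.87, 3.63, 26, 70),
    "Rich Harden": (2.19, 2.78, 34, 102),
    "Edinson Volquez": (2.49, 3.39, 59, 129),
    "Tim Lincecum": (2.79, 3.15, 51, 143),
}


def summarize(era, dera, walks, strikeouts):
    """Return (dERA minus ERA, strikeouts per walk) for one pitcher."""
    return dera - era, strikeouts / walks


summary = {name: summarize(*line) for name, line in PITCHERS.items()}

# List the pitchers from the biggest dERA-ERA gap to the smallest.
for name, (gap, k_per_bb) in sorted(
    summary.items(), key=lambda item: item[1][0], reverse=True
):
    print(f"{name:<20} gap {gap:+.2f}  K:BB {k_per_bb:.2f}")
```

A large positive gap means the ERA looks better than the defense-independent numbers say it should.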

Here is a table showing a few starting pitchers who are currently underperforming in traditional metrics vs. their actual pitching (a difference of 1 run or greater between ERA and dERA):

Pitcher Wins ERA dERA BABIP BB K HR
A.J. Burnett 10 4.83 3.77 .311 60 132 12
Andrew Miller 5 5.63 3.87 .333 50 80 6
Randy Johnson 6 5.13 3.95 .319 28 95 15
Kevin Millwood 6 5.23 3.97 .360 38 78 10


Consistent themes here? Fairly obvious: much higher than average BABIP (an astonishing .360 for Millwood!), combined with fairly decent BB:K ratios and/or low numbers of HR allowed. These are SPs that are likely to lower their ERA (and might make for good fantasy pickups if your league is deep).

Besides Duchscherer, who else is overperforming? Yet another poorly constructed table tells the tale:

Pitcher Wins ERA dERA BABIP BB K HR
Joe Saunders 12 3.05 4.59 .232 34 64 15
John Lannan 6 3.29 4.58 .260 38 63 12
Armando Galarraga 7 3.41 4.69 .231 37 68 13
Gavin Floyd 10 3.52 5.11 .212 50 81 18


Themes? Low BABIP, fairly poor BB:K ratios, higher numbers of HR allowed. These fellows are more likely to regress, so they may make good trade bait (although it's unlikely people have or want Lannan or Galarraga on their rosters).
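The sorting into over- and underperformers can be sketched in Python as well. This assumes the same 1-run cutoff in both directions (I only stated it for the underperformers), and the names THRESHOLD, PITCHERS and classify are just labels for this example.

```python
THRESHOLD = 1.0  # runs between ERA and dERA; assumed to apply in both directions

# (ERA, dERA) copied from the tables above.
PITCHERS = {
    "Cliff Lee": (2.29, 2.57),
    "Justin Duchscherer": (1.87, 3.63),
    "A.J. Burnett": (4.83, 3.77),
    "Andrew Miller": (5.63, 3.87),
    "Randy Johnson": (5.13, 3.95),
    "Kevin Millwood": (5.23, 3.97),
    "Joe Saunders": (3.05, 4.59),
    "John Lannan": (3.29, 4.58),
    "Armando Galarraga": (3.41, 4.69),
    "Gavin Floyd": (3.52, 5.11),
}


def classify(era, dera, threshold=THRESHOLD):
    """Label a pitcher by how far his ERA sits from his dERA."""
    if era - dera >= threshold:
        return "underperforming"  # ERA is worse than his own pitching suggests
    if dera - era >= threshold:
        return "overperforming"   # ERA is better than his own pitching suggests
    return "in line"


labels = {name: classify(*line) for name, line in PITCHERS.items()}
for name, label in labels.items():
    print(f"{name:<20} {label}")
```

The cutoff is arbitrary, so it is worth trying a smaller threshold to see who sits near the edge.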

However, there is the possibility that pitchers like Floyd, Duchscherer, Saunders, etc., are just BETTER at inducing ground outs, which would lead to a lower BABIP than average. As the DIPS 2.0 formula used to calculate dERA does not incorporate such factors, the dERA for these pitchers may be off. In 2005, David Gassko wrote an article entitled "Batted Balls and DIPS" for The Hardball Times, in which he attempts to create an improved version of DIPS by incorporating things like ground ball/fly ball rates. Even so, research by McCracken and others shows that BABIP is hugely volatile from season to season (especially compared to the consistency of HR and BB allowed), indicating that it is unlikely that a pitcher has much control over BABIP. However, pitchers DO show large differences in career BABIP, and in many cases these career numbers account for their success.

So while McCracken's findings still hold up quite well, there is likely more work to be done on DIPS. Even working with what we have access to now, though, we can analyze whether pitchers are over- or underperforming, and make predictions about their future performance. So, is Cliff Lee overperforming? A tiny bit. By any measure, however, he is an elite pitcher in 2008.