Fourshizzle? Bronx Banter

1 RagingTartabull ~ Oct 30, 2009 4:24 pm

ya GOTTA go with Carsten Charles for Game 4 now.

you're up 2-1, you can put your foot on their throats; you're down 2-1... well, do you want Gaudin out there down 2-1?

Game 5 is a horse of a different color, but I'd like to find a way to survive the next 2 games before I start worrying about that.

2 monkeypants ~ Oct 30, 2009 4:27 pm

[1] Hopefully the Yanks can score some runs off of Blanton and take the pressure off CC to go 7 or 8 innings.

3 Cliff Corcoran ~ Oct 30, 2009 4:27 pm

I think this move hands the Series to the Yankees. It's theirs to lose now.

4 ms october ~ Oct 30, 2009 4:28 pm

i'd like to think this is good news - he better not turn into joe blantonfax

as great as cliff lee was in game 1, and with cc not quite as good as lee in game 1, the fact that cc can go on short rest while it doesn't even seem to be an option for lee is big points for cc

5 RagingTartabull ~ Oct 30, 2009 4:31 pm

I was expecting this honestly, Blanton gave them a decent start against LA and Lee on 3 days' rest is a completely unknown commodity. He could very well be fine on short rest, but Game 5 of the World Series is no time to go trying to find that out.

My focus is on tomorrow, I think Andy will be ok but I could also see Hamels rearing back and throwing a little '08 action at us. I do however think you'll see A-Rod break out in that ballpark.

6 The Hawk ~ Oct 30, 2009 4:33 pm

Man this site is all over the place today. We are a dispersed people.

7 wcyankee ~ Oct 30, 2009 4:43 pm

[5] I just hope someone told A-Rod that he did great prior to the series and now he's pressing. Cut. That. Shit. Out.

8 51cq24 ~ Oct 30, 2009 4:49 pm

if we can manage to win tomorrow and sunday, i guess i'd go with gaudin against lee in game 5. that way we can avoid aj on short rest and without molina (or with molina) vs. lee, and we'd have him and cc lined up for games 6-7 with pettitte backing them up (either in short relief if we're up in game 6 or short/long relief in game 7). if we lose one of those games, it's a close question. i think i'd go to aj in game 5 then, but i'm not sure. if we lose both games, obviously we have no choice.

9 The Hawk ~ Oct 30, 2009 4:52 pm

The one thing I really wonder about is if it comes down to a Game 7 would I really want CC to come in and replace Pettitte? It's one thing for Sabathia to pitch instead of Gaudin, but Andy's another story ...

10 51cq24 ~ Oct 30, 2009 4:54 pm

[9] hey i love pettitte, but i don't think there's much of a question. and if cc struggles at all, you go straight to andy.

11 RIYank ~ Oct 30, 2009 4:56 pm

[7]
Alex, you MUST STOP PRESSING. It's absolutely critical for you to stop it RIGHT NOW. Everything depends on it: your team, your reputation, your place in history!

[8] I think AJ on short rest is fine. Pettitte is the question mark for me.

12 51cq24 ~ Oct 30, 2009 4:56 pm

[10] of course, if pettitte is awesome tomorrow and cc doesn't look great sunday, i might change my mind about that.

13 RIYank ~ Oct 30, 2009 5:04 pm

Burnett last night threw first-pitch strikes to 21 of the 26 batters he faced. I guess that stat has been floating around out there enough that most Banterers have heard it already, but I thought it was remarkable.

I figure when his turn comes up again, the Phils might be swinging at that first pitch a lot more often...

14 51cq24 ~ Oct 30, 2009 5:05 pm

[11] good point on pettitte. although i think they're both question marks because of the molina/posada thing. let's just hope we win the next 2 games. i think we will.
when i was younger i always kind of liked away games because of the chance to jump out to a 1st inning lead before the other team even bats. let's hope for some early offense, 2000 alds game 5 style.

15 Yankster ~ Oct 30, 2009 5:05 pm

Something I read on a Phillies blog: "Why did they go public on this decision so early?" I hope the Yanks very quietly prepare whoever will pitch but don't pre-announce. No reason to give too much notice...

16 The Hawk ~ Oct 30, 2009 5:05 pm

[10] But at the point of Game 7, Pettitte would be fully rested and CC would be about to pitch for the third time in ... nine days? Something like that.

17 The Hawk ~ Oct 30, 2009 5:08 pm

http://tinyurl.com/ye8hzfr

God I love this article. It is so wrong-headed to me, and, well, what I really love is some of the responses. They say so much of what I've tried to say and usually failed to.

So funny though. Girardi was genius last night - everything worked out with Hairston and Molina. Posada's hit was terrific - what a great job by ol' Jorge.

18 thelarmis ~ Oct 30, 2009 5:11 pm

[6] i am a schizo. (and so am i)

19 RIYank ~ Oct 30, 2009 5:20 pm

[17] Interesting.
I actually agree with Rosenheck about Swisher/Hairston. But, as you point out, every decision Girardi made yesterday worked out (I said in an earlier thread that he had the golden touch), so I'm not complaining about it.

20 Just Fair ~ Oct 30, 2009 5:20 pm

[13] i think that's the sure-fire sign that AJ is going to have a good game. When he throws first-pitch strikes he's fun to watch. Anyone count all those balls that got by Molina last night? I think it was zero. : ) I will continue to contend that Molina's slick glove behind the plate gives space brain AJ confidence. But that's just me.

21 The Hawk ~ Oct 30, 2009 5:21 pm

[19] I think sitting Swisher was a great idea. It can't hurt, and if he comes back and has calmed down a bit, then it's worth it.

22 monkeypants ~ Oct 30, 2009 5:25 pm

[20] Anyone count all those balls that got by Molina last night?

You're joking, right? Because at least three balls got away from him (though none with men on). I think he just got "crossed up" a few times.

[19][17] Yeah, I actually agree with the article--it's spot on. Except of course the moves seem to have worked out and, more importantly, the team got the W. I hope to keep being wrong, if the results turn out the same.

23 The Hawk ~ Oct 30, 2009 5:33 pm

[22] No no no, I disagree with the article and agree with the bulk of the comments.

24 The Hawk ~ Oct 30, 2009 5:34 pm

[23] I just realized that you may already understand this - your "yeah" got me flying in the wrong direction (maybe).

25 Just Fair ~ Oct 30, 2009 5:37 pm

[22] I wasn't joking. The 55-foot curveballs in the dirt that Molina blocks with his chest stick out, as do a few of the 58-footers that he easily scoops up. Plus picking off Werth. I honestly don't remember any balls getting by Molina. And if there were, they were not with men on base, as you said. I know your position on this. I, on the other hand, support the Molina-AJ battery.

26 monkeypants ~ Oct 30, 2009 5:37 pm

[24] Sorry, I was unclear. I meant "yeah I agree with RIYank" that I actually am of the same mind as the article, but I recognize that the moves turned out well, so you (Girardi) are looking like the smart guys!

27 monkeypants ~ Oct 30, 2009 5:41 pm

[25] Molina did have a fine game, fo sure. A few balls bounced off his glove and trickled away, as I recall. I thought at the time it was funny, because the night before, when someone (Robertson?) threw a pitch that "crossed up" Posada (no one advanced, maybe no one was on base?), Sutcliffe said "that's why Molina is catching Burnett." Then within the first few innings, the same thing happened to Molina and Sutcliffe was silent.

Ultimately, these things are (in my mind) of little consequence. If AJ feels warm and cozy with Molina catching him (and that's the official line), then he should catch him. As for the external evidence ("cross-ups," mound visits), those happen to Molina and Po alike when they catch AJ.

The pick off was mighty nice.

28 RIYank ~ Oct 30, 2009 5:41 pm

I guess I don't have a huge problem with the Hairston/Swisher thing, but Swisher is much better at hitting and fielding. He's better at baseball. I agree with Rosenheck that it's a mistake to make your decisions based on what a position player has done in his past few games.

29 RIYank ~ Oct 30, 2009 5:42 pm

Yeah, that pick-off was very gratifying.
Then again, so was Posada's PH single.
Girardi could do no wrong.

30 51cq24 ~ Oct 30, 2009 5:46 pm

[16] well i think we should wait til it gets there to decide, but that's definitely not a bad point. i guess you could go either way, with cc backing up andy if andy falters. and that might make a little more sense. i think a lot of it has to do with how cc looks in game 4. honestly, he hasn't been quite as sharp in his last 2 starts as game 1 against the angels, but he's still be pretty nasty, and whatever early control problems he had wednesday might have been because of the weather. let's just see how he looks in game 4 and how many pitches he throws.

[17] i agree with you. normally i wouldn't want to sit someone because of a few games, but swisher has looked so bad that it was the right move. plus, it's not like nick swisher is a superstar. he had a good year and he gets on base a lot, but facing pitchers like pedro who throw strikes is not his strength. as for molina, i have been skeptical, but it's hard to find fault when aj pitches this well. outside of 1 really bad inning, he's been quite great with molina this postseason.

31 Yankster ~ Oct 30, 2009 5:47 pm

Stats based on large samples suggest probabilities for subsequent large samples. Those stats (dependent on things like the variability of the values measured) are meaningless in small-sample events like Molina's two or three at bats. Molina, in a useless sample, has a higher postseason OPS than Posada, for example, which isn't probable according to the large-sample stats.

Posada in 358 postseason at bats has a .742 OPS and Molina, in 13 postseason at bats, has an OPS of .745. Neither is statistically meaningful in predicting how they will perform in two isolated at bats. Posada's numbers are probably meaningful in predicting how he will perform on average in the next 358 at bats.
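A quick way to see why 13 at bats say so little: treat each plate appearance as an independent trial with a fixed on-base probability and compare the standard error of the observed rate over 13 trials versus 358. This is a minimal sketch under that simplifying assumption; the .350 rate is a hypothetical round number rather than either catcher's real figure, and an on-base rate stands in for OPS because OPS is not a simple proportion.

```python
import math

def obp_standard_error(p: float, n: int) -> float:
    """Standard error of an observed on-base rate over n plate appearances,
    assuming each one independently succeeds with probability p."""
    return math.sqrt(p * (1 - p) / n)

P = 0.350  # hypothetical true on-base rate, not either catcher's real number
for n in (13, 358):
    se = obp_standard_error(P, n)
    # Normal approximation (crude at n=13): roughly 95% of observed rates
    # should land within 2 standard errors of the true rate.
    print(f"n={n:3d}: SE={se:.3f}, rough 95% range {P - 2 * se:.3f} to {P + 2 * se:.3f}")
```

Under these assumptions the standard error is about .132 at 13 trials versus about .025 at 358, so the short sample's plausible range is roughly five times wider (the ratio is the square root of 358/13).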

Another huge fallacy is that streaks are unlikely. Even in random coin tosses, long streaks occur far more often than people in lab tests anticipate. That can lead players whose streaks start randomly to believe they are non-random, which can in turn make them non-random, as Swisher's certainly looks.
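The point about long streaks in random coin tosses is easy to check by simulation. This sketch flips a simulated fair coin 1000 times, records the longest run of identical outcomes, and repeats that many times; the seed and trial count are arbitrary choices for illustration.

```python
import random

def longest_run(flips: list[bool]) -> int:
    """Length of the longest run of identical consecutive outcomes."""
    best = current = 0
    previous = None
    for flip in flips:
        current = current + 1 if flip == previous else 1
        previous = flip
        best = max(best, current)
    return best

rng = random.Random(2009)  # fixed seed so the result is reproducible
trials = 1000
runs = [longest_run([rng.random() < 0.5 for _ in range(1000)]) for _ in range(trials)]
print("average longest run in 1000 fair flips:", sum(runs) / trials)
print("share of sequences with a run of 8 or more:", sum(r >= 8 for r in runs) / trials)
```

In runs like this the average longest streak comes out around 10, and nearly every 1000-flip sequence contains a run of 8 or more, even though each flip is an independent 50/50.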

32 51cq24 ~ Oct 30, 2009 5:49 pm

[30] i think that's the second time in 2 days i've left out the "en" in been. strange.

33 The Hawk ~ Oct 30, 2009 5:54 pm

Yeah I mean the thing in the article I disagree with most is that streaks are a "myth". I honestly feel that's an outrageous contention, particularly as applied to this case: the idea that people don't get gummed up or in a rut. Yeah, statistically it will even out over time, but that doesn't mean there aren't streaks, and in a seven-game series, time is one thing you don't have.

Again, I am proving that the replies to that article are better expressed than I'm capable of.

34 RIYank ~ Oct 30, 2009 5:57 pm

[31]

Stats based on large samples suggest probabilities for subsequent large samples. Those stats (dependent on things like the variability of the values measured) are meaningless in small sample events like Molina’s two or three at bats.

Uh, no, that's not true. How could they be meaningful over a long stretch if they weren't meaningful in each short part of that long stretch? That's mathematically impossible (I mean, literally, it is, because probability is additive).

35 RIYank ~ Oct 30, 2009 5:58 pm

[33] I know what you mean, but there really is a lot of evidence that streaks are a kind of cognitive illusion. I'm not as impressed by that evidence as some people, but I'm still somewhat impressed by it, enough to be wary of my own instincts (and my instinct certainly would have been to bench Swisher).

36 The Hawk ~ Oct 30, 2009 6:02 pm

[35] But often streaks are caused by something actually being out of whack - it's not someone's imagination, they're really doing something wrong. Then they correct it and get back on track. I'll take your word for the evidence but I'm resistant in principle.

37 RIYank ~ Oct 30, 2009 6:07 pm

[36] I know, it's hard to deny that. And everyone who has ever played baseball at a high level says it, too. That's one reason I'm not really sold on the Tom Tango line.

38 monkeypants ~ Oct 30, 2009 6:20 pm

[36] That is true, which is why the random coin toss analogy does not really hold up. But presumably players with a high OBP (for example) have one in part because they are less prone to streaks (greater physical skill, better able to correct mistakes, better able to compensate, etc.). Swisher is historically a much, much better player than Hairston. So, the gamble is that Hairston's known ability (albeit stinky) is more likely to pay off than Swisher is to "get back on track."

In a short series, it is indeed difficult to let a slumping player work it out. On the other hand, outs are precious in a short series and it is hard to justify sitting better players. The Hairston thing worked out when he got a nice hit off of a tiring Pedro, and he didn't screw up in the field. I'm happy now to put that little experiment to rest and go back to Swisher...or even try Hinske if the coaches believe that Swish is still fouled up.

39 OldYanksFan ~ Oct 30, 2009 6:47 pm

[17] There is one fact that should be remembered. Swisher could come in any time for JHJr, even in the 2nd inning (if need be), and Po often gets 2 ABs when he comes in for Molina later in the game. So, unlike roster construction, these moves could be 'righted' at any time if need be. And having a really good PHer is a nice thing.

Out of curiosity, do we have an emergency #3 catcher? Is it Swisher? I wonder if they give him any practice time as a C. If he wasn't a total dud, it would be nice for Girardi to bring in Po for Molina without sweating bullets.

40 thelarmis ~ Oct 30, 2009 6:51 pm

[39] Hairston is the emergency catcher.

i say let Swish start tomorrow. see how he does. might be smart to get Hinske in for Game 4 vs. Blanton - he has 4 hits against him: 2 doubles, 2 homers

41 Rob Abruzzese ~ Oct 30, 2009 6:54 pm

This is really great news.

42 OldYanksFan ~ Oct 30, 2009 7:28 pm

[40] I am all for Swisher starting, and thought he should have started yesterday. My comment was not in support of playing JHJr, but rather that it was not a HORRIBLE move.

43 Just Fair ~ Oct 30, 2009 7:34 pm

Deadspin runs a feature titled Why Your Stadium Sucks. They're generally entertaining. But I don't need much. YS 2.0 in the spotlight.
http://deadspin.com/5393033/why-your-stadium-sucks-yankee-stadium

44 monkeypants ~ Oct 30, 2009 7:56 pm

[39] I'm not sure how "having a really good PH" plays into it. If he is slumping so bad that he can't start, then it is probably the case that he shouldn't PH; he's most certainly not a "really good" PHer at that point. Plus, for whom else would he PH except the player who has replaced him? Maybe Melky, but Girardi seems loath to PH for anyone except Molina (who will be PH'd for by Posada), to the point that even the PRs have ended up hitting for themselves. He certainly doesn't seem to trust Swisher in the field more than Johnny.

[40] That's right, Hairston is the emergency C. So, by playing him in RF and starting Molina---and then PR for Hairston---Girardi used his BUIF and both BUCs!

Actually, this doesn't bother me so much, since the risk of injury to Po in the last few innings is so small and one should not build a strategy around it.

45 mehmattski ~ Oct 30, 2009 8:27 pm

if you're flipping a coin 1000 times, and somewhere in the middle you get like 20 heads in a row, would you consider heads "hot"? If someone then came up to you and wanted to bet you one hundred dollars on a coin flip, would you use that coin and call "heads" because the coin was "hot"?

I'm guessing that a large number of people would actually use the "hot" coin thinking it gave them a better shot. And if those folks won the bet, it would seem self-validating. If those folks lost the bet, they would chalk it up to being "unlucky." I'm also guessing these same folks are the ones more likely to think that 33 AB against Pedro, or 37 Post-Season ABs, are good reason to sit/start someone, rather than the overall likelihood that someone gets on base (measured as that player's recent on base percentage). Or would think that Jeter bunting last night was a good idea.

I would also wager that folks thinking this way don't understand anything about probability or statistics.
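The "hot coin" bet can be tested directly: simulate a long sequence of fair flips and look only at the flips that come right after five or more heads in a row. This is a minimal sketch that assumes a perfectly fair coin (the premise for the coin, though not necessarily for a hitter, which is the objection raised in reply); the streak length, flip count, and seed are arbitrary.

```python
import random

def heads_rate_after_streak(n_flips: int, streak: int, seed: int = 45) -> tuple[float, int]:
    """Share of heads on the flip immediately after `streak` or more
    consecutive heads, for a simulated fair coin."""
    rng = random.Random(seed)
    run = 0          # current count of consecutive heads
    followers = 0    # flips that came right after a qualifying streak
    heads_after = 0  # how many of those follower flips were heads
    for _ in range(n_flips):
        heads = rng.random() < 0.5
        if run >= streak:
            followers += 1
            heads_after += heads
        run = run + 1 if heads else 0
    return heads_after / followers, followers

rate, count = heads_rate_after_streak(1_000_000, 5)
print(f"heads on {rate:.3f} of the {count} flips that followed 5+ heads")
```

The follow-up flips come up heads about half the time, so for a fair coin a streak of heads says nothing about the next flip.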

46 monkeypants ~ Oct 30, 2009 8:37 pm

[45] i agree with you in general. But can you compare a truly random event (a coin flip) with a non-random event (an AB by a player)? While the coin that came up heads twenty times in a row is no more likely to come up heads the next time (because it's "hot") or tails (the "law of averages"), that does not mean that it is impossible for a player in a slump to have a lower chance than his historical averages of getting a hit the next time up (perhaps he is injured, or has developed a short-term mechanical problem that he hasn't worked out yet).

I agree with you that benching Swisher because of 37 post season ABs (versus his career numbers) is a relatively poor use of statistics. But his ABs are not random events and thus are not exactly analogous to your coin that came up heads (or perhaps better in this comparison, tails) 20 (or 37) times in a row.

47 mehmattski ~ Oct 30, 2009 8:54 pm

[46] If a player is injured, it's the duty of the training staff and the manager to not send out a suboptimal lineup. Slumps that are due to mechanical errors are trickier, because no one ever stops to ask whether correlation actually implies causation.... if a player is slumping and the hitting coach identifies a "hitch" in the swing and suddenly the hitter is hot, it's somehow seen as proof.

The phenomenon that I am fairly confident is going on is the uncanny ability of the human brain to identify patterns where none exist. Constellations are the obvious example... truly random distributions of stars, as seen from our position in the universe, used to depict the outlines of familiar objects. In sports, pattern identification is self-verifying, and we often conveniently forget the instances where our pattern is wrong. Every time a manager calls for a bunt when the team is ahead by two runs, he is putting the team in a worse position to win the game. But every time that bunt is followed by a hit and a runner scores, it validates the bunting philosophy. Sac bunts in innings resulting in no runs are conveniently forgotten.

And for all I know about statistics and our understanding of the way the human brain works, I know nothing about being a professional (or even amateur) baseball player. I still don't understand how practicing by hitting 70 mph meatballs prepares a hitter for 99 mph fastballs, or Santana's change, or AJ's curve, or Mariano's cutter...

48 Shaun P. ~ Oct 30, 2009 9:08 pm

AWESOME news. I am very excited by this!

[40] Thank you for the birthday wishes in one of the earlier threads, thelarmis! I hope your dad (and 51cq24's dad) had a good birthday too! For the record, yesterday was also the birthday of "Ken Tremendous" of FireJoeMorgan and The Office fame, and Jesse Barfield. And unnoticed, even though Pedro was starting, it was also the birthday of a Banter favorite. A gentleman known as "Who is". =)

49 Mr. OK Jazz TOKYO ~ Oct 30, 2009 9:08 pm

[43] That's hysterical. The New YS really is a joke in so many ways... they really do make it hard to be a fan sometimes!

Blanton: Thank you Charlie Manuel! No Lee = Homer Happy Yanquis!

50 monkeypants ~ Oct 30, 2009 9:21 pm

[47] Again, I dig what you're saying. But at the same time, there is a difference between a truly random event (a coin flip) and one that is not (an AB). Maybe a player goes 0-12 over three games because he bats poorly against LHP, and the team faced three tough lefties. His 0-12 is not a random occurrence analogous to twelve straight coin flips landing heads. There is a cause and effect dynamic influencing the outcome of those 12 ABs.

To my mind, the word "slump" merely describes a prolonged period with a highly disproportionate number of negative outcomes. Now, when a player is in a "slump," it may be more or less random (i.e., he has some bad luck, hits the ball hard but the balls find fielders' gloves), or the result of circumstance (facing a series of very tough pitchers), or it may be the result of some physical or psychological factors.

I'm just saying that the entire issue of slumps (or whatever you want to call them) is not really the same thing as a coin tossed heads 20 times in a row.

Now, clearly the fact that a player is 0-10 or 0-20 or whatever will not MAKE him go 0-4 the next day. On the other hand, a prolonged "slump" may be the result of, and indicate, some underlying cause. If so, then it is reasonable to consider that a player in a slump may continue to slump until the underlying cause(s) is/are resolved.

51 OldYanksFan ~ Oct 30, 2009 9:27 pm

[46] I think that even if there is A reason, or many reasons for a slump, that any small sample is somewhat random. Swisher may have a .400 OBP, but not because he gets on base 4 out of every 10 times. For any random 10 ABs, he could get on base 8 times, or 0 times. That's why [34] is wrong.

If Swisher gets on base around 200 times in around 500 PAs, he might have a .400 OBP (large sample size), but it has little bearing on what might (ACTUALLY, not PROBABLY) happen in his next 20 ABs (small sample size). Yes, MATHEMATICALLY he SHOULD get on base 8 times.... but 4 times, or 12 times, is just as likely.

RCNB (random chaotic nature of baseball).
Real life does not follow precise mathematical probability. How about Mathis in the CS? Or Ruiz?

While Posada has a much higher OBP than Molina, Molina has had 3-hit games and Posada has had 0-5 games. Yes, the probability is on Posada's side, but there have been, and will be, small sample size realities where Molina will do better.

Historical Actuality does not equal future probability in small samples.

Here are some facts.
Over the largest sample size, ARod has an OPS of .965.
In his last 2 PS series (2009), his OPS was around 1.500
In the 2 PS series before those last 2 ('06, '07) his OPS was .525ish.

So... would you like to predict what ARod's OPS will be in the next 2 series? .950ish? My guess would be between .400 and 1.600.

In the 2009 WS, Molina has a better OPS than ARod, and has a far better K/9 rate.

52 OldYanksFan ~ Oct 30, 2009 9:30 pm

"And for all I know about statistics and our understanding of the way the human brain works, I know nothing about being a professional (or even amateur) baseball player."

FUCKING AWESOME!
I feel the same way.
I think it applies to all of us (except William, of course).

53 Mr. OK Jazz TOKYO ~ Oct 30, 2009 9:35 pm

[52] Tee-hee-hee..

So happy we are in the WS, everyone join me in dancing at 2:30 of this!
http://www.youtube.com/watch?v=gWiXzP3w8hk&feature=related
Everybody dance to the music of Tout Puissant OK Jazz!

54 RIYank ~ Oct 30, 2009 9:46 pm

[51]

If Swisher gets on base around 200 times in around 500 PAs, he might have a .400 OBP (Large sample size), but it has little bearing on what might (ACTUALLY, not PROBABLY) happen in his next 20 ABs (small sample size).

I think that is just gibberish.
What on earth does "what might (ACTUALLY, not PROBABLY) happen" mean? I understand what it means to say something will probably happen. I know what it means to say something actually happens. I know what it means to say that something might happen. I think it is complete nonsense to speak of what might actually not probably happen.

Yes, MATHEMATICALLY he SHOULD get on base 8 times…. but 4 times, or 12 times, is just as likely.

No, that is not right. It's simply a misunderstanding. It is much more likely that he will get on base eight times in his next twenty than that he will get on base twelve times in his next twenty, or four times in his next twenty.
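Both RIYank's point and the hunch that exactly 8 is not the usual result can be checked with the binomial distribution, assuming every plate appearance is an independent trial with the same .400 on-base probability. That independence assumption is exactly the simplification being argued about in this thread, so treat this as a sketch of the weighted-coin baseline, not a model of Swisher.

```python
from math import comb

def prob_on_base_exactly(k: int, n: int, p: float) -> float:
    """Binomial probability of reaching base exactly k times in n PAs,
    treating each PA as independent with on-base probability p."""
    return comb(n, k) * p**k * (1 - p) ** (n - k)

N, P = 20, 0.400  # 20 PAs for a hypothetical .400 OBP hitter
probs = {k: prob_on_base_exactly(k, N, P) for k in range(N + 1)}
for k in (4, 8, 12):
    print(f"exactly {k:2d} times on base: {probs[k]:.3f}")
print(f"anywhere from 6 to 10 times: {sum(probs[k] for k in range(6, 11)):.3f}")
```

Under these assumptions, exactly 8 comes out around 0.18, while exactly 4 and exactly 12 are each about 0.035, so 8 is roughly five times as likely as either. Exactly 8 is still well under a coin flip, but the 6-to-10 band covers about three outcomes in four.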

55 thelarmis ~ Oct 30, 2009 9:48 pm

[54] ah, whadda you know about mathematics anyway?!?!?!

have you ever even tasted it? i mean, how do you know you don't like it, if you've never even tried it??? jeesh...

; )

56 monkeypants ~ Oct 30, 2009 9:52 pm

[52] So… would you like to predict what ARod’s OPS will be in the next 2 series? .950ish? My guess would be between .400 and 1.600.

Well, if you want to play this game, then I predict it will be between .000 and 5.000. It all depends on the specificity with which you want the prediction. If you asked me to predict his performance within a very small span (say +/- .050 OPS), I would have to predict .925-.975 OPS, no?

I understand what you are trying to say, about small samples not necessarily reflecting closely a player's overall numbers. But you are wrong, I think, to say that there is essentially no relationship between the larger samples and the smaller samples.

That is, unless you really do believe that you just can't predict baseball.

57 monkeypants ~ Oct 30, 2009 9:54 pm

[56] was for [51]... but as usual, RIYank's objections [54] are more clear and succinct than my own.

58 RIYank ~ Oct 30, 2009 9:58 pm

Thelarmis, I meant to mention to you: although I was sorry to miss the Bantering last night, I think it might have been a disaster for my kishkes.

59 RIYank ~ Oct 30, 2009 10:08 pm

There's a recent investigation of the "hot hand" in baseball that does find evidence of streakiness. The basic finding is that there are more long hitting streaks (in the usual sense of streaks of games in which the player gets at least one hit) than is predicted by the simple model where the batter is thought of as a weighted coin, his chance of getting a hit on any given AB equal to his BA. A few potential explanations are tried (BA varies with the weather, with the pitcher, with the park) and found insufficient to explain the extra streaks. The author thinks it is indeed psychological, but there isn't a lot of evidence for that.

"Hitting Streaks Don’t Obey Your Rules," Trent McCotter, 2009. I think I found it in Scientific Commons.

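For a sense of the baseline McCotter compared against, here is a minimal sketch of the weighted-coin model: each at bat is an independent hit with probability equal to batting average, and a game extends the streak if it contains at least one hit. The .300 average, four at bats every game, and 162-game season are hypothetical round numbers, not figures from the paper, and this is not McCotter's code.

```python
import random

def chance_of_hit_in_game(avg: float, at_bats: int) -> float:
    """P(at least one hit) when each AB is an independent hit with prob avg."""
    return 1 - (1 - avg) ** at_bats

def longest_hitting_streak(avg: float, at_bats: int, games: int, rng: random.Random) -> int:
    """Simulate one season under the weighted-coin model and return the
    longest run of consecutive games with at least one hit."""
    p_game = chance_of_hit_in_game(avg, at_bats)
    best = current = 0
    for _ in range(games):
        current = current + 1 if rng.random() < p_game else 0
        best = max(best, current)
    return best

rng = random.Random(2009)  # fixed seed for reproducibility
seasons = [longest_hitting_streak(0.300, 4, 162, rng) for _ in range(5000)]
print(f"P(hit in a game): {chance_of_hit_in_game(0.300, 4):.4f}")
print(f"average longest streak per simulated season: {sum(seasons) / len(seasons):.1f}")
```

Under this model a .300 hitter with four at bats gets at least one hit in about 76% of games. The finding described above is that real hitters produce more long streaks than a baseline like this predicts; checking that would take actual game logs, which are not included here.
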
60 Mr. OK Jazz TOKYO ~ Oct 30, 2009 10:27 pm

Interesting argument, Banterers. Thank you for all the insightful comments! Nothing to contribute myself (I plan absentee ownership... will stick to building ships and linking to music clips).
After some thought though..I just side with Papa John.."ya just can't predict baseball, Suzyn"!

Oh, and Swisher should be playing. :)

Beautiful autumn day here, time to head out for a Tokyo old-town walk and some "okonomiyaki" (fry-as-you-like) & beers!

61 OldYanksFan ~ Oct 30, 2009 10:44 pm

[55] I have tasted Mathematics. It tastes a lot like bacon flavored chicken. What do you think?

"I would have to predict .945-.990 OPS, no?"
Well, take ARod's PS history, take the first 2 series and calc his OPS. Now the next 2, and so on. How many come out within 25 pts of .965? Less than half? Less than 1/3rd? I agree, based on historical data, .925-.975 might be the best guess, but not necessarily an accurate guess the majority of the time. Again, just look at small groups of actual data.

"That is, unless you really do believe that you just can’t predict baseball."
Well... I can... but you can't.
(j/k)!

RIYank. Easy enough. Look at Swisher's history. Break it into groups of 20 PAs each and work out the OBP for each group. Now tell me how many times he gets on base 8 times. I'll bet it's less than half.

I can't predict future ACTUALITY, but past ACTUALITY is called history. Just check out Swisher's history, and you will find MANY, MANY small groups of PAs where his OBP for that group doesn't conform to his (mathematical) career (large sample size) OBP.

Again, look at the vast difference between ARod's last 2 series (1.500 OPS) and his previous 2 (.525), compared to his career (.965 OPS). Do it with ANY player and you will find that there are small sample size groups that are vastly different from career norms.

Here's a fun question. What do you think are the odds that ARod has a .945-.990 OPS this WS? (Hint: In 12 PS series, ARod has NEVER posted a .945-.990 OPS)
http://www.baseball-reference.com/players/r/rodrial01.shtml

62 Rich ~ Oct 30, 2009 10:53 pm

A-Rod's performance in the postseason reminds me of Yogi Berra's supposed quote that: "Ninety percent of the game is half mental." I'm not sure why.

63 OldYanksFan ~ Oct 30, 2009 10:54 pm

"Baseball is ninety percent mental and the other half is physical." Yogi.

64 OldYanksFan ~ Oct 30, 2009 10:56 pm

[63] But that might not be true, because Yogi said:
"I never said most of the things I said."

65 monkeypants ~ Oct 30, 2009 11:08 pm

[61] I agree, based on historical data, .925-.975 might be the best guess, but not necessarily an accurate guess the majority of the time.

Ah, you're verging into "you can't predict baseball" territory again.

It may not be "accurate" (though this is misleading; you need to define the acceptable range of error... do you mean within 1%, 10%, etc.?) the majority of the time, but i suspect that it is more often accurate than any other range you give. In other words, I suspect that if you broke up A-Rod's career into ten game samples, more of them would verge towards his career numbers than the number that would vary wildly. Lined up and graphed, I suspect that all of the 10-game samples would form some sort of bell curve, centered more or less around his career averages.

So, no, I cannot guarantee that over the next few games A-Rod will replicate his career numbers (on a smaller scale) exactly. But the best guess says that his performance will verge toward those career numbers.

You seem to think that there is practically no relationship between small samples and larger samples. More perplexing, you seem to argue that larger samples have virtually no predictive value.

If what you are arguing is true, then it really doesn't matter who starts in any given game, right? Because whodathunk that Molina has outhit A-Rod so far in the WS...???
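The bell-curve guess can be illustrated with a simulated hitter whose true on-base rate never changes. This sketch assumes a hypothetical .380 OBP and 42 plate appearances per ten-game sample (about 4.2 per game); both numbers are made up for illustration, and real hitters' underlying rates do drift, which this model ignores.

```python
import random
from collections import Counter

def times_on_base(true_obp: float, pa: int, rng: random.Random) -> int:
    """Count on-base events in `pa` independent plate appearances."""
    return sum(rng.random() < true_obp for _ in range(pa))

TRUE_OBP, PA, SAMPLES = 0.380, 42, 10_000  # hypothetical hitter and sample size
rng = random.Random(65)  # fixed seed for reproducibility
counts = [times_on_base(TRUE_OBP, PA, rng) for _ in range(SAMPLES)]
obps = [c / PA for c in counts]

mean_obp = sum(obps) / SAMPLES
within = sum(abs(x - TRUE_OBP) <= 0.075 for x in obps) / SAMPLES
print(f"mean of the ten-game OBPs: {mean_obp:.3f}")
print(f"share within .075 of the true rate: {within:.2f}")

# Text histogram, one '#' per 50 samples: the bars bulge around 16/42 (about .381).
tally = Counter(counts)
for k in range(6, 27):
    print(f"{k / PA:.3f} {'#' * (tally[k] // 50)}")
```

The ten-game samples center on the true rate, with close to three in four landing within .075 of it, yet a visible tail sits well above or below. Short samples scatter widely, but they scatter around the long-run number, which is the point on both sides of this exchange.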

66 weeping for brunnhilde ~ Oct 30, 2009 11:56 pm

Andy's going to step up tomorrow.

I have tremendous faith in him these days.

67 Rich ~ Oct 31, 2009 12:01 am

[63] Googling yielded both versions of the quote.

68 OldYanksFan ~ Oct 31, 2009 12:30 am

"the majority of the time, but i suspect that it is more often accurate than any other range you give." --------- yes... a best guess (whether an 'accurate' one or not)

"I suspect that if you broke up A-Rod’s career into ten game samples, more of them would verge towards his career numbers than the number that would vary wildly." ---- Maybe, but it would vary with different players. A consistent guy like Jetes, maybe. Swisher? Maybe not. But it certainly isn't by definition. And certainly, slumps and hot streaks would skew your conclusion.

"You seem to think that there is practically no relationship between small samples and larger samples." ---- There is some relationship, at some times. And over a season, the relationship might be apparent. I'm just saying in any ONE instance, the odds of not conforming to the relationship may be close to 50%. Isn't it widely concluded that small sample sizes have little predictive value?

"More perplexing, you seem to argue that larger samples have virtually no predictive value." Your question: What is the range of 'Predictive'? Within 5%? 10%? 20%? Well... Cano batted .270 last year (relatively large sample size) and .320 this year (relatively large sample size). That's like a 17% swing comparing large sample sizes.

But your assertion is basically correct more than incorrect, as stats do have some predictive value. But.... NOT necessarily on small sample sizes (again, in TWELVE PS series (not games, series) ARod has NEVER been within 25 OPS points of his career average). Zero for Twelve!

So I am NOT arguing that stats are valueless. However, when we Banterers jump to the conclusion that Girardi made a TERRIBLE move because his move was 'wrong' based on a statistically 0.25 Run probability, well.... we are misapplying how stats should be used.

Stats are PART of the decision-making process, but not all of it. But for us Banterers, stats are basically the only part we have. But Girardi has many other factors, some of which may be 'gut feelings', some of which may be valid enough to go against a small statistical advantage.

69 monkeypants ~ Oct 31, 2009 12:55 am

[68] Isn’t it widely concluded that small sample sizes have little predictive value?

Yes. But we aren't talking about using small sample sizes to predict the future (i.e., Player X is 0-2 against Pitcher Y, so I predict that he will not be able to hit pitcher Y in the future). We are talking about using large sample sizes to make predictions about a small number of future events (Player X has hit .320 for his career, so I suggest that he will probably get a hit about once every three ABs next game). The problem is that YOU have it backwards when discussing small sample sizes and predictions.

However, when we Banterers jump to the conclusion that Girardi made a TERRIBLE move because his move was ‘wrong’ based on a statistical 0.25-run difference, well…. we are misapplying how stats should be used.

This is so wrong on multiple levels, not the least of which is that a .25 run swing (on average) is huge. As I demonstrated, that amount alone accounted for the difference in average win differential between the Yankees and Angels. But let us forget about this specific case.

You would happily ignore statistics based on the notion, as far as I understand it, that because a player who bats (hypothetically) .333 doesn't literally get a hit once every three ABs, it is inconsequential to bench him in favor of someone who bats .200. In my mind, it is YOU who are grossly abusing statistics and statistical terminology.

But Girardi has many other factors

That's another discussion altogether.

a small statistical advantage

I think that it is necessary to define what is a "small" advantage. For example, you dismiss out of hand that starting Molina over Posada is a problem, not because you think Molina makes AJ pitch measurably better (that is the better defense of the move), but because it probably doesn't matter all that much because one game is a small sample size.

Yet the offensive drop off from Posada to Molina is perhaps the largest on the team for any two players at the same position. If you see that as inconsequential, then you would (by extension) see no problem in benching A-Rod for Hairston, or DHing Gardner over Matsui.

Cuz' hey it's just one game and it's only a few ABs, and small sample sizes are not predictive so (somehow) they erase essentially all larger bodies of statistical evidence. The implications of your argument are that starters simply don't matter in one game. And I think that is kooky talk.

70 monkeypants ~ Oct 31, 2009 1:00 am

[68] ...as stats do have some predictive value. But…. NOT necessarily on small sample sizes (again: in TWELVE PS series (not games, series), ARod has NEVER been within 25 OPS points of his career average. Zero for Twelve!)

More misuse of statistical terminology. They have the exact same predictive value on small sample sizes as they do on large ones. To use the coin analogy:

A coin is flipped 100,000 times to prove that it is weighted evenly, and it comes up heads very close to 50,000 times. I would posit that there is a 50% chance that the coin lands heads on the next flip, and that it lands heads six times over the next twelve flips. That is my prediction.

When the coin comes up tails twelve straight times (0 for TWELVE!!), you would argue that the statistics have no predictive value. Wrong. The stats have the same predictive value, even if a highly unusual outcome happened to emerge over the short term.
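A minimal sketch of the coin analogy, assuming twelve independent flips of a fair coin: it computes binomial probabilities to show that "six heads" is the single most likely count, even though it happens well under half the time, while twelve straight tails is possible but rare. The helper name `binom_pmf` is just an illustrative choice.

```python
from math import comb


def binom_pmf(k: int, n: int, p: float) -> float:
    """Probability of exactly k successes in n independent trials, each with success probability p."""
    return comb(n, k) * p**k * (1 - p) ** (n - k)


n, p = 12, 0.5  # twelve flips of a fair coin

exactly_six = binom_pmf(6, n, p)                             # the single most likely count
five_to_seven = sum(binom_pmf(k, n, p) for k in (5, 6, 7))  # "close to" the prediction
zero_heads = binom_pmf(0, n, p)                              # twelve straight tails

print(f"P(exactly 6 heads) = {exactly_six:.4f}")
print(f"P(5 to 7 heads)    = {five_to_seven:.4f}")
print(f"P(0 heads)         = {zero_heads:.6f}")
```

Under these assumptions exactly six heads comes up about 23% of the time, five to seven about 61%, and zero heads 1 time in 4,096. The rare outcome doesn't change the 50% estimate for the next flip.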

71 monkeypants ~ Oct 31, 2009 1:13 am

[68] Cano batted .270 last year (relatively large sample size) and .320 this year (relatively large sample size). That’s like a 17% swing comparing large sample sizes.

Sloppy. First of all, his average over those two years (an even bigger sample) would be .295. His two extremes (.270 and .320) each vary from that mean by only about 9%. Second, it is pretty widely accepted that batting average varies much more than OBP, given the more "random" nature of BABIP. Anyway, let's look at Robbie's entire career, since he has been a somewhat inconsistent player. He has a .306 BA, and the most his BA has varied from that has been about +/- 12% (when he hit .342 and .271). He has a career .339 OBP. The most he has deviated from that is about -10% (.305 last year) and +8% (.365 in 2005). He has been, despite his inconsistency, largely consistent. Given that data, I feel pretty confident predicting what Cano will do over the course of next season, within about +/- 10%.
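For anyone who wants to check the arithmetic, here is a rough sketch that recomputes the percentage deviations from the figures quoted in the comment above. It uses a simple (value - baseline) / baseline definition and treats the two-year mean as the plain average of .270 and .320, ignoring differences in at-bats between seasons.

```python
def pct_deviation(value: float, baseline: float) -> float:
    """Percentage by which value differs from baseline (negative means below)."""
    return 100 * (value - baseline) / baseline


# Figures as quoted in the comment; the two-year mean is an unweighted average.
two_year_mean = (0.270 + 0.320) / 2  # .295
career_ba, career_obp = 0.306, 0.339

checks = [
    (".270 vs two-year mean", 0.270, two_year_mean),
    (".320 vs two-year mean", 0.320, two_year_mean),
    (".342 vs career BA", 0.342, career_ba),
    (".271 vs career BA", 0.271, career_ba),
    (".305 vs career OBP", 0.305, career_obp),
    (".365 vs career OBP", 0.365, career_obp),
]

for label, value, base in checks:
    print(f"{label}: {pct_deviation(value, base):+.1f}%")
```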

72 monkeypants ~ Oct 31, 2009 1:47 am

So OYF, I am curious. Let's say that it's the 1956 World Series. Would you think that the manager made a terrible move if he, say, sat Yogi Berra in favor of Charlie Silvera for a game? Or how about sitting Mickey Mantle for Norm Siebern? I mean, it wouldn't be a big deal because it's just one game, right? Large data sets have little or no predictive value in small samples, so we would not have any reason to predict that Mantle would be the team's best hitter during the WS. The drop-off in offense would probably be negligible for just one game.

73 RIYank ~ Oct 31, 2009 8:33 am

RIYank. Easy enough. Look at Swisher's history. Break it into groups of 20 PAs each and check the OBP of each. Now tell me how many times he gets on base 8 times. I’ll bet it’s less than half.

Of course.
But you said it would be four or twelve as often as it would be eight. That's what's dangerously wrong.
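To make the disagreement concrete, here is a rough sketch that treats each of 20 plate appearances as an independent trial with a .400 chance of reaching base, using the hypothetical .400 OBP hitter from the example quoted from [51]. Independence and a fixed .400 rate are simplifying assumptions; real PAs vary by pitcher, park, and so on.

```python
from math import comb


def binom_pmf(k: int, n: int, p: float) -> float:
    """Probability of exactly k successes in n independent trials, each with success probability p."""
    return comb(n, k) * p**k * (1 - p) ** (n - k)


n, obp = 20, 0.400  # 20 plate appearances by a hypothetical .400 OBP hitter

for times_on in (4, 8, 12):
    print(f"P(on base exactly {times_on:2d} times) = {binom_pmf(times_on, n, obp):.3f}")

# How often the 20-PA OBP lands anywhere from .300 to .500 (6 to 10 times on base).
six_to_ten = sum(binom_pmf(k, n, obp) for k in range(6, 11))
print(f"P(6 to 10 times on base)       = {six_to_ten:.3f}")
```

Under these assumptions exactly 8 comes out at roughly 0.18, versus roughly 0.035 each for exactly 4 or exactly 12. So 8 happens less than half the time, as OYF says, but it is about five times as likely as either 4 or 12, which is RIYank's point.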

I can’t predict future ACTUALITY, but past ACTUALITY is called history.

Yes. But what could "might actually not probably happen" mean?

Just check out Swisher's history, and you will find MANY, MANY small groups of PAs where his OBP for that group doesn’t conform to his (mathematical) career (large sample size) OBP.

Obviously.
It's what you think this implies that's so badly off.

I'd be interested to see your answer to monkeypants' [72].

74 OldYanksFan ~ Oct 31, 2009 12:39 pm

Mickey Mantle vs. Norm Siebern
Well.... based on the limited data ya gave me, knowing that the Mick was the greatest, I go with Mickey. That's my best GUESS. But if you want me to be more predictive, I would need to know a whole bunch of explanatory variables.
Wait...
What's that you say?
Mantle is on CRUTCHES? And he's drunk as a skunk?
(Is that really enough to give Siebern the nod?)

This might seem like semantics, but to me, there is a difference between taking a best, or educated, guess and predicting. To me, 'prediction' implies a greater degree of accuracy than does educated guessing.

I think when using a limited number of stats alone, we are not predicting, but taking a best guess. For example: it rains here 52 days a year on average (I'm making this up). That's an average of about once in seven days. If it hasn't rained in 10 days, using just this stat, I might guess rain tomorrow, as it is due. But when weathermen 'predict the weather', they use various tools to examine OTHER FACTORS (that affect immediate rainfall chances), rather than just historic rainfall patterns.
However, just based on statistical data, if I guess it will rain every day, I will be correct around 14% of the time.

wikipedia: "Predictive analytics is an area of statistical analysis that deals with extracting information from data and using it to predict future trends and behavior patterns. The core of predictive analytics relies on capturing relationships between explanatory variables and the predicted variables from past occurrences, and exploiting it to predict future outcomes."

"information from data" in baseball is Stats.
"capturing relationships between explanatory variables"
So... what are "explanatory variables"?

In a game, behind a run, should we swap Matsui (.876/.852) for Gardner (.724/.677)?
... handedness of pitcher/batter
... history between pitcher and batter
... types of pitches thrown/types of pitchers batter is strong/weak with
... player's defense
... player's baserunning
... player's physical health
... player's mental health
... day/night splits? splits at THAT stadium? Grass/turf splits?
... use career stats... or this year's stats..... or last week's stats (hot or slumping)
... how the player looks TODAY (BP and game so far)
... player's experience
... player's BFOG?
Any other explanatory variables we might look at? Did Gritner get laid last night? Does Matsui have crabs? Could these have any effect on the outcome?

And we know baseball is chaotic... that there doesn't appear to be a pattern to just what affects a player, and what might get him a hit in THIS ONE AB.

Maybe there are UN-explanatory variables that ultimately affect a player's success in a specific AB. Of course, being UNknown, we can't make them part of the equation (is there such a thing as clutch? Being Hot? Being Due?)... but they might affect the ultimate AB.

So, to make a statement of 'prediction' or 'rightness/wrongness' based on 1 or 2 specific stats, and actually be dogmatic about your OPINION, is NUTS, when there are many other "explanatory variables" not being taken into account.

Player (A) gets 180 hits in 600 AB, batting .300
Player (B) gets 150 hits in 600 AB, batting .250
The difference is 30 hits, or ONE hit in every 20 AB.
So, using this stat alone, when thinking about a PHer, a best guess is to use Player (A).
If each player has 1 PH appearance in each of 20 games, Player (A) will theoretically get one more hit (unless PHing ability turns out to be an explanatory variable or an UN-explanatory variable).
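A minimal sketch of the Player (A)/(B) example, assuming every pinch-hit AB is an independent trial whose hit probability equals the player's batting average. Under that model the expected edge works out to the one extra hit described above, and the code also computes how often (A) actually out-hits, ties, or trails (B) over those 20 ABs. Note that the model says nothing about which game the extra hit lands in.

```python
from math import comb


def binom_pmf(k: int, n: int, p: float) -> float:
    """Probability of exactly k hits in n independent ABs with hit probability p."""
    return comb(n, k) * p**k * (1 - p) ** (n - k)


n = 20                   # one pinch-hit AB in each of 20 games
p_a, p_b = 0.300, 0.250  # Player (A) and Player (B) batting averages

expected_gap = n * (p_a - p_b)  # 20 * 0.05 = one extra hit for (A), on average

# Exact probabilities over every combination of hit totals for the two players.
a_more = tie = b_more = 0.0
for i in range(n + 1):
    for j in range(n + 1):
        prob = binom_pmf(i, n, p_a) * binom_pmf(j, n, p_b)
        if i > j:
            a_more += prob
        elif i == j:
            tie += prob
        else:
            b_more += prob

print(f"expected extra hits for (A): {expected_gap:.2f}")
print(f"P(A more hits) = {a_more:.3f}, P(tie) = {tie:.3f}, P(B more hits) = {b_more:.3f}")
```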

So Player (A), over 20 games, has a successful AB one more time than Player (B). But in WHICH ONE of those 20 games? Can we predict that based on those 2 BA stats? If we examine ALL the explanatory variables, would this help us 'predict' any one specific AB? Might all those variables actually be more valuable/telling than just the .300/.250 BA (or OBP, if you like)?

And what of 'UN-explanatory variables'? Could gut feeling be real? If a coach sees the .250 player looking REALLY GOOD during BP, and his gut says 'use him' (instead of the .300 guy), might that be valid? What if the .300 batter happens to be having a Herpes outbreak that day? (I believe intuition is based on a whole bunch of logical conclusions all smeared together to create one impulsive thought.)

I do like stats and believe they must be analyzed in attempting to make better 'best guesses'. I do believe using stats is a big part of the process. But THERE ARE OTHER PARTS.

And we must be aware of something. Stats are PERFECT. They are never wrong. The formula for BA is hits/AB; 180 divided by 600 will always be .300. 180 divided by 600 was .300 2,000 years ago, and will be .300 2,000 years from now. Stats are math, and math is absolutely objective (and tastes like bacon-flavored chicken).

However, to say that Player (A) is better than Player (B) is a subjective opinion. While .300 is definitely better than .250, on that alone you cannot say with certainty that Player (A) is better than Player (B). When people say that stats lie, they are wrong. Math is PERFECT... but our interpretation of stats is subjective, and prone to error.

Also, while the MATH of stats is perfect, the formula may not be. With run expectancy, is it based on 1 year's data? All data since tracking began? Does the formula give different results using 1960 data as opposed to 2008 data? And this data is comprised of thousands upon thousands of instances, and those instances vary greatly, in the quality of the pitcher, fielder, batter, weather, stadium, and dozens of other factors that affect play and outcome. That stat is one giant smear of data, all of which happened under different circumstances.

So we smear the data from thousands of games together, and it tells us that THIS play will score 0.2 runs more than THAT play (a sac bunt). But that smear is a mathematical average based on different circumstances. How does that 0.2 runs play into THIS SPECIFIC AB UNDER THESE SPECIFIC CIRCUMSTANCES?

Do you want to make a 'best guess' based on a new and very generalized stat? Or do you want to look at the specifics?

Is the guy a good bunter? If all the data in 'The Smear' always had a good bunter up, might the same mathematical formula come up with something other than 0.2? What about if the infield grass is wet? How about if there is a shitty defensive 3rd baseman? Might that affect the probability? What if the pitcher's landing leaves his back turned to the batter? How fast is the runner? How good is the 1st baseman at scoops? How good a fielder is the pitcher? How hard is the infield dirt? Grass or turf? Is the 3rd baseman playing back? How important is trading a higher probability of getting one run for a lower probability of getting 2 or 3? If every instance in the historical smear had a great bunter who was a speedy runner, wet grass, and poor-fielding infielders, would our formula still come out with 0.2?

Are there other factors affecting THIS AB, looking at all the specific data for this AB, that are 'smeared over' in the data used in the formula? And will the guys at BR CHANGE THE FORMULA NEXT YEAR?

Do you guys really believe that 0.2 is sacrosanct? That the formula is sacrosanct? For many, many years, GMs and managers mostly judged players on a stat called 'Batting Average'. Do you think their conclusions were correct?

Do you believe that in ANY ONE (very) SPECIFIC INSTANCE, the 'interpretation' of one stat (0.2) is absolutely better than the decision made by a human brain (with lots of actual MLB experience, knowledge of the players, etc.) that DOES look at all the specific factors of THIS AB?

In looking at real predictions, a mathematical MEDIAN is often more useful than a mathematical AVERAGE (mean). I would like to see some existing stats that deliver AVERAGES also have a sister stat that uses MEDIANs. Especially ERA. And maybe ALL stats that use BA should instead use the average of BA + BABIP? Maybe an LD/GB/FB% factor to further qualify the results? How about a 'Pitcher's Quality' factor, that affects both the formula and, again, how we apply the formula to a specific AB with a specific pitcher?

A pitcher throws 8 nine-inning games and gives up ONE ER in each. His ERA is 1.00. His next game out, he gives up EIGHT ERs before he records an out. His ERA is now 2.00. But his median per-game ERA is still 1.00. So which is more predictive of his next outing: 1.00 or 2.00?
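A small sketch of the hypothetical above, computing the usual ERA (total earned runs per nine innings) next to the median of the per-game ERAs. "Median per-game ERA" is only an illustration of the idea, not an established stat, and the sketch assumes a per-game ERA of infinity when no outs are recorded.

```python
import math
from statistics import median


def era(earned_runs: float, innings: float) -> float:
    """Earned runs per nine innings; infinite if no outs were recorded."""
    return math.inf if innings == 0 else 9 * earned_runs / innings


# (earned runs, innings pitched) per start: eight complete games allowing 1 ER each,
# then a start where he allows 8 ER without recording an out.
starts = [(1, 9)] * 8 + [(8, 0)]

season_era = era(sum(er for er, _ in starts), sum(ip for _, ip in starts))
median_game_era = median(era(er, ip) for er, ip in starts)

print(f"season ERA          = {season_era:.2f}")
print(f"median per-game ERA = {median_game_era:.2f}")
```

The median ignores how bad the blowup was; it would stay at 1.00 whether he gave up 8 runs or 20, which is both its appeal and its weakness.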

And we also know that HITS have a large 'luck' component.... hence the stat BABIP. So formulas that use HITS have some SMEAR factor in them.

So.....
I have lots of extra room here in my house.
Maybe yuz guys should come up for a week. Bring your very best weed, and be prepared for a lot of all-nighters, so we can examine all the variables and flaws in stats, and maybe... maybe come a little closer to the truth of what goes into making ONE managerial decision.

Bottom Line:
Many of us LAUGH at the fact that 20 years ago, players were mostly judged on BA... rather than OBP, or OPS, or OPS+, or wOBA, or any number of other 'new' stats.

And in 20 years, people will laugh at many of the assumptions made here, based on certain stats, when a whole new generation of better stats, with better best-guessing probability, is invented.

And in 40 years, when the splitting of the gene yields all kinds of new knowledge, and other advances are made, and some UNexplanatory data become Explanatory data.... people will REALLY have a good laugh at just how sure we all here think we are.

75 monkeypants ~ Oct 31, 2009 1:02 pm

[74] Well…. based on the limited data ya gave me, knowing that the Mick was the greatest, I go with Mickey. That’s my best GUESS. But if you want me to be more predictive, I would need to know a whole bunch of explanatory variables.
Wait…
What’s that you say?
Mantle is on CRUTCHES? And he’s drunk as a skunk?

Oh, now you are changing the story. Of course if Mantle is injured, etc. you may opt to go with a backup. But this is NOT relevant to the discussion at hand, which is the way you seem to apply larger data sets to tactical decisions. But before you were arguing (or seemed to be arguing) that the numbers themselves had no predictive value over the short term (in what you termed a small sample).

In other words, as I understand what you were arguing, you seemed to be saying that the fact that Mantle (or whomever) has a 1.000 career OPS, or that he is batting .350 for the season, etc. has practically nothing to tell us about what he probably will do in the next game, and therefore such data can be cast aside as non-predictive in a small sample.

And if that's what you were arguing, I contend forcefully that it is wrong-thinking, plain and simple.

76 Yankster ~ Oct 31, 2009 1:08 pm

Some of you are conflating the value of probability based on average with the more nuanced version of probability based on understanding the probable distribution of values. Average from large samples is more useful for predicting subsequent large sample averages. But the distribution of observations (which can be indicated by deviation from the mean) gives you a much better sense of the probability of a subsequent single event. What I think oldyanksfan and I are saying is that monkeypants is ignoring the distribution and banking on the mean.

The problem is going from the abstract to the specific: What's the event? In my view the reliable elementary event statistically is a single response to a single kind of pitch, not an at bat. But stats are generally discussed at the at-bat event level and then that's combined into numbers that are to me confusing in their value, like batting average. OPS is even more confusing given its (in my mind) underweighting of on-base. (Batting average clearly has some strong relationship to the results of individual pitches; I'm just saying I don't know exactly what that relationship conceals.)

I don't think that anyone is arguing that given the choice of who to bat between Posada and Molina you pick Molina. The point is that the season's batting average or OPS is less predictive of two world series at bats than the distribution of at bats and definitely much less predictive than the median event in the single pitch conditional probability (if this probability of pitch and this probability of batter reaction, then this probability of the single pitch event outcome).

77 monkeypants ~ Oct 31, 2009 1:10 pm

[74] So we smear the data from thousands of games together...[but how does it] play into THIS SPECIFIC AB UNDER THESE SPECIFIC CIRCUMSTANCES?....etc....

All of the contingencies that you pose are measurable, or nearly all of them. And I have no problem with someone bringing additional factors into the equation (e.g.: yes, Hinske slugs more than Hairston, but Hairston hits lefties much better, etc., etc.).

Again, that is not what you were arguing before. You were arguing that all of these differences are essentially small and meaningless, and that we shouldn't really bother questioning decisions because larger data sets don't really tell us about what is going to happen next.

But even here, you appeal to larger data sets (what a player does against lefties, or in day games, or in certain circumstances). You are simply making the case that certain data sets are more relevant to particular tactical decisions than are other data sets. Still, your entire argument in this post fundamentally contradicts the reasoning you presented in previous posts.

In fact, you DO believe that larger data sets have predictive value in "small samples," so long as we isolate the most relevant data sets!

I agree!!

78 monkeypants ~ Oct 31, 2009 1:16 pm

[76] In my view the reliable elementary event statistically is a single response to a single kind of pitch, not an at bat.

Interesting!

The point is that the season’s batting average or OPS is less predictive of two world series at bats than the distribution of at bats and definitely much less predictive than the median event in the single pitch conditional probability...

I agree. But all we have are larger data sets on which to make decisions (or predictions, as all managerial decisions are effectively predictions)...unless we defer to "gut instinct" or other nebulous concepts.

Where I disagree with OYF is that he (it seemed to me) glided too easily from "large data sets have less predictive value over a couple of WS at bats" to "and as such they are meaningless, so it doesn't matter who starts." If the larger data sets are so un-predictive as to be meaningless, then it really doesn't matter who starts or what the lineup is for a single game. And I don't buy that.

79 monkeypants ~ Oct 31, 2009 1:19 pm

[76] I don’t think that anyone is arguing that given the choice of who to bat between Posada and Molina you pick Molina.

That's not what I am arguing against. Rather, I understood OYF's argument to be: it doesn't matter whom you pick, because how they did all season doesn't tell us much of anything about how they will do this game.

Again, I think that is a problematic approach. Unless I have grossly misunderstood what was being argued, which is distinctly possible.

80 OldYanksFan ~ Oct 31, 2009 4:52 pm

MP... do a page search on the word "meaningless", and please tell me what comment# I said it in. I can't find ANY.

I'm not sure what you are 'understanding' about what I said, but you are attributing words to me I never said, and making assertions I never made. And I have said in every post that statistical analysis IS important, and plays a role in decision making. I believe that in your effort to reinforce your own view, you are NOT getting what I am saying.

Yankster pointed out some flaws in the 'simple' way people here are analyzing data, and while presenting it differently (and better) he is basically making the same point I am trying to make about applying large sample size to a SPECIFIC event.

Again, to predict something (as opposed to guessing) implies a certain degree of accuracy. When you say a bunt is (always?) a bad play because statistically it gives away 0.25 runs, you are assuming that a huge amount of NON-specific data can meaningfully be applied to ONE SPECIFIC event with a number of specific contingencies, and be somewhat accurate.

This is where I disagree with you. I don't think you have a very high degree of accuracy doing this (bearing in mind that a 50% accuracy rate is totally meaningless.... random guessing yields the same rate).

I brought up a half dozen or more examples of specific contingencies that could affect 'how smart/successful' a bunt might be.

But here's what I am REALLY objecting to.
Many people here call Girardi STUPID on a certain play, and use 1 or 2 stats, without even knowing how accurately they apply, to give weight to their opinion. Some people (who shall remain nameless) even glue all kinds of assumptions to their stats in hopes it further qualifies their opinion.

How about this example? Here are some facts.
1) Girardi has a number of years of MLB experience as a player.
Most commenters (on all blogs) have none.
2) Girardi has 2 years of MLB experience as a manager.
Most commenters (on all blogs) have none.
3) Girardi has meetings with the Yankees FO, coaches, scouts and other personnel involved with baseball decisions.
Most commenters don't.
4) Girardi has personal relationships with the players, sees them every day, talks to them frequently and witnesses batting/pitching practice daily.
Most commenters don't.
5) Girardi is paid a lot of money, is carefully watched by his bosses, and must take responsibility for his actions.
Most commenters aren't and don't.
6) Commenters have access to BR.com and other websites to view statistical data in various forms.
Girardi does also, but I'm guessing he ALSO has additional statistical data and analysis provided by the Yankees.

So my question is, statistically speaking:
What are the odds that any given commenter can make better managerial decisions than Girardi?

To read most blogs, the answer seems to be between 50% and 100%
(Yup... Girardi made the RIGHT move there, because I said so. Yup, Girardi made the WRONG move there, because I said so.)
I object to people not only thinking they know more than Joe (and every other manager), but then actually being dogmatic about it if/when someone provides another point of view.

And frankly, EVERYONE knows Posada is a far better hitter than Molina. This is beyond obvious, and I believe Girardi is even aware of this (do ya think?). To criticise Joe for playing Molina over Posada, using only this ONE piece of information, is beyond shallow. If you don't offer all the many other pieces of qualifying data that apply to the SPECIFIC situation (because the exact same decision may be much more right or wrong depending on the given situation), and analyze the data correctly, then I don't believe your opinion (that's a collective your) carries any weight.

81 monkeypants ~ Oct 31, 2009 5:34 pm

[80] MP… do a page search on the word “meaningless”, and please tell me what comment# I said it in.

That was a paraphrase, an interpretation of various statements like:

[51] "...that any small sample is somewhat random."

[51] "he might have a .400 OBP (Large sample size), but it has little bearing of what might (ACTUALLY, not PROBABLY) happen in his next 20 ABs (small sample size). Yes, MATHEMATICALLY he SHOULD get on base 8 times…. but 4 times, or 12 times, is just as likely ."

[51] "Historical Actuality does not equal future probability in small samples."

[61] "I agree, based on historical data, .925-.975 might be the best guess, but not necessarily an accurate guess the majority of the time."

[68] "I’m just saying in any ONE instance, the odds of not conforming to the relationship may be close to 50%. Isn’t it widely concluded that small sample sizes have little predictive value?" [note: I responded to this above; you have confused terms here, I think, but that is not important at this juncture.]

[68] "But your assertion is basically correct more often than not, as stats do have some predictive value. But…. NOT necessarily on small sample sizes..."

[74] "That stat is one giant smear of data, all of which happened under different circumstances."

[74] "And we know baseball is chaotic… that there doesn’t appear to be a pattern to just what affects a player, and what might get him a hit in THIS ONE AB."

and for fun...

[80] "I don’t think you have a very high degree of accuracy doing this (bearing in mind that a 50% accuracy rate is totally meaningless…. random guessing yields the same rate)."
====

So, you have consistently argued, with varying degrees of intensity, that large data sets cannot be used to predict what will happen in a small number of events (one AB or a few ABs, for example). You posit that baseball is chaotic and lacks predictable patterns in such cases. You posit that in any given event (e.g., one AB) getting a prediction wrong is at least as likely as getting it right (in other words, the prediction itself is no more or less certain than random chance). And so on.

And yet you take umbrage at me interpreting your basic argument as saying that large data sets are essentially meaningless in making a tactical decision (a "prediction") about a single AB or even a single game?

How else am I supposed to interpret your argument?

82 monkeypants ~ Oct 31, 2009 5:39 pm

[80] To criticise Joe for playing Molina over Posada, using only this ONE piece of information, is beyond shallow.

And who, precisely, has done that? Every person who posted for or against the move---which really seems to be at the core of your complaint about predictions and small sample sizes---cited (as I recall) several pieces of evidence:

On the offensive side: various averages, RC totals (which encompass several stats), probable numbers of ABs, short term trends (Posada was scuffling some), etc.

On the defensive side: AJ's ERA with Molina catching, AJ's best games caught this season (w/various catchers behind the plate), short term evidence (good and bad starts in the play offs), Molina's superior ability against the running game, subjective evidence (trips to the mound, cross-ups) that suggest the quality of the relationship between C and P, and so forth.

Who criticized Girardi for using "one piece of information"? That's a straw man.

83 OldYanksFan ~ Oct 31, 2009 5:47 pm

"That’s not what I am arguing against. Rather, I understood OYF’s argument to be: it doesn’t matter whom you pick, because how they did all season doesn’t tell us much of anything about how they will do this game. "

Large-sample data allows us to make a 'Best Guess' for an action. Making a best guess is indeed far, far better than making a poor guess, so it does matter and does have value. If I have no other qualifying data, I will ALWAYS play ARod over JHJr. Always. It's a great guess. But if you want to be predictive... meaning actually being able to predict with some reasonable degree of accuracy the outcome of ONE very small sample.... forgetaboutit.

Of course you play ARod, because after years of watching him and collecting data, we know he is a superior ballplayer. Of course you play him..... and the Mick too.

But let's look at some REAL data. History. Stuff we don't have to guess at, because it has already happened.

ARod has a career OPS of .965. This is a large sample size.
ARod had a 1.500 OPS in his last 2 PS series (small sample).
STATISTICALLY speaking, please show me a 'predictive' analogy to account for this.
ARod had a .525 OPS in his previous 2 PS series (small sample).
STATISTICALLY speaking, please show me a 'predictive' analogy to account for this.

It's random. We have NO idea how ARod will do in the next 5 games. It's random. It's random. It's random. It's random. Wanna guess somewhere between .900 and 1.050? great. I agree. Good guess. Good. Guess.
But don't bet the farm on it.

His career stats will not predict anything over a very small sample size. He could have a .525 OPS, or (coincidentally) have a .965 OPS, or a 1.500 OPS, or anything in between. It can't be predicted.

However, over the next 5 years (large sample size), if ARod stays healthy, then my guess is he will post numbers similar to his current career numbers, decremented by some sort of aging factor. I believe a large sample size DOES have some predictive accuracy when applied to another large sample size.

In the ALDS, Nick Punto had a 1.139 OPS. How predictive was his .647 career OPS?
Jeff Mathis. 1.400 in the PS vs a .597 career OPS. Anybody predict that?

84 monkeypants ~ Oct 31, 2009 5:54 pm

[80] 1) Girardi has a number of years of MLB experience as a player.
Most commenters (on all blogs) have none....etc...

To read most blogs, the answer seems to be between 50% and 100%
(Yup… Girardi made the RIGHT move there, because I said so. Yup, Girardi made the WRONG move there, because I said so.)
I object to people not only thinking they know more than Joe (and every other manager), but then actually being dogmatic about it if/when someone provides another point of view.

One final comment on this thread for me. It is worth noting that some of the major advances in the analysis of the game (the development of new and better statistics, which you cite in one of your longer threads) have been developed by guys like Bill James who---before he was hired on by the Sox---had no major league experience.

I reject the implication that ONLY insiders are allowed to analyze and critique.

Former players like Timmy say stupid things when they are announcers---we all recognize this---and I see no reason why former players who are managing are immune from outdated or mistaken modes of thinking, or blinded by loyalty, etc.

85 monkeypants ~ Oct 31, 2009 6:03 pm

[83] But all predictions are guesses! You take cases where the guesses turn out wrong, and use that to come very close to denying the value of trying to make a prediction.

It’s random. We have NO idea how ARod will do in the next 5 games. It’s random. It’s random. It’s random. It’s random.

But it's not "random." If it's random, then the Yankees should just pull names out of a hat when writing out the lineup card for tonight's game.

But you don't REALLY believe that, do you? You know that the odds are better if they play better players than worse players---indeed, you admit as much above, when you say "Of course you play ARod, because after years of watching him and collecting data, we know he is a superior ballplayer."

That very statement, or at least its underlying assumption, denies your claim that it's all "random" even in small samples.

Maybe we are just speaking past each other in terminology. I don't think that you are using words like "random" or "predictive" or "small sample sizes" properly, but then maybe I am using them improperly.

86 RIYank ~ Oct 31, 2009 6:06 pm

I seriously don't understand most of the discussion. I'll make one more comment.

Yankster:

Average from large samples is more useful for predicting subsequent large sample averages. But the distribution of observations (which can be indicated by deviation from the mean) gives you a much better sense of the probability of a subsequent single event.

No, that's not right. The average gives you a much better estimate of the probability of a single event than the standard deviation does. The standard deviation is useless.
If one player has batted .350 for the past five years and another has batted .200, and you want to know which one is more likely to get a hit tomorrow, you should rely on the averages. It makes absolutely no difference whether the player with a higher average has a large standard deviation in his average from year to year.
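RIYank's point can be checked with simple arithmetic. This is a minimal sketch, assuming each of four at-bats is an independent trial whose hit probability equals the player's batting average (a simplification, since real at-bats are not independent coin flips); the function name is hypothetical.

```python
def prob_at_least_one_hit(avg: float, at_bats: int = 4) -> float:
    """Chance of one or more hits in a game, treating each at-bat as an
    independent trial whose hit probability equals the batting average."""
    # P(at least one hit) = 1 - P(no hit in every at-bat)
    return 1 - (1 - avg) ** at_bats

for avg in (0.350, 0.200):
    print(f"{avg:.3f} hitter: {prob_at_least_one_hit(avg):.1%} chance of a hit in 4 AB")
```

Under this model the .350 hitter has about an 82.1% chance of at least one hit and the .200 hitter about 59.0%; only the average goes into the calculation, and the year-to-year standard deviation never enters it.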

87 OldYanksFan ~ Oct 31, 2009 6:15 pm

[84] I agree. Bill James (and others) are simply coming up with new analysis. This is great. We need more of it. However, I don't think he is making overall judgments on people based on 1 or 2 analytical/probability assertions.

What is outdated or mistaken? In whose opinion? Is the sac bunt a mistake? Always? Never? Does it depend on the situation? These guys are developing formulas and crunching numbers to come up with general probabilities. Do you think Bill believes that his probabilities are absolute, and apply in any and all situations? I don't.

88 OldYanksFan ~ Oct 31, 2009 6:22 pm

"But all predictions are guesses!"
Again, this may be about semantics. My definition is that predictions are more accurate than random guesses, and are also based on more data and more comprehensive data analysis.

We average 80 inches of snow here every year (I'm making that number up). I could guess that this year we will get at least 20" of snow. Safe guess, yes? But I'm just guessing (even if I'm correct), and basing my guess on that one piece of data above.

Guys who predict the weather look at lots and lots more data, and do more complicated analysis. Weathermen don't guess at the weather based on previous years. They study the science of weather. Their predictions are more accurate than my guesses. Yes?

89 monkeypants ~ Oct 31, 2009 6:30 pm

[87] Do you think Bill believes that his probabilities are absolute, and apply in any and all situations? I don’t.

I actually think that our archetypal Bill James thinks that his probabilities (or whatever you want to call them) are very broadly applicable and apply to most situations, yes.

To use the bunt, which you have adopted as an ongoing hypothetical example: yes, I think that the Bill James types will tend to think that the bunt is a "bad" play in the great many circumstances in which it has been used historically and continues to be used. They may not be "dogmatic" about it, but I bet they feel pretty strongly about such tactics based on their statistical analysis. I mean, read Rob Neyer, especially his older stuff. He was pretty willing to call out a manager for what he concluded was a bad tactic or strategy.

90 monkeypants ~ Oct 31, 2009 6:42 pm

[88] Again, this may be about semantics. My definition is that predictions are more accurate than random guesses, and also based on more data and more comprehensive data analysis.

They are. But then you take predictions that turn out to be incorrect (Molina getting a couple of hits or Posada going 0-4, when the prediction suggests very different outcomes), and then declare that the process is "random," thus largely invalidating the decision-making process.

Going back to the hypothetical situation of Posada and Molina: Posada should start tonight instead of Molina because large data sets show that he is clearly the better player. I am implicitly making a prediction that Posada will have a much better chance of contributing positively than will Molina. If they play tonight and Posada goes 0-4, or Molina starts and goes 3-4 (i.e., the prediction was incorrect on some level), that does not mean that the thinking behind the prediction was wrong, or that the outcome was "random."

91 OldYanksFan ~ Oct 31, 2009 6:50 pm

"But it’s not “random.” If it’s random, then the Yankees should just pull names out of hat when writing out the lineup card for tonight’s game."

2 vastly DIFFERENT issues.

1) I say it's random based on history (see ARod's career and postseason performances), as well as on seeing that, day to day, a player's performance can fluctuate vastly, and I can't say in which game he gets 3 hits or no hits.

2) However, who we play in an attempt to win has nothing to do with that. We can't control or predict this randomness. However, we can make our BEST ATTEMPT at winning by putting the best players we have on the field. We play the odds. The odds are that better players perform better. It's a best guess... a smart guess, given what's in our power. But playing our best players is not a prediction that we will win.

Teams with the best players don't always win, and the best team does not always (less than 50% of the time, I believe) win the WS. But knowing this doesn't mean we don't try.
You can't predict baseball, right?

You are making TREMENDOUS assumptive leaps based on my statements.

I say: "In baseball, a singular large sample sized data statistic is not particularly predictive of the results of any individual singular event". I am making an analytical statement.

And you take the meaning of this statement as:
It doesn't matter who we put on the field, might as well start the scrubs?

Really?????? Are you just being argumentative?
I mean, I'm not that good at communicating my analytical thoughts, but man............. how do you arrive at these conclusions????

92 monkeypants ~ Oct 31, 2009 6:52 pm

[87] Regarding Bill James and devotees, taking your example of the bunt. Here is what Steve Goldman had to say about Jeter's bunt:

As for Jeter's non-bunt, although the Old Captain is top-20 in double play percentage (17 percent of his chances, worst on the Yankees) giving away outs, as opposed to gambling on the better than 80 percent chance that a very good hitter WON'T hit into one, is not good managing. It was a poor decision by Joe Girardi which Jeter doubled down on by bunting foul with two strikes.

That strikes me as a fairly blanket judgment.

93 monkeypants ~ Oct 31, 2009 6:59 pm

[91] We could go around in circles all day...in fact we have. The degree to which you seem to think that larger data sets tell us very little about small samples (i.e., single games, events)---that the outcomes of these small samples are random and unpredictable---implies inherently that who plays tonight doesn't matter.

But I know that you don't believe that. I know that you think it's better to play better players than worse players, because it increases the chances of success tonight. Implicitly you DO believe that small samples are predictable, because you predict that the better players give you a better chance of success tonight, in just a few ABs.

But you refuse to give up the rhetorical structure you have created, and you adhere to certain key terms.

94 OldYanksFan ~ Oct 31, 2009 7:13 pm

If they play tonight and Posada goes 0-4 or Molina starts and goes 3-4…i.e., that the prediction was incorrect on some level…does not mean that the thinking behind the prediction was wrong, or that the outcome was “random.”
--------------------
1) I consider your Posada/Molina situation a guess, even if you say: "I predict..."
2) Molina could defy the odds and go 3 for 4. Posada could go 0 for 5. We don't know. We can't predict the outcome. It is random.
3) Your thinking was perfect. Given what is within our control, playing Posada over Molina puts the odds of greater production in our favor. It's a best guess, which is correct thinking.

Again, you are confusing my analytical statement for making out a lineup card.

95 OldYanksFan ~ Oct 31, 2009 7:43 pm

"that the outcomes of these small samples are random and unpredictable—implies inherently that who plays tonight doesn’t matter."

ABSOLUTELY, 1 million% wrong.
My statement has NOTHING to do with who plays tonight.
It has nothing to do with tonight, or the Yankees, or what actions should be taken.

This is your problem and why we are going in circles.
What I said...
DOES NOT (No No No NO NO)
"implies inherently that who plays tonight doesn’t matter."

You are simply not getting what I'm saying, and constantly assuming assertions that I am NOT making.
