Accused Harvard Professor Claims Innocence! (Fake Data Scandal)



Hey! Thanks for watching today's video. Anticipating certain comments, here are some points I didn't cover in the video:

1. Why the anomalies in the first place?
The study was done by students on paper, then copied over to Excel. Gino argues that the RA who collected the papers sorted them roughly, but not perfectly, into conditions, and the responses were imported in that order. A plausible story, but admittedly a convenient and sloppy one.

2. What about CalcChain?
If you read Gino's defence, she explains that the CalcChain anomalies DataColada present as evidence can arise from all kinds of Excel operations, not just from manipulating data, and that CalcChain does not show the document in its previous state, as they suggest. So it is not the smoking gun we thought it was.
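For anyone who wants to poke at this themselves: an .xlsx file is just a ZIP archive, and when a workbook contains formulas Excel records the order in which it last recalculated them in a part called xl/calcChain.xml, which is the metadata the CalcChain argument is about. Below is a minimal sketch in Python (standard library only) that lists that order; the filename study.xlsx is a placeholder, and it only shows the chain's current state, not any earlier version of the file.

import zipfile
import xml.etree.ElementTree as ET

# Namespace used by SpreadsheetML parts such as calcChain.xml.
NS = "{http://schemas.openxmlformats.org/spreadsheetml/2006/main}"

def read_calc_chain(xlsx_path):
    # Return (cell reference, sheet index) pairs in last-recalculation order.
    with zipfile.ZipFile(xlsx_path) as zf:
        if "xl/calcChain.xml" not in zf.namelist():
            return []  # no formulas, so Excel never wrote a calculation chain
        root = ET.fromstring(zf.read("xl/calcChain.xml"))
    return [(c.get("r"), c.get("i")) for c in root.findall(NS + "c")]

if __name__ == "__main__":
    for ref, sheet in read_calc_chain("study.xlsx")[:20]:  # placeholder filename
        print(sheet, ref)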

Follow me:
Behavioral Science Instagram: @petejudoofficial
Instagram: @petejudo
Twitter: @petejudo
LinkedIn: Peter Judodihardjo

Good tools I actually use:
Comments

I'm afraid I don't. Removing the suspicious data points and saying "look it's still significant" isn't that compelling. It assumes the obviously suspicious data points were the only manipulation. The fact that any part of the data set looks (very) dodgy means the whole set should be regarded with suspicion.

stumpy

Shouldn’t there just be *zero* manipulated data points?

Archimedes

But this "explanation" doesn't explain why there was suspicious data in the first place. Why would some students have duplicate ID numbers, and why would she switch the rows around? Her defense that there were even more duplicate IDs that they didn't find is rather weak; it just proves there was more manipulation than they caught. At best, she is extremely careless with her data analysis, and that's not up to the standard expected of a university professor.

nobodyqwertyu

I’m not at all convinced by these “arguments”. The guys at Data Colada never said they had found an exhaustive list of false entries, so I don’t think it really matters that removing the points doesn’t get rid of the full effect; the direction of the change is telling enough.

It also stands to reason, given how badly concealed the supposed manipulation was, that those entries sit among the others because their values were *edited*, not created anew.

If the previous values were evidence against the hypothesis, changing them to values that support the hypothesis will have exactly this effect (removing the points still keeps the effect significant).
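To make that concrete, here is a toy simulation with made-up numbers (Python with numpy/scipy; the sample sizes, the number of edited rows, and the size of the edit are all arbitrary assumptions, and none of this is the study's actual data). Two conditions are generated with no true difference, the rows that most contradict the hypothesis are "edited" to support it, and then exactly those flagged rows are dropped before re-testing, as in the defence. Because the contradicting values are gone either way, the "cleaned" data still come out significant far more often than the nominal 5% false-positive rate.

import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
n, k, sims = 50, 8, 2000                       # per-condition size, edited rows, repetitions
sig_after_drop = 0
for _ in range(sims):
    control = rng.normal(0.0, 1.0, n)
    treated = rng.normal(0.0, 1.0, n)          # true effect is zero
    worst = np.argsort(treated)[:k]            # rows most contradicting "treated > control"
    treated[worst] += 3.0                      # fabricate support for the hypothesis
    kept = np.delete(treated, worst)           # analyst drops exactly the flagged rows
    p = stats.ttest_ind(kept, control, alternative="greater").pvalue
    sig_after_drop += p < 0.05
print(f"still 'significant' after dropping flagged rows: {sig_after_drop / sims:.0%}")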

theondono

"beyond a reasonable doubt" is only a standard for criminal charges. for academic fraud "really really fishy" is well enough.

alexmikhylov

No, I'm not convinced. They pointed out the most obvious ones. The question is why there are 'irregularities' like that in the study at all. Could it be that she just did a better job of hiding the other ones?

quaest

This is far too generous. Saying "these results would hold if we removed the suspicious data points" only matters if the allegation is that these data points were manufactured out of nothing. I believe Data Colada suggested that the points were moved from one condition to the other, in which case removing them would mean dropping all of the points that tended not to support her hypothesis, so this would still be cheating. Additionally, the reasonable-doubt standard is used in criminal trials, where someone could end up in prison. In a civil trial, which is much more similar to this case, the standard is "a preponderance of the evidence" (at least in the US).

Aigul

She hired a PR firm and is trying to abuse the legal system to intimidate whistleblowers. This should not be rewarded by "reserving our judgement and looking at both sides of the debate"; it should be strongly condemned.

lukas_

The argument that removing the suspect points doesn't change the overall trend of the data isn't as convincing as you think, because it assumes, firstly, that the obviously suspect data points are the only ones that were manipulated and, secondly, that these points were added rather than altered. If the professor chose to alter the data points that most conflicted with her desired outcome, then simply removing those points will not recover the original outcome of the study.

jacksonallan

Assuming innocence until proven guilty does not mean that one has to interpret any argument in her favor.

marcfelix

I'm not convinced. "If I remove the suspicious data I get the same result" is nonsense. p<0.002 is far more significant than the p<0.05 threshold (which is, by definition, arbitrary). Astonishingly so. Bigger conclusions require bigger proof and invite bigger scrutiny. She got lazy, too big to believe she was subject to scrutiny and dissent. Common for overpaid prima donna professors in many, many schools. A bank can be "too big to fail". A professor can too. Maybe.

She cannot hide behind the "suspect data didn't matter" argument. Is this your data or not? Logically, that's like saying "I robbed the bank, but there was no money in there, so I'm innocent."

Finally, if this person "gets away with it", what's the message to ALL professors? Wiggle the data; nobody is watching, nobody will notice. Harvard sure isn't watching. Best school in the world? They have no idea what their employees are doing. Coasting on reputation and the rumor of glory.

Note that I'm an engineering PhD. I remember spending months trying to reproduce data, trying to control fickle instruments and low-quality equipment. I could have faked small parts of it and been done a whole lot quicker. So this professor and her troubles really irritate me. Really.

press

I want to hear an explanation of why these data-ordering and duplicate anomalies are there in the first place. A more convincing response to Data Colada would have been to explain how they got there, which would eliminate the "suspicious" label altogether. And the motive is obvious: extremely high significance, rather than ordinary significance, makes the paper more compelling.

JonKPowers

She needs to show that the data was not manipulated by providing an innocent explanation for those unusual rows. The argument you are forwarding from her website is that since the flagged rows only increase the effect, rather than being responsible for it entirely, she must be innocent; but that presumes the other data is not manipulated, which we now have reason to doubt. The presumption of innocence is lost when there are signs of manipulation, and she needs to address those signs. I agree it may feel less clear cut, but science is built on trust, and without an explanation for data altered in this pattern we cannot continue to extend that trust to her.

spshkyros

Wait a minute:

Data Colada flagged suspicious data, and now she has pointed out even more suspicious data.

Is this one of those TikTok videos where people record their own crimes on camera? 🤣

TripImmigration

I disagree with your overall assessment. Her main argument is that Data Colada didn't identify all of the problematic rows in their original blog post, which could theoretically introduce bias.

However, the Harvard investigation had access to earlier versions of the files, from before the alleged tampering. Data Colada published a blog post on September 16, which you didn't mention in your video, discussing screenshots of those original files that became available in the course of the trial. Apparently it could be confirmed that all 8 rows flagged in the original post were meddled with, and that they had missed an additional 3. Furthermore, there were many more alterations affecting individual Excel cells.

Francesca attacking the blog post is irrelevant and a distraction. She needs to explain all the discrepancies between early and late versions of the data files for her studies.

lukas_

Pointing out that the data pool contains even more dodgy data points and then accusing Data Colada of "cherry-picking" examples is no argument in her favour. Not at all.

Christian-quzi

7:30 "Ha! You missed another example that shows I was dishonest, checkmate!"

switted

Her explanation seems to be that, regardless of the data manipulation, her results are the same, so no harm, no foul. But the issue is manipulating the data in the first place. Arguing that removing the allegedly manipulated data still produces her findings doesn't address that.

charlie-qhll

I still have questions about how exactly the participant IDs came to be duplicates... I'm a Ph.D. student who runs a department's student pool at an R1, and it's EXTREMELY odd to me that the IDs were duplicated. In our pool, we give each student a unique 5-digit anonymous ID to use for studies. Then the researcher adds sequential IDs to the data and removes the initial 5-digit number. So the researcher would have to use a generate-variable command to create the ordered IDs; in Stata, this command would be "gen subject= _n". I would also want to know exactly how they managed to have duplicate responses... That seems SO suspicious, or at least it suggests they don't know how to use online survey software. Basically, I have many questions about the process, not just the overall results...
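For what it's worth, here is what that workflow looks like in Python/pandas rather than Stata (the column names and pool IDs below are made up for illustration). With a sequential counter generated this way, duplicate subject IDs should be impossible by construction, so any that do appear need an explanation.

import pandas as pd

# Made-up example of the pool data: one unique 5-digit pool ID per student.
df = pd.DataFrame({
    "pool_id": [48213, 71904, 30558, 66012],
    "response": ["a", "b", "c", "d"],
})

df["subject"] = range(1, len(df) + 1)   # Python analogue of Stata's "gen subject = _n"
df = df.drop(columns="pool_id")         # drop the identifying pool ID, keep the sequential one

# Duplicates should never occur under this workflow, so check for them:
dupes = df[df["subject"].duplicated(keep=False)]
print(dupes if not dupes.empty else "no duplicate subject IDs")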

spqueue

"Why would I lie if the data would have proven I was right"
"Why would a pro athlete dope? why would a famous gamer cheat?"
Because the pressure of society causes those of mild success to strive for greater renown, wasn't there an entire other story in this same channel where the other professor who faked studies literally admitted to this reasoning?

Spoopball