There are a host of pathologies associated with the current peer review system that have been the subject of much discussion. One of the most substantive is that the papers published in leading journals are commonly those with the most exciting and newsworthy findings. The problem is that what seems novel and newsworthy to some may look overreaching and of questionable validity to others. The ability to publish ‘sexy’ findings of questionable validity is often facilitated by a variety of problems in the research design, such as small samples and the winner’s curse, multiple comparisons, and the selective reporting of results.
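The winner’s curse and the multiple-comparisons problem are easier to see in a small simulation. The sketch below is an illustration only, not anything from the original post: it assumes a hypothetical true effect of 0.2 standard deviations, 20 observations per group, and a normal approximation for the p-values, then compares the average estimate across all simulated studies with the average across only those that came out significant and positive (the ones most likely to be published as ‘exciting’). A second helper gives the chance of at least one false positive across several independent tests at the 5% level. The function names and parameter values are hypothetical.

```python
import math
import random
import statistics


def two_sided_p(z: float) -> float:
    """Two-sided p-value for a z statistic (normal approximation)."""
    return math.erfc(abs(z) / math.sqrt(2))


def simulate_winners_curse(true_effect=0.2, n_per_group=20, n_studies=5000,
                           alpha=0.05, seed=1):
    """Run many small two-group studies of the same true effect (in SD units).

    Returns the mean estimate across all studies, the mean estimate across
    studies that came out significant in the expected (positive) direction,
    and the share of studies that did so.
    """
    rng = random.Random(seed)
    all_estimates, significant = [], []
    for _ in range(n_studies):
        treated = [rng.gauss(true_effect, 1.0) for _ in range(n_per_group)]
        control = [rng.gauss(0.0, 1.0) for _ in range(n_per_group)]
        diff = statistics.fmean(treated) - statistics.fmean(control)
        # Standard error of a difference in means with equal group sizes.
        se = math.sqrt((statistics.variance(treated) + statistics.variance(control)) / n_per_group)
        all_estimates.append(diff)
        # Only 'exciting' results (significant and positive) are kept as 'published'.
        if diff > 0 and two_sided_p(diff / se) < alpha:
            significant.append(diff)
    return (statistics.fmean(all_estimates),
            statistics.fmean(significant),
            len(significant) / n_studies)


def familywise_error_rate(m_tests: int, alpha: float = 0.05) -> float:
    """P(at least one false positive) across m independent tests when every null is true."""
    return 1 - (1 - alpha) ** m_tests


if __name__ == "__main__":
    overall, published, share = simulate_winners_curse()
    print(f"true effect: 0.20 | mean of all estimates: {overall:.2f} | "
          f"mean of significant estimates: {published:.2f} ({share:.0%} of studies)")
    for m in (1, 5, 20):
        print(f"{m:>2} independent tests -> P(any false positive) = {familywise_error_rate(m):.2f}")
```

Under these assumptions, the average across all simulated studies sits close to the true 0.2, while the significant subset overstates it several times over, which is why small studies with ‘surprising’ results deserve extra scrutiny. With 20 independent tests, the chance of at least one spurious ‘finding’ is about 64%.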
Fortunately, these issues have been the subject of much discussion and self-reflection amongst scientists across all disciplines. While career incentives may lead researchers to be careless with their analysis in order to publish exciting findings, most often the problems stem not from malfeasance but from misinformation coupled with cognitive biases, such as confirmation bias, to which we are all susceptible (i.e. we tend to see only the evidence we want to see). Ultimately, I feel much of this is a problem with statistics education and, more generally, with a focus on teaching techniques rather than a problem-orientated skill set, without appropriate emphasis on critical thinking. Certainly, upon graduation I could apply all manner of analytical techniques without really understanding what I was doing!
Much of the focus in this area has rightly been on getting researchers to change their behaviour by being more reflective and more transparent in presenting their methods. Less attention has been paid to the behaviour of reviewers and editors. What should reviewers be on the lookout for? It can be hard to distinguish novel yet valid results from those of questionable validity, particularly for those without a great deal of experience working with data. This blog is an attempt to provide some general rules to guide reviewers.
In particular, in the spirit of the insightful and engaging 10 commandments of applied econometrics, I have prepared 10 commandments for reviewers. The list is by no means exhaustive, and it is aimed predominantly at empirical rather than purely theoretical papers. One of the main motivations, if not the main one, behind how researchers structure and write their papers is that they follow whatever approach they deem most likely to be judged publishable by editors and reviewers. Researchers respond to incentives, and I suggest that if reviewers follow these simple steps we can change the ‘rules of the game’ and, ultimately, change submission practices and behaviour.
1. Be more open to uncertainty: Reviewers, at least in my experience, have a preference for strong statements regarding causality, but things are not always so black and white. It should be acceptable (even encouraged) for authors to present their findings as suggestive, acknowledge limitations and suggest what future work is needed. Increasingly, however, what reviewers demand is unfailing certainty. Researchers often aim to demonstrate ‘proof’ of the estimated effect and, through a series of robustness checks, to rule out every alternative interpretation of the findings. This in turn can lead authors to overstate their findings for fear of being punished by reviewers if they were more circumspect.
2. Be more accepting of small/modest effect sizes: Not every study will demonstrate that changes in the key variable of interest lead to big changes in the outcome under examination. Indeed, most will not, or at least should not. Research proceeds incrementally, and demanding large effect sizes is unrealistic. The main problem is that effect sizes can often be presented in a number of different ways, so such expectations distort incentives: authors find creative ways of presenting their estimated effects as ‘large’. While publication should not depend on demonstrating large effect sizes, neither should effect sizes be so small as to be trivial and unimportant; statistical significance alone should not be enough. Apart from problems in how the magnitude of effect sizes is reported, an equally serious issue is that authors commonly make little effort to report effect sizes at all. Reviewers should discourage this too.
3. Don’t be fooled or impressed by complexity: Econometric complexity should not be mistaken for rigour. Simple analyses are often not only easier to understand and communicate but also less likely to lead to serious errors or lapses. If complicated models are needed, ensure that the researchers have presented enough detail for the technical aspects to be readily understood. Depending on context, prudent questions to ask might include: What does the simple bivariate relationship look like? Do the results hold up without the somewhat strange-looking functional specification? What do the results look like from a simple comparison of the treatment and control groups before control variables are added?
4. As a natural extension of the above, apply the laugh test: Apply what Kennedy (2002) refers to as the ‘laugh’ test, or what Hamermesh (2000, p. 374) calls the ‘sniff’ test: ‘ask oneself whether, if the findings were carefully explained to a thoughtful layperson, that listener could avoid laughing’. If results appear too good to be true, they often are.
5. Ask the right questions: Some potentially useful questions I commonly find myself asking include: If the results are observed only for a particular sub-group or in a particular situation, is the explanation for this plausible? Relatedly, was the explanation derived after the fact to fit the results (observational data is inherently noisy!), or is it reasonable that the authors held these priors in advance? Are the substantive results greatly affected by adopting different, seemingly more sensible procedures, such as changes to the functional form or to the selection of control variables? I want to emphasise that these questions should not be designed to ‘null hack’ findings away (also consider point 10 here) but rather to get a general sense of how sensible the analysis is and whether the conclusions are warranted.
6. Don’t be fooled by robustness checks, or ask for too many: Robustness checks can be important, but you should not need them to persuade you of the veracity of the main findings. It is also worth noting that, from the author’s perspective, requests for additional robustness checks can add substantial time, often for little additional benefit.
7. Don’t discriminate: Judge the research on its merits, not on whether it is a topic you like or comes from big-name researchers or institutes. Publication bias is such that the people doing careful, considered research may not always be the ones with big ‘reputations’.
8. Related to the point above, and aimed at editors: don’t desk reject papers based on how newsworthy the reported findings are.
9. Be wary of grandiose statements, convoluted language or abstract theoretical frameworks.
10. Don’t obsess over p-values: As the American Statistical Association puts it, “A conclusion does not immediately become ‘true’ on one side of the divide and ‘false’ on the other.” A p-value coupled with an estimated effect size conveys some useful information, but it should not be the be-all and end-all. As the ASA statement concludes, “No single index should substitute for scientific reasoning.” Personally, there are many things I look for in judging the scientific merits of a paper (many of them discussed above), but the actual reported p-value, whether .01 or .10, is far down the list.
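Points 2, 3 and 10 all come down to what gets reported: a reviewer can ask for the simple treatment-versus-control comparison together with its effect size and confidence interval, not just a significance verdict. The sketch below is a minimal illustration under stated assumptions: two independent groups, a normal approximation for the p-value and the 95% interval (a t distribution would be more accurate in small samples), and Cohen’s d computed with the pooled standard deviation. The `compare_groups` and `p_from_z` helpers and all the data are hypothetical, not taken from the original post.

```python
import math
import random
import statistics
from dataclasses import dataclass
from typing import Sequence


@dataclass
class Comparison:
    diff: float       # raw difference in means (treated minus control), in outcome units
    cohens_d: float   # standardised effect size: diff divided by the pooled SD
    ci_low: float     # approximate 95% confidence interval for diff
    ci_high: float
    p_value: float    # two-sided, normal approximation


def p_from_z(z: float) -> float:
    """Two-sided p-value for a z statistic under a standard normal null."""
    return math.erfc(abs(z) / math.sqrt(2))


def compare_groups(treated: Sequence[float], control: Sequence[float]) -> Comparison:
    """Simple treatment-vs-control comparison, before any control variables."""
    n1, n2 = len(treated), len(control)
    m1, m2 = statistics.fmean(treated), statistics.fmean(control)
    v1, v2 = statistics.variance(treated), statistics.variance(control)
    diff = m1 - m2
    pooled_sd = math.sqrt(((n1 - 1) * v1 + (n2 - 1) * v2) / (n1 + n2 - 2))
    se = math.sqrt(v1 / n1 + v2 / n2)  # unpooled (Welch-style) standard error
    return Comparison(
        diff=diff,
        cohens_d=diff / pooled_sd,
        ci_low=diff - 1.96 * se,
        ci_high=diff + 1.96 * se,
        p_value=p_from_z(diff / se),
    )


if __name__ == "__main__":
    # Hypothetical data: a tiny true effect (0.05 SD) measured on a huge sample.
    rng = random.Random(7)
    treated = [rng.gauss(0.05, 1.0) for _ in range(100_000)]
    control = [rng.gauss(0.0, 1.0) for _ in range(100_000)]
    print(compare_groups(treated, control))

    # Two nearly identical hypothetical estimates (SE 0.2) on either side of .05.
    for estimate in (0.41, 0.37):
        print(f"estimate {estimate:.2f}: p = {p_from_z(estimate / 0.2):.3f}")
```

In the first example, the very large sample makes a standardised effect of roughly 0.05 highly ‘significant’ even though it may be too small to matter, which only the effect size and interval reveal. In the second, two almost identical estimates give p of about 0.04 and 0.06, landing on opposite sides of the conventional divide the ASA statement warns against treating as decisive.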
Originally published on Towards Data Science: https://towardsdatascience.com/10-commandments-for-judging-the-merits-of-an-empirical-paper-862100b6b32d