Best practices in data analysis and statistics symposium
Slides
We will ask the presenters to upload their slides to the OSF Meetings page of the symposium.
Program
Tutorials
All tutorials start at 9:00; the room for each tutorial is listed below. Bring your laptop to the tutorials, with the required software installed (the software needed is listed below for each tutorial).
Resampling: doing classical statistics the easy way
Ricardo Vieira (Central European University, Department of Cognitive Science)
9:00, Room 4 (ground floor), optional: install Python 3 on your computer for this tutorial
Bayesian hypothesis testing
Bence Bakos (ELTE Eötvös Loránd University, Institute of Psychology)
9:00, Room 302, install JASP on your computer for this tutorial
Have a different look at your data with CogStat
Attila Krajcsi (ELTE Eötvös Loránd University, Institute of Psychology, Department of Cognitive Psychology)
9:00, Room 303, install CogStat on your computer for this tutorial
Talks
Talks start at 11:00 in Room 4 (ground floor). The talks will be 15 minutes long, and the discussions will be practically unlimited.
General considerations
Little QRP Bestiary
Tamás Nagy (Eötvös Loránd University, Faculty of Education and Psychology, Department of Personality and Health Psychology)
A new common language: StatOkos
Csaba Kazinczi, Emese Alter, Adrienn Holczer (University of Szeged, Department of Psychology)
Who knows what var4 stands for? On the importance of research data management standards
Márton Kovács (Institute of Psychology, ELTE Eötvös Loránd University, Budapest, Hungary)
Software solutions
Interactive visualizations for fundamental statistical methods in R
Bertalan Polner (Department of Cognitive Science, BME)
Starting with predictive modeling made easy: RapidMiner Auto Model cloud application
Balázs Péter Hámornik (RapidMiner/BME)
Specific procedures
Smartphones as data collection devices for perception, action and cognition
Zsolt Palatinus (University of Szeged, Institute of Psychology)
Accessible single-case statistical analyses tools
Attila Krajcsi (ELTE Eötvös Loránd University, Institute of Psychology, Department of Cognitive Psychology)
General panel
In the general panel anyone can raise an issue, ask a question, share important experience, etc., then anyone can reflect on that topic. The point is that problems that are not related to a specific talk can also be discussed here. Cool!
Abstracts
Resampling: doing classical statistics the easy way
Ricardo Vieira (Central European University, Department of Cognitive Science)
(Optional: Install Python 3 on your computer for this tutorial.)
Classical statistical testing is seldom well understood, in great part because too much attention is drawn to the least important aspects of it: what tests work with which kind of data, what are the assumptions of test X and Y, when is one test more appropriate than the other, when are they equivalent, what are their caveats, and so on.
In this tutorial we will circumvent these issues by constructing our own statistical tests from scratch with just a few lines of code. I will show how resampling from the obtained data (bootstrap) can be used to directly assess how likely any test statistic is under the null hypothesis.
This procedure not only performs as well as most classical tests (binomial tests, chi-square tests, t-tests, ANOVA, etc.), but can also be easily (and safely) adapted to novel situations for which the appropriate classical method is unknown or does not exist, avoiding the all too common fear of ‘choosing the wrong test’.
My hope is that this tutorial will motivate an intuitive understanding of the physical process of random sampling, and of the logic behind null hypothesis testing. At the same time, I will try to demystify the usefulness of the p-value, highlighting the nature of uncertainty in statistical inference, and bringing attention to what matters most in empirical research: clear reasoning, careful experimentation and precise estimation.
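The resampling idea described above can be sketched in a few lines of Python. This is only an illustration, not the tutorial's own material: the function names, the choice of a difference in means as the test statistic, and the data are all hypothetical. It assumes a null hypothesis under which both groups come from the same population, so that new groups can be drawn with replacement from the pooled data.

```python
import random
from statistics import mean


def resampling_p_value(group_a, group_b, statistic, n_resamples=10_000, seed=0):
    """Return (observed statistic, two-sided p-value).

    Assumed null hypothesis: both groups come from the same population,
    so simulated groups are drawn with replacement from the pooled data.
    """
    rng = random.Random(seed)  # fixed seed so the sketch is reproducible
    observed = statistic(group_a, group_b)
    pooled = list(group_a) + list(group_b)
    n_a, n_b = len(group_a), len(group_b)

    extreme = 0
    for _ in range(n_resamples):
        fake_a = rng.choices(pooled, k=n_a)  # sampling with replacement
        fake_b = rng.choices(pooled, k=n_b)
        # Count resamples at least as extreme as the observed statistic.
        if abs(statistic(fake_a, fake_b)) >= abs(observed):
            extreme += 1
    return observed, extreme / n_resamples


def mean_difference(a, b):
    """Test statistic: difference between the two group means."""
    return mean(a) - mean(b)


# Hypothetical data, for illustration only.
control = [12.1, 11.4, 13.0, 12.7, 11.9, 12.3, 12.8, 11.6]
treated = [13.2, 13.9, 12.8, 14.1, 13.5, 13.0, 14.4, 13.7]

observed, p_value = resampling_p_value(control, treated, mean_difference)
print(f"observed difference: {observed:.3f}, resampling p-value: {p_value:.4f}")
```

Because the statistic is passed in as a function, the same loop works for a difference in medians, a ratio of variances, or any other quantity, which is the flexibility the abstract points to.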
Bayesian hypothesis testing
Bence Bakos (ELTE Eötvös Loránd University, Institute of Psychology)
(Install JASP on your computer for this tutorial.)
Bayesian hypothesis testing offers an alternative to commonly used frequentist practices. In this tutorial I offer a short introduction to the basics of Bayesian statistics, compare its analytical methods to traditional approaches, and show a few examples of hypothesis testing using JASP.
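As a rough illustration of what a Bayesian hypothesis test computes, the sketch below evaluates a Bayes factor for a binomial proportion in plain Python. It is not taken from the tutorial and does not use JASP. The function name and the example counts are hypothetical, and the uniform prior on the proportion under the alternative hypothesis is an assumption made here for simplicity.

```python
from math import comb


def binomial_bayes_factor(successes, trials, null_p=0.5):
    """Bayes factor BF10 for a binomial proportion.

    H0: the proportion equals null_p.
    H1: the proportion has a Uniform(0, 1) prior (an assumption of this sketch).
    """
    # Under a uniform prior every count from 0 to `trials` is equally likely,
    # so the marginal likelihood of the data under H1 is 1 / (trials + 1).
    marginal_h1 = 1 / (trials + 1)
    # Ordinary binomial probability of the data under the point null.
    likelihood_h0 = (
        comb(trials, successes)
        * null_p**successes
        * (1 - null_p) ** (trials - successes)
    )
    return marginal_h1 / likelihood_h0


# Hypothetical counts, for illustration only.
bf_skewed = binomial_bayes_factor(successes=15, trials=20)
bf_balanced = binomial_bayes_factor(successes=10, trials=20)
print(f"15/20 successes: BF10 = {bf_skewed:.2f}")
print(f"10/20 successes: BF10 = {bf_balanced:.2f}")
```

Values above 1 favour the alternative and values below 1 favour the null. Unlike a p-value, the Bayes factor can therefore express evidence for the null hypothesis as well as against it.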
Have a different look at your data with CogStat
Attila Krajcsi (ELTE Eötvös Loránd University, Institute of Psychology, Department of Cognitive Psychology)
(Install CogStat on your computer for this tutorial.)
CogStat is an easy-to-use data analysis software package that compiles the analysis automatically and displays and visualizes the results in a way that makes your data easier to understand. Working with CogStat is efficient: for example, comparing the means of two groups typically takes 13 steps in SPSS, but only 3 steps in CogStat. CogStat also displays the data in unusual ways that give additional insight into your data: sample and population properties are strictly separated; measurement levels are always displayed on charts; for ordinal variables, the order is displayed instead of the value; “ordinal” statistics are calculated for interval variables; repeated measures data are always connected between the measurements; and so on. In this tutorial, we review how to use CogStat, how it differs from other analysis software, and why its special display methods make the data and statistical concepts more understandable. We also review why some solutions are not used, e.g., why the mode is not used as a central tendency index and why the range is meaningless for ordinal variables. CogStat not only makes your analysis more efficient, but also helps in teaching statistics by highlighting many details that are usually hard to understand.
Little QRP Bestiary
Tamás Nagy (Eötvös Loránd University, Faculty of Education and Psychology, Department of Personality and Health Psychology)
Scientists have long been aware of questionable research practices (QRPs): methods used to alter, obfuscate, or miscommunicate research findings in order to change the impact of the work. Only recently, however, did it become clear that the prevalence of QRPs is much higher than previously thought. Large-scale meta-research and reproducibility studies, such as the Many Labs projects, found that less than half of the most influential results in psychology from the last few decades could be replicated. This discovery prompted a rapid response in the scientific community, leading to the formation of a reformist movement in the field. As part of this endeavour, special emphasis is placed on educating current and future researchers about best practices. While teaching best practices is paramount, it may also be important to highlight the most common research mistakes and bad practices, so that researchers know what to avoid. The current presentation lists and categorizes the most frequent QRPs in psychology, based on current meta-research and expert opinion. Apart from describing QRPs and the harm they may cause, we also offer remedies and current best practices to avoid them. It is important to note that the aim of the presentation is not to blame researchers for previous use of QRPs, but to raise awareness and to show how open science practices may help to avoid these mistakes.
A new common language: StatOkos
Csaba Kazinczi, Emese Alter, Adrienn Holczer (University of Szeged, Department of Psychology)
As young scholars, we have all faced the challenges of conducting and publishing our research. Acquiring skills in experimental design, statistics, programming, and creating charts and tables is a steep hill to climb. To acquire this knowledge, we attended classes, read books and articles, and gained some practical experience. Although there is agreement about the easiest way to open a spreadsheet program, can we say the same about different statistical procedures or about editing a chart in APA Style? In addition, different educational institutions teach the basics of research methodology in slightly different ways.
To help future students and anyone interested, we started a new platform covering research methodology, statistics, informatics, and publication. Even though countless websites provide knowledge on these topics, none of them combines and manages them as a whole system, all in one place. The project, named StatOkos, tries to open these borders and initiate the flow of information between methodology, statistics, informatics, and publication.
Our offer is quite simple: combining your existing knowledge with StatOkos can lead you to the answers you seek. If you do not have any experience at all, you can build your knowledge from the very beginning with examples, downloadable content, and practice tasks.
We are continuously developing our website to make it more accurate and professional. For this purpose, we are in contact with some of the psychological institutes in Hungary (ELTE and DTE) to share and exchange information. We hope this will provide the opportunity to create a “new common language” for students of psychology and of other social and life sciences. We will present the current form of StatOkos and share our vision for the future.
Who knows what var4 stands for? On the importance of research data management standards
Márton Kovács (Institute of Psychology, ELTE Eötvös Loránd University, Budapest, Hungary)
The past few years have witnessed an increase in data sharing in psychological research, as researchers have come to understand that sharing data is beneficial and desirable for their field. Still, for many, one of the reasons for not making their data openly available is that they have never learned how to share data properly. This lack of knowledge about how to make data available and reusable came to light when a recent study found that only 62% of the openly shared datasets of the inspected articles were reusable, meaning that the data were complete and understandable (Hardwicke et al., 2018). In my presentation, I will introduce and explore the most recent developments in the discussion concerning research data management and the best practices for creating reusable datasets. I will focus on methods that could be adopted by labs or individual researchers and discuss the possibility of developing field-wide data standards.
Reference: Hardwicke, T. E., Mathur, M., MacDonald, K., Nilsonne, G., Banks, G. C., Kidwell, M. C., … & Lenne, R. L. (2018). Data availability, reusability, and analytic reproducibility: Evaluating the impact of a mandatory open data policy at the journal Cognition. Royal Society Open Science, 5(8), 180448. doi:10.1098/rsos.180448
Interactive visualizations for fundamental statistical methods in R
Bertalan Polner (Department of Cognitive Science, BME)
Understanding the logic behind fundamental statistical methods can be facilitated by dynamic visualizations. These visualizations might be used as demonstrations in introductory stats courses. Moreover, coding such visualizations from scratch could be a fruitful exercise in advanced courses. In this talk, I will use a simple example to show how the power of the R language can be used for such purposes.
Starting with predictive modeling made easy: RapidMiner Auto Model cloud application
Balázs Péter Hámornik (RapidMiner/BME)
Practitioners focus on the problems of their own field rather than on which data science method to use and how to configure it. Getting started with predictive modelling is often hard, and the value that a chosen predictive model and its predictions would deliver is uncertain. Facing a steep learning curve in data science, domain experts (e.g. business analysts, researchers, students) can have a hard time matching the question with the right modelling method, implementing it, and using the results to solve the problem.
RapidMiner Auto Model cloud application is a web-based guided predictive modelling tool that helps domain experts get started with data science. Given a dataset, it suggests which features to use and automatically builds multiple appropriate predictive models for the classification or regression problem. All suggestions and results are explained to support the choice of the best model. After evaluating the models with regard to the problem, the expert can choose the best one and use it to answer her question, or optimise it further in RapidMiner Studio to grow her data science skills. The web application presented in this talk provides an easy and guided start to predictive modelling for domain experts while letting the focus stay on the problem.
Smartphones as data collection devices for perception, action and cognition
Zsolt Palatinus (University of Szeged, Institute of Psychology)
Multifractal (MF) analyses are relatively new analytical tools in cognitive science. The general approach is to take continuous displacement or energy expenditure readings from a person or an animal engaging in a cognitive task. The analysis allows for monitoring and predicting transitions, efficiency and disorders without making assumptions about the origins or the site of the emerging behavior. Data for cognitive MF analysis are usually collected with motion or eye trackers, or with EMG or EEG machines. At the Institute of Psychology at the University of Szeged, we recently explored the possibility of using built-in smartphone sensors as data sources for MF inquiries. We started with a series of reliability tests and then collected pilot data assessing a range of perceptual and cognitive behaviors. In this report I share the results of our initial experiments and the paths we envision for using off-the-shelf devices in serious scientific work.
Accessible single-case statistical analyses tools
Attila Krajcsi (ELTE Eötvös Loránd University, Institute of Psychology, Department of Cognitive Psychology)
There has been tremendous development in statistical procedures for single-case studies (comparing a single case with a group) in recent decades. Most notably, the group of John R. Crawford has developed various methods for analyzing single cases. However, researchers may ignore or misuse these recent solutions. The present work introduces additional accessible computational tools in spreadsheet software packages, in CogStat, and in Python, which simplify the use of the appropriate statistical methods in single-case studies.
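To give a flavour of this kind of analysis, the sketch below computes the modified t statistic commonly attributed to Crawford and Howell (1998) for comparing one case with a small control sample. The abstract does not say which procedures the presented tools implement, so the choice of this test, the function name, and the data are assumptions made purely for illustration.

```python
from math import sqrt
from statistics import mean, stdev


def crawford_howell_t(case_score, control_scores):
    """Return (t, degrees of freedom) for a single case versus a control sample.

    t = (case - control mean) / (control SD * sqrt((n + 1) / n)), df = n - 1.
    The control sample is treated as a sample, not as the population.
    """
    n = len(control_scores)
    if n < 2:
        raise ValueError("at least two control scores are needed")
    # stdev() is the sample standard deviation (n - 1 in the denominator).
    standard_error = stdev(control_scores) * sqrt((n + 1) / n)
    t = (case_score - mean(control_scores)) / standard_error
    return t, n - 1


# Hypothetical scores, for illustration only.
controls = [52, 48, 55, 50, 47, 53, 49, 51]
patient = 38

t_value, df = crawford_howell_t(patient, controls)
print(f"t({df}) = {t_value:.2f}")
```

The resulting t value is compared against a t distribution with n - 1 degrees of freedom to obtain a p-value, which requires a statistics library or a t table and is left out here to keep the sketch dependency-free.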