Thesis Quality

Most successful thesis projects at the UvA are graded 6,5 - 7,5 and the overwhelming majority are between 6,0 - 8,0. This page explains some of the features that might lead a thesis project to be scored at 8,0 or above. These are not requests: they are invitations for students who are ambitious and want a greater challenge. A key challenge in following this advice is writing concisely within the length requirements.

First, carefully note the rubric in your student manual. For example, see the Scientific Reasoning section of the rubric in the Social Psychology BA thesis manual.

The below is INFORMAL ADVICE. Follow the official rubric and manual, which are the only and final rules for official grading. Thesis projects will meet these suggestions only in rare cases, and no project will meet all of them. Consider these aspirational stretch goals.

Writing (see Writing page)

  • Very high quality writing is shown at multiple levels: structural (sections and narrative), paragraph (specific purpose, minimum of circling or repetition), and sentence (clear statements, no extra words, minimum of passive voice).

  • Excellent writing depends on many rounds of editing and restructuring. There is no shortcut for this.

  • Justification: why is this research being done? Explain (don’t state) why the research can be useful, e.g., because it addresses a specific problem in the existing literature. That the topic hasn’t been studied in a specific way (a gap) is a weak justification. We also haven’t studied whether people with long noses are more pro-environmental, but is it worthwhile to do so? The justification should be based on what that comparison would enable us to do. See this list of common justifications and when they are appropriate.

Theory and Previous Literature (see Literature Search page)

  • High independence in theory development

  • Deep literature search. Most thesis projects will step through various previous findings and then form a hypothesis. Instead, deeper searches reveal tensions in the literature, theoretical gaps (rather than empirical gaps), and new syntheses that combine findings or approaches. Innovative projects make creative use of concepts and effects.

  • Use reproducible search methods, i.e., enable someone to reproduce your search exactly with specific search terms and databases. See the term ‘systematic review’.

  • Nuanced coverage of previous findings including their designs and methods (strengths and limitations).

  • Low jargon: avoids undefined terms and implied mechanisms. See our paper on assumed essences (Brick et al., 2021).


Methods (see Cleaning & Analysis page)

The points below are mostly about the reliability and validity of measures, tests, and inferences.

  • The size and quality of the sample and the measures are important, particularly if they were selected or measured by the student. For example, does the study talk about behavior but use self-report rather than observed behavior? The quality of the method matters, but so does the concordance between the research question, the method, and the interpretation.

  • The analytic plan is well-specified, e.g., plans for outliers, missing data, and violations of assumptions such as non-normal distributions. The analytic plan has a time-stamped pre-registration, e.g., at OSF (add me as an administrator in the project). Anonymized data and reproducible cleaning and analysis code are posted at OSF or similar.
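
As one concrete illustration, here is a minimal R cleaning sketch, assuming the tidyverse (dplyr) and a hypothetical raw data frame `raw` with an `attention_check` column and a 7-point `score`; the cut-offs are placeholders that would come from your own pre-registration.

```r
# Minimal cleaning sketch (hypothetical variable names and pre-registered cut-offs).
# Every exclusion is coded rather than done by hand, so the pipeline is reproducible.
library(dplyr)

cleaned <- raw %>%
  filter(attention_check == "passed") %>%                       # pre-registered exclusion
  mutate(score   = ifelse(score >= 1 & score <= 7, score, NA),  # impossible values -> missing
         score_z = (score - mean(score, na.rm = TRUE)) / sd(score, na.rm = TRUE),
         outlier = abs(score_z) > 3)                            # flag outliers, don't silently drop

summary(cleaned$score)   # report missingness and range in the thesis or supplement
```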

  • The analysis methods should be no more complex than necessary for the hypotheses. Another path to methodological sophistication is teaching yourself a difficult analytic technique or tools such as the tidyverse or RMarkdown. Regardless of which analyses and techniques are used, including t-tests, they are interpreted with nuance and attention to their advantages and disadvantages.

  • A supplement is included with best-practices analysis steps such as visualizing the data (see Yanai 2020). Use histograms or Q-Q plots to check whether distributions meet the normality assumptions of common linear tests. Demonstrate that the assumptions behind the planned tests are met, or explain why deviations are acceptable; a sketch follows below.
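
For instance, a short base-R sketch of such assumption checks for a supplement, assuming a hypothetical model of `score` on `condition` in the `cleaned` data frame from the sketch above; these are illustrative diagnostics, not a required procedure.

```r
# Assumption-check sketch (assumes a model fitted with lm(); variable names are placeholders).
fit <- lm(score ~ condition, data = cleaned)

hist(resid(fit), breaks = 30, main = "Residuals", xlab = "Residual")       # roughly bell-shaped?
qqnorm(resid(fit)); qqline(resid(fit))                                     # points near the line?
plot(fitted(fit), resid(fit), xlab = "Fitted values", ylab = "Residuals")  # fan shape suggests heteroscedasticity
```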

  • When multi-item scales are used for composites, attention and nuance are given to dimensionality (e.g., factor analysis) and to plans for handling violations (a sketch follows below).
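
A minimal sketch of such checks, assuming hypothetical item columns `item1` to `item5` in `cleaned` and that the psych package is installed:

```r
# Dimensionality sketch (hypothetical item names; assumes the 'psych' package is installed).
items <- cleaned[, c("item1", "item2", "item3", "item4", "item5")]

psych::alpha(items)              # internal consistency (Cronbach's alpha) and item-total statistics
psych::fa(items, nfactors = 1)   # do the items plausibly load on a single factor?

cleaned$scale_mean <- rowMeans(items, na.rm = TRUE)   # form the composite only after these checks
```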

  • Innovative design and/or identification of existing data (e.g., finding an existing dataset that can answer the question). The design follows from the research question rather than the typical approach of fitting the research question into a familiar pattern like moderation or mediation.

  • Well-powered design with power analysis including a justified smallest effect size of interest (see Lakens 2018).
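
For example, a minimal a priori power calculation, assuming the pwr package and a placeholder smallest effect size of interest of d = 0.30; the actual value must be justified, not copied from here.

```r
# Power-analysis sketch (assumes the 'pwr' package; d = 0.30 is a placeholder SESOI).
library(pwr)

pwr.t.test(d = 0.30, power = 0.80, sig.level = 0.05, type = "two.sample")
# returns the n per group needed to detect effects of at least d = 0.30 with 80% power
```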

  • Nuanced presentation and interpretation of alpha, p-values, effect sizes, and confidence intervals, including multiple comparisons correction if appropriate. Clear separation throughout project of confirmatory vs. exploratory testing.
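
As a small illustration of one common correction, a base-R sketch applied to a hypothetical family of confirmatory p-values:

```r
# Multiple-comparison sketch (hypothetical p-values from one family of confirmatory tests).
p_values <- c(h1 = 0.012, h2 = 0.049, h3 = 0.180)

p.adjust(p_values, method = "holm")   # Holm correction; "bonferroni" and "BH" are alternatives
```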

  • Causal language is used carefully and skeptically in all cases. Mediation is a problematic technique. It provides rather less evidence for causality than is commonly assumed. For example, without a manipulation of the mediator or strong evidence that the mediator cannot precede the predictor, it can be quite misleading. See Bullock 2010 and Spencer 2005.

  • Interactions are explored with nuance and clarity, e.g., differences between slopes vs. contrast effects. Special attention is paid to power for moderations, which appear to require 8-16x the sample size needed for a main effect, especially when the moderation is attenuating rather than crossover. See Simonsohn 2014, Gelman 2018, Giner-Sorolla 2018, and Leon 2009.
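
To build intuition for why attenuated interactions are so demanding, here is a minimal simulation sketch in base R; the effect sizes (d = 0.5 attenuated to 0.25) and cell size are placeholders, not recommendations.

```r
# Simulation sketch: power for a main effect vs. an attenuated interaction in a 2x2 design.
set.seed(1)
sim_power <- function(n_per_cell, reps = 1000) {
  p_main <- p_int <- numeric(reps)
  for (i in seq_len(reps)) {
    d <- expand.grid(x = c(0, 1), m = c(0, 1))
    d <- d[rep(seq_len(4), each = n_per_cell), ]
    # effect of x is d = 0.5 when m == 0, attenuated to d = 0.25 when m == 1
    d$y <- rnorm(nrow(d), mean = d$x * ifelse(d$m == 0, 0.5, 0.25), sd = 1)
    p_main[i] <- summary(lm(y ~ x + m, data = d))$coefficients["x", "Pr(>|t|)"]
    p_int[i]  <- summary(lm(y ~ x * m, data = d))$coefficients["x:m", "Pr(>|t|)"]
  }
  c(main = mean(p_main < .05), interaction = mean(p_int < .05))
}
sim_power(n_per_cell = 50)   # interaction power is far below main-effect power at the same n
```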

Results

  • The results are structured such that the reader can easily answer the main questions posed by the introduction. No steps are skipped, e.g., raw means and SDs are included, and relationships between predictors are assessed for multicollinearity via correlation or similar. Results of secondary importance can be placed in the Supplement and mentioned in the main text.
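
As a quick illustration of that screening step, a sketch with hypothetical predictor names, assuming the car package for variance inflation factors:

```r
# Multicollinearity sketch (hypothetical predictor names; assumes the 'car' package for vif()).
predictors <- cleaned[, c("attitude", "norms", "identity")]
round(cor(predictors, use = "pairwise.complete.obs"), 2)   # pairwise correlations among predictors

fit <- lm(score ~ attitude + norms + identity, data = cleaned)
car::vif(fit)   # variance inflation factors; very large values signal redundant predictors
```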

  • Data are visualized expertly, for example including error bars and the underlying distribution of jittered data points rather than just trend lines. All tables and figures have legible and appropriate text, axis labels, legends, etc. All tables and figures are easy to navigate and understand. Is the font size big enough? Is there extra ink on the page that isn’t necessary? Is the legend easy to connect to the trend lines, etc.? Pursue readability and comprehension.
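
A minimal ggplot2 sketch of such a figure, assuming hypothetical `condition` and `score` columns in `cleaned`; the axis labels and font size are placeholders to adapt.

```r
# Visualization sketch (hypothetical variables): raw jittered points, means, and error bars.
library(ggplot2)

ggplot(cleaned, aes(x = condition, y = score)) +
  geom_jitter(width = 0.1, alpha = 0.3) +                 # show the underlying distribution
  stat_summary(fun = mean, geom = "point", size = 3) +    # condition means
  stat_summary(fun.data = mean_se, fun.args = list(mult = 1.96),
               geom = "errorbar", width = 0.15) +         # approximate 95% CI (+/- 1.96 SE)
  labs(x = "Condition", y = "Score (placeholder label)") +
  theme_minimal(base_size = 14)                           # legible font size
```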

Discussion

  • All projects have limitations; merely listing them is only half-way to helping the reader interpret the meaning of the findings. Instead, describe what those limitations mean for the current interpretation and give recommendations for future research. These suggestions can be trivial, like advice to collect a bigger sample, or, better, deep and thoughtful, based on your observations across the literature.

  • Summarize the results briefly. More importantly, focus on the most important effects and how they inform specific questions or theories from the introduction and previous work. Similar to the Intro, the Discussion should integrate with previous literature and prepare future research. For example, many projects say that “the current results inform efforts to boost sustainability” or similar. This is vague and shallow compared to explaining how the results can be used and for which specific goals, whether in basic science or in application (e.g., by governments).

  • Speculates appropriately about potential mechanisms of observed effects.

  • Specifies future research designs, populations, and contexts.

  • Excellent description of generalizability, e.g., Simons 2017.

And that’s it: easy peasy, right? No, not at all. These are difficult goals. I hope they go some distance toward concretely explaining why most projects do not score 8,0 or higher, and how to get there if desired.