Can You Remove Outliers From Your Dataset?

Apr 6, 2026

🎯 The Short Answer: Yes, you can remove outliers, but only using a standardized, mathematically defensible method like the IQR (interquartile range) test. Always document and cite your approach to maintain transparency and credibility.

One of the most common concerns we hear from postgraduate researchers is whether it’s okay to remove statistical outliers from their dataset, and more importantly, how to do it without looking like they’re manipulating their results. It’s a legitimate worry. After all, removing data can feel suspicious, even if it’s done for all the right reasons. The good news? You absolutely can remove outliers, but you need to do it the right way.

🔍 Why Outliers Matter in Your Research

Outliers are data points that sit far outside the normal range of your dataset. They can arise from a measurement error or an unusual case that doesn’t represent your population, or they may simply be part of natural variation. The problem is that outliers can skew your statistical results and make your findings less accurate. However, simply eyeballing your data and deciding “that looks weird, I’ll remove it” is never acceptable. Your examiners will spot this immediately, and it raises serious questions about the integrity of your research.

The key is using a standardized, mathematically justified method that you can defend in your methodology section.

The IQR Test

One of the most widely accepted methods for identifying outliers is the Interquartile Range (IQR) test, also known as the Tukey outlier test (or Tukey’s fences). This method is established, commonly used across disciplines, and it’s straightforward to apply. Here’s how it works:

  1. You calculate the interquartile range (the difference between your third quartile and first quartile) and multiply it by 1.5.
  2. You then add that value to your third quartile.
  3. Any data point above that threshold is mathematically classified as an outlier.

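The same process works in reverse at the lower end: subtract 1.5 times the IQR from your first quartile, and any data point below that threshold is also classified as an outlier. This isn’t arbitrary or subjective; it’s a formula you can cite and defend.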
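If you’d like to see the arithmetic in action, here is a minimal Python sketch of the steps above. The function names (`iqr_fences`, `split_outliers`) and the `scores` data are made up for illustration, and the sketch assumes the "inclusive" quartile convention from Python’s standard library. Different software packages calculate quartiles slightly differently, so your thresholds may not match exactly across tools.

```python
import statistics


def iqr_fences(values, k=1.5):
    """Return the (lower, upper) outlier thresholds for a list of numbers."""
    # Quartile conventions vary between packages; "inclusive" is one common choice.
    q1, _median, q3 = statistics.quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1                      # interquartile range
    return q1 - k * iqr, q3 + k * iqr  # thresholds below Q1 and above Q3


def split_outliers(values, k=1.5):
    """Separate values into those inside the thresholds and those outside."""
    lower, upper = iqr_fences(values, k)
    kept = [v for v in values if lower <= v <= upper]
    outliers = [v for v in values if v < lower or v > upper]
    return kept, outliers


# Hypothetical data, for illustration only.
scores = [12, 14, 15, 15, 16, 17, 18, 19, 21, 58]
lower, upper = iqr_fences(scores)
kept, outliers = split_outliers(scores)
print(f"Thresholds: {lower} to {upper}")
print(f"Outliers: {outliers}")
```

With this made-up dataset, only the value 58 falls outside the thresholds, so it is the only point flagged. Whichever tool you use, note the quartile method it applies so that someone else can reproduce your thresholds.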

There are plenty of tutorials online and in statistics textbooks that explain this method step-by-step. The beauty of using the IQR test is that it’s transparent, reproducible, and recognized across academic fields. When you use this method, you’re not making a judgment call; you’re applying a standardized statistical procedure.

✍️ Document and Explain Every Removal

Here’s where many students slip up: they remove outliers during data cleaning but then don’t mention it in their methodology. This is a huge red flag. Even if you use a perfectly legitimate method like the IQR test, failing to disclose that you removed data introduces uncertainty about your dataset and makes your research look less trustworthy. We often see this issue come up in our coaching sessions, and it’s easily preventable with clear documentation.

In your methodology section, you need to be explicit about what you did. Write something like:

I identified outliers using the Interquartile Range test as described by [cite a relevant source]. Using this method, I removed X data points from the original dataset of Y observations. The removed cases were [briefly describe what made them outliers].

This transparency actually strengthens your research because it shows you’ve thought carefully about your data quality and you’re not hiding anything.
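One practical way to make sure the X and Y in that statement are accurate is to record them at the moment you clean the data, rather than reconstructing them later. The sketch below is one possible approach, not a required procedure: the `removal_report` function, its field names, and the `reaction_times` data are all hypothetical, and it again assumes the "inclusive" quartile convention from Python’s standard library.

```python
import statistics


def removal_report(values, k=1.5):
    """Summarise what the IQR test would remove, without altering the data."""
    q1, _median, q3 = statistics.quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    lower, upper = q1 - k * iqr, q3 + k * iqr
    removed = [v for v in values if v < lower or v > upper]
    return {
        "n_original": len(values),   # Y in the methodology statement
        "n_removed": len(removed),   # X in the methodology statement
        "lower_threshold": lower,
        "upper_threshold": upper,
        "removed_values": removed,   # keep these so you can describe them
    }


# Hypothetical data, for illustration only.
reaction_times = [310, 325, 330, 342, 350, 355, 361, 370, 388, 1240]
report = removal_report(reaction_times)
print(
    f"Removed {report['n_removed']} of {report['n_original']} observations "
    f"using thresholds of {report['lower_threshold']:.1f} "
    f"and {report['upper_threshold']:.1f}."
)
```

Keeping the removed values alongside the counts means you can describe what made them outliers, and it leaves your original dataset untouched in case an examiner asks to see it.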

🎯 Cite Your Method and Check Your Field’s Norms

Different academic disciplines sometimes have slightly different conventions for handling outliers, so it’s worth checking what’s standard in your specific field. Some disciplines are stricter than others, and your supervisor will expect you to follow disciplinary norms. Once you’ve identified the appropriate method for your field, cite it properly. This might be a reference to a statistics textbook, a methodological paper, or guidance from your disciplinary association. The citation shows that you’re not inventing a new approach; you’re following established practice.

When you cite your outlier removal method, you’re essentially saying to your examiners: “This is how researchers in my field handle this situation, and I’ve applied that standard approach.” It’s a powerful statement because it demonstrates both competence and integrity. You’re not trying to hide anything; you’re following best practices.

⚖️ Transparency Is Your Best Defense

The underlying principle here is simple: transparency prevents suspicion. If you’re upfront about what you did, why you did it, and how you did it, your examiners will trust your work. They understand that data cleaning is a normal part of research. What they won’t tolerate is the appearance of data manipulation or hidden decisions that could bias your results. By using a standardized method, documenting it clearly, and citing your approach, you’re demonstrating that you’ve handled your data responsibly.

📌 Key Takeaways

  • Use the IQR test to mathematically identify outliers, not subjective judgment.
  • Always document and explain outlier removal in your methodology section.
  • Cite your method to show you’re following established, disciplinary best practices.
  • Transparency about data decisions builds credibility and prevents suspicion.
  • Check your field’s norms, as outlier-handling conventions may vary by discipline.

