Qualitative Data Coding 101
How to code qualitative data, the smart way (with examples).
By: Jenna Crosley (PhD) | Reviewed by: Dr Eunice Rautenbach | December 2020
As we’ve discussed previously, qualitative research makes use of non-numerical data – for example, words, phrases or even images and video. To analyse this kind of data, the first dragon you’ll need to slay is qualitative data coding (or just “coding” if you want to sound cool). But what exactly is coding and how do you do it?
Overview: Qualitative Data Coding
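In this post, we’ll explain qualitative data coding in simple terms. Specifically, we’ll dig into:
- What is qualitative data coding?
- What are the different types of coding?
- How to code qualitative data
- Moving from coding to analysis
- Tips & tricks for quality coding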
What is qualitative data coding?
Let’s start by understanding what a code is. At the simplest level, a code is a label that describes the content of a piece of text. For example, in the sentence:
“Pigeons attacked me and stole my sandwich.”
You could use “pigeons” as a code. This code simply describes that the sentence involves pigeons.
So, building on this, qualitative data coding is the process of creating and assigning codes to categorise data extracts. You’ll then use these codes later on to derive themes and patterns for your qualitative analysis (for example, thematic analysis). Coding and analysis can take place simultaneously, but it’s important to note that coding does not necessarily involve identifying themes (depending on which textbook you’re reading, of course). Instead, it generally refers to the process of labelling and grouping similar types of data to make generating themes and analysing the data more manageable.
Makes sense? Great. But why should you bother with coding at all? Why not just look for themes from the outset? Well, coding helps to support the validity of your analysis. Specifically, it helps ensure that your analysis is undertaken systematically and that other researchers can review it (in the world of research, we call this transparency). In short, good coding is the foundation of high-quality analysis.
What are the different types of coding?
Now that we’ve got a plain-language definition of coding on the table, the next step is to understand what overarching types of coding exist – in other words, coding approaches. Let’s start with the two main approaches, inductive and deductive.
With deductive coding, you, as the researcher, begin with a set of pre-established codes and apply them to your data set (for example, a set of interview transcripts). Inductive coding, on the other hand, works in reverse, as you create the set of codes based on the data itself – in other words, the codes emerge from the data. Let’s take a closer look at both.
Deductive coding 101
With deductive coding, you make use of pre-established codes, which are developed before you interact with the data at hand. This usually involves drawing up a set of codes based on your research question(s) or on previous research. You could also use a code set from the codebook of a previous study.
For example, if you were studying the eating habits of college students, you might have a research question along the lines of:
“What foods do college students eat the most?”
As a result of this research question, you might develop a code set that includes codes such as “sushi”, “pizza”, and “burgers”.
Deductive coding allows you to approach your analysis with a very tightly focused lens and quickly identify relevant data. Of course, the downside is that you could miss out on some very valuable insights as a result of this tight, predetermined focus.
Inductive coding 101
But what about inductive coding? As we touched on earlier, this type of coding involves jumping right into the data and then developing the codes based on what you find within the data.
For example, if you were to analyse a set of open-ended interviews, you wouldn’t necessarily know which direction the conversation would flow. If a conversation begins with a discussion of cats, it may go on to include other animals too, and so you’d add these codes as you progress with your analysis. Simply put, with inductive coding, you “go with the flow” of the data.
Inductive coding is great when you’re researching something that isn’t yet well understood, because the codes derived from the data help you explore the subject. Therefore, this type of coding is usually used when researchers want to investigate new ideas or concepts, or when they want to create new theories.
A little bit of both… hybrid coding approaches
If you’ve got a set of codes you’ve derived from a research topic, literature review or a previous study (i.e. a deductive approach), but you still don’t have a rich enough set to capture the depth of your qualitative data, you can combine deductive and inductive methods – this is called a hybrid coding approach.
To adopt a hybrid approach, you’ll begin your analysis with a set of a priori codes (deductive) and then add new codes (inductive) as you work your way through the data. Essentially, the hybrid coding approach provides the best of both worlds, which is why it’s pretty common to see this in research.
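As a rough illustration, the Python sketch below models a hybrid codebook that starts from a priori codes and records any new code as “emergent” when it first appears. The `Codebook` class and its method names are hypothetical; the codes passed to `code_extract` stand in for whatever the researcher decides while reading each extract.

```python
from dataclasses import dataclass, field


@dataclass
class Codebook:
    """A hypothetical codebook that starts with a priori codes and grows inductively."""

    # Maps each code to how it entered the codebook: "a priori" or "emergent".
    origins: dict[str, str] = field(default_factory=dict)

    def add_a_priori(self, *codes: str) -> None:
        """Register the deductive starting set of codes."""
        for code in codes:
            self.origins.setdefault(code, "a priori")

    def code_extract(self, proposed: list[str]) -> list[str]:
        """Accept the researcher's codes for an extract; unseen codes are added as emergent."""
        for code in proposed:
            self.origins.setdefault(code, "emergent")
        return proposed

    def emergent_codes(self) -> list[str]:
        """List the codes that came from the data rather than the starting set."""
        return [code for code, origin in self.origins.items() if origin == "emergent"]


book = Codebook()
book.add_a_priori("sushi", "pizza", "burgers")   # deductive starting set
book.code_extract(["pizza"])                      # fits the existing set
book.code_extract(["instant noodles", "budget"])  # new ideas found in the data
print(book.emergent_codes())
```

Keeping track of where each code came from makes it easy to report which parts of your code set were predetermined and which emerged during analysis.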
How to code qualitative data
Now that we’ve looked at the main approaches to coding, the next question you’re probably asking is “how do I actually do it?”. Let’s take a look at the coding process, step by step.
Both inductive and deductive methods of coding typically occur in two stages: initial coding and line by line coding.
In the initial coding stage, the objective is to get a general overview of the data by reading through and understanding it. If you’re using an inductive approach, this is also where you’ll develop an initial set of codes. Then, in the second stage (line by line coding), you’ll delve deeper into the data and (re)organise it according to (potentially new) codes.
Let’s take a look at these two stages of coding in more detail.
Step 1 – Initial coding
The first step of the coding process is to identify the essence of the text and code it accordingly. While there are various qualitative analysis software packages available, you can just as easily code textual data using Microsoft Word’s “comments” feature.
Let’s take a look at a practical example of coding. Assume you had the following interview data from two interviewees:
Interviewer: What pets do you have?
Interviewee 1: I have an alpaca and three dogs.
Interviewer: Only one alpaca? They can die of loneliness if they don’t have a friend.
Interviewee 1: I didn’t know that! I’ll just have to get five more.
Interviewer: What pets do you have?
Interviewee 2: I have twenty-three bunnies. I initially only had two, I’m not sure what happened.
In the initial stage of coding, you could assign the code of “pets” or “animals”. These are just initial, fairly broad codes that you can (and will) develop and refine later. In the initial stage, broad, rough codes are fine – they’re just a starting point which you will build on in the second stage.
How to decide which codes to use
But how exactly do you decide what codes to use when there are many ways to read and interpret any given sentence? Well, there are a few different approaches you can adopt. The main approaches to initial coding include:
- In vivo coding
- Process coding
- Open coding
- Descriptive coding
- Structural coding
- Values coding
Let’s take a look at each of these:
In vivo coding
When you use in vivo coding, you make use of participants’ own words, rather than your interpretation of the data. In other words, you use direct quotes from participants as your codes. By doing this, you’ll avoid trying to infer meaning, instead staying as close as possible to the original words and phrases.
In vivo coding is particularly useful when your data are derived from participants who speak different languages or come from different cultures. In these cases, it’s often difficult to accurately infer meaning due to linguistic or cultural differences.
For example, English speakers typically view the future as in front of them and the past as behind them. However, this isn’t the same in all cultures. Speakers of Aymara view the past as in front of them and the future as behind them. Why? Because the future is unknown, so it must be out of sight (or behind them). They know what happened in the past, so their perspective is that it’s positioned in front of them, where they can “see” it.
In a scenario like this one, it’s not possible to derive the reason for viewing the past as in front and the future as behind without knowing the Aymara culture’s perception of time. Therefore, in vivo coding is particularly useful here, as it helps avoid interpretation errors.
Process coding
Next up, there’s process coding, which makes use of action-based codes. Action-based codes are codes that indicate a movement or procedure. These actions are often indicated by gerunds (words ending in “-ing”) – for example, running, jumping or singing.
Process coding is useful as it allows you to code parts of data that aren’t necessarily spoken, but that are still imperative to understanding the meaning of the texts.
An example here would be if a participant were to say something like, “I have no idea where she is”. A sentence like this can be interpreted in many different ways depending on the context and movements of the participant. The participant could shrug their shoulders, which would indicate that they genuinely don’t know where the girl is; however, they could also wink, showing that they do actually know where the girl is.
Simply put, process coding is useful as it allows you to concisely identify the main occurrences in a set of data and provide a dynamic account of events. For example, you may have action codes such as “describing a panda”, “singing a song about bananas”, or “arguing with a relative”.
Descriptive coding
Descriptive coding aims to summarise extracts by using a single word or noun that encapsulates the general idea of the data. These words will typically describe the data in a highly condensed manner, which allows the researcher to quickly refer to the content.
Descriptive coding is very useful when dealing with data that appear in forms other than traditional text – e.g., video clips, sound recordings or images. For example, a descriptive code could be “food” when coding a video clip that involves a group of people discussing what they ate throughout the day, or “cooking” when coding an image showing the steps of a recipe.
Structural coding
Structural coding involves labelling and describing specific structural attributes of the data. Generally, it includes coding according to answers to the questions of “who”, “what”, “where”, and “how”, rather than the actual topics expressed in the data. This type of coding is useful when you want to access segments of data quickly, and it can help tremendously when you’re dealing with large data sets.
For example, if you were coding a collection of theses or dissertations (which would be quite a large data set), structural coding could be useful as you could code according to different sections within each of these documents – i.e. according to the standard dissertation structure. What-centric labels such as “hypothesis”, “literature review”, and “methodology” would help you to efficiently refer to sections and navigate without having to work through sections of data all over again.
Structural coding is also useful for data from open-ended surveys. These data may initially be difficult to code, as they lack the set structure of other forms of data (such as an interview with a strict set of questions to be answered). In this case, it would be useful to code sections of data that answer certain questions such as “who?”, “what?”, “where?” and “how?”.
Let’s take a look at a practical example. If we were to send out a survey asking people about their dogs, we may end up with a (highly condensed) response such as the following:
Bella is my best friend. When I’m at home I like to sit on the floor with her and roll her ball across the carpet for her to fetch and bring back to me. I love my dog.
In this set, we could code Bella as “who”, dog as “what”, home and floor as “where”, and roll her ball as “how”.
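To show why structural codes make data quick to navigate, here’s a small Python sketch that stores the Bella example as (segment, structural question) pairs and retrieves every segment answering one question. The list-of-tuples layout and the `segments_for` helper are hypothetical choices for illustration, not a standard format.

```python
# Structural coding sketch: each coded segment of the survey response is tagged with
# the structural question it answers. The data layout here is a hypothetical illustration.
response = (
    "Bella is my best friend. When I'm at home I like to sit on the floor with her "
    "and roll her ball across the carpet for her to fetch and bring back to me. "
    "I love my dog."
)

structural_codes = [
    ("Bella", "who"),
    ("dog", "what"),
    ("home", "where"),
    ("floor", "where"),
    ("roll her ball", "how"),
]


def segments_for(question: str, codes: list[tuple[str, str]], text: str) -> list[str]:
    """Return the segments coded with one structural question, checking each occurs in the text."""
    found = []
    for segment, label in codes:
        if label == question:
            if segment not in text:
                raise ValueError(f"segment {segment!r} not found in the response")
            found.append(segment)
    return found


print(segments_for("where", structural_codes, response))
```

Across hundreds of survey responses, the same lookup would let you pull up every “where” answer at once without rereading the raw data.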
Values coding
Finally, values coding involves coding that relates to the participant’s worldviews. Typically, this type of coding focuses on excerpts that reflect the values, attitudes, and beliefs of the participants. Values coding is therefore very useful for research exploring cultural values and intrapersonal experiences and actions.
To recap, the aim of initial coding is to understand and familiarise yourself with your data, to develop an initial code set (if you’re taking an inductive approach) and to take the first shot at coding your data. The coding approaches above allow you to arrange your data so that it’s easier to navigate during the next stage, line by line coding (we’ll get to this soon).
While these approaches can all be used individually, it’s important to remember that it’s possible, and potentially beneficial, to combine them. For example, when conducting initial coding with interviews, you could begin by using structural coding to indicate who speaks when. Then, as a next step, you could apply descriptive coding so that you can navigate to, and between, conversation topics easily.
Step 2 – Line by line coding
Once you’ve got an overall idea of your data, are comfortable navigating it and have applied some initial codes, you can move on to line by line coding. Line by line coding is pretty much exactly what it sounds like – reviewing your data, line by line, digging deeper and assigning additional codes to each line.
With line-by-line coding, the objective is to pay close attention to your data to add detail to your codes. For example, if you have a discussion of beverages and you previously just coded this as “beverages”, you could now go deeper and code more specifically, such as “coffee”, “tea”, and “orange juice”. The aim here is to scratch below the surface. This is the time to get detailed and specific so as to capture as much richness from the data as possible.
In the line-by-line coding process, it’s useful to code everything in your data, even if you don’t think you’re going to use it (you may just end up needing it!). As you go through this process, your coding will become more thorough and detailed, and you’ll have a much better understanding of your data as a result, which will be incredibly valuable in the analysis phase.
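The Python sketch below illustrates the beverages example: broad codes from the initial pass are refined line by line, each detailed code keeps a link to the broad code it refines, and an index lets you jump to every line carrying a given code. The transcript line numbers, dictionaries and helper names are all hypothetical.

```python
from collections import defaultdict

# Hypothetical transcript lines, coded broadly in the initial pass...
initial_pass = {1: ["beverages"], 2: ["beverages"], 3: ["beverages"]}

# ...and more specifically in the line-by-line pass.
detailed_pass = {1: ["coffee"], 2: ["tea"], 3: ["orange juice", "coffee"]}

# Each detailed code remembers the broad code it refines.
parent_of = {"coffee": "beverages", "tea": "beverages", "orange juice": "beverages"}


def lines_by_code(coded: dict[int, list[str]]) -> dict[str, list[int]]:
    """Invert line -> codes into code -> lines, so you can jump to every line with a code."""
    index: dict[str, list[int]] = defaultdict(list)
    for line, codes in sorted(coded.items()):
        for code in codes:
            index[code].append(line)
    return dict(index)


def refines_initial(detailed, initial, parents) -> bool:
    """Check that every detailed code sits under a broad code already on the same line."""
    return all(
        parents[code] in initial[line]
        for line, codes in detailed.items()
        for code in codes
    )


print(lines_by_code(detailed_pass))
print(refines_initial(detailed_pass, initial_pass, parent_of))
```

Keeping the parent link means the detailed codes refine, rather than replace, your initial coding, so you can still zoom back out to “beverages” later.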
Moving from coding to analysis
Once you’ve completed your initial coding and line by line coding, the next step is to start your analysis. Of course, the coding process itself will get you in “analysis mode” and you’ll probably already have some insights and ideas as a result of it, so you should always keep notes of your thoughts as you work through the coding.
When it comes to qualitative data analysis, there are many different types of analyses (we discuss some of the most popular ones here) and the type of analysis you adopt will depend heavily on your research aims, objectives and questions. Therefore, we’re not going to go down that rabbit hole here, but we’ll cover the important first steps that build the bridge from qualitative data coding to qualitative analysis.
When starting to think about your analysis, it’s useful to ask yourself the following questions to get the wheels turning:
- What actions are shown in the data?
- What are the aims of these interactions and excerpts? What are the participants potentially trying to achieve?
- How do participants interpret what is happening, and how do they speak about it? What does their language reveal?
- What are the assumptions made by the participants?
- What are the participants doing? What is going on?
- Why do I want to learn about this? What am I trying to find out?
- Why did I include this particular excerpt? What does it represent and how?
Code categorisation
Categorisation is simply the process of reviewing everything you’ve coded and then creating code categories that can be used to guide your future analysis. In other words, it’s about creating categories for your code set. Let’s take a look at a practical example.
If you were discussing different types of animals, your initial codes may be “dogs”, “llamas”, and “lions”. In the process of categorisation, you could label (categorise) these three animals as “mammals”, whereas you could categorise “flies”, “crickets”, and “beetles” as “insects”. By creating these code categories, you will be making your data more organised, as well as enriching it so that you can see new connections between different groups of codes.
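Here’s a minimal Python sketch of that categorisation step, assuming each coded extract is stored as a hypothetical (extract id, code) pair and the researcher has decided the code-to-category mapping from the animal example. Grouping extracts by category then takes a single pass.

```python
from collections import defaultdict

# Categorisation sketch: the researcher assigns each code to a broader category.
# The mapping follows the animal example; it is illustrative, not a fixed taxonomy.
code_to_category = {
    "dogs": "mammals",
    "llamas": "mammals",
    "lions": "mammals",
    "flies": "insects",
    "crickets": "insects",
    "beetles": "insects",
}

# Hypothetical coded extracts as (extract id, code) pairs.
coded_extracts = [(1, "dogs"), (2, "lions"), (2, "crickets"), (3, "llamas"), (4, "beetles")]


def extracts_by_category(extracts, mapping):
    """Group extract ids under the category of each code assigned to them."""
    grouped = defaultdict(set)
    for extract_id, code in extracts:
        grouped[mapping[code]].add(extract_id)
    return {category: sorted(ids) for category, ids in grouped.items()}


print(extracts_by_category(coded_extracts, code_to_category))
```

Extract 2 lands in both categories. Overlaps like this are exactly the kind of connection between groups of codes that categorisation can surface.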
From this categorisation, you can move on to the next step, which is to identify the themes in your data.
Theme identification
From the coding and categorisation processes, you’ll naturally start noticing themes. Therefore, the logical next step is to identify and clearly articulate the themes in your data set. When you determine themes, you’ll take what you’ve learned from the coding and categorisation and group it all together to develop themes. This is the part of the coding process where you’ll try to draw meaning from your data, and start to produce a narrative. The nature of this narrative depends on your research aims and objectives, as well as your research questions (sound familiar?) and the qualitative data analysis method you’ve chosen, so keep these factors front of mind as you scan for themes.
Tips & tricks for quality coding
Before we wrap up, let’s quickly look at some general advice, tips and suggestions to ensure your qualitative data coding is top-notch.
- Before you begin coding, plan out the steps you will take and the coding approach and technique(s) you will follow to avoid inconsistencies.
- When adopting deductive coding, it’s useful to use a codebook from the start of the coding process. This will keep your work organised and will ensure that you don’t forget any of your codes.
- Whether you’re adopting an inductive or deductive approach, keep track of the meanings of your codes and remember to revisit these as you go along.
- Avoid using synonyms for codes that are similar, if not the same. This will allow you to have a more uniform and accurate coded dataset and will also help you to not get overwhelmed by your data.
- While coding, make sure that you remind yourself of your aims and coding method. This will help you to avoid directional drift, which happens when coding is not kept consistent.
- If you are working in a team, make sure that everyone has been trained and understands how codes need to be assigned.
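Several of these tips (use a codebook, record what each code means, avoid synonyms) can be combined into one structure. The Python sketch below assumes a hypothetical codebook in which each code carries a definition and a set of synonyms that should be folded into it; `normalise` maps any proposed label to its canonical code and refuses labels that haven’t been defined yet. The entries and function names are illustrative only.

```python
# Sketch of a codebook that records each code's meaning and the synonyms folded into it,
# so "pet" and "companion animals" don't become separate codes. Entries are hypothetical.
CODE_DEFINITIONS = {
    "pets": {
        "definition": "Participant mentions an animal they keep at home.",
        "synonyms": {"pet", "companion animals", "household animals"},
    },
    "loneliness": {
        "definition": "Participant describes a person or animal being alone or isolated.",
        "synonyms": {"isolation", "being alone"},
    },
}

# Build a lookup from every accepted label (code or synonym) to its canonical code.
CANONICAL = {code: code for code in CODE_DEFINITIONS}
for code, entry in CODE_DEFINITIONS.items():
    for synonym in entry["synonyms"]:
        CANONICAL[synonym] = code


def normalise(label: str) -> str:
    """Map a proposed label to its canonical code; unknown labels raise instead of being added."""
    key = label.strip().lower()
    if key not in CANONICAL:
        raise KeyError(f"{label!r} is not in the codebook; define it before using it")
    return CANONICAL[key]


print([normalise(label) for label in ["Pet", "companion animals", "isolation"]])
```

Raising an error for unknown labels is a deliberate design choice: it forces a new code to be defined in the codebook (and shared with the team) before it is used, which helps guard against directional drift.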
Thanks for reading this post. We hope that you have a better understanding of the qualitative data coding process and that you’re feeling more confident about getting started. Good luck!