How to export analysis-ready survey data
By Crystal Lewis in tutorials
October 31, 2022
Have you ever downloaded your data from a survey platform and expected it to look like this?
But instead you export data that looks like this?
There are many issues with this data, ranging from unclear and inconsistent value entries, to vague variable names, to conflicting variable types, to multiple things being measured in one variable. While this data can hopefully still be salvaged, the time-consuming, decision-heavy process of cleaning it is an avoidable step in our data management process.
So how do we prevent this from happening? We build our data collection tools with our end result in mind. In this tutorial I am going to go through some tips to help you export data that is more interpretable and more usable. While building tools with your end in mind does require some up-front time for planning and development, it pays dividends.
Planning with the end in mind
So how do we plan with the end in mind? How do we know what our end data should look like? Our data dictionary! If you missed the post on data dictionaries, you can read about them here.
So let's say we've created this data dictionary below, and it includes all of the items we plan to add to our survey instrument (as well as items we plan to calculate after we collect data). We can use this as our guide to build a survey that exports data exactly as we expect it to!
Disclaimer: Please ignore the poor quality of these survey items. They are only for demonstration purposes, not to show quality of actual survey questions. :)
Building surveys using the Qualtrics platform
Why the focus on Qualtrics? Well, it just happens to be the survey platform that I am most familiar with. This is definitely not an advertisement for using Qualtrics. I have no opinion on it as a tool other than that it has many helpful features and meets many industry standards for privacy compliance. I, like probably some of you reading this, gained access to it because my former institution had a contract with Qualtrics and that is the tool that was used and trusted by my organization. As prolific as Qualtrics is, though, it also happens to be known for having a fair share of pain points, and a lack of awareness of its quirks has led to many frustrations in the research community.
Have you truly used Qualtrics for data collection if one random item isn't coded from 13-19 instead of 1-7?
— Dr. Tessa Benson-Greenwald (@GreenwaldTessa) January 17, 2022
I am hoping some of the tips in this post will help you avoid some of these Qualtrics frustrations. With that said, I believe all platforms have their own quirks and I think many of the tips listed below can generally be applied to other data collection platforms as well.
Disclaimer: The tips below are just that, tips. This is not a comprehensive manual for building a survey. It's just a collection of lessons I've learned over the years after building surveys and exporting bad data. This is my attempt to give you some ideas to help you export more reliable, higher-quality data from your survey platform.
Tip #1: Use your data dictionary to rename all variables
In Qualtrics, variable names are assigned based on the order in which a variable is created, starting with Q1. Not only does exporting variable names like Q1, Q5, etc. require the extra step of renaming variables after export, it also leads to errors. I have been in several situations where a cleaning syntax is written for a survey, but then at some point the survey items are reordered or a question is added or deleted and Q7 no longer aligns with what we think it does in the syntax.
The solution here is to name all of your variables according to your data dictionary. You can rename variables by simply clicking into the variable name space and renaming the item. While this is a manual process in Qualtrics, it saves you so much time and energy in the long run.
One other piece of advice: If you are collecting data longitudinally, you may be tempted to add time to variable names (ex: age_t1, age_t2). My advice to you is don't. Name variables consistently over time. If you plan to make copies of your survey over time, for different waves of data collection, your process is reproducible if your variable names stay consistent over time. You simply copy the survey and resend as needed. However, if your variable names change each wave, you will need to edit your survey variable names each wave. Not fun. You can always add time to your variable names programmatically after you download your data.
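For the wave suffix (and any renaming you did not manage to do in the platform), here is a minimal sketch of what that post-export step could look like in Python with pandas. The file names and the data dictionary columns (qualtrics_name, variable_name) are hypothetical placeholders, not Qualtrics defaults:

```python
import pandas as pd

# File names and the data dictionary column names below are hypothetical
raw = pd.read_csv("wave1_export.csv")
dictionary = pd.read_csv("data_dictionary.csv")

# Map any default Qualtrics names (Q1, Q5, ...) to the names in the data dictionary
name_map = dict(zip(dictionary["qualtrics_name"], dictionary["variable_name"]))
raw = raw.rename(columns=name_map)

# Add the wave suffix programmatically instead of building it into the survey,
# leaving the study ID untouched so it can still be used for linking
raw = raw.rename(columns={col: f"{col}_t1" for col in raw.columns if col != "study_id"})
```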
Tip #2: Use your data dictionary to recode all variables
Similar to item names, Qualtrics automatically assigns numeric value codes to response options of categorical variables based on the order that options are provided. You can think of this in a yes/no question where you provide the option yes first and Qualtrics assigns it a 1 behind the scenes, and then you provide the option no and Qualtrics assigns it a 2. But similar to the problems above, these may not be the numeric values you want associated with your text labels, requiring additional data cleaning after export. What's worse, reordering or replacing response options may lead to odd values that you were not expecting. In Qualtrics you can recode values by going to "Recode values" and checking the "Recode Values" box to change your values from icky coding like 1,2,5,7 to 1,2,3,4 as denoted in your data dictionary.
Note: If you add new questions or choice options at any point, check the back-end coding again before exporting.
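If you discover after export that some codes still do not match your dictionary, a small check-and-recode step can flag them. This is only a sketch in Python with pandas; the item name, file name, and mapping are hypothetical:

```python
import pandas as pd

# Hypothetical mapping for one item, taken from the data dictionary:
# Qualtrics exported the codes 1, 2, 5, 7 but the dictionary expects 1, 2, 3, 4
recode_map = {1: 1, 2: 2, 5: 3, 7: 4}

df = pd.read_csv("sample_export.csv")  # hypothetical export file

# Flag any codes that are neither missing nor listed in the data dictionary
unexpected = df.loc[~df["topping"].isin(recode_map.keys()) & df["topping"].notna(), "topping"]
print("Unexpected codes:", unexpected.unique())

# Recode to the values the data dictionary expects
df["topping"] = df["topping"].map(recode_map)
```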
Tip #3: Use your data dictionary to add content validation to variables
This step is crucial to ensuring you collect valid data for open-ended response questions. For example, if your variable type is numeric according to your data dictionary, you do not want to allow text entry. If your allowable variable range is only 10-20, you do not want to allow values outside of that range. If you do not add this validation to your data collection tool, you run the risk of potentially collecting unusable data or, at best, messy data that requires lots of additional data cleaning. You can add validation in Qualtrics by selecting "Add validation" and then selecting the type of validation you want, for instance "Content type". From there you can add more information, such as "Content type" is "Number", and set a minimum and maximum range for the question.
This content validation does not apply only to numeric questions, though. Another example may be allowing only a date input and setting the format for those dates to a format of your choosing. If someone does not enter the date using the assigned format, they will receive the following message: "Please enter a valid international date of the form: yyyy/mm/dd."
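It never hurts to confirm after export that the validation actually held. A minimal sketch in Python with pandas, assuming a hypothetical export file with age and survey_date variables:

```python
import pandas as pd

df = pd.read_csv("sample_export.csv")  # hypothetical sample export

# Range check: the data dictionary says age should fall between 10 and 20;
# anything outside that range (or missing) is flagged here
print("Out-of-range ages:", df.loc[~df["age"].between(10, 20), "age"].tolist())

# Format check: dates should follow yyyy/mm/dd; anything that fails to parse
# (or was already missing) is flagged here
parsed = pd.to_datetime(df["survey_date"], format="%Y/%m/%d", errors="coerce")
print("Invalid dates:", df.loc[parsed.isna(), "survey_date"].tolist())
```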
Tip #4: Use controlled vocabularies when possible
As we saw in the beginning of this post, open-text boxes are often riddled with typos, misspellings, unclear responses, and misinterpretations. There are times when you really do need to use a text box to collect data. You may need to use an open response box in order to be inclusive of all response options or you may need to use one because there really is an undetermined number of response options. But if you have a manageable, finite number of response options to choose from, use a multiple choice list or drop-down menu to prevent you from having to clean up a messy text variable.
If you have many response options, one nice feature of Qualtrics is that you can drag and drop a list of options into Qualtrics, removing the step of having to manually type each option. After the response options are added you can then go in and recode the response option values.
Tip #5: Choose the right type and format for the question
This may sound like a no-brainer, but it's not always clear which question type should be used for your question. It is important to become familiar with the various question types available to you and test out what works best. For instance, if you were unaware that "Rank order" was an actual question type available to you, you might be tempted to create a multiple-line text entry form where participants freely enter their ranking order of your items. However, without carefully constructing this kind of question, you may miss the fact that a free form may allow duplicate entries (such as selecting a rank of 1 more than once).
However, if, for example, you choose the "Rank order" type and select the "Drag and drop" format, you will no longer run the risk of people selecting duplicate options, as they are only allowed to drag the 3 existing response options and order them.
But wait! You still have to pay attention to your options. If you select the "Rank order" type and instead select the "Text box" format, users are still allowed to enter duplicate response options, unless you go one step further and select "Add validation" and "Must rank all". Then if a user tries to select "1" for both "Pepperoni" and "Cheese", they will receive this error message: "Please assign a value from 1 to 3 for each item. Values may not be repeated."
The takeaway here is to study your question types and formats and choose whichever provides you the most accurate and reliable response data.
Tip #6: Don't collect names as unique identifiers
Oftentimes in survey research, unless we are collecting anonymous data, we need unique identifiers in our survey to be able to link data across time or across forms. You may be tempted to collect names in order to link data, but there are many problems with using names:
- Names aren't unique
- Most studies require confidentiality, so you will need to replace names with a numeric or alphanumeric study ID before releasing data
- Free form text fields lead to inconsistent values making linking difficult
An example of entries that might be troublesome to link:
| Spelling of names in pre-test | Spelling of names in post-test |
|---|---|
| Frieda A. Maeda | Frieda Maeda |
| violet Mcgonical | Violet McGonical |
| Chris Lucass | Christopher Lucas |
| Oriana Koyla | Oriana Koyla-Ross |
My recommendation here is, if you already have an existing list of participants for your study, pre-load that list into your survey and use it to collect data. Using your embedded participant list you can do things like send unique links to participants, removing the need to have participants enter their own information. To prevent errors, upon opening each unique link, a prompt can be triggered asking participants to verify their identity (based on the embedded data you have added, such as name). Even better, if you already have both name and unique ID assigned, embed all of that information, removing the need to merge in study ID after export.
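To see concretely why free-text names make linking difficult, here is a small illustration in Python with pandas using made-up data modeled on the table above:

```python
import pandas as pd

# Made-up pre-test and post-test data modeled on the table above
pre = pd.DataFrame({"name": ["Frieda A. Maeda", "Chris Lucass"],
                    "study_id": [1001, 1002],
                    "score_pre": [12, 15]})
post = pd.DataFrame({"name": ["Frieda Maeda", "Christopher Lucas"],
                     "study_id": [1001, 1002],
                     "score_post": [14, 18]})

# Merging on free-text names links no one because the spellings differ across waves
print(pre.merge(post, on="name"))                           # 0 rows

# Merging on a pre-assigned study ID links every participant
print(pre.merge(post.drop(columns="name"), on="study_id"))  # 2 rows
```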
To learn more about embedding data in Qualtrics, such as names and IDs from a contact list, you can visit Qualtrics Help.
Tip #7: Check your survey by exporting the instrument
Export your survey instrument to a human-readable document. Exporting your survey this way allows you to more easily check things such as:
- Are all questions accounted for?
- Are all response options accounted for and coded as expected?
- Is display logic doing what you expect?
You can export a copy of your survey in Qualtrics by going to "Tools" -> "Import/Export" -> "Export Survey to Word". This Word document will show you the variable names, the back-end value coding of response options, as well as any display logic associated with your survey.
Tip #8: Collect sample responses
Before you send out your survey to the world, make sure to collect sample data first. Send it to people on your team and provide them with a specific list of things to test and look for, similar to those we mentioned above. For example:
- Can everyone access the survey?
- Are all questions accounted for in the survey (according to the data dictionary)?
- Are all response options visible for each categorical question (according to the data dictionary)?
- Is your data validation working or are people allowed to enter unallowable values?
- Does skip/display logic follow the rules assigned in the data dictionary?
- Are there any poorly worded questions?
- Are there any unclear response options?
Once you gather feedback from all users, integrate that feedback into the survey.
Tip #9: Choose a file type for exporting your data
You may think that choosing a file type to export to is a decision that can be made after your data is collected, and generally it is. But it is important to know that each file type exports a little differently and has different export options. One example of this is understanding how "select all" questions export differently across different file types. Or understanding how value choices will be displayed. Just for demonstration purposes, I will compare a few differences between exporting to an SPSS file versus a csv file.
Example 1: Choosing to export to an SPSS file
- Multi-value fields (like my "select-all" and "rank choice" questions) will be exported as unique items, with each item being coded as "1" if the item was selected and blank if not selected.
- My response options will all be exported as numeric values (with the value labels displayed on the Variable View tab)
Example 2: Choosing to export to a csv file
- Here the default is again to export "select-all" and "rank choice" questions to unique items; however, we now have the option to unselect the check box "Split multi-value fields into columns" and allow all of our response options for "select-all" to be displayed in one variable, with each value separated by a comma. NOTE: While this is an allowable option, be aware that a variable in this format will not be analyzable unless you do further transformations in your data cleaning process (see the sketch at the end of this tip).
- Again, the default for this format is to export my response options as numeric, but I also now have the option (not provided in the SPSS format) to export choice text instead of numeric response options.
- This file type also exports these additional rows of metadata (notice the 2 additional rows of information below the variable names). I will need to account for this in my data cleaning process.
Choosing your export file type ahead of time allows you to plan for future data cleaning (for example, recoding blanks to 0 in your "select all" and "rank choice" questions) as well as make any needed edits to your data dictionary (for example, displaying your "select all" question as 4 separate items with yes/no response options, rather than one item as it is displayed right now).
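As one illustration of planning for these differences, here is a minimal Python/pandas sketch for reading a csv export with the extra metadata rows removed and splitting a comma-separated multi-value field. The file name and the toppings item are hypothetical:

```python
import pandas as pd

# A Qualtrics csv export places two extra rows of metadata (question text and
# import IDs) directly below the variable names; skip them when reading
df = pd.read_csv("sample_export.csv", skiprows=[1, 2])  # hypothetical file name

# If you keep multi-value fields in one column, a comma-separated "select all"
# variable (here a hypothetical toppings item) can still be split into
# 0/1 indicator columns during data cleaning
indicators = df["toppings"].str.get_dummies(sep=",")
df = pd.concat([df.drop(columns="toppings"), indicators], axis=1)
```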
Tip #10: Check your data by exporting your sample responses
I cannot emphasize enough how important it is to export sample data from your instrument before you ever send your survey out for actual data collection. It is so important to check that your data is exporting in the way you expect it to.
Things to check are:
- Are there any unexpected variables?
- Are there any unexpected variable names?
- Are there any unexpected values?
- Are any values missing where you expected data?
- Are there any unexpected variable formats?
- Is data exporting in an analyzable, rectangular format? Is there anything that violates the rules of basic data structure? You can read more about data structure rules here.
You can export your sample data in any way that you usually export your data. Probably the most common way is by going to "Data & Analysis" -> "Export & Import" -> "Export Data" and then choosing your file type.
In our sample data below, I exported to a csv file. I notice a few things. First, I see that Qualtrics exported some additional metadata that I was not expecting (ex: RecordedDate, ResponseId) and I will need to account for that in my cleaning process and data dictionary. Second, I see that a -15 was allowed to be added in my age variable, which means something is wrong with my content validation for this item (which should only allow the range of 10-20). I will want to fix this before I send my survey out for data collection.
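One way to make these checks systematic is to compare the sample export against your data dictionary. Here is a minimal sketch in Python with pandas; the file names and the dictionary's variable_name column are hypothetical:

```python
import pandas as pd

# File names and the dictionary's variable_name column are hypothetical
export = pd.read_csv("sample_export.csv", skiprows=[1, 2])
dictionary = pd.read_csv("data_dictionary.csv")

# Variables in the export that are not in the data dictionary
# (e.g., Qualtrics metadata such as RecordedDate and ResponseId)
print("Unexpected variables:", sorted(set(export.columns) - set(dictionary["variable_name"])))

# Variables the dictionary expects but the export is missing
print("Missing variables:", sorted(set(dictionary["variable_name"]) - set(export.columns)))

# Spot-check a range: the dictionary says age should only range from 10 to 20
print("Out-of-range ages:", export.loc[~export["age"].between(10, 20), "age"].tolist())
```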
Tip #11: Be aware of Qualtrics quirks
Qualtrics has many quirks, features that are not necessarily bad but just aren't intuitive. My guess is that many of you reading this may have experienced quirks of your own. And if you are unaware of these features, it can lead to serious errors in your data. Just a few of the quirks I have come across and want to share with you are:
1. Make sure to publish all changes. Any time you make a change to your survey, those changes will not appear in your public-facing survey unless you select "Publish" (the big BLUE button in the top-right corner of your survey page).
2. If one of your item response options is "don't know" or "unsure", Qualtrics automatically recognizes those as special response options and places a warning sign next to it.
If you don't pay close attention (which has happened to me before), you will not notice the warning sign and you will not hover over it to read that Qualtrics has chosen to exclude those response options (meaning that when a user selects "don't know", instead of that value being coded to 5, as I have assigned, it will be recoded to missing in my data export).
If this is not the behavior you want, and you prefer those responses to appear in your data as the value you have assigned (in this case a 5), you can click into the response option and uncheck the option "Exclude from analysis".
I am sure there are MANY more quirks to be known about Qualtrics (or other survey platforms) and if you have any you would like to share with others, please feel free to comment at the bottom of this blog post!
Wrapping it up
I hope some or all of the tips above can help you build your survey instruments, or at least get you started in thinking about the many steps that go into creating a survey that exports analysis-ready data.
I recognize that each survey is different, each project is different, and each user's needs are different. But ultimately what I want to emphasize here is: take the time to thoughtfully plan and test your survey before you collect data so that you end up with usable and reliable data. If changes are made to your survey at any point after data collection has started, make sure to account for and track those changes in your data dictionary.
Last, J-PAL has put together a very thorough document on survey design if you would like to learn more. This resource also includes a survey platform comparison table (Table 1) that can be helpful in deciding which platform best meets your project's needs.
Update 2024-05-08
Tip #12: Use response validation as needed
It's worth mentioning that, in addition to content validation, you should also consider response validation. A common type of response validation is adding a "force" or "request" response option to questions where missing data would be a problem. However, in considering response validation, you may have reasons for adding or not adding either of those options (e.g., concerns about data ethics).
Another example of response validation is not allowing participants to provide answers where they shouldn't. One common way to do this is through skip or display logic, which directs participants to questions based on responses to previous questions. Another example of this type of response validation is when you use a "select all" question type but also provide a response option where, if selected, no other options should be selected. In Qualtrics you can add this kind of validation by selecting the relevant response option and adding the option to "Make Answer Exclusive". Doing this ensures that no other answers are selected when this answer is chosen.