After hosting hundreds of trackers, Crunch.io has learned a thing or two about them.
Everyone knows to keep the waves of a tracker the same. We recently read four “how-to” guides on survey trackers, and they all gave the same advice: don’t change anything between the waves of a tracker. We all know that.
But, no matter how hard you try, some things always change between waves.
Unexpected changes happen, and then you have to deal with them. Dealing with them often takes huge amounts of time and effort. Your survey results get stuck in data processing for what seems like an eternity. You want to look at the latest trends immediately, right after the survey completes fielding, but something always gets in the way.
Goals:
- Vault over the hurdle of dealing with unexpected changes between tracker waves.
- Automatically update your datasets, analyses, graphs, charts, tables and dashboards.
- Immediately start your survey data analysis as soon as fielding of a tracker wave completes.
So, how do you do that? First, let’s review the challenges.
Here are ten changes that commonly happen between waves of a tracker that just drive you crazy!
10 common changes between waves that give market researchers fits.
Avoidable changes
- Labeled the old Q10 as Q9 and retired the previous Q9.
- Accidentally gave an existing code to a new response option.
- Accidentally gave a new code to an existing response option.
- Switched market research vendors, and the new ones changed all of the above and more, because they were unfamiliar with the established conventions of the tracker.
- Switched survey fielding software, and instead of numbering things 1 through 10, they called it one to ten.
- Reversed coding of some of the scale questions so that code 1 represents the high end of a ten-point scale.
- A programmer decided that numbers should actually be zero-indexed.
- Didn’t proofread and suddenly Finland is “Find Land”.
- Dropped a question.
- Made structural changes to an existing question.
So, what do you do about all these pain-in-the-neck changes?
An ounce of prevention is worth a pound of cure.
Here are several ways we can protect ourselves against those pesky changes between waves.
- Semantic Variable Names. Use meaningful variable names. If you name your questions Q1, Q2, Q3...Q47, it is much easier to make a mistake in a new wave and recycle a question name, so that Q34 asks about Awareness in one wave and about Consideration in another. Short variable names are an artifact of the punch card era and SPSS limitations from the ’70s (the ’70s!). We can do better. Give your variables meaningful, human-readable names like “awareness_nike” and “awareness_adidas”. This way you’re much less likely to have data issues.
Another tip: rather than using "Q" all the time, use other letters to denote the section. You can have S1, S2, S3 for the screeners and D1, D2, D3 in the demographics section.
[Side note: Crunch.io's survey data analysis platform allows you to easily define arbitrary names for variables, as well as human-friendly names for analysis.]
- Master List. Keep a master list of variables and brands. Brands will come and go out of the list you’re asking about over time. The key is to have a single source of truth you can turn to that lists them all. This is not the same thing as scouring through past questionnaires and old .docx files.
Instead, create a shared, online spreadsheet, such as Google Sheets, that clearly lists all the brands and flags which waves they were asked in. This way, you will always have the latest version on hand, saved to the cloud, rather than tracking down the right version sent via email.
- Naming Schemes. Agree on a delimiter and capitalization scheme. It doesn’t matter whether we drive on the right or the left side of the road; it only matters that we all agree to drive on the same side. Similarly, we can write our variables as “awareness_nike”, “AwarenessNike”, or “awareness.nike”; we just need to agree on a rule at the start of a tracker, such as “we’re going to use underscores between words with no capitalization” (which I like, and which is called lower snake case). This way we don’t have one wave with “awareness_nike” and another with “awareness.nike”.
- Numbering Questions. Remember, there’s always more room on the number line. If your question and category naming scheme is sequential integers, then we’ve all been burned by the need to insert a new question in the middle of the survey, e.g. between Q13 and Q14. This seems small, but remember that we don’t have to build from the last question in the survey. Instead of Q48, we can add something like “Q1301”, or even better, “Q13_purchase_likelihood” (a scripted rename map that enforces these conventions is sketched just after this list).
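To make the naming and numbering conventions concrete, here is a minimal sketch in Python with pandas (one of the scripting tools we recommend later in this paper). The column names and the rename map are hypothetical examples, not a fixed standard.

```python
import pandas as pd

# Hypothetical raw column names as they might arrive from the field in a new wave.
wave5 = pd.DataFrame({"Q1": [1, 2], "Q2": [0, 1], "Q13": [3, 5]})

# The agreed-upon scheme: lower snake case, section prefixes, semantic names.
rename_map = {
    "Q1": "s1_age_screener",           # screener section
    "Q2": "awareness_nike",            # semantic brand name instead of a bare number
    "Q13": "q13_purchase_likelihood",  # keep the number, add meaning
}

wave5 = wave5.rename(columns=rename_map)
print(wave5.columns.tolist())
# ['s1_age_screener', 'awareness_nike', 'q13_purchase_likelihood']
```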
Unavoidable changes
Now you know how to prevent avoidable changes, but some changes are going to happen no matter what you do.
Here are some examples:
- A brand in the study (e.g. Nike) changed the name of its products.
- A brand in the study dropped a product.
- A question was added about a new category (e.g. meatless meat).
- Added a 7th choice to a multiple response question (e.g. a new product).
- Added Belgium.
- Added questions in French.
- Dropped a question.
So, you’ve completed wave 5. How can you double-check for consistency? Here are actions you can take:
- Sandbox. Make a sandbox (testing environment) with the new wave, so you can go fast and not worry about breaking a live dataset.
- Dry/Partial Run. Get started early. Run checks on a partial sample. You shouldn’t need to wait until your fielding is 100% complete. If your survey software supports generating dummy respondents as test data, you can do this dry run before the survey is even in the field.
- Compare. Set up a spreadsheet to do side-by-side comparisons between the waves. For example, the spreadsheet could contain file information for two different waves created from an SPSS file. Export the metadata and data to Excel format and use any comparison tool to see the differences.
- Automate. Use a tool like the compareDatasets function in R, which can operate directly on the survey data in CrunchDB. It is especially helpful for comparing categorical variables; comparing multiple response arrays or categorical arrays by hand would be a lot of manual work. A scripted comparison is sketched below.
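If you prefer to script the comparison rather than build it in a spreadsheet, a sketch like the one below can flag added, dropped, and relabeled variables between two waves. It assumes the waves arrive as SPSS .sav files and uses the pyreadstat Python library; the file names are placeholders.

```python
import pyreadstat

# Read only the metadata for two waves (placeholder file names).
_, meta4 = pyreadstat.read_sav("wave4.sav", metadataonly=True)
_, meta5 = pyreadstat.read_sav("wave5.sav", metadataonly=True)

vars4, vars5 = set(meta4.column_names), set(meta5.column_names)
print("Dropped since wave 4:", sorted(vars4 - vars5))
print("Added in wave 5:", sorted(vars5 - vars4))

# For shared variables, flag any whose value labels changed (e.g. a reversed scale).
for var in sorted(vars4 & vars5):
    labels4 = meta4.variable_value_labels.get(var, {})
    labels5 = meta5.variable_value_labels.get(var, {})
    if labels4 != labels5:
        print(f"Value labels differ for {var}: {labels4} vs {labels5}")
```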
Fixing it all up
Things did change, and you’ve identified what changed. Now you have to clean it up.
Changing things by hand, going variable by variable, can be time-consuming, and it’s hard to go back and start over if you make mistakes. When appending a tracker, automate and script as much as possible using tools such as R or Python. See below for examples.
Updating Survey Data Labels and Adding Descriptions
Use scripting to align the incoming data:
- Correct misspellings.
- Manage product name changes.
- Add question description wording.
- Change “1” to “one” (or some meaningful label).
- Change incoming variable names where needed to match the tracking data.
- Ensure brand choices in a multiple response question are consistent.
If your survey data comes in from the field without metadata, such as names, labels, descriptions, and notes, you can add the extra metadata later with scripting by addressing variables by their unique identifiers.
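Here is a minimal sketch, using Python and pandas, of the kinds of alignment fixes listed above; every name and mapping in it is a hypothetical example.

```python
import pandas as pd

# Hypothetical incoming wave with a field typo and an inconsistently named variable.
wave5 = pd.DataFrame({
    "country": ["Find Land", "Belgium"],  # typo from the field
    "Q9": [1, 2],                         # earlier waves call this q9_brand_recall
})

# Correct misspellings and manage product or label name changes.
wave5["country"] = wave5["country"].replace({"Find Land": "Finland"})

# Rename incoming variables so they match the tracking data.
wave5 = wave5.rename(columns={"Q9": "q9_brand_recall"})

# Attach question wording keyed by the variable's identifier, so descriptions and
# notes can be added later even if the field file arrived without metadata.
wave5.attrs["descriptions"] = {
    "q9_brand_recall": "Which of these brands have you heard of?"
}
print(wave5.columns.tolist(), wave5["country"].tolist())
```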
Managing Newly Added Questions
If a new question is mistakenly added at the top or in the middle of the survey, the identifiers of the questions that follow it shift, and you’ll need to untangle them so the waves match.
To align trackers, you can create a map of existing identifiers and then rename the new wave’s variables on the fly so they match up and append smoothly to the existing ones.
With scripting, if the existing tracker already has a compatible data structure and naming, the new values are appended without having to redefine anything in the new wave; when redefinition does become necessary, the same scripts give you the tools to do it.
You can use the same approach to manage dropped questions.
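As an illustration of the identifier map, here is a sketch that assumes a new question was inserted as Q10 in wave 5, shifting the questions behind it; the mapping and the master variable list are hypothetical.

```python
import pandas as pd

# In wave 5 a new question was inserted as Q10, pushing the old Q10-Q12 up by one.
wave5 = pd.DataFrame({"Q9": [1], "Q10": [2], "Q11": [3], "Q12": [4], "Q13": [5]})

# Map the new wave's identifiers back to the tracker's established names.
identifier_map = {
    "Q10": "q10_new_category",  # the genuinely new question gets a new name
    "Q11": "Q10",               # old Q10
    "Q12": "Q11",               # old Q11
    "Q13": "Q12",               # old Q12
}
wave5 = wave5.rename(columns=identifier_map)

# Dropped questions can be handled the same way: reindex against the tracker's
# master variable list so missing columns come through as empty, ready to append.
master_variables = ["Q9", "Q10", "Q11", "Q12", "q10_new_category", "q13_dropped"]
wave5 = wave5.reindex(columns=master_variables)
print(wave5.columns.tolist())
```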
Matching Added Choices to a Question
If your tracker has a 3-point scale and a new wave starts using a 5-point scale, you'll need to recode the extra two categories so they align with the 3-point scale: they could be combined with the neutral value, or with the values at the ends of the scale.
Scripting can append the complete data for the new wave while also adding the equivalently mapped 3-point scale for tracking against the older waves.
You can apply the same approach to choices added to a multiple response question.
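Here is a minimal sketch of the recode, assuming a 5-point satisfaction scale in the new wave collapsed to the tracker's 3-point scale; the codes and the decision to fold the extra categories into the ends of the scale are illustrative assumptions.

```python
import pandas as pd

# Hypothetical new wave fielded on a 5-point scale (1 = low ... 5 = high).
wave5 = pd.DataFrame({"q20_satisfaction_5pt": [1, 2, 3, 4, 5]})

# Collapse to the tracker's 3-point scale, folding the extra categories into the
# ends of the scale (they could just as well be combined with the neutral value).
five_to_three = {1: 1, 2: 1, 3: 2, 4: 3, 5: 3}
wave5["q20_satisfaction_3pt"] = wave5["q20_satisfaction_5pt"].map(five_to_three)

# Keep both columns: the full 5-point data for the new wave, plus the mapped
# 3-point version that tracks cleanly against the older waves.
print(wave5)
```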
Change is inevitable.
Despite your best efforts, changes between your tracker waves are bound to happen. Some are avoidable. Some are simply unavoidable. Even though you’re supposed to be tracking the same thing over time, the real world intervenes and you need to change the questionnaire. A key competitor goes bankrupt, or changes their name, or merges, or… a thousand things we’ve all been burned by. The reality is that the questionnaire will change.
We hope this paper has provided guidance on how to handle the changes efficiently so you can start analyzing the latest trends and insights as soon as possible.
Appendix: five bonus tips!
- Design your tracker survey with the end in mind.
Add variable labels and organize variables (put them in bins/folders). Use scripts to update these in your survey data analysis platform.
- Store all data in one database/platform in the cloud, accessible to everyone.
- Store data at the respondent level, not summarized.
Use a tool that can aggregate data on the fly. You never know what rollups you are going to want to do, e.g. custom age categories that you can even change over time (a sketch appears at the end of this appendix).
- Automate end-to-end updates.
Use a survey data analysis platform that automates:
- Reports
- PowerPoint slides
- Excel
- Dashboards
- Process data within your analytic tool of choice.
Do as much of the data processing as possible in the survey data analysis platform database to avoid having to export, manipulate, and re-import data.
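As a small illustration of the respondent-level and aggregate-on-the-fly tips above, here is a sketch of building custom age categories with pandas; the breakpoints are arbitrary and can be changed at any time precisely because the raw ages are stored.

```python
import pandas as pd

# Respondent-level data: store the raw age, not a pre-summarized age band.
respondents = pd.DataFrame({"age": [19, 27, 34, 52, 68]})

# Roll up into custom categories on the fly; change the bins whenever needed.
bins = [17, 24, 44, 64, 120]
labels = ["18-24", "25-44", "45-64", "65+"]
respondents["age_group"] = pd.cut(respondents["age"], bins=bins, labels=labels)
print(respondents["age_group"].value_counts().sort_index())
```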
About Crunch.io
Crunch.io is a modern survey data analysis platform developed by survey experts, market researchers, data scientists, and engineers.
We were frustrated by the current tools used to analyze, visualize and deliver survey data. So, we built Crunch.io to bring simplicity and usability to survey data analysis.
With Crunch.io, you can build crosstabs with drag-and-drop and graphs with one click. Export real PowerPoint objects. Build and deliver a dashboard in 5 minutes. Get rid of two-day turnarounds to answer questions. Analyze all your data in one place.