Data preparation is perhaps the most important step in any data visualization project, and in data analysis in general. The data preparation process refers to any activity designed to improve the quality, usability, accessibility, or portability of the final project data.
While I was gathering and analyzing data for the phase 2 project of my data visualization class, I was tasked with working with a community partner who would provide the data required for the final visualization. At that point, it made sense to come up with a list of questions that I could discuss with my community partner and seek clarification on. This would help me understand the purpose of the data collection, and other important issues concerning the data, before diving in and using it in my project.
The questions covered a wide range of topics: the data collection process, processing methods, difficulties and challenges faced in organizing the data, performance and optimization, and so on.
Based on that, I prepared a checklist of questions to ask when preparing data for visualization. Hopefully, this will help anyone optimize their data preparation process and make sure they have all the important steps and bases covered.
- What are some of the business questions? What does the business expect to see in the final output?
- What collection methods were used to gather the data?
- What legwork was undertaken by the research team to clean the original data?
- Does any part of the data need to be transformed manually?
- What steps were taken to fix inconsistencies in the data and remove duplicate information? Was the data cleaned manually, or was a systematic approach followed?
- What is the size of the data, and does it support a scaling model?
- What processing methods were used to refine or consolidate the final dataset?
- What data structures does the final dataset support?
- What are the requirements for selecting a database system? Are there any specific requirements or constraints to take into consideration beforehand?
- Are multiple database systems required if there are several different datasets or formats, such as 3D or motion capture data?
- What tools or services were used to export the raw data to the required format?
- What tools or services were used to organize the final refined data?
- How will the company use the results from the data in the future?
- Can the current dataset scale to incorporate research inputs added further down the road?
- Does it require frequent maintenance to manage performance and optimization?
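
A few of the checklist items above, such as the size of the data, duplicate information, and inconsistencies, can be checked quickly on your own before the conversation with the data owner. Here is a minimal sketch in Python, assuming the data can be loaded as a list of records. The `profile_records` helper, the column names, and the sample rows are all hypothetical and only illustrate the idea; they do not come from my project.

```python
from collections import Counter, defaultdict


def profile_records(records, category_columns=()):
    """Summarize size, exact duplicates, missing values, and label variants.

    `records` is a list of dicts (one per row) with hashable values.
    `category_columns` names the columns whose text labels should be
    checked for inconsistent spelling.
    """
    # Count exact duplicate rows by comparing the sorted (column, value) pairs.
    row_counts = Counter(tuple(sorted(r.items())) for r in records)
    duplicate_rows = sum(n - 1 for n in row_counts.values())

    # Count missing values (None or blank strings) per column.
    missing = Counter()
    for r in records:
        for col, val in r.items():
            if val is None or (isinstance(val, str) and not val.strip()):
                missing[col] += 1

    # Group labels that differ only by case or surrounding whitespace.
    variants = {}
    for col in category_columns:
        groups = defaultdict(set)
        for r in records:
            val = r.get(col)
            if isinstance(val, str) and val.strip():
                groups[val.strip().lower()].add(val)
        variants[col] = {k: sorted(v) for k, v in groups.items() if len(v) > 1}

    return {
        "rows": len(records),
        "duplicate_rows": duplicate_rows,
        "missing": dict(missing),
        "label_variants": variants,
    }


# Hypothetical sample rows, purely for illustration.
sample = [
    {"site": "North", "visits": 12},
    {"site": "north ", "visits": 7},
    {"site": "North", "visits": 12},
    {"site": "", "visits": None},
]
report = profile_records(sample, category_columns=["site"])
print(report)
```

A summary like this gives you concrete numbers to bring to the discussion. It does not replace asking how the data was actually collected and cleaned.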
Once you have gone through the entire checklist above, you’ll have identified the data and clearly understood its important elements, and you’ll be ready to dive in and move ahead confidently with further analysis and development of the project.