Phase 3 Final Presentation | Fall 2017


Project GitHub

Multi-Foci Layout Visualization of the Multimer Dataset


The force layout uses a physical simulation to position visual elements. Nodes are rendered in groups, each cluster representing a different sentiment, and each node represents a sentiment value per person.

(i) Nodes scattered in cluster groups           (ii) Nodes assembled as a whole
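In D3 this multi-foci behavior is typically built with d3.forceSimulation plus per-cluster d3.forceX/d3.forceY forces. Here is a dependency-free sketch of the same idea, where each node is nudged toward the focus point of its sentiment cluster on every tick (the cluster names, coordinates, and constants are illustrative, not taken from the project):

```javascript
// Focus point for each sentiment cluster (illustrative coordinates).
const foci = {
  openMindedness: { x: 100, y: 100 },
  fascination:    { x: 300, y: 100 },
  stimulation:    { x: 100, y: 300 },
  intensity:      { x: 300, y: 300 },
};

// Build one node object per unit of sentiment, all starting at the center.
function makeNodes(counts) {
  const nodes = [];
  for (const [sentiment, n] of Object.entries(counts)) {
    for (let i = 0; i < n; i++) {
      nodes.push({ sentiment, x: 200, y: 200, vx: 0, vy: 0 });
    }
  }
  return nodes;
}

// One simulation step: pull each node's velocity toward its cluster focus,
// then apply velocity decay, mirroring what d3-force does per tick.
function tick(nodes, alpha = 0.1) {
  for (const node of nodes) {
    const f = foci[node.sentiment];
    node.vx += (f.x - node.x) * alpha;
    node.vy += (f.y - node.y) * alpha;
    node.x += node.vx;
    node.y += node.vy;
    node.vx *= 0.6; // like d3's velocityDecay
    node.vy *= 0.6;
  }
}

const nodes = makeNodes({ openMindedness: 3, fascination: 2 });
for (let i = 0; i < 100; i++) tick(nodes);
// after enough ticks, nodes settle near their cluster foci
```

A real d3-force setup would also add d3.forceCollide so nodes in the same cluster spread out instead of stacking on one point.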

Measuring and Interpreting Physiological Metrics through Sentiments

Open-mindedness: Associated with an individual being attracted to a topic, but not alarmed.

Fascination: Associated with a relaxed interest in a topic.

Stimulation: Associated with an individual being more attentive than they are relaxed.

Power / Intensity: Associated with the lasting impact of the experience.





Data Computation: The summed values were in the range of 0–1000. Before mapping the input values to nodes, each total was scaled down proportionately to a small value by taking its cube root (using Math.cbrt()).
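The scaling step can be sketched in a few lines. The totals are the per-sentiment sums quoted later in the process update, and Math.round is my reading of how the cube roots were turned into whole node counts:

```javascript
// Scale raw sentiment totals (range 0-1000) down to node counts
// by taking the cube root and rounding to a whole number.
function scaleCount(total) {
  return Math.round(Math.cbrt(total));
}

const totals = [899, 351, 244, 16, 36, 9]; // per-sentiment sums from the CSVs
const nodeCounts = totals.map(scaleCount);
// e.g. 899 -> 10 nodes, 351 -> 7, 9 -> 2
```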

Paper Submission

Link to the Report from Associated Press:




Viacom Visit and Project Update

Geospatial Mapping with D3


For the field trip presentation, I explored the latest advances in geospatial mapping, along with the concepts behind geospatial data and the world of cartography, a field that has been around for a very long time.

D3 is a powerful visualization library with a ton of uses. It can be used for much more than DOM manipulation or chart drawing: D3.js is extremely capable at handling geographic information and building a useful, informative web map.


I also analyzed some of the latest tools and frameworks available for visualizing geospatial data, such as D3 maps and tooling from Uber.

Phase 3 Process Update 2

This week I continued prototype development and the data computation process.


It took several hours to filter down to the right amount of data for the final project. While processing the CSV files to fetch the total sentiment counts per person, I found that the range required to set the scale for the D3 layout would be huge if I went ahead with the actual values (899, 351, 244, 16, 36, 9). The output would render a large number of nodes on screen, which might increase the app payload and impact overall performance. It might also reduce the animation speed, since the force layout has to calculate and set the coordinates for all the nodes during user interaction. I discussed the issue with the community partner, and we came to the conclusion that it would be better to scale the values down proportionately by raising each total to the one-third power.

Before passing the data down and mapping the values to nodes, I used JavaScript's cube root function (Math.cbrt) on each total sum and rounded the result to remove the floating-point fraction.
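Putting the scaling and mapping together, the per-person step might look like this. The row shape and field names here are hypothetical, my sketch of the CSV-derived records rather than the project's actual schema:

```javascript
// Map one person's sentiment total to an array of node objects
// that the force layout can position. Field names are illustrative.
function buildNodes(row) {
  // row: { person: "P1", sentiment: "fascination", total: 351 }
  const count = Math.round(Math.cbrt(row.total)); // cube-root scaling
  return Array.from({ length: count }, (_, i) => ({
    id: `${row.person}-${row.sentiment}-${i}`,
    person: row.person,
    sentiment: row.sentiment,
  }));
}

const nodes = buildNodes({ person: "P1", sentiment: "fascination", total: 351 });
// a raw total of 351 yields 7 nodes
```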


I also worked on setting up the layout for the final visualization. I chose a three-column layout and centered the visualization. The left column displays annotations for the sentiments represented in the visualization. The right column displays the form controls for selecting the various 360-degree videos and the device type associated with each, along with buttons to group and scatter the nodes. The list of 360-degree video titles used during the research appears at the bottom of the right column.

Working with Geospatial Data


There are four key concepts behind working with geospatial data: shapefiles, GDAL, GeoJSON, and TopoJSON.

Shapefiles
The first of these is the shapefile, a geospatial vector format used by most GIS software packages. Essentially, it is a data format used to actually draw things. Shapefiles can be used for many different purposes, but primarily it is all about drawing maps. Cartographers have come together and standardized around this format so that the information can be shared across any type of mapping application.

GDAL
GDAL is the Geospatial Data Abstraction Library. It works with two kinds of formats, raster and vector, the same two ways graphics take shape on the web. It is the most commonly used platform for working with shapefiles and data in other formats to generate the actual maps we use; for example, it allows us to take a shapefile and output GeoJSON.

GeoJSON
GeoJSON is a format for encoding a variety of geographic data structures. It represents discrete geometry objects and groups them into feature collections. These are usually set up with name/value pairs, just like the arrays and other pieces of data you are used to working with in JSON.

TopoJSON
TopoJSON is an extension of GeoJSON, almost its sibling. It encodes topology, so it is more than just the geometry itself: it has different layers, you can add a lot of things on top of it, and it introduces shared line segments, commonly referred to as arcs, which eliminate redundancy.

So if you have two countries, or two states in a country, that border each other, a GeoJSON file draws both shapes and just smashes them together. TopoJSON eliminates the need to draw that border twice; you draw it only once. The result is typically about an 80% smaller file, and on the web we love small file sizes, because they let us render things much faster.

GeoJSON and TopoJSON are closely related: both are JSON-based file formats, so they work well on the web, and both are supported by the D3 library.
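As a minimal illustration of the name/value structure described above, here is a tiny GeoJSON FeatureCollection built as an ordinary JavaScript object (the coordinates and property values are made up for the example):

```javascript
// A minimal GeoJSON FeatureCollection: each feature carries a geometry
// plus arbitrary name/value properties, all as ordinary JSON.
const collection = {
  type: "FeatureCollection",
  features: [
    {
      type: "Feature",
      properties: { name: "Example Point" }, // illustrative values
      geometry: { type: "Point", coordinates: [-73.99, 40.73] },
    },
  ],
};

// Because it is plain JSON, it round-trips through the usual web plumbing;
// D3 can pass objects like this to d3.geoPath() to draw them, and TopoJSON
// stores the same geometry as shared arcs instead.
const json = JSON.stringify(collection);
```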

Checklist of Questions to Ask When Preparing Data for Visualization


Data preparation is perhaps the most important step in data visualization and, more generally, in data analysis. The data preparation process refers to any activity designed to improve the quality, usability, accessibility, or portability of the final dataset.

While gathering and analyzing data for my Phase 2 project in the data visualization class, I was tasked with working with a community partner who would provide the data required for the final visualization. At that time, it made sense to come up with a list of questions I could discuss with my community partner and seek clarification on. It would help me understand the purpose of the data collection, and other important issues concerning the data, before diving in and using it in my project.

The questions covered a wide range of topics: the data collection process, processing methods, difficulties and challenges in organizing the data, performance and optimization, etc.

Based on that, I prepared a checklist of questions to ask when preparing data for visualization. Hopefully, this will help anyone optimize their data preparation process and make sure they have all the important steps and bases covered.

  1. What are some of the business questions? What does the business expect to see in the final output?
  2. What collection methods were used to gather the data?
  3. What legwork was undertaken by the research team to clean the original data?
  4. Does any part of the data need to be transformed manually?
  5. What steps were taken to fix inconsistencies in the data and remove duplicate information? Were they cleaned manually, or was a systematic approach followed?
  6. What is the size of the data, and does it support a scaling model?
  7. What processing methods were used to refine or consolidate the final dataset?
  8. What data structures are supported by the final dataset?
  9. What are the requirements for selecting a database system? Are there specific requirements or constraints to take into consideration beforehand?
  10. Are multiple database systems required if there are several different datasets or formats, such as 3D or motion-capture data?
  11. What tools or services were used to export the raw data to the required format?
  12. What tools or services were used to organize the final refined data?
  13. How will the results from the data be used by the company in the future?
  14. Is the current dataset scalable enough to incorporate research inputs further down the road?
  15. Does it require frequent maintenance to manage performance and optimization?

After you have gone through the entire checklist above, you will have identified the data and clearly understood its important elements, and you will be ready to dive in and move ahead confidently with further analysis and development of the project.

Tips for Better Data Visualization


Creating effective data visualization is both a science and an art. It should not only generate hypotheses and reveal insights but also communicate information effectively, striking a balance between its art and science elements. This means a visualization must both:

  • Be visually appealing
  • Maintain fidelity to the structure of the data

Accomplishing both can be a challenge. Data visualization isn’t just about displaying data; it’s about displaying data in a way that is easier to comprehend. That is where the real value lies.

Finding the best way to visualize the data is often treated as an afterthought rather than a critical piece of the process. Poor data visualization can lead to confused messages and, ultimately, to poorly executed and ineffective decisions. So here are six key factors to consider when designing and developing a data visualization that has the desired impact on the intended audience, whether that is to draw insight or to make better-informed decisions and take appropriate actions.

1. Start with a clear strategy


Getting visualization right is much more a science than an art, which we can only achieve by studying human perception. – Stephen Few

As with every aspect of design and development, having a clear strategy and end goal in mind is essential when it comes to planning how you will use the visualization.

With visualization, the goal will generally be to impart the insight and knowledge gained through data wrangling and exploration to the right people, so they can use it and make a difference.

This strategizing should start right from the discovery phase of the project, as the first step in putting together a plan for data-driven transformation. Just as you should be clear about the goals of your data gathering and analysis, you should start thinking about the form and format of the output, followed by the methods and techniques that will be most effective in presenting it. It is also critical to assess the technical feasibility of the technology stack you plan to use, including its limitations and challenges, such as performance, data processing, and device compatibility.

2. Tell a clear story


Data storytelling is an essential part of getting your message and meaning across. Like all stories, a data story will have a beginning, a middle, and an end. And, like a lot of stories, they won’t necessarily come in that order.

In fact, with a data story it’s often best, even essential, to start at the end. That’s because, unlike a movie or a novel, we aren’t worried about giving away the ending. A data-driven story should be told more like a newspaper story: shout your key findings in a headline at the top, then back them up with evidence as the reader gets drawn in.

3. Do not overdo the amount of information

It can be very easy to overdo the amount of information you cram into your graphs, infographics, or dashboards. It is important to identify the key messages in a dataset and present them in a way that isn’t cluttered by extraneous and unnecessary detail.

Overly busy graphics and visualizations tire the eye and the brain, and they don’t stick in the mind nearly as well as those that make a simple, concise point backed up with relevant and up-to-date facts and stats.

4. Fit your visualization to your audience

Data often tells different stories to different audiences. Part of the skill of building a narrative with data is understanding the audience. While a detailed breakdown of different machinery parts will be valuable to an engineer, a business executive needs a more concise but broader overview of the situation: not if and when an individual machine might break down, but how the company’s machines are working as a whole, and whether they are helping or hindering the company in hitting its goals.

Both kinds of information might be available in the dataset, but they need to be presented differently to meet the needs of each audience.

5. Set the Context

Usually, the story your data should be telling is what the abstract graphs and statistics mean in the real world. This means your data must be grounded in its real-life impact: what difference will the data make to the lives of your customers, or of the audience you are presenting it to?

6. Consider Colors Carefully

Color is a great tool when used well. When used poorly, it can not just distract but misdirect the reader, so use it wisely in your data visualization design. Select colors appropriately: some colors stand out more than others, giving unnecessary weight to that data. Instead, use a single color with varying shades, or a spectrum between two analogous colors, to show intensity. Make sure there is sufficient contrast between colors; if they are too similar (light gray vs. slightly lighter gray), it can be hard to tell the difference.
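The "single color with varying shades" idea can be sketched as a simple interpolation between a light and a dark shade of one hue; the hex values below are arbitrary example choices, not colors from any of my visualizations:

```javascript
// Parse a "#rrggbb" hex color into [r, g, b] components.
function hexToRgb(hex) {
  const n = parseInt(hex.slice(1), 16);
  return [(n >> 16) & 255, (n >> 8) & 255, n & 255];
}

// Linearly interpolate between a light and a dark shade of one hue
// to encode intensity: t = 0 gives the light end, t = 1 the dark end.
function shade(light, dark, t) {
  const a = hexToRgb(light), b = hexToRgb(dark);
  const c = a.map((v, i) => Math.round(v + (b[i] - v) * t));
  return `rgb(${c[0]}, ${c[1]}, ${c[2]})`;
}

shade("#deebf7", "#08519c", 0.5); // a mid-intensity blue
```

In a D3 project, d3.interpolateRgb or a sequential scale would do this for you; the point is that varying lightness within one hue reads as intensity, while many unrelated hues do not.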


Phase 3 Process Update 1

I spent the last two weeks in the planning and discovery phase of the Phase 3 project. It started with identifying the best visualization form to extrapolate the findings from the AP-Multimer research study done by the community partner, based on the data collected.

I did an initial analysis of the three datasets received from the community partner (EEG, heart rate, and motion). These datasets had been cleaned, parsed, and analyzed by the Multimer team using Python analysis libraries. Initially there was a copious amount of data in large files, which was broken down into smaller chunks and optimized.

I also worked on setting up the development workflow for the final project. I will be using Webpack, npm scripts, and a Node server to bundle the application code and related assets.

Based on the technology stack selected, D3 will be the standard visualization library of choice. I am planning to integrate it with the React ecosystem for several reasons. I picked this approach because React’s component framework lets you build self-contained elements (like a div or svg:rect) with custom rendering methods, properties, state, and lifecycle methods.

My approach to the visualization style would be a data dashboard that gives users multiple perspectives on the data, as well as the ability to filter between different categories and see individual data points. The major component would be a force-directed graph showing the correlation between different sentiments for each story type. I am currently exploring the features and limitations of force-directed graphs and the steps required to transform the final project data into a form suitable for this type of visualization. From an interaction standpoint, the dashboard would include a category menu and form filters so that users can compare visualizations for different story videos. It would also include tooltip overlays to provide additional information about the survey results.