Towards “Visual Data Science” with Images and Structured Data
A Proposal to Learn from Data with New Visualisations
Professor Robert Calderbank of Duke University in North Carolina recently gave the Turing Lecture “Fifty Years of Learning from Data”. I blogged about his great overview, and the video of the lecture will be made available.
At the end, I introduced myself as a mathematician who used to diagnose software at CERN in Geneva and asked where he sees courses on Data Science heading. He set up Data+ in 2013 to train undergraduates in interdisciplinary teams. For the Turing Institute and its clients, as a hub for top international leaders, his most encouraging answer was to create data stories that would include the line “we couldn’t have done this without this theoretical new idea.”
My new theoretical input has not yet been published; it is being integrated into a Smart Knowledge Engine, a work in progress at a pre-startup SME. It is described in a number of articles on Smart Knowledge Space and expressed in three new visualisation styles:
- True Colour 3D – for re-visualising images as 3D movable objects;
- Visual 3D – for layering multi-dimensional data along a ‘visual z’-axis;
- Chronometric 3D – for juxtaposing short-, medium- and long-term trend periods.
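The Chronometric 3D style is only named in the list above, not specified. As a rough numerical illustration of what “juxtaposing short-, medium- and long-term trend periods” could mean, the sketch below computes trailing means of one daily series over three windows and gives each trend period its own depth layer. The window lengths (7, 30 and 365 days), the function names and the layer indices are assumptions made for illustration, not the Smart Knowledge Engine's method.

```python
from typing import Dict, List, Optional

# Hypothetical window lengths (in days) for short-, medium- and long-term trends.
TREND_WINDOWS: Dict[str, int] = {"short": 7, "medium": 30, "long": 365}


def trailing_mean(values: List[float], window: int) -> List[Optional[float]]:
    """Trailing moving average; None until `window` values are available."""
    out: List[Optional[float]] = []
    running = 0.0
    for i, v in enumerate(values):
        running += v
        if i >= window:
            running -= values[i - window]  # keep exactly `window` values in the sum
        out.append(running / window if i >= window - 1 else None)
    return out


def chronometric_layers(daily: List[float],
                        windows: Dict[str, int] = TREND_WINDOWS) -> Dict[str, dict]:
    """Assign each trend period its own z-layer (z = 0 for the shortest window)."""
    ordered = sorted(windows.items(), key=lambda kv: kv[1])
    return {
        name: {"z": z, "window": w, "trend": trailing_mean(daily, w)}
        for z, (name, w) in enumerate(ordered)
    }
```

Stacking the trends by window length keeps the short-term detail in front and the slow long-term drift behind it, rather than drawing all three curves on one plane.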
[Figure: Stem cells in True Colour 3D, illustrating the visualisation of Digital Colour Brightness. Panels: 2D images as input; 3D movable objects as output.]
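The source does not say how Digital Colour Brightness is computed. One plausible reading of “2D images as input, 3D movable objects as output” is a height field: each pixel keeps its true colour and is lifted along z by its brightness. The minimal sketch below assumes Rec. 601 luma weights (0.299, 0.587, 0.114) as the brightness measure and represents an image as rows of 8-bit RGB tuples; the function names and the `height_scale` parameter are hypothetical.

```python
import math
from typing import List, Tuple

RGB = Tuple[int, int, int]                # 8-bit red, green, blue
Vertex = Tuple[float, float, float, RGB]  # (x, y, z, original colour)


def brightness(pixel: RGB) -> float:
    """Assumed brightness measure: Rec. 601 luma weights, scaled to [0, 1]."""
    r, g, b = pixel
    return (0.299 * r + 0.587 * g + 0.114 * b) / 255.0


def image_to_vertices(image: List[List[RGB]], height_scale: float = 10.0) -> List[Vertex]:
    """Lift each pixel to z = brightness * height_scale, keeping its true colour."""
    vertices: List[Vertex] = []
    for y, row in enumerate(image):
        for x, pixel in enumerate(row):
            z = brightness(pixel) * height_scale
            vertices.append((float(x), float(y), z, pixel))
    return vertices


def rotate_z(vertices: List[Vertex], angle_rad: float) -> List[Vertex]:
    """Rotate the coloured point cloud about the z-axis, i.e. 'move' the object."""
    c, s = math.cos(angle_rad), math.sin(angle_rad)
    return [(c * x - s * y, s * x + c * y, z, colour) for x, y, z, colour in vertices]
```

For example, `image_to_vertices([[(0, 0, 0), (255, 255, 255)]])` places the black pixel at z = 0 and the white pixel at z ≈ 10. A real viewer would more likely build a mesh; the point cloud only keeps the sketch dependency-free.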
In Visual 3D we unclutter into layers the series that Excel draws on top of each other:

|Excel|The same data layered, with controls to vary ‘visual comparability’|
|---|---|
|Three air pollutants in daily, weekly and 2-weekly series|The same data layered, with controls to ‘compare and contrast’ visually|
|Five air pollutants in daily and weekly series|The same data in ten layers, where the colour scheme can be varied|
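Visual 3D is described by its effect: series that Excel overplots on one pair of axes are separated into layers along a ‘visual z’-axis. A minimal sketch of that idea, assuming each pollutant is a list of daily values, derives weekly and 2-weekly means from each daily series and gives every resulting series its own depth. The `spacing` parameter is a hypothetical stand-in for the ‘visual comparability’ control; all names are illustrative, not the Smart Knowledge Engine's interface.

```python
from typing import Dict, List, Sequence, Tuple

Sample = Tuple[float, float]          # (start day, value)
Point3D = Tuple[float, float, float]  # (day, value, visual z)


def block_means(daily: List[float], period: int) -> List[Sample]:
    """Non-overlapping means over `period` days: 1 = daily, 7 = weekly, 14 = 2-weekly.

    A trailing partial block is dropped so that every mean covers a full period.
    """
    full_blocks = len(daily) // period
    return [
        (float(i * period), sum(daily[i * period:(i + 1) * period]) / period)
        for i in range(full_blocks)
    ]


def layer_series(series: Dict[str, List[Sample]],
                 spacing: float = 1.0) -> Dict[str, List[Point3D]]:
    """Give every series its own depth instead of drawing all of them on one plane.

    Larger `spacing` pulls the layers apart; spacing=0 collapses them back
    into the cluttered, Excel-style overlay.
    """
    return {
        name: [(day, value, z * spacing) for day, value in samples]
        for z, (name, samples) in enumerate(series.items())
    }


def visual_3d_layers(pollutants: Dict[str, List[float]],
                     periods: Sequence[int] = (1, 7, 14),
                     spacing: float = 1.0) -> Dict[str, List[Point3D]]:
    """One layer per (pollutant, averaging period) pair."""
    series = {
        f"{name} ({p}-day)": block_means(daily, p)
        for name, daily in pollutants.items()
        for p in periods
    }
    return layer_series(series, spacing)
```

With three pollutants and the default periods this yields nine layers; with five pollutants and `periods=(1, 7)` it yields ten, matching the count in the table.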
To illustrate how interconnected Data Science projects are, Professor Calderbank arranges the projects of his Data+ courses as positions in a matrix. The results are so positive that I would like to propose this approach as a template for activities at the Turing Institute that specialise in research based on our Smart Knowledge Engine.
The table below demonstrates the effect of visualising numbers as generic building blocks. Numbers need words and units of measurement to communicate meaningful data stories. With compelling and significant data sets, it will be possible to gain new insights by “compare and contrast”:
The left-hand column is taken from Professor Calderbank’s slide answering “What is Data Science?”. The right-hand column contrasts it with “Visual Data Science”: generic visualisation styles in search of compelling data sets that make new correlations and priorities apparent and visible.
|Data Science|Visual Data Science for Images, Multi-Dimensional Data and Time Series|
|---|---|
|1. Marshalling|2. Domain Experts Select Vocabularies|
|1.1. Fast Checking; 1.3. Merging/Collecting Data|Topic, e.g. Smart Cities / Image Science / Financial Trends; Domain, e.g. Physics / Chemistry / Biology / Medicine; Scale, e.g. Microscopic / Camera / Satellite / Telescopic|
|3. Analysing|First Visual Impressions|
|1.5. Image Analysis; 1.6. Geospatial Analysis; 1.7. Network Analysis; 1.8. Statistical and Mathematical Models; 1.9. Machine Learning|Experts can immediately see what’s going on. Sorting similar data and images for finer comparisons will enable researchers and data scientists to make new discoveries. By comparing with ‘reference images’ and|
|4. Visualising and Communicating|Analysis and Preparation for Automation|
|1.10. Publications / Grants; 1.11. Policy Briefs / White Paper; 1.12. User Interface; 1.13. Integrating into Systems|Experts set boundaries so that operators can deal with events that are “off limits”.|
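The right-hand column mentions sorting similar data and images, comparing them with ‘reference images’, and letting experts set boundaries for events that are “off limits”. The source does not describe how this is done. The sketch below only illustrates the general pattern, assuming images are equal-sized grids of brightness values in [0, 1], mean absolute difference as the similarity measure, and a single expert-chosen threshold; all names are hypothetical.

```python
from typing import Dict, List, Tuple

Grid = List[List[float]]  # brightness values in [0, 1]; all grids the same shape


def mean_abs_diff(a: Grid, b: Grid) -> float:
    """Average per-pixel brightness difference between two equal-sized grids."""
    if len(a) != len(b) or any(len(ra) != len(rb) for ra, rb in zip(a, b)):
        raise ValueError("grids must have the same shape")
    diffs = [abs(x - y) for ra, rb in zip(a, b) for x, y in zip(ra, rb)]
    if not diffs:
        raise ValueError("grids must not be empty")
    return sum(diffs) / len(diffs)


def rank_against_reference(images: Dict[str, Grid], reference: Grid,
                           limit: float) -> List[Tuple[str, float, bool]]:
    """Sort images from most to least similar to the reference image.

    Each entry is (name, distance, off_limits); off_limits is True when the
    distance exceeds the expert-set `limit`, i.e. an event for an operator.
    """
    scored = [(name, mean_abs_diff(img, reference)) for name, img in images.items()]
    scored.sort(key=lambda item: item[1])
    return [(name, d, d > limit) for name, d in scored]
```

The split of roles mirrors the table: the expert chooses the reference and the `limit`, while the ranking and flagging can then run automatically.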
Given the generic nature of these visualisation styles, the task is to find representative data sets and images that can serve as benchmarks and references: for the quality of data capture, the essence of climate change or the trends of financial markets, as well as for comparing healthy and sick cells, tracking progress towards health or illness, and detecting faulty and counterfeit pills.
In terms of the science of imaging, the challenge is to find the best technology for any given scale. Proprietary techniques translate light and colour into digital values for digital images and videos. Finding the best numerical representation of light and colour for any given scale and application would yield new equivalents of the prototype metre, since the metre is now defined as the length of the path travelled by light in a vacuum during 1/299 792 458 of a second.
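The definition quoted above can be checked with one line of arithmetic, using the exact value of the speed of light in vacuum fixed by the SI, $c = 299\,792\,458$ m/s: light covers one metre in about 3.3 nanoseconds.

$$
1\ \text{m} = c\,\Delta t, \qquad \Delta t = \frac{1}{299\,792\,458}\ \text{s} \approx 3.336 \times 10^{-9}\ \text{s}.
$$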
In the spirit of my blog post about Professor Blum’s lecture “Alan Turing and the Other Theory of Computation”, our Smart Knowledge Engine is a new tool of investigation: a set of “software lenses” for “visual data science” that moves from computing to visualising, from numbers to pixels, the link between digital bits and physical atoms.