Analysis of biomedical data with multilevel glyphs (2024)

BMC Bioinformatics
v.15(Suppl 6); 2014
PMC4158616


BMC Bioinformatics. 2014; 15(Suppl 6): S5.

Published online 2014 May 16. doi:10.1186/1471-2105-15-S6-S5

PMCID: PMC4158616

PMID: 25079119

Heimo Müller,¹ Robert Reihs,¹ Kurt Zatloukal,¹ and Andreas Holzinger²,³


Abstract

Background

This paper presents multilevel data glyphs optimized for the interactive knowledge discovery and visualization of large biomedical data sets. Data glyphs are three-dimensional objects defined by multiple levels of geometric descriptions (levels of detail) combined with a mapping of data attributes to graphical elements and methods, which specify their spatial position.

Methods

In the data mapping phase, which is carried out by a biomedical expert, meta information about the data attributes (scale, number of distinct values) is compared with the visual capabilities of the graphical elements in order to give the user feedback about the correctness of the variable mapping. The spatial arrangement of glyphs is done in a dimetric view, which leads to high data density and simplified 3D navigation, and avoids perspective distortion.

Results

We show the usage of data glyphs in the disease analyser, a visual analytics application for personalized medicine, and provide an outlook to a biomedical web visualization scenario.

Conclusions

Data glyphs can be successfully applied in the disease analyser for the analysis of big medical data sets. In particular, the automatic validation of the data mapping, the selection of subgroups within histograms and the visual comparison of the value distributions were seen by experts as important functionality.

Keywords: Visualization, Interactive Knowledge Discovery, Glyphs, Semantic Zoom

Background

Professionals in the biomedical domain are confronted with increasing masses of data, which require efficient and user-friendly solutions and the development of methods to assist them in knowledge discovery to identify, extract, visualize and understand useful information from these large amounts of data [1]. The trend towards personalized medicine has resulted in a mass of clinical, laboratory and genome-scale data. Moreover, most data models are characterized by complexity, which makes manual analysis very time-consuming and frequently practically impossible [2]. The major challenge is: How can an expert find knowledge in these terabytes of complex data? For example, to successfully search for novel hypotheses in large datasets, we must look for unexpected patterns and interpret evidence in ways that frame new questions and suggest further explorations [3]. Consequently, methods from Knowledge Discovery and Visual Analytics may help us to

• Overview large data sets, as the human visual sense is optimized for parallel processing

• Connect the global view with detail information

• Provide different contextual views (e.g. expert versus common user)

• Deal with inhomogeneous data sets and a broad range of data quality.

As one solution to these goals, we developed a set of validated glyphs for interactive exploration of biomedical data sets. With the ability to work with different levels of detail, to arrange and order the glyphs in space and to synchronise different visualizations through coordinated multiple views (CMV) [4], an expert can, in the truest sense of the word, travel through the data space.

Jacques Bertin's book Sémiologie graphique, published in 1967 (English translation 1987 by J. Berg), provides the foundation for the analysis of visual elements to display qualitative or quantitative data [5]. Bertin's practical experience as a cartographer led him to the question of how to find rules to build proper graphics. His study of signs together with their "grammatical" rules is based on a clear and logical symbol scheme in which symbols can be varied with respect to visual variables. Visual variables include the size of elements, their shape, orientation, brightness, color, texture and position. Bertin also called these attributes retinal variables, because they describe the quality characteristics of human perception, in contrast to a technical description of a graphical element. Actually, this leads to semiotics, and we view informatics as semiotics engineering [6], because it is interesting to observe that the three main goals of informatics (correctness of algorithms, efficiency of programs, and usability of software systems) turn out to be nicely related to the three semiotic dimensions [7]: 1) correctness is a matter of syntax, to be answered by considering formal aspects only [8]; 2) efficiency is a matter of semantics, related to the object world [9]; and 3) usability is a matter of pragmatics, taking the interest and motivation of the end user into account [10]. These are our basic assumptions for the following details:

A visual variable is characterized according to Bertin by the kind of scale (nominal, ordinal) and the length of the visual variable. The length of a variable is the number of distinguishable values that can be perceived by a viewer (for example, how many shades of grey or different hue values can be differentiated). Choosing different visual variables for representing the same data variable greatly influences the perception and understanding of the glyph. It is therefore important to know the visual variables and to map data variables to them appropriately in the design of a glyph.

Our approach will make use of visual variables to describe the perceptual properties of a glyph. Ropinski & Preim (2008) and Ropinski, Oeltze & Preim (2011) [11], [12] describe glyph-based visualization techniques in medical visualizations and give a glyph taxonomy together with guidelines for the usage of glyphs. Ward (2002) [13] describes a taxonomy of glyph placement strategies, where he distinguishes between data-driven and structure-driven approaches. He also describes strategies to avoid overlapping problems and proposes a space-filling layout for structured data.

A very specific type of glyph was introduced by Chernoff (1973): the so-called Chernoff faces [14]. Chernoff faces are 2D glyphs which exploit the human ability to recognize faces and small changes in facial characteristics. However, the effectiveness of this form of visualization is still being debated in the scientific community [15], [16].

Kraus & Ertl [17] present, in a more technical approach, a system for glyph generation (with minimal user interaction) which has been used in a visualization tool in the automotive industry.

An overview of the state of the art in the visualization of multi-variate data is given by Peng & Laramee (2009) [18] as well as Bürger & Hauser (2007), where they discuss how different techniques take effect at specific stages of the visualization pipeline and how they apply to multi-variate data sets composed of scalars, vectors, and tensors. Moreover, they provide a categorization of these techniques with the aim of a better overview of related approaches [19], with an update published in 2009 [20]. Visual data exploration methods on large data sets were described by several authors; in particular Keim (2001) [21], Hege et al. (2001) [22], Fayyad, Wierse & Grinstein (2002) [23], Fekete & Plaisant (2002) [24], and Santos & Brodlie (2004) [25] provide a good introduction to this topic. A recent state-of-the-art report on glyph-based visualization and a good overview of theoretical frameworks, e.g. the semiotic system of Bertin, was given by Borgo et al. (2013) [26].

An interesting application of glyphs in a visual analytics approach for understanding biclustering results from microarray data has been presented by Santamaria, Theron & Quintales (2008) [27]; further examples are given by Gehlenborg & Brazma (2009) [28], Helt et al. (2009) [29] and, more recently, Konwar et al. (2013) [30].

The closest work using glyphs with an adaptive layout is that of Legg et al. (2012) [31] in the application domain of sport analysis. Here the data space is event based, and the adaptive layout strategy is focused on overlapping events with so-called "macro glyphs", which combine several glyphs into one. In the "macro glyph" approach only scaling is applied, and no level of detail (LoD) suitable for different screen spaces. In the evaluation phase, expert interviews at the work environment level were conducted, based on methods described by Tory & Möller (2004) [32] and Plaisant (2004) [33].

Methods

Data glyphs

Data glyphs are composed of (i) a mapping of data variables to visual primitives, e.g. lines, shapes, fonts, where each of the visual primitives is described by its visual capabilities according to Bertin's visual variables, (ii) a combination of the visual primitives into compound shapes, (iii) an organization of the compound shapes into levels of detail (LoD) and (iv) spatial positioning and rendering algorithms, see Figure 1.

Figure 1

Multilevel Data Glyphs (figure1.pdf). The overall principle of multilevel data glyphs.

Our previous work [34,35] in biomedical visualization resulted in an upper bound of 16 attributes for the highest level of detail. This number is given by the attribute set in a pathological finding, which is composed of patient information (age, sex, year of birth, year of death, cause of death, disease free survival), the pathological finding (organ, size of the tumor, lymph node staging, metastasis staging, grading, receptor state) and surgery attributes (origin of the sample, year of surgery, doctor, type of sample). In order to unveil hidden relations by the recognition of unexpected patterns, as many variables as possible should be integrated within the rendering of one glyph. 2D glyph designs are usually limited to up to 5 data variables; therefore we chose to model data glyphs as 3D objects. This results on the one hand in a high information density, but on the other hand we face the problems of occlusion, perspective distortion and complex navigation and orientation in 3D space. Usability tests with the very first prototypes indicated that glyph placement in 3D space using a perspective projection, with the possibility to move freely within this space, was overly burdensome for almost all users, especially for medical experts. To avoid the problems described above, we restricted the 3D space to 2.5D, or a "¾ perspective" view, by applying a dimetric (near isometric) projection grid, well known from technical illustrations and from some very successful simulation games of the 1990s (e.g. Civilization). In a dimetric projection grid data glyphs do not change size as they are moved, so no re-rendering of a glyph is necessary to simulate a "¾ perspective" view. With a dimetric projection grid, specific performance optimization strategies, e.g. bitmap caching and selection highlighting, can also be easily applied.
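The size-invariance property of a dimetric grid can be illustrated with a minimal sketch. The 2:1 tile ratio, the tile dimensions and the function names `grid_to_screen` and `projected_width` are assumptions chosen for illustration; the paper does not specify its projection parameters.

```python
from typing import Tuple

TILE_W = 64  # assumed tile width in pixels (hypothetical value)
TILE_H = 32  # assumed tile height; a 2:1 ratio is a common dimetric choice


def grid_to_screen(gx: float, gy: float, gz: float = 0.0) -> Tuple[float, float]:
    """Map grid coordinates to screen coordinates with a parallel projection.

    There is no division by depth, so a glyph keeps the same screen size
    wherever it is placed on the grid.
    """
    sx = (gx - gy) * TILE_W / 2
    sy = (gx + gy) * TILE_H / 2 - gz * TILE_H
    return sx, sy


def projected_width(gx: float, gy: float) -> float:
    """Screen-space extent of one grid step along the x axis at (gx, gy)."""
    x0, _ = grid_to_screen(gx, gy)
    x1, _ = grid_to_screen(gx + 1, gy)
    return x1 - x0
```

Because the projected extent of a grid step is the same at every grid position, a glyph bitmap rendered once could in principle be reused at any location, which is what makes bitmap caching straightforward in such a grid.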

Level of detail

As we want to visualize several million data elements in the smallest level of detail, the screen size of a glyph can be as small as one pixel. Therefore only the visual variable "value" (from light to dark) or "color" (changes in hue at a given value) can be the starting point. Note: If the maximal number of elements to be visualized is in the range of several tens of thousands, we can also choose the visual variable shape as the starting point. To achieve well-graduated levels of detail and visually smooth transitions between levels, we rely on the principle that the dominant visual variable of level n is also the strongest visual variable in level n+1.

In previous work [35] several glyph designs were developed, but not evaluated. A systematic evaluation with medical experts (n = 12) gave a very clear result: 10 of 12 were in favour of "cubic glyphs", with two main arguments: all graphical elements are necessary and useful (no disturbing visual variables), and the transition between levels is natural (the form of a rectangular cubic glyph corresponds well to a square pixel). An example cubic glyph can be seen in Figure 2; the corresponding visual variables are summarized in Table 1. The 3 levels of the cubic glyph are: (i)

Figure 2

Cubic Glyph (figure2.png). Example of a cubic glyph design

Table 1

Visual Variables of the Cubic Glyph

No.	Visual Variable	Level	Type	Scale	Length
1	primary color	1	color	nom/ord	short
2	height main cube	2	geometry size	ordinal	long
3	color cap	2	color	nom/ord	short
4	color base	2	color	nom/ord	short
5	size cap	2	geometry shape	ordinal	medium
6	height cap	2	geometry size	ordinal	long
7	shape cap	2	geometry size	ordinal	short
8	height west-element	3	geometry size	ordinal	long
9	color west-element base	3	color	nom/ord	short
10	color cap west-element	3	color	nom/ord	short
11	height east-element	3	geometry size	ordinal	long
12	color east-element base	3	color	nom/ord	short
13	color cap east-element	3	color	nom/ord	short
14	height south-element	3	geometry size	ordinal	long
15	color south-element base	3	color	nom/ord	short
16	color cap south-element	3	color	nom/ord	short

The pixel level, where one data attribute determines the color of the glyph either by direct mapping, a color gradient or a custom (algorithmic) mapping. This color will also be the dominant color in all higher levels. The pixel level is applied when the screen size of a glyph is below 2x2 pixels. At the pixel level a user can interact (filter, group, arrange, cluster) with several million glyphs. (ii) In the iconic level we add 6 additional visual variables. At the iconic level a user can interact (filter, group, arrange, cluster) with several thousand elements. And finally (iii) the detail level, where we add 9 geometric primitives to the data glyph, which results in an overall maximum of 16 data attributes mapped to a single glyph. A glyph is rendered in the detail view when its screen size is greater than 64x64 pixels. At the detail level a user can interact (filter, group, arrange, cluster) with several thousand elements.
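The switching between the three levels can be sketched as a simple threshold function. The thresholds of 2 and 64 pixels come from the text; treating everything in between as the iconic level, measuring screen size as a single edge length, and the names `level_of_detail` and `VARIABLES_PER_LEVEL` are assumptions made for this illustration.

```python
# Cumulative number of visual variables per level, as stated in the text:
# 1 at the pixel level, 6 more at the iconic level, 9 more at the detail level.
VARIABLES_PER_LEVEL = {"pixel": 1, "iconic": 7, "detail": 16}


def level_of_detail(screen_px: float) -> str:
    """Choose the glyph level from the glyph's edge length in screen pixels.

    Below 2x2 pixels the pixel level is used and above 64x64 pixels the
    detail level. Using the iconic level for everything in between is an
    assumption; the text only implies it.
    """
    if screen_px < 2:
        return "pixel"
    if screen_px > 64:
        return "detail"
    return "iconic"
```

For example, a glyph that occupies 20x20 pixels would be drawn as an iconic glyph with up to 7 mapped attributes under these assumptions.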

Glyph Placement

According to the taxonomy given by Ward [13] we support:

• User driven placement, in which case the user determines the position of a glyph through interaction tasks (selection, filtering, movement, grouping).

• Data driven placement, in which case data values are used to specify the location of the glyph. Our placement strategy supports value discretization and jittering strategies for the placement in a dimetric projection grid.

• Structure driven placement, in which case the relationship between data points determines the location of a glyph. We support structure directly derivable from the data values, e.g. grouping glyphs representing cancer cases by year of surgery, sex and cancer staging, and glyph placements determined by an interactive ant clustering algorithm.
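As an illustration of data driven placement, the following minimal sketch discretizes a value into a grid cell and adds a small jitter so that glyphs sharing a cell do not coincide. The function names, the clamping behaviour and the jitter amount are assumptions; the paper does not describe its discretization or jittering in detail.

```python
import random


def discretize(value: float, lo: float, hi: float, n_cells: int) -> int:
    """Map a value in [lo, hi] to a grid cell index in 0..n_cells-1.

    Values outside the range are clamped to the first or last cell.
    """
    if hi <= lo or n_cells < 1:
        raise ValueError("need hi > lo and at least one cell")
    t = (value - lo) / (hi - lo)
    t = min(max(t, 0.0), 1.0)
    return min(int(t * n_cells), n_cells - 1)


def jittered_position(cell: int, rng: random.Random, amount: float = 0.4) -> float:
    """Offset a glyph within its cell by a random amount in [-amount, amount]."""
    return cell + rng.uniform(-amount, amount)
```

A call such as `jittered_position(discretize(age, 0, 100, 20), random.Random(42))` would place a patient in one of 20 age bands with a small random offset; a seeded generator keeps the layout reproducible between renderings.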

Figure 3 shows a spatial arrangement of glyphs at the iconic level in an age pyramid. All male patients are on the left side and female patients on the right side. The vertical position of a glyph is determined by the patient's age and the horizontal position by the size of the tumor given by the T-staging of the pathological finding [36]. The T-staging is also the variable used in the mapping of the primary level.
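The age pyramid arrangement described above can be sketched as a mapping from a case to a grid cell. The list of T-staging categories, the sex encoding ("m"/"f"), the use of signed column indices and the name `age_pyramid_position` are hypothetical choices for illustration only.

```python
from typing import Tuple

# Assumed ordering of T-staging categories (illustrative, not from the paper).
T_STAGES = ["T1", "T2", "T3", "T4"]


def age_pyramid_position(sex: str, age: int, t_stage: str) -> Tuple[int, int]:
    """Return a (column, row) grid cell for one cancer case.

    Male cases get negative columns (left side), female cases positive
    columns (right side). The distance from the centre axis grows with the
    T-staging and the row is the patient's age.
    """
    if sex not in ("m", "f"):
        raise ValueError("sex must be 'm' or 'f'")
    offset = T_STAGES.index(t_stage) + 1
    side = -1 if sex == "m" else 1
    return side * offset, age
```

Cases with identical sex, age and T-staging map to the same cell in this sketch; a real layout would need an additional offset or jitter to separate them.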

Figure 3

Cubic glyphs arranged in an age pyramid (figure3.png). Spatial arrangement of iconic glyphs in an age pyramid. All male thyroid cancer patients are on the left side and female patients on the right side. The vertical position of a glyph is determined by the patient's age and the horizontal position by the T-staging. The T-staging is also the variable used in the mapping of the primary level.

Mapping validation

A data glyph can be configured through the mapping of data variables to the parameters of its geometric primitives. On the one hand this is a very powerful tool, as the user can map any data attribute to any geometric parameter, and even change the mapping on the fly. On the other hand it is also critical, because the great flexibility could easily lead to faulty mappings (e.g. mapping a nominal variable to the position of a geometric primitive) and subsequently to misinterpretations of the visualization results. In order to avoid such mismatches we provide an automatic validation of the variable mapping.

In the automatic validation, we compare meta information about data variables (scale of measurement: discrete, continuous, categorical, ordinal, interval, nominal; and the number of distinct values) to the visual capabilities of the glyph elements. The verification is done according to the following rules:

The shape of a geometric primitive is purely nominal and should therefore never be mapped to ordinal data values. However, we can recognize an almost infinite variety of shapes (the shape variable is "very long").

The perceptual variable color (hue) is a nominal variable. Even though the wavelength of light assigns an ordering to colors, the human perceptual system takes no notice of it. There is some cultural ordering imposed on hue (red is "hotter" than blue), but it is weak because not all hues are related. A non-color-deficient person can distinguish between seven and ten million different colors. However, color is a deeply subjective attribute, and therefore not more than 10 to 20 carefully chosen color values should be used in color mapping. A great tool for carefully designed colormaps, which e.g. provides "colorblind safe" suggestions, can be found at colorbrewer2.org [37].

Value (the brightness of an element) and texture (with respect to the grain size of the texture) are ordered and can be mapped to an ordinal scale. Value and texture are short variables, i.e. roughly 10 values can be distinguished effectively.

The position of a glyph can be mapped to ordinal values, and is a very fine-grained (long) variable. The size of a geometric primitive, or even of the whole glyph element, can also be mapped to ordinal values, but it is "shorter" than the position variable.

Finally, the orientation of a geometric primitive can be mapped to an ordinal data value, but this is a very short visual variable, i.e. only very few different orientations can be perceived.
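The rules above can be read as a small capability table against which each mapping is checked. The following minimal sketch encodes them; the scales follow the text, the lengths for hue, value and texture follow the numbers given in the text, and the lengths for shape, position, size and orientation are placeholder assumptions because the text describes them only qualitatively. All names are hypothetical.

```python
from dataclasses import dataclass
from typing import Dict, FrozenSet, Tuple


@dataclass(frozen=True)
class VisualVariable:
    scales: FrozenSet[str]  # data scales this visual variable can carry
    length: int             # number of values a viewer can distinguish


VISUAL_VARIABLES: Dict[str, VisualVariable] = {
    "shape": VisualVariable(frozenset({"nominal"}), 100),        # "very long": number assumed
    "hue": VisualVariable(frozenset({"nominal"}), 20),           # text: at most 10 to 20 colors
    "value": VisualVariable(frozenset({"ordinal"}), 10),         # text: roughly 10 values
    "texture": VisualVariable(frozenset({"ordinal"}), 10),       # text: roughly 10 values
    "position": VisualVariable(frozenset({"ordinal"}), 1000),    # "long": number assumed
    "size": VisualVariable(frozenset({"ordinal"}), 50),          # shorter than position: assumed
    "orientation": VisualVariable(frozenset({"ordinal"}), 4),    # "very short": number assumed
}


def check_mapping(data_scale: str, n_distinct: int, visual_name: str) -> Tuple[bool, bool]:
    """Return (scale_ok, length_ok) for mapping one data variable to a visual variable."""
    vv = VISUAL_VARIABLES[visual_name]
    return data_scale in vv.scales, n_distinct <= vv.length
```

Under these assumptions, mapping an ordinal variable to shape fails the scale check, while mapping a nominal variable with 30 categories to hue passes the scale check but exceeds the usable length.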

Results

We use multilevel data glyphs in the disease analyser, a visual analytics application for the interactive exploration of a database containing approximately 1.4 million cancer cases. Each record describes a comprehensive diagnosis of a cancerous (malignant) tumor case. The most used variables are patient age and sex, the ICDN classification, the TNM staging, grading, receptor states and information about the time under risk, disease free survival and overall survival, together with surgery information.

Figure 4 shows the mapping of the data variables to visual variables of the data glyph. In this interface we use a "traffic light" indicator to show the validity of the mapping.

Figure 4

Variable Mapping (figure4.png). Mapping of the data variables to visual variables of the data glyph. A "traffic light" visualization indicates the validity of the mapping.

• Green: All data scales fit the scale of the corresponding visual variable, and the length of every visual variable is equal to or greater than the number of corresponding distinct data values.

• Yellow: All data scales fit the scale of the corresponding visual variable, but the length of at least one visual variable is smaller than the number of corresponding distinct data values.

• Red: There is a mismatch between the scale of at least one attribute and the scale of the corresponding visual variable.
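The three indicator states can be expressed as a short decision function. This is a minimal sketch; the name `traffic_light` and the representation of each mapping as a `(scale_ok, length_ok)` pair are assumptions made for illustration.

```python
from typing import Iterable, Tuple


def traffic_light(checks: Iterable[Tuple[bool, bool]]) -> str:
    """Summarize per-variable checks as 'green', 'yellow' or 'red'.

    Each check is a (scale_ok, length_ok) pair for one mapped variable.
    Any scale mismatch gives red; otherwise any length shortfall gives
    yellow; otherwise the mapping is green.
    """
    checks = list(checks)
    if not all(scale_ok for scale_ok, _ in checks):
        return "red"
    if not all(length_ok for _, length_ok in checks):
        return "yellow"
    return "green"
```

A scale mismatch takes precedence over a length shortfall, so a mapping with both problems is reported as red.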

Figure 5 shows approx. 70,000 randomly selected entities from the disease database. We took this high number of cases to get a proportionate sampling for all organs. For this high number of cases glyphs are rendered at the pixel level, i.e. the T-staging (size of the tumor) maps to the color of the glyph. The spatial position of the glyphs in the starting view is determined only by the ordering of the cases within the database.

Figure 5

70,000 cancer cases randomly selected from the disease database (figure5.png). Approx. 70,000 randomly selected entities from the disease database. For this number of elements we use the pixel level for the data glyph, i.e. only the color of the glyph is given by its primary mapping, the T-staging. The spatial position of the glyphs in the starting view is determined only by the ordering of the cases within the database. In the lower part of the disease analyser, histograms of the variables used in the glyph mapping are shown.

In the lower part of the disease analyser, histograms of the attributes of cancer findings are shown. Figure 6 shows the histograms for the examination year, sex, age, disease free survival, T-staging, N-staging, M-staging and the grading. In the next step an expert can divide cases into two subgroups, in our example by patient age. The histogram view shows the value distribution of the selected cases (green area) in relation to the overall distribution of cases (blue area). The specification of subgroups (filtering by value ranges for each attribute) together with glyph highlighting and re-ordering can be done in real time. The interface for this filtering task is embedded into the histograms (red sliders). See the supplementary video "linked histogram sliders.mov".
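The linked histograms can be sketched as two counts per attribute value: one over all cases and one over the cases that pass every slider range. The record layout (a dictionary of integer attributes) and the names `select` and `linked_histograms` are assumptions for illustration; the paper's application is written in OpenGL/C++ and its data structures are not described.

```python
from collections import Counter
from typing import Dict, List, Tuple

Case = Dict[str, int]
Ranges = Dict[str, Tuple[int, int]]


def select(cases: List[Case], ranges: Ranges) -> List[Case]:
    """Keep the cases whose attributes lie inside every (lo, hi) slider range."""
    return [
        c for c in cases
        if all(lo <= c[attr] <= hi for attr, (lo, hi) in ranges.items())
    ]


def linked_histograms(cases: List[Case], ranges: Ranges, attribute: str) -> Dict[int, Tuple[int, int]]:
    """For each value of `attribute`, return (overall count, selected count)."""
    overall = Counter(c[attribute] for c in cases)
    selected = Counter(c[attribute] for c in select(cases, ranges))
    return {value: (overall[value], selected[value]) for value in sorted(overall)}
```

Filtering by an age range and then calling `linked_histograms` for the T-staging attribute would give, per stage, the overall count (blue area) and the count within the selected subgroup (green area).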

Figure 6

Selection of Subgroups (figure6.png). Histograms for the examination year, sex, age, disease free survival, T-staging, N-staging, M-staging and the grading. The histogram view shows the value distribution of the selected cases (green area) in relation to the overall distribution of cases (blue area). See also the additional file suppl_linked_sliders.mov.

In the next example an expert compares cancer cases for different organs. Figure 7 shows 2109 thyroid cancer cases and 1782 lung cases, both arranged in an age pyramid. The relatively low number of cases results in bigger glyph sizes; therefore the glyphs are rendered at the iconic level. In Figure 8 we see the iconic glyphs in a zoomed state (upper part of the thyroid cancer pyramid). The visualization shows a difference in gender distribution (many more men have lung cancer), a difference in mortality (many more black caps in lung cancer than in thyroid cancer), and high overall survival of a subgroup in thyroid cancer (glyphs without a black cap). Besides the overview and comparison of two medium size groups, outliers can be identified easily (thyroid cancer cases with an age of 0 and 100 years, which are data input errors).

Figure 7

Comparison of 2109 thyroid and 1782 lung cancer cases (figure7.png). 2109 thyroid cancer cases and 1782 lung cases, both arranged in an age pyramid. The relatively low number of cases results in bigger glyph sizes; therefore the glyphs are rendered at the iconic level.

Figure 8

Detail view of the thyroid cancer visualization with iconic glyphs (figure8.png). Zoom-in of the visualization of Figure 7.

Figure 9 shows about 11,000 colon cancer cases rendered at the pixel level. The glyphs are grouped by the examination year (1984 to 2004). For each year the glyphs are arranged in an age pyramid. Here a medical expert can overview a very large number of cases and recognise several aspects in a trend analysis. For colon cancer cases the following observations were made: (i) there is a strong increase in cases, (ii) a shift in age distribution and an increase in small tumors due to early warning programs can be clearly seen and (iii) two outliers in 1999/2000 for male patients in the age group 75-80 were identified, with no explanation yet.

Figure 9

11,000 colon cancer cases grouped by the examination year. The visualization is done at the pixel level (figure9.png). 11,000 colon cancer cases, grouped by the examination year (1984 to 2004). For each year the glyphs are arranged in an age pyramid. A medical expert can overview a very large number of cases and recognise several aspects in a trend analysis, e.g. the increase in cases, a shift in age distribution, an increase in small tumors due to early warning programs, and two outliers in 1999/2000 for male patients in the age group 75-80 (no explanation yet).

Figure 10 shows the regrouping of the colon cancer cases into 5-year time periods. In the iconic view we can see additional information about the mortal state and disease free survival period of a patient. In the period 1995-1999 it was clearly identified that the number of cases with no T-staging (white glyphs) is much higher for male patients than for female patients. Initially there was no hypothesis to explain this difference. Further investigation explained it as a wrong classification, as most of the cases included a secondary finding about colon tissue, which is made in combination with a prostate biopsy.

Figure 10

Colon cancer age pyramid of 5-year periods. The visualization is done with iconic glyphs (figure10.png). Regrouping of the colon cancer cases into 5-year time periods. In the iconic view we can see additional information about the mortal state and disease free survival period of a patient.

A further zoom-in shows the glyphs in the detail view, see Figure 11. The user can now compare the N-staging, M-staging and the grading for a small number of glyphs. The disease analyser shows the variable values of the currently selected element in the histogram view, and the full text diagnosis is shown in a text window on the right side (blurred for anonymisation). Here the disease analyser is used to manually select and compose subgroups for clinical studies. In our example two subgroups of colon cancer tissues were selected by maximum difference in grading and disease free survival, together with a follow-up diagnosis that is as complete as possible.

Figure 11

Colon cancer cases, manually grouped. The visualization is done with detail glyphs (figure11.png). Further magnification shows the glyphs in the detail view. The user can compare the N-staging, M-staging and the grading for a small number of glyphs. The disease analyser depicts the variable values of the currently selected element in the histogram view. Additionally the full text diagnosis of the selected element is shown in a text window.

Discussion

The utilization of multilevel data glyphs in the disease analyser was a valuable source for the development of our glyph design criteria. In the design process we faced the following challenges:

• Occlusion: 3D glyphs provide on the one hand high data density, but on the other hand face the problem of occlusion. To minimize the occlusion effect we put the main visual variable on top of the geometry (especially in the iconic view) and limit the height of the data glyph. Perspective distortions are avoided by the use of a parallel projection (2½D view of an object with forced depth). We use either a dimetric projection or, when the glyphs should be seen from a higher point of view, a cavalier or military projection.

• Secondary colors: Multilevel glyphs consist of complex geometry, where each geometric primitive can be colored independently. This may result in undesirable secondary (mixed) colors. To avoid this effect a good glyph design provides a clear gradation of visual variables, especially for color perception. Such a gradation can be achieved through well defined increments of the size of the graphic primitives and a restricted color mapping for individual graphical primitives. In some special cases secondary colors could be used intentionally, e.g. to visualize the coincidence of two values in a large data set.

• Grid patterns: When data glyphs are arranged in a dense grid, unwanted patterns can occur. To avoid this, a good glyph design is based on a symmetrical skeletal structure. Especially in the iconic view it is crucial to model the borders of the glyph in order to provide a good visual differentiation. In the simplest case a border can be realized through a plinth as a neutral base element.

During beta testing the disease analyser was used by 12 experts working in the fields of bioinformatics, computational biology and medical research. The first group had a focus on data acquisition, automatic classification of medical records and data quality issues. The focus of the second group was on data analysis, e.g. the development of the health care system, and hypothesis generation. The following observations and statements describe their experience and provide valuable input for further developments:

• The disease analyser is very well suited to find outliers and "white spaces" in the source data.

• Snapshot and bookmarking functionality is missing.

• The selection of subgroups within the histograms and the visual comparison of the value distributions were very much appreciated.

• In research tasks, the disease analyser was used to compare two to four subgroups.

• Manual arrangement and sorting of cases was used often.

• The fast availability of the full diagnosis text for the selected data glyph is an important feature.

• When a hypothesis is generated there should be a report module to (statistically) compare the involved subgroups and to print out a report.

Conclusions

We developed multilevel data glyphs for the visualization of large medical data sets. The data glyphs provide

• three levels of detail (semantic zoom) suitable for different screen spaces, and a

• validation of the data variable mapping.

We used multilevel data glyphs in the disease analyser, a visual analytics application for quality control and exploration of a comprehensive collection of cancer disease records. Three concrete glyph designs and design rules resulted from the hands-on experience.

We plan to integrate the proposed data glyphs as a visual front end to the biobank of the Medical University Graz and for quality assurance tasks on data records related to cancer samples, and to apply the visualization method for strategic planning and trend analysis in the medical domain. In this undertaking we will use a lightweight (WebGL) version of data glyphs, which can be used as visualization components in a webpage connected to a local datagrid or through a web service to a central medical database.

There are many studies comparing 2D versus 3D visualization techniques for the visualization of spatially related data, e.g. medical renderings or geographic data. However, there is no systematic evaluation known to the authors comparing 2D glyphs to 3D and 2½D (isometric) techniques for abstract information. For abstract information no inherent mapping of the data is given, neither to the 3D shape of a glyph nor to the spatial position, which would be a natural mental model for users of the visualization results. Lie et al. [38] have discussed design and realization aspects (occlusion, depth perception and visual cluttering) of glyph-based 3D data visualization with a focus on glyph placement. Their work is a good starting point for a systematic evaluation of the shape and placement of 2½D glyphs providing high data density versus 2D shapes, which are less challenging for the user's perception.

A second open research question is how to build and evaluate smooth transitions between different levels of glyph abstraction. In the current work the glyph rendering method was changed according to the glyph size in screen space. The configuration of "switching points" was done with a heuristic approach, and carefully (manually) designed glyph geometry resulted in a smooth visual transition. However, a systematic study and description of the methodology of glyph transitions (fusion of semantic and graphical zoom) has still to be done.

Competing interests

The authors declare that they have no competing interests.

Authors' contributions

HM and KZ conceived the idea in the analysis of the pathological finding database of the Medical University Graz. HM developed the medical glyph concept and, together with KZ and AH, defined medical needs and usability criteria. The OpenGL/C++ application was written by HM and RR. The database was administered by RR. All authors read and approved the final manuscript.

Supplementary Material

Additional File 1:

Video of linked sliders (suppl_linked_sliders.mov; 684K, mov). Linked histogram sliders for the selection of subgroups.

Acknowledgements

This work was funded by the FIT-IT programme (813 398) and by the Austrian Fonds zur Förderung der wissenschaftlichen Forschung (FWF, L427-N15). Medical data were provided in the context of the Austrian Genome Programme GEN-AU and the CRIP project. Our thanks are due to all partners for their contributions, critical reviews and various discussions. The work has been approved by the Ethical Committee of the Medical University of Graz.

Declarations

Publication of this article has been funded by the Christian Doppler laboratory for biospecimen research and biobanking technologies.

This article has been published as part of BMC Bioinformatics Volume 15 Supplement 6, 2014: Knowledge Discovery and Interactive Data Mining in Bioinformatics. The full contents of the supplement are available online at http://www.biomedcentral.com/bmcbioinformatics/supplements/15/S6.

References

1. Holzinger A, Zupan M. KNODWAT: A scientific framework application for testing knowledge discovery methods for the biomedical domain. BMC Bioinformatics. 2013;14(1):191.
2. Holzinger A. In: Multidisciplinary Research and Practice for Information Systems, Springer Lecture Notes in Computer Science LNCS 8127. Alfredo Cuzzocrea CK, Dimitris E. Simos, Edgar Weippl, Lida Xu, editor. Heidelberg, Berlin, New York: Springer; 2013. Human-Computer Interaction & Knowledge Discovery (HCI-KDD): What is the benefit of bringing those two fields to work together? pp. 319–328.
3. Turkay C. Human-Computer Interaction and Knowledge Discovery in Complex, Unstructured, Big Data. Springer Berlin Heidelberg; 2013. Hypothesis Generation by Interactive Visual Exploration of Heterogeneous Medical Data; pp. 1–12.
4. Baldonado M, Woodruff A, Kuchinsky A. In: Proc of Advanced Visual Interfaces (AVI 2000). ACM Press; 2000. Guidelines for Using Multiple Views in Information Visualization; pp. 110–119.
5. Bertin J, Barbut M. In: Semiology of Graphics Diagrams Networks Maps. J. Berg, editor. University of Wisconsin Press; 1983. (French edn. 1967)
6. Holzinger A, Searle G, Auinger A, Ziefle M. In: Universal Access in Human-Computer Interaction Context Diversity, Lecture Notes in Computer Science, LNCS 6767. Stephanidis C, editor. Berlin, Heidelberg: Springer; 2011. Informatics as Semiotics Engineering: Lessons Learned from Design, Development and Evaluation of Ambient Assisted Living Applications for Elderly People; pp. 183–192.
7. Andersen PB. What Semiotics can and cannot do for HCI. Knowledge-Based Systems. 2001;14(8):419–424.
8. Hoare AR. Proof of correctness of data representations. Acta Informatica. 1972;1(4):271–281.
9. Nake F, Grabowski S. Human-Computer Interaction viewed as Pseudo-Communication. Knowledge-Based Systems. 2001;14(8):441–447.
10. Holzinger A. Usability engineering methods for software developers. Communications of the ACM. 2005;48(1):71–74.
11. Ropinski T, Preim B. SimVis - Simulation and Visualization: 2008; Magdeburg. SCS Publishing House; Taxonomy and Usage Guidelines for Glyph-based Medical Visualization; pp. 121–138.
12. Ropinski T, Oeltze S, Preim B. Survey of glyph-based visualization techniques for spatial multivariate medical data. Computers & Graphics. 2011;35(2):392–401.
13. Ward MO. A taxonomy of glyph placement strategies for multidimensional data visualization. Information Visualization. 2002;1(3-4):194–210.
14. Chernoff H. Use of Faces to Represent Points in K-Dimensional Space Graphically. J Am Stat Assoc. 1973;68(342):361–368.
15. Morris CJ, Ebert DS, Rheingans PL. Experimental analysis of the effectiveness of features in Chernoff faces. 28th AIPR Workshop: 3D Visualization for Data Exploration and Decision Making: 2000. International Society for Optics and Photonics: 12–17.
16. Lee MD, Reilly RE, Butavicius ME. Proceedings of the Asia-Pacific symposium on Information visualisation Volume 24. Australian Computer Society, Inc; 2003. An empirical evaluation of Chernoff faces, star glyphs, and spatial visualizations for binary data; pp. 1–10.
17. Kraus M, Ertl T. Interactive Data Exploration with Customized Glyphs. International Conferences in Central Europe on Computer Graphics, Visualization and Computer Vision (WSCG 2001): 2001; Pilzen (Czech Republic) pp. 20–23.
18. Peng Z, Laramee S. Higher Dimensional Vector Field Visualization. A Survey in Theory and Practice of Computer Graphics (TPCG '09); 2009. pp. 149–163.
19. Bürger R, Hauser H. Visualization of multi-variate scientific data. Proceedings of EuroGraphics; 2007. pp. 117–134.
20. Fuchs R, Hauser H. Computer Graphics Forum: 2009. Wiley Online Library; Visualization of Multi-Variate Scientific Data; pp. 1670–1690.
21. Keim DA. Visual exploration of large data sets. Communications of the ACM. 2001;44(8):38–44.
22. Hege H-C, Hutanu A, Kähler R, Merzky A, Radke T, Seidel E, Ullmer B. Progressive retrieval and hierarchical visualization of large remote data. Scalable Computing: Practice and Experience. 2001;6(3):60–72.
23. Fayyad UM, Wierse A, Grinstein GG. Information visualization in data mining and knowledge discovery. Morgan Kaufmann; 2002.
24. Fekete J-D, Plaisant C. Information Visualization, 2002 INFOVIS 2002 IEEE Symposium on: 2002. IEEE; Interactive information visualization of a million items; pp. 117–124.
25. Dos Santos S, Brodlie K. Gaining understanding of multivariate and multidimensional data through visualization. Computers & Graphics. 2004;28(3):311–325.
26. Borgo R, Kehrer J, Chung DH, Maguire E, Laramee RS, Hauser H, Chen M. Eurographics 2013-State of the Art Reports. The Eurographics Association; 2012. Glyph-based Visualization: Foundations, Design Guidelines, Techniques and Applications; pp. 39–63.
27. Santamaría R, Therón R, Quintales L. A visual analytics approach for understanding biclustering results from microarray data. BMC Bioinformatics. 2008;9(1):247.
28. Gehlenborg N, Brazma A. Visualization of large microarray experiments with space maps. BMC Bioinformatics. 2009;10(Suppl 13):O7.
29. Helt G, Nicol J, Erwin E, Blossom E, Blanchard S, Chervitz S, Harmon C, Loraine A. Genoviz Software Development Kit: Java tool kit for building genomics visualization applications. BMC Bioinformatics. 2009;10(1):266.
30. Konwar KM, Hanson NW, Pagé AP, Hallam SJ. MetaPathways: a modular pipeline for constructing pathway/genome databases from environmental sequence information. BMC Bioinformatics. 2013;14(1):202.
31. Legg PA. Computer Graphics Forum. 3pt4. Vol. 31. Blackwell Publishing Ltd; 2012. MatchPad: Interactive Glyph Based Visualization for Real Time Sports Performance Analysis; pp. 1255–1264.
32. Tory M, Moller T. Human factors in visualization research. Visualization and Computer Graphics, IEEE Transactions on. 2004;10(1):72–84.
33. Plaisant C. Proceedings of the working conference on Advanced visual interfaces: 2004. ACM; The challenge of information visualization evaluation; pp. 109–116.
34. Müller H, Zatloukal K, Streit M, Schmalstieg D. Proceedings of the Conference on BioMedical Visualisation. London, UK; 2008. Interactive Exploration of Medical Data Sets; pp. 29–35.
35. Müller H, Reihs R, Sauer S, Zatloukal K, Streit M, Lex A, Schlegl B, Schmalstieg D. Proceedings of the 13th International Conference on Information Visualisation. Barcelona; 2009. Connecting Genes with Diseases; pp. 323–330.
36. Greene F. American Joint Committee on Cancer; New York: Springer; 2002. AJCC cancer staging handbook.
37. Harrower M, Brewer CA. ColorBrewer.org: an online tool for selecting colour schemes for maps. Cartographic Journal. 2003;40(1):27–37.
38. Lie AE, Kehrer K, Hauser H. Proceedings of the 25th Spring Conference on Computer Graphics. ACM; 2009. Critical design and realization aspects of glyph-based 3D data visualization; pp. 27–34.
