Eric is the Editorial Manager at DZone, Inc. Feel free to contact him at egenesky@dzone.com. Eric has posted 805 posts at DZone.

# Fumblings with Ranked Likert Scale Data in R

07.13.2012

The following article was originally written by Tony Hirst on his blog, OUseful.info.

The code is horrible and the visualisations quite possibly misleading, but I’m dead tired and there are a couple of tricks in the following R code that I want to remember, so here’s a contrived bit of fumbling with some data of the form:

|   | enjoyCompany | tooMuchFamily |
|---|---|---|
| 1 | strongly agree | strongly disagree |
| 2 | strongly agree | strongly disagree |
| 3 | neither agree nor disagree | strongly disagree |
| … | … | … |

That is, N rows, no identifiers, two columns; each column relates to a questionnaire question with a scaled response enumerated as 'strongly agree', 'agree ', 'neither agree nor disagree', 'disagree', 'strongly disagree' (the 'agree ' value carries a trailing space, which the code below also uses).
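To make the shape of the data concrete, here is a minimal sketch that builds a stand-in data frame of that form. The responses are invented purely for illustration (they are not the survey data), and the helper name `scale_levels` is an assumption introduced here; only the column names and the five scale values come from the text above.

```r
# Toy stand-in for the survey data: the responses below are made up for illustration only.
# Note that 'agree ' carries a trailing space, matching the level name used in the post's code.
scale_levels <- c('strongly agree', 'agree ', 'neither agree nor disagree',
                  'disagree', 'strongly disagree')

set.seed(1)  # make the made-up sample reproducible
n <- 50
fd <- data.frame(
  # Include '' as a possible value, since the post later filters out blank responses
  enjoyCompany  = sample(c(scale_levels, ''), n, replace = TRUE),
  tooMuchFamily = sample(c(scale_levels, ''), n, replace = TRUE),
  stringsAsFactors = TRUE
)

str(fd)   # two factor columns, no identifier column
head(fd)
```

Something like this can stand in for `fd` when trying out the snippets that follow.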

The first thing I tried to do was some “traditional” Likert scale style stacked bar charts using ggplot2 (surely there must be a Likert scale visualisation library around? If so, how would it work with data in the above (and below) forms? Answers via the comments please…)

    require(reshape)
    require(ggplot2)
    # My sample data doesn't have row-based identifiers, so here's a hacked incremental index-based ID
    fd$a=1
    fd$b=cumsum(fd$a)
    fd=subset(fd,select=c('enjoyCompany','tooMuchFamily','b'))
    # Melt the data into a dataframe with 3 cols: the id col, /b/;
    # a /variable/ column that contains the original column heading;
    # and a /value/ column that contains the original cell value for the corresponding row and column.
    ff=melt(fd,id.var='b')
    # Get rid of blank values
    ff=subset(ff,value!='')
    # Get rid of unused levels
    ff$value=factor(ff$value)
    ## Check:
    # levels(ff$value)
    # Reorder the levels into a meaningful order
    ff$value <- factor(ff$value, levels=rev(c('strongly agree','agree ','neither agree nor disagree','disagree','strongly disagree')))
    ggplot(ff)+geom_bar(aes(variable,fill=value))+coord_flip()

A couple of notable issues with the resulting diagram:

- the colours aren’t that pleasing to look at;
- we have lost all sense of correlation between values. We may like to think that the agree/strongly agree ratings from one question are correlated with the disagree/strongly disagree responses from the other, but there is nothing in that chart that says this for sure…
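On the colour issue, one possible tweak is to swap the default fill colours for a diverging ColorBrewer palette via ggplot2's `scale_fill_brewer()`, so that the two ends of the scale get contrasting hues. The sketch below is only a suggestion, not part of the original code; the small `ff` data frame is invented so the snippet runs on its own, and the choice of the 'RdBu' palette is arbitrary.

```r
library(ggplot2)

# Hypothetical long-format data standing in for ff (values are invented)
scale_levels <- c('strongly agree', 'agree ', 'neither agree nor disagree',
                  'disagree', 'strongly disagree')
set.seed(2)
ff <- data.frame(
  variable = rep(c('enjoyCompany', 'tooMuchFamily'), each = 40),
  value = factor(sample(scale_levels, 80, replace = TRUE),
                 levels = rev(scale_levels))
)

# Same stacked bar chart as before, but filled from a diverging palette
# so that agreement and disagreement sit at opposite ends of the colour range
p <- ggplot(ff) +
  geom_bar(aes(variable, fill = value)) +
  scale_fill_brewer(palette = 'RdBu') +
  coord_flip()
print(p)
```

This does nothing for the second issue, since the bars still summarise each question separately.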

However, a pairwise comparison may help…

    # Let's count how many times the different scale values occur with each other, and then plot some sort of correlation plot.
    fs=as.data.frame(table(subset(fd,select=c('enjoyCompany','tooMuchFamily'))))
    fs=subset(fs,enjoyCompany!='' & tooMuchFamily!='')
    fs$enjoyCompany <- factor(fs$enjoyCompany, levels=rev(c('strongly agree','agree ','neither agree nor disagree','disagree','strongly disagree')))
    fs$tooMuchFamily <- factor(fs$tooMuchFamily, levels=rev(c('strongly agree','agree ','neither agree nor disagree','disagree','strongly disagree')))
    ggplot(fs)+geom_point(aes(x=enjoyCompany,y=tooMuchFamily,size=Freq))

If I had rather more than two question columns, how would I generate a lattice of pairwise correlation charts to get a visual overview of how all the question answers interact at the pairwise level?
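One way this might be approached, offered as a rough sketch rather than a definitive answer, is to build the co-occurrence counts for every pair of question columns, stack them into one data frame, and let `facet_grid()` lay out one panel per pair. Everything below is illustrative: the data are invented, the third column `q3` and the helper `pair_counts` are hypothetical names introduced here, and blank responses are assumed to have been removed already.

```r
library(ggplot2)

scale_levels <- c('strongly agree', 'agree ', 'neither agree nor disagree',
                  'disagree', 'strongly disagree')

# Invented responses for three questions (q3 is hypothetical, not in the original data)
set.seed(3)
fd <- data.frame(
  enjoyCompany  = sample(scale_levels, 60, replace = TRUE),
  tooMuchFamily = sample(scale_levels, 60, replace = TRUE),
  q3            = sample(scale_levels, 60, replace = TRUE)
)
questions <- names(fd)

# Count how often each pair of responses co-occurs for one pair of questions
pair_counts <- function(qx, qy) {
  counts <- as.data.frame(table(x = factor(fd[[qx]], levels = scale_levels),
                                y = factor(fd[[qy]], levels = scale_levels)))
  cbind(qx = qx, qy = qy, counts)
}

# Every ordered pair of questions, including each question against itself
pairs <- expand.grid(qx = questions, qy = questions, stringsAsFactors = FALSE)
fs_all <- do.call(rbind, lapply(seq_len(nrow(pairs)), function(i) {
  pair_counts(pairs$qx[i], pairs$qy[i])
}))

# One panel per pair of questions; point size shows the co-occurrence count.
# Zero counts are dropped so that empty combinations are not drawn.
p <- ggplot(subset(fs_all, Freq > 0)) +
  geom_point(aes(x = x, y = y, size = Freq)) +
  facet_grid(qy ~ qx) +
  theme(axis.text.x = element_text(angle = 90, hjust = 1))
print(p)
```

The diagonal panels plot each question against itself, so they only show the marginal counts; with many questions the grid grows quadratically and may become hard to read.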

Published at DZone with permission of its author, Eric Genesky. (source)

(Note: Opinions expressed in this article and its replies are the opinions of their respective authors and not those of DZone, Inc.)
