Chorus is a free, evolving data-harvesting and visual analytics suite designed to support social science research using Twitter data.
If you want to reference the Chorus project, please use the following citation: Brooker, P., Barnett, J., & Cribbin, T. (2016). Doing social media analytics. Big Data & Society, 3(2).
The promise of social media as a resource and topic of social science research is thus far as frustrating as it is tantalising. It is widely recognised that social media may have much to offer academic research, yet acquiring and making effective use of this material – part of what some refer to as ‘the big data challenge’ – seems to sit just outside the technical skill set of many social scientists. Computer science possesses a rich array of algorithms, libraries and tools that can be, and have been, applied to the problem of mining big social datasets. Ongoing work is also making progress on the specific problems (e.g. scale, noise and sparsity) associated with new social data like micro-blogs. However, the majority of these computational techniques have failed to filter through to social science in any significant way. The chief barrier to this is accessibility – many researchers simply don’t have the knowledge, time or interest required to master complex data mining tools, much less develop their own custom-coded solutions.
And why should they? Social scientists are experts in social theory and research methods whilst computer scientists have their own expertise in areas like machine learning and visualisation. Instead, we need to create a mutually productive relationship which requires both a level of technical understanding from social scientists, and a sensitivity to the methodological and analytic interests of social research on the part of computer scientists.
Tackling this barrier head-on, Chorus is a software development project that aims to facilitate social media research for social science by bringing together the existing algorithms and metrics from the computer sciences with the requirements and methodologies of the social sciences. The Chorus initiative began in 2011, having its origins in two projects that were being undertaken at Brunel University – MATCH (www.match.ac.uk), a research programme investigating various issues around medical device manufacture, and FoodRisC (www.foodrisc.org), a European initiative directed towards improving risk communication around food issues. The team is an interdisciplinary collaboration of programmers, web developers, and social scientists in the role of requirements engineers. Hence, the Chorus project is an attempt to draw on a broad range of expertise to furnish social science with a bespoke social media data capture and analysis tool, for both quantitative and qualitative research, and to find a way of making the technical world of algorithms more user-friendly for an audience unused to dealing with them.
The Chorus package currently comprises two distinct programs:
Firstly, we have Chorus-TCD (TweetCatcher Desktop). TweetCatcher allows users to sift Twitter for relevant data in two distinct ways: either by topical keywords appearing widely in Twitter conversation (i.e. semantically-driven data) or by identifying a network of Twitter users and following their daily ‘Twitter lives’ (i.e. user-driven data).
Secondly, we have Chorus-TV (TweetVis), which is a visual analytic suite for facilitating both quantitative and qualitative approaches to social media data in social science. Visual analytics (VA) is an interdisciplinary computing methodology combining methods from data mining, information visualization, human-computer interaction and cognitive psychology. The VA approach is highly relevant to the aims of Chorus, enabling exploratory analysis of social media data in an intuitive and user-friendly fashion. Two main views are available within Chorus-TV. The Timeline Explorer (below) provides users an opportunity to analyse Twitter data across time and visualize the unfolding Twitter conversation according to various metrics (including tweet frequency, sentiment, semantic novelty and homogeneity, collocated words, and so on).
[Image: the Timeline Explorer view in Chorus-TV]
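To give a feel for the simplest of the timeline metrics mentioned above, tweet frequency, the following Python sketch counts tweets per fixed-width time interval. It is an illustration only and not the Chorus-TV implementation: the function name `tweet_frequency`, the one-hour default interval and the sample timestamps are all assumptions made for the example.

```python
from collections import Counter
from datetime import datetime, timedelta


def tweet_frequency(timestamps, interval=timedelta(hours=1)):
    """Count tweets per fixed-width interval, keyed by interval start.

    Hypothetical helper: intervals are anchored at the earliest tweet,
    which is an assumption made for this sketch.
    """
    if not timestamps:
        return {}
    start = min(timestamps)
    counts = Counter()
    for ts in timestamps:
        # Dividing one timedelta by another with // yields a whole interval index.
        index = (ts - start) // interval
        counts[start + index * interval] += 1
    # Return the intervals in chronological order.
    return dict(sorted(counts.items()))


# Made-up timestamps standing in for a harvested dataset.
sample = [
    datetime(2013, 9, 1, 9, 5),
    datetime(2013, 9, 1, 9, 40),
    datetime(2013, 9, 1, 10, 15),
    datetime(2013, 9, 1, 12, 0),
]
timeline = tweet_frequency(sample)
for interval_start, count in timeline.items():
    print(interval_start.isoformat(), count)
```

Intervals containing no tweets are simply absent from the result; a plotting step would normally fill these in with zeros.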
By contrast, the Cluster Explorer (below) allows users to delve into the semantic and topical makeup of their dataset in a way that is significantly less reliant on the chronological ordering of topics. The Cluster Explorer places intervals, tweets and terms on a 2D cluster map in which proximity reflects semantic similarity. This provides access to interval-level, tweet-level and term (word)-level visualisations, and gives users a means to explore the different topics prevalent within their dataset and trace relationships between them via ‘topical nodes’ (which may form central ‘hub topics’ from which other sub-topics branch outwards).
[Image: the Cluster Explorer view in Chorus-TV]
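The idea of treating semantic similarity as proximity can be illustrated with a small Python sketch. The text above does not say which similarity measure Chorus-TV uses, so this example assumes one common choice, cosine similarity over simple word counts. The function names and sample tweets are hypothetical, and the further step of laying items out on a 2D map is omitted.

```python
import math
from collections import Counter


def term_vector(text):
    """Represent a tweet as a bag of lower-cased word counts."""
    return Counter(text.lower().split())


def cosine_similarity(a, b):
    """Cosine similarity between two word-count vectors (0.0 if either is empty)."""
    dot = sum(a[term] * b[term] for term in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


# Made-up tweets: the first two share a topic, the third does not.
tweets = [
    "food safety recall announced today",
    "new food recall raises safety concerns",
    "medical device trial results published",
]
vectors = [term_vector(t) for t in tweets]

sim_01 = cosine_similarity(vectors[0], vectors[1])
sim_02 = cosine_similarity(vectors[0], vectors[2])

# A higher similarity corresponds to a smaller distance on a map.
print("tweets 0 and 1:", round(sim_01, 3), "distance", round(1 - sim_01, 3))
print("tweets 0 and 2:", round(sim_02, 3), "distance", round(1 - sim_02, 3))
```

Under this measure the two food-related tweets would sit closer together than either does to the medical device tweet, which is the kind of grouping a cluster map makes visible.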
Our choice of Twitter as an initial case is based on its status as a ‘simplest case’ of social media data, since it essentially consists of short text and links to other media. However, one of the challenges for the future of Chorus will be to conceive of analytically useful ways of visualising data other than short text, including images and sounds, which would allow for an expansion of the software into other social media platforms (such as blogs, Facebook, Tumblr, Instagram, SoundCloud, FourSquare, and so on). More widely, the chief ongoing challenge for social media research as a field will be in the continued development of a research-supporting software infrastructure (and accompanying methodologies that enable social scientists to make sensible use of software such as Chorus) in such a way as to be both intuitive to use and flexible enough to be tailored to a wide range of specific and unspecified research questions.
Since its release as freeware in September 2013, Chorus has gained over 500 registered users from a wide range of countries, disciplines, and institutions. The decision to offer Chorus as a free tool suite was driven by two wishes: firstly, to encourage hesitant social researchers to dip their toes into the rich landscape of Twitter discourse and, secondly, to help us, as developers, to better understand what works (and what doesn’t) with the current implementation and to set the agenda for future research and development. The technical development of software such as Chorus (and the continued feedback we hope to get from social science-trained users) is, we hope, the first step towards formalising a robust social science research programme that can take advantage of the possibilities of social media data in an empirically defensible way. To that end, we welcome any queries about our project and about gaining access to our tools, and are eager to hear the thoughts and comments of interested users via the email address listed below.
We intend to update this website on a regular basis, primarily with software updates but also with tutorial and case study posts. We also encourage Chorus users to share their own experiences with us, either by commenting on our blogs or by contributing their own posts. For instance, we particularly welcome the submission of short (ideally <1000 words) case study articles. Contact us at team(at)chorusanalytics.co.uk if you want to contribute something.
We look forward to hearing from you!
Note. This introduction is a modified version of Phil Brooker’s piece as published on the Digital Methods NMI blog in April 2013.