Cornell University

Data Vision: Demystifying Algorithms and Analysis

Like art, algorithms have a way of skirting a clear, consensus definition.

Algorithms are the foundational tools at the heart of much of our everyday computing – the ephemeral inner gears that, among other things, filter search results and tailor Facebook News Feeds. For computer and information scientists, they are math-based rules that help transform huge amounts of data into actionable insights. Beyond that, the level of understanding varies significantly. Even for the professionals who create and use algorithms, explaining the process of turning data into evidence is complicated, and it escapes the dry and tidy “Methods” and “Conclusions” sections of research papers and presentations.

This is both a problem and an opportunity, according to Samir Passi, an information science PhD candidate at Cornell University.

Passi and his coauthor, Information Science Professor Steven Jackson, study the learning, research, and practice of data analysis to highlight the oft-invisible human work of creating and using algorithms, drawing attention to the contingent and discretionary aspects of algorithmic work. The duo’s paper, “Data Vision: Learning to See Through Algorithmic Abstraction,” received Best Paper at the Conference on Computer-Supported Cooperative Work and Social Computing (CSCW), the field’s premier conference, held in late February.

“It’s true that algorithms have rules, but it is not as if they are entirely bound by them,” Passi said. “Algorithms are, in fact, rule-based and not rule-bound, and this is what opens up spaces for discovery and creativity – an argument that we make in the paper both theoretically and empirically.”

To showcase this central aspect of data analytics, Passi conducted ethnographic research in two graduate-level machine-learning and text-mining classes as well as a series of three digital humanities workshops. In one case, Passi observed how workshop students and organizers analyzed text from English gothic novels through a series of algorithms to better understand the linguistic characteristics of the genre. To get what they wanted, students had to think on their feet about what they were asking their algorithms to do regarding the unique nature of the data that they were working on. In another case, Passi observed how machine-learning students learned to see algorithms as a series of mechanical rules that, if followed correctly, lead to desired outcomes.

Highlighting how these two seemingly contradictory approaches to data analytic algorithms are built up, Passi and Jackson underscore a central point in their paper, making the case that practical issues and empirical contingencies are reminders “that the world requires a great deal of discretionary work for it to conform to high-level data analytic learning, expectations and analyses.”

“Algorithms are not simply plug-and-play tools, and if you were a data analyst you would know that already. But often the way data results are communicated to the wider audience, there is no way that you would know this unless you had some amount of technical competence,” he said. “You see big, bold numbers, colorful charts and graphs, but you don’t really know the process behind the construction of those numbers and results. It would allow people to contextualize and evaluate algorithmic knowledge better if there was a way to showcase parts of the data analytic process itself.”

With Data Vision, the authors see an opportunity for scholars to, as Passi says, “open up the black box of data analysis to better understand the strengths and limitations of algorithmic knowledge.”

“The discretionary character of data analytic practice doesn’t mean that seeing and approaching algorithms as a set of rules is wrong or incorrect. It’s actually a powerful way of computational thinking and a way of organizing the world with data and algorithms to produce new forms of knowledge,” he said. “What we argue is that being a data analyst requires one to not only organize and manipulate the world through data and algorithms, but also master forms of discretion and improvisation around established methods in the face of empirical contingency.”

This juggling of seemingly mechanical rules with discretionary problem-solving is – as Passi and Jackson point out – at the heart of data vision.

Louis DiPietro is the communications coordinator for Cornell's Department of Information Science.