Earlier this year I gave a talk on data journalism at a conference at Stanford University that focused on the right to information and transparency in the digital age. The talk dealt with sourcing practices in data journalism and was based on a research project that I am currently working on. The project examines sourcing practices and knowledge production at the Guardian, the New York Times and ProPublica, based on interviews with journalists and analysis of data journalism projects.
Below are the slides from my talk.
One of the points I made was that sourcing practices in data journalism are often closer to traditional sourcing methods (negotiations with human sources, interviews with experts, and so on) than popular literature tends to portray them.
Another interesting aspect of sourcing in data journalism is the production of original data. Journalists resort to gathering their own data when no data is publicly available on an issue and they judge the topic to be highly relevant to the public, or feel that it would make a good story. For example, when the government had not yet published a list of banks that received bailouts, ProPublica, a US-based investigative outlet, compiled a database by following press releases and extracting information from them. The production of original datasets is seen as a way to counterbalance the predominance of a limited number of government sources.