@SNOW/WWW, 2013, by Matthias Gallé, Jean-Michel Renders and Eric Karstens
Who was the first outlet to report on a news event? Given that “breaking news” is one of the golden goals of many journalists, we were surprised not to find any large-scale, data-driven study answering this question. We therefore set out to do it ourselves.
But before describing how we did it and what we found, and given the scope of this workshop, let us argue why we considered only traditional news media and not social media. Concerning blogs, a 2009 study by Leskovec et al. on high-profile events (those characterisable by a meme) showed that only 9% of these originated in the blogosphere, the remainder coming from traditional sources. Of course Twitter changed all this, and it is becoming more and more normal to have average citizens on the ground reporting first-hand on an event. Two recent surveys, however, show that while this may be interesting for an active news-seeker, the average user still gets their news from traditional websites. The Pew Research Center’s Project for Excellence in Journalism reported in its recent annual report that only 9% of those US adults who get news on a digital device do so through Twitter or Facebook recommendations. The rest get it either directly from news websites (36%), by searching (32%) or through a news-aggregator (29%). A New York Times survey confirms this trend: no matter from which source people heard about a story, 60% of them turned to an established outlet to confirm it.
Traditional media still seem to hold an important place in the news-sphere, and being the first to report on an event may arguably be even more important in this social media age, as it may act as a validation of social posts.
In our study we wanted to answer two questions regarding the hypothetical existence of a group of “gatekeepers”: news-outlets that either
1. were the only ones providing original news content,
2. were a mandatory passage to get media attention.
To get data that would let us answer these questions in a global way, we faced a challenging problem: building a news-aggregator that does not focus only on the most salient news but also recovers stories of minor interest. At the same time we didn’t want to be biased by events that were only of local importance. In the paper we provide the technical details of how we achieved this balance, but in a nutshell we proceeded in two stages. In the first stage we focused only on a subset of (major, international) sources and clustered the articles they produced. If a cluster passed a diversity and quantity test, it was considered to be an event. In the second stage we went through all the remaining articles, including those published up to 48 hours earlier, looking for articles that commented on each event.
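The two stages can be sketched as follows. This is a minimal illustration rather than the pipeline from the paper: the thresholds, the `Article` type, the word-overlap similarity and the function names are all assumptions made for the example, and the clustering of the seed-source articles is taken as already done.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass
class Article:
    source: str
    published: datetime
    text: str


# Hypothetical thresholds: the post does not state the values used.
MIN_ARTICLES = 3                 # quantity test
MIN_SOURCES = 2                  # diversity test
MIN_OVERLAP = 0.3                # word-overlap needed to attach an article
LOOKBACK = timedelta(hours=48)   # from the post: up to 48 hours in the past


def tokens(text):
    return set(text.lower().split())


def jaccard(a, b):
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def is_event(cluster):
    """Stage 1 filter: a cluster of seed-source articles counts as an event
    only if it is large enough and spans enough distinct sources."""
    return (len(cluster) >= MIN_ARTICLES
            and len({a.source for a in cluster}) >= MIN_SOURCES)


def attach_articles(cluster, others):
    """Stage 2: add articles from the remaining sources that resemble the
    event and were published no earlier than 48 hours before it."""
    earliest = min(a.published for a in cluster) - LOOKBACK
    cluster_tokens = [tokens(a.text) for a in cluster]
    matched = []
    for art in others:
        if art.published < earliest:
            continue
        t = tokens(art.text)
        if max(jaccard(t, c) for c in cluster_tokens) >= MIN_OVERLAP:
            matched.append(art)
    return sorted(cluster + matched, key=lambda a: a.published)
```

Allowing articles older than the cluster itself is what makes it possible for a source outside the seed set to turn out to be the first reporter.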
We crawled 59 news-sources over a period of a year, gathering 800K articles in which we discovered 10752 events. Using the publication timestamps of the articles it was then easy to find out which sources were the first to report on each of these events. Unsurprisingly, Reuters scores first (but AP appears only in position 21), and big international news outlets like France24 and BBC are also at the top. Surprisingly, less well-known outlets like AllAfrica and The Globe and Mail also appear in the top 4.
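Counting first reports from timestamps can be sketched in a few lines. The input layout (a mapping from an event identifier to `(source, timestamp)` pairs) and the function name are assumptions made for this example.

```python
from collections import Counter


def first_reporters(events):
    """events: mapping event id -> list of (source, timestamp) pairs.
    Returns a Counter of how many events each source reported first.
    On a timestamp tie, the pair listed first wins."""
    firsts = Counter()
    for articles in events.values():
        if articles:
            source, _ = min(articles, key=lambda pair: pair[1])
            firsts[source] += 1
    return firsts
```

Calling `first_reporters(events).most_common()` then gives the ranking of sources by number of events broken.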
At this stage, it doesn’t seem clear-cut how to answer Question 1. In general it seems that newspapers around the world do not rely exclusively on big agencies to do all the news discovery work, unlike in the past (Hester reported that in 1971 “half of all the daily papers [of the United States] use only one of the major wire services — the Associated Press”). At the same time, a great bulk of events continue to be broken by these agencies.
In order to answer our second question, we needed some way of defining “media attention”. It is well known that the publication dates of articles on a given event are not uniformly distributed. In a plot of a randomly selected range of events, each dotted line corresponds to one event, and a dot at point (x,y) marks an article published at time x about event y.
It can be seen that some events have a moment where they seem to be better at capturing global attention, with a high rate of articles published in a short time span. In order to retrieve these, we used an algorithm that detects such “bursts” in the emission rate. Applying this algorithm resulted in 6880 bursts, each one scored according to its prominence and duration.
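To make the notion of a burst concrete, here is a deliberately simple stand-in: a gap-threshold heuristic that groups articles published in quick succession. It is not the algorithm used in the paper, and `find_bursts`, `max_gap` and `min_articles` are names and parameters invented for this sketch. The scoring by prominence and duration is left out because the post does not give its formula.

```python
def find_bursts(timestamps, max_gap, min_articles=3):
    """Return (start, end, n_articles) for each maximal run of articles whose
    consecutive publication times are at most max_gap apart.
    Works with numbers, or with datetime values and a timedelta max_gap."""
    times = sorted(timestamps)
    bursts = []
    run_start = 0
    for i in range(1, len(times) + 1):
        # A run ends at the last article or when the next gap is too large.
        if i == len(times) or times[i] - times[i - 1] > max_gap:
            if i - run_start >= min_articles:
                bursts.append((times[run_start], times[i - 1], i - run_start))
            run_start = i
    return bursts
```

The number of articles and the span between `start` and `end` are the raw ingredients from which a prominence and duration score could be derived.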
We then looked at the first article in each one of these bursts, and added the score of the burst to the total of the source that published it. The top of this list is much easier to interpret than the previous one. The outlets there are either major international outlets (Reuters, 1st; CNN, 3rd) or major national or regional outlets (The Globe and Mail, 2nd; Al Jazeera, 4th; France24, 5th; RIAN, 6th). There are several possible interpretations of the prominence score associated with each source. On the one hand, it is natural that news agencies rank high, because it is their purpose to actively push news to third parties for re-dissemination and to do so at a global scale. On the other hand, news outlets may look at local sources to filter news coming from the regions they cover. The high rank of some local sources (like those covering Russia, the Arab world and Canada) may then be explained as a stamp of approval from the news-sphere regarding their authenticity and (non-)bias. Finally, it may just be that the non-agency sources have a good “journalistic nose”, being able to pick out the news items that will become trendy and to publish on them at the right time.
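The per-source prominence score described at the start of this paragraph amounts to a weighted count. A minimal sketch, assuming each burst comes with a precomputed score and its list of `(source, timestamp)` pairs (a hypothetical layout chosen for this example):

```python
from collections import defaultdict


def source_prominence(bursts):
    """bursts: iterable of (score, articles), where articles is a list of
    (source, timestamp) pairs belonging to that burst.
    Returns (source, total_score) pairs, highest total first."""
    totals = defaultdict(float)
    for score, articles in bursts:
        if not articles:
            continue
        # Credit the whole burst score to the source of its earliest article.
        first_source, _ = min(articles, key=lambda pair: pair[1])
        totals[first_source] += score
    return sorted(totals.items(), key=lambda item: item[1], reverse=True)
```

Unlike a plain count of first reports, this ranking rewards sources for opening the bursts that drew the most attention.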
Analyzing all those news events at a massive scale offers insight into how traditional media are adapting to the current time, and this understanding could help data and citizen journalists find their place in the news-sphere.
J. Leskovec, L. Backstrom, and J. Kleinberg. Meme-tracking and the dynamics of the news cycle. In Proceedings of the 15th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, pages 497-506. ACM, 2009.
A. L. Hester. International Communication Gazette, 20(2):82-98, May 1974.