Reinventing ‘the past’ in the digital world

As Mario Tedeschini-Lalli contemplates his life in journalism, he realises that the Internet has a problem with time. More than half his work is lost in the vagaries of websites that no longer exist.

by WAN-IFRA Staff executivenews@wan-ifra.org | January 8, 2016

This post was originally published on Mario’s Medium page on Dec. 18, 2015.

Wait a moment… that’s CNN, right? I mean, in ITALIAN? Exactly, but there is no way you could know that CNN had an Italian-language website (which I happened to edit) unless you actually read it between 1999 and 2003. The CNNitalia.it domain is still there, but it points to a single bilingual page explaining that “CNNitalia.it is no longer in service”. Which is to say that its full four years of coverage – including 9/11 – are nowhere to be found, just like almost all of the journalism produced by myself or by the digital newsrooms I managed in the last 18 years. The Internet has a problem with time.

I’m on the edge of retirement after 40 years as a journalist, the latter part of which I spent on digital work. Yet while my future admirers will be able to check the first half of my professional life by walking into a good library and browsing the newspapers I worked for, the many intrigued by my most recent work will be left wanting, since most of it – maybe 90% – is gone, lost in the vagaries of websites that no longer exist, databases no longer maintained, interfaces no longer available, and coding languages no one speaks anymore (old digital news hands will appreciate the “Flash Plugin required” in the image above).

How could that be possible? [You may skip this list of personal, although telling, details]

July 1997 – September 1999 I edited Repubblica.it, Italy’s first and still largest news website. For almost a year we didn’t use any Content Management System, uploading HTML files straight to the server via FTP. When we got a basic but functioning CMS, everything we had published before was lost; ditto for a subsequent “improvement”. Result: nothing published in 1997 is still available, and just a few stories from 1998 are. Of course, much of what can be found from the following years is full of broken links to past stories or to other websites that were supposed to offer context – something akin to finding a copy of the New York Times in a library with many of the background paragraphs cut out.

September 1999 – November 2002 I edited CNNitalia.it, which continued for another year with another partner, then disappeared. The same happened to other CNN “language sites” targeting Brazil, Norway, Sweden, Denmark, and the Spanish-speaking countries of Latin America. Actually, a website for CNN en Español was resurrected a few years ago, but as far as I know nothing of what was published ten years before is available online. (The Arabic and Japanese editions are still online, and I don’t know if their databases go back to 2001–2002.)

December 2002 – September 2005 I edited Kataweb.it, a large portal (remember the word?) with a heavy news component. Most of its “verticals” (still a word from the webby past) have since been either closed or merged with other titles. All of its news and video components are gone. Actually no, not exactly: a lot of it still sits on some server out in the cloud, but since there is no “website”, the material is not linked from anywhere, and thus it is no longer indexed by search engines.

October 2005 – December 2008 I managed a unit within Kataweb.it that experimented with multimedia and non-linear storytelling. All the material we produced went with the rest of the portal. [I’ve since been working on digital strategy and training, which means that I have not been responsible for any journalistic product].

Wasn’t the Internet supposed to be “forever”? For all the talk in Europe about a “right to be forgotten”, what we have at hand is actually the problem of keeping our digital past alive – or, at least, in suspended animation – so that in the future it can be accessed, enjoyed, maybe even studied by historians.

In her contribution to the “Predictions for Journalism 2016” by NiemanLab, digital strategist and journalist Amy Webb mentions the need to be aware of our “Digital frailty”. She puts herself in the shoes of an observer who in 2046 reflects on what happened in the preceding 30 years:

“Libraries archived printed material, but there was no effort to build a central repository for all of the digital content that was created in the past three decades. Those early New York Times experiments with virtual reality are now lost. So are the Vox explainers, the Washington Post blogs, and the richly interactive maps produced by so many news organizations. Some old timers remember the tremendous journalism done years ago (…) But there’s no way to access them now, since there was no concerted effort at news organizations to ensure that content was made forwards-compatible as Internet standards evolved. Of course, there was probably no need to save every listicle and quiz. But with the benefit of hindsight, someone probably ought to have initiated a public discussion about whether it was in the public interest to allow certain news content to go dark forever.”

This public discussion is certainly overdue, but not an easy one to have. There are technical issues, business constraints (how will cash-strapped news publishers find the money to finance something with so little return on investment?), but also a legacy culture that makes it difficult to even see the problem: in our daily work we tend to talk about digital “archives”, which are anything but.

An archive is traditionally a place where we used to put documents and stuff that we believed were no longer current. In case we needed them, we would actually go to the archive, dig them up, consult them, and put them back in place for future users and scholars. Newspapers had “libraries” or “morgues”, where past issues or clips from newspapers were organized for safekeeping. Other than that – as a cliché-prone editor eagerly informed me 40 years ago – “today’s newspaper wraps tomorrow’s fish”. Or rather wrapped.

We all know that this is no longer true in the digital world, where today’s news is – at the very least – tomorrow’s sidebar, always there for users to discover and consume, making sense of tomorrow’s world with yesterday’s journalism. In other words: any news item (or any digital item) sitting in a database is potentially “current”; there is no technological or UX difference between what was published today, yesterday or last year.

All right, I’ve been preaching for years that in the digital universe, much like in the physical universe, space and time are no longer what they used to be. But since our own lives proceed (mostly) on a linear timeline, it makes sense to re-invent the past, the digital past.

We need to build “places” (I’m using the word for lack of a better one; suggestions welcome) where the past can be safe, saved for future purposes. An actual “archive”.

Of course, the people at the Internet Archive are already doing a fantastic job; it’s thanks to them that I was able to show the image above. But what they do is copy and archive an extremely limited – albeit immense – part of the web. For example, I was not able to find any page or story produced by CNNitalia.it on the actual day of September 11, 2001.

Somebody has actually begun to think about the issue. A year and a half ago, on March 2, 2014, Knight-Mozilla OpenNews, the Newseum and Pop Up Archive organized a one-day conference in Washington DC about “Preserving interactive news projects”:

“How to preserve the new breed of complex interactive projects that are becoming more prevalent in news. While print newspapers are relatively well-preserved, we as an industry do a poor job of preserving interactive databases and online data visualizations, and they are in danger of being lost to history.”

The discussion centered mostly on a framework for any future project, and they came up with a Conceptual Model for Interactive Database Projects in News. I’ve asked Tyler Fischer, the author of the report, if anything more came out of it: “Sadly, there really hasn’t been any follow-up, despite this being a tremendously important issue”, he wrote me.

But what about simple text-and-picture items? On a much, much more limited scale, one organization actually thought about the problem and provisionally built a “place for the past”: the Catholic church. At the beginning of 2013 they had already experimented for a while with a Twitter account in the name of the pope. When Benedict XVI resigned and Pope Francis was elected, they had to decide what to do with the account, and with past tweets. The account was quickly renamed, so that it could be used as the voice of any pope, present or future. As for what was tweeted in the name of Benedict, they built a “Twitter Archive for BXVI”, so that the pope’s teachings may be preserved, whatever may become of Twitter’s APIs.

I suppose it helped that the Catholic church is a global organization whose life span is counted in centuries, and that it is institutionally prone to think in terms of eternity. Journalists, publishers, librarians, archivists, historians and – yes – engineers could nevertheless take a hint from them, and re-invent the “past” for the future’s sake.

Update Dec. 30, 2015: Worth reading on this subject: “Raiders of the Lost Web: If a Pulitzer-finalist 34-part series of investigative journalism can vanish from the web, anything can”, by Adrienne LaFrance, published in October in The Atlantic.
(Thanks to Steve Buttry)

Mario Tedeschini-Lalli is Deputy Director, Innovation and Development, at the Italian media company Gruppo Editoriale L’Espresso. He writes the new media blog Giornalismo d’altri.
