With a history dating back to 1851 and over 125 Pulitzer Prizes under its belt, the New York Times has amassed a mountain of photos: between five and seven million of them. They’re all stored in the “morgue” under the paper’s Times Square office, packed into countless drawers and cupboards. Now the Times is working with Google to digitise the entire collection.
Google says that many of these photos have been stored in folders and not even been looked at for years. Some of them date back as far as the late 19th century. There is a card catalogue, which provides an overview of the archive’s contents, but there is much that has gone unseen for a long time down in that basement.
With 5-7 million photos, simply scanning and storing them is not enough. That doesn’t really give photo editors anything that they can easily search for and use. So, Google and the NYT are turning to AI to process the images. It recognises things like text, handwriting and other details in the image to help create a more valuable index.
Storing the images is only one half of the story. To make an archive like The Times’ morgue even more accessible and useful, it’s beneficial to leverage additional GCP features. In the case of The Times, one of the bigger challenges in scanning their photo archive has been adding data regarding the contents of the images. The Cloud Vision API can help fill that gap.
One example Google posted that demonstrates how they’re using the AI to help provide context is this one showing the front and back of a photo of Penn Station shot in 1942. The photo is a great record of what was going on at that time but without any context, there isn’t really anything to say what it contains or the reason for its creation.
When the back of the photo was fed into Google’s Cloud Vision API, it returned the following, which it could then associate with the photograph.
NOV 27 1985
JUL 28 1992
Clock hanging above an entrance to the main concourse of Pennsylvania Station in 1942, and, right, exterior of the station before it was demolished in 1963.
PUBLISHED IN NYC
RESORT APR 30 ‘72
The New York Time THE WAY IT WAS – Crowded Penn Station in 1942, an era “when only the brave flew – to Washington, Miami and assorted way stations.”
Penn Station’s Good Old Days | A Buff’s Journey into Nostalgia
( OCT 3194
PHOTOGRAPH BY The New York Times Crowds, top, streaming into the old Pennsylvania Station in New Yorker collegamalan for City in 1942. The former glowegoyercaptouwd a powstation at what is now the General Postadigesikha designay the firm of Hellmuth, Obata & Kassalariare accepted and financed.
Pub NYT Sun 5/2/93 Metro
THURSDAY EARLY RUN o cos x ET RESORT
EB 11 1988
RECEIVED DEC 25 1942 + ART DEPT. FILES
The New York Times Business at rail terminals is reflected in the hotels
OUTWARD BOUND FOR THE CHRISTMAS HOLIDAYS The scene in Pennsylvania Station yesterday afternoor afternoothe New York Times (Greenhaus)
It’s not perfect, of course, but Google says that it’s the fastest and most cost-effective method when compared to the alternatives with this quantity of images.
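For the curious, here is a rough idea of what sending the back of a scanned photo to the Cloud Vision API might look like. This is only a minimal sketch using the API’s public REST endpoint with the `DOCUMENT_TEXT_DETECTION` feature, not the pipeline Google and the NYT actually built. The function names, the file path and the use of an API key are assumptions made for illustration.

```python
import base64
import json
import urllib.request

# Public REST endpoint for Cloud Vision image annotation.
VISION_ENDPOINT = "https://vision.googleapis.com/v1/images:annotate"


def build_ocr_request(image_bytes: bytes) -> dict:
    """Build the JSON body for a dense-text OCR request on one image."""
    return {
        "requests": [
            {
                # Image bytes are sent inline as base64.
                "image": {"content": base64.b64encode(image_bytes).decode("ascii")},
                # DOCUMENT_TEXT_DETECTION targets dense text and handwriting.
                "features": [{"type": "DOCUMENT_TEXT_DETECTION"}],
            }
        ]
    }


def extract_text(response: dict) -> str:
    """Return the recognised text from an annotate response, or ''."""
    first = (response.get("responses") or [{}])[0]
    return first.get("fullTextAnnotation", {}).get("text", "")


def ocr_photo_back(path: str, api_key: str) -> str:
    """Hypothetical helper: OCR one scan (needs network and a valid key)."""
    with open(path, "rb") as f:
        body = json.dumps(build_ocr_request(f.read())).encode("utf-8")
    req = urllib.request.Request(
        f"{VISION_ENDPOINT}?key={api_key}",
        data=body,
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        return extract_text(json.load(resp))
```

The text that comes back can then be stored alongside the scan, which is what makes stamps, dates and captions like the ones above searchable.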
But Google says that this is only the beginning of what’s possible with computer vision. For example, running logo detection on the front side of the photograph above recognised that it was shot in Pennsylvania Station. The Cloud Natural Language API can also be used to help clean up any recognised text, making it more syntactically correct and human-readable (and searchable).
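Logo detection goes through the same Cloud Vision endpoint, just with a different feature type. The snippet below is a hedged sketch of how such a request could be built and how the `logoAnnotations` in the response might be filtered. The helper names and the 0.5 score cut-off are assumptions for illustration, not anything Google has published about this project.

```python
import base64
from typing import List


def build_logo_request(image_bytes: bytes, max_results: int = 5) -> dict:
    """Build an images:annotate body asking for logo detection only."""
    return {
        "requests": [
            {
                "image": {"content": base64.b64encode(image_bytes).decode("ascii")},
                "features": [{"type": "LOGO_DETECTION", "maxResults": max_results}],
            }
        ]
    }


def top_logos(response: dict, min_score: float = 0.5) -> List[str]:
    """Return descriptions of detected logos at or above min_score."""
    first = (response.get("responses") or [{}])[0]
    return [
        annotation["description"]
        for annotation in first.get("logoAnnotations", [])
        # min_score is an arbitrary threshold chosen for this sketch.
        if annotation.get("score", 0.0) >= min_score
    ]
```

Any descriptions returned could be saved as extra keywords for the photo, next to the text recognised from its back.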
It’s a mammoth task, and it’s easy to understand why it’s one that’s been put off for so long. Only now is the technology reaching the point where this quantity of content can be indexed easily.
If you want to find out more, watch the video above, and check out the Google Cloud Blog.
[via The Verge]