This AI turns sound recordings into accurate pictures of streets

Dunja Đuđić

Dunja Djudjic is a multi-talented artist based in Novi Sad, Serbia. With 15 years of experience as a photographer, she specializes in capturing the beauty of nature, travel, concerts, and fine art. In addition to her photography, Dunja also expresses her creativity through writing, embroidery, and jewelry making.

Image: A tram departs from a stop on Aleksanterinkatu in Helsinki, Finland.

Researchers at The University of Texas at Austin have used generative artificial intelligence to transform audio recordings into vivid street-view images. The project shows how AI can approximate the human ability to connect sound with the visual perception of an environment.

The findings, published in the journal Computers, Environment and Urban Systems, detail how the research team trained an AI model with pairs of audio and visual data collected from diverse urban and rural settings. “Our study found that acoustic environments contain enough visual cues to generate highly recognizable streetscape images that accurately depict different places,” said Yuhao Kang, an assistant professor of geography and environment at UT and co-author of the study. “This means we can convert the acoustic environments into vivid visual representations, effectively translating sounds into sights.”

Translating sound into sight

The team used 10-second audio clips paired with still images from YouTube videos shot across cities in North America, Asia, and Europe to train the AI model. Afterward, the AI generated high-resolution images from audio inputs, which were then compared to their real-world counterparts. The researchers evaluated the results using computer analysis and human judgment.
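To make the pairing step concrete, here is a minimal Python sketch of how a video could be cut into 10-second audio windows, each matched with one still frame. The article does not say how the researchers chose the frame, so taking it from the midpoint of each window is an assumption, and the names `ClipPair` and `plan_pairs` are hypothetical.

```python
from typing import List, NamedTuple


class ClipPair(NamedTuple):
    """One training pair: an audio window and the timestamp of its still frame."""
    audio_start: float  # seconds
    audio_end: float    # seconds
    frame_time: float   # seconds; midpoint of the window (an assumption)


def plan_pairs(video_duration_s: float, clip_len_s: float = 10.0) -> List[ClipPair]:
    """Split a video into consecutive, non-overlapping audio windows.

    Any leftover shorter than clip_len_s at the end of the video is dropped.
    """
    if clip_len_s <= 0:
        raise ValueError("clip_len_s must be positive")
    n_clips = int(video_duration_s // clip_len_s)
    pairs = []
    for i in range(n_clips):
        start = i * clip_len_s
        end = start + clip_len_s
        pairs.append(ClipPair(start, end, start + clip_len_s / 2))
    return pairs


if __name__ == "__main__":
    for pair in plan_pairs(35.0):
        print(pair)
```

The sketch only plans timestamps; extracting the actual audio and frames would need a separate media tool.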

“Traditionally, the ability to envision a scene from sounds is a uniquely human capability, reflecting our deep sensory connection with the environment. Our use of advanced AI techniques supported by large language models (LLMs) demonstrates that machines have the potential to approximate this human sensory experience,” Kang explained.

Results and implications

The comparisons showed remarkable accuracy in the proportions of greenery, buildings, and sky between the AI-generated and real-world images. Human participants successfully identified AI-generated images corresponding to the original audio clips with 80% accuracy. What’s more, the generated images often captured architectural styles, object spacing, and lighting conditions reflective of the soundscapes, such as sunny, cloudy, or nighttime environments.
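The computer analysis of greenery, building, and sky proportions can be illustrated with a small Python sketch. It assumes each image has already been segmented into a grid of per-pixel labels, and it uses a Pearson correlation to compare the generated and real images. Both of those are assumptions made for illustration, since the article does not describe the study's exact metric. The function names are hypothetical.

```python
from collections import Counter
from math import sqrt
from typing import Dict, Sequence

# Classes named in the article; the label strings themselves are an assumption.
CLASSES = ("sky", "greenery", "building")


def class_proportions(label_map: Sequence[Sequence[str]]) -> Dict[str, float]:
    """Return the share of pixels belonging to each class in a 2D label map."""
    counts = Counter(label for row in label_map for label in row)
    total = sum(counts.values())
    if total == 0:
        raise ValueError("label_map is empty")
    return {c: counts[c] / total for c in CLASSES}


def pearson(xs: Sequence[float], ys: Sequence[float]) -> float:
    """Pearson correlation between two equally long sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for constant input")
    return cov / sqrt(var_x * var_y)


if __name__ == "__main__":
    # Toy 2x2 "images" standing in for segmentation output.
    real = [["sky", "sky"], ["building", "greenery"]]
    generated = [["sky", "building"], ["building", "greenery"]]
    print(class_proportions(real))
    print(class_proportions(generated))
```

In practice, one would compute, say, the sky proportion for every real and generated pair and then pass the two lists to `pearson` to see how closely they track each other.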

The observations deepen the understanding of how sounds contribute to the perception of places. “When you close your eyes and listen, the sounds around you paint pictures in your mind,” said Kang. “For instance, the distant hum of traffic becomes a bustling cityscape, while the gentle rustle of leaves ushers you into a serene forest.”

A future of multisensory AI

Kang’s work in geospatial AI explores the relationship between humans and their environments. This research could lead to AI systems that improve our understanding of how people experience and interact with different places. In a separate study published in the Nature Portfolio journal Humanities and Social Sciences Communications, Kang and his co-authors explored how AI could help capture the unique identities of cities.

I saw a comment on this news saying that it would be cool to turn images into sounds. It immediately made me think of NASA’s sonifications. These projects convert data into sound, transforming space photos into weird and beautiful songs. It’s definitely interesting to see how different mediums intertwine in the AI realm – although I certainly hope it won’t overtake the human-made visual impressions of sounds, or songs inspired by images. It would be a shame.

[via PetaPixel]



One response to “This AI turns sound recordings into accurate pictures of streets”

  1. Martin Hackenberg

    Hello, the top picture is probably not AI-generated. It is a street view of Aleksanterinkatu in Helsinki, looking toward Mannerheimintie. I think I have to go there with my recorder: now the AI should add Christmas decorations.