Preface: This is NOT a boo-hoo post about how people watch too much television and can’t spell.
I think of myself fundamentally as a writer. I love to read, I like books, I like magazines. On espn.com, I ignore the video clips and read the articles. The written word is for me the fundamental and primary way to exchange information.
Now it’s not that difficult to imagine a future in which writing is used mostly for labeling. Video and audio will be fundamental and primary, and writing will be used for short-form annotation of the video and audio. Writing will be metadata. Labeling.
In the past, video and audio weren’t used this way so much because of limitations in (obviously) transmission and storage, but also, less obviously, in editability and searchability.
The first two issues have largely been solved: broadband, mobile, gigabyte hard drives and flash memory. Internet, cable modems, satellite, YouTube, iPads, Android.
The third issue, editing, is mostly solved when you think about simple tasks; video and audio editing software comes standard on lots of computers now. Not everybody wants to develop Final Cut Pro skills, but it’s available. “Editing” encompasses more than that, though. If you are trying to organize and convey a large amount of information, or update existing information without reshooting the whole video, writing still offers advantages in speed and flexibility. In fact, writing in the computer age is so flexible that I often write before I organize. I start putting down various thoughts and then gradually re-order them as I go. Sometimes I stop and do an outline AFTER I’m halfway done writing. I don’t think this was an efficient approach in the typewriter era.
The fourth issue, search, is where it gets really interesting. Right now, automated information retrieval is text-based: think Google. But it doesn’t have to stay that way.
Dragon NaturallySpeaking is voice-recognition software. It converts audio into text. It’s very limited even after decades as a commercial product. We have tried it out for interviews – can it transcribe automatically while the interview takes place? It can’t. It has to be “trained” to recognize a speaker’s words correctly. You have to work with it to get accurate output of your own voice, let alone some random soprano interviewee with a Scottish accent and a mild stutter.
BUT the people who make the software say that’s a limitation imposed by computing power. They say as computers get faster and faster, this barrier will inevitably fall. And that means that audio-based search is also inevitable. Imagine: You have 200 voice mails saved on your iPhone, as nobody uses email anymore. You need to find the ones about the next budget meeting. You say to the phone: “Search mail – keyword budget.” Up pop the six messages that you audiotagged “budget”, and two others in which the word budget was used during a conversation.
Here’s where the written word appears in the scenario: the list. It’s metadata. Labeling. Just like I said. Because you the human can scan the list faster than it can be read aloud. (Note here a distinction between scanning – visually processing a list of metadata, which you do very quickly – versus searching through a large corpus of data to find all objects matching a specified pattern, which the computer does very quickly.)
Anyway, you look at the list and say “play first David” and hear the top audio message from David. You say “skip – keyword supplies” and the audio track skips ahead to the part about purchasing supplies. And so on.
This is all possible once the processing power is in place to convert your voice commands and to search the audio files at a high speed to find the right matches.
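For the programmers in the audience, here is a minimal Python sketch of the voice mail scenario above. It assumes the hard part is already done: a speech recognizer has turned each message into a list of words with timestamps. The `VoiceMessage` class, the `search` and `skip_to` functions, and the sample data are all hypothetical names invented for illustration, not any real phone's API.

```python
from dataclasses import dataclass, field


@dataclass
class VoiceMessage:
    sender: str
    # Tags the user spoke when saving the message ("audiotags"), stored lowercase.
    tags: set[str] = field(default_factory=set)
    # (word, start time in seconds) pairs, assumed to come from a speech recognizer.
    words: list[tuple[str, float]] = field(default_factory=list)


def search(messages, keyword):
    """Return messages tagged with the keyword first, then untagged ones that say it."""
    keyword = keyword.lower()
    tagged = [m for m in messages if keyword in m.tags]
    spoken = [
        m for m in messages
        if keyword not in m.tags
        and any(word.lower() == keyword for word, _ in m.words)
    ]
    return tagged + spoken


def skip_to(message, keyword):
    """Return the time (in seconds) where the keyword is first spoken, or None."""
    keyword = keyword.lower()
    for word, start in message.words:
        if word.lower() == keyword:
            return start
    return None


# Made-up sample data standing in for saved voice mail.
inbox = [
    VoiceMessage("David", {"budget"}, [("meeting", 0.0), ("supplies", 12.5)]),
    VoiceMessage("Ann", set(), [("the", 0.0), ("budget", 1.2)]),
    VoiceMessage("Bob", set(), [("lunch", 0.0)]),
]

# "Search mail, keyword budget": the on-screen list is just senders, i.e. metadata.
for message in search(inbox, "budget"):
    print(message.sender)
```

The search itself is trivial once the words exist as text; the expensive step is the recognition that produces them, which is exactly the part waiting on computing power.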
Same with video. Surveillance companies now embed “video analytics” in their products. You can, for example, have a camera in a parking garage that doesn’t stream video to the guard office until it 1) detects motion and 2) recognizes that the motion is caused by a person (as automatically distinguished from a mouse, or a gust of wind). With enough horsepower, you’ll be able to tell your phone, “find Fred tae kwon do” and it will bring up all the videos of your son’s TKD practices, or the clips on the instructor’s website showing how to correctly do the Tae Guck Il Jang form. Or search the entire web for semantic constructions like “find footage of crocodile attacks”.
I wrote about some advanced applications of video analytics almost four years ago for my website/magazine in The age of analytics, noting how massive computing power changes the game – there’s a chess connection, for my chess junkie friends.
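The parking-garage rule above boils down to a two-condition filter. Here is a rough Python sketch, assuming some upstream analytics step has already labeled each frame; the `FrameAnalysis` class, its labels, and both functions are hypothetical and stand in for whatever a real camera product does internally.

```python
from dataclasses import dataclass


@dataclass
class FrameAnalysis:
    motion_detected: bool
    # Label from an assumed upstream classifier, e.g. "person", "animal", "none".
    label: str


def should_stream(analysis):
    """Stream only when there is motion AND the motion is attributed to a person."""
    return analysis.motion_detected and analysis.label == "person"


def frames_to_stream(analyses):
    """Return the indexes of the frames that would be sent to the guard office."""
    return [i for i, a in enumerate(analyses) if should_stream(a)]


# Made-up results for four frames: nothing, a mouse, a gust of wind, a person.
garage = [
    FrameAnalysis(False, "none"),
    FrameAnalysis(True, "animal"),
    FrameAnalysis(True, "none"),
    FrameAnalysis(True, "person"),
]
print(frames_to_stream(garage))
```

As with audio, the filter is the easy part; the horsepower goes into producing the labels.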
Now, ironically, I see the broader implications. Maybe my next “age of analytics” and “death of writing” pieces will be on YouTube.