In recent years, to my delight, captions have been showing up in more and more places in the United States. While I’ve been using captioning on my home TV for decades, now I see open captioning on TVs in public places, many internet videos, and most recently, in academic presentations. Everyone benefits from good captioning, not just deaf/HoH or folks with an auditory processing disorder. Children and non-native English speakers, for example, can hone their English skills by reading captions in English. And nearly everyone has trouble making out some dialogue now and then. But not all captioning is the same. While one form of captioning may provide fabulous access for deaf/HoH, another is useless. To ensure that our materials optimize inclusion, we need to figure out how to invest in the former and avoid the latter.
To unpack this a bit, I’m going to distinguish between 4 types of captioning that I’ve had experience with: 1) captions, 2) CART (communication access real-time translation), 3) auto-craptions, and 4) real-time auto-captions with AI. The first two are human produced and the last two are computer produced.
Captions: Captions are word-for-word transcriptions of spoken material. Open captions are automatically displayed, while closed captions require the user to activate the captions (click the CC option on a TV). To make these, a human produced script is added to the video as captions. Movies and scripted TV shows (i.e. not live shows) all use this method and the quality is usually quite good. In a perfect world, deaf/HoH academics (including students) would have access to captioning of this high quality all the time. Stop laughing. It could happen.
CART:This real-time captioning utilizes a stenotype-trained professional to transcribe the spoken material. Just like the court reporters who document court proceedings, a CART professional uses a coded keyboard (see image at right) to quickly enter phonemes that are matched in the vocabulary database to form words. The CART transcriptionist will modify the results as they go to ensure quality product. While some CART transcriptionists work in person (same room as the speakers), others work remotely by using a microphone system to listen to the speakers. Without a doubt, in-person CART provides way better captioning quality than remote CART. In addition to better acoustics, the in-person service can better highlight when the speaker has changed and transcriptionists can more easily ask for clarification when they haven’t understood a statement. As a cheaper alternative to CART, schools and universities sometimes use C-Print for lectures, where the non-steno-trained translators capture the meaning but not word-for-word translation. In professional settings, such as academic presentations, where specific word choice is important, CART offers far better results than C-Print but requires trained stenographers.
Some drawbacks of CART are that the transcription lags, so sometimes the speaker will ask “Any questions?” but I and other users can’t read this until the speaker is well into the next topic. Awkward, but eventually the group will get used to you butting in late. CART also can be challenging with technical words in academic settings. Optimally, all the technical vocabulary is pre-loaded, which involves sending material to the captionist ahead of time for the topics likely to be discussed. Easy-peasy? Not so fast! For administrative meetings of over 10 people, I don’t always know in advance where the discussion will take us. Like jazz musicians, academics enjoy straying from meeting agendas. For research presentations, most of us work on and tweak our talks up until our presentation. So getting advance access to materials for a departmental speaker can be… challenging.
Craptions:These are machine-produced auto-captions that use basic speech recognition software. Where can you find these abominationsless-than-ideal captions? Many YouTube videos and Skype use this. We call them ‘crap’tions because of the typical quality. It is possible that craptions can do an okay job if the language is clear and simple. For academic settings, these auto-craptions with basic speech recognition software are pretty much useless.
The picture at right shows auto-craption for a presentation at the 2018 American Geophysical Union conference about earthquakes. I know, right]Yes, the speaker was speaking in clear English… about earthquakes. The real crime of this situation is that I had requested CART ahead of time, and the conference’s ADA compliance subcontractor hired good quality professional transcriptionists. Then, the day before the conference, the CART professionals were told they were not needed. Of course, I didn’t know this and thought I was getting remote CART. By the time the craptions began showing up on my screen, it was too late to remedy the situation. No one that I talked with at the conference seemed to know anything about the decision to use craptions instead of CART; I learned all of this later directly from the CART professionals. The conference contractor figured that they could ‘save money’ by providing auto-craption instead of CART. Because of this cost-saving measure, I was unable to get adequate captioning for the two sessions of particular interest to me and for which I had requested CART. From my previous post on FM Systems, you may remember that all of my sessions at that conference were in the auxiliary building where the provided FM systems didn’t work. These screw-ups meant it was a lousy meeting for me. Five months have passed since the conference, and I’m still pretty steamed. Mine is but one story; I would wager that every deaf/HoH academic can tell you other stories about material being denied to them because quality captioning was judged too expensive.
Real-time auto-caption with AI: These new programs use cloud-based machine learning that goes farbeyond the stand-alone basic speech recognition of craptioning software. The quality is pretty good and shows signs of continuous improvement. Google slides and Microsoft office 365 PowerPoint both have this functionality. Link to a video of Google Slides auto-caption in action.You need to have internet access to utilize the cloud-based machine learning of these systems. One of the best features is that the lag between the spoken word and text is very short. I speak quickly and the caption is able to keep up with me. Before you start singing hallelujah, keep in mind that it is not perfect. Real-time auto captioning cannot match the accuracy of captioning or CART for transcribing technical words. Keep in mind that while it might get many of the words, if the captions miss one or two technical words in a sentence, then deaf/HoH still miss out. Nevertheless, many audience members will benefit, even with these missing words. So, we encourage all presenters to use real-time auto caption for every presentation. However, if a deaf/HoH person requests CART, real-time auto caption, even it is pretty darn good, should never be offered as a cheaper substitution. Their accommodation requests should be honored.
An offshoot of the real-time auto-caption with AI are apps that work on your phone. Android phones now have a Google app (Live Transcribe) that utilizes the same speech recognition power used in Google Slides. Another app that works on multiple phone platforms is Ava. I don’t have an Android phone and have only tried Ava in a few situations. It seems to do okay if the phone is close to the speaker, which might work in small conversation but poses problems for meetings of more than 3 people or academic presentations. Yes, I could put my phone up by the speaker, but then I can’t see the captions. So yeah, no.
What are your experiences with accessing effective captions in academic settings? Have you used remote captioning with success? For example, recently, I figured out that I can harness google slides real time auto-caption to watch webinars by overlapping two browser windows. For the first time, I can understand a webinar. I’ve got a lot of webinars to catch up on! Tell us what has worked (or not worked) for you.