Pablo Romero-Fresco (2011). Subtitling through Speech Recognition: Respeaking.

Manchester: St Jerome, 194 pp. ISBN 978-1-905763-28-3

JoSTrans, Issue 17 (January 2012)

https://doi.org/10.26034/cm.jostrans.2012.477

This recently published monograph on subtitling through speech recognition, i.e., respeaking, marks the real and long-deserved beginning of a new area of its own within Audiovisual Translation. It is the first book published in the field, it is as thorough as could be expected, and it contains a DVD with very valuable extra information and practice. Its contents can appeal to researchers, academics and translation students alike, given the comprehensive mixture of theory, practice and research included in its pages.

The volume can be said to be divided into three distinct sections: an introduction (chapters 1 to 4), a detailed description of the skills that need to be learned in order to become a good respeaker (chapters 5 to 10) and some concluding remarks on the reception of respoken subtitles and future expectations (chapters 11 and 12). Every chapter ends with a section that offers the reader extra practice and discussion points (generally dependent on the DVD), which help both to confirm and to expand the knowledge acquired in the preceding pages.

The first section provides a very clear first approach to respeaking. In chapter 1, the author defines the term and explains its creation, and he leaves chapters 2 and 3 to explore the origins and development of the professional practice as such. These first three chapters represent a very sound introduction to the field, both for newcomers and for experienced users, given the depth of analysis provided: the term in different languages, the different methods of live subtitling used over the years, the main technical issues involved, the differences among countries in terms of working conditions, and even the various tendencies as regards training at university level. The last chapter of this section offers the most clarifying definition of respeaking I have found in the monograph: “In many ways respeaking is to subtitling what interpreting is to translation, namely a leap from the written to the oral without the safety net of time” (p. 45). The pages that follow present the different skills that any respeaker must develop and master, which are thoroughly analyzed in the chapters to come.

Thus, the second section of the book can be said to represent its core, since it contains six chapters that thoroughly analyze the technique per se. Chapter 5 explains the skills that need to be developed before the process of respeaking, which basically involve getting familiar with the software and the technique; to that end, the author delves into the origins and the state of the art of such technology, describing how the different software packages available work and how they may progress in the future. Chapter 6 keeps focusing on the skills needed before starting to respeak, analyzed in greater detail (from the choice of microphone to dictation commands and the refining of acoustic and language models), but this time the description centers on one program in particular: Dragon NaturallySpeaking. The following chapter (7) deals with the most important skills, the ones that need to be implemented during the process of respeaking, namely split attention, punctuation, rhythm and speed. These aspects are explored in such detail that one can end up feeling that a fair command of the technique can be attained just by reading about them. In this case, though, the different aspects explained are accompanied by so much practice that the chapter needs to be read alongside the DVD for the various features described to be really grasped. Chapters 8 and 9 also tackle the process of respeaking, but instead of dealing with skills as such, they describe different facets of the technique depending on the genre (sports, news and interviews, debates and chat shows) and the setting (museums and art venues, classrooms, conferences, churches, live webcasts and telephones); these two chapters also include a great many practical exercises that require the accompanying DVD. The last chapter of this section (10) tackles the skills that are needed after the process; these are mainly related to the assessment of accuracy in respeaking. Various methods used until now are described, and a new approach is suggested so as to appropriately meet the basic requirements for error calculation in respeaking.

The final section delves into the reception of respoken subtitles on the one hand (chapter 11) and into the possible further developments of the field in the years to come on the other (chapter 12). These pages are more centered on research (several experiments on the reception of respoken subtitles in the UK are summarised), and they conclude the book by suggesting what is needed to keep moving forward within this field: more experimentation and analysis.

One of the aspects of this monograph that can surprise readers the most is the aforementioned vast number of practical activities included at the end of each chapter, all of them surprisingly pertinent and specific. Most of this practice refers to elements included in the DVD, the perfect companion to such a thorough first approach to a field of this audiovisual and multitasking magnitude; the book would not be complete without it. I am quite sure that those readers who manage to cover all the activities, watch all the videos and read all the articles can consider themselves almost experts on respeaking. In fact, it would be interesting to count the number of hours necessary to accomplish all of this practice, since I honestly doubt that Master's courses very often contain so many hours devoted to practice.

It is difficult to find drawbacks to such a comprehensive, up-to-date and relevant volume. If anything, I have sometimes felt that the degree of detail goes too far in a few respects; for example, the section on respeaking training at university (40-43) may be too specific for some readers. Occasionally, the instructions for some activities can also be considered too exhaustive, guiding the students to the point of making the exercise a bit too long and closed in nature; a clear instance of this can be found in exercise 8.4.1 (132-134), an activity with three pages of instructions. However, maybe this was the aim of the author: to be as specific as possible, so as to cover every single aspect that he considered necessary to fully depict this young but fast-growing field of study.

To conclude, Subtitling through Speech Recognition: Respeaking has much to recommend it. In this new era where accessibility is a must, technologies are taking over and the need to have everything readily available here and now is increasing everywhere, respeaking should have a much more relevant presence in the audiovisual world. It is a prominent field that needs to be exploited to the maximum, and here we are undoubtedly facing the first real step towards its achieving the place it deserves.

Noa Talaván Zanón,
Universidad Nacional de Educación a Distancia (UNED), Spain
ntalavan@flog.uned.es