“A Spanish version of EastEnders”: a reception study of a telenovela subtitled using MT

This article presents the results of three AVT reception experiments with over 200 English-speaking participants who watched a 20-minute clip of a Mexican telenovela in three different translation modalities: human-translated (HT), post-edited (PE) and machine-translated (MT). Participants answered a questionnaire on narrative engagement, enjoyment, and translation reception of the subtitles. The results show that viewers have a higher engagement with PE than HT, but there is only a statistically significant difference when PE is compared to MT. When it comes to enjoyment, the differences are more pronounced, and viewers enjoy MT significantly less than PE and HT. Finally, in translation reception, the gap between MT and both PE and HT is even more pronounced. However, the high HTER scores demonstrate that a substantial number of edits are necessary to render the automatic MT subtitles publishable. It is not clear that results would be comparable were subtitlers not given sufficient time or remuneration for the post-editing task.


Introduction
In the wake of successful translated series such as Squid Game and Money Heist, the media has drawn attention to controversies regarding the production of subtitles (Groskop 2021; Lange 2021). The use of MT for subtitling alongside reduced remuneration and restrictive work practices has become highly controversial, causing concerns about sustainability. Reports of a "talent crunch" as translators exit the industry come at a time when entertainment platforms are very successful. Companies maintain that low remuneration is not the reason for the shortage of professionals (Iyuno SDI Group 2022), while respondents to the ELIS 2022 survey (ELIS 2022) suggest that better rates and salaries could help tackle the shortage issue. The European Federation of Audiovisual Translators, AVTE, published a Machine Translation Manifesto (AVTE 2022) that proposes best practices when using MT, while the French (ATAA 2021) and the Spanish Associations of Audiovisual Translators (ATRAE 2021) have released statements urging content producers not to use MT post-editing (PE), but rather to rely on human translators (HT).
It is, therefore, of utmost importance to know how and when to use MT in the AVT sector, where translation is becoming multidirectional. Recent studies on AVT investigate the gains in productivity and the improving quality of subtitles translated using MT, and conclude that this is a viable solution, given the appropriate quality conditions (Bywood, Georgakopoulou, and Etchegoyhen 2017; Matusov, Wilken, and Georgakopoulou 2019; Koponen et al. 2020a). Additionally, studies have explored subtitlers' satisfaction with PE (Koponen et al. 2020b; Karakanta et al. 2022). However, there is currently no research that looks at the impact of MT in the translation workflow on the viewer of audiovisual content.
In this article, we seek to fill this gap by looking at the reception of subtitles translated in different modalities. Based on a methodology already tested on literary translation (Guerberof-Arenas and Toral 2020), we set up three experiments to measure the narrative engagement (Busselle and Bilandzic 2009), enjoyment (Hakemulder 2004) and translation reception of subtitles (the viewers' opinion on the translation and the language) from a clip of a Mexican telenovela translated into English in three different modalities: HT, PE and MT. In the following sections, we first review the state of the art; second, we present the methodology used and the participants' profile; third, we analyse the results obtained from over 200 English-speaking participants in three experiments; and finally, we reflect on the use of MT in this type of content and indicate future lines of research.

AVT reception and machine translation
AVT research has devoted a significant amount of attention to reception due to the constrained nature of AVT, the high relevance of viewers, and the widespread use of subtitling for entertainment and language learning. Since the 1980s, researchers have been looking at subtitle reading, particularly using eye-tracking methods (d'Ydewalle, Muylle, and van Rensbergen 1985; d'Ydewalle, Rensbergen, and Pollet 1987). With the increasing interest in reception, the scope of AVT studies widened to include qualitative and mixed-methods research designs to provide a more comprehensive understanding of viewers' engagement (Orrego-Carmona 2018).
Subtitle reception studies have shown different layers of engagement and provided information on viewers' processing and reactions. For example, eye-tracking studies have shown that viewers are not too sensitive to subtitles overlapping shot changes (Szarkowska, Krejtz, and Krejtz 2017) and that poor segmentation might affect reading but does not seem very relevant for comprehension (Perego et al. 2010; Rajendran et al. 2013; Gerber-Morón, Szarkowska, and Woll 2018). However, when asked about their preferences, viewers have a clear preference for syntactically segmented subtitles (Gerber-Morón, Szarkowska, and Woll 2018) and identify segmentation as a major problem with automatic/MT subtitles (Koponen et al. 2020a).
With the growing use of MT in subtitle production (Koponen et al. 2020b; Karakanta et al. 2022), it becomes essential to explore how viewers respond to MT and PE subtitles in contrast with HT subtitles. Ortiz Boix (2016) examined two conditions (HT and PE) for voice-over translation of wildlife documentaries. The results of a panel of experts and 56 end users established no significant differences between the two conditions. Hu, O'Brien, and Kenny (2020) compared the comprehension of and attitude towards PE, MT, and HT subtitles for MOOCs. In this experiment, the HT subtitles were prepared by a non-professional translator and the PE subtitles were post-edited by a professional. Hu and colleagues found that the PE condition scored highest in their reception metrics, and that participants had a positive attitude towards all subtitles, regardless of production conditions.

Combined methods to measure reception
To explore the translation of subtitles in different modalities, we focus on narrative engagement, enjoyment, comprehensibility, and reception (Guerberof-Arenas and Toral 2020) when viewing a 19' 26" clip from Episode 55 of the Mexican telenovela Te doy la vida (Cataño and Acosta, 2020). The programme is a drama/soap opera that reveals the relationships, loyalties, and enmities between family members centred around a car workshop in Mexico. The telenovela was previously chosen for a comparative test between AppTek and Google Translate for subtitling, with AppTek being the preferred system (Santilli 2021). The clip for the current study was also provided by AppTek, who also kindly provided the Latin-American Spanish to English MT output from their AVT-customised neural system. Since the data suggested that the AppTek engine performed better than Google Translate and we could avail of a "real" clip, the decision was made to test using this telenovela.

Changes in the design through the pilot studies
In order to refine our methodology, we conducted two pilot studies to compare the reception of HT, MT, and PE subtitles. AppTek also provided the first version of the HT subtitles by a highly experienced translator based in Argentina and we engaged a Colombia-based subtitler to post-edit the MT. Details of the results of these pilot studies are presented in Section 4. Based on these pilot studies, some changes were made to the methodology: most pertinently, the translation and PE were redone as described in Section 3.2. In addition, the first post-editor made substantial changes to spotting that we felt would be restricted in a subtitling workflow due to the widespread use of templates (Oziemblewska and Szarkowska 2022). We therefore amended the PE guidelines to limit spotting changes.

Preparation of translations and video files in the main experiment
In preparation for translation and PE, two SRT subtitle files were created. File 1 contained the source text in ES-MX to be translated into English (subtitles 1 to 182) and MT to be post-edited (subtitles 183 to 357). File 2 had the reversed order. Two translators with similar experience and language profiles used Ooona, a cloud-based tool (García-Escribano, Díaz-Cintas, and Massidda 2021), to complete the project. The translators were paid at their requested rate. They both received the video, the full source text subtitles and the prepared "Pretranslated-Target" file. Guidelines for translation and PE (of publishable quality) are in Appendix A. Once the target SRT files were received, the final HT, PE, and MT subtitles were assembled (File 1 and File 2 from each translator were split according to the modality in which they were processed) and a video file with embedded subtitles was created for each condition. This meant that the HT and PE versions were translated and post-edited by the same two translators, guaranteeing that a preference for HT or PE was not due to a preference for the style of a given translator. The MT version was the original output received from AppTek.
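The assembly step described above can be illustrated with a minimal sketch. The function name, the dictionary representation of the SRT content, and the assumption that one completed file exists per layout are all hypothetical; only the subtitle ranges (1 to 182 and 183 to 357) and the reversed layout of File 2 come from the description.

```python
# Hypothetical illustration of the counterbalanced assembly; not the authors' tooling.
FIRST_HALF = range(1, 183)     # subtitles 1 to 182
SECOND_HALF = range(183, 358)  # subtitles 183 to 357


def assemble_conditions(file1, file2):
    """Build the HT and PE subtitle sets from two completed files.

    file1 and file2 map subtitle number -> English text.
    File 1: first half translated from scratch (HT), second half post-edited (PE).
    File 2: the reversed layout (first half PE, second half HT).
    """
    ht = {i: file1[i] for i in FIRST_HALF}
    ht.update({i: file2[i] for i in SECOND_HALF})
    pe = {i: file2[i] for i in FIRST_HALF}
    pe.update({i: file1[i] for i in SECOND_HALF})
    return ht, pe
```

Under this layout, each condition contains work by both translators, which is the property the design relies on to avoid confounding modality with translator style.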

Measuring MT subtitle quality using HTER
The Human-targeted Translation Edit Rate (HTER; Snover et al. 2006) metric was used to measure the number of PE edits, with an overall value of 44.23 for the whole clip (40.46 for Translator A, 47.21 for Translator B), demonstrating that the translators performed a high number of edits to render the subtitles publishable. In professional settings, this would have repercussions for remuneration if PE payment rates are reduced on the basis that MT requires little correction. According to Parra Escartín and Arcedillo (2015), an HTER of 20.98 represents a discount equivalent to a 75-84% fuzzy match in a translation memory. More than double this level of editing was required in this study for translators to produce publishable subtitles. Further, large corporations using TER to pay for post-editing suggest that values above 30 are not acceptable for use without editing (Schmidtke and Groves 2019) and, from industry experience, we are aware that post-editing work with HTER above 40 is usually paid at the full rate (similar to a 0% match from translation memory).
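To make the metric concrete, the following is a simplified sketch of an HTER-style score: the word-level edit distance between each MT segment and its post-edited version, divided by the number of words in the post-edited text and expressed as a percentage. It is an illustration only and not the tool used in this study. In particular, it omits the block-shift operation that TER counts as a single edit, so it can overstate the score relative to a full TER implementation; the function names and whitespace tokenisation are assumptions.

```python
def word_edit_distance(hyp, ref):
    """Levenshtein distance over word tokens (insertions, deletions, substitutions)."""
    prev = list(range(len(ref) + 1))
    for i, h in enumerate(hyp, start=1):
        curr = [i]
        for j, r in enumerate(ref, start=1):
            cost = 0 if h == r else 1
            curr.append(min(prev[j] + 1,          # delete a word from the MT output
                            curr[j - 1] + 1,      # insert a word from the post-edit
                            prev[j - 1] + cost))  # keep or substitute
        prev = curr
    return prev[-1]


def simplified_hter(mt_segments, pe_segments):
    """Edits per post-edited word, as a percentage, over paired segments (no shifts)."""
    edits = 0
    ref_words = 0
    for mt, pe in zip(mt_segments, pe_segments):
        mt_tokens, pe_tokens = mt.split(), pe.split()
        edits += word_edit_distance(mt_tokens, pe_tokens)
        ref_words += len(pe_tokens)
    return 100.0 * edits / ref_words
```

For a toy pair such as MT "the cat sat" and post-edit "the cat sat down", one insertion over four reference words gives a score of 25.0.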
Here are some examples of the issues found in the MT output sent to the subtitlers:
A) Proper noun errors in the MT output: named entity recognition is a known challenge for MT development. For example, where the HT retains the original restaurant name in the segment "How about Las Tortas del Finito?", the MT output reads "It could be the finite cakes".
B) Gender: inconsistent use in the MT segments, e.g., "-She's a romantic woman. He loves the messages I send him." The HT for this segment is "She's a romantic.
This change from sincronizadas to quesadillas is not the only example of domestication in the HT: elsewhere, tacos are surprisingly replaced by "lunch" despite tacos being a commonly used food term outside of Mexico. However, in general, the HT and PE conditions both largely conform to our participants' expectations, as may be seen in Section 4. A segment-by-segment analysis of subtitle quality is beyond the scope of this study, but the quality of both HT and PE appears to be satisfactory as judged by the participants.

Subtitlers' post-task questionnaire
One of the translators responded to our subtitlers' post-task questionnaire with some positive views on post-editing. She recognises it can be faster than translating from scratch but does not think this applies to the translation of subtitles. She wrote: "Although post-editing speeds up my work, I don't find it as enjoyable as translating from scratch. I also don't think it's easier than translating from scratch. There are specific types of projects where I prefer to use post-editing [rather] than translate from scratch, but subtitling dialogue isn't one of them. The language is inherently colloquial and that just doesn't work very well with MT in my opinion."
Regardless of this, the translator stated she was extremely satisfied with the results of her HT and PE tasks.

Viewing conditions
Unbeknown to them, all viewers were randomly assigned a condition. WATCHA corresponded to the telenovela with PE subtitles, WATCHB to MT subtitles, and WATCHC to the HT condition. In this article, we use PE, MT and HT for continuity.

Questionnaire
An online questionnaire in English was distributed to participants using Qualtrics (www.qualtrics.com). Participants were told that they would watch a Mexican telenovela and fill in a user experience questionnaire. After this, the participants first read the information brochure and consent form and, if they decided to participate in the experiment, they were taken to the following sections:

Demographics and Viewing Frequency
This section contains 11 questions on demographics and viewing patterns (e.g., "How often have you watched a programme with subtitles in the last 24 months?", "How much do you enjoy watching television programmes with subtitles?", "How many subscriptions to streaming platforms do you have?").

Comprehension Questions
After watching the clip, the participants answered 10 four-choice questions to ensure basic comprehension. There was no minimum number of correct answers to continue because we wanted to analyse comprehension with the full range of responses (1 to 10) depending on the modality.

Narrative Engagement
Participants were then presented with a 12-item Narrative Engagement scale (Busselle and Bilandzic 2009) with 7-point Likert-type responses. The questionnaire includes four categories: Narrative understanding (e.g., "At points, I had a hard time making sense of what was going on in the programme."), Attentional focus (e.g., "I found my mind wandering while the programme was on."), Narrative presence (e.g., "The programme created a new world, and then that world suddenly disappeared when the programme ended."), and Emotional engagement (e.g., "I felt sorry for some of the characters in the programme.").

Enjoyment
Participants were then asked to answer two questions to address enjoyment: "How much did you enjoy watching the clip?" and "Would you recommend this clip to a friend?" (Dixon et al. 1993; Hakemulder 2004).

Translation reception
This was a 7-item scale to measure the reception of the translated subtitles (e.g., "How easy were the subtitles to understand?", "I thought the subtitles were very well written", "I found words or sentences that were difficult to understand."). Participants were asked to use a 7-point Likert-type scale to rate these questions/statements (Guerberof-Arenas and Toral 2020).

Debriefing and payment questions
At the end of the questionnaire, participants were debriefed on the nature of the research. Only then were they informed about the translation modality assigned (either MT or PE or HT). Following this, and only if their modality was MT, they were asked to rate the quality of the MT, and to indicate their translation preference.

Qualtrics and Prolific
As mentioned, Qualtrics was used to create the questionnaire. For the pilot study, the questionnaire was distributed through several social media channels (i.e., Twitter, LinkedIn, and Facebook groups). However, since this process did not generate a satisfactory number of participants and since posting on the researchers' social media skewed the results, for the main experiment, we decided to use an existing platform that provides a base from which to gather research participants and pay for experimental research.
Prolific (www.prolific.com) allowed us to post the Qualtrics questionnaire while at the same time specifying the participant profile. Our screening conditions consisted of location in the UK (to avoid language variation), English as a mother tongue (to avoid language understanding variation), and at least one subscription to a streaming platform (to avoid participants who do not watch audiovisual content but are looking for payment). After the pilot studies, we refined our profile due to non-compliance and added: exclusion of participants in the previous studies; a 100% approval rate in the platform, which means that the participants' performance in previous studies was always approved by researchers; and a minimum of 15 and maximum of 150 submissions in the platform, which meant that participants had sufficient experience working in the platform. Although payment was to the platform and not to participants directly, we were informed that the participants were paid an average of 9 pounds sterling per hour (the average duration was around 35 minutes).
Although we found the platform very effective, and it allowed us to discard those participants who did not meet the criteria or did not fully complete the experiment, some participants in these platforms engage as part of a job and are not occasional contributors who participate in an experiment out of personal interest while receiving a practically nominal fee. We feel that this motivation is an important consideration. However, because of the nature of this particular experiment (we are looking for a wider audience that avails of a standard type of entertainment), we consider the results to be valid and generalizable.

Calendar and process for the projects
Table 1 shows the time periods for the three project iterations, the platforms, the number of participants, and their distribution:

Reception of a Mexican telenovela
Since we conducted three iterations, we first summarise the results for the two pilot experiments, and the issues encountered, indicating the motivation for each new phase and the improvements made. We then present the results for the main experiment, which has the highest number of participants and the most refined experimental design.

Summary of the pilot experiment using social media
Since the engagement methodology had previously been successfully used with literary texts, we ran a pilot experiment using snowball sampling, distributed via social media. Twenty-three participants (17 female and 6 male), between 18 and 44 years old, participated in the pilot. Native languages were mostly English, with two Italian speakers and one speaker each of Spanish, Portuguese, and Dutch. 16 had professions related to language and 9 unrelated. 15 participants had moderate-to-little knowledge of Spanish and 8 had a high level or were bilingual. Table 2 shows a summary of the findings using a mean value. Values for all categories other than Comprehension range from one (strongly disagree) to seven (strongly agree). The Comprehension figure is the number of questions answered correctly from a total of ten. HT has the highest values for Narrative Engagement, Enjoyment, and Translation Reception, but also the highest preference for programmes with subtitles among participants. We found that participants were able to follow the questionnaire, watched the video (except in certain cases depending on the browser), and responded easily to the questionnaire. However, there are many issues with the data, the most important perhaps being that few participants had English as their mother tongue. The second issue was that the majority of participants had professions related to language (because the questionnaire was distributed by the researchers). This meant that they were accustomed to subtitles and were perhaps stricter when it came to judging translations. After this initial experiment, we decided to use the Prolific platform.

Summary of pilot experiment in Prolific
In this next pilot experiment, our main aim was to test the Prolific platform.
In this first instance, 74 participants (57 female, 16 male and 1 non-binary), between 18 and 54 years old with English (UK) as their mother tongue, took part. Seventy-three had no knowledge, a little or moderate knowledge of Spanish and one had Very good knowledge. Only five had a profession related to language.
Table 3 shows a summary of the findings using a mean value to illustrate each category. In this case, because the number of participants was higher, we ran a Kruskal-Wallis H test for non-parametric data and post-hoc comparisons using the Conover-Iman test with the Holm-Bonferroni correction. These results are shown in the row Significance. Again, values range from one to seven, except for Comprehension, which ranges from one to ten. PE has the highest values in Narrative Engagement, Enjoyment and Translation Reception, but viewers in this condition also reported the highest preference for programmes with subtitles. However, there are no significant differences between the modalities except in Translation reception, where participants rated PE significantly higher than MT (Z = 7.45; p = .00) and MT significantly lower than HT (Z = -5.81; p = .00). We also see that the viewing frequency of participants in the MT modality is lower, and there are significant differences overall, but post-hoc comparisons show no significant differences.
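As an illustration of the correction step applied to the post-hoc comparisons, the sketch below implements the standard Holm-Bonferroni step-down adjustment for a set of raw p-values. It does not reproduce the Conover-Iman test itself, and the function name and the default alpha of 0.05 are assumptions made for the example.

```python
def holm_bonferroni(p_values, alpha=0.05):
    """Return (adjusted p-values, reject flags), both in the original order.

    The smallest p-value is multiplied by m, the next by m - 1, and so on;
    a running maximum keeps the adjusted values monotone, and each is capped at 1.
    """
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_max = 0.0
    for rank, idx in enumerate(order):
        candidate = min(1.0, (m - rank) * p_values[idx])
        running_max = max(running_max, candidate)
        adjusted[idx] = running_max
    return adjusted, [p <= alpha for p in adjusted]
```

With three pairwise comparisons (PE vs MT, MT vs HT, PE vs HT), the smallest raw p-value is therefore tested against alpha divided by three, the next against alpha divided by two, and the largest against alpha itself.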
When looking at the results, we considered that the translator could be a confounding variable as the PE and HT subtitles were created by different subtitlers.We amended the methodology for the main experiment so that this was accounted for and recruited a larger cohort of participants to avoid different viewing frequencies.

Main experiment
In the main experiment, a larger cohort of 119 participants was presented randomly with PE, MT or HT subtitles; they provided valid responses to the questionnaire.

Participants
Table 4 shows a summary of the demographics and characteristics of this group. Participants are mostly women aged 18 to 34 with tertiary education, little or no knowledge of Spanish, and whose work is not related to languages. We were curious to know if there was an uneven distribution of language-related work per condition, as this could account for a different user experience. A Chi-Square test revealed no significant differences between these groups.

Comprehension questions
Table 5 shows the descriptive statistics for the comprehension questions per condition. No minimum number of correct responses was set for participants to be able to continue the questionnaire. If we consider the mean and median values, participants perform better in the HT and PE conditions, although the MT condition does show mean values above 5, i.e. more than half of the questions were answered correctly. The variable Comprehension was explored according to the translation condition of the subtitles using the Kruskal-Wallis test. Statistically significant differences were found between conditions (H(2) = 9.59, p < .01) with a mean rank score of 69.70 for HT, 62.33 for PE and 47.09 for MT. Post-hoc comparisons show statistically significant differences between MT and HT (Z = 3.06; p = .00) but not between PE and MT. Therefore, we can say that the HT condition was a factor in participants responding correctly to a higher number of questions when compared with MT, but not with PE.
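For readers unfamiliar with the test, the sketch below computes the Kruskal-Wallis H statistic (with the usual tie correction) for three groups and derives the p-value from the chi-square approximation. With three groups there are two degrees of freedom, for which the chi-square survival function reduces to exp(-H/2). The function name is an assumption, and this is an illustration rather than the software used for the analysis.

```python
import math
from collections import Counter


def kruskal_wallis_3groups(a, b, c):
    """Tie-corrected H statistic and chi-square p-value (df = 2) for three groups."""
    groups = [a, b, c]
    pooled = sorted(x for g in groups for x in g)
    n = len(pooled)
    # Tied observations share the mean of the rank positions they occupy.
    rank_of = {}
    i = 0
    while i < n:
        j = i
        while j < n and pooled[j] == pooled[i]:
            j += 1
        rank_of[pooled[i]] = (i + 1 + j) / 2  # mean of ranks i+1 .. j
        i = j
    h = 12 / (n * (n + 1)) * sum(
        sum(rank_of[x] for x in g) ** 2 / len(g) for g in groups
    ) - 3 * (n + 1)
    # Tie correction; it is zero (and H undefined) only if every observation is identical.
    correction = 1 - sum(t ** 3 - t for t in Counter(pooled).values()) / (n ** 3 - n)
    h /= correction
    p = math.exp(-h / 2)  # chi-square survival function for df = 2
    return h, p
```

Applied to the reported statistic, exp(-9.59 / 2) is approximately 0.008, which is consistent with the reported p < .01.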

Viewing frequency
Based on the pilot experiments, we wanted to check if the viewing frequencies among participants differ across translation conditions, as this might affect other variables such as engagement, enjoyment or even translation reception.It is preferable if these frequencies are balanced among the viewers in the three conditions.
Two questions addressed this variable: "How often have you watched a programme with subtitles in the last 24 months?" and "How much do you enjoy watching television programmes with subtitles? Please consider the last 24 months". The participants had to rate the responses from 1 (Never) to 5 (Daily). The Viewing_frequency variable was then the average value of these two questions.

Narrative engagement
We calculated the average value for the 12-item Narrative Engagement scale presented to the participants. Figure 1 shows these results (N = 119). The Cronbach's alpha reliability coefficient (α) is 0.90 for all the items in the scale, which is considered an excellent reliability score. Figure 1 shows that the narrative engagement overall is highest for PE, followed by HT and lastly by MT, i.e. viewers report higher engagement when watching the telenovela with PE subtitles. These results are similar to the pilot experiment in Prolific.
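As a reminder of how the reliability coefficient is obtained, the sketch below computes Cronbach's alpha from a participants-by-items matrix of scores using the standard formula: k / (k - 1) multiplied by one minus the ratio of the summed item variances to the variance of the total scores. The function name and data layout are assumptions, and the sketch assumes that any negatively worded items have already been reverse-scored.

```python
from statistics import variance


def cronbach_alpha(responses):
    """responses: one list of item scores per participant, all of the same length."""
    k = len(responses[0])
    items = list(zip(*responses))  # one tuple of scores per item
    item_variance_sum = sum(variance(scores) for scores in items)
    total_variance = variance([sum(row) for row in responses])
    return k / (k - 1) * (1 - item_variance_sum / total_variance)
```

Values closer to 1 indicate that the items vary together across participants, which is why a coefficient of 0.90 is read as high internal consistency.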
To understand the data better, firstly, the variable Narrative_Engagement was explored according to the translation condition of the subtitles using the Kruskal-Wallis test. Statistically significant differences were found between conditions (H(2) = 9.29, p < .01) with a mean rank score of 71.68 for PE, 59.91 for HT and 47.84 for MT. Post-hoc comparisons show statistically significant differences between PE and MT (Z = 3.15; p = .00) but not between HT and MT. Secondly, and since viewers had different viewing patterns, we ran a linear regression model to see the interaction between the dependent variable Narrative_Engagement and the independent variables Modality and Viewing_frequency. A significant regression was found (F(3,115) = 7.90, p < 0.00), with an R squared of 0.15. The estimated mean for PE was 3.37; the predicted narrative engagement decreases by 0.62 points in MT and 0.28 in HT, and increases by 0.37 according to the viewing frequency. MT and the viewing frequency are statistically significant.
Therefore, viewers who watched post-edited subtitles were significantly more engaged than those who watched MT subtitles. In addition, those who had watched programmes with subtitles more often, and enjoyed them more, in the last 24 months showed significantly higher engagement than those with a lower viewing frequency.
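The reported coefficients can be read as a simple additive prediction, sketched below. The numbers are those given above for Narrative_Engagement; treating PE as the reference level, entering Viewing_frequency as an un-centred covariate, and the names used in the code are assumptions of this illustration rather than details confirmed by the model output.

```python
# Coefficients as reported in the text; the model structure is an assumption.
INTERCEPT_PE = 3.37                                   # estimate for the PE reference level
MODALITY_EFFECT = {"PE": 0.0, "MT": -0.62, "HT": -0.28}
VIEWING_FREQUENCY_SLOPE = 0.37                        # change per point of Viewing_frequency


def predicted_engagement(modality, viewing_frequency):
    """Predicted Narrative_Engagement under the assumed additive model."""
    return (INTERCEPT_PE
            + MODALITY_EFFECT[modality]
            + VIEWING_FREQUENCY_SLOPE * viewing_frequency)
```

Under these assumptions, two viewers with the same viewing frequency differ by 0.62 points in predicted engagement if one watches PE and the other MT subtitles.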

Narrative engagement per category
The Narrative Engagement scale contains four distinct categories. Narrative Understanding relates to the ease of comprehension of a programme. Participants ranked their agreement with the following statements from 1 to 7: "At points, I had a hard time making sense of what was going on in the programme", "My understanding of the characters is unclear", "I had a hard time recognizing the thread of the programme". There were significant differences between modalities in this category. The Kruskal-Wallis test shows statistically significant differences between conditions (H(2) = 10.66, p < .00). Post-hoc comparisons show statistically significant differences between PE and MT (Z = 3.38; p = .00) but no statistically significant differences between HT and MT, nor PE and HT.
Attentional Focus is the state of being engaged and not distracted. Participants reacted to the following statements: "I found my mind wandering while reading the programme", "While reading, I found myself thinking about other things", "I had a hard time keeping my mind on the programme". There are no statistically significant differences in this category.
Narrative Presence is the feeling that one has entered the world of the programme. Participants reacted to these statements: "During the reading, my body was in the room, but my mind was inside the world created by the programme", "The programme created a new world, and then that world suddenly disappeared when the programme ended", "At times during the reading, I was closer to the situation described in the programme than the realities of here-and-now". The Kruskal-Wallis H test shows statistically significant differences between conditions (H(2) = 7.65, p < .02). Post-hoc comparisons show statistically significant differences between PE and MT (Z = 2.65; p = .01) but not between HT and MT, nor PE and HT.
Emotional Engagement is feeling for and with the characters. Participants reacted to these statements: "During the narrative, when a main character suffered, I felt sad", "The programme affected me emotionally", "I felt sorry for some of the characters in the programme". There are no statistically significant differences in this category.
The categories affected by the use of MT are Narrative Understanding (the ease of comprehension of the programme), as in previous research with literary texts (Guerberof-Arenas and Toral 2020), but also Narrative Presence (the feeling of immersion in the programme). It appears MT has a disconnecting effect for viewers of this telenovela that does not happen in the HT or PE conditions.

Enjoyment
Figure 2 shows the results of the average scores given for this 2-item scale (α = 0.83) for the two languages (N = 119). Figure 2 shows that differences between conditions are more pronounced than in narrative engagement. Therefore, and as before, we look at the variable Enjoyment according to the translation condition using the Kruskal-Wallis test. Statistically significant differences were found between conditions (H(2) = 10.03, p < .01) with a mean rank score of 69.48 for PE, 63.88 for HT and 45.84 for MT. Post-hoc comparisons show statistically significant differences between PE and MT (Z = 3.15; p = .00) and between MT and HT (Z = -2.48; p = .02).
We ran a linear regression model to see the interaction between the dependent variable Enjoyment and the independent variables Modality and Viewing_frequency. A significant regression was found (F(3,115) = 9.35, p < 0.00), with an R squared of 0.18. The estimated mean for PE was 2.70; the predicted Enjoyment decreases by 0.90 points in MT and 0.25 in HT, and increases by 0.57 according to the viewing frequency. MT and the viewing frequency are statistically significant. Therefore, we can say that viewers who view post-edited or translated subtitles enjoy the telenovela significantly more than those with MT subtitles. We also see that those who have watched programmes with subtitles and enjoyed them more in the last 24 months have a statistically significantly higher enjoyment than those who have a lower viewing frequency.

Translation reception
Figure 3 shows the results of the average scores given for this 5-item scale (α = 0.85) for translation reception (N = 119). Figure 3 shows that differences between the conditions are even more pronounced than in enjoyment and narrative engagement. It appears that viewers perceived issues in the MT output. As before, the variable Translation_reception was analysed according to the translation modality using the Kruskal-Wallis test and statistically significant differences were found between conditions (H(2) = 63.52, p < .00) with a mean rank score of 79.58 for PE, 74.74 for HT and 23.49 for MT. Post-hoc comparisons show statistically significant differences between PE and MT (Z = 10.52; p = .00) and between MT and HT (Z = -9.67; p = .00). Also as before, a linear regression model was run to see the interaction between the dependent variable Translation_reception and the independent variables Modality and Viewing_frequency. However, the assumptions for homoscedasticity and normality are not met for the model, and therefore results are not presented.
Regarding the technical aspects of the subtitles, some participants commented on the subtitles being too fast. One PE viewer wrote that the "only negative for me was reading the subtitles quickly enough and feeling as if I was missing the expressions on the actors' faces while I was busy reading". A HT viewer also wrote that "I found that I missed the end of some sentences due to me looking at the characters". Most participants who commented said that they enjoyed the programme and would like to see what happened next. One viewer of PE subtitles asked "W[h]ere can I watch the rest, Did they have a boy or girl??", another wrote "I will probably find this now to continue watching properly as I was hooked!!", and a viewer of HT said "[i]t was like a [S]panish version of [E]astenders". The least favourable comments were from viewers who did not like soap operas, such as the PE viewer who wrote that the "acting was very cheesy and that is why I could not empathise with the characters -it was nothing to do with the language or use of subtitles". Aside from complaints about speed, negative comments about subtitles came only from MT viewers (see Section 4.3.7). One wrote that the "subtitles drew me away from the scenes" and "made it more difficult to follow what was going on". Therefore, viewers who watch this subtitled telenovela in PE or HT conditions are significantly more positive about the translation than those who receive MT subtitles. We cannot confirm if the viewing frequency was a factor in translation reception.

MT rating
When participants were debriefed about the nature of the experiment, those assigned to the MT modality (N = 38) were asked whether they were aware that the subtitles were machine translated, to rate the quality on a scale from 1 (Extremely bad) to 5 (Extremely good), and finally to choose their preferred translation modality from three options: Original Spanish, Translated by professionals, MT corrected by professionals. This group was asked about MT because it was the only group exposed to this modality, and we were interested in the quality of the MT engine as judged by the viewers.
Of the 38 participants, 6 said they had realised they were watching the telenovela with subtitles translated using MT, 15 said they had realised "at times", and 17 reported not knowing. This is interesting because, although MT was rated the lowest in all categories, not all participants necessarily attributed their low ratings to the use of MT. This indicates that viewers might have a poorer user experience when watching AVT content without necessarily knowing that this is partly due to the use of MT for subtitling. Figure 4 shows the values given by participants to the quality of the MT output. From left to right, 3 participants rated MT as Extremely bad; 19, the largest group, rated it as Slightly bad; 6 as Neither good nor bad; 9 as Slightly good; and 1 as Extremely good. The mean value for the quality of the MT output is 2.63.
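As a quick arithmetic check, the reported mean can be recomputed from the rating counts given above. The short Python sketch below assumes that the three intermediate labels (Slightly bad, Neither good nor bad, Slightly good) correspond to scale points 2, 3 and 4; the variable names are illustrative only.

```python
# Rating counts reported in the text for the MT viewers, keyed by scale point.
# Assumption: the intermediate labels map to 2, 3 and 4 on the 1-5 scale.
rating_counts = {
    1: 3,   # Extremely bad
    2: 19,  # Slightly bad
    3: 6,   # Neither good nor bad
    4: 9,   # Slightly good
    5: 1,   # Extremely good
}

n_participants = sum(rating_counts.values())
total_score = sum(score * count for score, count in rating_counts.items())
mean_rating = total_score / n_participants

print(n_participants, round(mean_rating, 2))  # prints: 38 2.63
```

Under that assumption the counts sum to the 38 MT viewers and reproduce the reported mean of 2.63.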
Regarding preferences in translation, 24 participants prefer subtitles translated by professionals, 13 want MT corrected by professionals and one prefers the Original Spanish (this participant declared having a moderate knowledge of Spanish).
Participants who viewed MT used the free comment space to highlight problems with gender in MT (mentioned by 13 participants) or inconsistent translation of names (mentioned by 4 participants). Comprehension seemed to be difficult at times. One participant wrote that "there were words used that were not english [sic] at some point", and another found that a "whole scene was very hard to follow". Yet another reported that "the subtitles did not relate to the conversation and did not make sense".

Conclusions
The use of MT in subtitling workflows has become common practice to reduce costs and turnaround time (Georgakopoulou 2021). Our study, which borrows its methodology from previous reception studies of literary texts that used MT in the translation process (Guerberof-Arenas and Toral 2020, 2022), aims to see whether AVT reception changes depending on the translation modality.
Our results show that viewers have a higher engagement with PE than HT, but there is only a significant difference when PE is compared to MT. The categories where the difference is significant are Narrative Understanding and Narrative Presence. This is interesting because it shows that MT prevents viewers from understanding the story line and from being present in the story. When it comes to enjoyment, the differences are more pronounced, and viewers enjoy MT significantly less than PE and HT. Finally, in translation reception, the gap between MT and both PE and HT is even more pronounced. In brief, post-edited subtitles were received and understood just as well as unaided human translation (and scored higher values), while MT scored significantly lower on all the scales measured.
These results might suggest that, for this genre and language pair at least, semi-automated translation using PE is a viable option for subtitling, as shown in previous literature (Bywood, Georgakopoulou, and Etchegoyhen 2017; Koponen et al. 2020b; Matusov, Wilken, and Georgakopoulou 2019). However, and this to us is critically important, the surprising finding is that the HTER scores in Section 3.3 demonstrate that a substantial number of edits are necessary to render the automatic subtitles publishable, and this is the case in a genre (a telenovela) that has, in theory, an uncomplicated style. In our case, translators were constrained neither by time nor by the rate paid (they were paid the rate quoted beforehand), so they could edit the subtitles until they were satisfied that they had achieved publishable quality, and this, in turn, resulted in an improved viewer experience. In combination, this suggests that high-quality PE for creative subtitling should be paid at the full rate rather than at reduced rates, or at the very least that the assumption that PE should entail reduced rates is not well supported by our findings. That said, the time taken to translate and post-edit was not measured in our experiment, and time savings are an important factor when deciding on price.
It is not clear whether results would be comparable to those in this study were subtitlers not given sufficient time or acceptable remuneration for the post-editing task. It is important, therefore, that companies that use MT in the subtitling workflow are transparent, and also that they allow researchers to carry out experiments similar to the one presented here in an open-data setting. Add to this the recommendation from, for example, Cadwell, O'Brien and Teixeira (2018) that the implementation of partial automation of translation should be participatory rather than unilateral, and we can see how misgivings from translators and translator associations about the role of MT in contemporary subtitling workflows might be rooted in the considerable amount of work that MT output requires before it can be published.
There are many lessons from this iterative series of experiments that can be of value to other researchers or companies when assessing AVT reception. Translators used contemporary tools, and the translation and PE tasks were mixed, which removed the confounding effect of translator style and partially replicated the industry practice of splitting programmes into sections to reassemble post-hoc (Moorkens 2020). Care was taken so that participants were demographically similar, with a comparable interest in subtitled programming and level of proficiency in the source language. Experiment conditions were randomised and recruitment was via an independent platform, as snowball recruitment by researchers may attract participants who have a better knowledge of languages and are more exposed to subtitled media than other users.
While the methodology and results may be valuable, there are limitations to this study. Findings are limited to the ES-MX to EN-UK language pair and the genre of drama or soap opera. It would be valuable to measure translators' effort in different modalities, ideally using a selection of language pairs and AV genres. Finally, due to time and financial constraints, the final subtitles were not evaluated to assess their quality or their level of creativity, and this can be a determining factor when looking at user experience.

Figure 1: Narrative engagement per modality in the main experiment

Figure 2: Enjoyment according to modality in the main experiment

Figure 3: Translation reception according to modality in the main experiment

Figure 4: Participants' MT ratings

Some preferential changes were also introduced as a result of PE, for example where the MT and HT read "Well you'll have to learn.", the PE reads "Well, you'll have to learn how." There are also examples of changes to spotting in PE, despite guidelines to the contrary. The PE output tended to retain non-standard commas at the end of some subtitles from MT output, and once, italics were added, as in the following example (indicated by the tags <i> and </i>):
HT: I made quesadillas. Are you hungry?
PE: I made <i>sincronizadas.</i> Are you hungry?
MT: I did sync, are you hungry?