Abstract
This study quantifies the linguistic differences between translated and non-translated sports news using thirteen lexical and syntactic indices drawn from quantitative linguistics and corpus linguistics, with the additional aim of testing the translation universals hypothesis. Drawing on a 40,000-word comparable corpus and applying Random Forest analysis and statistical tests, the study identifies the key linguistic indices that distinguish the two text types. The results reveal that: 1) Writer’s View, Activity, R1, RRmc and ATL are the strongest predictors; 2) translated texts exhibit significantly lower lexical density and diversity, reflected in lower R1 and RRmc values, yet contain significantly longer words (higher ATL), which partially supports the simplification hypothesis; 3) non-translated texts display higher Writer’s View and Activity scores, indicating that original authors exercise greater control over structural organisation and write in a more dynamic style than translators; and 4) the comparable levels of MTLD, HL, Lambda, Entropy, MSL, MTL, MCL and Verb Distance in translated and non-translated texts align with the normalisation hypothesis. This study is the first to apply a quantitative linguistic approach to the distinctive lexical and syntactic features of sports news translation. The findings support the simplification and normalisation hypotheses, reveal the self-organised and static style of translated language, and enrich the understanding of genre-specific translation universals.
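The abstract names several lexical indices without defining them. The sketch below is illustrative only. It computes three of them (Entropy, RRmc and ATL) from a list of word tokens, assuming the commonly used quantitative-linguistics definitions: Shannon entropy of the word-frequency distribution, McIntosh's relative repeat rate, and average token length in characters. The helper names `tokenize` and `lexical_indices` and the naive regex tokenizer are hypothetical. The study's own tool, preprocessing and exact formulas may differ.

```python
import math
import re
from collections import Counter


def tokenize(text):
    # Hypothetical, naive tokenizer: lowercase runs of letters, digits and apostrophes.
    return re.findall(r"[a-z0-9']+", text.lower())


def lexical_indices(tokens):
    # Assumed standard definitions; not necessarily the study's implementation.
    n = len(tokens)                      # number of tokens (N)
    freqs = Counter(tokens)
    v = len(freqs)                       # number of word types (V)
    probs = [f / n for f in freqs.values()]

    # Shannon entropy (in bits) of the word-frequency distribution.
    entropy = -sum(p * math.log2(p) for p in probs)

    # Repeat rate: probability that two randomly drawn tokens are the same word.
    rr = sum(p * p for p in probs)

    # McIntosh relative repeat rate, scaled so a uniform distribution gives 1;
    # undefined when the text contains a single word type.
    rrmc = (1 - math.sqrt(rr)) / (1 - 1 / math.sqrt(v)) if v > 1 else float("nan")

    # Average token length (ATL) in characters.
    atl = sum(len(t) for t in tokens) / n

    return {"Entropy": entropy, "RR": rr, "RRmc": rrmc, "ATL": atl}


if __name__ == "__main__":
    sample = "The striker scored twice, and the fans roared as the striker celebrated."
    print(lexical_indices(tokenize(sample)))
```

Under these definitions, a higher RRmc indicates a flatter frequency distribution, which means greater lexical diversity. This direction matches the abstract's reading of lower RRmc values in the translated texts.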

This work is licensed under a Creative Commons Attribution 4.0 International License.
Copyright (c) 2026 Xinlei Jiang
