The current pandemic has made videoconferencing an indispensable part of our working life.
To help people who speak different languages communicate efficiently, a recent paper on arXiv.org proposes a videoconferencing solution with live translation captions.
There, participants can see an overlaid translation of other participants' speech in their preferred language. The incoming speech signal is processed in a streaming mode, transcribed in the speaker's language, and used as input to a machine translation system. The researchers use several features to provide a better user experience, such as smooth pixel-wise scrolling of the captions and fading of text that is likely to change.
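The streaming flow described above can be illustrated with a minimal Python sketch. It is not MeetDot's code: `retranslate_stream`, `toy_translate`, and the tiny lexicon are hypothetical stand-ins for a real ASR service and MT system, and the sketch assumes the ASR emits a growing partial transcript that is translated again from scratch each time it updates.

```python
from typing import Callable, Iterable, Iterator


def retranslate_stream(
    partials: Iterable[str], translate: Callable[[str], str]
) -> Iterator[str]:
    """Yield a fresh caption for every partial ASR transcript.

    Each partial transcript is translated in full, so a later caption
    may revise words that were already shown to the viewer.
    """
    for transcript in partials:
        yield translate(transcript)


# Hypothetical word-by-word "MT system", used only to keep the sketch runnable.
TOY_LEXICON = {"hola": "hello", "a": "to", "todos": "everyone"}


def toy_translate(text: str) -> str:
    return " ".join(TOY_LEXICON.get(word, word) for word in text.lower().split())


# Simulated partial transcripts, as a streaming ASR might emit them.
asr_partials = ["hola", "hola a", "hola a todos"]
captions = list(retranslate_stream(asr_partials, toy_translate))
print(captions)
```

A real MT system translates whole sentences rather than single words, so successive captions can differ in the middle as well as at the end. That is the source of the changing text the interface fades.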
A comprehensive evaluation suite is used to accurately compute metrics such as latency, caption flicker, and accuracy, and to encourage rapid development guided by these metrics.
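To make the caption flicker metric concrete, here is a short Python sketch of one common way to quantify it: count how many already-displayed words must be erased when a caption is updated. This is an assumption about how such a metric can be computed, not the paper's exact definition, which may tokenize or normalize differently. The function names are hypothetical.

```python
from typing import List, Sequence


def common_prefix_len(a: Sequence[str], b: Sequence[str]) -> int:
    """Number of leading tokens shared by a and b."""
    n = 0
    for x, y in zip(a, b):
        if x != y:
            break
        n += 1
    return n


def erasure(prev_caption: str, new_caption: str) -> int:
    """Tokens of the previous caption that must be deleted to show the new one."""
    prev_tokens = prev_caption.split()
    new_tokens = new_caption.split()
    return len(prev_tokens) - common_prefix_len(prev_tokens, new_tokens)


def total_erasure(captions: List[str]) -> int:
    """Sum of erasures over consecutive caption updates."""
    return sum(erasure(a, b) for a, b in zip(captions, captions[1:]))


updates = ["I", "I am", "I have been", "I have been waiting"]
print(total_erasure(updates))  # "am" is erased once, so this prints 1
```

A lower total means steadier captions. Updates that only append words contribute zero.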
We present MeetDot, a videoconferencing system with live translation captions overlaid on screen. The system aims to facilitate conversation between people who speak different languages, thereby reducing communication barriers between multilingual participants. Currently, our system supports speech and captions in 4 languages and combines automatic speech recognition (ASR) and machine translation (MT) in a cascade. We use the re-translation strategy to translate the streamed speech, resulting in caption flicker. Additionally, our system has very strict latency requirements to have acceptable call quality. We implement several features to enhance user experience and reduce their cognitive load, such as smooth scrolling captions and reducing caption flicker. The modular architecture allows us to integrate different ASR and MT services in our backend. Our system provides an integrated evaluation suite to optimize key intrinsic evaluation metrics such as accuracy, latency and erasure. Finally, we present an innovative cross-lingual word-guessing game as an extrinsic evaluation metric to measure end-to-end system performance. We plan to make our system open-source for research purposes.
Research paper: Arkhangorodsky, A., MeetDot: Videoconferencing with Live Translation Captions, 2021. Link: https://arxiv.org/abs/2109.09577