Automatic Arabic translation of English educational online content using neural machine translation

Massive Open Online Courses (MOOCs) offer high-quality learning opportunities and educational content across many disciplines to a large number of students, largely regardless of their background, location, and personal circumstances. However, most of this online content is available only in English, so language remains a major barrier that keeps non-native English speakers from benefiting from these resources. More than 300 schools in Qatar teach all subjects in Arabic. To make online educational resources more accessible to their students, we designed and implemented an automatic machine translation solution based on deep learning techniques. It aims to produce high-quality Arabic translations of English subtitles. We focused on the case of Khan Academy, which provides a personalized learning experience built mainly around videos. These videos have subtitles that are generally produced by volunteers for different languages. Our system covers several subjects, ranging from physics and mathematics to programming and arts and humanities, with a focus on high-school-level students. It was trained on a high-quality parallel corpus from the education domain developed by the Qatar Computing Research Institute (QCRI). The system underwent intrinsic evaluation, in which its output was compared to a high-quality reference translation, as well as extrinsic evaluation in a pilot study in schools, which tested the quality of the system's output and its contribution to student understanding.
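The intrinsic evaluation described above compares system output against a reference translation. The abstract does not name the metric used, so the following is only a minimal sketch assuming a BLEU-style score (clipped n-gram precision with a brevity penalty and add-one smoothing). The function names and the sample Arabic sentences are hypothetical and chosen for illustration; they are not taken from the project.

```python
import math
from collections import Counter


def ngrams(tokens, n):
    """Count all n-grams of length n in a token list."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def sentence_bleu(hypothesis, reference, max_n=4):
    """BLEU-style score in [0, 1] for one hypothesis/reference pair.

    Assumption: whitespace tokenization and add-one smoothing, which is a
    simplification of standard corpus-level BLEU.
    """
    hyp = hypothesis.split()
    ref = reference.split()
    if not hyp:
        return 0.0

    log_precisions = []
    for n in range(1, max_n + 1):
        hyp_counts = ngrams(hyp, n)
        ref_counts = ngrams(ref, n)
        # Clip each hypothesis n-gram count by its count in the reference.
        overlap = sum(min(c, ref_counts[g]) for g, c in hyp_counts.items())
        total = sum(hyp_counts.values())
        log_precisions.append(math.log((overlap + 1) / (total + 1)))

    # Brevity penalty: penalize hypotheses shorter than the reference.
    if len(hyp) >= len(ref):
        brevity_penalty = 1.0
    else:
        brevity_penalty = math.exp(1 - len(ref) / len(hyp))

    return brevity_penalty * math.exp(sum(log_precisions) / max_n)


# Hypothetical subtitle line: reference translation vs. two system outputs.
reference = "الطاقة الحركية تعتمد على الكتلة والسرعة"
exact_output = "الطاقة الحركية تعتمد على الكتلة والسرعة"
partial_output = "الطاقة تعتمد على السرعة"

exact_score = sentence_bleu(exact_output, reference)
partial_score = sentence_bleu(partial_output, reference)
print(exact_score, partial_score)
```

A score closer to 1 indicates more n-gram overlap with the reference. In practice, scores of this kind are usually computed over a whole test set rather than sentence by sentence.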

 


  • Author

    Imane Bendou

  • Advisor

    Houda Bouamor