MULTILINGUAL & INCLUSIVE AI
In my research, I analyze social and linguistic diversity in human-human and human-machine interactions (e.g., open-domain dialogue systems, HRI) across media (e.g., face-to-face, digital environments) using both qualitative and computational methods. I have worked extensively on multilingualism and linguistic diversity around the world (e.g., Europe, Asia, South America, Africa) from a dynamic perspective that treats variation and change as major components of human communication, and I have authored several interdisciplinary research outputs (see below).
Research Interests: Multilingualism & Linguistic Diversity (e.g., our papers in ACL-IJCNLP’2021, LREC’2022, EMNLP’2023), Social Aspects of Human-Human and Human-Machine Communication, Diversity and Inclusion in AI, Language Contact & Change for Humans and Machine Intelligence, Human-Machine Interactions, Personalization & Recommendation Systems, Digital Humanities, Computational Social Science & NLP, Open-Domain Chatbots (e.g., our papers in SIGDIAL’2021, SIGDIAL’2023), Building & Maintaining High Quality & Diverse Data for Low Resource Languages (e.g., our paper in Language Resources and Evaluation Journal’2022), Representativeness, Bias and Ethical Issues for Data Curation & Quality (e.g., our paper at EMNLP’2023), Reproducibility of Research Artifacts (e.g., our paper in Interspeech’2023).
Updates
- Co-organizer and Program Chair (PC) for Digital Humanities Benelux-2025 (Amsterdam).
- EMNLP’24 Best Paper Award Evaluation Committee Member.
- Outstanding Senior Area Chair Award EMNLP’2023.
- Senior Area Chair for ACL’2023, EMNLP’2023, LREC-COLING’2024.
- Expert Evaluator for EC-Horizon (ERA).
- Committee Member & Assessor/Expert Evaluator for NWO (Netherlands Organisation for Scientific Research), FNRS (Le Fonds de la Recherche Scientifique), NSF (National Science Foundation), ERA (European Research Area).
- SIGLEX-MWE (ACL Special Interest Group on Lexicon/Natural Language Processing) Standing Committee Member and nominated officer (2023-2025).
- SemEval (International Workshop on Semantic Evaluation) Co-Chair & Co-Organizer (2023-2024).
- Steering Committee Member and Executive Committee Member for Digital Humanities-Benelux.
- PhD Thesis Evaluation Committee Member & Jury:
- Lothritz, C. (2023). “NLP de Luxe. Challenges for Natural Language Processing in Luxembourg”, PhD Dissertation, Interdisciplinary Center for Security, Reliability and Trust, University of Luxembourg.
- Adebara, I. (2024). Towards Afrocentric Natural Language Processing. University of British Columbia, Canada.
Keynote Speaker:
- 2024 SPELL’24 (Speech & Language Technology for Low Resource Languages) Conference/India.
- 2022 TALN’22 (Traitement Automatique des Langues Naturelles/Nationwide Annual Conference for Computational Linguistics in France), Avignon, France.
- 2017 Computer Mediated Communication and Corpora Conference, European Academy (EURAC), Bolzano, Italy.
- 2013 EMPIRIKOM (Scientific Network: Empirical Research on Internet Based Communication, DFG, German Research Foundation), Hamburg, Germany.
Publications
- Khan, A., Shipton, M., Anugraha, D., Duan, K., Hoang, P., Khiu, E., Doğruöz, A.S., En-Shiun, A.L. (2025). URIEL+: Enhancing Linguistic Inclusion and Usability in a Typological and Multilingual Knowledge Base. COLING’25, Abu Dhabi/UAE.
- Skianis, K., Doğruöz, A.S., Pavlopoulos, J. (2024). Leveraging LLMs for Translating and Classifying Mental Health Data. Multilingual Representation Workshop, Empirical Methods in Natural Language Processing (EMNLP’2024), USA.
- Ojha, A., Doğruöz, A.S., Madabushi, H.T., Martino, G., Rosenthal, S., and Rosá, A. 2024. Proceedings of the 18th International Workshop on Semantic Evaluation (SemEval-2024). Association for Computational Linguistics, NAACL’2024, Mexico City, Mexico (Co-organizer & Co-chair for SemEval2024).
- Bhatia, A., Bouma, G., Doğruöz, A.S., Evang, K., Garcia, M., Giouli, V., Han, L., Nivre, J., and Rademaker, A. 2024. Proceedings of the Joint Workshop on Multiword Expressions and Universal Dependencies (MWE-UD) @ LREC-COLING 2024. ELRA and ICCL, Torino, Italia (Co-organizer & Co-chair for MWE-UD’2024).
- Toossi, H., Huai, G., Liu, J., Khiu, E., Doğruöz, A.S., and Lee, E. 2024. A Reproducibility Study on Quantifying Language Similarity: The Impact of Missing Values in the URIEL Knowledge Base. In Proceedings of the 2024 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies (NAACL2024/SRW), pp. 233–241, Mexico City, Mexico. Association for Computational Linguistics.
- Adelani, D., Doğruöz, A.S., Coneglian, A., Ojha, A. (2024). Comparing LLM prompting with Cross-lingual transfer performance on Indigenous and Low-resource Brazilian Languages. 4th Workshop on NLP for Indigenous Languages of the Americas (AmericasNLP24) at NAACL’2024.
- Mihajlik, P., Mady, K., Kohari, A., Fruzsina, S., Kiss, G., Grazci, T.E., Doğruöz, A.S. (2024). Is Spoken Hungarian Low-resource?: A Quantitative Survey of Hungarian Speech Data Sets. LREC-COLING’2024.
- Jin, M., Preoțiuc-Pietro, D., Doğruöz, A.S., Aletras, N. (2024). Who is bragging more online? A large scale analysis of bragging in social media. LREC-COLING’2024.
- Khiu, E., Toossi, H., Liu, J., Li, J., Anugraha, D., Flores, J.A.P.F., Roman, L.A., Doğruöz, A.S., Lee, A. (2024). Predicting Machine Translation Performance on Low Resource Languages: The Role of Domain Similarity. EACL’2024, Malta.
- Doğruöz, A.S., Sitaram, S., Yong, Z. (2023). Representativeness as a Forgotten Lesson for Multilingual and Code-switched Data Collection and Preparation. EMNLP’2023 Findings. Singapore.
- Our AACL’23 Tutorial: Current Status of NLP in Southeast Asia with Insights from Multilingualism & Linguistic Diversity, Indonesia.
- Arvan, M., Doğruöz, A.S., Parde, N. (2023). Investigating Reproducibility at Interspeech Conferences: A Longitudinal and Comparative Perspective. INTERSPEECH’2023, Dublin, Ireland.
- Skantze, G. & Doğruöz, A.S. (2023). The Open-domain Paradox for Chatbots: Common Ground as the Basis for Human-like Dialogue. SIGDIAL’23.
- Ojha, A. Kr., Doğruöz, A.S., da San Martino, G., Madabushi, H.T., Kumar, R., Sartori, E. (2023). Proceedings of the 17th International Workshop on Semantic Evaluation (SemEval-2023), Annual Meeting of the Association for Computational Linguistics’23, Toronto/Canada (co-chair for SemEval and co-editor for the proceedings).
- Doğruöz, A.S. & Sitaram, S. (2022). Language Technologies for Low Resource Languages: Sociolinguistic and Multilingual Insights. Proceedings of SIGUL at LREC’22. European Language Resources Association.
- Çöltekin, Ç., Doğruöz, A. S., & Çetinoğlu, Ö. (2022). Resources for Turkish Natural Language Processing: A critical survey. Language Resources and Evaluation (LRE).
- Doğruöz, A.S. (2022). Issues about analyzing multilingual communication in immigrant contexts, In: Salah, A.A. and Korkmaz, E.E. and Bircan, T. (eds.), Data Science for Migration and Mobility, British Academy / Oxford University Press, London/UK.
- Jin, M., Preoțiuc-Pietro, D., Doğruöz, A.S., Aletras, N. (2022). Automatic Identification and Classification of Bragging in Social Media, Proceedings of the Annual Meeting of the Association for Computational Linguistics (ACL’2022), Dublin, Ireland.
- Doğruöz, A.S., Skantze, G. (2021). How open are the conversations with open-domain chatbots? A proposal for speech-event based evaluation. Proceedings of the 22nd Annual Meeting of Special Interest Group on Discourse and Dialogue (SIGDIAL2021), Singapore.
- Doğruöz, A.S., Sitaram, S., Bullock, B.E., Toribio, A.J. (2021). A Survey of Code-switching: Linguistic and Social Perspectives for Language Technologies, Proceedings of the Joint Conference of the 59th Annual Meeting of the Association for Computational Linguistics and the 11th International Joint Conference on Natural Language Processing (ACL-IJCNLP 2021), Bangkok, Thailand. (Nominated for the Best Paper Award).
- Parida, S., Panda, S., Dash, A.R., Villatoro-Tello, E., Doğruöz, A.S., Ortega-Mendoza, R.M., Hernandez, A., Sharma, Y., Motlicek, P. (2021). Open Machine Translation for Low Resource South American Languages (Americas NLP 2021 Shared Task Contribution). First Workshop on NLP for the Indigenous Languages of Americas. 2021 Annual Conference of the North American Chapter of the Association for Computational Linguistics (NAACL’21).
- Frey, C., Stemle, E.W., Doğruöz, A.S. (2019). Comparison of Automatic vs. Manual Language Identification in Multilingual Social Media Texts. In Building Computer Mediated Corpora for Sociolinguistic Analysis (Eds. C.R. Wigham & E.W. Stemle), CLRL (Cahiers du Laboratoire de Recherche sur le Langage) Series, University of Clermont Publications, France.
- Lison, P. & Doğruöz, A.S. (2018). Detecting Machine-Translated Subtitles in Large Parallel Corpora. Language Resources and Evaluation Conference (LREC’2018), Miyazaki, Japan.
- Devineni, P., Papalexakis, E., Koutra, D., Doğruöz, A.S., Faloutsos, M. (2017). One size does not fit all: Profiling personalized time-evolving user behaviors. International Conference on Advances in Social Networks Analysis and Mining (ASONAM’17), Sydney, Australia.
- Başkaya, O., Yıldız, E., Tuna, D., Eren, M.T., Doğruöz, A.S. (2017). Integrating meaning into Quality Evaluation of Multilingual Machine Translation. European Chapter of the Association for Computational Linguistics (EACL’17), Valencia, Spain.
- Nguyen, D., Doğruöz, A.S., Rose, C., de Jong, F. (2016). Computational Sociolinguistics: A Survey. Computational Linguistics, 42 (3), 491-525.
- Bamman, D., Doğruöz, A.S., Eisenstein, J., Hovy, D., Jurgens, D., O’Connor, B., Oh, A., Tsur, O., Volkova, S. (2016). First Workshop on Natural Language Processing and Computational Social Science, EMNLP, Austin, USA.
- Papalexakis, E., Doğruöz, A.S. (2015). Understanding Multilingual Social Networks in Online Immigrant Communities. WWW’15, MWA Workshop, Florence, Italy.
- Doğruöz, A.S. & Nakov, P. (2014). Predicting Dialect Variation in Immigrant Contexts Using Light Verb Constructions. Empirical Methods in Natural Language Processing Conference (EMNLP), Qatar.
- Papalexakis, E., Nguyen, D., Doğruöz, A.S. (2014). Predicting code-switching in Multilingual Communication for Immigrant Communities. Empirical Methods in Natural Language Processing Conference (EMNLP), Qatar.
- Nguyen, D., Trieschnigg, D., Doğruöz, A.S., Gravel, R., Theune, M., Meder, T., de Jong, F. (2014). Why Gender and Age Prediction from Tweets is Hard: Lessons from a Crowdsourcing Experiment. International Conference on Computational Linguistics (COLING), Dublin, Ireland.
- Pellerini, M., Doğruöz, A.S., Gadde, P., Adamson, D., Rose, C. (2014). Modelling the Use of Graffiti Style Features to Signal Social Relations within Multi-Domain Learning Paradigm. European Chapter of the Association for Computational Linguistics (EACL), Göteborg, Sweden.
- Nguyen, D. & Doğruöz, A.S. (2013). Word Level Language Identification in Online Multilingual Communication. Empirical Methods in Natural Language Processing (EMNLP), Seattle, USA.
Multilingualism, Language Contact & Change, Turkish
- Doğruöz, A.S. (2014). Borrowability of Subject Pronoun Constructions in Turkish-Dutch Contact. Constructions & Frames, 6, 143-69.
- Doğruöz, A.S. & Gries, S. Th. (2012). Spread of on-going change in an immigrant speech community: Turkish in the Netherlands. Review of Cognitive Linguistics, 10, 401-426.
- Doğruöz, A.S. (2012). Analyzing Language Change in Syntax and Multi-word Expressions: A case study of Turkish Spoken in The Netherlands. Proceedings of First Workshop on Language Resources and Technologies for Turkic Languages, 12th International Conference on Language Resources and Evaluation (LREC’12).
- Backus, A., Doğruöz, A.S. & Heine, B. (2011). Salient Stages in Contact-induced Grammatical Change: Evidence from Synchronic vs. Diachronic Contact Situations. Language Sciences, 35 (5), 738-752.
- Doğruöz, A.S. & Backus, A. (2010). Turkish in the Netherlands: Development of a new variety? In M. Norde, B. Jonge & C. Hasselblat (Eds.) Language Contact: New Perspectives, 51-82, John Benjamins: Amsterdam.
- Doğruöz, A.S. & Backus, A. (2009). Innovative constructions in Dutch Turkish: An assessment of on-going contact-induced change, Bilingualism: Language and Cognition, 12 (1), 41-63. (selected & promoted as “Editor’s Pick” on Cambridge Journals Online based on originality and impact).
- Doğruöz, A.S. & Backus, A. (2007). Postverbal elements in immigrant Turkish: Evidence of change? International Journal of Bilingualism, 11 (2), 185-220.
- Doğruöz, A.S. (2006). Comparison of communication strategies used by monolingual and bilingual EFL speakers in Turkey, In: A. Kavvadia, M. Joannopoulou & A. Tsagalidis (eds.) New Directions in Applied Linguistics. Proceedings from the 13th International Conference of the Greek Applied Linguistics Association, Thessaloniki: University Studio Press, 305-315.
- Doğruöz, A.S. (2005). Is there something wrong with Turkish in the Netherlands? A case study on unconventional constructions. Toegepaste Taalwetenschap in Artikelen, 74, 189-201.
Multi-Media & Recommendation Systems
- Lehinevych, T., Kokkinis-Ntrenis, N., Siantikos, G., Doğruöz, A.S., Giannakopoulos, T. & Konstantopoulos, S. (2014). Discovering similarities for content-based recommendation and browsing in multimedia collections. Signal-Image Technology and Internet-Based Systems (SITIS’14).
Multilingual Machine Translation
- Lison, P. & Doğruöz, A.S. (2018). Detecting Machine-Translated Subtitles in Large Parallel Corpora. Language Resources and Evaluation Conference, Miyazaki, Japan.
- Başkaya, O., Yıldız, E., Tuna, D., Eren, M.T., Doğruöz, A.S. (2017). Integrating meaning into Quality Evaluation of Multilingual Machine Translation. European Chapter of the Association for Computational Linguistics (EACL), Valencia, Spain.
Social Robotics
- Konstantopoulos, S., Dagioglou, M., Doğruöz, A.S., Kirstein, F. (2014). Human-Robot Interaction Strategies for Unobtrusively Acquiring Health Related Data. MOBIHEALTH, International Conference on Wireless Mobile Communication in Health Care, Athens, Greece.