Mowjaz Multi-Topic Labelling Task

Mowjaz is an Arabic mobile application that aggregates topical content (news, sport, entertainment and other topics) from top publishers that users can follow. Its search engine and recommendation system are built on top of NLP/NLU machine learning APIs, which distinguishes it from other Arabic content applications. The main focus is on giving users the best experience and content that matches their interests. Mowjaz is a subsidiary of, the world's biggest Arabic website in terms of number of visitors.

One of Mowjaz's top AI-powered models is Topic Multi-Labelling, which is the focus of this shared task. The model classifies articles by topic. Because a single article may cover several topics, the model predicts every topic present in its content rather than just one. Mowjaz defines ten topic categories, and an article can be assigned to as many of them as it covers. As a result, one news article is shown under every topic it belongs to, which helps users find the news most relevant to their interests.

Mowjaz Topics:

The 10 topics that are predefined at Mowjaz are the following:

1- Arts & Celebrities فن ومشاهير

2- Economy اقتصاد

3- Kitchen مطبخ

4- Technology تكنولوجيا

5- Islam & Religions إسلام و أديان

6- News أخبار

7- Sports رياضة

8- Health صحة

9- Weather طقس

10- Others منوعات أخرى
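Since an article may carry any subset of the ten topics, its labels are commonly represented as a multi-hot vector with one position per topic. The following is a minimal sketch of that representation, not the task's official data format: the English topic names are taken from the list above, while the 0-based ordering, the helper names `encode` and `decode`, and the 0.5 threshold are assumptions made for illustration.

```python
# Topic names as listed above; the order and 0-based indexing are assumptions.
TOPICS = [
    "Arts & Celebrities",
    "Economy",
    "Kitchen",
    "Technology",
    "Islam & Religions",
    "News",
    "Sports",
    "Health",
    "Weather",
    "Others",
]
TOPIC_INDEX = {name: i for i, name in enumerate(TOPICS)}


def encode(labels):
    """Turn a collection of topic names into a 10-dimensional multi-hot vector."""
    vector = [0] * len(TOPICS)
    for name in labels:
        vector[TOPIC_INDEX[name]] = 1  # raises KeyError on an unknown topic
    return vector


def decode(scores, threshold=0.5):
    """Select every topic whose score reaches the threshold (hypothetical value)."""
    return [name for name, score in zip(TOPICS, scores) if score >= threshold]


# An article about an athlete's injury could carry two topics at once.
print(encode(["Sports", "Health"]))  # [0, 0, 0, 0, 0, 0, 1, 1, 0, 0]
```

A system that outputs one score per topic can then call `decode` to obtain its predicted topic set; in practice the threshold would be tuned on held-out data.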

Participating systems are expected to select one or more of the above topics for each given article. They are evaluated on both effectiveness and efficiency; more details are given in the Dataset & Evaluation section. This task is hosted by ICICS 2021. Participants are invited to submit a Working Notes (System Description) paper following the ICICS 2021 guidelines.

Related Works

Several papers on multi-label text classification exist. Examples include:

Herrera, F., Charte, F., Rivera, A. J., & Del Jesus, M. J. (2016). Multilabel classification. In Multilabel Classification (pp. 17-31). Springer, Cham.

Al-Natsheh, H.T., Martinet, L., Muhlenbach, F., Rico, F., & Zighed, D.A. (2018). Metadata Enrichment of Multi-disciplinary Digital Library: A Semantic-Based Approach. Digital Libraries for Open Knowledge, (pp.32-43). Springer International Publishing.

Read, J., Pfahringer, B., Holmes, G., & Frank, E. (2011). Classifier chains for multi-label classification. Machine Learning, 85(3), 333.

Read, J., Reutemann, P., Pfahringer, B., & Holmes, G. (2016). Meka: A Multi-label/Multi-target Extension to Weka. Journal of Machine Learning Research, 17, 1-5.

Tsoumakas, G., Spyromitros-Xioufis, E., Vilcek, J., & Vlahavas, I. (2011). Mulan: A Java library for multi-label learning. Journal of Machine Learning Research, 12, 2411-2414.

Madjarov, G., Kocev, D., Gjorgjevikj, D., & Džeroski, S. (2012). An extensive experimental comparison of methods for multi-label learning. Pattern Recognition, 45(9), 3084-3104.

Nam, J., Kim, J., Mencía, E. L., Gurevych, I., & Fürnkranz, J. (2014, September). Large-scale multi-label text classification: revisiting neural networks. In Joint European Conference on Machine Learning and Knowledge Discovery in Databases (pp. 437-452). Springer, Berlin, Heidelberg.

Chen, G., Ye, D., Xing, Z., Chen, J., & Cambria, E. (2017, May). Ensemble application of convolutional and recurrent neural networks for multi-label text categorization. In 2017 International Joint Conference on Neural Networks (IJCNN) (pp. 2377-2383). IEEE.

Burkhardt, S., & Kramer, S. (2019). A survey of multi-label topic models. ACM SIGKDD Explorations Newsletter, 21(2), 61-79.

However, most of these papers address English. Work on Arabic is rather limited; recent examples include the following:

Omar, A., Mahmoud, T. M., Abd-El-Hafeez, T., & Mahfouz, A. (2021). Multi-label Arabic text classification in Online Social Networks. Information Systems, 101785.

El-Alami, F., El Alaoui, S., & En Nahnahi, N. (2021). Contextual semantic embeddings based on fine-tuned AraBERT model for Arabic text multi-class categorization. Journal of King Saud University - Computer and Information Sciences.

Elnagar, A., Al-Debsi, R., & Einea, O. (2020). Arabic text classification using deep learning models. Information Processing & Management, 57(1), 102121.

Aljedani, N., Alotaibi, R., & Taileb, M. (2020). HMATC: Hierarchical multi-label Arabic text classification model using machine learning. Egyptian Informatics Journal.

Bdeir, A. M., & Ibrahim, F. (2020, May). A Framework for Arabic Tweets Multi-label Classification Using Word Embedding and Neural Networks Algorithms. In Proceedings of the 2020 2nd International Conference on Big Data Engineering (pp. 105-112).

Al-Salemi, B., Ayob, M., Kendall, G., & Noah, S. A. M. (2019). Multi-label Arabic text categorization: A benchmark and baseline comparison of multi-label learning algorithms. Information Processing & Management, 56(1), 212-227.

Alzu'bi, S., Badarneh, O., Hawashin, B., Al-Ayyoub, M., Alhindawi, N., & Jararweh, Y. (2019, October). Multi-label emotion classification for Arabic tweets. In 2019 Sixth International Conference on Social Networks Analysis, Management and Security (SNAMS) (pp. 499-504). IEEE.

Hmeidi, I., Al-Ayyoub, M., Mahyoub, N. A., & Shehab, M. A. (2016). A lexicon based approach for classifying Arabic multi-labeled text. International Journal of Web Information Systems.

Shehab, M. A., Badarneh, O., Al-Ayyoub, M., & Jararweh, Y. (2016, July). A supervised approach for multi-label classification of Arabic news articles. In 2016 7th International Conference on Computer Science and Information Technology (CSIT) (pp. 1-6). IEEE.