Enhancing Abstractive Multi-Document Summarization with Bert2Bert Model for Indonesian Language

Authors

  • Aldi Fahluzi Muharam Department of Informatics, Faculty of Science and Technology, UIN Sunan Gunung Djati, Bandung
  • Yana Aditia Gerhana Department of Informatics, Faculty of Science and Technology, UIN Sunan Gunung Djati, Bandung, Indonesia and Information and Communication Technology, Asia e University, Selangor
  • Dian Sa'adillah Maylawati Department of Informatics, UIN Sunan Gunung Djati Bandung
  • Muhammad Ali Ramdhani Department of Informatics, Faculty of Science and Technology, UIN Sunan Gunung Djati, Bandung
  • Titik Khawa Abdul Rahman Information and Communication Technology, Asia e University, Selangor

DOI:

https://doi.org/10.14421/jiska.2025.10.1.110-121

Keywords:

Bert2Bert, Abstractive, Multi-document, Summarization, Transformer

Abstract

This study investigates the effectiveness of the proposed Bert2Bert and Bert2Bert+Xtreme models in improving abstractive multi-document summarization for the Indonesian language. Both proposed models are built on the transformer architecture. Evaluation on the Liputan6 dataset using ROUGE-1, ROUGE-2, ROUGE-L, and BERTScore shows that the proposed models achieve slight improvements over models from previous research, with Bert2Bert outperforming Bert2Bert+Xtreme. Despite the challenges posed by the limited availability of reference summaries for Indonesian documents, content-based analysis using readability metrics, including the Flesch-Kincaid Grade Level (FKGL), the Gunning Fog Index (GFI), and the Indonesian readability instrument of Pranowo, revealed that the summaries generated by Bert2Bert and Bert2Bert+Xtreme are at a moderate readability level, making them suitable for adult readers and consistent with the target audience of the news portal.
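As an illustration of the evaluation metrics mentioned above, the following minimal Python sketch computes a unigram-overlap ROUGE-1 F1 score and the Gunning Fog Index. It is a simplified approximation for illustration only (whitespace tokenization, no stemming, and the complex-word count is supplied by the caller rather than derived from syllable counting); it is not the implementation used in the study.

```python
from collections import Counter

def rouge1_f1(candidate: str, reference: str) -> float:
    """Unigram-overlap ROUGE-1 F1 between a candidate and a reference summary."""
    cand = Counter(candidate.lower().split())
    ref = Counter(reference.lower().split())
    overlap = sum((cand & ref).values())  # clipped unigram matches
    if overlap == 0:
        return 0.0
    precision = overlap / sum(cand.values())
    recall = overlap / sum(ref.values())
    return 2 * precision * recall / (precision + recall)

def gunning_fog(words: list, num_sentences: int, complex_words: int) -> float:
    """Gunning Fog Index: 0.4 * (average sentence length + % complex words)."""
    n = len(words)
    return 0.4 * (n / num_sentences + 100 * complex_words / n)
```

For example, a 100-word summary spread over 5 sentences with 10 complex (three-or-more-syllable) words yields a GFI of 0.4 * (20 + 10) = 12, roughly the "moderate" readability band discussed in the abstract.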

References

Abka, A. F., Azizah, K., & Jatmiko, W. (2022). Transformer-based Cross-Lingual Summarization using Multilingual Word Embeddings for English - Bahasa Indonesia. International Journal of Advanced Computer Science and Applications, 13(12). https://doi.org/10.14569/IJACSA.2022.0131276

Alquliti, W. H., & Binti Abdul Ghani, N. (2019). Convolutional Neural Network based for Automatic Text Summarization. International Journal of Advanced Computer Science and Applications, 10(4), 200–211. https://doi.org/10.14569/IJACSA.2019.0100424

Biddinika, M. K., Lestari, R. P., Indrawan, B., Yoshikawa, K., Tokimatsu, K., & Takahashi, F. (2016). Measuring the readability of Indonesian biomass websites: The ease of understanding biomass energy information on websites in the Indonesian language. Renewable and Sustainable Energy Reviews, 59, 1349–1357. https://doi.org/10.1016/j.rser.2016.01.078

Bing, L., Li, P., Liao, Y., Lam, W., Guo, W., & Passonneau, R. J. (2015). Abstractive Multi-Document Summarization via Phrase Selection and Merging. Proceedings of the 53rd Annual Meeting of the Association for Computational Linguistics and the 7th International Joint Conference on Natural Language Processing, 1587–1597. https://doi.org/10.3115/v1/P15-1153

Dangol, R., Adhikari, P., Dahal, P., & Sharma, H. (2023). Short Updates-Machine Learning Based News Summarizer. Journal of Advanced College of Engineering and Management, 8(2), 15–25.

Devi, K. U. S., & Suadaa, L. H. (2022). Extractive Text Summarization for Snippet Generation on Indonesian Search Engine using Sentence Transformers. 2022 International Conference on Data Science and Its Applications (ICoDSA), 181–186.

Devianti, R. S., & Khodra, M. L. (2019). Abstractive Summarization using Genetic Semantic Graph for Indonesian News Articles. International Conference of Advanced Informatics: Concepts, Theory and Applications (ICAICTA).

Devlin, J., Chang, M.-W., Lee, K., & Toutanova, K. (2018). BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding. Eprint ArXiv:1810.04805. https://doi.org/10.48550/arXiv.1810.04805

Dewi, K. E., & Widiastuti, N. I. (2022). The Design of Automatic Summarization of Indonesian Texts Using a Hybrid Approach. Jurnal Teknologi Informasi Dan Pendidikan. https://api.semanticscholar.org/CorpusID:253313594

Fadziah, Y. N. (2017). Penerapan Algoritma Enchanced Confix Stripping dalam Pengukuran Keterbacaan Teks Menggunakan Gunning Fog Index. Universitas Pendidikan Indonesia.

Goldstein, J., Mittal, V., Carbonell, J., & Kantrowitz, M. (2000). Multi-Document Summarization By Sentence Extraction. NAACL-ANLP 2000 Workshop on Automatic Summarization, 40–48.

Gunawan, D., Harahap, S. H., & Rahmat, R. F. (2019). Multi-document summarization by using textrank and maximal marginal relevance for text in Bahasa Indonesia. 2019 International Conference on ICT for Smart Society (ICISS), 7, 1–5.

Gunawan, Y. H. B., & Khodra, M. L. (2020). Multi-document Summarization using Semantic Role Labeling and Semantic Graph for Indonesian News Article. 2020 7th International Conference on Advance Informatics: Concepts, Theory and Applications (ICAICTA), 1–6.

Jin, H., & Wan, X. (2020). Abstractive Multi-Document Summarization via Joint Learning with Single-Document Summarization. Findings of the Association for Computational Linguistics: EMNLP 2020, 2545–2554. https://doi.org/10.18653/v1/2020.findings-emnlp.231

Koto, F., Lau, J. H., & Baldwin, T. (2020). Liputan6: A Large-scale Indonesian Dataset for Text Summarization. Proceedings of the 1st Conference of the Asia-Pacific Chapter of the Association for Computational Linguistics and the 10th International Joint Conference on Natural Language Processing, 598–608. https://doi.org/10.48550/arXiv.2011.00679

Koto, F., Rahimi, A., Lau, J. H., & Baldwin, T. (2020). IndoLEM and IndoBERT: A Benchmark Dataset and Pre-trained Language Model for Indonesian NLP. The 28th International Conference on Computational Linguistics, 757–770. https://doi.org/10.18653/v1/2020.coling-main.66

Kurniawan, K., & Louvan, S. (2018). IndoSum: A New Benchmark Dataset for Indonesian Text Summarization. Proceedings of the 2018 International Conference on Asian Language Processing, IALP 2018, 215–220. https://doi.org/10.1109/IALP.2018.8629109

Kuyate, S., Jadhav, O., & Jadhav, P. (2023). AI Text Summarization System. International Journal for Research in Applied Science and Engineering Technology, 11(5), 916–919. https://doi.org/10.22214/ijraset.2023.51481

Laksana, M. D. B., Karyawati, E. A., Putri, L. A. A. R., Santiyasa, I. W., ER, N. A. S., & Kadnyanan, I. G. A. G. A. (2022). Text Summarization terhadap Berita Bahasa Indonesia menggunakan Dual Encoding. Jurnal Elektronik Ilmu Komputer Udayana, 11(2).

Lamsiyah, S., Mahdaouy, A. El, Ouatik, S. E. A., & Espinasse, B. (2023). Unsupervised extractive multi-document summarization method based on transfer learning from BERT multi-task fine-tuning. Journal of Information Science, 49(1), 164–182. https://doi.org/10.1177/0165551521990616

Li, W., & Zhuge, H. (2021). Abstractive multi-document summarization based on semantic link network. IEEE Transactions on Knowledge and Data Engineering, 33(1), 43–54. https://doi.org/10.1109/TKDE.2019.2922957

Lin, C.-Y. (2004). ROUGE: A Package for Automatic Evaluation of Summaries. In Proceedings of the Workshop on Text Summarization Branches Out (WAS 2004).

Lucky, H., & Suhartono, D. (2022). Investigation Of Pre-Trained Bidirectional Encoder Representations From Transformers Checkpoints For Indonesian Abstractive Text Summarization. Journal of Information and Communication Technology, 21(1), 71–94. https://doi.org/10.32890/jict2022.21.1.4

Maylawati, D. S., Kumar, Y. J., Kasmin, F. B., & Raza, B. (2019). Sequential pattern mining and deep learning to enhance readability of Indonesian text summarization. International Journal of Advanced Trends in Computer Science and Engineering. https://doi.org/10.30534/ijatcse/2019/78862019

Maylawati, D. S., Kumar, Y. J., Kasmin, F., & Ramdhani, M. A. (2024). Deep sequential pattern mining for readability enhancement of Indonesian summarization. International Journal of Electrical and Computer Engineering (IJECE), 14(1), 782. https://doi.org/10.11591/ijece.v14i1.pp782-795

Mursyadah, U. (2021). Tingkat Keterbacaan Buku Sekolah Elektronik (BSE) Pelajaran Biologi Kelas X SMA/MA. TEACHING: Jurnal Inovasi Keguruan dan Ilmu Pendidikan, 1(4), 298–304.

Pranowo, D. D. (2011). Instrument of Indonesian Texts Readability.

Readability Formulas. (2020). The Flesch Grade Level Readability Formula.

Rothe, S., Narayan, S., & Severyn, A. (2019). Leveraging Pre-trained Checkpoints for Sequence Generation Tasks. https://doi.org/10.1162/tacl_a_00313

Sari, M. P., & Herri, H. (2020). Analisa Konten serta Tingkat Keterbacaan Pernyataan Misi dan Pengaruhnya terhadap Kinerja Perbankan Indonesia. Menara Ilmu: Jurnal Penelitian Dan Kajian Ilmiah, 14(1).

Severina, V., & Khodra, M. L. (2019, September 1). Multidocument Abstractive Summarization using Abstract Meaning Representation for Indonesian Language. Proceedings - 2019 International Conference on Advanced Informatics: Concepts, Theory, and Applications, ICAICTA 2019. https://doi.org/10.1109/ICAICTA.2019.8904449

Shen, C., Cheng, L., Nguyen, X.-P., You, Y., & Bing, L. (2023, May 15). A Hierarchical Encoding-Decoding Scheme for Abstractive Multi-document Summarization. Eprint ArXiv:2305.08503. https://doi.org/10.48550/arXiv.2305.08503

Shinde, K., Roy, T., & Scispace, G. (2022). An Extractive-Abstractive Approach for Multi-document Summarization of Scientific Articles for Literature Review. Proceedings of the Third Workshop on Scholarly Document Processing, 204–209. https://github.com/allenai/mslr-shared-task

Sing Goh, O., Che Fung, C., Depickere, A., & Wai Wong, K. (2007). Using Gunning-Fog Index to Assess Instant Messages Readability from ECAs. Third International Conference on Natural Computation (ICNC 2007), 5, 480–486. https://doi.org/10.1109/ICNC.2007.800

Solnyshkina, M. I., Zamaletdinov, R. R., Gorodetskaya, L. A., & Gabitov, A. I. (2017). Evaluating Text Complexity and Flesch-Kincaid Grade Level. Journal of Social Studies Education Research, 8(3), 238–248. www.jsser.org

Sugiri, Eko Prasojo, R., & Alfa Krisnadhi, A. (2022). Controllable Abstractive Summarization Using Multilingual Pretrained Language Model. 2022 10th International Conference on Information and Communication Technology (ICoICT), 228–233. https://doi.org/10.1109/ICoICT55009.2022.9914846

Sutskever, I., Vinyals, O., & Le, Q. V. (2014, September 10). Sequence to Sequence Learning with Neural Networks. Proceedings of the 27th International Conference on Neural Information Processing Systems. https://doi.org/10.48550/arXiv.1409.3215

Świeczkowski, D., & Kułacz, S. (2021). The use of the Gunning Fog Index to evaluate the readability of Polish and English drug leaflets in the context of health literacy challenges in medical linguistics: An exploratory study. Cardiology Journal, 28(4), 627–631. Via Medica. https://doi.org/10.5603/CJ.a2020.0142

Readability Formulas. (2020). The Gunning’s Fog Index (or FOG) Readability Formula.

Utami, S. D., Dewi, I. N., & Efendi, I. (2021). Tingkat Keterbacaan Bahan Ajar Flexible Learning Berbasis Kolaboratif Saintifik. Bioscientist: Jurnal Ilmiah Biologi, 9(2), 577–587.

Vaswani, A., Shazeer, N., Parmar, N., Uszkoreit, J., Jones, L., Gomez, A. N., Kaiser, L., & Polosukhin, I. (2017). Attention Is All You Need. http://arxiv.org/abs/1706.03762

Verma, P., & Om, H. (2019). A novel approach for text summarization using optimal combination of sentence scoring methods. Sādhanā, 44, 1–15.

Verma, P., Pal, S., & Om, H. (2019). A comparative analysis on Hindi and English extractive text summarization. ACM Transactions on Asian and Low-Resource Language Information Processing (TALLIP), 18(3), 1–39.

Widjanarko, A., Kusumaningrum, R., & Surarso, B. (2018). Multi document summarization for the Indonesian language based on latent dirichlet allocation and significance sentence. 2018 International Conference on Information and Communications Technology (ICOIACT), 520–524.

Wijayanti, R., Khodra, M. L., & Widyantoro, D. H. (2021). Indonesian Abstractive Summarization using Pre-Trained Model. 3rd 2021 East Indonesia Conference on Computer and Information Technology, EIConCIT 2021, 79–84. https://doi.org/10.1109/EIConCIT50028.2021.9431880

Zhang, J., Tan, J., & Wan, X. (2018a). Adapting Neural Single-Document Summarization Model for Abstractive Multi-Document Summarization: A Pilot Study. Proceedings of the 11th International Conference on Natural Language Generation, 381–390. https://doi.org/10.18653/v1/W18-6545

Zhang, J., Tan, J., & Wan, X. (2018b, April 24). Towards a Neural Network Approach to Abstractive Multi-Document Summarization. Eprint ArXiv:1804.09010. https://doi.org/10.48550/arXiv.1804.09010

Zhang, T., Kishore, V., Wu, F., Weinberger, K. Q., & Artzi, Y. (2019, April 21). BERTScore: Evaluating Text Generation with BERT. ICLR2020. http://arxiv.org/abs/1904.09675

Published

2025-01-31

How to Cite

Muharam, A. F., Gerhana, Y. A., Maylawati, D. S., Ramdhani, M. A., & Rahman, T. K. A. (2025). Enhancing Abstractive Multi-Document Summarization with Bert2Bert Model for Indonesian Language. JISKA (Jurnal Informatika Sunan Kalijaga), 10(1), 110–121. https://doi.org/10.14421/jiska.2025.10.1.110-121