Enhancing Abstractive Multi-Document Summarization with Bert2Bert Model for Indonesian Language

Authors

  • Aldi Fahluzi Muharam UIN Sunan Gunung Djati
  • Yana Aditia Gerhana UIN Sunan Gunung Djati
  • Dian Sa'adillah Maylawati UIN Sunan Gunung Djati
  • Muhammad Ali Ramdhani UIN Sunan Gunung Djati
  • Titik Khawa Abdul Rahman Asia e University

DOI:

https://doi.org/10.14421/jiska.2025.10.1.110-121

Keywords:

Bert2Bert, Abstractive, Multi-document, Summarization, Transformer

Abstract

This study investigates the effectiveness of the proposed Bert2Bert and Bert2Bert+Xtreme models in improving abstractive multi-document summarization for the Indonesian language. Both proposed models are built on the Transformer architecture. The research uses the Liputan6 dataset, which contains ten years of news articles with reference summaries (October 2000 to October 2010) and is widely used in automatic text summarization research. Evaluation with ROUGE-1, ROUGE-2, ROUGE-L, and BERTScore shows that the proposed models achieve a slight improvement over models from previous research, with Bert2Bert outperforming Bert2Bert+Xtreme. Despite the challenges posed by the limited availability of reference summaries for Indonesian documents, content-based analysis using readability metrics, namely the Flesch-Kincaid Grade Level (FKGL), the Gunning Fog Index (GFI), and the Dwiyanto Djoko Pranowo formula, revealed that the summaries produced by Bert2Bert and Bert2Bert+Xtreme are at a moderate readability level, meaning they are suitable for adult readers and align with the news portal's target audience.
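As a rough illustration (not the authors' exact implementation), a Bert2Bert model of this kind can be warm-started in Hugging Face Transformers by tying an encoder-decoder to the same pre-trained BERT checkpoint and scoring its output with ROUGE and BERTScore; the checkpoint name, placeholder texts, and generation settings below are assumptions for the sketch, and in practice the model would first be fine-tuned on Liputan6.

```python
# Minimal sketch (assumed setup, not the paper's exact pipeline):
# warm-start a Bert2Bert encoder-decoder from an Indonesian BERT
# checkpoint and evaluate one generated summary with ROUGE and BERTScore.
from transformers import BertTokenizer, EncoderDecoderModel
import evaluate

checkpoint = "cahya/bert-base-indonesian-1.5G"  # assumed Indonesian BERT checkpoint
tokenizer = BertTokenizer.from_pretrained(checkpoint)

# Initialize both encoder and decoder from the same BERT weights (Bert2Bert).
model = EncoderDecoderModel.from_encoder_decoder_pretrained(checkpoint, checkpoint)
model.config.decoder_start_token_id = tokenizer.cls_token_id
model.config.pad_token_id = tokenizer.pad_token_id
model.config.eos_token_id = tokenizer.sep_token_id

article = "Contoh artikel berita berbahasa Indonesia ..."  # placeholder source document
inputs = tokenizer(article, truncation=True, max_length=512, return_tensors="pt")

# Beam-search generation; hyperparameters here are illustrative only.
summary_ids = model.generate(inputs.input_ids, max_length=128, num_beams=4)
summary = tokenizer.decode(summary_ids[0], skip_special_tokens=True)

reference = "Ringkasan referensi dari dataset ..."  # placeholder gold summary
rouge = evaluate.load("rouge")
bertscore = evaluate.load("bertscore")
print(rouge.compute(predictions=[summary], references=[reference]))
print(bertscore.compute(predictions=[summary], references=[reference], lang="id"))
```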

References

Abka, A. F., Azizah, K., & Jatmiko, W. (2022). Transformer-based Cross-Lingual Summarization using Multilingual Word Embeddings for English - Bahasa Indonesia. International Journal of Advanced Computer Science and Applications, 13(12). https://doi.org/10.14569/IJACSA.2022.0131276

Alquliti, W. H., & Binti, N. (2019). Convolutional Neural Network based for Automatic Text Summarization. International Journal of Advanced Computer Science and Applications, 10(4), 200–211. https://doi.org/10.14569/IJACSA.2019.0100424

Biddinika, M. K., Lestari, R. P., Indrawan, B., Yoshikawa, K., Tokimatsu, K., & Takahashi, F. (2016). Measuring the readability of Indonesian biomass websites: The ease of understanding biomass energy information on websites in the Indonesian language. Renewable and Sustainable Energy Reviews, 59, 1349–1357. https://doi.org/10.1016/j.rser.2016.01.078

Bing, L., Li, P., Liao, Y., Lam, W., Guo, W., & Passonneau, R. (2015). Abstractive Multi-Document Summarization via Phrase Selection and Merging. Proceedings of the 53rd Annual Meeting of the Association for Computational Linguistics and the 7th International Joint Conference on Natural Language Processing (Volume 1: Long Papers), 1587–1597. https://doi.org/10.3115/v1/P15-1153

Dangol, R., Adhikari, P., Dahal, P., & Sharma, H. (2023). Short Updates- Machine Learning Based News Summarizer. Journal of Advanced College of Engineering and Management, 8(2), 15–25. https://doi.org/10.3126/jacem.v8i2.55939

Devi, K. U. S., & Suadaa, L. H. (2022). Extractive Text Summarization for Snippet Generation on Indonesian Search Engine using Sentence Transformers. 2022 International Conference on Data Science and Its Applications (ICoDSA), 181–186. https://doi.org/10.1109/ICoDSA55874.2022.9862886

Devianti, R. S., & Khodra, M. L. (2019). Abstractive Summarization using Genetic Semantic Graph for Indonesian News Articles. Proceedings - 2019 International Conference on Advanced Informatics: Concepts, Theory, and Applications, ICAICTA 2019. https://doi.org/10.1109/ICAICTA.2019.8904361

Devlin, J., Chang, M.-W., Lee, K., & Toutanova, K. (2018). BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding. https://doi.org/10.48550/arXiv.1810.04805

Dewi, K. E., & Widiastuti, N. I. (2022). The Design of Automatic Summarization of Indonesian Texts Using a Hybrid Approach. Jurnal Teknologi Informasi Dan Pendidikan, 15(1), 37–43. https://doi.org/10.24036/jtip.v15i1.451

Fadziah, Y. N., Rasim, R., & Fitrajaya, E. (2018). Penerapan Algoritma Enchanced Confix Stripping dalam Pengukuran Keterbacaan Teks Menggunakan Gunning Fog Index. Jurnal Aplikasi Dan Teori Ilmu Komputer, 1(1), 14–22. https://doi.org/10.17509/jatikom.v1i1.25143

Goh, O. S., Fung, C. C., Depickere, A., & Wong, K. W. (2007). Using Gunning-Fog Index to Assess Instant Messages Readability from ECAs. Third International Conference on Natural Computation (ICNC 2007) Vol V, 480–486. https://doi.org/10.1109/ICNC.2007.800

Goldstein, J., Mittal, V., Carbonell, J., & Kantrowitz, M. (2000). Multi-document summarization by sentence extraction. NAACL-ANLP 2000 Workshop on Automatic Summarization -, 4, 40–48. https://doi.org/10.3115/1117575.1117580

Gunawan, D., Harahap, S. H., & Fadillah Rahmat, R. (2019). Multi-document Summarization by using TextRank and Maximal Marginal Relevance for Text in Bahasa Indonesia. 2019 International Conference on ICT for Smart Society (ICISS), 7, 1–5. https://doi.org/10.1109/ICISS48059.2019.8969785

Gunawan, Y. H. B., & Khodra, M. L. (2021). Multi-document Summarization using Semantic Role Labeling and Semantic Graph for Indonesian News Article. https://doi.org/10.48550/arXiv.2103.03736

Jin, H., & Wan, X. (2020). Abstractive Multi-Document Summarization via Joint Learning with Single-Document Summarization. Findings of the Association for Computational Linguistics: EMNLP 2020, 2020, 2545–2554. https://doi.org/10.18653/v1/2020.findings-emnlp.231

Koto, F., Lau, J. H., & Baldwin, T. (2020). Liputan6: A Large-scale Indonesian Dataset for Text Summarization. Proceedings of the 1st Conference of the Asia-Pacific Chapter of the Association for Computational Linguistics and the 10th International Joint Conference on Natural Language Processing, 598–608. https://doi.org/10.48550/arXiv.2011.00679

Koto, F., Rahimi, A., Lau, J. H., & Baldwin, T. (2020). IndoLEM and IndoBERT: A Benchmark Dataset and Pre-trained Language Model for Indonesian NLP. Proceedings of the 28th International Conference on Computational Linguistics, 757–770. https://doi.org/10.18653/v1/2020.coling-main.66

Kurniawan, K., & Louvan, S. (2018). Indosum: A New Benchmark Dataset for Indonesian Text Summarization. 2018 International Conference on Asian Language Processing (IALP), 215–220. https://doi.org/10.1109/IALP.2018.8629109

Kuyate, S., Jadhav, O., & Jadhav, P. (2023). AI Text Summarization System. International Journal for Research in Applied Science and Engineering Technology, 11(5), 916–919. https://doi.org/10.22214/ijraset.2023.51481

Laksana, M. D. B., Karyawati, A. E., Putri, L. A. A. R., Santiyasa, I. W., Sanjaya ER, N. A., & Kadnyanan, I. G. A. G. A. (2022). Text Summarization terhadap Berita Bahasa Indonesia menggunakan Dual Encoding. JELIKU (Jurnal Elektronik Ilmu Komputer Udayana), 11(2), 339. https://doi.org/10.24843/JLK.2022.v11.i02.p13

Lamsiyah, S., Mahdaouy, A. El, Ouatik, S. E. A., & Espinasse, B. (2023). Unsupervised extractive multi-document summarization method based on transfer learning from BERT multi-task fine-tuning. Journal of Information Science, 49(1), 164–182. https://doi.org/10.1177/0165551521990616

Li, W., & Zhuge, H. (2021). Abstractive Multi-Document Summarization Based on Semantic Link Network. IEEE Transactions on Knowledge and Data Engineering, 33(1), 43–54. https://doi.org/10.1109/TKDE.2019.2922957

Lin, C.-Y. (2004). ROUGE: A Package for Automatic Evaluation of Summaries. Text Summarization Branches Out, 74–81. https://aclanthology.org/W04-1013/

Lucky, H., & Suhartono, D. (2021). Investigation of Pre-Trained Bidirectional Encoder Representations from Transformers Checkpoints for Indonesian Abstractive Text Summarization. Journal of Information and Communication Technology, 21(1), 71–94. https://doi.org/10.32890/jict2022.21.1.4

Maylawati, D. S. (2019). Sequential Pattern Mining and Deep Learning to Enhance Readability of Indonesian Text Summarization. International Journal of Advanced Trends in Computer Science and Engineering, 8(6), 3147–3159. https://doi.org/10.30534/ijatcse/2019/78862019

Maylawati, D. S., Kumar, Y. J., Kasmin, F., & Ramdhani, M. A. (2024). Deep sequential pattern mining for readability enhancement of Indonesian summarization. International Journal of Electrical and Computer Engineering (IJECE), 14(1), 782. https://doi.org/10.11591/ijece.v14i1.pp782-795

Mursyadah, U. (2021). Tingkat Keterbacaan Buku Sekolah Elektronik (BSE) Pelajaran Biologi Kelas X SMA/MA. TEACHING : Jurnal Inovasi Keguruan Dan Ilmu Pendidikan, 1(4), 298–304. https://doi.org/10.51878/teaching.v1i4.774

Pranowo, D. D. (2011). Alat ukur keterbacaan teks berbahasa Indonesia.

Rothe, S., Narayan, S., & Severyn, A. (2020). Leveraging Pre-trained Checkpoints for Sequence Generation Tasks. Transactions of the Association for Computational Linguistics, 8, 264–280. https://doi.org/10.1162/tacl_a_00313

Sari, M. P., & Herri, H. (2020). Analisa Konten Serta Tingkat Keterbacaan Pernyataan Misi dan Pengaruhnya Terhadap Kinerja Perbankan Indonesia. Menara Ilmu: Jurnal Penelitian Dan Kajian Ilmiah, 14(1), 96–106. https://doi.org/10.31869/mi.v14i1.2003

Scott, B. (2024). Learn How to Use the Flesch-Kincaid Grade Level Formula. ReadabilityFormulas.Com. https://readabilityformulas.com/learn-how-to-use-the-flesch-kincaid-grade-level/

Scott, B. (2025). The Gunning Fog Index (or FOG) Readability Formula. ReadabilityFormulas.Com. https://readabilityformulas.com/the-gunnings-fog-index-or-fog-readability-formula/

Severina, V., & Khodra, M. L. (2019). Multidocument Abstractive Summarization using Abstract Meaning Representation for Indonesian Language. 2019 International Conference of Advanced Informatics: Concepts, Theory and Applications (ICAICTA), 1–6. https://doi.org/10.1109/ICAICTA.2019.8904449

Shen, C., Cheng, L., Nguyen, X.-P., You, Y., & Bing, L. (2023). A Hierarchical Encoding-Decoding Scheme for Abstractive Multi-document Summarization. https://doi.org/10.48550/arXiv.2305.08503

Shinde, K., Roy, T., & Ghosal, T. (2022). An Extractive-Abstractive Approach for Multi-document Summarization of Scientific Articles for Literature Review. Proceedings of the Third Workshop on Scholarly Document Processing, 204–209. https://aclanthology.org/2022.sdp-1.25/

Solnyshkina, M. I., Zamaletdinov, R. R., Gorodetskaya, L. A., & Gabitov, A. I. (2017). Evaluating Text Complexity and Flesch-Kincaid Grade Level. Journal of Social Studies Education Research, 8(3), 238–248. http://www.jsser.org/index.php/jsser/article/view/225

Sugiri, Prasojo, R. E., & Krisnadhi, A. A. (2022). Controllable Abstractive Summarization Using Multilingual Pretrained Language Model. 2022 10th International Conference on Information and Communication Technology (ICoICT), 228–233. https://doi.org/10.1109/ICoICT55009.2022.9914846

Sutskever, I., Vinyals, O., & Le, Q. V. (2014). Sequence to Sequence Learning with Neural Networks. http://arxiv.org/abs/1409.3215

Świeczkowski, D., & Kułacz, S. (2021). The use of the Gunning Fog Index to evaluate the readability of Polish and English drug leaflets in the context of Health Literacy challenges in Medical Linguistics: An exploratory study. Cardiology Journal, 28(4), 627–631. https://doi.org/10.5603/CJ.a2020.0142

Utami, S. D., Dewi, I. N., & Efendi, I. (2021). Tingkat Keterbacaan Bahan Ajar Flexible Learning Berbasis Kolaboratif Saintifik. Bioscientist : Jurnal Ilmiah Biologi, 9(2), 577. https://doi.org/10.33394/bioscientist.v9i2.4246

Vaswani, A., Shazeer, N., Parmar, N., Uszkoreit, J., Jones, L., Gomez, A. N., Kaiser, L., & Polosukhin, I. (2017). Attention Is All You Need. https://doi.org/10.48550/arXiv.1706.03762

Verma, P., & Om, H. (2019). A novel approach for text summarization using optimal combination of sentence scoring methods. Sādhanā, 44(5), 110. https://doi.org/10.1007/s12046-019-1082-4

Verma, P., Pal, S., & Om, H. (2019). A Comparative Analysis on Hindi and English Extractive Text Summarization. ACM Transactions on Asian and Low-Resource Language Information Processing, 18(3), 1–39. https://doi.org/10.1145/3308754

Widjanarko, A., Kusumaningrum, R., & Surarso, B. (2018). Multi document summarization for the Indonesian language based on latent dirichlet allocation and significance sentence. 2018 International Conference on Information and Communications Technology (ICOIACT), 520–524. https://doi.org/10.1109/ICOIACT.2018.8350668

Wijayanti, R., Khodra, M. L., & Widyantoro, D. H. (2021). Indonesian Abstractive Summarization using Pre-trained Model. 2021 3rd East Indonesia Conference on Computer and Information Technology (EIConCIT), 79–84. https://doi.org/10.1109/EIConCIT50028.2021.9431880

Zhang, J., Tan, J., & Wan, X. (2018). Towards a Neural Network Approach to Abstractive Multi-Document Summarization. https://doi.org/10.48550/arXiv.1804.09010

Zhang, T., Kishore, V., Wu, F., Weinberger, K. Q., & Artzi, Y. (2019). BERTScore: Evaluating Text Generation with BERT. 8th International Conference on Learning Representations, ICLR 2020. https://doi.org/10.48550/arXiv.1904.09675

Published

2025-01-31

How to Cite

Muharam, A. F., Gerhana, Y. A., Maylawati, D. S., Ramdhani, M. A., & Rahman, T. K. A. (2025). Enhancing Abstractive Multi-Document Summarization with Bert2Bert Model for Indonesian Language. JISKA (Jurnal Informatika Sunan Kalijaga), 10(1), 110–122. https://doi.org/10.14421/jiska.2025.10.1.110-121