Enhancing Abstractive Multi-Document Summarization with Bert2Bert Model for Indonesian Language
DOI: https://doi.org/10.14421/jiska.2025.10.1.110-121

Keywords: Bert2Bert, Abstractive, Multi-document, Summarization, Transformer

Abstract
This study investigates the effectiveness of the proposed Bert2Bert and Bert2Bert+Xtreme models in improving abstractive multi-document summarization for the Indonesian language. Both proposed models are built on the transformer architecture. The research uses the Liputan6 dataset, which contains ten years of news articles (October 2000 to October 2010) paired with reference summaries and is widely used in automatic text summarization research. Evaluation with ROUGE-1, ROUGE-2, ROUGE-L, and BERTScore shows that the proposed models achieve a slight improvement over models from previous research, with Bert2Bert outperforming Bert2Bert+Xtreme. Despite the challenges posed by limited reference summaries for Indonesian documents, content-based analysis using readability metrics, including FKGL, GFI, and the Dwiyanto Djoko Pranowo formula, revealed that the summaries produced by Bert2Bert and Bert2Bert+Xtreme are at a moderate readability level, meaning they are suitable for mature readers and align with the news portal's target audience.
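The evaluation metrics named above can be sketched in plain Python. The snippet below implements ROUGE-1 F1 (unigram overlap) and the standard published FKGL and GFI formulas; the regex tokenizer and the vowel-group syllable counter are simplifying assumptions for illustration, not the exact tooling used in the study (which also relies on ROUGE-2, ROUGE-L, and BERTScore).

```python
import re
from collections import Counter

def _tokens(text):
    # Simple regex tokenizer; real evaluations use language-aware tokenization.
    return re.findall(r"\w+", text.lower())

def rouge1_f1(reference, candidate):
    """Unigram-overlap ROUGE-1 F1 between a reference and a candidate summary."""
    ref, cand = _tokens(reference), _tokens(candidate)
    if not ref or not cand:
        return 0.0
    # Clipped overlap: each reference unigram is matched at most as often as it occurs.
    overlap = sum((Counter(ref) & Counter(cand)).values())
    precision = overlap / len(cand)
    recall = overlap / len(ref)
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)

def _syllables(word):
    # Crude vowel-group heuristic; adequate for illustration only.
    return max(1, len(re.findall(r"[aeiouy]+", word.lower())))

def fkgl(text):
    """Flesch-Kincaid Grade Level:
    0.39 * (words/sentences) + 11.8 * (syllables/words) - 15.59"""
    sentences = max(1, len(re.findall(r"[.!?]+", text)))
    words = _tokens(text)
    syllables = sum(_syllables(w) for w in words)
    return 0.39 * (len(words) / sentences) + 11.8 * (syllables / len(words)) - 15.59

def gfi(text):
    """Gunning Fog Index:
    0.4 * ((words/sentences) + 100 * complex_words/words),
    where complex words have three or more syllables."""
    sentences = max(1, len(re.findall(r"[.!?]+", text)))
    words = _tokens(text)
    complex_words = sum(1 for w in words if _syllables(w) >= 3)
    return 0.4 * ((len(words) / sentences) + 100 * complex_words / len(words))
```

Both readability scores approximate the years of schooling a reader needs, so a generated summary scoring in the middle grade range corresponds to the "moderate readability" finding reported above.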
References
Abka, A. F., Azizah, K., & Jatmiko, W. (2022). Transformer-based Cross-Lingual Summarization using Multilingual Word Embeddings for English - Bahasa Indonesia. International Journal of Advanced Computer Science and Applications, 13(12). https://doi.org/10.14569/IJACSA.2022.0131276
Alquliti, W. H., & Binti, N. (2019). Convolutional Neural Network based for Automatic Text Summarization. International Journal of Advanced Computer Science and Applications, 10(4), 200–211. https://doi.org/10.14569/IJACSA.2019.0100424
Biddinika, M. K., Lestari, R. P., Indrawan, B., Yoshikawa, K., Tokimatsu, K., & Takahashi, F. (2016). Measuring the readability of Indonesian biomass websites: The ease of understanding biomass energy information on websites in the Indonesian language. Renewable and Sustainable Energy Reviews, 59, 1349–1357. https://doi.org/10.1016/j.rser.2016.01.078
Bing, L., Li, P., Liao, Y., Lam, W., Guo, W., & Passonneau, R. (2015). Abstractive Multi-Document Summarization via Phrase Selection and Merging. Proceedings of the 53rd Annual Meeting of the Association for Computational Linguistics and the 7th International Joint Conference on Natural Language Processing (Volume 1: Long Papers), 1587–1597. https://doi.org/10.3115/v1/P15-1153
Dangol, R., Adhikari, P., Dahal, P., & Sharma, H. (2023). Short Updates- Machine Learning Based News Summarizer. Journal of Advanced College of Engineering and Management, 8(2), 15–25. https://doi.org/10.3126/jacem.v8i2.55939
Devi, K. U. S., & Suadaa, L. H. (2022). Extractive Text Summarization for Snippet Generation on Indonesian Search Engine using Sentence Transformers. 2022 International Conference on Data Science and Its Applications (ICoDSA), 181–186. https://doi.org/10.1109/ICoDSA55874.2022.9862886
Devianti, R. S., & Khodra, M. L. (2019). Abstractive Summarization using Genetic Semantic Graph for Indonesian News Articles. Proceedings - 2019 International Conference on Advanced Informatics: Concepts, Theory, and Applications, ICAICTA 2019. https://doi.org/10.1109/ICAICTA.2019.8904361
Devlin, J., Chang, M.-W., Lee, K., & Toutanova, K. (2018). BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding. https://doi.org/10.48550/arXiv.1810.04805
Dewi, K. E., & Widiastuti, N. I. (2022). The Design of Automatic Summarization of Indonesian Texts Using a Hybrid Approach. Jurnal Teknologi Informasi Dan Pendidikan, 15(1), 37–43. https://doi.org/10.24036/jtip.v15i1.451
Fadziah, Y. N., Rasim, R., & Fitrajaya, E. (2018). Penerapan Algoritma Enchanced Confix Stripping dalam Pengukuran Keterbacaan Teks Menggunakan Gunning Fog Index. Jurnal Aplikasi Dan Teori Ilmu Komputer, 1(1), 14–22. https://doi.org/10.17509/jatikom.v1i1.25143
Goh, O. S., Fung, C. C., Depickere, A., & Wong, K. W. (2007). Using Gunning-Fog Index to Assess Instant Messages Readability from ECAs. Third International Conference on Natural Computation (ICNC 2007) Vol V, 480–486. https://doi.org/10.1109/ICNC.2007.800
Goldstein, J., Mittal, V., Carbonell, J., & Kantrowitz, M. (2000). Multi-document summarization by sentence extraction. NAACL-ANLP 2000 Workshop on Automatic Summarization, 4, 40–48. https://doi.org/10.3115/1117575.1117580
Gunawan, D., Harahap, S. H., & Fadillah Rahmat, R. (2019). Multi-document Summarization by using TextRank and Maximal Marginal Relevance for Text in Bahasa Indonesia. 2019 International Conference on ICT for Smart Society (ICISS), 7, 1–5. https://doi.org/10.1109/ICISS48059.2019.8969785
Gunawan, Y. H. B., & Khodra, M. L. (2021). Multi-document Summarization using Semantic Role Labeling and Semantic Graph for Indonesian News Article. https://doi.org/10.48550/arXiv.2103.03736
Jin, H., & Wan, X. (2020). Abstractive Multi-Document Summarization via Joint Learning with Single-Document Summarization. Findings of the Association for Computational Linguistics: EMNLP 2020, 2020, 2545–2554. https://doi.org/10.18653/v1/2020.findings-emnlp.231
Koto, F., Lau, J. H., & Baldwin, T. (2020). Liputan6: A Large-scale Indonesian Dataset for Text Summarization. Proceedings of the 1st Conference of the Asia-Pacific Chapter of the Association for Computational Linguistics and the 10th International Joint Conference on Natural Language Processing, 598–608. https://doi.org/10.48550/arXiv.2011.00679
Koto, F., Rahimi, A., Lau, J. H., & Baldwin, T. (2020). IndoLEM and IndoBERT: A Benchmark Dataset and Pre-trained Language Model for Indonesian NLP. Proceedings of the 28th International Conference on Computational Linguistics, 757–770. https://doi.org/10.18653/v1/2020.coling-main.66
Kurniawan, K., & Louvan, S. (2018). Indosum: A New Benchmark Dataset for Indonesian Text Summarization. 2018 International Conference on Asian Language Processing (IALP), 215–220. https://doi.org/10.1109/IALP.2018.8629109
Kuyate, S., Jadhav, O., & Jadhav, P. (2023). AI Text Summarization System. International Journal for Research in Applied Science and Engineering Technology, 11(5), 916–919. https://doi.org/10.22214/ijraset.2023.51481
Laksana, M. D. B., Karyawati, A. E., Putri, L. A. A. R., Santiyasa, I. W., Sanjaya ER, N. A., & Kadnyanan, I. G. A. G. A. (2022). Text Summarization terhadap Berita Bahasa Indonesia menggunakan Dual Encoding. JELIKU (Jurnal Elektronik Ilmu Komputer Udayana), 11(2), 339. https://doi.org/10.24843/JLK.2022.v11.i02.p13
Lamsiyah, S., Mahdaouy, A. El, Ouatik, S. E. A., & Espinasse, B. (2023). Unsupervised extractive multi-document summarization method based on transfer learning from BERT multi-task fine-tuning. Journal of Information Science, 49(1), 164–182. https://doi.org/10.1177/0165551521990616
Li, W., & Zhuge, H. (2021). Abstractive Multi-Document Summarization Based on Semantic Link Network. IEEE Transactions on Knowledge and Data Engineering, 33(1), 43–54. https://doi.org/10.1109/TKDE.2019.2922957
Lin, C.-Y. (2004). ROUGE: A Package for Automatic Evaluation of Summaries. Text Summarization Branches Out, 74–81. https://aclanthology.org/W04-1013/
Lucky, H., & Suhartono, D. (2021). Investigation of Pre-Trained Bidirectional Encoder Representations from Transformers Checkpoints for Indonesian Abstractive Text Summarization. Journal of Information and Communication Technology, 21(1), 71–94. https://doi.org/10.32890/jict2022.21.1.4
Maylawati, D. S. (2019). Sequential Pattern Mining and Deep Learning to Enhance Readability of Indonesian Text Summarization. International Journal of Advanced Trends in Computer Science and Engineering, 8(6), 3147–3159. https://doi.org/10.30534/ijatcse/2019/78862019
Maylawati, D. S., Kumar, Y. J., Kasmin, F., & Ramdhani, M. A. (2024). Deep sequential pattern mining for readability enhancement of Indonesian summarization. International Journal of Electrical and Computer Engineering (IJECE), 14(1), 782. https://doi.org/10.11591/ijece.v14i1.pp782-795
Mursyadah, U. (2021). Tingkat Keterbacaan Buku Sekolah Elektronik (BSE) Pelajaran Biologi Kelas X SMA/MA. TEACHING : Jurnal Inovasi Keguruan Dan Ilmu Pendidikan, 1(4), 298–304. https://doi.org/10.51878/teaching.v1i4.774
Pranowo, D. D. (2011). Alat ukur keterbacaan teks berbahasa Indonesia.
Rothe, S., Narayan, S., & Severyn, A. (2020). Leveraging Pre-trained Checkpoints for Sequence Generation Tasks. Transactions of the Association for Computational Linguistics, 8, 264–280. https://doi.org/10.1162/tacl_a_00313
Sari, M. P., & Herri, H. (2020). Analisa Konten Serta Tingkat Keterbacaan Pernyataan Misi dan Pengaruhnya Terhadap Kinerja Perbankan Indonesia. Menara Ilmu: Jurnal Penelitian Dan Kajian Ilmiah, 14(1), 96–106. https://doi.org/10.31869/mi.v14i1.2003
Scott, B. (2024). Learn How to Use the Flesch-Kincaid Grade Level Formula. ReadabilityFormulas.Com. https://readabilityformulas.com/learn-how-to-use-the-flesch-kincaid-grade-level/
Scott, B. (2025). The Gunning Fog Index (or FOG) Readability Formula. ReadabilityFormulas.Com. https://readabilityformulas.com/the-gunnings-fog-index-or-fog-readability-formula/
Severina, V., & Khodra, M. L. (2019). Multidocument Abstractive Summarization using Abstract Meaning Representation for Indonesian Language. 2019 International Conference of Advanced Informatics: Concepts, Theory and Applications (ICAICTA), 1–6. https://doi.org/10.1109/ICAICTA.2019.8904449
Shen, C., Cheng, L., Nguyen, X.-P., You, Y., & Bing, L. (2023). A Hierarchical Encoding-Decoding Scheme for Abstractive Multi-document Summarization. https://doi.org/10.48550/arXiv.2305.08503
Shinde, K., Roy, T., & Ghosal, T. (2022). An Extractive-Abstractive Approach for Multi-document Summarization of Scientific Articles for Literature Review. Proceedings of the Third Workshop on Scholarly Document Processing, 204–209. https://aclanthology.org/2022.sdp-1.25/
Solnyshkina, M. I., Zamaletdinov, R. R., Gorodetskaya, L. A., & Gabitov, A. I. (2017). Evaluating Text Complexity and Flesch-Kincaid Grade Level. Journal of Social Studies Education Research, 8(3), 238–248. http://www.jsser.org/index.php/jsser/article/view/225
Sugiri, Eko Prasojo, R., & Alfa Krisnadhi, A. (2022). Controllable Abstractive Summarization Using Multilingual Pretrained Language Model. 2022 10th International Conference on Information and Communication Technology (ICoICT), 228–233. https://doi.org/10.1109/ICoICT55009.2022.9914846
Sutskever, I., Vinyals, O., & Le, Q. V. (2014). Sequence to Sequence Learning with Neural Networks. http://arxiv.org/abs/1409.3215
Świeczkowski, D., & Kułacz, S. (2021). The use of the Gunning Fog Index to evaluate the readability of Polish and English drug leaflets in the context of Health Literacy challenges in Medical Linguistics: An exploratory study. Cardiology Journal, 28(4), 627–631. https://doi.org/10.5603/CJ.a2020.0142
Utami, S. D., Dewi, I. N., & Efendi, I. (2021). Tingkat Keterbacaan Bahan Ajar Flexible Learning Berbasis Kolaboratif Saintifik. Bioscientist : Jurnal Ilmiah Biologi, 9(2), 577. https://doi.org/10.33394/bioscientist.v9i2.4246
Vaswani, A., Shazeer, N., Parmar, N., Uszkoreit, J., Jones, L., Gomez, A. N., Kaiser, L., & Polosukhin, I. (2017). Attention Is All You Need. https://doi.org/10.48550/arXiv.1706.03762
Verma, P., & Om, H. (2019). A novel approach for text summarization using optimal combination of sentence scoring methods. Sādhanā, 44(5), 110. https://doi.org/10.1007/s12046-019-1082-4
Verma, P., Pal, S., & Om, H. (2019). A Comparative Analysis on Hindi and English Extractive Text Summarization. ACM Transactions on Asian and Low-Resource Language Information Processing, 18(3), 1–39. https://doi.org/10.1145/3308754
Widjanarko, A., Kusumaningrum, R., & Surarso, B. (2018). Multi document summarization for the Indonesian language based on latent dirichlet allocation and significance sentence. 2018 International Conference on Information and Communications Technology (ICOIACT), 520–524. https://doi.org/10.1109/ICOIACT.2018.8350668
Wijayanti, R., Khodra, M. L., & Widyantoro, D. H. (2021). Indonesian Abstractive Summarization using Pre-trained Model. 2021 3rd East Indonesia Conference on Computer and Information Technology (EIConCIT), 79–84. https://doi.org/10.1109/EIConCIT50028.2021.9431880
Zhang, J., Tan, J., & Wan, X. (2018). Towards a Neural Network Approach to Abstractive Multi-Document Summarization. https://doi.org/10.48550/arXiv.1804.09010
Zhang, T., Kishore, V., Wu, F., Weinberger, K. Q., & Artzi, Y. (2019). BERTScore: Evaluating Text Generation with BERT. 8th International Conference on Learning Representations, ICLR 2020. https://doi.org/10.48550/arXiv.1904.09675
License
Copyright (c) 2025 Aldi Fahluzi Muharam, Yana Aditia Gerhana, Dian Sa'adillah Maylawati, Muhammad Ali Ramdhani, Titik Khawa Abdul Rahman

This work is licensed under a Creative Commons Attribution-NonCommercial 4.0 International License.
Authors who publish with this journal agree to the following terms as stated in http://creativecommons.org/licenses/by-nc/4.0
a. Authors retain copyright and grant the journal right of first publication with the work simultaneously licensed under a Creative Commons Attribution License that allows others to share the work with an acknowledgement of the work's authorship and initial publication in this journal.
b. Authors are able to enter into separate, additional contractual arrangements for the non-exclusive distribution of the journal's published version of the work (e.g., post it to an institutional repository or publish it in a book), with an acknowledgement of its initial publication in this journal.
c. Authors are permitted and encouraged to post their work online (e.g., in institutional repositories or on their website) prior to and during the submission process, as it can lead to productive exchanges, as well as earlier and greater citation of published work.