Enhancing Abstractive Multi-Document Summarization with Bert2Bert Model for Indonesian Language
DOI: https://doi.org/10.14421/jiska.2025.10.1.110-121

Keywords: Bert2Bert, Abstractive, Multi-document, Summarization, Transformer

Abstract
This study investigates the effectiveness of the proposed Bert2Bert and Bert2Bert+Xtreme models in improving abstractive multi-document summarization for the Indonesian language. Both proposed models are built on the transformer architecture. Evaluation on the Liputan6 dataset using ROUGE-1, ROUGE-2, ROUGE-L, and BERTScore shows that the proposed models achieve slight improvements over models from previous research, with Bert2Bert outperforming Bert2Bert+Xtreme. Despite the challenges posed by the limited availability of reference summaries for Indonesian documents, a content-based analysis using readability metrics, including the Flesch-Kincaid Grade Level (FKGL), the Gunning Fog Index (GFI), and the Dwiyanto Djoko Pranowo instrument, showed that the summaries generated by Bert2Bert and Bert2Bert+Xtreme are at a moderate readability level, which means they are suitable for adult readers and in line with the target audience of the news portal.
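To make the readability analysis concrete, the following is a minimal sketch of how the two grade-level metrics named in the abstract, FKGL and GFI, are typically computed. The vowel-group syllable counter and the sample summary are simplifications introduced here for illustration only; the Indonesian-specific Dwiyanto Djoko Pranowo instrument used in the study is not reproduced.

```python
import re


def count_syllables(word: str) -> int:
    # Rough estimate: count groups of consecutive vowels.
    # This heuristic ignores language-specific spelling rules.
    return max(1, len(re.findall(r"[aeiouy]+", word.lower())))


def readability(text: str) -> dict:
    # FKGL = 0.39 * (words/sentences) + 11.8 * (syllables/words) - 15.59
    # GFI  = 0.4  * ((words/sentences) + 100 * (complex_words/words)),
    # where a "complex" word has three or more syllables.
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    words = re.findall(r"[A-Za-z']+", text)
    syllables = sum(count_syllables(w) for w in words)
    complex_words = sum(1 for w in words if count_syllables(w) >= 3)

    n_sent = max(1, len(sentences))
    n_words = max(1, len(words))
    fkgl = 0.39 * (n_words / n_sent) + 11.8 * (syllables / n_words) - 15.59
    gfi = 0.4 * ((n_words / n_sent) + 100 * (complex_words / n_words))
    return {"FKGL": round(fkgl, 2), "GFI": round(gfi, 2)}


if __name__ == "__main__":
    # Hypothetical generated summary, used only to exercise the formulas.
    summary = ("The government announced a new infrastructure program. "
               "Officials expect construction to begin next year.")
    print(readability(summary))
```

Lower scores indicate easier text; values of roughly 9 to 12 on either scale correspond to the moderate, adult-reader level reported in the abstract. For Indonesian text, the syllable counting and grade interpretation would need language-specific adjustment, which is why the study also applies the Pranowo instrument.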
References

Abka, A. F., Azizah, K., & Jatmiko, W. (2022). Transformer-based Cross-Lingual Summarization using Multilingual Word Embeddings for English - Bahasa Indonesia. International Journal of Advanced Computer Science and Applications, 13(12). https://doi.org/10.14569/IJACSA.2022.0131276
Alquliti, W. H., Binti, N., & Ghani, A. (2019). Convolutional Neural Network based for Automatic Text Summarization. International Journal of Advanced Computer Science and Applications, 10(4), 200–211. https://doi.org/10.14569/IJACSA.2019.0100424
Biddinika, M. K., Lestari, R. P., Indrawan, B., Yoshikawa, K., Tokimatsu, K., & Takahashi, F. (2016). Measuring the readability of Indonesian biomass websites: The ease of understanding biomass energy information on websites in the Indonesian language. Renewable and Sustainable Energy Reviews, 59, 1349–1357. https://doi.org/10.1016/j.rser.2016.01.078
Bing, L., Li, P., Liao, Y., Lam, W., Guo, W., & Passonneau, R. J. (2015). Abstractive Multi-Document Summarization via Phrase Selection and Merging. Proceedings of the 53rd Annual Meeting of the Association for Computational Linguistics and the 7th International Joint Conference on Natural Language Processing, 1587–1597. https://doi.org/10.3115/v1/P15-1153
Dangol, R., Adhikari, P., Dahal, P., & Sharma, H. (2023). Short Updates-Machine Learning Based News Summarizer. Journal of Advanced College of Engineering and Management, 8(2), 15–25.
Devi, K. U. S., & Suadaa, L. H. (2022). Extractive Text Summarization for Snippet Generation on Indonesian Search Engine using Sentence Transformers. 2022 International Conference on Data Science and Its Applications (ICoDSA), 181–186.
Devianti, R. S., & Khodra, M. L. (2019). Abstractive Summarization using Genetic Semantic Graph for Indonesian News Articles. International Conference of Advanced Informatics: Concepts, Theory and Applications (ICAICTA).
Devlin, J., Chang, M.-W., Lee, K., & Toutanova, K. (2018). BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding. Eprint ArXiv:1810.04805. https://doi.org/10.48550/arXiv.1810.04805
Dewi, K. E., & Widiastuti, N. I. (2022). The Design of Automatic Summarization of Indonesian Texts Using a Hybrid Approach. Jurnal Teknologi Informasi Dan Pendidikan. https://api.semanticscholar.org/CorpusID:253313594
Fadziah, Y. N. (2017). Penerapan Algoritma Enhanced Confix Stripping dalam Pengukuran Keterbacaan Teks Menggunakan Gunning Fog Index. Universitas Pendidikan Indonesia.
Goldstein, J., Mittal, V., Carbonell, J., & Kantrowitz, M. (2000). Multi-Document Summarization By Sentence Extraction. NAACL-ANLP 2000 Workshop on Automatic Summarization.
Gunawan, D., Harahap, S. H., & Rahmat, R. F. (2019). Multi-document summarization by using textrank and maximal marginal relevance for text in Bahasa Indonesia. 2019 International Conference on ICT for Smart Society (ICISS), 7, 1–5.
Gunawan, Y. H. B., & Khodra, M. L. (2020). Multi-document Summarization using Semantic Role Labeling and Semantic Graph for Indonesian News Article. 2020 7th International Conference on Advance Informatics: Concepts, Theory and Applications (ICAICTA), 1–6.
Jin, H., & Wan, X. (2020). Abstractive Multi-Document Summarization via Joint Learning with Single-Document Summarization. Findings of the Association for Computational Linguistics: EMNLP 2020, 2545–2554. https://doi.org/10.18653/v1/2020.findings-emnlp.231
Koto, F., Lau, J. H., & Baldwin, T. (2020). Liputan6: A Large-scale Indonesian Dataset for Text Summarization. Proceedings of the 1st Conference of the Asia-Pacific Chapter of the Association for Computational Linguistics and the 10th International Joint Conference on Natural Language Processing, 598–608. https://doi.org/10.48550/arXiv.2011.00679
Koto, F., Rahimi, A., Lau, J. H., & Baldwin, T. (2020). IndoLEM and IndoBERT: A Benchmark Dataset and Pre-trained Language Model for Indonesian NLP. The 28th International Conference on Computational Linguistics, 757–770. https://doi.org/10.18653/v1/2020.coling-main.66
Kurniawan, K., & Louvan, S. (2018). IndoSum: A New Benchmark Dataset for Indonesian Text Summarization. Proceedings of the 2018 International Conference on Asian Language Processing, IALP 2018, 215–220. https://doi.org/10.1109/IALP.2018.8629109
Kuyate, S., Jadhav, O., & Jadhav, P. (2023). AI Text Summarization System. International Journal for Research in Applied Science and Engineering Technology, 11(5), 916–919. https://doi.org/10.22214/ijraset.2023.51481
Laksana, M. D. B., Karyawati, E. A., Putri, L. A. A. R., Santiyasa, I. W., ER, N. A. S., & Kadnyanan, I. G. A. G. A. (2022). Text Summarization terhadap Berita Bahasa Indonesia menggunakan Dual Encoding. Jurnal Elektronik Ilmu Komputer Udayana, 11(2).
Lamsiyah, S., Mahdaouy, A. El, Ouatik, S. E. A., & Espinasse, B. (2023). Unsupervised extractive multi-document summarization method based on transfer learning from BERT multi-task fine-tuning. Journal of Information Science, 49(1), 164–182. https://doi.org/10.1177/0165551521990616
Li, W., & Zhuge, H. (2021). Abstractive multi-document summarization based on semantic link network. IEEE Transactions on Knowledge and Data Engineering, 33(1), 43–54. https://doi.org/10.1109/TKDE.2019.2922957
Lin, C.-Y. (2004). ROUGE: A Package for Automatic Evaluation of Summaries. In Proceedings of the Workshop on Text Summarization Branches Out (WAS 2004).
Lucky, H., & Suhartono, D. (2022). Investigation Of Pre-Trained Bidirectional Encoder Representations From Transformers Checkpoints For Indonesian Abstractive Text Summarization. Journal of Information and Communication Technology, 21(1), 71–94. https://doi.org/10.32890/jict2022.21.1.4
Maylawati, D. S., Kumar, Y. J., Kasmin, F. B., & Raza, B. (2019). Sequential pattern mining and deep learning to enhance readability of Indonesian text summarization. International Journal of Advanced Trends in Computer Science and Engineering. https://doi.org/10.30534/ijatcse/2019/78862019
Maylawati, D. S., Kumar, Y. J., Kasmin, F., & Ramdhani, M. A. (2024). Deep sequential pattern mining for readability enhancement of Indonesian summarization. International Journal of Electrical and Computer Engineering (IJECE), 14(1), 782. https://doi.org/10.11591/ijece.v14i1.pp782-795
Mursyadah, U. (2021). Tingkat Keterbacaan Buku Sekolah Elektronik (BSE) Pelajaran Biologi Kelas X SMA/MA. TEACHING: Jurnal Inovasi Keguruan dan Ilmu Pendidikan, 1(4), 298–304.
Pranowo, D. D. (2011). Instrument of Indonesian Texts Readability.
Readability Formulas. (2020). The Flesch Grade Level Readability Formula.
Rothe, S., Narayan, S., & Severyn, A. (2019). Leveraging Pre-trained Checkpoints for Sequence Generation Tasks. Transactions of the Association for Computational Linguistics. https://doi.org/10.1162/tacl_a_00313
Sari, M. P., & Herri, H. (2020). Analisa Konten serta Tingkat Keterbacaan Pernyataan Misi dan Pengaruhnya terhadap Kinerja Perbankan Indonesia. Menara Ilmu: Jurnal Penelitian Dan Kajian Ilmiah, 14(1).
Severina, V., & Khodra, M. L. (2019, September 1). Multidocument Abstractive Summarization using Abstract Meaning Representation for Indonesian Language. Proceedings - 2019 International Conference on Advanced Informatics: Concepts, Theory, and Applications, ICAICTA 2019. https://doi.org/10.1109/ICAICTA.2019.8904449
Shen, C., Cheng, L., Nguyen, X.-P., You, Y., & Bing, L. (2023, May 15). A Hierarchical Encoding-Decoding Scheme for Abstractive Multi-document Summarization. Eprint ArXiv:2305.08503. https://doi.org/10.48550/arXiv.2305.08503
Shinde, K., Roy, T., & Scispace, G. (2022). An Extractive-Abstractive Approach for Multi-document Summarization of Scientific Articles for Literature Review. Proceedings of the Third Workshop on Scholarly Document Processing, 204–209. https://github.com/allenai/mslr-shared-task
Goh, O. S., Fung, C. C., Depickere, A., & Wong, K. W. (2007). Using Gunning-Fog Index to Assess Instant Messages Readability from ECAs. Third International Conference on Natural Computation (ICNC 2007), 5, 480–486. https://doi.org/10.1109/ICNC.2007.800
Solnyshkina, M. I., Zamaletdinov, R. R., Gorodetskaya, L. A., & Gabitov, A. I. (2017). Evaluating Text Complexity and Flesch-Kincaid Grade Level. Journal of Social Studies Education Research, 8(3), 238–248. www.jsser.org
Sugiri, Prasojo, R. E., & Krisnadhi, A. A. (2022). Controllable Abstractive Summarization Using Multilingual Pretrained Language Model. 2022 10th International Conference on Information and Communication Technology (ICoICT), 228–233. https://doi.org/10.1109/ICoICT55009.2022.9914846
Sutskever, I., Vinyals, O., & Le, Q. V. (2014, September 10). Sequence to Sequence Learning with Neural Networks. Proceedings of the 27th International Conference on Neural Information Processing Systems. https://doi.org/10.48550/arXiv.1409.3215
Świeczkowski, D., & Kułacz, S. (2021). The use of the Gunning Fog Index to evaluate the readability of Polish and English drug leaflets in the context of health literacy challenges in medical linguistics: An exploratory study. Cardiology Journal, 28(4), 627–631. https://doi.org/10.5603/CJ.a2020.0142
The Gunning’s Fog Index (or FOG) Readability Formula. (2020).
Utami, S. D., Dewi, I. N., & Efendi, I. (2021). Tingkat Keterbacaan Bahan Ajar Flexible Learning Berbasis Kolaboratif Saintifik. Bioscientist: Jurnal Ilmiah Biologi, 9(2), 577–587.
Vaswani, A., Shazeer, N., Parmar, N., Uszkoreit, J., Jones, L., Gomez, A. N., Kaiser, L., & Polosukhin, I. (2017). Attention Is All You Need. http://arxiv.org/abs/1706.03762
Verma, P., & Om, H. (2019). A novel approach for text summarization using optimal combination of sentence scoring methods. Sādhanā, 44, 1–15.
Verma, P., Pal, S., & Om, H. (2019). A comparative analysis on Hindi and English extractive text summarization. ACM Transactions on Asian and Low-Resource Language Information Processing (TALLIP), 18(3), 1–39.
Widjanarko, A., Kusumaningrum, R., & Surarso, B. (2018). Multi document summarization for the Indonesian language based on latent dirichlet allocation and significance sentence. 2018 International Conference on Information and Communications Technology (ICOIACT), 520–524.
Wijayanti, R., Khodra, M. L., & Widyantoro, D. H. (2021). Indonesian Abstractive Summarization using Pre-Trained Model. 3rd 2021 East Indonesia Conference on Computer and Information Technology, EIConCIT 2021, 79–84. https://doi.org/10.1109/EIConCIT50028.2021.9431880
Zhang, J., Tan, J., & Wan, X. (2018a). Adapting Neural Single-Document Summarization Model for Abstractive Multi-Document Summarization: A Pilot Study. Proceedings of the 11th International Conference on Natural Language Generation, 381–390. https://doi.org/10.18653/v1/W18-6545
Zhang, J., Tan, J., & Wan, X. (2018b, April 24). Towards a Neural Network Approach to Abstractive Multi-Document Summarization. Eprint ArXiv:1804.09010. https://doi.org/10.48550/arXiv.1804.09010
Zhang, T., Kishore, V., Wu, F., Weinberger, K. Q., & Artzi, Y. (2019, April 21). BERTScore: Evaluating Text Generation with BERT. ICLR2020. http://arxiv.org/abs/1904.09675
License
Copyright (c) 2025 Aldi Fahluzi Muharam, Yana Aditia Gerhana, Dian Sa'adillah Maylawati, Muhammad Ali Ramdhani, Titik Khawa Abdul Rahman

This work is licensed under a Creative Commons Attribution-NonCommercial 4.0 International License.