The modification of the SBERT language model for identifying ESG risks based on textual data from companies and supervisory activities

Authors

  • Aleksey V. Buzmakov HSE University, 37, bul. Gagarina, Perm’, 614000, Russian Federation https://orcid.org/0000-0002-9317-8785
  • Dmitriy A. Kirpishchikov HSE University, 37, bul. Gagarina, Perm’, 614000, Russian Federation https://orcid.org/0000-0003-3440-5842
  • Yuliya N. Naidenova HSE University, 37, bul. Gagarina, Perm’, 614000, Russian Federation https://orcid.org/0000-0002-5838-1331
  • Sofiya N. Paklina HSE University, 37, bul. Gagarina, Perm’, 614000, Russian Federation https://orcid.org/0000-0001-9666-989X
  • Petr A. Parshakov HSE University, 37, bul. Gagarina, Perm’, 614000, Russian Federation
  • Roman I. Solomatin HSE University, 37, bul. Gagarina, Perm’, 614000, Russian Federation https://orcid.org/0009-0004-0559-9910
  • Nazar S. Sotiriadi PJSC Sberbank, 19, ul. Vavilova, Moscow, 117312, Russian Federation

DOI:

https://doi.org/10.21638/spbu10.2025.106

Abstract

An approach is developed for identifying risks related to a company’s environmental impact, social responsibility, and governance quality (ESG risks: Environmental, Social, and Governance) from textual information about the company. To this end, a modification of the SBERT language model is proposed in which the distance function on the embedding space is explicitly defined. The model is trained on data from supervisory activities and on texts from corporate websites. An example of interpreting the model’s output is provided.
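The abstract does not state the distance function itself, so the following is only a generic sketch of how an SBERT-style embedding space with an explicit distance can be used to match a company text to risk categories. It assumes SBERT's mean pooling and a cosine distance; the three-dimensional vectors and the labels E, S and G are hypothetical toy values standing in for real model outputs. It is not the authors' modified model.

```python
import math
from typing import Dict, List, Sequence

Vector = Sequence[float]


def mean_pool(token_vectors: List[Vector]) -> List[float]:
    """Average token vectors into one text vector (mean pooling, as in SBERT)."""
    if not token_vectors:
        raise ValueError("at least one token vector is required")
    dim = len(token_vectors[0])
    n = len(token_vectors)
    return [sum(v[i] for v in token_vectors) / n for i in range(dim)]


def cosine_distance(a: Vector, b: Vector) -> float:
    """Assumed distance: 1 - cosine similarity (0 = same direction, 2 = opposite)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    if norm == 0.0:
        raise ValueError("cosine distance is undefined for a zero vector")
    return 1.0 - dot / norm


def nearest_risk(text_vec: Vector, prototypes: Dict[str, Vector]) -> str:
    """Return the label of the prototype embedding closest to text_vec."""
    return min(prototypes, key=lambda label: cosine_distance(text_vec, prototypes[label]))


if __name__ == "__main__":
    # Hypothetical toy prototypes; a real system would embed risk descriptions with the model.
    prototypes = {"E": [1.0, 0.0, 0.0], "S": [0.0, 1.0, 0.0], "G": [0.0, 0.0, 1.0]}
    # Hypothetical token vectors for one sentence of a company's website text.
    text_vec = mean_pool([[0.9, 0.1, 0.0], [0.7, 0.3, 0.1]])
    print(nearest_risk(text_vec, prototypes))  # prints E
```

Because classification depends only on `cosine_distance`, a different, explicitly defined metric could be substituted there without changing the rest of the sketch.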

Keywords:

ESG, natural language processing model, model training, topic modeling, website

Published

2025-05-29

How to Cite

Buzmakov, A. V., Kirpishchikov, D. A., Naidenova, Y. N., Paklina, S. N., Parshakov, P. A., Solomatin, R. I., & Sotiriadi, N. S. (2025). The modification of the SBERT language model for identifying ESG risks based on textual data from companies and supervisory activities. Vestnik of Saint Petersburg University. Applied Mathematics. Computer Science. Control Processes, 21(1), 75–91. https://doi.org/10.21638/spbu10.2025.106

Issue

Vol. 21, No. 1 (2025)

Section

Computer Science