Data extraction from polymer literature using large language models

This study explores automated data extraction from the polymer science literature using large language models (LLMs), including GPT-3.5, alongside MaterialsBERT, a model based on named entity recognition (NER). From a corpus of 2.4 million scientific articles, the framework identified 681,000 polymer-related documents and extracted over one million records covering 24 polymer properties. The LLMs outperformed the NER-based approach in both the quantity and the quality of the extracted data, particularly in identifying complex relationships and unique materials. While GPT-3.5 excelled at capturing intricate data, MaterialsBERT proved the more cost-efficient option. The extracted dataset, now publicly accessible via the Polymer Scholar platform, offers significant potential for advancing polymer informatics by enabling better material property predictions and supporting the development of innovative polymers. However, challenges such as high computational costs and limitations in handling non-standardized data formats highlight areas for future improvement.

For more details, see the full article at the following link:

https://www.nature.com/articles/s43246-024-00708-9


If you enjoy reading this kind of scientific news article, I am always keen to connect with fellow researchers in materials science, and happy to discuss any potential interest in the Materials Square cloud-based online platform ( www.matsq.com ), designed to streamline the execution of materials and molecular modelling simulations!

The Materials Square platform provides extensive web-based functionality for computational chemistry and materials research, as well as for training and education, as detailed in the following PDF brochure: https://www.materialssquare.com/wp-content/uploads/Materials_Square_Brochure_2022-compressed_1674181754.pdf

Many thanks for your interest and consideration,

Dr. Gabriele Mogni
Technical Consultant and EU Representative of Virtual Lab Inc., the parent company of the Materials Square platform
Website: Virtual Lab Inc.
Email: gabriele@simulation.re.kr

#materials #materialsscience #materialsengineering #computationalchemistry #modelling #chemistry #researchanddevelopment #research #MaterialsSquare #Tutorial #DFT #simulationsoftware #simulation