The topic of Climate Change (CC) has received limited attention in NLP despite its urgency. Activists and policymakers need NLP tools to effectively process the vast and rapidly growing body of unstructured textual climate reports into structured form. To tackle this challenge, we introduce two new large-scale climate questionnaire datasets and use their existing structure to train self-supervised models. We conduct experiments to show that these models can learn to generalize to climate disclosures from organization types different from those seen during training. We then use these models to help align texts from unstructured climate documents to the semi-structured questionnaires in a human pilot study. Finally, to support further NLP research in the climate domain, we introduce a benchmark of existing climate text classification datasets to better evaluate and compare existing models.
Towards Answering Climate Questionnaires from Unstructured Climate Reports
The Climate Change Benchmark (ClimaBench) is introduced to evaluate NLP models on diverse climate change-related natural language understanding tasks, including newly curated large-scale datasets for text classification and question-answering, with an analysis of model performance improvements through domain-specific fine-tuning.
- Year
- 2023
- Venue
- arXiv 2023
- Authors
- 4
Attribution
- Abstract & full text
- arxiv.org/abs/2301.04253v2
- TL;DR
- Semantic Scholar