Here we maintain the Transformer classifiers trained on the PARTYPRESS database (Erfort et al. 2023).
The models are fine-tuned in seven languages on texts from nine countries (Austria, Denmark, Germany, Ireland, Netherlands, Poland, Spain, Sweden, UK).
For the downstream task of classifying press releases from political parties into 23 unique policy areas, the models achieve performance comparable to that of expert human coders.
The PARTYPRESS models have a supervised component: they were fine-tuned on texts labeled by humans. The labels indicate 23 different political issue categories derived from the Comparative Agendas Project (CAP):
Code | Issue |
---|---|
1 | Macroeconomics |
2 | Civil Rights |
3 | Health |
4 | Agriculture |
5 | Labor |
6 | Education |
7 | Environment |
8 | Energy |
9 | Immigration |
10 | Transportation |
12 | Law and Crime |
13 | Social Welfare |
14 | Housing |
15 | Domestic Commerce |
16 | Defense |
17 | Technology |
18 | Foreign Trade |
19.1 | International Affairs |
19.2 | European Union |
20 | Government Operations |
23 | Culture |
98 | Non-thematic |
99 | Other |
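
When working with model output, it can help to have the table above available as a lookup. The following is a minimal sketch, not part of the PARTYPRESS code base: the names `CAP_ISSUES` and `issue_name` are hypothetical, and the codes are stored as strings because `19.1` and `19.2` are not integers.

```python
# Issue codes and names copied from the table above.
# Codes are strings so that "19.1" and "19.2" sit alongside whole-number codes.
CAP_ISSUES = {
    "1": "Macroeconomics",
    "2": "Civil Rights",
    "3": "Health",
    "4": "Agriculture",
    "5": "Labor",
    "6": "Education",
    "7": "Environment",
    "8": "Energy",
    "9": "Immigration",
    "10": "Transportation",
    "12": "Law and Crime",
    "13": "Social Welfare",
    "14": "Housing",
    "15": "Domestic Commerce",
    "16": "Defense",
    "17": "Technology",
    "18": "Foreign Trade",
    "19.1": "International Affairs",
    "19.2": "European Union",
    "20": "Government Operations",
    "23": "Culture",
    "98": "Non-thematic",
    "99": "Other",
}


def issue_name(code):
    """Return the issue name for a code given as a string or an integer."""
    return CAP_ISSUES[str(code)]


print(issue_name(7))
print(issue_name("19.2"))
```

Note that the numbering has gaps (for example, there is no code 11), so a dictionary is a safer choice than a list indexed by position.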
There are both monolingual models for each of the countries covered by the PARTYPRESS database, and a multilingual model trained on press releases from all countries. The models can be easily extended to other languages, country contexts, or time periods by fine-tuning them with minimal additional labeled texts.
The main use of the models is text classification of press releases from political parties. They may also be useful for other political texts.
The classification can then be used to measure which issues parties are discussing in their communication.
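
One way to turn classifications into such a measure is to compute, for each party, the share of press releases assigned to each issue. The sketch below assumes the predictions are already available as `(party, issue_code)` pairs; the function name `issue_shares`, the party names, and the example data are all hypothetical and only illustrate the aggregation step.

```python
from collections import Counter, defaultdict


def issue_shares(predictions):
    """Compute per-party issue shares.

    predictions: iterable of (party, issue_code) pairs, one per press release.
    Returns a dict of the form {party: {issue_code: share}}.
    """
    counts = defaultdict(Counter)
    for party, code in predictions:
        counts[party][code] += 1

    shares = {}
    for party, counter in counts.items():
        total = sum(counter.values())
        shares[party] = {code: n / total for code, n in counter.items()}
    return shares


# Made-up predictions for two hypothetical parties.
example = [
    ("Party A", "7"), ("Party A", "7"), ("Party A", "9"), ("Party A", "3"),
    ("Party B", "1"), ("Party B", "9"),
]

shares = issue_shares(example)
print(shares["Party A"])
print(shares["Party B"])
```

Shares rather than raw counts make parties with different press release volumes comparable. The same aggregation could be done per month or per year by adding a time key to each pair.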
@article{erfort_partypress_2023,
author = {Cornelius Erfort and
Lukas F. Stoetzer and
Heike Klüver},
title = {The PARTYPRESS Database: A New Comparative Database of Parties’ Press Releases},
journal = {Research and Politics},
volume = {forthcoming},
year = {2023},
}
Github: cornelius-erfort/partypress
Research and Politics Dataverse: Replication Data for: The PARTYPRESS Database: A New Comparative Database of Parties’ Press Releases
Research for this contribution is part of the Cluster of Excellence "Contestations of the Liberal Script" (EXC 2055, Project-ID: 390715649), funded by the Deutsche Forschungsgemeinschaft (DFG, German Research Foundation) under Germany's Excellence Strategy. Cornelius Erfort is moreover grateful for generous funding provided by the DFG through the Research Training Group DYNAMICS (GRK 2458/1).
Cornelius Erfort
Humboldt-Universität zu Berlin