This repository presents an ongoing evaluation of German T5 models on the GermEval 2014 NER downstream task.
- 03.02.2023: Initial version.
A few approaches exist for fine-tuning T5 models for token classification tasks:
- "Structured Prediction as Translation between Augmented Natural Languages"
- "Autoregressive Structured Prediction with Language Models"
These approaches tackle the token classification task as a sequence-to-sequence task.
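To illustrate the seq2seq framing, here is a minimal sketch of how BIO-tagged NER data could be rendered as an augmented target string in the spirit of the TANL paper above. The exact bracket format below is an illustrative assumption, not the paper's verbatim specification:

```python
# Sketch: casting token classification as sequence-to-sequence, TANL-style.
# Entities are wrapped as "[ span | LABEL ]"; this format is an assumption
# for illustration purposes only.

def to_augmented(tokens, bio_tags):
    """Render BIO-tagged tokens as an augmented target string."""
    out = []
    entity, label = [], None

    def flush():
        nonlocal entity, label
        if entity:
            out.append("[ " + " ".join(entity) + " | " + label + " ]")
            entity, label = [], None

    for token, tag in zip(tokens, bio_tags):
        if tag.startswith("B-"):
            flush()
            entity, label = [token], tag[2:]
        elif tag.startswith("I-") and entity:
            entity.append(token)
        else:  # "O" tag: close any open entity, emit the token as-is
            flush()
            out.append(token)
    flush()
    return " ".join(out)

print(to_augmented(
    ["Angela", "Merkel", "besucht", "Berlin", "."],
    ["B-PER", "I-PER", "O", "B-LOC", "O"],
))
# → [ Angela Merkel | PER ] besucht [ Berlin | LOC ] .
```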
However, it is also possible to use only the encoder of a T5 model for downstream tasks, as proposed in the "EncT5" paper. Note that the EncT5 architecture was not evaluated on token classification tasks.
This repository uses the Flair library to perform encoder-only fine-tuning on the GermEval 2014 NER dataset. The recently released German T5 models are used as LM backbones.
We perform a basic hyper-parameter search and report the micro F1-score on the test split, averaged over 5 runs (with different seeds). Scores in brackets indicate results on the development split.
| Model Size | Configuration | Run 1 | Run 2 | Run 3 | Run 4 | Run 5 | Avg. |
|---|---|---|---|---|---|---|---|
| Small | bs16-e10-lr0.00011 | (87.24) / 85.53 | (86.40) / 85.63 | (86.50) / 85.47 | (86.32) / 85.57 | (86.77) / 85.38 | (86.65) / 85.52 |
| Large | bs16-e10-lr0.00011 | (87.16) / 86.46 | (87.07) / 85.76 | (87.46) / 85.57 | (87.05) / 86.91 | (87.15) / 86.11 | (87.18) / 86.16 |
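The "Avg." column above can be reproduced by averaging the per-seed scores and rounding to two decimals, as this short sketch shows for the Small model:

```python
# Reproduce the "Avg." column of the results table: mean micro F1 over the
# 5 seeded runs, rounded to two decimals. Dev scores are the bracketed
# values, test scores follow the slash.

def avg(scores):
    return round(sum(scores) / len(scores), 2)

small_dev = [87.24, 86.40, 86.50, 86.32, 86.77]
small_test = [85.53, 85.63, 85.47, 85.57, 85.38]

print(f"({avg(small_dev)}) / {avg(small_test)}")
# → (86.65) / 85.52, matching the reported average
```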
For the hyper-parameter search, the script `flair-fine-tuner.py` is used in combination with a configuration file (passed as argument). All configuration files used for the experiments are located under `./configs`.
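The configuration name `bs16-e10-lr0.00011` in the table encodes batch size 16, 10 epochs, and learning rate 0.00011. A corresponding configuration file might look roughly like the following sketch (the key names here are illustrative assumptions; refer to the actual files under `./configs` for the real schema):

```json
{
  "batch_size": 16,
  "epochs": 10,
  "learning_rate": 0.00011,
  "seeds": [1, 2, 3, 4, 5]
}
```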
- A fine-tuned DistilBERT model reports (86.84) / 85.62.
- GELECTRA Base reports 86.02 on the test set.
- The current SOTA is GELECTRA Large with 88.95 on the test set.
The latest Flair version (commit 6da65a4) is used for all experiments.
All models are fine-tuned on A10 (24GB) instances from Lambda Cloud.