mt5-small-prompted-germanquad-1
This model is a fine-tuned version of google/mt5-small on the philschmid/prompted-germanquad dataset, a prompted dataset built with the BigScience PromptSource library. The dataset is a copy of germanquad with the SQuAD template applied and translated into German (TEMPLATE).
This is a first test of whether mt5 models can be fine-tuned to solve tasks similar to those handled by BigScience's T0, but for the German language.
It achieves the following results on the evaluation set:
- Loss: 1.6835
- Rouge1: 27.7309
- Rouge2: 18.7311
- Rougel: 27.4704
- Rougelsum: 27.4818
Model description
More information needed
Intended uses & limitations
More information needed
Training and evaluation data
More information needed
Training procedure
Training hyperparameters
The following hyperparameters were used during training:
- learning_rate: 5.6e-05
- train_batch_size: 8
- eval_batch_size: 8
- seed: 42
- optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
- lr_scheduler_type: linear
- lr_scheduler_warmup_steps: 500
- num_epochs: 7
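To make the `linear` scheduler with 500 warmup steps concrete, here is a minimal sketch of the learning rate at a given optimizer step. It assumes the usual linear-warmup, linear-decay rule (the behaviour of `get_linear_schedule_with_warmup` in Transformers) and takes the total step count of 122472 from the final row of the training results. The function name `linear_schedule_lr` is illustrative and is not part of the training script.

```python
def linear_schedule_lr(step, peak_lr=5.6e-05, warmup_steps=500, total_steps=122472):
    """Learning rate at `step` under linear warmup followed by linear decay.

    Assumption: warmup rises from 0 to `peak_lr` over `warmup_steps`,
    then the rate decays linearly to 0 at `total_steps`.
    """
    if step < warmup_steps:
        # Warmup phase: scale the peak rate by the fraction of warmup completed.
        return peak_lr * step / max(1, warmup_steps)
    # Decay phase: scale the peak rate by the fraction of post-warmup steps left.
    remaining = max(0, total_steps - step)
    return peak_lr * remaining / max(1, total_steps - warmup_steps)


if __name__ == "__main__":
    # Print the rate at the start, mid-warmup, peak, and end of each epoch.
    for step in (0, 250, 500, 17496, 69984, 122472):
        print(step, linear_schedule_lr(step))
```

Under these assumptions the rate peaks at step 500 and reaches zero at the last step of epoch 7.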
Training results
| Training Loss | Epoch | Step | Validation Loss | Rouge1 | Rouge2 | Rougel | Rougelsum |
|---|---|---|---|---|---|---|---|
| 3.3795 | 1.0 | 17496 | 2.0693 | 15.8652 | 9.2569 | 15.6237 | 15.6142 |
| 2.3582 | 2.0 | 34992 | 1.9057 | 21.9348 | 14.0057 | 21.6769 | 21.6825 |
| 2.1809 | 3.0 | 52488 | 1.8143 | 24.3401 | 16.0354 | 24.0862 | 24.0914 |
| 2.0721 | 4.0 | 69984 | 1.7563 | 25.8672 | 17.2442 | 25.5854 | 25.6051 |
| 2.0004 | 5.0 | 87480 | 1.7152 | 27.0275 | 18.0548 | 26.7561 | 26.7685 |
| 1.9531 | 6.0 | 104976 | 1.6939 | 27.4702 | 18.5156 | 27.2027 | 27.2107 |
| 1.9218 | 7.0 | 122472 | 1.6835 | 27.7309 | 18.7311 | 27.4704 | 27.4818 |
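The step counts in the results are consistent with 17496 optimizer steps per epoch, and the Rouge1 gains shrink with each epoch. The short sketch below checks both using values copied from the results. The helper names are illustrative. The implied number of training examples per epoch additionally assumes a single device and no gradient accumulation, which this card does not state.

```python
# (epoch, step, Rouge1) copied from the training results.
RESULTS = [
    (1, 17496, 15.8652),
    (2, 34992, 21.9348),
    (3, 52488, 24.3401),
    (4, 69984, 25.8672),
    (5, 87480, 27.0275),
    (6, 104976, 27.4702),
    (7, 122472, 27.7309),
]
TRAIN_BATCH_SIZE = 8  # from the hyperparameter list


def steps_per_epoch(results):
    """Return the constant steps-per-epoch, or raise if any row disagrees."""
    first_epoch, first_step, _ = results[0]
    per_epoch = first_step // first_epoch
    for epoch, step, _ in results:
        if step != epoch * per_epoch:
            raise ValueError(f"epoch {epoch}: expected {epoch * per_epoch}, got {step}")
    return per_epoch


def rouge1_gains(results):
    """Epoch-over-epoch Rouge1 improvement, rounded to four decimals."""
    scores = [score for _, _, score in results]
    return [round(later - earlier, 4) for earlier, later in zip(scores, scores[1:])]


if __name__ == "__main__":
    per_epoch = steps_per_epoch(RESULTS)
    print("steps per epoch:", per_epoch)
    # Assumption: one device, no gradient accumulation; the last batch may be partial.
    print("at most", per_epoch * TRAIN_BATCH_SIZE, "examples per epoch")
    print("Rouge1 gains:", rouge1_gains(RESULTS))
```

The shrinking gains suggest that most of the improvement arrives in the first few epochs, although the validation loss is still decreasing at epoch 7.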
Framework versions
- Transformers 4.14.1
- Pytorch 1.10.1+cu102
- Datasets 1.16.1
- Tokenizers 0.10.3