Date | Model | Contributors | #Params | Input Length | Score (Avg.) | GvRp | SSFD | QMsm | SQAL | Qspr | Nrtv | QALT | MuSQ | SpDg | BkSS |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
05/23 | GPT-4 | ZeroSCROLLS team | - | 8K | 41.67 | 26.3 | 17.3 | 18.5 | 22.6 | 50.7 | 27.6 | 89.2 | 41.1 | 62.8 | 60.5 |
05/23 | Claude | ZeroSCROLLS team | - | 8K | 39.07 | 24.2 | 16.1 | 14.6 | 21.0 | 52.3 | 32.6 | 84.8 | 36.1 | 61.6 | 47.4 |
07/23 | Llama 2 Long | Meta | 70B | 16K | 37.71 | 26.0 | 15.0 | 20.0 | 20.9 | 52.0 | 31.7 | 82.6 | 27.3 | 55.5 | 46.2 |
05/23 | ChatGPT | ZeroSCROLLS team | - | 4K | 34.02 | 21.3 | 16.1 | 15.6 | 20.4 | 49.3 | 25.1 | 66.6 | 27.1 | 49.1 | 49.8 |
05/23 | DaVinci003 | ZeroSCROLLS team | - | 4K | 33.74 | 21.7 | 16.1 | 16.9 | 22.0 | 52.7 | 24.6 | 69.0 | 33.5 | 31.3 | 49.5 |
05/23 | Flan-UL2 | ZeroSCROLLS team | 20B | 8K | 30.62 | 16.1 | 11.5 | 13.6 | 5.7 | 56.9 | 25.5 | 75.6 | 51.3 | 36.0 | 14.0 |
05/23 | Flan-T5 | ZeroSCROLLS team | 11B | 8K | 29.90 | 17.6 | 7.8 | 11.0 | 8.0 | 48.3 | 19.3 | 75.2 | 46.8 | 48.7 | 16.4 |
09/24 | simpo_llama3 | zecheng | - | - | 29.85 | 21.1 | 13.5 | 16.7 | 18.1 | 50.1 | 24.8 | 52.2 | 19.9 | 46.1 | 36.1 |
08/23 | Stable Beluga 7B | yuzhenm | 7B | 4K | 23.01 | 13.0 | 13.8 | 14.6 | 17.8 | 28.3 | 15.9 | 51.8 | 18.5 | 46.9 | 9.5 |
04/24 | por_ms | 1 | 1 | 1 | 20.45 | 21.8 | 13.6 | 15.5 | 21.0 | 21.2 | 17.7 | 47.8 | 9.3 | 33.4 | 3.4 |
09/24 | Llama 3.3-3B | Tam Doan | 3B | - | 2.61 | 26.1 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 |
11/23 | GPT4 Turbo | Tam Doan | - | - | 2.40 | 24.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 |
10/23 | graph | Tam Doan | - | - | 2.37 | 23.7 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 |
10/23 | GPT4 | Tam Doan | - | - | 2.29 | 22.9 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 |
10/24 | RC_Llama3.2-3B Instruct | Tam Doan | 3B | - | 2.23 | 22.3 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 |
05/23 | Naive | ZeroSCROLLS team | - | - | 19.64 | 22.6 | 6.7 | 6.7 | 10.5 | 6.1 | 2.1 | 26.6 | 20.0 | 45.0 | 50.0 |
04/24 | llama2_H2O_final | zwang | - | - | 19.41 | 15.4 | 13.2 | 14.3 | 18.3 | 20.5 | 15.0 | 43.2 | 9.5 | 40.8 | 3.8 |
04/24 | 3-4-open | 1 | 1 | 4k | 19.26 | 22.6 | 13.7 | 15.4 | 21.1 | 24.1 | 17.6 | 43.8 | 8.1 | 24.4 | 1.8 |
04/24 | llama2_7B_chat_ours | zwang33 | 7B | - | 18.96 | 15.2 | 11.9 | 14.3 | 17.9 | 19.7 | 15.1 | 42.8 | 9.9 | 39.0 | 3.6 |
09/23 | Stable Beluga 13B | yuzhenm | 13B | 4K | 16.84 | 6.0 | 7.4 | 12.8 | 13.3 | 20.0 | 13.4 | 47.8 | 25.0 | 14.8 | 7.9 |
08/24 | llama2-7b-chat | 1 | 1 | 4k | 15.47 | 16.4 | 7.3 | 12.0 | 15.7 | 14.1 | 10.3 | 22.2 | 10.9 | 44.0 | 1.7 |
03/24 | llama2chat_bestbase_1e-4_7top3_bs8_ratio1_gate_v3_3-pretrain-4 | 1 | 1 | 1 | 14.99 | 10.6 | 13.0 | 15.6 | 18.9 | 23.1 | 17.7 | 40.0 | 10.9 | 0.0 | 0.0 |
03/24 | 1 | 1 | 1 | 1 | 14.74 | 10.7 | 13.4 | 15.9 | 19.2 | 21.7 | 18.7 | 39.0 | 8.8 | 0.0 | 0.0 |
05/23 | T0pp | ZeroSCROLLS team | 11B | 8K | 14.34 | 7.1 | 9.6 | 7.2 | 3.9 | 25.0 | 18.7 | 21.4 | 35.3 | 15.2 | 0.0 |
The dataset abbreviations stand for: GvRp = GovReport, SSFD = SummScreenFD, QMsm = QMSum, SQAL = SQuALITY, Qspr = Qasper, Nrtv = NarrativeQA, QALT = QuALITY, MuSQ = MuSiQue, SpDg = SpaceDigest, BkSS = BookSumSort.
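The Score (Avg.) column appears to be the plain arithmetic mean of the ten per-task scores (the tabulated averages match to within display rounding). A minimal sketch, not the official scoring code, reproducing Claude's row:

```python
# Sketch (not the leaderboard's own code): the Score (Avg.) column matches
# the arithmetic mean of the ten per-task scores, up to display rounding.
claude_scores = [24.2, 16.1, 14.6, 21.0, 52.3, 32.6, 84.8, 36.1, 61.6, 47.4]

avg = sum(claude_scores) / len(claude_scores)
print(round(avg, 2))  # → 39.07, as shown in the table
```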
Metrics details
- Summarization tasks (GovReport, SummScreenFD, QMSum, and SQuALITY) are scored by the geometric mean of ROUGE-1/2/L
- Qasper, NarrativeQA, and MuSiQue are scored by F1
- QuALITY is scored by accuracy
- SpaceDigest is scored with the exponential similarity described in the paper
- BookSumSort is scored by the concordance index
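Two of these aggregations can be sketched briefly. The function names and input values below are illustrative (this is not the benchmark's scoring code): the geometric mean simply multiplies the three ROUGE scores and takes the cube root, and a concordance index counts the fraction of item pairs whose relative order agrees with the gold ordering.

```python
from itertools import combinations

def rouge_geometric_mean(r1: float, r2: float, rl: float) -> float:
    """Geometric mean of ROUGE-1/2/L, as used for the summarization tasks."""
    return (r1 * r2 * rl) ** (1.0 / 3.0)

def concordance_index(predicted_order, gold_order) -> float:
    """Fraction of item pairs ranked in the same relative order as the gold
    ranking (the idea behind BookSumSort's concordance index)."""
    gold_pos = {item: i for i, item in enumerate(gold_order)}
    pairs = list(combinations(predicted_order, 2))
    concordant = sum(1 for a, b in pairs if gold_pos[a] < gold_pos[b])
    return concordant / len(pairs)

# Hypothetical ROUGE-1/2/L values for one generated summary:
print(round(rouge_geometric_mean(0.40, 0.15, 0.35), 3))  # → 0.276

# A prediction that swaps two of four summaries gets 5 of 6 pairs right:
ci = concordance_index(["s1", "s3", "s2", "s4"], ["s1", "s2", "s3", "s4"])
print(round(ci, 3))  # → 0.833
```

Note the geometric mean penalizes a summary that scores well on one ROUGE variant but poorly on another more heavily than an arithmetic mean would.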