ifeval

by google Dataset

License: apache-2.0
Task categories: text-generation
Language: en
Pretty name: IFEval

Technical Profile

Hardware & Scale

Size: not listed
Total Rows: not listed
Files: 3

Training & Env

Format: Parquet
Cleaning: Raw

Cloud & Rights

Source: huggingface
License: Apache-2.0

Dataset Card

Dataset Card for IFEval

Dataset Description

  • Repository: https://github.com/google-research/google-research/tree/master/instruction_following_eval
  • Paper: https://huggingface.co/papers/2311.07911
  • Leaderboard: https://huggingface.co/spaces/open-llm-leaderboard/open_llm_leaderboard
  • Point of Contact: Le Hou

Dataset Summary

This dataset contains the prompts used in the Instruction-Following Eval (IFEval) benchmark for large language models. It contains around 500 prompts built from "verifiable instructions" such as "write in more than 400 words" and "mention the keyword of AI at least 3 times", which can be checked by heuristics. To load the dataset, run:

```python
from datasets import load_dataset

ifeval = load_dataset("google/IFEval")
```
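
To make "verified by heuristics" concrete, here is a minimal sketch of two checks modeled on the quoted examples. The function names and the whitespace-based word count are illustrative assumptions, not the checkers shipped in the official `instruction_following_eval` repository, which may count words and match keywords differently.

```python
import re


def has_more_than_n_words(response: str, n: int) -> bool:
    """Rough check for "write in more than N words".

    Assumption: a word is any whitespace-separated token.
    """
    return len(response.split()) > n


def mentions_keyword_at_least(response: str, keyword: str, times: int) -> bool:
    """Rough check for "mention the keyword K at least T times".

    Assumption: case-insensitive, whole-word matches only.
    """
    pattern = r"\b" + re.escape(keyword) + r"\b"
    return len(re.findall(pattern, response, flags=re.IGNORECASE)) >= times


response = "AI is everywhere. Many teams use AI daily, and AI tools keep improving."
print(has_more_than_n_words(response, 400))          # False
print(mentions_keyword_at_least(response, "AI", 3))  # True
```

Both checks are deterministic string tests, which is what lets a response be scored without a human or model judge.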

Supported Tasks and Leaderboards

The IFEval dataset is designed for evaluating chat or instruction fine-tuned language models and is one of the core benchmarks used in the Open LLM Leaderboard.

Languages

The data in IFEval are in English (BCP-47 en).

Dataset Structure

Data Instances

An example of the train split looks as follows:

```
{
    "key": 1000,
    "prompt": 'Write a 300+ word summary of the wikipedia page "https://en.wikipedia.org/wiki/Raymond_III,_Count_of_Tripoli". Do not use any commas and highlight at least 3 sections that has titles in markdown format, for example highlighted section part 1, highlighted section part 2, highlighted section part 3.',
    "instruction_id_list": [
        "punctuation:no_comma",
        "detectable_format:number_highlighted_sections",
        "length_constraints:number_words",
    ],
    "kwargs": [
        {
            "num_highlights": None,
            "relation": None,
            "num_words": None,
            "num_placeholders": None,
            "prompt_to_repeat": None,
            "num_bullets": None,
            "section_spliter": None,
            "num_sections": None,
            "capital_relation": None,
            "capital_frequency": None,
            "keywords": None,
            "num_paragraphs": None,
            "language": None,
            "let_relation": None,
            "letter": None,
            "let_frequency": None,
            "end_phrase": None,
            "forbidden_words": None,
            "keyword": None,
            "frequency": None,
            "num_sentences": None,
            "postscript_marker": None,
            "first_word": None,
            "nth_paragraph": None,
        },
        {
            "num_highlights": 3,
            "relation": None,
            "num_words": None,
            "num_placeholders": None,
            "prompt_to_repeat": None,
            "num_bullets": None,
            "section_spliter": None,
            "num_sections": None,
            "capital_relation": None,
            # (example truncated in the source)
```
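
In the example above, each entry in `kwargs` appears to line up by position with an entry in `instruction_id_list`, with unused parameters set to `None`. The sketch below assumes that positional pairing and uses a hand-written, abridged record in the same shape (only two instructions and three parameter keys, not copied from the dataset); `active_kwargs` is a hypothetical helper name.

```python
# Abridged, hand-written record in the shape shown above (illustrative only).
record = {
    "key": 1000,
    "instruction_id_list": [
        "punctuation:no_comma",
        "detectable_format:number_highlighted_sections",
    ],
    "kwargs": [
        {"num_highlights": None, "relation": None, "num_words": None},
        {"num_highlights": 3, "relation": None, "num_words": None},
    ],
}


def active_kwargs(rec):
    """Pair each instruction id with its non-None parameters.

    Assumption: kwargs[i] belongs to instruction_id_list[i].
    """
    pairs = []
    for instruction_id, params in zip(rec["instruction_id_list"], rec["kwargs"]):
        used = {name: value for name, value in params.items() if value is not None}
        pairs.append((instruction_id, used))
    return pairs


for instruction_id, params in active_kwargs(record):
    print(instruction_id, params)
# punctuation:no_comma {}
# detectable_format:number_highlighted_sections {'num_highlights': 3}
```

Dropping the `None` entries makes it easier to see which parameters a given instruction actually uses.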
