⚠️ This is a Dataset, not a Model. The following metrics do not apply: FNI Score, Deployment Options, Model Architecture.

📊 helpsteer2
by nvidia · Dataset · FNI 22.4

"--- license: cc-by-4.0 language: - en pretty_name: HelpSteer2 size_categories: - 10K"

Best Scenarios
- ✨ Data Science

Technical Constraints
- Generic Use
- Size
- Rows

Parquet Format · 435 Likes

Capabilities
- ✅ Data Science

🔬 Deep Dive


๐Ÿ› ๏ธ Technical Profile

โšก Hardware & Scale

Size
-
Total Rows
-
Files
6

๐Ÿง  Training & Env

Format
Parquet
Cleaning
Raw

๐ŸŒ Cloud & Rights

Source
huggingface
License
CC-BY-4.0

๐Ÿ‘๏ธ Data Preview

feature label split
example_text_1 0 train
example_text_2 1 train
example_text_3 0 test
example_text_4 1 validation
example_text_5 0 train
Showing 5 sample rows. Real-time preview requires login.
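The sample rows above can be grouped by their `split` column, mirroring how a dataset loader exposes train/test/validation partitions. A minimal pure-Python sketch using only the five placeholder rows shown (illustrative data, not the real dataset contents):

```python
from collections import defaultdict

# Placeholder rows copied from the preview table above
rows = [
    {"feature": "example_text_1", "label": 0, "split": "train"},
    {"feature": "example_text_2", "label": 1, "split": "train"},
    {"feature": "example_text_3", "label": 0, "split": "test"},
    {"feature": "example_text_4", "label": 1, "split": "validation"},
    {"feature": "example_text_5", "label": 0, "split": "train"},
]

# Group rows by split, as a dataset loader would expose them
splits = defaultdict(list)
for row in rows:
    splits[row["split"]].append(row)

print({name: len(group) for name, group in splits.items()})
# → {'train': 3, 'test': 1, 'validation': 1}
```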

🧬 Schema & Configs

Fields

feature: string
label: int64
split: string
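Rows can be validated against the three declared fields before use. A hedged, stdlib-only sketch: the `SCHEMA` mapping mirrors the field list above (Parquet `int64` maps to Python `int`), and `check_row` is an illustrative helper, not part of any library:

```python
# Schema from the field list above: feature/string, label/int64, split/string
SCHEMA = {"feature": str, "label": int, "split": str}

def check_row(row: dict) -> bool:
    """Return True iff the row has exactly the declared fields with the declared types."""
    return row.keys() == SCHEMA.keys() and all(
        isinstance(row[name], typ) for name, typ in SCHEMA.items()
    )

print(check_row({"feature": "example_text_1", "label": 0, "split": "train"}))    # → True
print(check_row({"feature": "example_text_1", "label": "0", "split": "train"}))  # → False
```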

Dataset Card

HelpSteer2: Open-source dataset for training top-performing reward models

HelpSteer2 is an open-source Helpfulness Dataset (CC-BY-4.0) that supports aligning models to become more helpful, factually correct and coherent, while being adjustable in terms of the complexity and verbosity of its responses. This dataset has been created in partnership with Scale AI.

When used to tune a Llama 3.1 70B Instruct Model, we achieve 94.1% on RewardBench, which makes it the best Reward Model as of 1 Oct 2024. This reward model is available on HuggingFace in both .nemo format at Llama-3.1-Nemotron-70B-Reward and HF-compatible format at Llama-3.1-Nemotron-70B-Reward-HF.

Using this reward model for RLHF (specifically, REINFORCE), we were able to align a Llama-3.1-70B-Instruct model to reach AlpacaEval 2 LC of 57.6, Arena Hard of 85.0 and GPT-4-Turbo MT-Bench of 8.98, which are known to be predictive of LMSys Chatbot Arena Elo. This Instruct model is available at Llama-3.1-Nemotron-70B-Instruct as a .nemo model and at Llama-3.1-Nemotron-70B-Instruct-HF as a HF Transformers model.

As of 1 Oct 2024, this aligned model is #1 on all three automatic alignment benchmarks, edging out strong frontier models such as GPT-4o and Claude 3.5 Sonnet.

See details in the HelpSteer2-Preference paper at https://arxiv.org/abs/2410.01257 - as a preview, this model can correctly answer the question ``How many r in strawberry?`` without specialized prompting or additional reasoning tokens:

```
A sweet question!
Let's count the "R"s in "strawberry":
• S
• T
• R
• A
• W
• B
• E
• R
• R
• Y

There are 3 "R"s in the word "strawberry".
```
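The count in the quoted answer can be verified mechanically with a one-line check in Python:

```python
# Case-insensitive count of the letter "r" in "strawberry"
word = "strawberry"
print(word.lower().count("r"))  # → 3
```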

The Reward Model was trained using the open-source NeMo Aligner.

HelpSteer2 is a follow-up to the popular HelpSteer dataset and we recommend using HelpSteer2 instead of HelpSteer.

HelpSteer2 Paper: HelpSteer2: Open-source dataset for training top-performing reward models

RewardBench Primary Dataset LeaderBoard

As of 1 Oct 2024, Llama-3.1-Nemotron-70B-Reward performs best Overall on RewardBench, with strong performance in the Chat, Safety and Reasoning categories among the models below.

| Model | Type of Data Used For Training | Overall | Chat | Chat-Hard | Safety | Reasoning |
|:---|:---|:---|:---|:---|:---|:---|
