
Model

Ca Reward 3b Ja

Name: Ca Reward 3b Ja
Author: cyberagent


Nexus Index

38.4 Top 100%

S: Semantic 50

A: Authority 0

P: Popularity 6

R: Recency 90

Q: Quality 65

Tech Context

3 Params

4.096K Ctx

Vital Performance

62 DL / 30D

0.0%


Model Information Summary
Entity Passport
Registry ID	hf-model--cyberagent--ca-reward-3b-ja
License	Apache-2.0
Provider	huggingface


Compute Threshold

~3.5GB VRAM


* Static estimation for 4-Bit Quantization.


Cite this model

Academic & Research Attribution

BibTeX

@misc{hf_model__cyberagent__ca_reward_3b_ja,
  author = {cyberagent},
  title = {Ca Reward 3b Ja Model},
  year = {2026},
  howpublished = {\url{https://huggingface.co/cyberagent/ca-reward-3b-ja}},
  note = {Accessed via Free2AITools Knowledge Fortress}
}

APA Style

cyberagent. (2026). Ca Reward 3b Ja [Model]. Free2AITools. https://huggingface.co/cyberagent/ca-reward-3b-ja


Quick Commands

🦙 Ollama Run

ollama run ca-reward-3b-ja

🤗 HF Download

huggingface-cli download cyberagent/ca-reward-3b-ja


Technical Deep Dive

cyberagent/ca-reward-3b-ja

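We release a model developed with the goal of building a lightweight Japanese reward model.
For both existing instructions and newly synthesized instructions, we generated multiple responses and assigned pseudo-preference labels with LLM-as-a-judge, producing a pseudo-preference dataset.
By training a model to classify this pseudo-preference dataset, we obtained a reward model that quantifies how preferable a response is to a given instruction.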
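The card says only that the reward model was trained to classify the pseudo-preference pairs; it does not state the loss. A common choice for pairwise reward models is the Bradley-Terry objective, which pushes the score of the preferred response above the score of the rejected one. The sketch below assumes that objective and uses plain Python floats in place of model outputs; the names `pairwise_loss` and `mean_pairwise_loss` are hypothetical.

```python
import math
from typing import Iterable, Tuple


def pairwise_loss(score_chosen: float, score_rejected: float) -> float:
    """Bradley-Terry style loss: -log(sigmoid(score_chosen - score_rejected)).

    Assumption: the card does not confirm that this objective was used.
    """
    margin = score_chosen - score_rejected
    if margin >= 0:
        # -log(sigmoid(m)) = log(1 + exp(-m)); exp(-m) <= 1 here, so no overflow
        return math.log1p(math.exp(-margin))
    # For m < 0, rewrite as -m + log(1 + exp(m)) to keep exp() bounded
    return -margin + math.log1p(math.exp(margin))


def mean_pairwise_loss(pairs: Iterable[Tuple[float, float]]) -> float:
    """Average loss over (score_chosen, score_rejected) pairs."""
    losses = [pairwise_loss(c, r) for c, r in pairs]
    return sum(losses) / len(losses)


# A correctly ordered pair yields a smaller loss than a reversed one.
print(pairwise_loss(2.0, -1.0), pairwise_loss(-1.0, 2.0))
```

Minimizing this loss amounts to fitting a logistic classifier on the score difference, which is one plausible reading of "training a model to classify the pseudo-preference dataset".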

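Evaluation

We collected existing datasets in which humans had assigned preference labels (preferred vs. non-preferred response) and evaluated the accuracy with which each model classifies those labels.
For comparison, the table also lists the accuracy of existing reward models and of LLM-as-a-judge using gpt-4o-2024-08-06.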

Model (accuracy)	HelpSteer3	llm-jp-chatbot-arena
OpenAssistant/reward-model-deberta-v3-large-v2	0.5124	0.5610
Skywork/Skywork-Reward-Gemma-2-27B-v0.2	0.6854	0.5166
cyberagent/ca-reward-3b-ja	0.7032	0.5366
cyberagent/calm3-22b-chat-selfimprove-experimental	0.7216	0.6075
gpt-4o-2024-08-06	0.7845	0.6842
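For a reward model, accuracy on human preference labels is naturally read as the fraction of pairs in which the preferred response receives the higher score. The card does not spell out the exact procedure (for example, how equal scores are counted), so the sketch below assumes a strict comparison and counts ties as errors; `preference_accuracy` is a hypothetical helper and the scores are made up.

```python
from typing import Sequence, Tuple


def preference_accuracy(pairs: Sequence[Tuple[float, float]]) -> float:
    """Fraction of (preferred_score, other_score) pairs ranked correctly.

    Assumption: a pair counts as correct only if the preferred response
    scores strictly higher; equal scores count as errors.
    """
    if not pairs:
        raise ValueError("need at least one pair")
    correct = sum(1 for preferred, other in pairs if preferred > other)
    return correct / len(pairs)


# Hypothetical scores for four labeled pairs: three are ranked correctly.
scores = [(1.26, -2.92), (0.4, 0.1), (-0.5, 0.3), (2.0, 1.9)]
print(preference_accuracy(scores))  # 0.75
```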

Environment setup

Development environment: Ubuntu 24.04.2 LTS
Installing the libraries

bash

pip install -U -q pip
pip install -q torch==2.8.0
pip install -q transformers==4.51.3
pip install -q accelerate==0.29.3
pip install -q sentencepiece==0.2.0

Checking the versions

python

import torch, transformers, accelerate, tokenizers, sentencepiece
print(torch.__version__)
print(transformers.__version__)
print(accelerate.__version__)
print(tokenizers.__version__)
print(sentencepiece.__version__)

# 2.8.0+cu128
# 4.51.3
# 0.29.3
# 0.21.4
# 0.2.0

Code sample

python

import torch
from transformers import AutoModelForSequenceClassification, AutoTokenizer

model = AutoModelForSequenceClassification.from_pretrained(
    "cyberagent/ca-reward-3b-ja",
    device_map="auto",
    num_labels=1,
)
tokenizer = AutoTokenizer.from_pretrained("cyberagent/ca-reward-3b-ja")

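# Prompt: "Please suggest meals that make it easy to get nutrition."
prompt = """手軽に栄養を補給できる食事を教えてください。"""
# Response 1: recommends smoothies, boiled eggs with salad, and natto rice, in a caring tone.
response1 = """栄養補給が手軽にできるものとしては、野菜たっぷりのスムージー、ゆで卵とサラダ、納豆ご飯などがおすすめです。特に納豆は手間なく良質なタンパク質が摂れますよ。お身体を大切にしてくださいね。"""
# Response 2: says frozen food, or a convenience-store bento when short on time, is fine.
response2 = """手軽な栄養補給なら冷凍食品でいいと思います。レンジで温めるだけだし、時間がない時はコンビニ弁当でも悪くないですよ。"""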

chat1 = [{"role": "user", "content": prompt}, {"role": "assistant", "content": response1}]
chat2 = [{"role": "user", "content": prompt}, {"role": "assistant", "content": response2}]

chat1_formatted = tokenizer.apply_chat_template(chat1, tokenize=False)
chat2_formatted = tokenizer.apply_chat_template(chat2, tokenize=False)
chat1_tokenized = tokenizer(chat1_formatted, return_tensors="pt", max_length=4096, truncation=True, padding="max_length",).to(model.device)
chat2_tokenized = tokenizer(chat2_formatted, return_tensors="pt", max_length=4096, truncation=True, padding="max_length",).to(model.device)

with torch.no_grad():
    chat1_score = model(**chat1_tokenized).logits.item()
    chat2_score = model(**chat2_tokenized).logits.item()

print(f"Score for response 1: {chat1_score}")
print(f"Score for response 2: {chat2_score}")

# Score for response 1: 1.2595189809799194
# Score for response 2: -2.917454242706299
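Reward scores are only meaningful relative to one another. If the model was trained with a Bradley-Terry style pairwise objective (an assumption; the card does not say), the difference between two scores can be read as a preference probability through the logistic function. The sketch below applies this to the two scores printed by the sample; `preference_probability` is a hypothetical name.

```python
import math


def preference_probability(score_a: float, score_b: float) -> float:
    """P(response A preferred over B) = sigmoid(score_a - score_b).

    Assumes a Bradley-Terry interpretation of the reward scores.
    """
    d = score_a - score_b
    if d >= 0:
        return 1.0 / (1.0 + math.exp(-d))
    # Equivalent form for negative d that avoids overflow in exp(-d)
    e = math.exp(d)
    return e / (1.0 + e)


# Scores taken from the sample output above.
chat1_score = 1.2595189809799194
chat2_score = -2.917454242706299
p = preference_probability(chat1_score, chat2_score)
print(f"P(response 1 preferred) = {p:.3f}")  # about 0.985
```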

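Base model

We used sbintuitions/sarashina2.2-3b-instruct-v0.1 as the base model.

Training dataset

We used instructions from ChatBotArena together with an in-house dataset.
Using cyberagent/calm3-22b-chat-selfimprove-experimental as an LLM-as-a-judge, we assigned pseudo-preference labels to the constructed data to produce the pseudo-preference dataset.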
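The card does not say how the judge was prompted or how several responses per instruction were turned into training pairs. The sketch below assumes a pairwise judge that, given an instruction and two responses, answers which one it prefers, and forms (chosen, rejected) records from every pair of responses. `Judge`, `build_preference_pairs`, `toy_judge`, and the record fields are hypothetical; in practice the judge would wrap a call to a model such as cyberagent/calm3-22b-chat-selfimprove-experimental.

```python
from itertools import combinations
from typing import Callable, Dict, List

# Hypothetical judge interface: (instruction, response_a, response_b) -> "a", "b", or anything else for no decision.
Judge = Callable[[str, str, str], str]


def build_preference_pairs(instruction: str, responses: List[str], judge: Judge) -> List[Dict[str, str]]:
    """Turn judge verdicts over all response pairs into chosen/rejected records."""
    records = []
    for resp_a, resp_b in combinations(responses, 2):
        verdict = judge(instruction, resp_a, resp_b)
        if verdict == "a":
            chosen, rejected = resp_a, resp_b
        elif verdict == "b":
            chosen, rejected = resp_b, resp_a
        else:
            # Skip pairs the judge did not decide (ties, unparseable output, ...)
            continue
        records.append({"prompt": instruction, "chosen": chosen, "rejected": rejected})
    return records


def toy_judge(instruction: str, a: str, b: str) -> str:
    """Stand-in judge for illustration only: prefers the longer response."""
    if len(a) == len(b):
        return "tie"
    return "a" if len(a) > len(b) else "b"


pairs = build_preference_pairs(
    "手軽に栄養を補給できる食事を教えてください。",
    ["納豆ご飯", "冷凍食品", "野菜たっぷりのスムージー"],
    toy_judge,
)
for rec in pairs:
    print(rec["chosen"], ">", rec["rejected"])
```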

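Datasets used to evaluate the classification performance of the reward model alone

Evaluation dataset	Samples	Notes
llm-jp/llm-jp-chatbot-arena-conversations	448	Samples whose annotations judged both responses bad, both good, or comparable (winner column = tie, tie(both good), or tie(both bad)) were excluded. Because the model was trained only on single-turn data, only single-turn samples were used for evaluation.
nvidia/HelpSteer3	283	To measure Japanese-language performance, only Japanese samples (language column = japanese) were used. Because the dataset is documented as containing responses generated by Gemma, it was not used for training; the reported score is the average accuracy over its train and validation splits. Because the model was trained only on single-turn data, only single-turn samples were used for evaluation.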
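The filtering rules in the table can be written as simple predicates. The sketch below assumes records shaped roughly like the public datasets: a `winner` field whose decisive values are `model_a` / `model_b` plus a `conversation_a` message list for llm-jp-chatbot-arena-conversations, and a `language` field plus a `context` message list for HelpSteer3. These field names and values are assumptions to check against the actual dataset schemas, and all helper names are hypothetical.

```python
from typing import Any, Callable, Dict, List

Record = Dict[str, Any]

# Assumed decisive values of the arena "winner" column; every tie variant is dropped.
DECISIVE_WINNERS = {"model_a", "model_b"}


def count_user_turns(messages: List[Dict[str, str]]) -> int:
    """Number of user messages in a list of {"role": ..., "content": ...} dicts."""
    return sum(1 for m in messages if m.get("role") == "user")


def keep_arena_sample(rec: Record) -> bool:
    """Arena data: keep decisive, single-turn comparisons (assumed fields)."""
    return rec["winner"] in DECISIVE_WINNERS and count_user_turns(rec["conversation_a"]) == 1


def keep_helpsteer3_sample(rec: Record) -> bool:
    """HelpSteer3: keep Japanese, single-turn samples (assumed fields)."""
    return rec["language"] == "japanese" and count_user_turns(rec["context"]) == 1


def filter_samples(records: List[Record], keep: Callable[[Record], bool]) -> List[Record]:
    return [r for r in records if keep(r)]


user = {"role": "user", "content": "q"}
assistant = {"role": "assistant", "content": "a"}
arena = [
    {"winner": "model_a", "conversation_a": [user, assistant]},
    {"winner": "tie", "conversation_a": [user, assistant]},
    {"winner": "model_b", "conversation_a": [user, assistant, user, assistant]},
]
helpsteer = [
    {"language": "japanese", "context": [user]},
    {"language": "english", "context": [user]},
    {"language": "japanese", "context": [user, assistant, user]},
]
print(len(filter_samples(arena, keep_arena_sample)), len(filter_samples(helpsteer, keep_helpsteer3_sample)))
```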

Training hyperparameters

We trained with learning_rate values of [8e-07, 1e-06, 2.5e-06, 5e-06] and adopted the model trained with learning_rate=5e-06.

model	training type	batchsize	learning_rate	lr_scheduler	num_epoch	max_length	num_of_training_sample	num_of_validation_sample	hardware
ca-reward-3b-ja	full parameter tuning	32	5e-06	linear	1	4096	601,348	8,000	NVIDIA_A100_80GB
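With one epoch over 601,348 training samples at batch size 32 and a linear scheduler peaking at 5e-06, the schedule can be sketched as below. The card does not mention warmup, gradient accumulation, or data-parallel replicas, so this assumes none: the effective batch is 32, the final partial batch is kept, and the rate decays linearly from the peak to zero, matching the zero-warmup shape of `transformers.get_linear_schedule_with_warmup`. `total_steps` and `linear_lr` are hypothetical helpers.

```python
import math


def total_steps(num_samples: int, batch_size: int, epochs: int = 1) -> int:
    """Optimizer steps per run, keeping the final partial batch (assumption)."""
    return math.ceil(num_samples / batch_size) * epochs


def linear_lr(step: int, peak_lr: float, num_steps: int) -> float:
    """Linear decay from peak_lr at step 0 to 0 at num_steps (no warmup assumed)."""
    return peak_lr * max(0.0, (num_steps - step) / num_steps)


steps = total_steps(601_348, 32)
print(steps)  # 18793
for s in (0, steps // 2, steps - 1):
    print(s, linear_lr(s, 5e-06, steps))
```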

Release

v0.1, 2025/08/08

Authors

三橋亮太 (Ryota Mitsuhashi; corresponding author)
Thanks to everyone who gave feedback during development: 陣内佑, 坂本充生, 森村哲郎, 阿部拳之, 蟻生開人, 藤本悠雅, 暮石航大

Citation

tex

@misc{cyberagent-ca-reward-3b-ja,
      title={cyberagent/ca-reward-3b-ja},
      url={https://huggingface.co/cyberagent/ca-reward-3b-ja},
      author={Ryota Mitsuhashi},
      year={2025},
}

License

Apache-2.0

Part of the training data consists of responses from cyberagent/calm2-7b-chat-dpo-experimental, which is licensed under CC-BY-4.0.


📝 Limitations & Considerations

• Benchmark scores may vary based on evaluation methodology and hardware configuration.
• VRAM requirements are estimates; actual usage depends on quantization and batch size.
• FNI scores are relative rankings and may change as new models are added.
⚠ License: Apache-2.0; part of the training data consists of CC-BY-4.0-licensed responses. Verify licensing terms before commercial use.


🛡️ Model Transparency Report

Technical metadata sourced from upstream repositories.

Open Metadata

🆔 Identity & Source

id: hf-model--cyberagent--ca-reward-3b-ja
slug: cyberagent--ca-reward-3b-ja
source: huggingface
author: cyberagent
license: Apache-2.0
tags: safetensors, llama, text-classification, ja, license:apache-2.0, region:us

⚙️ Technical Specs

architecture: null
params billions: 3
context length: 4,096
pipeline tag: text-classification
vram gb: 3.5
vram is estimated: true
vram formula: VRAM ≈ (params * 0.75) + 0.8GB (KV) + 0.5GB (OS)
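The VRAM estimate above can be reproduced from the listed formula. The coefficients (0.75 GB per billion parameters, 0.8 GB for the KV cache, 0.5 GB of overhead) are the page's own static assumptions for 4-bit quantization, not measurements. For comparison, the sketch also computes raw weight storage at 2 bytes per parameter (bf16) and 4 bytes per parameter (fp32), using the nominal 3B count and GB = 1e9 bytes; the code sample above loads the weights unquantized. Both helper names are hypothetical.

```python
def estimate_vram_gb(params_billions: float, gb_per_billion: float = 0.75,
                     kv_cache_gb: float = 0.8, overhead_gb: float = 0.5) -> float:
    """VRAM ≈ params * 0.75 + 0.8 GB (KV) + 0.5 GB (OS), per the page's formula."""
    return params_billions * gb_per_billion + kv_cache_gb + overhead_gb


def weights_only_gb(params_billions: float, bytes_per_param: float) -> float:
    """Raw weight storage in GB (1e9 bytes) at a given precision."""
    return params_billions * bytes_per_param


print(round(estimate_vram_gb(3), 2))  # 3.55
print(weights_only_gb(3, 2), weights_only_gb(3, 4))  # 6 12
```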

📊 Engagement & Metrics

downloads: 62
stars: 0
forks: 0

Data indexed from public sources. Updated daily.
