📄 Paper 2511.10648
by Jiahao Wang


Citations: High Impact
Year: 2025
Venue: arXiv
FNI Rank: Top 18%
Paper Information Summary
Entity Passport
Registry ID: arxiv-paper--2511.10648
Provider: arXiv
📜 Cite this paper

Academic & Research Attribution

BibTeX
@misc{arxiv_paper__2511.10648,
  author = {Jiahao Wang},
  title = {Paper 2511.10648},
  year = {2025},
  howpublished = {\url{https://arxiv.org/abs/2511.10648v1}},
  note = {Accessed via Free2AITools Knowledge Fortress}
}
APA Style
Wang, J. (2025). Paper 2511.10648 [Paper]. Free2AITools. https://arxiv.org/abs/2511.10648v1

🔬 Technical Deep Dive

Full Specifications

âš–ī¸ Nexus Index V2.0

0.0
TOP 18% SYSTEM IMPACT
Semantic (S) 50
Authority (A) 0
Popularity (P) 0
Recency (R) 0
Quality (Q) 0

💬 Index Insight

FNI V2.0 for Paper 2511.10648: Semantic (S:50), Authority (A:0), Popularity (P:0), Recency (R:0), Quality (Q:0).


📝 Executive Summary

"Outcome-reward reinforcement learning (RL) is a common and increasingly significant way to refine the step-by-step reasoning of multimodal large language models (MLLMs). In the multiple-choice setting - a dominant format for multimodal reasoning benchmarks - the paradigm faces a significant yet often overlooked obstacle: unfaithful trajectories that guess the correct option after a faulty chain of thought receive the same reward as genuine reasoning, which is a flaw that cannot be ignored. We..."

❝ Cite Node

@article{Wang2025ArXiv,
  title={ArXiv 2511.10648 Technical Profile},
  author={Jiahao Wang and Weiye Xu and Aijun Yang and Wengang Zhou and Lewei Lu and Houqiang Li and Xiaohua Wang and Jinguo Zhu},
  journal={arXiv preprint arXiv:2511.10648},
  year={2025}
}

👥 Collaborating Minds

Jiahao Wang, Weiye Xu, Aijun Yang, Wengang Zhou, Lewei Lu, Houqiang Li, Xiaohua Wang, Jinguo Zhu

Abstract & Analysis

Outcome-reward reinforcement learning (RL) is a common and increasingly significant way to refine the step-by-step reasoning of multimodal large language models (MLLMs). In the multiple-choice setting, a dominant format for multimodal reasoning benchmarks, the paradigm faces a significant yet often overlooked obstacle: unfaithful trajectories that guess the correct option after a faulty chain of thought receive the same reward as genuine reasoning. We propose Self-Consistency Sampling (SCS) to correct this issue. For each question, SCS (i) introduces small visual perturbations and (ii) performs repeated truncation and resampling of an initial trajectory; agreement among the resulting trajectories yields a differentiable consistency score that down-weights unreliable traces during policy updates. Based on Qwen2.5-VL-7B-Instruct, plugging SCS into RLOO, GRPO, and the REINFORCE++ series improves accuracy by up to 7.7 percentage points on six multimodal benchmarks with negligible extra computation. SCS also yields notable gains on both Qwen2.5-VL-3B-Instruct and InternVL3-8B, offering a simple, general remedy for outcome-reward RL in MLLMs.
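
The abstract specifies SCS only at a high level. The Python sketch below illustrates the core idea under stated assumptions: the Trajectory container, the agreement-fraction consistency score, the multiplicative down-weighting, and the GRPO-style group baseline are illustrative choices, not the paper's implementation; in particular, the paper's consistency score is differentiable, and its visual-perturbation and truncation-resampling procedure is not reproduced here.

from dataclasses import dataclass

@dataclass
class Trajectory:
    text: str      # full chain-of-thought text
    answer: str    # extracted multiple-choice option, e.g. "B"
    reward: float  # outcome reward: 1.0 if the answer is correct, else 0.0

def consistency_score(initial: Trajectory, resamples: list[Trajectory]) -> float:
    """Fraction of perturbed/resampled trajectories whose final answer agrees
    with the initial trajectory's answer. A trace that lands on the right
    option by luck after a faulty chain of thought tends to be unstable under
    perturbation and resampling, so it scores low."""
    if not resamples:
        return 1.0
    agree = sum(t.answer == initial.answer for t in resamples)
    return agree / len(resamples)

def scs_weighted_reward(initial: Trajectory, resamples: list[Trajectory]) -> float:
    """Down-weight unreliable traces. The paper folds a differentiable score
    into the policy update; this multiplicative form is the simplest stand-in."""
    return initial.reward * consistency_score(initial, resamples)

def group_relative_advantages(weighted_rewards: list[float]) -> list[float]:
    """GRPO-style group baseline (an assumption about the integration): each
    trajectory's advantage is its SCS-weighted reward minus the group mean,
    so guessed-but-inconsistent traces stop earning credit."""
    mean = sum(weighted_rewards) / len(weighted_rewards)
    return [r - mean for r in weighted_rewards]

# Example: the initial trace guessed "B" correctly, but only 1 of 4
# perturbed resamples agrees, so its effective reward drops to 0.25.
init = Trajectory(text="...faulty CoT...", answer="B", reward=1.0)
checks = [Trajectory("...", "A", 0.0), Trajectory("...", "C", 0.0),
          Trajectory("...", "B", 1.0), Trajectory("...", "D", 0.0)]
print(scs_weighted_reward(init, checks))  # 0.25

In an RLOO/GRPO/REINFORCE++-style update, these weighted rewards would replace the raw outcome rewards when computing advantages, so only answers that survive visual perturbation and mid-trace resampling steer the gradient.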

🔄 Daily sync (03:00 UTC)

AI Summary: Based on Hugging Face metadata. Not a recommendation.

📊 FNI Methodology · 📚 Knowledge Base · ℹ️ Verify with original source

đŸ›Ąī¸ Paper Transparency Report

Verified data manifest for traceability and transparency.

100% Data Disclosure Active

🆔 Identity & Source

id: arxiv-paper--2511.10648
author: Jiahao Wang
tags: arxiv:cs.CV, llm

âš™ī¸ Technical Specs

architecture
null
params billions
null
context length
null

📊 Engagement & Metrics

likes: 0
downloads: 0

Free2AITools Constitutional Data Pipeline: Curated disclosure mode active. (V15.x Standard)