Paper

Dynamic Adversarial Fine-Tuning Reorganizes Refusal Geometry

by Wenhao Lan arxiv/2604.27019
Free2AITools Nexus Index: 38.5
S: Semantic 50

Query-time baseline · scored live at search

A: Authority 0
P: Popularity 0
R: Recency 92
Q: Quality 60

Safety-aligned language models must refuse harmful requests without broad over-refusal, but it remains unclear how dynamic adversarial fine-tuning changes refusal-control carriers: Kullback-Leibler (KL)-constrained directions or small subspaces that causally modulate refusal without large safe-prompt distribution shifts. We study a 7B backbone under supervised fine-tuning (SFT) and Robust Refusal Dynamic Defense (R2D2), aligning HarmBench, StrongREJECT, and XSTest evaluations with five-ancho...

Paper Information Summary
Entity Passport
Registry ID: 2604.27019
License: arXiv
Provider: arxiv

Cite this paper

Academic & Research Attribution

BibTeX
@misc{arxiv_2604_27019,
  author = {Wenhao Lan},
  title = {Dynamic Adversarial Fine-Tuning Reorganizes Refusal Geometry},
  year = {2026},
  howpublished = {\url{https://arxiv.org/abs/2604.27019}},
  note = {Accessed via Free2AITools.}
}
APA Style
Lan, W. (2026). Dynamic Adversarial Fine-Tuning Reorganizes Refusal Geometry [Paper]. Free2AITools. https://arxiv.org/abs/2604.27019

Technical Deep Dive



Index Insight

FNI V2.0 for Dynamic Adversarial Fine-Tuning Reorganizes Refusal Geometry: Authority (A:0), Popularity (P:0), Recency (R:92), Quality (Q:60). Semantic (S) is a query-time baseline scored live at search.


Data Sources / Provenance

Open data Updated: Live data


❝ Cite Node

@article{Lan2026Dynamic,
  title={Dynamic Adversarial Fine-Tuning Reorganizes Refusal Geometry},
  author={Wenhao Lan},
  journal={arXiv preprint arXiv:2604.27019},
  year={2026}
}

Collaborating Minds

Wenhao Lan

Full Paper

Free2AITools indexes the abstract and factual metadata for this paper. Read the complete, authoritative paper on the official source.

Read the full paper on arXiv: https://arxiv.org/abs/2604.27019

Research Signals

Published: 2026
Recency (FNI pillar): 92
Quality (FNI pillar): 60
Field: cs.LG
Updated daily

Source summary: Based on arXiv metadata. Not a recommendation.


Paper Transparency Report

Technical metadata sourced from upstream repositories.

Open Metadata

Identity & Source

id: 2604.27019
slug: 2604.27019
source: arxiv
author: Wenhao Lan
license: arXiv
tags: arxiv:cs.LG, arxiv:cs.CL, arxiv:cs.CR, gan

Technical Specs

architecture: null
params (billions): null
context length: null
pipeline tag: (none listed)

Engagement & Metrics

downloads: 0
stars: null
forks: null

Data indexed from public sources. Updated daily.