Fine-tuned from meta-llama/Llama-3.1-8B-Instruct to express calibrated verbal confidence for adaptive retrieval-augmented generation (RAG).
What it does
Given a factual question, the model reasons step by step and ends every response with exactly two lines:
Answer: $Answer
Confidence: $Confidence
where $Confidence is a decimal between 0 and 1.
The confidence score reflects the model's genuine uncertainty. At inference, a confidence below 0.5 triggers BM25 retrieval and a second-pass generation with retrieved context. This allows the model to selectively retrieve only when it needs external evidence.
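The retrieval decision described above depends on reading the confidence off the final line of the response. The following is a minimal sketch of that step, not the authors' implementation. The function names are hypothetical, and treating an unparseable response as a retrieval trigger is an assumption made here for safety.

```python
import re

# The 0.5 cutoff comes from the description above.
RETRIEVAL_THRESHOLD = 0.5


def parse_response(text):
    """Extract (answer, confidence) from the last two non-empty lines.

    Returns (None, None) when the response does not follow the expected format.
    """
    lines = [ln.strip() for ln in text.strip().splitlines() if ln.strip()]
    if len(lines) < 2:
        return None, None
    answer_match = re.fullmatch(r"Answer:\s*(.+)", lines[-2])
    conf_match = re.fullmatch(r"Confidence:\s*([0-9]*\.?[0-9]+)", lines[-1])
    if not answer_match or not conf_match:
        return None, None
    confidence = float(conf_match.group(1))
    if not 0.0 <= confidence <= 1.0:
        return None, None
    return answer_match.group(1).strip(), confidence


def needs_retrieval(confidence, threshold=RETRIEVAL_THRESHOLD):
    """Retrieve when confidence is strictly below the threshold.

    Assumption: a response whose confidence could not be parsed also triggers retrieval.
    """
    return confidence is None or confidence < threshold


response = (
    "Paris has been the seat of the French government for centuries.\n"
    "Answer: Paris\n"
    "Confidence: 0.97"
)
answer, confidence = parse_response(response)
print(answer, confidence, needs_retrieval(confidence))
```

With this sketch, a confidence of exactly 0.5 does not trigger retrieval, which matches the "below 0.5" wording.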
Training
Base model: meta-llama/Llama-3.1-8B-Instruct
Training method: Supervised fine-tuning on QA data with confidence labels, followed by calibration to align expressed confidence with empirical accuracy
Trigger rate = fraction of questions where confidence < 0.5 triggered retrieval.
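Both the trigger rate defined above and the alignment between stated confidence and empirical accuracy can be computed from per-question results. The sketch below uses expected calibration error (ECE) with equal-width bins as one common way to measure that alignment. The card does not say which metric was used during calibration, so ECE, the bin count, and the function names are assumptions.

```python
def trigger_rate(confidences, threshold=0.5):
    """Fraction of questions whose confidence is strictly below the threshold."""
    if not confidences:
        raise ValueError("confidences must be non-empty")
    triggered = sum(1 for c in confidences if c < threshold)
    return triggered / len(confidences)


def expected_calibration_error(confidences, correct, n_bins=10):
    """Weighted average of |mean confidence - accuracy| over equal-width bins.

    Assumption: ECE stands in for whatever calibration metric the authors used.
    """
    if not confidences or len(confidences) != len(correct):
        raise ValueError("inputs must be non-empty and of equal length")
    bins = [[] for _ in range(n_bins)]
    for c, ok in zip(confidences, correct):
        # A confidence of exactly 1.0 goes into the last bin.
        idx = min(int(c * n_bins), n_bins - 1)
        bins[idx].append((c, ok))
    total = len(confidences)
    ece = 0.0
    for items in bins:
        if not items:
            continue
        avg_conf = sum(c for c, _ in items) / len(items)
        accuracy = sum(1 for _, ok in items if ok) / len(items)
        ece += len(items) / total * abs(avg_conf - accuracy)
    return ece


# Illustrative values only, not results from this model.
print(trigger_rate([0.9, 0.4, 0.75, 0.2]))
print(expected_calibration_error([1.0, 1.0, 0.0, 0.0], [True, True, False, False]))
```

A perfectly calibrated model has an ECE of 0. Larger values mean the stated confidence drifts further from the observed accuracy.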
Intended use
python
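from transformers import AutoTokenizer, AutoModelForCausalLM

tokenizer = AutoTokenizer.from_pretrained("your-username/verbal-calibrate")
model = AutoModelForCausalLM.from_pretrained("your-username/verbal-calibrate")

# Build the chat-formatted prompt; the instruction text fixes the two-line output format.
prompt = tokenizer.apply_chat_template([{
    "role": "user",
    "content": (
        "Answer the following factual question step by step, then state your answer "
        "and how confident you are.\n\n"
        "{question}\n\n"
        "Your response must end with exactly these two lines:\n"
        "Answer: $Answer\n"
        "Confidence: $Confidence\n\n"
        "Where $Confidence is a decimal between 0 and 1."
    ).format(question="What is the capital of France?")
}], tokenize=False, add_generation_prompt=True)

# The chat template already inserts special tokens, so do not add them again.
inputs = tokenizer(prompt, return_tensors="pt", add_special_tokens=False)
# max_new_tokens=512 is an assumed budget, not a value from the model card.
output_ids = model.generate(**inputs, max_new_tokens=512)
# Decode only the newly generated tokens, skipping the prompt.
print(tokenizer.decode(output_ids[0][inputs["input_ids"].shape[1]:], skip_special_tokens=True))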
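When the first-pass confidence falls below 0.5, the model is run a second time with retrieved context. The card does not describe the retrieval corpus, the BM25 parameters, or the wording of the second-pass prompt, so the sketch below is illustrative only. It uses a small pure-Python Okapi BM25 scorer with the common defaults `k1=1.5` and `b=0.75`, a toy in-memory corpus, and a hypothetical prompt layout that prepends the passages to the original instruction.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase and split on non-alphanumeric characters (deliberately simple)."""
    return re.findall(r"[a-z0-9]+", text.lower())


def bm25_top_k(query, passages, k=3, k1=1.5, b=0.75):
    """Return the k passages with the highest Okapi BM25 score for the query."""
    if not passages:
        return []
    docs = [tokenize(p) for p in passages]
    n_docs = len(docs)
    avg_len = (sum(len(d) for d in docs) / n_docs) or 1.0
    # Number of passages that contain each term.
    doc_freq = Counter(term for d in docs for term in set(d))
    query_terms = tokenize(query)
    scores = []
    for d in docs:
        tf = Counter(d)
        score = 0.0
        for term in query_terms:
            if term not in tf:
                continue
            n_t = doc_freq[term]
            idf = math.log((n_docs - n_t + 0.5) / (n_t + 0.5) + 1.0)
            norm = tf[term] + k1 * (1 - b + b * len(d) / avg_len)
            score += idf * tf[term] * (k1 + 1) / norm
        scores.append(score)
    ranked = sorted(range(n_docs), key=lambda i: scores[i], reverse=True)
    return [passages[i] for i in ranked[:k]]


def build_second_pass_prompt(question, passages):
    """Hypothetical second-pass prompt: numbered passages, then the original instruction."""
    context = "\n".join(f"[{i}] {p}" for i, p in enumerate(passages, start=1))
    return (
        "Use the retrieved passages below if they are relevant.\n\n"
        f"{context}\n\n"
        "Answer the following factual question step by step, then state your answer "
        "and how confident you are.\n\n"
        f"{question}\n\n"
        "Your response must end with exactly these two lines:\n"
        "Answer: $Answer\n"
        "Confidence: $Confidence\n\n"
        "Where $Confidence is a decimal between 0 and 1."
    )


# Toy corpus for illustration; a real deployment would index a much larger collection.
corpus = [
    "Paris is the capital of France.",
    "Berlin is the capital of Germany.",
    "Bananas are rich in potassium.",
]
question = "What is the capital of France?"
print(build_second_pass_prompt(question, bm25_top_k(question, corpus, k=1)))
```

The returned string can be passed as the user message content to `tokenizer.apply_chat_template` in the same way as the first-pass prompt.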