Biinduct 125m Baseline
| Entity Passport | |
|---|---|
| Registry ID | hf-model--mohammedsabry--biinduct-125m-baseline |
| Provider | huggingface |
Cite this model
Academic & Research Attribution
@misc{hf_model__mohammedsabry__biinduct_125m_baseline,
author = {MohammedSabry},
title = {Biinduct 125m Baseline Model},
year = {2026},
howpublished = {\url{https://huggingface.co/mohammedsabry/biinduct-125m-baseline}},
note = {Accessed via Free2AITools Knowledge Fortress}
}
Quick Commands
huggingface-cli download mohammedsabry/biinduct-125m-baseline
pip install -U transformers
Nexus Index V2.0
Index Insight
FNI V2.0 for Biinduct 125m Baseline: Semantic (S:50), Authority (A:0), Popularity (P:12), Recency (R:98), Quality (Q:65).
Technical Deep Dive
Bi-Induct 125M Baseline
This repository contains the Bi-Induct 125M Baseline checkpoint from the paper "Induction Signatures Are Not Enough: A Matched-Compute Study of Load-Bearing Structure in In-Context Learning" (arXiv:2509.22947).
This release corresponds to the 0.13B setting in the paper and is a research checkpoint intended for studying matched-compute pretraining, induction-style curricula, and in-context learning behavior. It is not instruction-tuned, alignment-tuned, or safety-tuned.
Variant
Natural-only pretraining baseline with no synthetic copy snippets.
Model overview
- Architecture: decoder-only Transformer
- Positional encoding: RoPE (`theta=10000`)
- Normalization: pre-norm residual blocks
- MLP: SwiGLU
- Attention: grouped-query / grouped key-value attention
- Precision: bfloat16 training
- Context length: 1024
- Embeddings: untied input/output embeddings
Model specification
| Field | Value |
|---|---|
| Parameters (paper label) | 0.13B |
| Layers | 12 |
| Hidden size | 768 |
| Intermediate / MLP size | 3,072 |
| Head dimension | 64 |
| Attention heads | 12 |
| KV heads | 3 |
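As a rough cross-check of the table above, the non-embedding parameter count can be sketched from the listed dimensions. This is a back-of-the-envelope estimate, not the authors' accounting: it assumes bias-free linear projections and ignores normalization weights and the untied embedding matrices, whose size depends on a vocabulary size that is not stated here. The function name is illustrative.

```python
def non_embedding_params(layers=12, hidden=768, mlp=3072,
                         head_dim=64, n_heads=12, n_kv_heads=3):
    """Estimate attention + SwiGLU weights, assuming no biases (an assumption)."""
    q_proj = hidden * (n_heads * head_dim)       # queries: one projection per head
    kv_proj = hidden * (n_kv_heads * head_dim)   # keys or values: shared across groups
    o_proj = (n_heads * head_dim) * hidden       # output projection
    attention = q_proj + 2 * kv_proj + o_proj
    swiglu = 3 * hidden * mlp                    # gate, up, and down matrices
    return layers * (attention + swiglu)


print(non_embedding_params())  # 102629376
```

With 3 KV heads shared across 12 query heads, each key/value head serves a group of 4 query heads, so the K and V projections are a quarter the width of the Q projection.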
Training data
All checkpoints in this family were pretrained on the deduplicated Pile, read in streaming, shuffled mode. A stable MD5-based hash defines a fixed held-out evaluation slice: 0.2% of the corpus (roughly 0.4B tokens) is reserved for evaluation. Tokenized sequences were truncated to 1024 tokens.
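A hash-based held-out split of this kind can be sketched as follows. This is a minimal illustration, not the authors' code: which field is hashed (here a hypothetical document key string) and how the digest is bucketed are assumptions.

```python
import hashlib

TOTAL_BUCKETS = 1000
HELDOUT_BUCKETS = 2  # 2 of 1000 buckets = 0.2% of documents


def is_heldout(doc_key: str) -> bool:
    """Deterministically assign a document to the held-out slice.

    MD5 is stable across runs and machines, unlike Python's built-in hash(),
    so the same document always lands on the same side of the split.
    """
    digest = hashlib.md5(doc_key.encode("utf-8")).hexdigest()
    return int(digest, 16) % TOTAL_BUCKETS < HELDOUT_BUCKETS


# Hypothetical document keys, used only to show the split fraction.
n_eval = sum(is_heldout(f"doc-{i}") for i in range(100_000))
print(f"held-out fraction: {n_eval / 100_000:.4%}")
```

Because membership depends only on the document key, the split survives reshuffling of the stream and needs no stored index.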
For the Bi-Induct variants, synthetic snippets were interleaved on top of the natural stream:
- Induction: `[S || SEP || S]`
- Anti-Induction: `[S || SEP || reverse(S)]`
- Balanced: each injection randomly chooses induction or anti-induction
The main cross-scale experiments used span length L = 20 and initial mix ratio m0 = 50%, linearly annealed to zero over the full training budget.
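The snippet formats and the annealed mix ratio described above can be sketched as below. This is an illustrative sketch under stated assumptions: spans are drawn as uniformly random token ids, `sep_id` and `vocab_size` are hypothetical values, and how the mix ratio is applied (per sequence or per token) is not specified here.

```python
import random


def sample_span(rng, vocab_size, length=20):
    """Draw a random token span S of length L (uniform sampling is an assumption)."""
    return [rng.randrange(vocab_size) for _ in range(length)]


def make_snippet(span, sep_id, mode, rng=None):
    """Build [S || SEP || S] or [S || SEP || reverse(S)] from a token span."""
    if mode == "balanced":
        # Each injection randomly chooses induction or anti-induction.
        rng = rng or random.Random()
        mode = rng.choice(["induction", "anti_induction"])
    if mode == "induction":
        second = list(span)
    elif mode == "anti_induction":
        second = list(reversed(span))
    else:
        raise ValueError(f"unknown mode: {mode}")
    return list(span) + [sep_id] + second


def mix_ratio(step, total_steps, m0=0.5):
    """Linearly anneal the synthetic mix ratio from m0 to zero over training."""
    return m0 * max(0.0, 1.0 - step / total_steps)


rng = random.Random(0)
span = sample_span(rng, vocab_size=1000, length=20)  # hypothetical vocabulary size
snippet = make_snippet(span, sep_id=0, mode="anti_induction")
print(len(snippet), mix_ratio(50, 100))  # 41 0.25
```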
Training recipe
- Optimizer: AdamW (`beta1=0.9`, `beta2=0.999`, weight decay `0.1`)
- Learning rate: peak `1e-3`
- Schedule: `3%` linear warmup, then cosine decay
- Update size: `2^16` tokens per update
- Token budget: approximately `20N` tokens following the Chinchilla-style rule of thumb
- Comparison protocol: iso-FLOPs across curricula at each scale
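The recipe above implies a number of optimizer updates and a learning-rate curve, which can be sketched as follows. This is a sketch under assumptions: `N` is taken as the nominal 0.13B label rather than an exact parameter count, and the cosine phase is assumed to decay to zero because no final learning rate is given here.

```python
import math

TOKENS_PER_UPDATE = 2 ** 16


def num_updates(n_params, tokens_per_param=20):
    """Updates needed to consume roughly 20N tokens at 2^16 tokens per update."""
    return round(tokens_per_param * n_params / TOKENS_PER_UPDATE)


def lr_at(step, total_steps, peak_lr=1e-3, warmup_frac=0.03, min_lr=0.0):
    """Linear warmup to peak_lr, then cosine decay to min_lr (assumed to be 0)."""
    warmup_steps = max(1, round(total_steps * warmup_frac))
    if step < warmup_steps:
        return peak_lr * (step + 1) / warmup_steps
    progress = min(1.0, (step - warmup_steps) / max(1, total_steps - warmup_steps))
    return min_lr + 0.5 * (peak_lr - min_lr) * (1.0 + math.cos(math.pi * progress))


total = num_updates(130_000_000)  # nominal 0.13B parameters
print(total)  # 39673
```

At this scale the budget works out to about 2.6B tokens, so the 3% warmup covers roughly the first 1,190 updates.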
Evaluation summary for the 125M family
The table below summarizes the main results at this scale. Standard LM benchmarks are evaluated 3-shot and Todd et al. function-style probes are evaluated 10-shot with HITS@1.
| Variant | Standard LM ICL composite ↑ | Todd-style ICL composite ↑ | Held-out PPL ↓ |
|---|---|---|---|
| Baseline | 22.7 ± 0.5 | 5.3 ± 0.9 | 21.8 |
| Induction | 21.9 ± 0.5 | 4.1 ± 0.7 | 25.8 |
| Anti-Induction | 22.5 ± 0.4 | 3.8 ± 0.7 | 26.2 |
| Balanced | 22.4 ± 0.6 | 5.2 ± 0.8 | 26.2 |
This checkpoint: Baseline.
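For readers comparing the held-out PPL column, perplexity is conventionally the exponential of the mean per-token negative log-likelihood. The helper below is a minimal sketch of that definition; that the reported numbers are per-token and use natural logarithms is an assumption, and the function name is illustrative.

```python
import math


def perplexity(token_nlls):
    """Perplexity from per-token negative log-likelihoods in nats."""
    return math.exp(sum(token_nlls) / len(token_nlls))


# A model that assigns probability 1/2 to every token has perplexity 2.
print(round(perplexity([math.log(2.0)] * 4), 6))  # 2.0
```

Under that reading, the baseline's held-out PPL of 21.8 corresponds to a mean loss of about 3.08 nats per token.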
Benchmarks included
Standard LM benchmarks
- MMLU
- Winogrande
- CommonSenseQA
- PIQA
- HellaSwag
- TriviaQA-Wiki
- BBH (CoT)
- OpenBookQA
- ARC-Challenge
- GPQA
- GSM-8K
- MathQA
- BoolQ
- LAMBADA
Todd et al. function-style probes
- alphabetically first 3
- alphabetically first 5
- alphabetically last 3
- alphabetically last 5
- capitalize
- capitalize first letter
- capitalize last letter
- choose first of 3
- choose first of 5
- choose last of 3
- choose last of 5
- choose middle of 3
- choose middle of 5
- lowercase first letter
- lowercase last letter
- next capital letter
- next item
- prev item
- word length
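A function-style probe such as "capitalize" can be turned into a few-shot prompt and scored with HITS@1 roughly as sketched below. The `input -> output` template, the exact-match comparison on stripped strings, and both function names are assumptions for illustration; the paper's evaluation harness may format and match differently.

```python
def build_prompt(demos, query):
    """Format k demonstrations followed by an unanswered query."""
    lines = [f"{x} -> {y}" for x, y in demos]
    lines.append(f"{query} ->")
    return "\n".join(lines)


def hits_at_1(predictions, targets):
    """Fraction of examples whose top-1 prediction exactly matches the target."""
    correct = sum(p.strip() == t.strip() for p, t in zip(predictions, targets))
    return correct / len(targets)


demos = [("cat", "Cat"), ("river", "River")]  # a 10-shot run would use ten pairs
print(build_prompt(demos, "stone"))
print(hits_at_1([" Stone", "apple"], ["Stone", "Apple"]))  # 0.5
```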
Example usage
from transformers import AutoTokenizer, AutoModelForCausalLM
repo_id = "MohammedSabry/biinduct-125m-baseline"
tokenizer = AutoTokenizer.from_pretrained(repo_id)
model = AutoModelForCausalLM.from_pretrained(repo_id)
prompt = "The capital of France is"
inputs = tokenizer(prompt, return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=20)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
Limitations
- These are research checkpoints, not production chat models.
- They were designed to study the relationship between induction-style telemetry and load-bearing ICL behavior under matched compute.
- The synthetic interventions are intentionally lightweight and token-level; results should not be interpreted as ruling out richer data-rewrite strategies.
- Because Bi-Induct replaces a fraction of natural data under iso-FLOPs, some trade-offs may reflect natural-text displacement in addition to mechanistic redundancy.
Citation
If you use this model, please cite:
@misc{sabry2026inductionsignaturesenoughmatchedcompute,
title={Induction Signatures Are Not Enough: A Matched-Compute Study of Load-Bearing Structure in In-Context Learning},
author={Mohammed Sabry and Anya Belz},
year={2026},
eprint={2509.22947},
archivePrefix={arXiv},
primaryClass={cs.CL},
url={https://arxiv.org/abs/2509.22947},
}
Incomplete Data
Some information about this model is not available. Use with Caution - Verify details from the original source before relying on this data.
Limitations & Considerations
- Benchmark scores may vary based on evaluation methodology and hardware configuration.
- VRAM requirements are estimates; actual usage depends on quantization and batch size.
- FNI scores are relative rankings and may change as new models are added.
- License Unknown: Verify licensing terms before commercial use.
Model Transparency Report
Technical metadata sourced from upstream repositories.
Identity & Source
- id: hf-model--mohammedsabry--biinduct-125m-baseline
- slug: mohammedsabry--biinduct-125m-baseline
- source: huggingface
- author: MohammedSabry
- license: (not specified)
- tags: transformers, safetensors, mistral, text-generation, causal-lm, biinduct, pretraining, matched-compute, the-pile, 125m, baseline, en, arxiv:2509.22947, text-generation-inference, endpoints_compatible, region:us
Technical Specs
- architecture: null
- params billions: null
- context length: null
- pipeline tag: text-generation
Engagement & Metrics
- downloads: 169
- stars: 0
- forks: null
Data indexed from public sources. Updated daily.