Molcrawl Compounds Bert Small

by Kojima Lab (hf-model--kojima-lab--molcrawl-compounds-bert-small)
Nexus Index
39.0 Top 100%
S: Semantic 50
A: Authority 0
P: Popularity 5
R: Recency 99
Q: Quality 65
Tech Context
Downloads (last 30 days): 56
FNI Score: 39 (audited)
Size class: Tiny (parameter count not listed)
Context length: not listed
License: Apache-2.0 (commercial use permitted)
Model Information Summary
Entity Passport
Registry ID hf-model--kojima-lab--molcrawl-compounds-bert-small
License Apache-2.0
Provider huggingface
Cite this model

Academic & Research Attribution

BibTeX
@misc{hf_model__kojima_lab__molcrawl_compounds_bert_small,
  author = {Kojima Lab},
  title = {Molcrawl Compounds Bert Small Model},
  year = {2026},
  howpublished = {\url{https://huggingface.co/kojima-lab/molcrawl-compounds-bert-small}},
  note = {Accessed via Free2AITools Knowledge Fortress}
}
APA Style
Kojima Lab. (2026). Molcrawl Compounds Bert Small [Model]. Free2AITools. https://huggingface.co/kojima-lab/molcrawl-compounds-bert-small

🔬 Technical Deep Dive

Quick Commands

🤗 HF Download
huggingface-cli download kojima-lab/molcrawl-compounds-bert-small
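
The same download can also be done from Python. The following is a minimal sketch, assuming the `huggingface_hub` package is installed and network access is available; the helper name `download_model` is hypothetical and not part of the model card.

```python
REPO_ID = "kojima-lab/molcrawl-compounds-bert-small"


def download_model(local_dir=None):
    """Download every file in the repository and return the local folder path.

    `download_model` is a hypothetical helper name used for illustration.
    """
    # Imported lazily so that defining this helper does not require the package.
    from huggingface_hub import snapshot_download

    # With local_dir=None the files land in the default Hugging Face cache.
    return snapshot_download(repo_id=REPO_ID, local_dir=local_dir)


# Example call (requires network access):
# path = download_model()
```

Passing a `local_dir` places the files in a folder of your choice, which can be convenient for inspecting the training configuration files mentioned in the model card.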


molcrawl-compounds-bert-small

Model Description

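BERT small foundation model pre-trained on compound SMILES strings from the MolCrawl dataset. The upstream description reads "GPT-2 small (124M parameters)", which conflicts with the model name, the bert and fill-mask tags, and the masked-language-model usage below; check the architecture and parameter count against the repository's configuration files before relying on them.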

The tokenizer is a character-level BPE tokenizer (vocab_size=612) that encodes each SMILES character as a separate token. Input SMILES strings should be passed without spaces (e.g. CC(=O)O). The [SEP] token (id=13) is used as the end-of-sequence marker.
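
The described tokenization can be pictured with a small pure-Python sketch. It only mimics the behaviour stated above (one token per character, no spaces, `[SEP]` as the end marker); it does not load the real tokenizer, which may add other special tokens that this sketch does not model. The helper names `prepare_smiles` and `char_tokens` are hypothetical.

```python
from typing import List

SEP_TOKEN = "[SEP]"  # end-of-sequence marker named in the model card
SEP_ID = 13          # id of [SEP] as stated in the model card


def prepare_smiles(smiles: str) -> str:
    """Remove all whitespace so the string matches the expected input form."""
    return "".join(smiles.split())


def char_tokens(smiles: str) -> List[str]:
    """Split a SMILES string into one token per character and append [SEP]."""
    return list(prepare_smiles(smiles)) + [SEP_TOKEN]


print(char_tokens("CC(=O)O"))
# → ['C', 'C', '(', '=', 'O', ')', 'O', '[SEP]']
```

Under this scheme a two-letter atom such as `Cl` becomes two tokens, so sequence length equals the number of characters plus any special tokens.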

Datasets

Usage

```python
from transformers import AutoModelForMaskedLM, AutoTokenizer
import torch

model = AutoModelForMaskedLM.from_pretrained("kojima-lab/molcrawl-compounds-bert-small")
tokenizer = AutoTokenizer.from_pretrained("kojima-lab/molcrawl-compounds-bert-small")

# Predict masked SMILES token
prompt = "CC(=O)[MASK]"
inputs = tokenizer(prompt, return_tensors="pt")
mask_token_id = tokenizer.mask_token_id
# Sequence positions of the mask token (second axis of the batch x sequence tensor)
mask_index = (inputs["input_ids"] == mask_token_id).nonzero(as_tuple=True)[1]

# Inference only, so gradient tracking is switched off
with torch.no_grad():
    outputs = model(**inputs)
logits = outputs.logits

# Highest-scoring vocabulary id at each masked position
predicted_token_id = logits[0, mask_index].argmax(dim=-1)
predicted_token = tokenizer.decode(predicted_token_id)
result = prompt.replace("[MASK]", predicted_token)
print(f"Predicted: {result}")
```
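
The usage example keeps only the single best token. To inspect several candidates, the logits at the masked position can be turned into probabilities and ranked. The sketch below shows that step in pure Python on made-up numbers; `top_k_candidates`, `toy_logits`, and `toy_vocab` are hypothetical names and values, not output from the model.

```python
import math
from typing import Dict, List, Tuple


def top_k_candidates(
    logits: List[float], id_to_token: Dict[int, str], k: int = 3
) -> List[Tuple[str, float]]:
    """Return the k most probable tokens with their softmax probabilities."""
    # Subtract the maximum before exponentiating for numerical stability.
    peak = max(logits)
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    probs = [e / total for e in exps]
    # Rank vocabulary ids by probability, highest first.
    ranked = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)[:k]
    return [(id_to_token[i], probs[i]) for i in ranked]


# Toy data: made-up logits over a made-up 4-token vocabulary.
toy_logits = [0.1, 2.0, 1.0, -1.0]
toy_vocab = {0: "C", 1: "O", 2: "N", 3: "("}

for token, prob in top_k_candidates(toy_logits, toy_vocab, k=2):
    print(f"{token}: {prob:.3f}")
```

With the real model, the input would presumably be one row of `logits` converted with `.tolist()`, and the id-to-token mapping could come from `tokenizer.convert_ids_to_tokens`.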

Training

This model was trained with the RIKEN Foundation Model pipeline. For more details, please refer to the training configuration files included in this repository.

License

This model is released under the Apache-2.0 license.

Citation

If you use this model, please cite:

```bibtex
@misc{molcrawl_compounds_bert_small,
  title={molcrawl-compounds-bert-small},
  author={{RIKEN}},
  year={2026},
  publisher={{Hugging Face}},
  url={{https://huggingface.co/kojima-lab/molcrawl-compounds-bert-small}}
}
```

⚠️ Incomplete Data

Some information about this model is not available. Use with caution: verify details against the original source before relying on this data.

📝 Limitations & Considerations

  • Benchmark scores may vary based on evaluation methodology and hardware configuration.
  • VRAM requirements are estimates; actual usage depends on quantization and batch size.
  • FNI scores are relative rankings and may change as new models are added.
  • License: listed as Apache-2.0. Verify licensing terms with the original source before commercial use.

Social Proof

Hugging Face Hub
56 Downloads
🔄 Daily sync (03:00 UTC)

AI Summary: Based on Hugging Face metadata. Not a recommendation.


🛡️ Model Transparency Report

Technical metadata sourced from upstream repositories.

Open Metadata

🆔 Identity & Source

id: hf-model--kojima-lab--molcrawl-compounds-bert-small
slug: kojima-lab--molcrawl-compounds-bert-small
source: huggingface
author: Kojima Lab
license: Apache-2.0
tags: safetensors, bert, pytorch, molecule-compound, fill-mask, license:apache-2.0, region:us

⚙️ Technical Specs

architecture: null
params billions: null
context length: null
pipeline tag: fill-mask

📊 Engagement & Metrics

downloads: 56
stars: 0
forks: 0

Data indexed from public sources. Updated daily.