FishNALM-20L_H3K27me3
FishNALM-20L_H3K27me3 is a fine-tuned version of FishNALM-20L_pretrain for H3K27me3 prediction in fish genomics.
Model description
This repository contains a task-specific fine-tuned checkpoint from the FishNALM model family. The model was initialized from the pretrained base model FishNALM-20L_pretrain and then fine-tuned for H3K27me3 prediction.
Task
Task name : H3K27me3 prediction
Task type : binary classification
Prediction target : H3K27me3-positive vs H3K27me3-negative genomic sequences
Examples:
CTCF TFBS prediction
Pou5f1 TFBS prediction
Sox2 TFBS prediction
histone modification prediction
promoter prediction
splice donor prediction
splice acceptor prediction
splice classification
Base model
Base model repository : xia-lab/FishNALM-20L_pretrain
Model family : FishNALM
Initialization type : pretrained checkpoint + downstream fine-tuning
Training data
This model was fine-tuned on H3K27me3 prediction data from FishGUE.
Evaluation
Primary metric : MCC
Evaluation split / strategy : predefined train/validation/test split
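For readers unfamiliar with the primary metric, the following is a minimal sketch of how the Matthews correlation coefficient (MCC) is computed from binary predictions. It is not the evaluation script used for this model. The helper names, the toy labels, and the convention that class 1 is the positive class are assumptions made for illustration only.

```python
import math


def confusion_counts(y_true, y_pred, positive=1):
    """Count TP, TN, FP, FN for a binary task (positive=1 is an assumption)."""
    tp = tn = fp = fn = 0
    for truth, pred in zip(y_true, y_pred):
        if truth == positive and pred == positive:
            tp += 1
        elif truth != positive and pred != positive:
            tn += 1
        elif truth != positive and pred == positive:
            fp += 1
        else:
            fn += 1
    return tp, tn, fp, fn


def matthews_corrcoef(tp, tn, fp, fn):
    """MCC = (TP*TN - FP*FN) / sqrt((TP+FP)(TP+FN)(TN+FP)(TN+FN))."""
    numerator = tp * tn - fp * fn
    denominator = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    # Common convention: report 0.0 when a row or column of the matrix is empty.
    return numerator / denominator if denominator else 0.0


# Toy labels, not taken from FishGUE.
y_true = [1, 1, 1, 0, 0, 0, 0, 1]
y_pred = [1, 1, 0, 0, 0, 1, 0, 1]

tp, tn, fp, fn = confusion_counts(y_true, y_pred)
mcc = matthews_corrcoef(tp, tn, fp, fn)
print("TP, TN, FP, FN:", tp, tn, fp, fn)
print("MCC:", mcc)
```

MCC ranges from -1 to 1, with 0 corresponding to chance-level agreement, so it stays informative when the positive and negative classes are imbalanced.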
Intended uses
This model is intended for:
fish genomics sequence classification
downstream task inference on sequences similar to the fine-tuning setting
comparative benchmarking within fish genomic prediction tasks
Limitations
This is a task-specific fine-tuned model and should be used within the scope of H3K27me3 prediction.
Generalization to other species, tasks, or sequence lengths may be limited.
This is a research model and is not intended for clinical or diagnostic use.
How to use
Load tokenizer and model
```python
from transformers import AutoTokenizer, AutoModelForSequenceClassification

repo_name = "xia-lab/FishNALM-20L_H3K27me3"

tokenizer = AutoTokenizer.from_pretrained(repo_name)
model = AutoModelForSequenceClassification.from_pretrained(repo_name)
```
Example inference
```python
import torch
from transformers import AutoTokenizer, AutoModelForSequenceClassification

repo_name = "xia-lab/FishNALM-20L_H3K27me3"
sequence = "ATGCGTACGTTAGCTAGCTAGCTAGCTAGCTA"

tokenizer = AutoTokenizer.from_pretrained(repo_name)
model = AutoModelForSequenceClassification.from_pretrained(repo_name)

# Tokenize the sequence, padding or truncating it to 512 tokens.
inputs = tokenizer(
    sequence,
    return_tensors="pt",
    truncation=True,
    padding="max_length",
    max_length=512,
)

# Run a forward pass without tracking gradients.
with torch.no_grad():
    outputs = model(**inputs)

# Convert the raw logits to class probabilities and pick the most likely class.
logits = outputs.logits
probabilities = torch.softmax(logits, dim=-1)
prediction = torch.argmax(probabilities, dim=-1)

print("logits:", logits)
print("probabilities:", probabilities)
print("prediction:", prediction)
```
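To make the last steps of the inference example concrete, here is a small worked sketch of how two logits become class probabilities and a predicted class index. It uses plain Python instead of PyTorch, and the logit values are invented for illustration; they are not outputs of this model.

```python
import math


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    shift = max(logits)  # subtracting the max avoids overflow in exp()
    exps = [math.exp(x - shift) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


# Hypothetical logits for one sequence: [class 0, class 1].
logits = [-0.4, 1.1]

probs = softmax(logits)
# argmax: the index of the largest probability is the predicted class.
prediction = max(range(len(probs)), key=probs.__getitem__)

# With exactly two classes, softmax reduces to a sigmoid of the logit difference.
sigmoid_of_diff = 1.0 / (1.0 + math.exp(-(logits[1] - logits[0])))

print("probabilities:", probs)
print("prediction:", prediction)
print("sigmoid(logit_1 - logit_0):", sigmoid_of_diff)
```

Because only the difference between the two logits matters, a decision threshold other than 0.5 can be applied to the class 1 probability if a different precision/recall trade-off is needed.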
Label mapping
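The mapping between class indices and label names is not listed here. In Transformers, a loaded model usually exposes it as `model.config.id2label`, which is read from the `id2label` entry of `config.json`. The sketch below shows one way to read that entry from a parsed config. The helper name `label_names` and the example config contents are hypothetical, and the fallback to `LABEL_0` / `LABEL_1` mirrors the Transformers default naming, not anything confirmed for this checkpoint.

```python
import json


def label_names(config: dict, num_classes: int = 2) -> dict:
    """Return {class_index: label_name} from a parsed config.json dictionary."""
    raw = config.get("id2label")
    if raw:
        # JSON object keys are strings, so convert them back to integers.
        return {int(index): name for index, name in raw.items()}
    # Assumption: without an explicit mapping, use generic LABEL_<i> names.
    return {i: f"LABEL_{i}" for i in range(num_classes)}


# Hypothetical config contents; inspect the real config.json to confirm.
example_config = json.loads('{"id2label": {"0": "LABEL_0", "1": "LABEL_1"}}')

mapping = label_names(example_config)
predicted_index = 1  # e.g. the integer value of `prediction` from the example above
print(mapping[predicted_index])
```

Which index corresponds to H3K27me3-positive sequences should be confirmed against the repository's `config.json` or the FishGUE data before interpreting predictions.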
Files in this repository
Typical files in this repository may include:
config.json
model.safetensors
tokenizer.json
tokenizer_config.json
special_tokens_map.json
vocab.txt
README.md
Citation
If you use this model, please cite the FishNALM manuscript.
For questions, please contact: [email protected]