pg2grasp

by nagaaato hf-model--nagaaato--pg2grasp
Nexus Index
40.1 Top 100%
S: Semantic 50
A: Authority 0
P: Popularity 10
R: Recency 96
Q: Quality 65
Downloads (last 30 days): 130
FNI Score (audited): 40.1
License: Apache-2.0 (commercial use permitted)
Model Information Summary
Entity Passport
Registry ID hf-model--nagaaato--pg2grasp
License Apache-2.0
Provider huggingface
Cite this model

Academic & Research Attribution

BibTeX
@misc{hf_model__nagaaato__pg2grasp,
  author = {nagaaato},
  title = {pg2grasp Model},
  year = {2026},
  howpublished = {\url{https://huggingface.co/nagaaato/pg2grasp}},
  note = {Accessed via Free2AITools Knowledge Fortress}
}
APA Style
nagaaato. (2026). pg2grasp [Model]. Free2AITools. https://huggingface.co/nagaaato/pg2grasp

Quick Commands

🤗 HF Download
huggingface-cli download nagaaato/pg2grasp

âš–ī¸ Nexus Index V2.0

40.1
TOP 100% SYSTEM IMPACT
Semantic (S) 50
Authority (A) 0
Popularity (P) 10
Recency (R) 96
Quality (Q) 65

đŸ’Ŧ Index Insight

FNI V2.0 for pg2grasp: Semantic (S:50), Authority (A:0), Popularity (P:10), Recency (R:96), Quality (Q:65).

Free2AITools Nexus Index

Verification Authority

Unbiased Data Node Refresh: VFS Live
---

🚀 What's Next?

Technical Deep Dive

PG2-Grasp — Text-Grounded Robot Grasp Prediction

Fine-tuned from google/paligemma2-10b-mix-224 to predict robot grasp centers from natural language prompts and RGB images.

This repo contains the fine-tuned weights (not the base model) in bf16 safetensors format.

Usage

BF16 (unquantized)

python
from transformers import PaliGemmaForConditionalGeneration, AutoProcessor
from PIL import Image, ImageOps
import torch

model_id = "nagaaato/pg2grasp"
processor = AutoProcessor.from_pretrained(model_id)
model = PaliGemmaForConditionalGeneration.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,
    attn_implementation="sdpa",
).to("cuda").eval()

# Prepare image (square-pad to max(w, h))
image = Image.open("scene.png").convert("RGB")
w, h = image.size
max_dim = max(w, h)
pad_left = (max_dim - w) // 2
pad_top = (max_dim - h) // 2
padded = ImageOps.pad(image, (max_dim, max_dim), color=0)

# Prompt format: "Pick the <object>." (trailing period required)
inputs = processor(
    images=padded,
    text="Pick the apple.",
    return_tensors="pt",
).to("cuda")

with torch.no_grad():
    outputs = model(**inputs)
    # Extract the LOC-token logits (last position, 1024-way grid classification)
    loc_logits = outputs.logits[0, -1, 256000:257024]
    pred_token = loc_logits.argmax().item()

# Decode to pixel coordinates (32×32 grid)
LOC_GRID = 32
row = pred_token // LOC_GRID
col = pred_token % LOC_GRID
cx_px = (col + 0.5) / LOC_GRID * max_dim - pad_left
cy_px = (row + 0.5) / LOC_GRID * max_dim - pad_top
print(f"Grasp center: ({cx_px:.0f}, {cy_px:.0f}) px in original image")
    
  

VRAM: ~20 GB (bf16, batch_size=1)
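
The decoding arithmetic at the end of the snippet above can be factored into a small helper, which makes it easier to reuse and to sanity-check without loading the model. This is a minimal sketch: the function name `decode_loc_token` and its signature are illustrative and not part of the repository, and it assumes the image content is centered inside the square padding.

```python
LOC_GRID = 32  # grid side length stated in the model card (32 x 32 = 1024 cells)


def decode_loc_token(pred_token: int, width: int, height: int) -> tuple[float, float]:
    """Map a grid-cell index (0..1023) to a pixel position in the original image.

    Hypothetical helper: mirrors the arithmetic of the usage snippet, assuming the
    image was square-padded to max(width, height) with the content centered.
    """
    if not 0 <= pred_token < LOC_GRID * LOC_GRID:
        raise ValueError(f"token index out of range: {pred_token}")
    max_dim = max(width, height)
    pad_left = (max_dim - width) // 2
    pad_top = (max_dim - height) // 2
    row, col = divmod(pred_token, LOC_GRID)
    # Use the cell center, then shift from padded space back to original space.
    cx = (col + 0.5) / LOC_GRID * max_dim - pad_left
    cy = (row + 0.5) / LOC_GRID * max_dim - pad_top
    return cx, cy


# A 640x480 image is padded to 640x640 (80 px of black above and below).
print(decode_loc_token(16 * LOC_GRID + 16, 640, 480))  # (330.0, 250.0)
```

If the model selects a cell that lies in the black padding, the decoded coordinate falls outside the original image (negative or beyond its width or height), so callers may want to check the range before using it.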

INT8 Quantization

Requires bitsandbytes and accelerate.

python
from transformers import PaliGemmaForConditionalGeneration, AutoProcessor, BitsAndBytesConfig
from PIL import Image, ImageOps
import torch

model_id = "nagaaato/pg2grasp"
processor = AutoProcessor.from_pretrained(model_id)
model = PaliGemmaForConditionalGeneration.from_pretrained(
    model_id,
    quantization_config=BitsAndBytesConfig(load_in_8bit=True),
    device_map="cuda",
).eval()

image = Image.open("scene.png").convert("RGB")
w, h = image.size
max_dim = max(w, h)
pad_left = (max_dim - w) // 2
pad_top = (max_dim - h) // 2
padded = ImageOps.pad(image, (max_dim, max_dim), color=0)

inputs = processor(
    images=padded,
    text="Pick the apple.",
    return_tensors="pt",
).to("cuda")

with torch.no_grad():
    outputs = model(**inputs)
    loc_logits = outputs.logits[0, -1, 256000:257024]
    pred_token = loc_logits.argmax().item()

# Same decoding as BF16
LOC_GRID = 32
row = pred_token // LOC_GRID
col = pred_token % LOC_GRID
cx_px = (col + 0.5) / LOC_GRID * max_dim - pad_left
cy_px = (row + 0.5) / LOC_GRID * max_dim - pad_top

VRAM: ~10-11 GB (INT8 quantized)

INT4 NF4 Quantization

python
from transformers import PaliGemmaForConditionalGeneration, AutoProcessor, BitsAndBytesConfig
import torch

model_id = "nagaaato/pg2grasp"
processor = AutoProcessor.from_pretrained(model_id)
model = PaliGemmaForConditionalGeneration.from_pretrained(
    model_id,
    quantization_config=BitsAndBytesConfig(
        load_in_4bit=True,
        bnb_4bit_quant_type="nf4",
        bnb_4bit_compute_dtype=torch.bfloat16,
        bnb_4bit_use_double_quant=True,
    ),
    device_map="cuda",
).eval()

# Same usage as above

VRAM: ~5-6 GB (INT4 NF4 quantized)
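
The three VRAM figures are roughly consistent with weight storage alone. As a back-of-the-envelope check, the sketch below assumes the 9.66B parameter count listed under Model Details and counts only weights; it ignores activations, quantization metadata, and any layers that bitsandbytes keeps in higher precision.

```python
PARAMS = 9.66e9  # parameter count quoted for the base model

# Approximate bytes per parameter for each loading mode (assumption: weights only).
BYTES_PER_PARAM = {"bf16": 2.0, "int8": 1.0, "nf4": 0.5}


def weight_gb(mode: str) -> float:
    """Estimated weight footprint in decimal gigabytes (1 GB = 1e9 bytes)."""
    return PARAMS * BYTES_PER_PARAM[mode] / 1e9


for mode in BYTES_PER_PARAM:
    print(f"{mode}: ~{weight_gb(mode):.1f} GB of weights")
```

The estimates come out slightly below the quoted figures in each case, which is what one would expect once runtime overhead is added on top of the weights.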

Model Details

  • Base: PaliGemma2-10B (9.66B params)
    • SigLIP-SO400M vision encoder (robot-adapted from piRECAP05)
    • Gemma2-9B language model (42 decoder layers)
  • Fine-tuning: selective unfreeze on robotics grasp datasets
    • Training: FSDP on 8×A100-80GB
    • Differential LR: base layers frozen, top layers + vision backbone trainable
  • Output: single token (1024-way softmax over 32×32 grid)
  • Input resolution: 224×224 (square-padded with black borders)
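
Because the output is one of 1024 grid cells, predictions are quantized: each cell covers max_dim / 32 pixels of the padded image, so the decoded center can be off by up to half a cell per axis even when the correct cell is chosen. A small sketch of that arithmetic follows; the image sizes are example values, not taken from the training data.

```python
LOC_GRID = 32  # 32 x 32 output grid from the model details


def cell_size_px(width: int, height: int) -> float:
    """Side length, in original-image pixels, of one grid cell after square padding."""
    return max(width, height) / LOC_GRID


for w, h in [(224, 224), (640, 480), (1280, 720)]:
    cell = cell_size_px(w, h)
    # Worst-case per-axis error when the correct cell is predicted.
    print(f"{w}x{h}: cell = {cell:.1f} px, max per-axis quantization error = {cell / 2:.1f} px")
```

For a 640×480 input, for instance, each cell is 20 px wide, so errors below about 10 px per axis cannot be resolved by the grid itself.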

Eval Results (validation set)

Metric                  Score
canonical_median        45.1 px
best_grasp_median       14.1 px
W100 (within 100 px)    80.0%
grounding margin        9.04
grounding top1          89.8%
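
The card does not define these metrics. One plausible reading, stated here as an assumption, is that the median values are median Euclidean pixel errors and that W100 is the fraction of predictions within 100 px of the target. Under that reading, the computation could be sketched as follows, using made-up points rather than the model's validation data.

```python
import math
from statistics import median


def pixel_errors(preds, targets):
    """Euclidean distance in pixels between predicted and target centers."""
    return [math.dist(p, t) for p, t in zip(preds, targets)]


def within_fraction(errors, threshold_px=100.0):
    """Fraction of predictions whose error is at most threshold_px."""
    return sum(e <= threshold_px for e in errors) / len(errors)


# Made-up example points (not from the model's validation set).
preds = [(100, 100), (200, 50), (400, 300), (10, 10)]
targets = [(103, 104), (200, 80), (250, 300), (10, 10)]

errors = pixel_errors(preds, targets)
print("median error (px):", median(errors))
print("within 100 px:", within_fraction(errors))
```

How canonical_median and best_grasp_median differ (for example, distance to one reference grasp versus the nearest of several annotated grasps) is not stated in the card and is left open here.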

Notes

  • The repository contains bf16 weights only. Quantization happens at load time via BitsAndBytesConfig — weights remain bf16 on disk.
  • Prompt format is strict: "<image>Pick the <object>." including the trailing period.
  • Image padding is required: square-pad with black and keep the padding offsets so the prediction can be mapped back to original image coordinates.
  • LOC tokens: indices 256000–257023 (1024 loc tokens for 32×32 grid).
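
The last two notes imply a simple mapping between a pixel in the original image and a vocabulary index in the 256000 to 257023 range. The sketch below goes in the opposite direction to the decoding in the usage snippets. The helper name `pixel_to_loc_id` is hypothetical, and whether the training targets were built exactly this way is an assumption.

```python
LOC_GRID = 32          # 32 x 32 grid
LOC_OFFSET = 256000    # first LOC token index, per the notes


def pixel_to_loc_id(x: float, y: float, width: int, height: int) -> int:
    """Map a pixel in the original image to a LOC vocabulary index (hypothetical helper)."""
    max_dim = max(width, height)
    pad_left = (max_dim - width) // 2
    pad_top = (max_dim - height) // 2
    # Shift into padded-square coordinates, then bucket into grid cells.
    col = int((x + pad_left) / max_dim * LOC_GRID)
    row = int((y + pad_top) / max_dim * LOC_GRID)
    # Clamp so points on the far edge still land in the last cell.
    col = min(max(col, 0), LOC_GRID - 1)
    row = min(max(row, 0), LOC_GRID - 1)
    return LOC_OFFSET + row * LOC_GRID + col


vocab_id = pixel_to_loc_id(330, 250, 640, 480)
print(vocab_id, vocab_id - LOC_OFFSET)  # 256528 528
```

Subtracting 256000 from the vocabulary index gives the position within the sliced logits used in the usage snippets.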

âš ī¸ Incomplete Data

Some information about this model is not available. Use with Caution - Verify details from the original source before relying on this data.

View Original Source →

📝 Limitations & Considerations

  • Benchmark scores may vary based on evaluation methodology and hardware configuration.
  • VRAM requirements are estimates; actual usage depends on quantization and batch size.
  • FNI scores are relative rankings and may change as new models are added.
  • License: listed as Apache-2.0 in the Hub metadata; verify licensing terms before commercial use.

đŸ›Ąī¸ Model Transparency Report

Technical metadata sourced from upstream repositories.

Open Metadata

🆔 Identity & Source

id: hf-model--nagaaato--pg2grasp
slug: nagaaato--pg2grasp
source: huggingface
author: nagaaato
license: Apache-2.0
tags: safetensors, paligemma, robotics, grasp-prediction, vision-language-model, image-to-text, en, base_model:google/paligemma2-10b-mix-224, base_model:finetune:google/paligemma2-10b-mix-224, license:apache-2.0, region:us

âš™ī¸ Technical Specs

architecture
null
params billions
null
context length
null
pipeline tag
image-to-text

📊 Engagement & Metrics

downloads: 130
stars: 0
forks: null

Data indexed from public sources. Updated daily.