spatiallm-llama-1b
SpatialLM-Llama-1B
Introduction
SpatialLM is a 3D large language model designed to process 3D point cloud data and generate structured 3D scene understanding outputs. These outputs include architectural elements like walls, doors, windows, and oriented object bounding boxes with their semantic categories. Unlike previous methods that require specialized equipment for data collection, SpatialLM can handle point clouds from diverse sources such as monocular video sequences, RGBD images, and LiDAR sensors. This multimodal architecture effectively bridges the gap between unstructured 3D geometric data and structured 3D representations, offering high-level semantic understanding. It enhances spatial reasoning capabilities for applications in embodied robotics, autonomous navigation, and other complex 3D scene analysis tasks.
SpatialLM reconstructs the 3D layout from a monocular RGB video with MASt3R-SLAM. Results are aligned to the video using ground-truth (GT) cameras for visualization.
SpatialLM Models
| Model | Download |
|---|---|
| SpatialLM-Llama-1B | 🤗 HuggingFace |
| SpatialLM-Qwen-0.5B | 🤗 HuggingFace |
Usage
Installation
Tested with the following environment:
- Python 3.11
- PyTorch 2.4.1
- CUDA Version 12.4
# clone the repository
git clone https://github.com/manycore-research/SpatialLM.git
cd SpatialLM
# create a conda environment with cuda 12.4
conda create -n spatiallm python=3.11
conda activate spatiallm
conda install -y nvidia/label/cuda-12.4.0::cuda-toolkit conda-forge::sparsehash
# Install dependencies with poetry
pip install poetry && poetry config virtualenvs.create false --local
poetry install
poe install-torchsparse # Building wheel for torchsparse will take a while
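After installation, it can help to confirm that the active environment matches the tested versions listed above. The following is a minimal sketch using only the Python standard library; the helper names `matches` and `report` are hypothetical and not part of the SpatialLM repository, and the check covers Python and PyTorch only (not the CUDA toolkit).

```python
import sys
from importlib import metadata

# Versions from the "Tested with the following environment" list.
TESTED = {"python": "3.11", "torch": "2.4.1"}


def matches(installed: str, tested: str) -> bool:
    """True when `installed` begins with the dot-separated parts of `tested`."""
    # Drop local build tags such as "+cu124" before comparing.
    parts = installed.split("+")[0].split(".")
    want = tested.split(".")
    return parts[: len(want)] == want


def report() -> dict:
    """Map each component to (installed version or None, matches tested?)."""
    v = sys.version_info
    found = {"python": f"{v.major}.{v.minor}.{v.micro}"}
    try:
        found["torch"] = metadata.version("torch")
    except metadata.PackageNotFoundError:
        found["torch"] = None  # PyTorch is not installed in this environment
    return {
        name: (ver, ver is not None and matches(ver, TESTED[name]))
        for name, ver in found.items()
    }


if __name__ == "__main__":
    for name, (ver, ok) in report().items():
        print(f"{name}: installed={ver} tested={TESTED[name]} match={ok}")
```

A mismatch does not necessarily mean the setup is broken; it only indicates that the environment differs from the one the authors tested.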
Inference
In the current version of SpatialLM, input point clouds are assumed to be axis-aligned, with the z-axis pointing up. This orientation is crucial for consistent spatial understanding and scene interpretation across different datasets and applications. Example preprocessed point clouds, reconstructed from RGB videos using MASt3R-SLAM, are available in SpatialLM-Testset.
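Point clouds from other pipelines often use a y-up convention. As an illustration of the z-up requirement, the sketch below rotates y-up coordinates into z-up by a +90 degree rotation about the x-axis. The function name `y_up_to_z_up` is hypothetical, and the sketch assumes a right-handed y-up input; it does not perform the axis alignment of walls that the model also expects.

```python
from typing import Iterable, List, Tuple

Point = Tuple[float, float, float]


def y_up_to_z_up(points: Iterable[Point]) -> List[Point]:
    """Rotate +90 degrees about the x-axis: (x, y, z) -> (x, -z, y).

    The old up direction (0, 1, 0) becomes (0, 0, 1), and the rotation
    keeps the coordinate system right-handed.
    """
    return [(x, -z, y) for x, y, z in points]


if __name__ == "__main__":
    cloud = [(1.0, 2.0, 3.0), (0.0, 1.0, 0.0)]
    for before, after in zip(cloud, y_up_to_z_up(cloud)):
        print(before, "->", after)
```

For real data the same mapping would be applied to the vertex array of the PLY file before running inference.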
Download an example point cloud:
huggingface-cli download manycore-research/SpatialLM-Testset pcd/scene0000_00.ply --repo-type dataset --local-dir .
Run inference:
python inference.py --point_cloud pcd/scene0000_00.ply --output scene0000_00.txt --model_path manycore-research/SpatialLM-Llama-1B
Visualization
Use rerun to visualize the point cloud and the predicted structured 3D layout output:
# Convert the predicted layout to Rerun format
python visualize.py --point_cloud pcd/scene0000_00.ply --layout scene0000_00.txt --save scene0000_00.rrd
# Visualize the point cloud and the predicted layout
rerun scene0000_00.rrd
Evaluation
To evaluate SpatialLM, we provide an eval.py script that reports benchmark results on the SpatialLM-Testset. The reference numbers are listed in the Benchmark Results section below.
Download the testset:
huggingface-cli download manycore-research/SpatialLM-Testset --repo-type dataset --local-dir SpatialLM-Testset
Run evaluation:
# Run inference on the PLY point clouds in folder SpatialLM-Testset/pcd with SpatialLM-Llama-1B model
python inference.py --point_cloud SpatialLM-Testset/pcd --output SpatialLM-Testset/pred --model_path manycore-research/SpatialLM-Llama-1B
# Evaluate the predicted layouts
python eval.py --metadata SpatialLM-Testset/test.csv --gt_dir SpatialLM-Testset/layout --pred_dir SpatialLM-Testset/pred --label_mapping SpatialLM-Testset/benchmark_categories.tsv
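The two evaluation steps can also be driven from Python, which makes it easier to repeat the run for another checkpoint. This is a minimal sketch: the function names `evaluation_commands` and `run_all` are hypothetical, while the script names, flags, and paths are copied from the commands above.

```python
import subprocess
from typing import List


def evaluation_commands(
    testset_dir: str = "SpatialLM-Testset",
    model_path: str = "manycore-research/SpatialLM-Llama-1B",
) -> List[List[str]]:
    """Build the inference and eval command lines for one model."""
    pred_dir = f"{testset_dir}/pred"
    inference = [
        "python", "inference.py",
        "--point_cloud", f"{testset_dir}/pcd",
        "--output", pred_dir,
        "--model_path", model_path,
    ]
    evaluate = [
        "python", "eval.py",
        "--metadata", f"{testset_dir}/test.csv",
        "--gt_dir", f"{testset_dir}/layout",
        "--pred_dir", pred_dir,
        "--label_mapping", f"{testset_dir}/benchmark_categories.tsv",
    ]
    return [inference, evaluate]


def run_all(commands: List[List[str]]) -> None:
    """Run each command in order and stop at the first failure."""
    for cmd in commands:
        subprocess.run(cmd, check=True)


if __name__ == "__main__":
    # Print the commands instead of running them; call run_all(...) from
    # inside the cloned SpatialLM repository to execute them.
    for cmd in evaluation_commands():
        print(" ".join(cmd))
```

Passing a different `model_path` reuses the same flags for another checkpoint; note that predictions are written to the same `pred` folder, so earlier results would be overwritten.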
SpatialLM Testset
We provide a test set of 107 preprocessed point clouds, reconstructed from RGB videos using MASt3R-SLAM. SpatialLM-Testset is considerably more challenging than prior datasets of clean RGBD scans because point clouds reconstructed from monocular RGB videos contain noise and occlusions.
| Dataset | Download |
|---|---|
| SpatialLM-Testset | 🤗 Datasets |
Benchmark Results
Benchmark results on the challenging SpatialLM-Testset are reported in the following table:
| Method | SpatialLM-Llama-1B | SpatialLM-Qwen-0.5B |
|---|---|---|
| Floorplan | mean IoU | |
| wall | 78.62 | 74.81 |
| Objects | F1 @.25 IoU (3D) | |
| curtain | 27.35 | 28.59 |
| nightstand | 57.47 | 54.39 |
| chandelier | 38.92 | 40.12 |
| wardrobe | 23.33 | 30.60 |
| bed | 95.24 | 93.75 |
| sofa | 65.50 | 66.15 |
| chair | 21.26 | 14.94 |
| cabinet | 8.47 | 8.44 |
| dining table | 54.26 | 56.10 |
| plants | 20.68 | 26.46 |
| tv cabinet | 33.33 | 10.26 |
| coffee table | 50.00 | 55.56 |
| side table | 7.60 | 2.17 |
| air conditioner | 20.00 | 13.04 |
| dresser | 46.67 | 23.53 |
| Thin Objects | F1 @.25 IoU (2D) | |
| painting | 50.04 | 53.81 |
| carpet | 31.76 | 45.31 |
| tv | 67.31 | 52.29 |
| door | 50.35 | 42.15 |
| window | 45.4 | 45.9 |
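To make the "F1 @.25 IoU (3D)" metric in the table concrete, the sketch below computes an F1 score where a predicted box counts as a true positive if it overlaps an unmatched ground-truth box with IoU of at least 0.25. It is a simplified illustration under stated assumptions, not the logic of eval.py: boxes are axis-aligned rather than oriented, matching is greedy, and the names `iou_3d` and `f1_at_iou` are hypothetical.

```python
from typing import List, Tuple

# Axis-aligned box: (xmin, ymin, zmin, xmax, ymax, zmax)
Box = Tuple[float, float, float, float, float, float]


def volume(b: Box) -> float:
    return (b[3] - b[0]) * (b[4] - b[1]) * (b[5] - b[2])


def iou_3d(a: Box, b: Box) -> float:
    """Intersection over union of two axis-aligned 3D boxes."""
    inter = 1.0
    for axis in range(3):
        lo = max(a[axis], b[axis])
        hi = min(a[axis + 3], b[axis + 3])
        if hi <= lo:
            return 0.0  # no overlap along this axis
        inter *= hi - lo
    return inter / (volume(a) + volume(b) - inter)


def f1_at_iou(preds: List[Box], gts: List[Box], threshold: float = 0.25) -> float:
    """Greedy one-to-one matching, then F1 = 2TP / (2TP + FP + FN)."""
    unmatched = list(range(len(gts)))
    tp = 0
    for p in preds:
        best_j, best_iou = None, threshold
        for j in unmatched:
            iou = iou_3d(p, gts[j])
            if iou >= best_iou:
                best_j, best_iou = j, iou
        if best_j is not None:
            unmatched.remove(best_j)  # each ground-truth box matches once
            tp += 1
    fp, fn = len(preds) - tp, len(gts) - tp
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0  # 0.0 when both lists are empty


if __name__ == "__main__":
    gts = [(0, 0, 0, 1, 1, 1), (5, 5, 5, 6, 6, 6)]
    preds = [(0, 0, 0, 1, 1, 0.5), (10, 10, 10, 11, 11, 11)]
    print(f1_at_iou(preds, gts))
```

In this example one prediction matches (IoU 0.5), one prediction is spurious, and one ground-truth box is missed, giving an F1 of 0.5. The table reports such scores as percentages per category.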
License
SpatialLM-Llama-1B is derived from Llama3.2-1B-Instruct, which is licensed under the Llama3.2 license. SpatialLM-Qwen-0.5B is derived from the Qwen-2.5 series, originally licensed under the Apache 2.0 License.
All models are built upon the SceneScript point cloud encoder, licensed under the CC-BY-NC-4.0 License. TorchSparse, utilized in this project, is licensed under the MIT License.
Citation
If you find this work useful, please consider citing:
@misc{spatiallm,
title = {SpatialLM: Large Language Model for Spatial Understanding},
author = {ManyCore Research Team},
howpublished = {\url{https://github.com/manycore-research/SpatialLM}},
year = {2025}
}
Acknowledgements
We would like to thank the following projects that made this work possible:
Llama3.2 | Qwen2.5 | Transformers | SceneScript | TorchSparse
📝 Limitations & Considerations
- Benchmark scores may vary based on evaluation methodology and hardware configuration.
- VRAM requirements are estimates; actual usage depends on quantization and batch size.
- FNI scores are relative rankings and may change as new models are added.
- ⚠ License: the Hugging Face tags list license:llama3.2; verify licensing terms before commercial use.
- Source: https://huggingface.co/manycore-research/spatiallm-llama-1b
Cite this model
Academic & Research Attribution
@misc{hf_model__manycore_research__spatiallm_llama_1b,
author = {manycore-research},
title = {SpatialLM-Llama-1B},
year = {2026},
howpublished = {\url{https://huggingface.co/manycore-research/spatiallm-llama-1b}},
note = {Accessed via Free2AITools Knowledge Fortress}
}
🛡️ Model Transparency Report
Verified data manifest for traceability and transparency.
🆔 Identity & Source
- id: hf-model--manycore-research--spatiallm-llama-1b
- author: manycore-research
- tags: transformers, safetensors, spatiallm_llama, text-generation, conversational, base_model:meta-llama/llama-3.2-1b-instruct, license:llama3.2, endpoints_compatible, region:us
⚙️ Technical Specs
- architecture: SpatialLMLlamaForCausalLM
- params (billions): 1.25
- context length: 4,096
- vram (GB): 2.2
- vram is estimated: true
- vram formula: VRAM ≈ (params * 0.75) + 0.8GB (KV) + 0.5GB (OS)
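As a worked check of the listed formula, the sketch below plugs the listed parameter count (1.25 billion) into it. The function name `estimate_vram_gb` is hypothetical; the 0.75 multiplier and the 0.8 GB and 0.5 GB constants are taken directly from the formula above, which is this page's own estimate rather than a measured figure.

```python
def estimate_vram_gb(
    params_billions: float,
    kv_gb: float = 0.8,  # KV-cache allowance from the listed formula
    os_gb: float = 0.5,  # OS overhead allowance from the listed formula
) -> float:
    """VRAM ≈ (params * 0.75) + 0.8 GB (KV) + 0.5 GB (OS)."""
    return params_billions * 0.75 + kv_gb + os_gb


if __name__ == "__main__":
    estimate = estimate_vram_gb(1.25)
    # 1.25 * 0.75 = 0.9375, plus 0.8 and 0.5 gives 2.2375
    print(f"{estimate:.4f} GB, listed as ~{round(estimate, 1)} GB")
```

The result, 2.2375 GB, rounds to the 2.2 GB shown in the specs.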
📊 Engagement & Metrics
- likes: 990
- downloads: 176