space FNI: 89

hf-space--enzostvs--deepsite

--- title: DeepSite v3 emoji: 🐳 colorFrom: blue colorTo: blue sdk: docker pinned: true app_port: 3000 license: mit failure_strategy: rollback short_description: Generate any application by Vibe Coding models: - deepseek-ai/DeepSeek-V3-0324 - deepseek-ai/DeepSeek-V3.2 - Qwen/Qwen3-Coder-480B-A35B-Instruct - moonshotai/Kimi-K2-Instruct - moonshotai/Kimi-K2-Instruct-0905 - zai-org/GLM-4.7 - MiniMaxAI/MiniMax-M2.1 --- DeepSite is a Vibe Coding Platform designed to make coding smarter and more ef...

dataset FNI: 85

hf-dataset--documentation-images

--- license: cc-by-nc-sa-4.0 --- HF Team: Please make sure you optimize the assets before uploading them. My favorite tool for this is https://tinypng.com/.

dataset FNI: 84

hf-dataset--hf-doc-build--doc-build

--- license: mit pretty_name: Generated Docs for HF viewer: false --- This repo contains all the docs published on https://huggingface.co/docs. The docs are generated with https://github.com/huggingface/doc-builder.

space FNI: 84

hf-space--mteb--leaderboard

--- title: MTEB Leaderboard emoji: 🥇 colorFrom: blue colorTo: indigo sdk: docker app_port: 7860 app_file: app.py pinned: true tags: - leaderboard startup_duration_timeout: 1h fullWidth: true license: mit short_description: Embedding Leaderboard --- Embedding Leaderboard

dataset FNI: 82

hf-dataset--kakologarchives--kakologarchives

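--- pretty_name: Niconico Jikkyo Past Log Archive (ニコニコ実況 過去ログアーカイブ) license: mit language: - ja task_categories: - text-classification --- The Niconico Jikkyo Past Log Archive is a dataset that collects every past-log comment from the launch of the Niconico Jikkyo service to the present. In December 2020, Niconico Jikkyo was relaunched as an official channel within Niconico Live. As a result, the old system, in operation since November 2009, was discontinued (effectively ending the service). As support for consumer devices such as torne and BRAVIA was dropped across the board, roughly 11 years of past logs, filled with the unfiltered voices of the time, were set to be lost as well. In response, residents of 5ch's DTV board took the lead in launching a plan to archive 11 years of past logs for every channel before the old Niconico Jikkyo shut down. After many twists and turns, and thanks to Nekopanda, who flawlessly retrieved roughly 11 years of past logs for all channels, including radio and BS, the 11 years of...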

space FNI: 81

hf-space--black-forest-labs--flux.1-dev

--- title: FLUX.1 [dev] emoji: 🖥️ colorFrom: yellow colorTo: pink sdk: gradio sdk_version: 5.25.2 app_file: app.py pinned: false license: mit --- Check out the configuration reference at https://huggingface.co/docs/hub/spaces-config-reference

space FNI: 81

hf-space--wan-ai--wan2.2-animate

--- title: Wan2.2 Animate emoji: 👁 colorFrom: blue colorTo: yellow sdk: gradio sdk_version: 5.47.2 app_file: app.py pinned: false license: apache-2.0 short_description: Wan2.2 Animate --- Check out the configuration reference at https://huggingface.co/docs/hub/spaces-config-reference

space FNI: 80

hf-space--jbilcke-hf--ai-comic-factory

--- title: AI Comic Factory emoji: 👩‍🎨 colorFrom: red colorTo: yellow sdk: docker pinned: true app_port: 3000 disable_embedding: false short_description: Create your own AI comic with a single prompt hf_oauth: true hf_oauth_expiration_minutes: 43200 hf_oauth_scopes: [inference-api] --- Last release: AI Comic Factory 1.2 The AI Comic Factory has an official website: aicomicfactory.app For more information about my other projects please check linktr.ee/FLNGR. If you like the AI Comic Factory,...

dataset FNI: 80

hf-dataset--nvidia--physicalai-robotics-gr00t-x-embodiment-sim

--- license: cc-by-4.0 task_categories: - robotics tags: - robotics --- GitHub Repo: Isaac GR00T N1. We provide a set of datasets used for post-training of GR00T N1. Each dataset is a collection of trajectories from different robot embodiments and tasks. | Dataset Name | #trajectories | | - | - | | bimanual_panda_gripper.Threading | 1000 | | bimanual_panda_hand.LiftTray | 1000 | | bimanual_panda_gripper.ThreePieceAssembly | 1000 | | bimanual_panda_gripper.Transport | 1000 | | bimanua...

dataset FNI: 80

hf-dataset--opendatalab--aicc

--- license: cc-by-4.0 size_categories: - n>1T task_categories: - text-generation language: - multilingual tags: - common-crawl - html-parsing - markdown - code - math --- 🔧 🔧 **Our New-Gen HTML Parser MinerU-HTML** Now Released! Paper | Project page <img src="./images/AICC_christmas_LOGO.png" width="600" /> - **[2025-12-24]** 🔥 **CC-MinerU-Code Updated!** We have updated our specialized high-quality code dataset **CC-MinerU-Code**, containing **4.58M** samples, also extracted from the ful...

space FNI: 79

hf-space--akhaliq--anycoder

--- title: AnyCoder emoji: 🏆 colorFrom: blue colorTo: purple sdk: docker app_port: 7860 pinned: false disable_embedding: false hf_oauth: true hf_oauth_expiration_minutes: 43200 hf_oauth_scopes: - manage-repos - write-discussions --- AnyCoder is a full-stack AI-powered code generator with a modern React/TypeScript frontend and FastAPI backend. Generate applications by describing them in plain English, with support for multiple AI models and one-click deployment to Hugging Face Spaces. - **Mod...

space FNI: 78

hf-space--huggingfacetb--smol-training-playbook

--- title: The Smol Training Playbook short_description: The secrets to building world-class LLMs emoji: 📚 colorFrom: blue colorTo: indigo sdk: docker pinned: false header: mini app_port: 8080 tags: - research-article-template - research paper - scientific paper - data visualization thumbnail: >- https://huggingface.co/spaces/HuggingFaceTB/smol-training-playbook/public/thumb.png --- <div align="center"> **A modern, interactive template for scientific writing** that brings papers to life with...

dataset FNI: 76

hf-dataset--fka--awesome-chatgpt-prompts

--- license: cc0-1.0 tags: - ChatGPT - prompts - AI - GPT - Claude - Gemini - Llama - Mistral - LLM - prompt-engineering - conversational-ai - text-generation - chatbot - awesome-list task_categories: - question-answering - text-generation size_categories: - 100K<n<1M --- <p align="center"> <img width="558" height="148" alt="prompts.chat logo" src="https://github.com/user-attachments/assets/8de2ba4c-5e89-4aae-aecb-32b188fb1bfb" /> <br> <small>a.k.a. Awesome ChatGPT Prompts</small> </p> This i...

dataset FNI: 76

hf-dataset--fka--prompts.chat

--- license: cc0-1.0 tags: - ChatGPT - prompts - AI - GPT - Claude - Gemini - Llama - Mistral - LLM - prompt-engineering - conversational-ai - text-generation - chatbot - awesome-list task_categories: - question-answering - text-generation size_categories: - 100K<n<1M --- <p align="center"> <img width="558" height="148" alt="prompts.chat logo" src="https://github.com/user-attachments/assets/8de2ba4c-5e89-4aae-aecb-32b188fb1bfb" /> <br> <small>a.k.a. Awesome ChatGPT Prompts</small> </p> This i...

dataset FNI: 76

Bitcoin Historical Data

Bitcoin data at 1-min intervals from select exchanges, Jan 2012 to Present

space FNI: 76

hf-space--zerogpu-aoti--wan2-2-fp8da-aoti-faster

--- title: Wan2.2 14B Fast emoji: 🎥💨 colorFrom: gray colorTo: pink sdk: gradio sdk_version: 6.1.0 app_file: app.py pinned: true short_description: generate a video from an image with a text prompt --- Check out the configuration reference at https://huggingface.co/docs/hub/spaces-config-reference

dataset FNI: 76

hf-dataset--openai--gsm8k

--- annotations_creators: - crowdsourced language_creators: - crowdsourced language: - en license: - mit multilinguality: - monolingual size_categories: - 1K<n<10K source_datasets: - original task_categories: - text-generation task_ids: [] paperswithcode_id: gsm8k pretty_name: Grade School Math 8K tags: - math-word-problems dataset_info: - config_name: main features: - name: question dtype: string - name: answer dtype: string splits: - name: train num_bytes: 3963202 num_examples: 7473 - name:...

space FNI: 76

hf-space--finegrain--finegrain-image-enhancer

--- title: Finegrain Image Enhancer emoji: 🖼️🪄 colorFrom: pink colorTo: indigo sdk: gradio sdk_version: 5.27.1 python_version: 3.12 app_file: src/app.py pinned: false short_description: Clarity AI Upscaler Reproduction license: mit models: - refiners/juggernaut.reborn.sd1_5.unet - refiners/juggernaut.reborn.sd1_5.autoencoder - refiners/juggernaut.reborn.sd1_5.text_encoder - refiners/controlnet.sd1_5.tile - philz1337x/upscaler - philz1337x/embeddings - philz1337x/loras tags: - enhancer - ups...

dataset FNI: 76

hf-dataset--allenai--dolma3_mix-6t-1025-7b

--- license: odc-by task_categories: - text-generation language: - en configs: - config_name: default data_files: - split: train path: data/**/*.jsonl.zst features: - name: id dtype: string - name: text dtype: string - name: metadata dtype: string - name: source dtype: string - name: version dtype: string - name: created dtype: string - name: added dtype: string - name: doc dtype: string - name: attributes dtype: string --- For all other training use cases, including training from scratch, **...

space FNI: 76

hf-space--linoyts--qwen-image-edit-angles

--- title: Qwen Image Edit Camera Control emoji: 🎬 colorFrom: indigo colorTo: pink sdk: gradio sdk_version: 6.2.0 app_file: app.py pinned: true license: apache-2.0 short_description: Fast 4 step inference with Qwen Image Edit 2509 --- Check out the configuration reference at https://huggingface.co/docs/hub/spaces-config-reference

space FNI: 76

hf-space--mrfakename--z-image-turbo

--- title: Z Image Turbo emoji: 🖼️ colorFrom: yellow colorTo: yellow sdk: gradio sdk_version: 6.0.1 app_file: app.py pinned: true --- Check out the configuration reference at https://huggingface.co/docs/hub/spaces-config-reference

dataset FNI: 75

Fruits-360 dataset

A dataset with 173786 images of 250 fruits, vegetables, nuts and seeds

space FNI: 75

hf-space--sczhou--codeformer

--- title: CodeFormer emoji: 🐼 colorFrom: blue colorTo: green sdk: gradio sdk_version: 6.1.0 app_file: app.py pinned: false --- Check out the configuration reference at https://huggingface.co/docs/hub/spaces-config-reference

space FNI: 75

hf-space--jasperai--flux.1-dev-controlnet-upscaler

--- title: Flux.1-dev Upscaler emoji: 🔎 colorFrom: green colorTo: blue sdk: gradio sdk_version: 5.29.0 python_version: 3.12 app_file: app.py pinned: false license: other tags: - upscaler - super-resolution - controlnet - flux.1-dev license_name: flux-1-dev-non-commercial-license license_link: https://huggingface.co/black-forest-labs/FLUX.1-dev/blob/main/LICENSE. --- Check out the configuration reference at https://huggingface.co/docs/hub/spaces-config-reference

space FNI: 75

hf-space--resembleai--chatterbox

--- title: Chatterbox TTS emoji: 🍿 colorFrom: indigo colorTo: blue sdk: gradio sdk_version: 6.3.0 app_file: app.py pinned: false short_description: Expressive Zeroshot TTS --- Check out the configuration reference at https://huggingface.co/docs/hub/spaces-config-reference

space FNI: 75

hf-space--nihalgazi--text-to-speech-unlimited

--- title: Realistic Text To Speech Unlimited emoji: 🔥 colorFrom: indigo colorTo: purple sdk: gradio sdk_version: 5.28.0 app_file: app.py pinned: true short_description: Free Text-To-Speech generator with Emotion control (OpenAI) --- Check out the configuration reference at https://huggingface.co/docs/hub/spaces-config-reference

dataset FNI: 75

hf-dataset--nvidia--physicalai-autonomous-vehicles

Technical specifications and benchmarking details are available on the entity page.

space FNI: 74

hf-space--dontplantoend--ugi-leaderboard

--- title: UGI Leaderboard emoji: 📢 colorFrom: gray colorTo: purple sdk: docker app_port: 8050 pinned: false license: apache-2.0 short_description: Uncensored General Intelligence Leaderboard tags: - leaderboard - submission:manual - test:private - modality:text - eval:generation - eval:safety - eval:math - language:English --- Check out the configuration reference at https://huggingface.co/docs/hub/spaces-config-reference

dataset FNI: 74

hf-dataset--yibozhang2001--texverse-1k

--- license: odc-by ---

space FNI: 74

hf-space--qwen--qwen3-tts

--- title: Qwen3-TTS Demo emoji: 🎙️ colorFrom: blue colorTo: purple sdk: gradio sdk_version: 5.33.0 app_file: app.py pinned: false license: apache-2.0 suggested_hardware: zero-a10g ---

dataset FNI: 73

hf-dataset--ntu-nlp-sg--xcodeeval

--- annotations_creators: - expert-generated language: - code - en language_creators: - found - expert-generated license: - cc-by-4.0 multilinguality: - multilingual pretty_name: xCodeEval size_categories: - 1M<n<10M - 10M<n<100M source_datasets: - original tags: - programming-language - code - program-synthesis - automatic-code-repair - code-retrieval - code-translation - code-classification task_categories: - translation - token-classification - text2text-generation - text-retrieval - text-...

space FNI: 73

hf-space--not-lain--background-removal

--- title: Background Removal emoji: 🌘w🌖 colorFrom: purple colorTo: indigo sdk: gradio sdk_version: 5.35.0 app_file: app.py pinned: true license: mit --- Check out the configuration reference at https://huggingface.co/docs/hub/spaces-config-reference

space FNI: 73

hf-space--ai4editing--magicquill

--- title: MagicQuill emoji: 🪶 colorFrom: purple colorTo: yellow sdk: gradio sdk_version: 4.44.1 app_file: app.py pinned: false disable_embedding: true license: cc-by-nc-4.0 --- The paper is MagicQuill: An Intelligent Interactive Image Editing System. Check out the configuration reference at https://huggingface.co/docs/hub/spaces-config-reference.

dataset FNI: 73

hf-dataset--wendyl21--parity-experiments

This dataset stores the oracle and parity experiment logs for adapters. Please upload them according to the following format and draft a PR. We will use them to estimate costs for each adapter across diverse agents and models. --- license: apache-2.0 ---

space FNI: 73

hf-space--gokaygokay--flux-prompt-generator

--- title: FLUX Prompt Generator emoji: 😻 colorFrom: red colorTo: blue sdk: gradio sdk_version: 4.44.0 app_file: app.py pinned: true license: apache-2.0 --- Check out the configuration reference at https://huggingface.co/docs/hub/spaces-config-reference

space FNI: 73

hf-space--multimodalart--qwen-image-multiple-angles-3d-camera

--- title: Qwen Image Multiple Angles 3D Camera emoji: 🎥 colorFrom: indigo colorTo: purple sdk: gradio sdk_version: 6.2.0 app_file: app.py pinned: false --- Check out the configuration reference at https://huggingface.co/docs/hub/spaces-config-reference

space FNI: 73

hf-space--tonyassi--video-face-swap

--- title: Video Face Swap emoji: 👱🏻‍♀️ colorFrom: pink colorTo: indigo sdk: gradio sdk_version: 5.38.0 app_file: app.py pinned: true disable_embedding: false short_description: Video deep fake --- facefusion MIT License Check out the configuration reference at https://huggingface.co/docs/hub/spaces-config-reference

dataset FNI: 73

hf-dataset--genrobot2025--10kh-realomin-opendata

Technical specifications and benchmarking details are available on the entity page.

space FNI: 73

hf-space--prithivmlmods--flux-lora-dlc

--- title: FLUX LoRA DLC emoji: 🥳 colorFrom: indigo colorTo: gray sdk: gradio sdk_version: 6.3.0 app_file: app.py pinned: true license: creativeml-openrail-m short_description: 270+ Impressive LoRAs for Flux.1 --- Check out the configuration reference at https://huggingface.co/docs/hub/spaces-config-reference

space FNI: 73

hf-space--hf-audio--open_asr_leaderboard

--- title: Open ASR Leaderboard emoji: 🏆 colorFrom: red colorTo: blue sdk: gradio sdk_version: 5.49.1 app_file: app.py pinned: true tags: - leaderboard --- ```bibtex @misc{srivastav2025openasrleaderboardreproducible, title={Open ASR Leaderboard: Towards Reproducible and Transparent Multilingual and Long-Form Speech Recognition Evaluation}, author={Vaibhav Srivastav and Steven Zheng and Eric Bezzam and Eustache Le Bihan and Nithin Koluguri and Piotr Żelasko and Somshubra Majumdar and Adel Mou...

dataset FNI: 73

hf-dataset--allenai--dolma3_dolmino_mix-100b-1125

--- license: odc-by language: - en --- <img alt="Logo for Dolmino Mix" src="dolmino-mix.png" width="289px" style="margin-left:'auto' margin-right:'auto' display:'block'"> This dataset contains the high-quality pool of data considered for the second stage of Olmo 3 32B. | Source | Category | |--------|----------| | TinyMATH Mind | Math (synth) | | TinyMATH PoT | Math (synth) | | CraneMath | Math (synth) | | MegaMatt | Math (synth) | | Dolmino Math | Math (synth) | | StackEdu (FIM) | Code | | C...

dataset FNI: 73

hf-dataset--jsinowitz--snodas-snowmelt-cache

Technical specifications and benchmarking details are available on the entity page.

dataset FNI: 73

hf-dataset--dsinghvi--eval_awareness

--- license: mit tags: - eval_awareness - evals pretty_name: Eval Awareness Dataset size_categories: - 1K<n<10K --- The Eval Awareness Dataset contains contrastive pairs with and without eval cues, capturing behavioural changes across various misaligned situations. Automated scripts for creating these scenarios, along with other code related to suppressing eval awareness, are available at https://github.com/divyanshsinghvi/evalawareness_techniques/ Authors: @divyanshsinghvi, @Riteshbhalerao11

dataset FNI: 72

hf-dataset--banned-historical-archives--banned-historical-archives

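--- size_categories: - n>1T --- The Banned Historical Archives (和谐历史档案馆) dataset contains original files that have already been entered into https://banned-historical-archives.github.io as well as files that have not yet been entered. - banned-historical-archives.github.io # original data already entered into the site, synced from the GitHub repository at irregular intervals - raw # original files - config # configuration files - todo # files not yet entered into the site Some newspaper and image materials are stored in separate repositories: | Name | URL | Status | |---|---|---| | Reference News (参考消息) | https://huggingface.co/datasets/banned-historical-archives/ckxx | Not entered | | People's Daily (人民日报) | https://huggingface.co/datasets/banned-historical-archives/rmrb | Selected important articles entered | | Wenhui Bao (文汇报) | https://huggingface.co/datasets/banned-historical-archives/...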

space FNI: 72

hf-space--fffiloni--diffusers-image-outpaint

--- title: Diffusers Image Outpaint emoji: 🔅 colorFrom: gray colorTo: purple sdk: gradio sdk_version: 5.49.1 app_file: app.py pinned: true disable_embbeding: true license: apache-2.0 short_description: 'Easily expand image boundaries ' --- Check out the configuration reference at <https://huggingface.co/docs/hub/spaces-config-reference>

dataset FNI: 72

hf-dataset--xlangai--ubuntu_osworld_file_cache

--- license: apache-2.0 --- This repository serves as a file cache for the OSWorld project, providing reliable and fast access to evaluation files that were previously hosted on Google Drive. OSWorld is a scalable, real computer environment for multimodal agents, supporting task setup, execution-based evaluation, and interactive learning across various operating systems and applications. This cache repository ensures that all evaluation files are consistently accessible for research and devel...

space FNI: 72

hf-space--tongyi-mai--z-image-turbo

--- title: Z Image Turbo emoji: 🏃 colorFrom: blue colorTo: pink sdk: gradio sdk_version: 6.0.1 app_file: app.py pinned: false license: apache-2.0 disable_embedding: true --- Check out the configuration reference at https://huggingface.co/docs/hub/spaces-config-reference

space FNI: 72

hf-space--innoai--edge-tts-text-to-speech

--- title: Edge TTS Text To Speech emoji: 👁 colorFrom: pink colorTo: yellow sdk: gradio sdk_version: 5.25.2 app_file: app.py pinned: false license: gpl-2.0 --- Check out the configuration reference at https://huggingface.co/docs/hub/spaces-config-reference

dataset FNI: 72

hf-dataset--animefans--iwara_mmd_all

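--- license: other tags: - not-for-all-audiences --- > For English readers, please see the English README. > **⚠️ Content warning: this dataset comes from Iwara.tv and may contain adult content (NSFW/R-18). Please use it in an appropriate setting.** This project contains all public videos from the founding of Iwara.tv through June 15, 2025, totaling more than 30 TB. Iwara.tv is a sharing community centered on MikuMikuDance (MMD) video creation. The dataset is intended to provide material for researchers in computer vision, data analysis, recommender systems, and related fields. The **metadata** accompanying the video files (titles, authors, tags, and so on) has been split into a separate repository: ACCA225/iwara_metadata. The video files are organized by directory; for example, a single folder contains all videos uploaded in June 2024. See the notes at the end for the exact size of the dataset. To download, use the Files and Version tab at the top of this page. - Download a specific directory: - Download all directories: - Size of each directory: - Video...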

dataset FNI: 71

hf-dataset--opensqz--automathtext-v2

--- task_categories: - text-generation - question-answering language: - en - zh tags: - LLM - pretraining - finetuning - midtraining - reasoning - STEM - math size_categories: - 10B<n<100B configs: - config_name: automathtext-v2-ultra data_files: - split: train path: - nemotron_cc_high/80-90/*.parquet - nemotron_cc_high/90-100/*.parquet - nemotron_cc_medium_high/80-90/*.parquet - nemotron_cc_medium_high/90-100/*.parquet - dclm/80-90/*.parquet - dclm/90-100/*.parquet - fineweb_edu/80-90/*.parq...

dataset FNI: 70

arXiv Dataset

arXiv dataset and metadata of 1.7M+ scholarly papers across STEM

dataset FNI: 69

hf-dataset--martj42--international-football-results-from-1872-to-2017

An up-to-date dataset of over 49,000 international football results

space FNI: 69

hf-space--fishaudio--s1-mini

--- title: Fish Audio S1 emoji: 🏆 colorFrom: purple colorTo: gray sdk: gradio sdk_version: 5.32.1 app_file: app.py pinned: true license: cc-by-nc-sa-4.0 --- Check out the configuration reference at https://huggingface.co/docs/hub/spaces-config-reference

dataset FNI: 68

hf-dataset--taohan10200--cra5-dataset

--- license: cdla-sharing-1.0 task_categories: - time-series-forecasting - compression tags: - climate - weather - era5 - cra5 pretty_name: CRA5 ERA5 Dataset size_categories: - 1T<n<10T --- <img src="assets/CRA5LOGO.svg" align="center"> We introduce **VAEformer**, a variational autoencoder transformer designed for the extreme compression of climate data. Addressing the storage challenges of massive datasets like ERA5, ...

dataset FNI: 68

hf-dataset--enronarchive--mail

An accessible and organized copy of the Enron Email dataset. Part of the **Enron Archive**. A live mirror is available at https://mail.enroncorp.com - **151 employee mailboxes** with complete email history and attachments - **Pseudo-login system** - Browse any account; stays logged in until logged off - **Folder navigation** - Inbox, Sent Items, and custom folders per user - **Full email viewing** - Read complete emails with headers, body, and attachments - **Attachment support** - Download o...

dataset FNI: 68

hf-dataset--rajeev-gupta--hnm-search-data

--- configs: - config_name: articles data_files: - split: train path: data/raw/articles.csv - config_name: customers data_files: - split: train path: data/raw/customers.csv - config_name: transactions data_files: - split: train path: data/raw/transactions_train.csv dataset_info: - config_name: articles features: - name: article_id dtype: int64 - name: product_code dtype: int64 - name: prod_name dtype: string - name: product_type_no dtype: int64 - name: product_type_name dtype: string - name: ...

dataset FNI: 67

hf-dataset--lijiaxin0111--m3_vos

--- configs: - config_name: default data_files: - split: test path: m3vos_viewer_data_with_paths.jsonl license: apache-2.0 task_categories: - video-classification language: - en - zh tags: - CVPR2025 - video - segmentation - computer-vision - physical - M3-VOS pretty_name: M3-VOS size_categories: - n<1K --- <h2 align="center"> <a href="https://zixuan-chen.github.io/M-cube-VOS.github.io/">[CVPR 2025] M<sup>3</sup>-VOS: Multi-Phase, Multi-Transition, and Multi-Scenery Video Object Segmentation<...

dataset FNI: 67

hf-dataset--spw2000--temp_video

Technical specifications and benchmarking details are available on the entity page.

dataset FNI: 66

hf-dataset--brightonzen17--movie-m1-v3

Technical specifications and benchmarking details are available on the entity page.

dataset FNI: 66

hf-dataset--keymotek--simdata-dataset

--- license: apache-2.0 task_categories: - object-detection language: - en pretty_name: 'SimData-NuScenes: Synthetic Autonomous Driving Dataset' size_categories: - 100B<n<1T --- **SimData-NuScenes** is a large-scale synthetic dataset generated from high-fidelity simulation environments using **aiSim**. By leveraging **aiSim's** advanced physics engine and deterministic sensor modeling, we ensure that every frame maintains **high-quality visual ...

dataset FNI: 66

hf-dataset--brightonzen17--movie-m4-v5

Technical specifications and benchmarking details are available on the entity page.

dataset FNI: 66

hf-dataset--borisguo--pair_touch_13m

--- configs: - config_name: pose_data data_files: "pose_data/metadata.jsonl" - config_name: force_data data_files: "force_data/metadata.jsonl" - config_name: tacniq_gsmini data_files: "tacniq_gsmini/metadata.jsonl" - config_name: xela_9dtact data_files: "xela_9dtact/metadata.jsonl" --- Multi-modal tactile dataset with pose, force, and tactile sensor data. | Config | Description | Sensors | |--------|-------------|---------| | | Pose estimation data | tac02/xela + camera | | | Force measuremen...

dataset FNI: 66

hf-dataset--johnbosco20--community_dataset_v3

--- license: apache-2.0 tags: - robotics - community - so100 - so101 - manipulation - smolvla - lerobot community - vision-language-action - embodied-ai - cross-embodiment task_categories: - robotics language: - en size_categories: - 10M<n<100M pretty_name: Community Dataset v3 --- A large-scale robotics dataset for vision-language-action learning, featuring **791 datasets** across **46 robot types**, enabling cross-embodiment pretraining for generalist robot policies. This is a **crowdsou...

dataset FNI: 66

hf-dataset--sunghong--cads-dataset

--- license: other license_name: cadsdataset license_link: https://github.com/murong-xu/CADS task_categories: - image-segmentation tags: - medical - ct - segmentation - image - 3d - whole-body - anatomy size_categories: - 10K<n<100K configs: - config_name: 0001_visceral_gc data_files: - split: all path: "0001_visceral_gc/0001_visceral_gc.csv" - config_name: 0002_visceral_sc data_files: - split: all path: "0002_visceral_sc/0002_visceral_sc.csv" - config_name: 0003_kits21 data_files: - split: a...

dataset FNI: 65

hf-dataset--brightonzen17--movie-m4-v8

Technical specifications and benchmarking details are available on the entity page.

dataset FNI: 65

hf-dataset--brightonzen17--movie-m3-v10

Technical specifications and benchmarking details are available on the entity page.

dataset FNI: 65

hf-dataset--wuzhao73--3d-adam

--- size_categories: 100K<n<1M modalities: - image - 3D language: - en pretty_name: '"3D ADAM"' tags: - computer-vision - anomaly-detection - 3D-anomaly-detection - 3D - industrial - detection - vision - anomaly - 3d-adam - advanced-manufacturing - manufacturing license: cc-by-nc-sa-4.0 --- Repository for the 3D-ADAM (3D Anomaly Detection in Additive Manufacturing) Dataset. This is the raw data for our complete dataset, separated by part-instance to allow users to utilise the dataset as desir...

dataset FNI: 65

hf-dataset--brightonzen17--movie-m4-v1

Technical specifications and benchmarking details are available on the entity page.

dataset FNI: 65

hf-dataset--trl-lib--documentation-images

Technical specifications and benchmarking details are available on the entity page.

dataset FNI: 65

hf-dataset--zhu0110--sa-med3d-140k

--- license: apache-2.0 --- SA-Med3D-140K is a large-scale, multi-modal, multi-anatomical volumetric medical image segmentation dataset. It was created to facilitate the development of general-purpose foundation models for 3D medical image segmentation. The dataset comprises 21,729 3D medical images and 143,518 corresponding masks. It was gathered from a combination of 70 public datasets and 8,128 privately licensed annotated cases from 24 hospitals. The primary task supported by this dataset...

dataset FNI: 65

hf-dataset--sazirarrwth99--droid_metadata_only

Technical specifications and benchmarking details are available on the entity page.

dataset FNI: 65

hf-dataset--brightonzen17--movie-m1-v2

Technical specifications and benchmarking details are available on the entity page.

dataset FNI: 65

hf-dataset--brightonzen17--movie-m4-v6

Technical specifications and benchmarking details are available on the entity page.

dataset FNI: 65

hf-dataset--lazlo-bleker--bridge-net

--- license: mit pretty_name: BridgeNet size_categories: - 10K<n<100K --- This is the main repository of BridgeNet, a dataset of 20,000 form-found bridge structures. Each bridge contains: - a pin-jointed equilibrium wireframe model generated with the Combinatorial Equilibrium Modeling (CEM) form-finding method - a volumetric 3D mesh obtained through force-informed materialization - rendered images from two canonical camera angles More information on this dataset can be found in the BridgeNet ...

dataset FNI: 65

hf-dataset--yqy6--slides-align

--- license: mit task_categories: - other language: - en tags: - slides-generation - human-preference - benchmark pretty_name: Slides-Align --- **Project Page** | **Paper** | **GitHub** **Slides-Align** is a human preference dataset for evaluating AI-generated slide presentations, introduced as part of the **SlidesGen-Bench** framework. It contains **1,326 human rankings** comparing presentations generated by **9 different AI slide generation products** across **7 scenario categories** and **...

dataset FNI: 65

hf-dataset--brightonzen17--movie-m2-v8

Technical specifications and benchmarking details are available on the entity page.

dataset FNI: 64

hf-dataset--warmhammer--osma-bench_dataset

--- license: cc-by-4.0 task_categories: - robotics - image-segmentation - image-text-to-text --- **Project Page** | **Paper** | **Code** The OSMa-Bench (Open Semantic Mapping Benchmark) dataset is a fully automatically generated dataset for evaluating the robustness of open semantic mapping and segmentation systems under varying indoor lighting conditions and robot movement dynamics. It is part of the OSMa-Bench pipeline. This dataset provides simulated RGB-D and semantically annotated pose...

dataset FNI: 64

hf-dataset--thu-pacman--pcmind-2.1-kaiyuan-2b

--- license: apache-2.0 task_categories: - text-generation language: - zh - en tags: - code - math - language - sft size_categories: - n>1T --- This repository contains the complete pretraining dataset for PCMind-v2.1-Kaiyuan-2B, a leading fully open-source language model. The dataset is organized into **5 training phases**, with all phase datasets open-sourced in this repository. Our training methodology employs domain-specific mixing strategies across five primary domains: - English: Genera...

dataset FNI: 64

hf-dataset--brightonzen17--movie-m3-v8

Technical specifications and benchmarking details are available on the entity page.

dataset FNI: 64

hf-dataset--khushhiii08--idd_detection_for_yolo

Technical specifications and benchmarking details are available on the entity page.

dataset FNI: 64

hf-dataset--brightonzen17--movie-m4-v3

Technical specifications and benchmarking details are available on the entity page.

dataset FNI: 64

hf-dataset--nyuuzyou--gitee-code

--- annotations_creators: - machine-generated language_creators: - found language: - code - zh - en license: other multilinguality: - multilingual pretty_name: Gitee Code Dataset size_categories: - 100M<n<1B source_datasets: - original task_categories: - text-generation tags: - code - chinese configs: - config_name: default data_files: - split: train path: "data/*.parquet" default: true dataset_info: features: - name: code dtype: string - name: repo_name dtype: string - name: path dtype: stri...

dataset FNI: 64

hf-dataset--genesis-intelligence--assets

--- license: mit ---

dataset FNI: 64

hf-dataset--varunburde--transparent_finetune_dataset

--- license: cc-by-4.0 task_categories: - object-detection - robotics tags: - 6d-pose-estimation - transparency - megapose - bop - synthetic pretty_name: Transparent Object Pose Estimation size_categories: - 100K<n<1M --- This dataset consists of raw rendered Physically Based Rendering (PBR) data and 3D mesh assets designed for training and fine-tuning pose-estimation models, specifically adapting Megapose for transparent objects. Created by Varun Burde in November 2024 and prepared for distr...

dataset FNI: 64

hf-dataset--brightonzen17--movie-m2-v12

Technical specifications and benchmarking details are available on the entity page.

dataset FNI: 64

hf-dataset--elliotvincent--b-flair-spot

--- license: etalab-2.0 task_categories: - image-segmentation language: - en tags: - remote-sensing - earth-observation - change-detection - weak-temporal-supervision pretty_name: b-FLAIR-spot size_categories: - 10K<n<100K viewer: false --- <img src="./thumbnail.png" alt="b-FLAIR-spot" width="700"> b-FLAIR-spot is a temporal extension of the FLAIR dataset [1], mirroring b-FLAIR in SPOT-6/7 modality, focused on land cover classification in France. The dataset provides bi-temporal satellite ima...

dataset FNI: 63

hf-dataset--hexingchen--seismic-ai-data

--- license: mit size_categories: - n>1T language: - zh - en tags: - climate - code pretty_name: chuanjun --- This dataset is for AI seismology and can be used for tasks such as phase picking and polarity determination. Some datasets were downloaded from SeisBench. I am developing a unified tool for polarity determination, SeisPolarity, and I hope that like-minded individuals will join me.

dataset FNI: 63

hf-dataset--mvp-lab--llava-onevision-1.5-instruct-data

--- license: apache-2.0 task_categories: - image-text-to-text language: - en tags: - multimodal - vision-language-model - lmm - instruction-tuning - pretraining - dataset-collection - vqa - image-captioning - large-language-model configs: - config_name: CLEVR data_files: - split: train path: CLEVR/train-* - config_name: CLEVR-Math data_files: - split: train path: CLEVR-Math/train-* - config_name: Docmatix data_files: - split: train path: Docmatix/train-* - config_name: Docmatix-part-00-of-10 ...

dataset FNI: 63

hf-dataset--h1merka--tigas_dataset

--- license: mit task_categories: - image-classification language: - en tags: - ai-generated-image-detection - deepfake-detection - synthetic-image-detection - computer-vision - binary-classification - gan-detection - diffusion-model-detection size_categories: - 100K<n<1M pretty_name: TIGAS Dataset dataset_info: features: - name: image_path dtype: string - name: label dtype: int64 splits: - name: train num_examples: 128776 - name: test num_examples: 14126 --- <div align="center"> ![Dataset Si...

dataset FNI: 63

hf-dataset--svcfusion--launcher

--- license: cc ---

dataset FNI: 63

hf-dataset--tomg-group-umd--huginn-dataset

--- tags: - code - math - reasoning - llm license: other language: - en source_datasets: - HuggingFaceTB/smollm-corpus - jon-tow/starcoderdata-python-edu - ubaada/booksum-complete-cleaned - euirim/goodwiki - togethercomputer/RedPajama-Data-1T - allenai/dolma - bigcode/the-stack-v2-train-smol-ids - bigcode/starcoderdata - m-a-p/Matrix - cerebras/SlimPajama-627B - open-phi/textbooks - open-phi/textbooks_grounded - open-phi/programming_books_llama - nampdn-ai/tiny-strange-textbooks - nampdn-ai/t...

dataset FNI: 63

hf-dataset--elonelonelon--2025-challenge-demos

--- license: mit task_categories: - robotics tags: - LeRobot - v - '2' - . - '1' configs: - config_name: default data_files: data/*/*.parquet --- This dataset was created using LeRobot. - **License:** mit

dataset FNI: 63

hf-dataset--yanzu128--realsource-world

--- pretty_name: RealSource World size_categories: - 100B<n<1T task_categories: - robotics language: - en tags: - real-world - dual-arm - robotics manipulation - humanoid robot license: cc-by-nc-4.0 --- <div align="center"> <video controls autoplay src="https://realmanrobot.github.io/real_source_dataset/assets/real_source_video-CQfv30ls.mp4"></video> </div> RealSource World is a large-scale real-world robotics manipulation dataset collected using the RS-02 dual-arm humanoid robot. This datase...

dataset FNI: 63

hf-dataset--brightonzen17--movie-m1-v8

Technical specifications and benchmarking details are available on the entity page.

dataset FNI: 63

hf-dataset--brightonzen17--movie-m2-v10

Technical specifications and benchmarking details are available on the entity page.