hf-space--enzostvs--deepsite
--- title: DeepSite v3 emoji: 🐳 colorFrom: blue colorTo: blue sdk: docker pinned: true app_port: 3000 license: mit failure_strategy: rollback short_description: Generate any application by Vibe Coding models: - deepseek-ai/DeepSeek-V3-0324 - deepseek-ai/DeepSeek-V3.2 - Qwen/Qwen3-Coder-480B-A35B-Instruct - moonshotai/Kimi-K2-Instruct - moonshotai/Kimi-K2-Instruct-0905 - zai-org/GLM-4.7 - MiniMaxAI/MiniMax-M2.1 --- DeepSite is a Vibe Coding Platform designed to make coding smarter and more ef...
hf-dataset--documentation-images
--- license: cc-by-nc-sa-4.0 --- HF Team: Please make sure you optimize the assets before uploading them. My favorite tool for this is https://tinypng.com/.
hf-dataset--hf-doc-build--doc-build
--- license: mit pretty_name: Generated Docs for HF viewer: false --- This repo contains all the docs published on https://huggingface.co/docs. The docs are generated with https://github.com/huggingface/doc-builder.
hf-space--mteb--leaderboard
--- title: MTEB Leaderboard emoji: 🥇 colorFrom: blue colorTo: indigo sdk: docker app_port: 7860 app_file: app.py pinned: true tags: - leaderboard startup_duration_timeout: 1h fullWidth: true license: mit short_description: Embedding Leaderboard --- Embedding Leaderboard
hf-dataset--kakologarchives--kakologarchives
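--- pretty_name: ニコニコ実況 過去ログアーカイブ license: mit language: - ja task_categories: - text-classification --- ニコニコ実況 過去ログアーカイブ (Niconico Jikkyo Past Log Archive) is a dataset that collects every past-log comment from the launch of the Niconico Jikkyo service to the present. In December 2020, Niconico Jikkyo was relaunched as an official channel within Niconico Live. As a result, the old system that had been in operation since November 2009 was discontinued (effectively the end of the service). Support for consumer devices such as torne and BRAVIA ended across the board, and roughly 11 years of past logs, full of viewers' live reactions from the time, were set to be lost along with it. In response, users of the DTV board on 5ch led a project to archive 11 years of past logs for all channels before the old Niconico Jikkyo shut down. After some twists and turns, Nekopanda retrieved a complete set of past logs for all channels over roughly 11 years, including radio and BS, and thanks to that, the 11 years of...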
hf-space--black-forest-labs--flux.1-dev
--- title: FLUX.1 [dev] emoji: 🖥️ colorFrom: yellow colorTo: pink sdk: gradio sdk_version: 5.25.2 app_file: app.py pinned: false license: mit --- Check out the configuration reference at https://huggingface.co/docs/hub/spaces-config-reference
hf-space--wan-ai--wan2.2-animate
--- title: Wan2.2 Animate emoji: 👁 colorFrom: blue colorTo: yellow sdk: gradio sdk_version: 5.47.2 app_file: app.py pinned: false license: apache-2.0 short_description: Wan2.2 Animate --- Check out the configuration reference at https://huggingface.co/docs/hub/spaces-config-reference
hf-space--jbilcke-hf--ai-comic-factory
--- title: AI Comic Factory emoji: 👩🎨 colorFrom: red colorTo: yellow sdk: docker pinned: true app_port: 3000 disable_embedding: false short_description: Create your own AI comic with a single prompt hf_oauth: true hf_oauth_expiration_minutes: 43200 hf_oauth_scopes: [inference-api] --- Last release: AI Comic Factory 1.2 The AI Comic Factory has an official website: aicomicfactory.app For more information about my other projects please check linktr.ee/FLNGR. If you like the AI Comic Factory,...
hf-dataset--nvidia--physicalai-robotics-gr00t-x-embodiment-sim
--- license: cc-by-4.0 task_categories: - robotics tags: - robotics --- GitHub Repo: Isaac GR00T N1 We provide a set of datasets used for post-training of GR00T N1. Each dataset is a collection of trajectories from different robot embodiments and tasks. | Dataset Name | #trajectories | | - | -| | bimanual_panda_gripper.Threading | 1000 | | bimanual_panda_hand.LiftTray | 1000 | | bimanual_panda_gripper.ThreePieceAssembly | 1000 | | bimanual_panda_gripper.Transport | 1000 | | bimanua...
hf-dataset--opendatalab--aicc
--- license: cc-by-4.0 size_categories: - n>1T task_categories: - text-generation language: - multilingual tags: - common-crawl - html-parsing - markdown - code - math --- 🔧 **Our new-generation HTML parser, MinerU-HTML,** is now released! Paper | Project page <img src="./images/AICC_christmas_LOGO.png" width="600" /> - **[2025-12-24]** 🔥 **CC-MinerU-Code Updated!** We have updated our specialized high-quality code dataset **CC-MinerU-Code**, containing **4.58M** samples, also extracted from the ful...
hf-space--akhaliq--anycoder
--- title: AnyCoder emoji: 🏆 colorFrom: blue colorTo: purple sdk: docker app_port: 7860 pinned: false disable_embedding: false hf_oauth: true hf_oauth_expiration_minutes: 43200 hf_oauth_scopes: - manage-repos - write-discussions --- AnyCoder is a full-stack AI-powered code generator with a modern React/TypeScript frontend and FastAPI backend. Generate applications by describing them in plain English, with support for multiple AI models and one-click deployment to Hugging Face Spaces. - **Mod...
hf-space--huggingfacetb--smol-training-playbook
--- title: The Smol Training Playbook short_description: The secrets to building world-class LLMs emoji: 📚 colorFrom: blue colorTo: indigo sdk: docker pinned: false header: mini app_port: 8080 tags: - research-article-template - research paper - scientific paper - data visualization thumbnail: >- https://huggingface.co/spaces/HuggingFaceTB/smol-training-playbook/public/thumb.png --- <div align="center"> **A modern, interactive template for scientific writing** that brings papers to life with...
hf-dataset--fka--awesome-chatgpt-prompts
--- license: cc0-1.0 tags: - ChatGPT - prompts - AI - GPT - Claude - Gemini - Llama - Mistral - LLM - prompt-engineering - conversational-ai - text-generation - chatbot - awesome-list task_categories: - question-answering - text-generation size_categories: - 100K<n<1M --- <p align="center"> <img width="558" height="148" alt="prompts.chat logo" src="https://github.com/user-attachments/assets/8de2ba4c-5e89-4aae-aecb-32b188fb1bfb" /> <br> <small>a.k.a. Awesome ChatGPT Prompts</small> </p> This i...
hf-dataset--fka--prompts.chat
--- license: cc0-1.0 tags: - ChatGPT - prompts - AI - GPT - Claude - Gemini - Llama - Mistral - LLM - prompt-engineering - conversational-ai - text-generation - chatbot - awesome-list task_categories: - question-answering - text-generation size_categories: - 100K<n<1M --- <p align="center"> <img width="558" height="148" alt="prompts.chat logo" src="https://github.com/user-attachments/assets/8de2ba4c-5e89-4aae-aecb-32b188fb1bfb" /> <br> <small>a.k.a. Awesome ChatGPT Prompts</small> </p> This i...
Bitcoin Historical Data
Bitcoin data at 1-min intervals from select exchanges, Jan 2012 to Present
hf-space--zerogpu-aoti--wan2-2-fp8da-aoti-faster
--- title: Wan2.2 14B Fast emoji: 🎥💨 colorFrom: gray colorTo: pink sdk: gradio sdk_version: 6.1.0 app_file: app.py pinned: true short_description: generate a video from an image with a text prompt --- Check out the configuration reference at https://huggingface.co/docs/hub/spaces-config-reference
hf-dataset--openai--gsm8k
--- annotations_creators: - crowdsourced language_creators: - crowdsourced language: - en license: - mit multilinguality: - monolingual size_categories: - 1K<n<10K source_datasets: - original task_categories: - text-generation task_ids: [] paperswithcode_id: gsm8k pretty_name: Grade School Math 8K tags: - math-word-problems dataset_info: - config_name: main features: - name: question dtype: string - name: answer dtype: string splits: - name: train num_bytes: 3963202 num_examples: 7473 - name:...
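The `size_categories` labels that appear in cards throughout this listing (`1K<n<10K`, `n<1K`, `n>1T`, and so on) describe a range for the number of examples `n`. The following is a minimal sketch of how such a label could be read and checked against a split size such as the 7473 training examples listed above. The helper names `parse_size_category` and `in_category` are hypothetical, not part of any Hugging Face library, and treating both bounds as exclusive is an assumption made for illustration.

```python
import re

# Multipliers for the suffixes seen in `size_categories` labels.
_SUFFIX = {"": 1, "K": 10**3, "M": 10**6, "B": 10**9, "T": 10**12}


def _to_int(token: str) -> int:
    """Convert a token such as '10K' or '1T' to an integer."""
    match = re.fullmatch(r"(\d+)([KMBT]?)", token.strip())
    if match is None:
        raise ValueError(f"unrecognized size token: {token!r}")
    return int(match.group(1)) * _SUFFIX[match.group(2)]


def parse_size_category(label: str) -> tuple[int, float]:
    """Return (lower, upper) bounds for 'A<n<B', 'n<B', or 'n>A' (hypothetical helper)."""
    label = label.strip()
    if label.startswith("n<"):
        return 0, _to_int(label[2:])
    if label.startswith("n>"):
        return _to_int(label[2:]), float("inf")
    low, sep, high = label.partition("<n<")
    if not sep:
        raise ValueError(f"unrecognized size category: {label!r}")
    return _to_int(low), _to_int(high)


def in_category(n: int, label: str) -> bool:
    """Check whether n lies strictly between the bounds (exclusivity is an assumption)."""
    low, high = parse_size_category(label)
    return low < n < high


if __name__ == "__main__":
    # 7473 is the train split size given in the card above.
    print(in_category(7473, "1K<n<10K"))
```

A label is advisory metadata written by the card author, so a check like this is only useful as a sanity test, not as a source of truth about a dataset's size.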
hf-space--finegrain--finegrain-image-enhancer
--- title: Finegrain Image Enhancer emoji: 🖼️🪄 colorFrom: pink colorTo: indigo sdk: gradio sdk_version: 5.27.1 python_version: 3.12 app_file: src/app.py pinned: false short_description: Clarity AI Upscaler Reproduction license: mit models: - refiners/juggernaut.reborn.sd1_5.unet - refiners/juggernaut.reborn.sd1_5.autoencoder - refiners/juggernaut.reborn.sd1_5.text_encoder - refiners/controlnet.sd1_5.tile - philz1337x/upscaler - philz1337x/embeddings - philz1337x/loras tags: - enhancer - ups...
hf-dataset--allenai--dolma3_mix-6t-1025-7b
--- license: odc-by task_categories: - text-generation language: - en configs: - config_name: default data_files: - split: train path: data/**/*.jsonl.zst features: - name: id dtype: string - name: text dtype: string - name: metadata dtype: string - name: source dtype: string - name: version dtype: string - name: created dtype: string - name: added dtype: string - name: doc dtype: string - name: attributes dtype: string --- For all other training use cases, including training from scratch, **...
hf-space--linoyts--qwen-image-edit-angles
--- title: Qwen Image Edit Camera Control emoji: 🎬 colorFrom: indigo colorTo: pink sdk: gradio sdk_version: 6.2.0 app_file: app.py pinned: true license: apache-2.0 short_description: Fast 4 step inference with Qwen Image Edit 2509 --- Check out the configuration reference at https://huggingface.co/docs/hub/spaces-config-reference
hf-space--mrfakename--z-image-turbo
--- title: Z Image Turbo emoji: 🖼️ colorFrom: yellow colorTo: yellow sdk: gradio sdk_version: 6.0.1 app_file: app.py pinned: true --- Check out the configuration reference at https://huggingface.co/docs/hub/spaces-config-reference
Fruits-360 dataset
A dataset with 173786 images of 250 fruits, vegetables, nuts and seeds
hf-space--sczhou--codeformer
--- title: CodeFormer emoji: 🐼 colorFrom: blue colorTo: green sdk: gradio sdk_version: 6.1.0 app_file: app.py pinned: false --- Check out the configuration reference at https://huggingface.co/docs/hub/spaces-config-reference
hf-space--jasperai--flux.1-dev-controlnet-upscaler
--- title: Flux.1-dev Upscaler emoji: 🔎 colorFrom: green colorTo: blue sdk: gradio sdk_version: 5.29.0 python_version: 3.12 app_file: app.py pinned: false license: other tags: - upscaler - super-resolution - controlnet - flux.1-dev license_name: flux-1-dev-non-commercial-license license_link: https://huggingface.co/black-forest-labs/FLUX.1-dev/blob/main/LICENSE. --- Check out the configuration reference at https://huggingface.co/docs/hub/spaces-config-reference
hf-space--resembleai--chatterbox
--- title: Chatterbox TTS emoji: 🍿 colorFrom: indigo colorTo: blue sdk: gradio sdk_version: 6.3.0 app_file: app.py pinned: false short_description: Expressive Zeroshot TTS --- Check out the configuration reference at https://huggingface.co/docs/hub/spaces-config-reference
hf-space--nihalgazi--text-to-speech-unlimited
--- title: Realistic Text To Speech Unlimited emoji: 🔥 colorFrom: indigo colorTo: purple sdk: gradio sdk_version: 5.28.0 app_file: app.py pinned: true short_description: Free Text-To-Speech generator with Emotion control (OpenAI) --- Check out the configuration reference at https://huggingface.co/docs/hub/spaces-config-reference
hf-dataset--nvidia--physicalai-autonomous-vehicles
Technical specifications and benchmarking details are available on the entity page.
hf-space--dontplantoend--ugi-leaderboard
--- title: UGI Leaderboard emoji: 📢 colorFrom: gray colorTo: purple sdk: docker app_port: 8050 pinned: false license: apache-2.0 short_description: Uncensored General Intelligence Leaderboard tags: - leaderboard - submission:manual - test:private - modality:text - eval:generation - eval:safety - eval:math - language:English --- Check out the configuration reference at https://huggingface.co/docs/hub/spaces-config-reference
hf-space--qwen--qwen3-tts
--- title: Qwen3-TTS Demo emoji: 🎙️ colorFrom: blue colorTo: purple sdk: gradio sdk_version: 5.33.0 app_file: app.py pinned: false license: apache-2.0 suggested_hardware: zero-a10g ---
hf-dataset--ntu-nlp-sg--xcodeeval
--- annotations_creators: - expert-generated language: - code - en language_creators: - found - expert-generated license: - cc-by-4.0 multilinguality: - multilingual pretty_name: xCodeEval size_categories: - 1M<n<10M - 10M<n<100M source_datasets: - original tags: - programming-language - code - program-synthesis - automatic-code-repair - code-retrieval - code-translation - code-classification task_categories: - translation - token-classification - text2text-generation - text-retrieval - text-...
hf-space--not-lain--background-removal
--- title: Background Removal emoji: 🌘w🌖 colorFrom: purple colorTo: indigo sdk: gradio sdk_version: 5.35.0 app_file: app.py pinned: true license: mit --- Check out the configuration reference at https://huggingface.co/docs/hub/spaces-config-reference
hf-space--ai4editing--magicquill
--- title: MagicQuill emoji: 🪶 colorFrom: purple colorTo: yellow sdk: gradio sdk_version: 4.44.1 app_file: app.py pinned: false disable_embedding: true license: cc-by-nc-4.0 --- The paper is MagicQuill: An Intelligent Interactive Image Editing System. Check out the configuration reference at https://huggingface.co/docs/hub/spaces-config-reference.
hf-dataset--wendyl21--parity-experiments
This dataset stores the oracle and parity experiment logs for adapters. Please upload them according to the following format and draft a PR. We will use them to estimate costs for each adapter across diverse agents and models. --- license: apache-2.0 ---
hf-space--gokaygokay--flux-prompt-generator
--- title: FLUX Prompt Generator emoji: 😻 colorFrom: red colorTo: blue sdk: gradio sdk_version: 4.44.0 app_file: app.py pinned: true license: apache-2.0 --- Check out the configuration reference at https://huggingface.co/docs/hub/spaces-config-reference
hf-space--multimodalart--qwen-image-multiple-angles-3d-camera
--- title: Qwen Image Multiple Angles 3D Camera emoji: 🎥 colorFrom: indigo colorTo: purple sdk: gradio sdk_version: 6.2.0 app_file: app.py pinned: false --- Check out the configuration reference at https://huggingface.co/docs/hub/spaces-config-reference
hf-space--tonyassi--video-face-swap
--- title: Video Face Swap emoji: 👱🏻♀️ colorFrom: pink colorTo: indigo sdk: gradio sdk_version: 5.38.0 app_file: app.py pinned: true disable_embedding: false short_description: Video deep fake --- facefusion MIT License Check out the configuration reference at https://huggingface.co/docs/hub/spaces-config-reference
hf-dataset--genrobot2025--10kh-realomin-opendata
Technical specifications and benchmarking details are available on the entity page.
hf-space--prithivmlmods--flux-lora-dlc
--- title: FLUX LoRA DLC emoji: 🥳 colorFrom: indigo colorTo: gray sdk: gradio sdk_version: 6.3.0 app_file: app.py pinned: true license: creativeml-openrail-m short_description: 270+ Impressive LoRAs for Flux.1 --- Check out the configuration reference at https://huggingface.co/docs/hub/spaces-config-reference
hf-space--hf-audio--open_asr_leaderboard
--- title: Open ASR Leaderboard emoji: 🏆 colorFrom: red colorTo: blue sdk: gradio sdk_version: 5.49.1 app_file: app.py pinned: true tags: - leaderboard --- ```bibtex @misc{srivastav2025openasrleaderboardreproducible, title={Open ASR Leaderboard: Towards Reproducible and Transparent Multilingual and Long-Form Speech Recognition Evaluation}, author={Vaibhav Srivastav and Steven Zheng and Eric Bezzam and Eustache Le Bihan and Nithin Koluguri and Piotr Żelasko and Somshubra Majumdar and Adel Mou...
hf-dataset--allenai--dolma3_dolmino_mix-100b-1125
--- license: odc-by language: - en --- <img alt="Logo for Dolmino Mix" src="dolmino-mix.png" width="289px" style="margin-left:'auto' margin-right:'auto' display:'block'"> This dataset contains the high-quality pool of data considered for the second stage of Olmo 3 32B. | Source | Category | |--------|----------| | TinyMATH Mind | Math (synth) | | TinyMATH PoT | Math (synth) | | CraneMath | Math (synth) | | MegaMatt | Math (synth) | | Dolmino Math | Math (synth) | | StackEdu (FIM) | Code | | C...
hf-dataset--jsinowitz--snodas-snowmelt-cache
Technical specifications and benchmarking details are available on the entity page.
hf-dataset--dsinghvi--eval_awareness
--- license: mit tags: - eval_awareness - evals pretty_name: Eval Awareness Dataset size_categories: - 1K<n<10K --- The Eval Awareness Dataset contains contrastive pairs, with and without eval cues, that show behavioural changes across various misaligned situations. Automated scripts for creating these scenarios, along with other code related to suppressing eval awareness, are available at https://github.com/divyanshsinghvi/evalawareness_techniques/ Authors: @divyanshsinghvi, @Riteshbhalerao11
hf-dataset--banned-historical-archives--banned-historical-archives
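--- size_categories: - n>1T --- The Banned Historical Archives (和谐历史档案馆) dataset contains original files that have been entered into https://banned-historical-archives.github.io as well as files that have not yet been entered. - banned-historical-archives.github.io # original data already entered into the site, synced from the GitHub repositories at irregular intervals - raw # original files - config # configuration files - todo # files not yet entered into the site Some newspapers and image materials are stored in separate repositories: | Name | URL | Status | |---|---|---| | Cankao Xiaoxi (参考消息) |https://huggingface.co/datasets/banned-historical-archives/ckxx| Not yet entered | | People's Daily (人民日报) |https://huggingface.co/datasets/banned-historical-archives/rmrb| Selected important articles entered | | Wenhui Bao (文汇报) | https://huggingface.co/datasets/banned-historical-archives/...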
hf-space--fffiloni--diffusers-image-outpaint
--- title: Diffusers Image Outpaint emoji: 🔅 colorFrom: gray colorTo: purple sdk: gradio sdk_version: 5.49.1 app_file: app.py pinned: true disable_embbeding: true license: apache-2.0 short_description: 'Easily expand image boundaries ' --- Check out the configuration reference at <https://huggingface.co/docs/hub/spaces-config-reference>
hf-dataset--xlangai--ubuntu_osworld_file_cache
--- license: apache-2.0 --- This repository serves as a file cache for the OSWorld project, providing reliable and fast access to evaluation files that were previously hosted on Google Drive. OSWorld is a scalable, real computer environment for multimodal agents, supporting task setup, execution-based evaluation, and interactive learning across various operating systems and applications. This cache repository ensures that all evaluation files are consistently accessible for research and devel...
hf-space--tongyi-mai--z-image-turbo
--- title: Z Image Turbo emoji: 🏃 colorFrom: blue colorTo: pink sdk: gradio sdk_version: 6.0.1 app_file: app.py pinned: false license: apache-2.0 disable_embedding: true --- Check out the configuration reference at https://huggingface.co/docs/hub/spaces-config-reference
hf-space--innoai--edge-tts-text-to-speech
--- title: Edge TTS Text To Speech emoji: 👁 colorFrom: pink colorTo: yellow sdk: gradio sdk_version: 5.25.2 app_file: app.py pinned: false license: gpl-2.0 --- Check out the configuration reference at https://huggingface.co/docs/hub/spaces-config-reference
hf-dataset--animefans--iwara_mmd_all
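--- license: other tags: - not-for-all-audiences --- > For English readers, please see the English README. > **⚠️ Content warning: this dataset comes from Iwara.tv and may contain adult content (NSFW/R-18). Please use it in an appropriate environment.** This project contains all public videos from the launch of Iwara.tv through June 15, 2025, with a total size of more than 30TB. Iwara.tv is a sharing community centered on MikuMikuDance (MMD) video creation. The dataset is intended to provide material for researchers in computer vision, data analysis, recommender systems, and related fields. The **metadata** that accompanies the video files (titles, authors, tags, and so on) has been split into a separate repository; see ACCA225/iwara_metadata. The video files are organized into directories; for example, one folder contains all videos uploaded in June 2024. See the notes at the end for the dataset's exact size. To download, open the Files and versions tab at the top of this page. - Download a specific directory: - Download all directories: - Size of each directory: - Video...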
hf-dataset--opensqz--automathtext-v2
--- task_categories: - text-generation - question-answering language: - en - zh tags: - LLM - pretraining - finetuning - midtraining - reasoning - STEM - math size_categories: - 10B<n<100B configs: - config_name: automathtext-v2-ultra data_files: - split: train path: - nemotron_cc_high/80-90/*.parquet - nemotron_cc_high/90-100/*.parquet - nemotron_cc_medium_high/80-90/*.parquet - nemotron_cc_medium_high/90-100/*.parquet - dclm/80-90/*.parquet - dclm/90-100/*.parquet - fineweb_edu/80-90/*.parq...
hf-dataset--martj42--international-football-results-from-1872-to-2017
An up-to-date dataset of over 49,000 international football results
hf-space--fishaudio--s1-mini
--- title: Fish Audio S1 emoji: 🏆 colorFrom: purple colorTo: gray sdk: gradio sdk_version: 5.32.1 app_file: app.py pinned: true license: cc-by-nc-sa-4.0 --- Check out the configuration reference at https://huggingface.co/docs/hub/spaces-config-reference
hf-dataset--taohan10200--cra5-dataset
--- license: cdla-sharing-1.0 task_categories: - time-series-forecasting - compression tags: - climate - weather - era5 - cra5 pretty_name: CRA5 ERA5 Dataset size_categories: - 1T<n<10T --- <!-- !ID-CompressAI-logo --> <a href="url"><img src="assets/CRA5LOGO.svg" align="center"></a> <div align="center"> </div> We introduce **VAEformer**, a variational autoencoder transformer designed for the extreme compression of climate data. Addressing the storage challenges of massive datasets like ERA5, ...
hf-dataset--enronarchive--mail
An accessible and organized copy of the Enron Email dataset. Part of the **Enron Archive**. A live mirror is available at https://mail.enroncorp.com - **151 employee mailboxes** with complete email history and attachments - **Pseudo-login system** - Browse any account; stays logged in until logged off - **Folder navigation** - Inbox, Sent Items, and custom folders per user - **Full email viewing** - Read complete emails with headers, body, and attachments - **Attachment support** - Download o...
hf-dataset--rajeev-gupta--hnm-search-data
--- configs: - config_name: articles data_files: - split: train path: data/raw/articles.csv - config_name: customers data_files: - split: train path: data/raw/customers.csv - config_name: transactions data_files: - split: train path: data/raw/transactions_train.csv dataset_info: - config_name: articles features: - name: article_id dtype: int64 - name: product_code dtype: int64 - name: prod_name dtype: string - name: product_type_no dtype: int64 - name: product_type_name dtype: string - name: ...
hf-dataset--lijiaxin0111--m3_vos
--- configs: - config_name: default data_files: - split: test path: m3vos_viewer_data_with_paths.jsonl license: apache-2.0 task_categories: - video-classification language: - en - zh tags: - CVPR2025 - video - segmentation - computer-vision - physical - M3-VOS pretty_name: M3-VOS size_categories: - n<1K --- <h2 align="center"> <a href="https://zixuan-chen.github.io/M-cube-VOS.github.io/">[CVPR 2025] M<sup>3</sup>-VOS: Multi-Phase, Multi-Transition, and Multi-Scenery Video Object Segmentation<...
hf-dataset--spw2000--temp_video
Technical specifications and benchmarking details are available on the entity page.
hf-dataset--brightonzen17--movie-m1-v3
Technical specifications and benchmarking details are available on the entity page.
hf-dataset--keymotek--simdata-dataset
--- license: apache-2.0 task_categories: - object-detection language: - en pretty_name: 'SimData-NuScenes: Synthetic Autonomous Driving Dataset' size_categories: - 100B<n<1T --- <!-- Provide a quick summary of the dataset. --> **SimData-NuScenes** is a large-scale synthetic dataset generated from high-fidelity simulation environments using **aiSim**. By leveraging **aiSim's** advanced physics engine and deterministic sensor modeling, we ensure that every frame maintains **high-quality visual ...
hf-dataset--brightonzen17--movie-m4-v5
Technical specifications and benchmarking details are available on the entity page.
hf-dataset--borisguo--pair_touch_13m
--- configs: - config_name: pose_data data_files: "pose_data/metadata.jsonl" - config_name: force_data data_files: "force_data/metadata.jsonl" - config_name: tacniq_gsmini data_files: "tacniq_gsmini/metadata.jsonl" - config_name: xela_9dtact data_files: "xela_9dtact/metadata.jsonl" --- Multi-modal tactile dataset with pose, force, and tactile sensor data. | Config | Description | Sensors | |--------|-------------|---------| | | Pose estimation data | tac02/xela + camera | | | Force measuremen...
hf-dataset--johnbosco20--community_dataset_v3
--- license: apache-2.0 tags: - robotics - community - so100 - so101 - manipulation - smolvla - lerobot community - vision-language-action - embodied-ai - cross-embodiment task_categories: - robotics language: - en size_categories: - 10M<n<100M pretty_name: Community Dataset v3 --- A large-scale robotics dataset for vision-language-action learning, featuring **791 datasets** across **46 robot types**, enabling cross-embodiment pretraining for generalist robot policies. This is a **crowdsou...
hf-dataset--sunghong--cads-dataset
--- license: other license_name: cadsdataset license_link: https://github.com/murong-xu/CADS task_categories: - image-segmentation tags: - medical - ct - segmentation - image - 3d - whole-body - anatomy size_categories: - 10K<n<100K configs: - config_name: 0001_visceral_gc data_files: - split: all path: "0001_visceral_gc/0001_visceral_gc.csv" - config_name: 0002_visceral_sc data_files: - split: all path: "0002_visceral_sc/0002_visceral_sc.csv" - config_name: 0003_kits21 data_files: - split: a...
hf-dataset--brightonzen17--movie-m4-v8
Technical specifications and benchmarking details are available on the entity page.
hf-dataset--brightonzen17--movie-m3-v10
Technical specifications and benchmarking details are available on the entity page.
hf-dataset--wuzhao73--3d-adam
--- size_categories: 100K<n<1M modalities: - image - 3D language: - en pretty_name: '"3D ADAM"' tags: - computer-vision - anomaly-detection - 3D-anomaly-detection - 3D - industrial - detection - vision - anomaly - 3d-adam - advanced-manufacturing - manufacturing license: cc-by-nc-sa-4.0 --- Repository for the 3D-ADAM (3D Anomaly Detection in Additive Manufacturing) Dataset. This is the raw data for our complete dataset, separated by part-instance to allow users to utilise the dataset as desir...
hf-dataset--nurshatmenglik--paired_compressible_boussinesq_flow_simulation_with_random_temperature_bcs
--- license: cc-by-4.0 ---
hf-dataset--brightonzen17--movie-m4-v1
Technical specifications and benchmarking details are available on the entity page.
hf-dataset--trl-lib--documentation-images
Technical specifications and benchmarking details are available on the entity page.
hf-dataset--zhu0110--sa-med3d-140k
--- license: apache-2.0 --- SA-Med3D-140K is a large-scale, multi-modal, multi-anatomical volumetric medical image segmentation dataset. It was created to facilitate the development of general-purpose foundation models for 3D medical image segmentation. The dataset comprises 21,729 3D medical images and 143,518 corresponding masks. It was gathered from a combination of 70 public datasets and 8,128 privately licensed annotated cases from 24 hospitals. The primary task supported by this dataset...
hf-dataset--sazirarrwth99--droid_metadata_only
Technical specifications and benchmarking details are available on the entity page.
hf-dataset--brightonzen17--movie-m1-v2
Technical specifications and benchmarking details are available on the entity page.
hf-dataset--brightonzen17--movie-m4-v6
Technical specifications and benchmarking details are available on the entity page.
hf-dataset--lazlo-bleker--bridge-net
--- license: mit pretty_name: BridgeNet size_categories: - 10K<n<100K --- This is the main repository of BridgeNet, a dataset of 20,000 form-found bridge structures. Each bridge contains: - a pin-jointed equilibrium wireframe model generated with the Combinatorial Equilibrium Modeling (CEM) form-finding method - a volumetric 3D mesh obtained through force-informed materialization - rendered images from two canonical camera angles More information on this dataset can be found in the BridgeNet ...
hf-dataset--yqy6--slides-align
--- license: mit task_categories: - other language: - en tags: - slides-generation - human-preference - benchmark pretty_name: Slides-Align --- **Project Page** | **Paper** | **GitHub** **Slides-Align** is a human preference dataset for evaluating AI-generated slide presentations, introduced as part of the **SlidesGen-Bench** framework. It contains **1,326 human rankings** comparing presentations generated by **9 different AI slide generation products** across **7 scenario categories** and **...
hf-dataset--brightonzen17--movie-m2-v8
Technical specifications and benchmarking details are available on the entity page.
hf-dataset--warmhammer--osma-bench_dataset
--- license: cc-by-4.0 task_categories: - robotics - image-segmentation - image-text-to-text --- **Project Page** | **Paper** | **Code** OSMa-Bench (Open Semantic Mapping Benchmark) dataset is a fully automatically generated dataset for evaluating the robustness of open semantic mapping and segmentation systems under varying indoor lighting conditions and robot movement dynamics. This dataset is part of OSMa-Bench pipeline. This dataset provides simulated RGB-D and semantically annotated pose...
hf-dataset--thu-pacman--pcmind-2.1-kaiyuan-2b
--- license: apache-2.0 task_categories: - text-generation language: - zh - en tags: - code - math - language - sft size_categories: - n>1T --- This repository contains the complete pretraining dataset for PCMind-v2.1-Kaiyuan-2B, a leading fully open-source language model. The dataset is organized into **5 training phases**, with all phase datasets open-sourced in this repository. Our training methodology employs domain-specific mixing strategies across five primary domains: - English: Genera...
hf-dataset--brightonzen17--movie-m3-v8
Technical specifications and benchmarking details are available on the entity page.
hf-dataset--khushhiii08--idd_detection_for_yolo
Technical specifications and benchmarking details are available on the entity page.
hf-dataset--brightonzen17--movie-m4-v3
Technical specifications and benchmarking details are available on the entity page.
hf-dataset--nyuuzyou--gitee-code
--- annotations_creators: - machine-generated language_creators: - found language: - code - zh - en license: other multilinguality: - multilingual pretty_name: Gitee Code Dataset size_categories: - 100M<n<1B source_datasets: - original task_categories: - text-generation tags: - code - chinese configs: - config_name: default data_files: - split: train path: "data/*.parquet" default: true dataset_info: features: - name: code dtype: string - name: repo_name dtype: string - name: path dtype: stri...
hf-dataset--varunburde--transparent_finetune_dataset
--- license: cc-by-4.0 task_categories: - object-detection - robotics tags: - 6d-pose-estimation - transparency - megapose - bop - synthetic pretty_name: Transparent Object Pose Estimation size_categories: - 100K<n<1M --- This dataset consists of raw rendered Physically Based Rendering (PBR) data and 3D mesh assets designed for training and fine-tuning pose-estimation models, specifically adapting Megapose for transparent objects. Created by Varun Burde in November 2024 and prepared for distr...
hf-dataset--brightonzen17--movie-m2-v12
Technical specifications and benchmarking details are available on the entity page.
hf-dataset--elliotvincent--b-flair-spot
--- license: etalab-2.0 task_categories: - image-segmentation language: - en tags: - remote-sensing - earth-observation - change-detection - weak-temporal-supervision pretty_name: b-FLAIR-spot size_categories: - 10K<n<100K viewer: false --- <img src="./thumbnail.png" alt="b-FLAIR-spot" width="700"> b-FLAIR-spot is a temporal extension of the FLAIR dataset [1], mirroring b-FLAIR in SPOT-6/7 modality, focused on land cover classification in France. The dataset provides bi-temporal satellite ima...
hf-dataset--hexingchen--seismic-ai-data
--- license: mit size_categories: - n>1T language: - zh - en tags: - climate - code pretty_name: chuanjun --- This dataset is for AI seismology and can be used for tasks such as phase picking and polarity determination. Some of the datasets were downloaded from SeisBench. I am developing a unified tool for polarity determination, SeisPolarity, and I hope that like-minded people will join me.
hf-dataset--mvp-lab--llava-onevision-1.5-instruct-data
--- license: apache-2.0 task_categories: - image-text-to-text language: - en tags: - multimodal - vision-language-model - lmm - instruction-tuning - pretraining - dataset-collection - vqa - image-captioning - large-language-model configs: - config_name: CLEVR data_files: - split: train path: CLEVR/train-* - config_name: CLEVR-Math data_files: - split: train path: CLEVR-Math/train-* - config_name: Docmatix data_files: - split: train path: Docmatix/train-* - config_name: Docmatix-part-00-of-10 ...
hf-dataset--h1merka--tigas_dataset
--- license: mit task_categories: - image-classification language: - en tags: - ai-generated-image-detection - deepfake-detection - synthetic-image-detection - computer-vision - binary-classification - gan-detection - diffusion-model-detection size_categories: - 100K<n<1M pretty_name: TIGAS Dataset dataset_info: features: - name: image_path dtype: string - name: label dtype: int64 splits: - name: train num_examples: 128776 - name: test num_examples: 14126 --- <div align="center"> ![Dataset Si...
hf-dataset--tomg-group-umd--huginn-dataset
--- tags: - code - math - reasoning - llm license: other language: - en source_datasets: - HuggingFaceTB/smollm-corpus - jon-tow/starcoderdata-python-edu - ubaada/booksum-complete-cleaned - euirim/goodwiki - togethercomputer/RedPajama-Data-1T - allenai/dolma - bigcode/the-stack-v2-train-smol-ids - bigcode/starcoderdata - m-a-p/Matrix - cerebras/SlimPajama-627B - open-phi/textbooks - open-phi/textbooks_grounded - open-phi/programming_books_llama - nampdn-ai/tiny-strange-textbooks - nampdn-ai/t...
hf-dataset--elonelonelon--2025-challenge-demos
--- license: mit task_categories: - robotics tags: - LeRobot - v - '2' - . - '1' configs: - config_name: default data_files: data/*/*.parquet --- This dataset was created using LeRobot. - **Homepage:** [More Information Needed] - **Paper:** [More Information Needed] - **License:** mit meta/info.json: **BibTeX:**
hf-dataset--yanzu128--realsource-world
--- pretty_name: RealSource World size_categories: - 100B<n<1T task_categories: - robotics language: - en tags: - real-world - dual-arm - robotics manipulation - humanoid robot license: cc-by-nc-4.0 --- <div align="center"> <video controls autoplay src="https://realmanrobot.github.io/real_source_dataset/assets/real_source_video-CQfv30ls.mp4"></video> </div> RealSource World is a large-scale real-world robotics manipulation dataset collected using the RS-02 dual-arm humanoid robot. This datase...
hf-dataset--brightonzen17--movie-m1-v8
Technical specifications and benchmarking details are available on the entity page.
hf-dataset--brightonzen17--movie-m2-v10
Technical specifications and benchmarking details are available on the entity page.