recipes
| Entity Passport | |
|---|---|
| Registry ID | hf-dataset--chonkie-ai--recipes |
| License | Apache-2.0 |
| Provider | huggingface |
Cite this dataset
Academic & Research Attribution
@misc{hf_dataset__chonkie_ai__recipes,
author = {Chonkie Ai},
title = {recipes Dataset},
year = {2026},
howpublished = {\url{https://huggingface.co/datasets/chonkie-ai/recipes}},
note = {Accessed via Free2AITools Knowledge Fortress}
}
Nexus Index V2.0
Index Insight
FNI V2.0 for recipes: Semantic (S:50), Authority (A:0), Popularity (P:54), Recency (R:36), Quality (Q:30).
Dataset Specification

Chonkie Recipes
Chonkie loves to cook up a storm in the kitchen
This repository contains all the recipes that you can use with Chonkie to manage various documents, languages, and more.
Usage
To use the recipes, install chonkie with the hub feature:
pip install "chonkie[hub]"
This enables Hubie, which Chonkie uses internally to fetch recipes from this repository. You can then, for example, call the from_recipe method to create a RecursiveChunker from a recipe:
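from chonkie import RecursiveChunker

# Build a chunker from the "markdown" recipe (fetched from this repository via Hubie)
chunker = RecursiveChunker.from_recipe("markdown")

# Read the document to chunk
with open("example.md", "r", encoding="utf-8") as f:
    text = f.read()

# Split the text and print each chunk
chunks = chunker.chunk(text)
for chunk in chunks:
    print(chunk)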
The code above reads the example.md file and splits it into chunks using the markdown recipe.
Recipes
The recipes are stored in the recipes folder. Each recipe is a JSON file that contains the chunking rules for that recipe, for example:
{
  "name": "markdown",
  "description": "Recipe for markdown documents in English",
  "language": "en",
  "metadata": {
    "version": "0.1.0",
    "author": "Chonkie Team"
  },
  "recipe": {
    "delimiters": [".", "!", "?", "\n"],
    "include_delim": "prev",
    "recursive_rules": [
      {
        "delimiters": ["######", "#####", "####", "###", "##", "#"],
        "include_delim": "next",
        "whitespace": false
      },
      {
        "delimiters": ["\n\n", "\n\r"],
        "include_delim": "next",
        "whitespace": false
      },
      ...
    ]
  }
}
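To make the `delimiters` and `include_delim` fields more concrete, here is a minimal sketch of how a single rule could split text. It is not Chonkie's implementation: the function name `split_with_delimiters` is hypothetical, and the reading of `"prev"` (delimiter stays with the preceding piece) and `"next"` (delimiter starts the following piece) is an assumption based on the field values in the example recipe.

```python
from typing import List


def split_with_delimiters(
    text: str, delimiters: List[str], include_delim: str = "prev"
) -> List[str]:
    """Split `text` at any of `delimiters` (hypothetical helper, not Chonkie's API).

    include_delim="prev": each delimiter ends the preceding piece (assumed meaning).
    include_delim="next": each delimiter starts the following piece (assumed meaning).
    """
    # Try longer delimiters first so "##" is matched before "#".
    ordered = sorted(delimiters, key=len, reverse=True)
    pieces: List[str] = []
    current = ""
    i = 0
    while i < len(text):
        match = next((d for d in ordered if text.startswith(d, i)), None)
        if match is None:
            # Ordinary character: extend the current piece.
            current += text[i]
            i += 1
        elif include_delim == "prev":
            # Close the current piece with the delimiter attached to its end.
            pieces.append(current + match)
            current = ""
            i += len(match)
        else:
            # "next": close the current piece, then start a new one with the delimiter.
            if current:
                pieces.append(current)
            current = match
            i += len(match)
    if current:
        pieces.append(current)
    return pieces


# Sentence-level rule from the example recipe.
print(split_with_delimiters("Hi. Bye!", [".", "!", "?", "\n"], "prev"))
# Heading-level rule from the example recipe.
print(split_with_delimiters("# A\n## B\n", ["######", "#####", "####", "###", "##", "#"], "next"))
```

Joining the returned pieces reproduces the input text in both modes, which is a convenient property for a chunker because no characters are lost between chunks.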
Available Recipes
Default Language Recipes
| Recipe Name | Language Code | Language Name | Description |
|---|---|---|---|
| default_af | af | Afrikaans | Default recipe for plaintext documents in Afrikaans |
| default_am | am | Amharic | Default recipe for plaintext documents in Amharic |
| default_ar | ar | Arabic | Default recipe for plaintext documents in Arabic |
| default_bg | bg | Bulgarian | Default recipe for plaintext documents in Bulgarian |
| default_bn | bn | Bengali | Default recipe for plaintext documents in Bengali |
| default_ca | ca | Catalan | Default recipe for plaintext documents in Catalan |
| default_cs | cs | Czech | Default recipe for plaintext documents in Czech |
| default_cy | cy | Welsh | Default recipe for plaintext documents in Welsh |
| default_da | da | Danish | Default recipe for plaintext documents in Danish |
| default_de | de | German | Default recipe for plaintext documents in German |
| default_en | en | English | Default recipe for plaintext documents |
| default_es | es | Spanish | Default recipe for plaintext documents in Spanish |
| default_et | et | Estonian | Default recipe for plaintext documents in Estonian |
| default_fa | fa | Persian/Farsi | Default recipe for plaintext documents in Persian/Farsi |
| default_fi | fi | Finnish | Default recipe for plaintext documents in Finnish |
| default_fr | fr | French | Default recipe for plaintext documents in French |
| default_ga | ga | Irish | Default recipe for plaintext documents in Irish |
| default_gl | gl | Galician | Default recipe for plaintext documents in Galician |
| default_he | he | Hebrew | Default recipe for plaintext documents in Hebrew |
| default_hi | hi | Hindi | Default recipe for plaintext documents in Hindi |
| default_hr | hr | Croatian | Default recipe for plaintext documents in Croatian |
| default_hu | hu | Hungarian | Default recipe for plaintext documents in Hungarian |
| default_id | id | Indonesian | Default recipe for plaintext documents in Indonesian |
| default_is | is | Icelandic | Default recipe for plaintext documents in Icelandic |
| default_it | it | Italian | Default recipe for plaintext documents in Italian |
| default_jp | jp | Japanese | Default recipe for plaintext documents in Japanese |
| default_kr | kr | Korean | Default recipe for plaintext documents in Korean |
| default_lt | lt | Lithuanian | Default recipe for plaintext documents in Lithuanian |
| default_lv | lv | Latvian | Default recipe for plaintext documents in Latvian |
| default_mk | mk | Macedonian | Default recipe for plaintext documents in Macedonian |
| default_ms | ms | Malay | Default recipe for plaintext documents in Malay |
| default_mt | mt | Maltese | Default recipe for plaintext documents in Maltese |
| default_ne | ne | Nepali | Default recipe for plaintext documents in Nepali |
| default_nl | nl | Dutch | Default recipe for plaintext documents in Dutch |
| default_no | no | Norwegian | Default recipe for plaintext documents in Norwegian |
| default_pl | pl | Polish | Default recipe for plaintext documents in Polish |
| default_pt | pt | Portuguese | Default recipe for plaintext documents in Portuguese |
| default_pt-BR | pt-BR | Brazilian Portuguese | Default recipe for plaintext documents in Brazilian Portuguese |
| default_ro | ro | Romanian | Default recipe for plaintext documents in Romanian |
| default_ru | ru | Russian | Default recipe for plaintext documents in Russian |
| default_sk | sk | Slovak | Default recipe for plaintext documents in Slovak |
| default_sl | sl | Slovenian | Default recipe for plaintext documents in Slovenian |
| default_sr | sr | Serbian | Default recipe for plaintext documents in Serbian |
| default_sv | sv | Swedish | Default recipe for plaintext documents in Swedish |
| default_sw | sw | Swahili | Default recipe for plaintext documents in Swahili |
| default_ta | ta | Tamil | Default recipe for plaintext documents in Tamil |
| default_te | te | Telugu | Default recipe for plaintext documents in Telugu |
| default_th | th | Thai | Default recipe for plaintext documents in Thai |
| default_tl | tl | Filipino/Tagalog | Default recipe for plaintext documents in Filipino/Tagalog |
| default_tr | tr | Turkish | Default recipe for plaintext documents in Turkish |
| default_uk | uk | Ukrainian | Default recipe for plaintext documents in Ukrainian |
| default_ur | ur | Urdu | Default recipe for plaintext documents in Urdu |
| default_vi | vi | Vietnamese | Default recipe for plaintext documents in Vietnamese |
| default_zh | zh | Chinese | Default recipe for plaintext documents in Chinese |
| default_zu | zu | Zulu | Default recipe for plaintext documents in Zulu |
Specialized Format Recipes
| Recipe Name | Language Code | Language Name | Description |
|---|---|---|---|
| markdown_en | en | English | Recipe for markdown documents in English |
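The recipe names in both tables appear to follow a `<name>_<language code>` pattern, which would explain why the usage example asks for `"markdown"` while the table lists `markdown_en`. The sketch below illustrates that naming pattern only. The helpers `recipe_path` and `split_recipe_name`, the default values, and the `.json` path layout are assumptions for illustration, not a confirmed part of Chonkie or Hubie.

```python
from typing import Tuple


def recipe_path(name: str = "default", lang: str = "en") -> str:
    """Build an assumed repository path for a recipe, e.g. recipes/default_en.json."""
    return f"recipes/{name}_{lang}.json"


def split_recipe_name(recipe_name: str) -> Tuple[str, str]:
    """Split a table entry such as 'default_pt-BR' into (name, language code)."""
    # Split on the last underscore so the language code may contain a hyphen.
    name, lang = recipe_name.rsplit("_", 1)
    return name, lang


print(recipe_path("markdown"))             # recipes/markdown_en.json
print(split_recipe_name("default_pt-BR"))  # ('default', 'pt-BR')
```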
The full JSON schema for the recipes is available here. You can use this schema to validate the recipes before using them.
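As a rough illustration of what validating a recipe involves, the following sketch checks only the top-level keys and types visible in the example recipe. It is not the official schema: `EXPECTED_TYPES` and `check_recipe` are hypothetical names, and the real schema may require additional or different fields. For actual validation, use the published schema with a JSON Schema validator.

```python
import json
from typing import Any, Dict, List

# Top-level keys taken from the example recipe; the real schema may differ.
EXPECTED_TYPES = {
    "name": str,
    "description": str,
    "language": str,
    "metadata": dict,
    "recipe": dict,
}


def check_recipe(recipe: Dict[str, Any]) -> List[str]:
    """Return a list of problems; an empty list means the basic shape looks fine."""
    problems: List[str] = []
    for key, expected in EXPECTED_TYPES.items():
        if key not in recipe:
            problems.append(f"missing key: {key}")
        elif not isinstance(recipe[key], expected):
            problems.append(f"{key} should be {expected.__name__}")
    return problems


# A trimmed copy of the example recipe, parsed from JSON text.
sample = json.loads("""
{
  "name": "markdown",
  "description": "Recipe for markdown documents in English",
  "language": "en",
  "metadata": {"version": "0.1.0", "author": "Chonkie Team"},
  "recipe": {"delimiters": [".", "!", "?"], "include_delim": "prev", "recursive_rules": []}
}
""")

print(check_recipe(sample))         # no problems expected for the sample
print(check_recipe({"name": "x"}))  # reports the four missing keys
```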
Contributing
To contribute to this repository, open a pull request. Please follow the guidelines for the recipes and the schema.
License
This project is licensed under the Apache 2.0 License.
Structured Schema (Zero-Fabrication)
| Feature Key | Data Type |
|---|---|
| name | string |
| schema | string |
| description | string |
| language | string |
| metadata | unknown |
| recipe | unknown |
Estimated Rows: 56
Dataset Transparency Report
Technical metadata sourced from upstream repositories.
Identity & Source
- id: hf-dataset--chonkie-ai--recipes
- slug: chonkie-ai--recipes
- source: huggingface
- author: Chonkie Ai
- license: Apache-2.0
- tags: license:apache-2.0, size_categories:n<1k, format:json, modality:text, library:datasets, library:dask, library:mlcroissant, region:us
Engagement & Metrics
- downloads: 52,076
- stars: 11
- forks: 0
Data indexed from public sources. Updated daily.