FinCoT
| Entity Passport | |
| :--- | :--- |
| Registry ID | hf-dataset--thefinai--fincot |
| Provider | huggingface |
Cite this dataset
Academic & Research Attribution
@misc{hf_dataset__thefinai__fincot,
author = {TheFinAI},
title = {FinCoT Dataset},
year = {2026},
howpublished = {\url{https://huggingface.co/datasets/thefinai/fincot}},
note = {Accessed via Free2AITools Knowledge Fortress}
}
Technical Deep Dive
Nexus Index V2.0
Index Insight
FNI V2.0 for FinCoT: Semantic (S:50), Authority (A:0), Popularity (P:60), Recency (R:25), Quality (Q:30).
Verification Authority
Data Preview
Row-level preview not available for this dataset.
Schema structure is shown in the Field Logic panel when available.
Field Logic
Schema not yet indexed for this dataset.
Dataset Specification
Fino1 is a financial reasoning dataset built on the FinQA, ConvFinQA, TATQA, DocMath-Eval, Econ-Logic, BizBench-QA, and DocFinQA datasets, with GPT-4o-generated reasoning paths to enhance structured financial question answering.
For more details, see our paper, Fin-o1 (https://arxiv.org/abs/2502.08127).
Source Data
Initial Data Collection and Normalization
The dataset originates from the FinQA, ConvFinQA, TATQA, DocMath-Eval, Econ-Logic, BizBench-QA, and DocFinQA datasets.
FinQA (Apache 2.0): A dataset for financial question answering, incorporating structured tables and textual context to test multi-step reasoning abilities.
TATQA (CC BY 4.0): A tabular question-answering dataset that includes diverse financial reports, allowing for multi-step reasoning over tables and text.
DocMath-Eval (MIT License): A dataset designed to evaluate mathematical reasoning over financial documents, focusing on quantitative financial statements.
Econ-Logic (CC BY-NC-SA 4.0): A dataset that requires logical reasoning over economic and financial texts, with restrictions on commercial use.
BizBench-QA (Apache 2.0): A business-focused question-answering dataset that tests contextual understanding and financial reasoning.
DocFinQA (MIT License): A financial QA dataset that includes multi-document reasoning, designed for comprehensive financial statement analysis.
ConvFinQA (MIT License): A dataset for conversational financial QA, allowing for multi-turn interactions and progressive information extraction.
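Because the sources carry different licenses, it can help to check which ones restrict commercial use before redistributing derived data. The sketch below copies the license labels from the list above; the `is_non_commercial` and `restricted_sources` helpers are hypothetical names introduced for illustration, and the NC check is a simple string heuristic, not legal advice.

```python
# License labels are copied from the source list above.
SOURCE_LICENSES = {
    "FinQA": "Apache 2.0",
    "TATQA": "CC BY 4.0",
    "DocMath-Eval": "MIT",
    "Econ-Logic": "CC BY-NC-SA 4.0",
    "BizBench-QA": "Apache 2.0",
    "DocFinQA": "MIT",
    "ConvFinQA": "MIT",
}


def is_non_commercial(license_name: str) -> bool:
    """Return True when the label contains the Creative Commons NC clause."""
    # Split "CC BY-NC-SA 4.0" into ["CC", "BY", "NC", "SA", "4.0"].
    parts = license_name.upper().replace("-", " ").split()
    return "NC" in parts


def restricted_sources(licenses: dict) -> list:
    """List source names whose license label is flagged non-commercial."""
    return sorted(name for name, lic in licenses.items() if is_non_commercial(lic))


print(restricted_sources(SOURCE_LICENSES))
```

Under these labels, only Econ-Logic is flagged, which matches the commercial-use restriction noted in its entry.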
Annotations
Annotation Process
We employ an iterative verification and refinement strategy, utilizing GPT-4o to generate a comprehensive reasoning process for each question-answer pair.
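The source does not spell out the details of this loop, so the following is only a minimal sketch of one way an iterative verify-and-refine cycle could be structured. The `generate` and `verify` callables, the `max_rounds` parameter, and the toy stand-ins are all assumptions made for illustration; in practice `generate` would wrap a model call such as GPT-4o.

```python
from typing import Callable, Optional


def refine_reasoning(
    question: str,
    answer: str,
    generate: Callable[[str, Optional[str]], str],
    verify: Callable[[str, str, str], bool],
    max_rounds: int = 3,
) -> Optional[str]:
    """Return the first reasoning path that passes verification, or None."""
    previous: Optional[str] = None
    for _ in range(max_rounds):
        candidate = generate(question, previous)
        if verify(question, answer, candidate):
            return candidate
        previous = candidate  # feed the rejected attempt into the next round
    return None


# Toy stand-ins (hypothetical): the first attempt is wrong, the retry is right.
def toy_generate(question: str, previous: Optional[str]) -> str:
    return "100 - 80 = 20" if previous else "100 - 80 = 30"


def toy_verify(question: str, answer: str, reasoning: str) -> bool:
    return reasoning.strip().endswith(answer)


print(refine_reasoning("Revenue 100, cost 80. Profit?", "20", toy_generate, toy_verify))
```

Passing the rejected attempt back into `generate` is one design choice among several; a real pipeline might instead pass verifier feedback or restart from scratch.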
Citation
If you use this dataset in your research, please cite the original papers and our paper:
@article{qian2025fino1,
title={Fino1: On the Transferability of Reasoning Enhanced LLMs to Finance},
author={Qian, Lingfei and Zhou, Weipeng and Wang, Yan and Peng, Xueqing and Huang, Jimin and Xie, Qianqian},
journal={arXiv preprint arXiv:2502.08127},
year={2025}
}
@article{chen2021finqa,
title={Finqa: A dataset of numerical reasoning over financial data},
author={Chen, Zhiyu and Chen, Wenhu and Smiley, Charese and Shah, Sameena and Borova, Iana and Langdon, Dylan and Moussa, Reema and Beane, Matt and Huang, Ting-Hao and Routledge, Bryan and others},
journal={arXiv preprint arXiv:2109.00122},
year={2021}
}
@article{chen2022convfinqa,
title={Convfinqa: Exploring the chain of numerical reasoning in conversational finance question answering},
author={Chen, Zhiyu and Li, Shiyang and Smiley, Charese and Ma, Zhiqiang and Shah, Sameena and Wang, William Yang},
journal={arXiv preprint arXiv:2210.03849},
year={2022}
}
@misc{zhu2021tatqaquestionansweringbenchmark,
title={TAT-QA: A Question Answering Benchmark on a Hybrid of Tabular and Textual Content in Finance},
author={Fengbin Zhu and Wenqiang Lei and Youcheng Huang and Chao Wang and Shuo Zhang and Jiancheng Lv and Fuli Feng and Tat-Seng Chua},
year={2021},
eprint={2105.07624},
archivePrefix={arXiv},
primaryClass={cs.CL},
url={https://arxiv.org/abs/2105.07624},
}
@inproceedings{zhao2024docmath,
title={DocMath-eval: Evaluating math reasoning capabilities of LLMs in understanding long and specialized documents},
author={Zhao, Yilun and Long, Yitao and Liu, Hongjun and Kamoi, Ryo and Nan, Linyong and Chen, Lyuhao and Liu, Yixin and Tang, Xiangru and Zhang, Rui and Cohan, Arman},
booktitle={Proceedings of the 62nd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)},
pages={16103--16120},
year={2024}
}
@article{quan2024econlogicqa,
title={Econlogicqa: A question-answering benchmark for evaluating large language models in economic sequential reasoning},
author={Quan, Yinzhu and Liu, Zefang},
journal={arXiv preprint arXiv:2405.07938},
year={2024}
}
@inproceedings{krumdick2024bizbench,
title={BizBench: A Quantitative Reasoning Benchmark for Business and Finance},
author={Krumdick, Michael and Koncel-Kedziorski, Rik and Lai, Viet Dac and Reddy, Varshini and Lovering, Charles and Tanner, Chris},
booktitle={Proceedings of the 62nd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)},
pages={8309--8332},
year={2024}
}
@article{reddy2024docfinqa,
title={Docfinqa: A long-context financial reasoning dataset},
author={Reddy, Varshini and Koncel-Kedziorski, Rik and Lai, Viet Dac and Krumdick, Michael and Lovering, Charles and Tanner, Chris},
journal={arXiv preprint arXiv:2401.06915},
year={2024}
}
## Structured Schema (Zero-Fabrication)
| Feature Key | Data Type |
| :--- | :--- |
| `Question` | `string` |
| `Reasoning_process` | `string` |
| `Final_response` | `string` |
| `Negative_reasoning_process` | `string` |
| `Negative_response` | `string` |
**Estimated Rows:** `9,186`
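The five string fields pair a preferred reasoning path with a negative one, so a record can be checked against the schema and reshaped for preference-style training. This is a sketch under assumptions: the `prompt`/`chosen`/`rejected` layout and both helper names are hypothetical, and the sample record is invented for illustration rather than taken from the dataset.

```python
# Field names are taken from the schema table above.
EXPECTED_FIELDS = (
    "Question",
    "Reasoning_process",
    "Final_response",
    "Negative_reasoning_process",
    "Negative_response",
)


def validate_record(record: dict) -> list:
    """Return a list of problems; an empty list means the record fits the schema."""
    problems = []
    for key in EXPECTED_FIELDS:
        if key not in record:
            problems.append(f"missing field: {key}")
        elif not isinstance(record[key], str):
            problems.append(f"field {key} is not a string")
    return problems


def to_preference_pair(record: dict) -> dict:
    """Map a record to a hypothetical prompt/chosen/rejected layout."""
    return {
        "prompt": record["Question"],
        "chosen": record["Reasoning_process"] + "\n\n" + record["Final_response"],
        "rejected": record["Negative_reasoning_process"] + "\n\n" + record["Negative_response"],
    }


# Invented sample record, for illustration only.
sample = {
    "Question": "Revenue is 100 and cost is 80. What is the profit?",
    "Reasoning_process": "Profit = revenue - cost = 100 - 80.",
    "Final_response": "20",
    "Negative_reasoning_process": "Profit = revenue + cost = 100 + 80.",
    "Negative_response": "180",
}

print(validate_record(sample))
print(to_preference_pair(sample)["chosen"])
```

Validating before reshaping keeps malformed rows from raising a `KeyError` deep inside a training pipeline.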
Social Proof
AI Summary: Based on Hugging Face metadata. Not a recommendation.
Dataset Transparency Report
Technical metadata sourced from upstream repositories.
Identity & Source
- id: hf-dataset--thefinai--fincot
- slug: thefinai--fincot
- source: huggingface
- author: TheFinAI
- license:
- tags: size_categories:1k<n<10k, format:parquet, modality:text, library:datasets, library:pandas, library:mlcroissant, library:polars, arxiv:2502.08127, arxiv:2109.00122, arxiv:2210.03849, arxiv:2105.07624, arxiv:2405.07938, arxiv:2401.06915, region:us
Technical Specs
- architecture: null
- params billions: null
- context length: null
- pipeline tag:
Engagement & Metrics
- downloads: 151,052
- stars: 15
- forks: 0
Data indexed from public sources. Updated daily.