📊
Dataset

FinCoT

by TheFinAI hf-dataset--thefinai--fincot
Nexus Index
29.1 Top 100%
S: Semantic 50
A: Authority 0
P: Popularity 60
R: Recency 25
Q: Quality 30
Parquet Format
Dataset Information Summary
Entity Passport
Registry ID hf-dataset--thefinai--fincot
Provider huggingface
📜

Cite this dataset

Academic & Research Attribution

BibTeX
@misc{hf_dataset__thefinai__fincot,
  author = {TheFinAI},
  title = {FinCoT Dataset},
  year = {2026},
  howpublished = {\url{https://huggingface.co/datasets/thefinai/fincot}},
  note = {Accessed via Free2AITools Knowledge Fortress}
}
APA Style
TheFinAI. (2026). FinCoT [Dataset]. Free2AITools. https://huggingface.co/datasets/thefinai/fincot

🔬 Technical Deep Dive

⚖️ Nexus Index V2.0

💬 Index Insight

FNI V2.0 for FinCoT: Semantic (S:50), Authority (A:0), Popularity (P:60), Recency (R:25), Quality (Q:30).

⬇️ Downloads: 151,052

Dataset Specification

Fino1 is a financial reasoning dataset built on the FinQA, ConvFinQA, TATQA, DocMath-Eval, Econ-Logic, BizBench-QA, and DocFinQA datasets, with GPT-4o-generated reasoning paths to enhance structured financial question answering.

For more details, see our paper, [Fin-o1](https://arxiv.org/abs/2502.08127).

Source Data

Initial Data Collection and Normalization

The dataset originates from the FinQA, ConvFinQA, TATQA, DocMath-Eval, Econ-Logic, BizBench-QA, and DocFinQA datasets.

FinQA (Apache 2.0): A dataset for financial question answering, incorporating structured tables and textual context to test multi-step reasoning abilities.

TATQA (CC BY 4.0): A tabular question-answering dataset that includes diverse financial reports, allowing for multi-step reasoning over tables and text.

DocMath-Eval (MIT License): A dataset designed to evaluate mathematical reasoning over financial documents, focusing on quantitative financial statements.

Econ-Logic (CC BY-NC-SA 4.0): A dataset that requires logical reasoning over economic and financial texts, with restrictions on commercial use.

BizBench-QA (Apache 2.0): A business-focused question-answering dataset that tests contextual understanding and financial reasoning.

DocFinQA (MIT License): A financial QA dataset that includes multi-document reasoning, designed for comprehensive financial statement analysis.

ConvFinQA (MIT License): A dataset for conversational financial QA, allowing for multi-turn interactions and progressive information extraction.
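
The license mix above matters when the combined data is reused, because one source carries a non-commercial clause. The following is a minimal sketch of how the list could be recorded and checked in Python. The license strings are copied from the list above; the `SOURCE_LICENSES` name, the helper function, and the token-based "NC" check are illustrative assumptions, not part of the dataset or any library.

```python
# License strings are copied from the source list above. The "NC" check is
# an illustrative heuristic for this sketch, not a legal determination.
SOURCE_LICENSES = {
    "FinQA": "Apache 2.0",
    "TATQA": "CC BY 4.0",
    "DocMath-Eval": "MIT License",
    "Econ-Logic": "CC BY-NC-SA 4.0",
    "BizBench-QA": "Apache 2.0",
    "DocFinQA": "MIT License",
    "ConvFinQA": "MIT License",
}


def non_commercial_sources(licenses: dict) -> list:
    """Return source names whose license string contains a Creative Commons 'NC' token."""
    flagged = []
    for name, license_name in licenses.items():
        # "CC BY-NC-SA 4.0" becomes the tokens CC, BY, NC, SA, 4.0
        tokens = license_name.replace("-", " ").split()
        if "NC" in tokens:
            flagged.append(name)
    return flagged


print(non_commercial_sources(SOURCE_LICENSES))
```

With the licenses as listed, only Econ-Logic is flagged, which matches the note about its restrictions on commercial use.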

Annotations

Annotation Process

We employ an iterative verification and refinement strategy, utilizing GPT-4o to generate a comprehensive reasoning process for each question-answer pair.
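
The card does not spell out the verification and refinement loop, so the following is only a generic sketch of what "generate, verify, refine" can look like. Everything here is an assumption for illustration: the `refine_reasoning` function, the `generate` and `verify` callables (stand-ins for model calls such as GPT-4o), the feedback string, and the `max_rounds` limit are hypothetical and are not taken from the paper.

```python
from typing import Callable, Tuple


def refine_reasoning(
    question: str,
    answer: str,
    generate: Callable[[str, str], str],
    verify: Callable[[str, str], bool],
    max_rounds: int = 3,
) -> Tuple[str, bool]:
    """Generate a reasoning path, regenerating until it verifies or rounds run out.

    generate(question, feedback) returns a reasoning string.
    verify(reasoning, answer) returns True when the reasoning supports the answer.
    Both callables are hypothetical stand-ins for model calls.
    """
    feedback = ""
    reasoning = ""
    for round_index in range(max_rounds):
        reasoning = generate(question, feedback)
        if verify(reasoning, answer):
            return reasoning, True
        # Pass a short note back to the generator for the next attempt.
        feedback = f"Attempt {round_index + 1} did not reach the reference answer; revise."
    return reasoning, False


# Toy stand-ins so the loop can be exercised without any model.
def toy_generate(question: str, feedback: str) -> str:
    # The first attempt is wrong; it is corrected once feedback is present.
    return "100 * 0.05 = 5" if feedback else "100 * 0.5 = 50"


def toy_verify(reasoning: str, answer: str) -> bool:
    return reasoning.strip().endswith("= " + answer)


result, verified = refine_reasoning("What is 5% of 100?", "5", toy_generate, toy_verify)
print(result, verified)
```

Passing the generator and verifier in as callables keeps the loop independent of any particular model API, which is why no client library appears in the sketch.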

💡 Citation

If you use this dataset in your research, please cite the original papers and our paper:

@article{qian2025fino1,
  title={Fino1: On the Transferability of Reasoning Enhanced LLMs to Finance},
  author={Qian, Lingfei and Zhou, Weipeng and Wang, Yan and Peng, Xueqing and Huang, Jimin and Xie, Qianqian},
  journal={arXiv preprint arXiv:2502.08127},
  year={2025}
}

@article{chen2021finqa,
  title={Finqa: A dataset of numerical reasoning over financial data},
  author={Chen, Zhiyu and Chen, Wenhu and Smiley, Charese and Shah, Sameena and Borova, Iana and Langdon, Dylan and Moussa, Reema and Beane, Matt and Huang, Ting-Hao and Routledge, Bryan and others},
  journal={arXiv preprint arXiv:2109.00122},
  year={2021}
}

@article{chen2022convfinqa,
  title={Convfinqa: Exploring the chain of numerical reasoning in conversational finance question answering},
  author={Chen, Zhiyu and Li, Shiyang and Smiley, Charese and Ma, Zhiqiang and Shah, Sameena and Wang, William Yang},
  journal={arXiv preprint arXiv:2210.03849},
  year={2022}
}

@misc{zhu2021tatqaquestionansweringbenchmark,
  title={TAT-QA: A Question Answering Benchmark on a Hybrid of Tabular and Textual Content in Finance},
  author={Fengbin Zhu and Wenqiang Lei and Youcheng Huang and Chao Wang and Shuo Zhang and Jiancheng Lv and Fuli Feng and Tat-Seng Chua},
  year={2021},
  eprint={2105.07624},
  archivePrefix={arXiv},
  primaryClass={cs.CL},
  url={https://arxiv.org/abs/2105.07624}
}

@inproceedings{zhao2024docmath,
  title={DocMath-eval: Evaluating math reasoning capabilities of LLMs in understanding long and specialized documents},
  author={Zhao, Yilun and Long, Yitao and Liu, Hongjun and Kamoi, Ryo and Nan, Linyong and Chen, Lyuhao and Liu, Yixin and Tang, Xiangru and Zhang, Rui and Cohan, Arman},
  booktitle={Proceedings of the 62nd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)},
  pages={16103--16120},
  year={2024}
}

@article{quan2024econlogicqa,
  title={Econlogicqa: A question-answering benchmark for evaluating large language models in economic sequential reasoning},
  author={Quan, Yinzhu and Liu, Zefang},
  journal={arXiv preprint arXiv:2405.07938},
  year={2024}
}

@inproceedings{krumdick2024bizbench,
  title={BizBench: A Quantitative Reasoning Benchmark for Business and Finance},
  author={Krumdick, Michael and Koncel-Kedziorski, Rik and Lai, Viet Dac and Reddy, Varshini and Lovering, Charles and Tanner, Chris},
  booktitle={Proceedings of the 62nd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)},
  pages={8309--8332},
  year={2024}
}

@article{reddy2024docfinqa,
  title={Docfinqa: A long-context financial reasoning dataset},
  author={Reddy, Varshini and Koncel-Kedziorski, Rik and Lai, Viet Dac and Krumdick, Michael and Lovering, Charles and Tanner, Chris},
  journal={arXiv preprint arXiv:2401.06915},
  year={2024}
}

## 📊 Structured Schema (Zero-Fabrication)
| Feature Key | Data Type |
| :--- | :--- |
| `Question` | `string` |
| `Reasoning_process` | `string` |
| `Final_response` | `string` |
| `Negative_reasoning_process` | `string` |
| `Negative_response` | `string` |

**Estimated Rows:** `9,186`
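
The five fields above pair each question with a reasoning path and response plus a "negative" counterpart. A minimal sketch of turning one row into a prompt/chosen/rejected record follows. It assumes the `Negative_*` fields hold a rejected reasoning path and answer, which the field names suggest but the card does not state. The `to_preference_pair` and `load_fincot` helpers are hypothetical, and the repository id `TheFinAI/FinCoT` is inferred from the page's author and title; loading also assumes the `datasets` package is installed.

```python
REQUIRED_FIELDS = (
    "Question",
    "Reasoning_process",
    "Final_response",
    "Negative_reasoning_process",
    "Negative_response",
)


def to_preference_pair(row: dict) -> dict:
    """Map one row with the schema above to a prompt/chosen/rejected record.

    Assumption: Negative_* fields are the rejected side of the pair.
    """
    missing = [field for field in REQUIRED_FIELDS if field not in row]
    if missing:
        raise KeyError(f"row is missing fields: {missing}")
    return {
        "prompt": row["Question"],
        "chosen": row["Reasoning_process"] + "\n\n" + row["Final_response"],
        "rejected": row["Negative_reasoning_process"] + "\n\n" + row["Negative_response"],
    }


def load_fincot(repo_id: str = "TheFinAI/FinCoT"):
    """Load every split of the dataset from the Hugging Face Hub.

    Assumptions: the `datasets` package is installed and `repo_id` is correct.
    Not called here, since it needs network access.
    """
    from datasets import load_dataset

    return load_dataset(repo_id)


# Hand-written example row, not taken from the dataset.
example_row = {
    "Question": "Revenue rose from 80 to 100. What is the growth rate?",
    "Reasoning_process": "(100 - 80) / 80 = 0.25",
    "Final_response": "25%",
    "Negative_reasoning_process": "(100 - 80) / 100 = 0.20",
    "Negative_response": "20%",
}
pair = to_preference_pair(example_row)
print(pair["prompt"])
```

Checking for missing keys before building the record makes schema drift fail loudly rather than silently producing empty strings.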

🛡️ Dataset Transparency Report

Technical metadata sourced from upstream repositories.

Open Metadata

🆔 Identity & Source

id: hf-dataset--thefinai--fincot
slug: thefinai--fincot
source: huggingface
author: TheFinAI
license: (none listed)
tags: size_categories:1k<n<10k, format:parquet, modality:text, library:datasets, library:pandas, library:mlcroissant, library:polars, arxiv:2502.08127, arxiv:2109.00122, arxiv:2210.03849, arxiv:2105.07624, arxiv:2405.07938, arxiv:2401.06915, region:us

📊 Engagement & Metrics

downloads: 151,052
stars: 15
forks: 0

Data indexed from public sources. Updated daily.