Dataset

Github Code 2025 Language Split

Name: Github Code 2025 Language Split
Creator: lumees
License: other

Nexus Index

32.8 Top 100%

S: Semantic 50

A: Authority 0

P: Popularity 62

R: Recency 48

Q: Quality 30

Parquet Format

Dataset Information Summary
Entity Passport
Registry ID	hf-dataset--lumees--github-code-2025-language-split
License	other
Provider	huggingface

Cite this dataset

Academic & Research Attribution

BibTeX

@misc{hf_dataset__lumees__github_code_2025_language_split,
  author = {lumees},
  title = {Github Code 2025 Language Split Dataset},
  year = {2026},
  howpublished = {\url{https://huggingface.co/datasets/lumees/github-code-2025-language-split}},
  note = {Accessed via Free2AITools Knowledge Fortress}
}

APA Style

lumees. (2026). Github Code 2025 Language Split [Dataset]. Free2AITools. https://huggingface.co/datasets/lumees/github-code-2025-language-split

Downloads: 218,748

Dataset Specification

📜 Source Data & Attribution

This dataset is a processed derivative of nick007x/github-code-2025.

Origination

The original data was aggregated by nick007x from public GitHub repositories. We have retained the original content, file paths, and metadata while restructuring the format for easier consumption by language-specific models.

Processing Steps

To create this dataset, we performed the following processing on the source data:

- Language Identification: We mapped file extensions (e.g., .py, .rs, .ts) to their respective programming languages using a comprehensive extension map.
- Splitting: The dataset was sharded and split into separate sub-directories/categories by programming language to allow for targeted loading (e.g., loading only Python or Rust data).
- Filtering: Binary files and ambiguous extensions were categorized as "Unknown" or removed to ensure text-based model compatibility.
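
The processing steps above can be sketched in a few lines of Python. This is a minimal illustration, not the pipeline actually used to build the dataset: the three-entry `EXTENSION_MAP`, the NUL-character binary check, and the function names `identify_language`, `looks_binary`, and `split_by_language` are all assumptions made for the example. Only the record fields (`repo_id`, `size`, `file_path`, `content`) come from the dataset itself.

```python
from collections import defaultdict
from pathlib import PurePosixPath

# Hypothetical, deliberately tiny extension map. The real map is described
# only as "comprehensive"; its contents are not given in this document.
EXTENSION_MAP = {".py": "Python", ".rs": "Rust", ".ts": "TypeScript"}


def identify_language(file_path: str) -> str:
    """Map a file path to a language by its lower-cased extension."""
    suffix = PurePosixPath(file_path).suffix.lower()
    return EXTENSION_MAP.get(suffix, "Unknown")


def looks_binary(content: str) -> bool:
    """Assumed heuristic: content containing a NUL character is binary."""
    return "\x00" in content


def split_by_language(records):
    """Group records into per-language buckets, dropping binary-looking files."""
    buckets = defaultdict(list)
    for record in records:
        if looks_binary(record["content"]):
            continue
        buckets[identify_language(record["file_path"])].append(record)
    return dict(buckets)


# Made-up sample rows for illustration only.
sample = [
    {"repo_id": "a/b", "size": 9, "file_path": "src/main.py", "content": "print(1)\n"},
    {"repo_id": "a/b", "size": 13, "file_path": "src/lib.RS", "content": "fn main() {}\n"},
    {"repo_id": "a/b", "size": 5, "file_path": "Makefile", "content": "all:\n"},
    {"repo_id": "a/b", "size": 2, "file_path": "logo.py", "content": "\x00\x01"},
]
buckets = split_by_language(sample)
```

In this sketch, files without a recognized extension land in an "Unknown" bucket rather than being discarded, which mirrors one of the two behaviours the filtering step describes; a stricter variant would drop them instead.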

Licensing Information

The data contained in this dataset belongs to the original authors of the code repositories on GitHub.

- Source Aggregation: The aggregation was provided by nick007x/github-code-2025.
- Individual Code Files: Each file typically retains the license of its original repository (MIT, Apache 2.0, BSD, etc.). Users of this dataset are responsible for adhering to the license terms of the individual code files contained within.

Citation

If you use this dataset, please cite the original source:

@misc{github-code-2025,
  author = {nick007x},
  title = {GitHub Code 2025 Dataset},
  year = {2025},
  publisher = {Hugging Face},
  howpublished = {\url{https://huggingface.co/datasets/nick007x/github-code-2025}}
}

📊 Structured Schema (Zero-Fabrication)

Feature Key	Data Type
`repo_id`	`string`
`size`	`int64`
`file_path`	`string`
`content`	`string`

Estimated Rows: 494,903
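
A row can be checked against the four fields listed above with a small helper. This is a sketch under stated assumptions: the name `validate_row` is hypothetical, `int64` is approximated as a Python `int` within the signed 64-bit range, and the sample rows are made up for illustration.

```python
# Field names and types as listed in the schema table above.
SCHEMA = {"repo_id": str, "size": int, "file_path": str, "content": str}
INT64_MIN, INT64_MAX = -(2**63), 2**63 - 1


def validate_row(row: dict) -> list:
    """Return a list of problems; an empty list means the row matches."""
    problems = []
    for key, expected in SCHEMA.items():
        if key not in row:
            problems.append(f"missing field: {key}")
            continue
        value = row[key]
        # bool is a subclass of int in Python, so reject it explicitly.
        if isinstance(value, bool) or not isinstance(value, expected):
            problems.append(
                f"{key}: expected {expected.__name__}, got {type(value).__name__}"
            )
        elif expected is int and not INT64_MIN <= value <= INT64_MAX:
            problems.append(f"{key}: outside int64 range")
    extra = sorted(set(row) - set(SCHEMA))
    if extra:
        problems.append(f"unexpected fields: {', '.join(extra)}")
    return problems


# Made-up rows for illustration only.
good_row = {"repo_id": "a/b", "size": 9, "file_path": "src/main.py", "content": "print(1)\n"}
bad_row = {"repo_id": "a/b", "size": "9", "file_path": "src/main.py"}
```

Returning a list of problems rather than raising on the first mismatch makes it easier to report every issue in a row at once when scanning a shard.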

🆔 Identity & Source

id: hf-dataset--lumees--github-code-2025-language-split
slug: lumees--github-code-2025-language-split
source: huggingface
author: lumees
license: other
tags: source_datasets:nick007x/github-code-2025, license:other, size_categories:100m<n<1b, format:parquet, modality:text, library:datasets, library:dask, library:polars, library:mlcroissant, region:us

📊 Engagement & Metrics

downloads: 218,748
stars: 6
forks: 0
