Github Code 2025 Language Split
| Entity Passport | |
| Registry ID | hf-dataset--lumees--github-code-2025-language-split |
| Provider | huggingface |
Cite this dataset
Academic & Research Attribution
@misc{hf_dataset__lumees__github_code_2025_language_split,
author = {Community},
title = {Github Code 2025 Language Split Dataset},
year = {2026},
howpublished = {\url{https://huggingface.co/datasets/lumees/github-code-2025-language-split}},
note = {Accessed via Free2AITools Knowledge Fortress}
}
Technical Deep Dive
Why this score?
The Nexus Index for Github Code 2025 Language Split aggregates Popularity (P:0), Velocity (V:0), and Credibility (C:0). The Utility score (U:0) represents deployment readiness, context efficiency, and structural reliability within the Nexus ecosystem.
Dataset Specification
source_datasets:
- nick007x/github-code-2025
license:
- other
configs:
- config_name: assembly
  data_files:
  - split: train
    path: Assembly/*.parquet
- config_name: batch
  data_files:
  - split: train
    path: Batch/*.parquet
- config_name: c
  data_files:
  - split: train
    path: C/*.parquet
- config_name: csharp
  data_files:
  - split: train
    path: C#/*.parquet
- config_name: cpp
  data_files:
  - split: train
    path: C++/*.parquet
- config_name: css
  data_files:
  - split: train
    path: CSS/*.parquet
- config_name: clojure
  data_files:
  - split: train
    path: Clojure/*.parquet
- config_name: crystal
  data_files:
  - split: train
    path: Crystal/*.parquet
- config_name: dart
  data_files:
  - split: train
    path: Dart/*.parquet
- config_name: dockerfile
  data_files:
  - split: train
    path: Dockerfile/*.parquet
- config_name: elixir
  data_files:
  - split: train
    path: Elixir/*.parquet
- config_name: erlang
  data_files:
  - split: train
    path: Erlang/*.parquet
- config_name: fsharp
  data_files:
  - split: train
    path: F#/*.parquet
- config_name: go
  data_files:
  - split: train
    path: Go/*.parquet
- config_name: gradle
  data_files:
  - split: train
    path: Gradle/*.parquet
- config_name: groovy
  data_files:
  - split: train
    path: Groovy/*.parquet
- config_name: html
  data_files:
  - split: train
    path: HTML/*.parquet
- config_name: haskell
  data_files:
  - split: train
    path: Haskell/*.parquet
- config_name: json
  data_files:
  - split: train
    path: JSON/*.parquet
- config_name: java
  data_files:
  - split: train
    path: Java/*.parquet
- config_name: javascript
  data_files:
  - split: train
    path: JavaScript/*.parquet
- config_name: julia
  data_files:
  - split: train
    path: Julia/*.parquet
- config_name: kotlin
  data_files:
  - split: train
    path: Kotlin/*.parquet
- config_name: lua
  data_files:
  - split: train
    path: Lua/*.parquet
- config_name: makefile
  data_files:
  - split: train
    path: Makefile/*.parquet
- config_name: markdown
  data_files:
  - split: train
    path: Markdown/*.parquet
- config_name: nim
  data_files:
  - split: train
    path: Nim/*.parquet
- config_name: ocaml
  data_files:
  - split: train
    path: OCaml/*.parquet
- config_name: objective_c
  data_files:
  - split: train
    path: Objective-C/*.parquet
- config_name: objective_cpp
  data_files:
  - split: train
    path: Objective-C++/*.parquet
- config_name: php
  data_files:
  - split: train
    path: PHP/*.parquet
- config_name: perl
  data_files:
  - split: train
    path: Perl/*.parquet
- config_name: powershell
  data_files:
  - split: train
    path: PowerShell/*.parquet
- config_name: python
  data_files:
  - split: train
    path: Python/*.parquet
- config_name: r
  data_files:
  - split: train
    path: R/*.parquet
- config_name: ruby
  data_files:
  - split: train
    path: Ruby/*.parquet
- config_name: rust
  data_files:
  - split: train
    path: Rust/*.parquet
- config_name: sql
  data_files:
  - split: train
    path: SQL/*.parquet
- config_name: scala
  data_files:
  - split: train
    path: Scala/*.parquet
- config_name: shell
  data_files:
  - split: train
    path: Shell/*.parquet
- config_name: solidity
  data_files:
  - split: train
    path: Solidity/*.parquet
- config_name: svelte
  data_files:
  - split: train
    path: Svelte/*.parquet
- config_name: swift
  data_files:
  - split: train
    path: Swift/*.parquet
- config_name: systemverilog
  data_files:
  - split: train
    path: SystemVerilog/*.parquet
- config_name: toml
  data_files:
  - split: train
    path: TOML/*.parquet
- config_name: text
  data_files:
  - split: train
    path: Text/*.parquet
- config_name: typescript
  data_files:
  - split: train
    path: TypeScript/*.parquet
- config_name: unknown
  data_files:
  - split: train
    path: Unknown/*.parquet
- config_name: vhdl
  data_files:
  - split: train
    path: VHDL/*.parquet
- config_name: verilog
  data_files:
  - split: train
    path: Verilog/*.parquet
- config_name: visual_basic
  data_files:
  - split: train
    path: Visual Basic/*.parquet
- config_name: vue
  data_files:
  - split: train
    path: Vue/*.parquet
- config_name: xml
  data_files:
  - split: train
    path: XML/*.parquet
- config_name: yaml
  data_files:
  - split: train
    path: YAML/*.parquet
- config_name: zig
  data_files:
  - split: train
    path: Zig/*.parquet
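Each config in the specification maps a lowercase name to one language directory of parquet files and exposes a single train split. The following is a minimal sketch of loading one language at a time, assuming the Hugging Face `datasets` library is installed and that the repository ID matches the dataset URL cited on this page. The helper names `data_glob` and `load_language` are hypothetical, and `CONFIG_DIRS` copies only a handful of entries from the specification for illustration.

```python
REPO_ID = "lumees/github-code-2025-language-split"  # taken from the dataset URL

# Illustrative subset of the config -> directory mapping in the specification.
# Note that config names are normalized while directory names are not.
CONFIG_DIRS = {
    "python": "Python",
    "csharp": "C#",
    "cpp": "C++",
    "objective_cpp": "Objective-C++",
    "visual_basic": "Visual Basic",
}


def data_glob(config_name: str) -> str:
    """Return the parquet glob a listed config reads, e.g. 'C#/*.parquet'."""
    return f"{CONFIG_DIRS[config_name]}/*.parquet"


def load_language(config_name: str, streaming: bool = True):
    """Load the train split of one language config from the Hub."""
    # Imported lazily so the helpers above work without `datasets` installed.
    from datasets import load_dataset

    return load_dataset(REPO_ID, name=config_name, split="train", streaming=streaming)
```

Calling `load_language("rust")` would request only the Rust shards. Streaming is the default in this sketch because the tags place the dataset in a large size category, so downloading a full config up front may be impractical.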
Source Data & Attribution
This dataset is a processed derivative of nick007x/github-code-2025.
Origination
The original data was aggregated by nick007x from public GitHub repositories. We have retained the original content, file paths, and metadata while restructuring the format for easier consumption by language-specific models.
Processing Steps
To create this dataset, we performed the following processing on the source data:
- Language Identification: We mapped file extensions (e.g., .py, .rs, .ts) to their respective programming languages using a comprehensive extension map.
- Splitting: The dataset was sharded and split into separate sub-directories/categories by programming language to allow for targeted loading (e.g., loading only Python or Rust data).
- Filtering: Binary files and ambiguous extensions were categorized as "Unknown" or removed to ensure text-based model compatibility.
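The identification and splitting steps can be sketched roughly as follows. This is an illustration only: the extension map actually used for this dataset is not published on this page, so `EXTENSION_TO_LANGUAGE` below is a hypothetical three-entry stand-in built from the extensions named above, and the function names are assumptions. The sketch also sends every unmapped extension to "Unknown", whereas the original processing removed some such files.

```python
from collections import defaultdict
from pathlib import PurePosixPath

# Hypothetical, deliberately tiny extension map; the real one is described
# as "comprehensive" and is not reproduced here.
EXTENSION_TO_LANGUAGE = {
    ".py": "Python",
    ".rs": "Rust",
    ".ts": "TypeScript",
}


def identify_language(file_path: str) -> str:
    """Map a file path to a language directory name via its extension."""
    suffix = PurePosixPath(file_path).suffix.lower()
    # Unmapped or missing extensions fall back to the "Unknown" bucket.
    return EXTENSION_TO_LANGUAGE.get(suffix, "Unknown")


def split_by_language(file_paths):
    """Group file paths into per-language buckets."""
    buckets = defaultdict(list)
    for path in file_paths:
        buckets[identify_language(path)].append(path)
    return dict(buckets)
```

Each resulting bucket would correspond to one sub-directory of parquet shards, which is what allows a single language to be loaded without touching the rest.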
Licensing Information
The data contained in this dataset belongs to the original authors of the code repositories on GitHub.
- Source Aggregation: The aggregation was provided by nick007x/github-code-2025.
- Individual Code Files: Each file typically retains the license of its original repository (MIT, Apache 2.0, BSD, etc.). Users of this dataset are responsible for adhering to the license terms of the individual code files contained within.
Citation
If you use this dataset, please cite the original source:
@misc{github-code-2025,
author = {nick007x},
title = {GitHub Code 2025 Dataset},
year = {2025},
publisher = {Hugging Face},
howpublished = {\url{https://huggingface.co/datasets/nick007x/github-code-2025}}
}
Dataset Transparency Report
Verified data manifest for traceability and transparency.
Identity & Source
- id: hf-dataset--lumees--github-code-2025-language-split
- source: huggingface
- author: Community
- tags: source_datasets:nick007x/github-code-2025, license:other, size_categories:100m, format:parquet, modality:text, library:datasets, library:dask, library:polars, library:mlcroissant, region:us
Engagement & Metrics
- likes: 5
- downloads: 14,114