Dataset Specification

```yaml
license: mit
task_categories:
- image-classification
language:
- en
tags:
- ai-generated-image-detection
- deepfake-detection
- synthetic-image-detection
- computer-vision
- binary-classification
- gan-detection
- diffusion-model-detection
size_categories:
- 100K<n<1M
pretty_name: TIGAS Dataset
dataset_info:
  features:
  - name: image_path
    dtype: string
  - name: label
    dtype: int64
  splits:
  - name: train
    num_examples: 128776
  - name: test
    num_examples: 14126
```
TIGAS Dataset
A comprehensive dataset for training AI-generated image detection models
Dataset Description
The TIGAS Dataset is a large-scale collection of real and AI-generated images designed for training and evaluating AI-generated image detection models. It contains 142,902 images from diverse sources, including state-of-the-art generative models.
Key Features
- Binary classification task: Real (label=0) vs AI-Generated/Fake (label=1)
- Diverse generators: 19 different image sources including GANs and diffusion models
- Balanced split: ~54% real, ~46% fake images
- High-quality annotations: CSV format with image paths and labels
- Ready-to-use: Compatible with PyTorch and standard ML pipelines
Dataset Statistics
Overall Distribution
| Split | Total Images | Real (label=0) | Fake (label=1) | Real % |
|---|---|---|---|---|
| Train | 128,776 | 69,772 | 59,004 | 54.2% |
| Test | 14,126 | 7,037 | 7,089 | 49.8% |
| Total | 142,902 | 76,809 | 66,093 | 53.7% |
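The table above can be reproduced directly from the annotation CSVs. A minimal sketch, where the toy DataFrame stands in for a real `annotations01.csv`:

```python
import pandas as pd

def summarize_split(df: pd.DataFrame) -> dict:
    """Return total/real/fake counts and real percentage for one split."""
    real = int((df['label'] == 0).sum())
    fake = int((df['label'] == 1).sum())
    total = len(df)
    return {'total': total, 'real': real, 'fake': fake,
            'real_pct': round(100 * real / total, 1)}

# Toy stand-in for pd.read_csv("TIGAS/train/annotations01.csv")
toy = pd.DataFrame({
    'image_path': ['images/ADM/0_real/a.JPEG', 'images/ADM/1_fake/b.png',
                   'images/Glide/1_fake/c.png', 'images/face/0_real/d.jpg'],
    'label': [0, 1, 1, 0],
})
print(summarize_split(toy))  # {'total': 4, 'real': 2, 'fake': 2, 'real_pct': 50.0}
```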
Image Sources (Train Split)
The dataset includes images from the following generators and sources:
| Source | Images | Type | Description |
|---|---|---|---|
| art002_4 | 10,986 | Mixed | Artistic images subset 4 |
| art002_1 | 10,801 | Mixed | Artistic images subset 1 |
| VQDM | 9,518 | Generated | Vector Quantized Diffusion Model |
| sd14 | 9,517 | Generated | Stable Diffusion 1.4 |
| Midjourney | 9,516 | Generated | Midjourney AI |
| Glide | 9,513 | Generated | OpenAI GLIDE |
| wuk | 9,510 | Mixed | Mixed source images |
| art002_3 | 8,295 | Mixed | Artistic images subset 3 |
| gaugan | 7,992 | Generated | NVIDIA GauGAN |
| art002_2 | 6,911 | Mixed | Artistic images subset 2 |
| sd15_1 | 6,353 | Generated | Stable Diffusion 1.5 subset 1 |
| sd15_2 | 6,349 | Generated | Stable Diffusion 1.5 subset 2 |
| art001 | 5,966 | Mixed | Artistic images |
| ADM | 4,756 | Mixed | Ablated Diffusion Model (ImageNet) |
| biggan | 3,200 | Generated | BigGAN |
| stargan | 3,198 | Generated | StarGAN (face manipulation) |
| sd_xl | 3,196 | Generated | Stable Diffusion XL |
| face | 1,600 | Mixed | Face images |
| DALLE2 | — | Generated | DALL-E 2 (fake only in subset) |
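Rows for a single generator can be selected by path prefix, since each source lives under its own `images/<source>/` directory. A minimal sketch; the helper `filter_by_source` is illustrative, not part of the dataset tooling:

```python
import pandas as pd

def filter_by_source(df: pd.DataFrame, source: str) -> pd.DataFrame:
    """Keep annotation rows whose path falls under images/<source>/."""
    paths = df['image_path'].str.replace('\\', '/', regex=False)
    return df[paths.str.startswith(f'images/{source}/')]

# Toy annotations with the dataset's Windows-style backslash paths
toy = pd.DataFrame({
    'image_path': [r'images\Midjourney\1_fake\a.png', r'images\ADM\0_real\b.JPEG'],
    'label': [1, 0],
})
print(len(filter_by_source(toy, 'Midjourney')))  # 1
```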
Image Formats
| Format | Count | Percentage |
|---|---|---|
| PNG | 48,130 | 37.4% |
| JPG | 44,414 | 34.5% |
| JPEG | 34,632 | 26.9% |
| jpeg | 1,600 | 1.2% |
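Since the table distinguishes extensions case-sensitively (`JPEG` vs `jpeg`), the counts can be recomputed from the annotation paths. A minimal sketch with a toy DataFrame standing in for the real annotations:

```python
import pandas as pd

def format_counts(df: pd.DataFrame) -> pd.Series:
    """Count annotation entries per file extension, case-sensitive as in the table."""
    ext = df['image_path'].str.rsplit('.', n=1).str[-1]
    return ext.value_counts()

toy = pd.DataFrame({'image_path': [
    r'images\ADM\0_real\x.JPEG', r'images\sd14\1_fake\y.png',
    r'images\sd14\1_fake\z.png', r'images\face\0_real\w.jpeg']})
print(format_counts(toy).to_dict())
```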
Dataset Structure
```
TIGAS/
├── LICENSE                   # MIT License
├── README.md                 # This file
├── train/
│   ├── annotations01.csv     # Training annotations (128,776 entries)
│   └── images/
│       ├── ADM/
│       │   ├── 0_real/       # Real images from ImageNet
│       │   └── 1_fake/       # Generated by ADM
│       ├── art001/
│       │   ├── 0_real/
│       │   └── 1_fake/
│       ├── art002_1/ ... art002_4/
│       ├── biggan/
│       ├── DALLE2/
│       ├── face/
│       ├── gaugan/
│       ├── Glide/
│       ├── Midjourney/
│       ├── sd_xl/
│       ├── sd14/
│       ├── sd15_1/
│       ├── sd15_2/
│       ├── stargan/
│       ├── VQDM/
│       └── wuk/
└── test/
    └── annotations01.csv     # Test annotations (14,126 entries)
```
Annotation Format
The CSV files contain two columns:

```csv
image_path,label
images\ADM\0_real\ILSVRC2012_val_00000005.JPEG,0
images\Midjourney\1_fake\image_001.png,1
```

- `image_path`: Relative path to the image file (Windows-style backslashes)
- `label`: Binary label where:
  - `0` = Real/Natural image
  - `1` = AI-Generated/Fake image

Note: The `test` split uses the same `images/` directory as `train`, but with different image subsets defined in its annotation file.
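Because the annotation paths use backslashes, one portable option is to parse them with `PureWindowsPath` instead of string replacement; `to_local_path` below is an illustrative helper, not part of the dataset tooling:

```python
from pathlib import Path, PureWindowsPath

def to_local_path(root: Path, win_relpath: str) -> Path:
    """Convert a backslash-separated annotation path to a native Path under root."""
    return root.joinpath(*PureWindowsPath(win_relpath).parts)

p = to_local_path(Path("TIGAS/train"), r"images\ADM\0_real\ILSVRC2012_val_00000005.JPEG")
print(p.as_posix())  # TIGAS/train/images/ADM/0_real/ILSVRC2012_val_00000005.JPEG
```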
Usage
Loading with Python

```python
import pandas as pd
from pathlib import Path
from PIL import Image

# Load annotations
data_root = Path("TIGAS")
train_df = pd.read_csv(data_root / "train" / "annotations01.csv")
test_df = pd.read_csv(data_root / "test" / "annotations01.csv")

# Convert Windows-style backslashes to forward slashes
train_df['image_path'] = train_df['image_path'].str.replace('\\', '/', regex=False)

# Load an image
def load_image(row):
    img_path = data_root / "train" / row['image_path']
    image = Image.open(img_path).convert('RGB')
    label = row['label']
    return image, label

# Example
image, label = load_image(train_df.iloc[0])
print(f"Label: {'Real' if label == 0 else 'Fake'}")
```
Loading with PyTorch

```python
import pandas as pd
import torch
from torch.utils.data import Dataset, DataLoader
from torchvision import transforms
from PIL import Image
from pathlib import Path

class TIGASDataset(Dataset):
    def __init__(self, root_dir, split='train', transform=None):
        self.root_dir = Path(root_dir)
        self.split = split
        self.transform = transform

        # Load annotations and normalize Windows-style paths
        ann_path = self.root_dir / split / "annotations01.csv"
        self.annotations = pd.read_csv(ann_path)
        self.annotations['image_path'] = self.annotations['image_path'].str.replace('\\', '/', regex=False)
        # Images are always stored under train/images/
        self.images_dir = self.root_dir / "train"

    def __len__(self):
        return len(self.annotations)

    def __getitem__(self, idx):
        row = self.annotations.iloc[idx]
        img_path = self.images_dir / row['image_path']
        image = Image.open(img_path).convert('RGB')
        label = row['label']
        if self.transform:
            image = self.transform(image)
        return image, label

# Example usage
transform = transforms.Compose([
    transforms.Resize((256, 256)),
    transforms.ToTensor(),
    transforms.Normalize([0.5, 0.5, 0.5], [0.5, 0.5, 0.5])
])

train_dataset = TIGASDataset("TIGAS", split='train', transform=transform)
test_dataset = TIGASDataset("TIGAS", split='test', transform=transform)

train_loader = DataLoader(train_dataset, batch_size=32, shuffle=True, num_workers=4)
test_loader = DataLoader(test_dataset, batch_size=32, shuffle=False, num_workers=4)
```
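Loaders like these plug into a standard binary-classification training loop. A minimal sketch of one optimization step; the tiny CNN and the synthetic batch are illustrative assumptions, not a recommended architecture:

```python
import torch
import torch.nn as nn

# Illustrative stand-in for a real backbone (e.g. a ResNet); not part of the dataset
model = nn.Sequential(
    nn.Conv2d(3, 8, kernel_size=3, stride=2, padding=1),
    nn.ReLU(),
    nn.AdaptiveAvgPool2d(1),
    nn.Flatten(),
    nn.Linear(8, 2),  # two logits: Real (label 0) vs Fake (label 1)
)
criterion = nn.CrossEntropyLoss()
optimizer = torch.optim.Adam(model.parameters(), lr=1e-4)

def train_step(images, labels):
    """One optimization step; in practice, iterate over train_loader batches."""
    optimizer.zero_grad()
    logits = model(images)
    loss = criterion(logits, labels)
    loss.backward()
    optimizer.step()
    return loss.item()

# Synthetic batch shaped like the transformed dataset output (B, 3, 256, 256)
images = torch.randn(4, 3, 256, 256)
labels = torch.tensor([0, 1, 1, 0])
loss = train_step(images, labels)
print(f"loss: {loss:.4f}")
```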
Using with TIGAS Model

```python
from tigas import TIGAS
from pathlib import Path
import pandas as pd

# Initialize the model (auto-downloads weights from Hugging Face)
tigas = TIGAS(auto_download=True, device='cuda')

# Load test annotations and normalize Windows-style paths
data_root = Path("TIGAS")
test_df = pd.read_csv(data_root / "test" / "annotations01.csv")
test_df['image_path'] = test_df['image_path'].str.replace('\\', '/', regex=False)

# Evaluate the first 10 images
for i, row in test_df.head(10).iterrows():
    img_path = data_root / "train" / row['image_path']
    score = tigas(str(img_path))
    true_label = "Real" if row['label'] == 0 else "Fake"
    pred_label = "Real" if score > 0.5 else "Fake"
    print(f"{img_path.name}: Score={score:.4f}, True={true_label}, Pred={pred_label}")
```
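Threshold-based predictions like these can be scored in aggregate. A sketch assuming the same decision rule (score above the threshold means Real, label 0); the scores below are hypothetical, not real TIGAS outputs:

```python
def accuracy_from_scores(scores, labels, threshold=0.5):
    """Accuracy of thresholded detector scores against binary labels (0 = Real)."""
    correct = 0
    for s, y in zip(scores, labels):
        pred = 0 if s > threshold else 1
        correct += (pred == y)
    return correct / len(scores)

# Hypothetical scores for illustration
scores = [0.91, 0.12, 0.47, 0.88]
labels = [0, 1, 1, 0]
print(accuracy_from_scores(scores, labels))  # 1.0
```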
Generators Included
Diffusion Models
- Stable Diffusion 1.4, 1.5, XL - Open-source text-to-image diffusion models
- DALL-E 2 - OpenAI's text-to-image model
- Midjourney - Commercial text-to-image service
- GLIDE - OpenAI's guided language-to-image diffusion
- ADM - Ablated Diffusion Model (class-conditional on ImageNet)
- VQDM - Vector Quantized Diffusion Model
GANs (Generative Adversarial Networks)
- BigGAN - Large-scale class-conditional GAN
- GauGAN - NVIDIA's semantic image synthesis
- StarGAN - Multi-domain face manipulation
Citation
If you use this dataset in your research, please cite:
```bibtex
@dataset{tigas_dataset_2025,
  title={TIGAS Dataset: A Comprehensive Collection for AI-Generated Image Detection},
  author={Morgenshtern, Dmitrij},
  year={2025},
  publisher={HuggingFace},
  url={https://huggingface.co/datasets/H1merka/TIGAS-dataset}
}
```
License
This dataset is released under the MIT License.
Important: Individual images in this dataset may be derived from or generated using various models with their own licensing terms:
- ImageNet images (in `0_real` folders) are subject to the ImageNet terms of use
- Generated images are outputs of the respective models (Stable Diffusion, Midjourney, etc.)
The annotations and dataset organization are MIT licensed.
Related Resources
- TIGAS Model: huggingface.co/H1merka/TIGAS
- GitHub Repository: github.com/H1merka/TIGAS
Changelog
- v1.0 (December 2025): Initial release with 142,902 images from 19 sources