SiliciclasticReservoirs
| Entity Passport | |
| Registry ID | hf-dataset--anonymouscientist--siliciclasticreservoirs |
| License | CC-BY-4.0 |
| Provider | huggingface |
Cite this dataset
Academic & Research Attribution
@misc{hf_dataset__anonymouscientist__siliciclasticreservoirs,
author = {AnonymouScientist},
title = {SiliciclasticReservoirs Dataset},
year = {2026},
howpublished = {\url{https://huggingface.co/datasets/AnonymouScientist/SiliciclasticReservoirs}},
note = {Accessed via Free2AITools Knowledge Fortress}
}
Free2AITools Nexus Index V2.0
FNI V2.0 for SiliciclasticReservoirs: Semantic (S:50), Authority (A:61), Popularity (P:51), Recency (R:93), Quality (Q:50).
Dataset Specification
SiliciclasticReservoirs
1,000,000 synthetic 3D siliciclastic-reservoir geology cubes generated from rule-based sedimentological simulations (turbidite lobes + 6 fluvial-channel architectures + delta-fan distributary). Cubes are voxelized at (64, 64, 32) cells. Each sample carries facies, porosity, permeability, and a structured set of geological conditioning parameters.
Designed to train conditional generative models (flow matching, diffusion, etc.) of subsurface geology under interpretable physical conditioning.
Quick stats
- 1,000,000 samples across 8 reservoir architectures
- Cube shape: `(64, 64, 32)` voxels, ordered `(x, y, z)`; `z` is depth (0 = base)
- 4 voxel arrays per sample: binary facies, 6-class facies, porosity, permeability
- Compact slim parquet (9–12 cols) and full reproducibility parquet (25–75 cols)
- Captions for text-conditioning experiments
- Fully reproducible: every sample regenerable from `(seed, params)`
File structure
SiliciclasticReservoirs/
├── README.md                    # this file
├── DATASHEET.md                 # Datasheet for Datasets (Gebru et al. 2018)
├── splits/                      # deterministic 90/5/5 split, stratified by layer_type
│   ├── train.parquet            # 900,000 rows
│   ├── validation.parquet       # 50,000 rows
│   └── test.parquet             # 50,000 rows
├── lobe/                        # 200,000 samples
│   ├── shard_0000/
│   │   ├── facies.npy           (n, 64, 64, 32) int8     # binary 0/1
│   │   ├── facies_alluvsim.npy  (n, 64, 64, 32) int8     # 6-class -1..4
│   │   ├── poro.npy             (n, 64, 64, 32) float16  # porosity in [0, 0.5]
│   │   ├── perm.npy             (n, 64, 64, 32) float16  # permeability in mD [0, 60000]
│   │   ├── params_slim.parquet  # training conditioning (use this)
│   │   └── params.parquet       # full physics for reproducibility
│   ├── shard_0001/
│   └── … 256 shards total
├── channel_pv_shoestring/       # 100,000 samples, 256 shards
├── channel_cb_labyrinth/        # 100,000 samples, 256 shards
├── channel_cb_jigsaw/           # 150,000 samples, 256 shards
├── channel_sh_distal/           # 100,000 samples, 256 shards
├── channel_sh_proximal/         # 100,000 samples, 256 shards
├── channel_meander_oxbow/       # 100,000 samples, 256 shards
└── delta/                       # 150,000 samples, 256 shards
256 shards per preset, each roughly 380 MB to 1.2 GB depending on per-shard sample count.
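As a rough sanity check on storage, the per-sample footprint follows from the array shapes and dtypes listed in the tree. The sketch below is illustrative only: `bytes_per_sample` is a hypothetical helper, and it counts the four voxel arrays while ignoring `.npy` headers and the parquet files.

```python
import numpy as np

CUBE_SHAPE = (64, 64, 32)
# dtypes as listed in the file tree
ARRAY_DTYPES = {
    "facies.npy": np.int8,
    "facies_alluvsim.npy": np.int8,
    "poro.npy": np.float16,
    "perm.npy": np.float16,
}

def bytes_per_sample() -> int:
    """Uncompressed bytes one sample occupies across the four voxel arrays."""
    cells = int(np.prod(CUBE_SHAPE))  # 131,072 cells
    return sum(cells * np.dtype(dt).itemsize for dt in ARRAY_DTYPES.values())

per_sample = bytes_per_sample()           # 131,072 cells * 6 bytes per cell
total_gb = per_sample * 1_000_000 / 1e9   # all samples, voxel arrays only
print(per_sample, round(total_gb, 1))
```

Under these assumptions one sample takes 786,432 bytes, so the voxel arrays for the full 1,000,000 samples come to roughly 786 GB.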
Layer types
| layer_type | architecture | samples |
|---|---|---|
| `lobe` | turbidite lobe (deep-water gravity-flow deposit) | 200,000 |
| `channel:PV_SHOESTRING` | paleo-valley shoestring sandstone | 100,000 |
| `channel:CB_JIGSAW` | channel-and-bar bodies, jigsaw connectivity | 150,000 |
| `channel:CB_LABYRINTH` | channel-and-bar bodies, labyrinthine connectivity | 100,000 |
| `channel:SH_DISTAL` | distal sheet-sandstone | 100,000 |
| `channel:SH_PROXIMAL` | proximal sheet-sandstone | 100,000 |
| `channel:MEANDER_OXBOW` | multi-storey meander-belt with oxbow plugs | 100,000 |
| `delta` | prograding distributary-fan delta | 150,000 |
These follow Pyrcz & Deutsch's standard fluvial-architecture taxonomy ("User Guide to the Alluvsim Program", 2004).
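The `layer_type` strings and the top-level directory names appear to be related by lowercasing and replacing `:` with `_`. The sketch below encodes that mapping together with the sample counts; `preset_dir` and `family` are hypothetical helpers, and the naming rule is inferred from the listings here rather than documented.

```python
# Sample counts per layer_type, copied from the layer-type table.
SAMPLES = {
    "lobe": 200_000,
    "channel:PV_SHOESTRING": 100_000,
    "channel:CB_JIGSAW": 150_000,
    "channel:CB_LABYRINTH": 100_000,
    "channel:SH_DISTAL": 100_000,
    "channel:SH_PROXIMAL": 100_000,
    "channel:MEANDER_OXBOW": 100_000,
    "delta": 150_000,
}

def family(layer_type: str) -> str:
    """Family-level label: 'lobe', 'channel' or 'delta'."""
    return layer_type.split(":")[0]

def preset_dir(layer_type: str) -> str:
    """Assumed directory name: lowercase, with ':' replaced by '_'."""
    return layer_type.lower().replace(":", "_")

# The eight counts add up to the stated dataset size.
assert sum(SAMPLES.values()) == 1_000_000
print(preset_dir("channel:PV_SHOESTRING"))
```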
Voxel arrays per sample
| file | dtype | shape | description |
|---|---|---|---|
| `facies.npy` | int8 | (N, 64, 64, 32) | binary facies: 0 = mud, 1 = sand. Primary signal for generative training. |
| `facies_alluvsim.npy` | int8 | (N, 64, 64, 32) | 6-class facies preserving architectural sub-detail: -1 = FF (overbank fines), 0 = FFCH (mud plug), 1 = CS (crevasse splay), 2 = LV (levee), 3 = LA (lateral-accretion / point-bar), 4 = CH (active channel). Lobe samples only contain {-1, 3}. |
| `poro.npy` | float16 | (N, 64, 64, 32) | porosity, range [0, 0.5] |
| `perm.npy` | float16 | (N, 64, 64, 32) | permeability in millidarcies (mD), range [0, 60000]. Use log10(perm + 1e-3) for ML, since perm spans 5+ decades. |
All four arrays are row-aligned within a shard (sample i of facies corresponds to row i of the parquets and to sample i of all other arrays).
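To make the 6-class coding concrete, the sketch below tallies facies fractions for a single cube. The label mapping is copied from the table; `facies_fractions` is a hypothetical helper, and the toy cube merely stands in for one sample of `facies_alluvsim.npy`.

```python
import numpy as np

# Code-to-label mapping for facies_alluvsim.npy, copied from the table.
FACIES_LABELS = {-1: "FF", 0: "FFCH", 1: "CS", 2: "LV", 3: "LA", 4: "CH"}

def facies_fractions(cube: np.ndarray) -> dict:
    """Fraction of cells per facies label in one 6-class cube."""
    # Shift codes from -1..4 to 0..5 so np.bincount can index them.
    counts = np.bincount(cube.astype(np.int64).ravel() + 1, minlength=6)
    return {FACIES_LABELS[c - 1]: counts[c] / cube.size for c in range(6)}

# Toy cube: background FF with the lowest 8 z-layers labelled LA.
toy = np.full((64, 64, 32), -1, dtype=np.int8)
toy[:, :, :8] = 3
print(facies_fractions(toy))
```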
Slim parquet (`params_slim.parquet`): the training conditioning
Use this file as the conditioning input during training. It carries the geological parameters that vary across samples.
Universal columns (all 8 layer types)
| column | dtype | description |
|---|---|---|
| `layer_type` | str | one of the 8 strings above |
| `caption` | str | human-readable description, e.g. "Paleo-valley shoestring reservoir with sinuosity 1.55, channel depth 9.7 m, …". Useful for text-conditioned variants. |
| `ntg` | float32 | realized net-to-gross over active cells (post-crop), range [0, 1] |
| `poro_ave` | float32 | realized mean porosity over active cells |
| `perm_ave` | float32 | realized mean of log10(perm in mD) over active cells |
| `azimuth` | float32 | regional flow direction, compass-CW degrees [0, 360) |
| `width_cells` | float32 | characteristic horizontal extent in cells. Lobe = 2 × semi-minor diameter. Channel/delta = full channel width. |
| `depth_cells` | float32 | characteristic vertical extent in cells. Lobe = thickness `dh_ave`. Channel/delta = `mCHdepth`. |
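Because `azimuth` is periodic, feeding the raw degree value to a model places 359° and 0° far apart. One common workaround, not prescribed by this dataset, is to encode the bearing as a unit vector. The sketch below assumes the usual compass convention (0° = north, increasing clockwise); `azimuth_to_unit_vector` is a hypothetical helper, and how north maps onto the cube axes is not specified here.

```python
import math

def azimuth_to_unit_vector(azimuth_deg: float) -> tuple:
    """Compass bearing (0 = north, clockwise) -> (east, north) components."""
    theta = math.radians(azimuth_deg % 360.0)
    return math.sin(theta), math.cos(theta)

east, north = azimuth_to_unit_vector(90.0)  # due east
print(round(east, 6), round(north, 6))
```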
Family-specific columns (NULL outside family)
| column | dtype | family | description |
|---|---|---|---|
| `asp` | float32 | lobe | ellipse aspect ratio (major/minor), [1.0, 2.5] |
| `mCHsinu` | float32 | channel + delta | sinuosity (path / straight-line), [1.05, 1.85] |
| `probAvulInside` | float32 | channel + delta | per-event in-belt avulsion probability, [0, 0.8] |
| `mFFCHprop` | float32 | channel + delta | abandoned-channel mud-plug fraction, [0, 0.7] |
| `trunk_length_fraction` | float32 | delta | proximal-trunk fraction protected from avulsion (controls fan-apex position), [0.1, 0.5] |
Total: 9 cols for lobe rows, 11 for channel rows, 12 for delta rows. Rows from different families share the universal columns and have NULL in non-applicable family columns.
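The column counts can be checked by listing which slim columns apply to each family. This is a minimal sketch built from the two column tables; `applicable_columns` and `unexpected_nulls` are hypothetical helpers, not part of the dataset tooling.

```python
UNIVERSAL = ["layer_type", "caption", "ntg", "poro_ave", "perm_ave",
             "azimuth", "width_cells", "depth_cells"]
FAMILY_COLUMNS = {
    "lobe": ["asp"],
    "channel": ["mCHsinu", "probAvulInside", "mFFCHprop"],
    "delta": ["mCHsinu", "probAvulInside", "mFFCHprop", "trunk_length_fraction"],
}

def applicable_columns(layer_type: str) -> list:
    """Slim-parquet columns expected to be non-NULL for this layer_type."""
    return UNIVERSAL + FAMILY_COLUMNS[layer_type.split(":")[0]]

def unexpected_nulls(row: dict) -> list:
    """Applicable columns that are missing or None in a slim-parquet row."""
    return [c for c in applicable_columns(row["layer_type"]) if row.get(c) is None]

for lt in ("lobe", "channel:CB_JIGSAW", "delta"):
    print(lt, len(applicable_columns(lt)))
```

A check like `unexpected_nulls` can be run on each row before training to catch a mismatched family mask early.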
Full parquet (`params.parquet`): for reproducibility
25 columns for lobe rows, 73 for channel rows, 75 for delta rows. Includes the per-realization seeds, the per-event statistical knob widths (`stdev*`), the engine-internal event-budget knobs (`ntime*`), and the `FACIES_PROPS` extras. Don't train on this: most columns are constant or redundant. Useful for:
- Re-generating a specific sample bit-for-bit (`ChannelLayer(...).create_geology(**row.to_dict())` with the saved seed)
- Diagnostic analysis of the engine's behavior
Splits
splits/{train,validation,test}.parquet define a deterministic 90 / 5 / 5 partition stratified by layer_type (each split has all 8 architectures in proportion). Generated with seed 42.
Each row is (layer_type, shard_dir, sample_idx) pointing into one of the data shards.
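Since each split row points into a shard, it can help to group rows by `shard_dir` so that every shard is opened once rather than revisited. The sketch below shows one way to do that; `group_by_shard` is a hypothetical helper, and the `shard_dir` strings in the toy rows are made up for illustration.

```python
from collections import defaultdict

def group_by_shard(rows):
    """Map shard_dir -> sorted sample indices, so each shard is opened once."""
    groups = defaultdict(list)
    for row in rows:
        groups[row["shard_dir"]].append(row["sample_idx"])
    return {shard: sorted(idx) for shard, idx in groups.items()}

# Toy split rows with the three documented fields (values are invented).
rows = [
    {"layer_type": "lobe", "shard_dir": "lobe/shard_0000", "sample_idx": 5},
    {"layer_type": "delta", "shard_dir": "delta/shard_0003", "sample_idx": 2},
    {"layer_type": "lobe", "shard_dir": "lobe/shard_0000", "sample_idx": 1},
]
print(group_by_shard(rows))
```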
Cell-physical-size note
Cube shape is identical (64×64×32) but the physical cell size differs by family:
| family | dx = dy | dz | physical extent |
|---|---|---|---|
| `lobe` | 100 m | 1 m | 6.4 km × 6.4 km × 32 m |
| `channel:*` and `delta` | 10 m | 1 m | 640 m × 640 m × 32 m |
For training in cell-units, this difference is hidden inside width_cells / depth_cells. The model can ignore physical scale and just train on the cube as-is.
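When physical units are needed, for example to compare a sample against a field analog, the cell counts can be converted using the per-family cell sizes. The sketch below uses the values from the table; `cells_to_meters` is a hypothetical helper.

```python
# (dx = dy, dz) in meters per family, from the cell-size table.
CELL_SIZE_M = {"lobe": (100.0, 1.0), "channel": (10.0, 1.0), "delta": (10.0, 1.0)}

def cells_to_meters(layer_type: str, width_cells: float, depth_cells: float):
    """Convert width_cells / depth_cells to meters for the given layer_type."""
    dx, dz = CELL_SIZE_M[layer_type.split(":")[0]]
    return width_cells * dx, depth_cells * dz

# Full cube extent: 64 cells wide, 32 cells deep.
print(cells_to_meters("lobe", 64, 32))
print(cells_to_meters("channel:SH_DISTAL", 64, 32))
```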
Loading example (PyTorch)
import numpy as np
import pyarrow.parquet as pq
from pathlib import Path
import torch
from torch.utils.data import DataLoader, IterableDataset


class ReservoirShardDataset(IterableDataset):
    """One worker walks an assigned subset of shards, yields (cube, params)."""

    def __init__(self, root: Path, split: str = "train"):
        self.root = Path(root)
        # Read split index: rows of (layer_type, shard_dir, sample_idx)
        self.index = pq.read_table(self.root / f"splits/{split}.parquet").to_pylist()

    def __iter__(self):
        worker = torch.utils.data.get_worker_info()
        # With multiple workers, each one takes every num_workers-th index row
        my_rows = (self.index if worker is None
                   else self.index[worker.id::worker.num_workers])
        cur_shard = None
        cur_data = None
        for row in my_rows:
            shard_dir = self.root / row["shard_dir"]
            if shard_dir != cur_shard:
                # Memory-map the arrays of the new shard; nothing is read until indexed
                cur_shard = shard_dir
                cur_data = {
                    "facies": np.load(shard_dir / "facies.npy", mmap_mode="r"),
                    "poro": np.load(shard_dir / "poro.npy", mmap_mode="r"),
                    "perm": np.load(shard_dir / "perm.npy", mmap_mode="r"),
                    "params": pq.read_table(
                        shard_dir / "params_slim.parquet").to_pylist(),
                }
            i = row["sample_idx"]
            yield {
                "facies": np.asarray(cur_data["facies"][i], dtype=np.int8),
                "poro": np.asarray(cur_data["poro"][i], dtype=np.float32),
                "perm": np.asarray(cur_data["perm"][i], dtype=np.float32),
                "params": cur_data["params"][i],
            }


def collate(batch):
    """Stack voxel arrays; keep params as a list of dicts.

    The params rows contain NULLs (None) in non-applicable family columns,
    which PyTorch's default collation cannot batch.
    """
    out = {k: torch.from_numpy(np.stack([b[k] for b in batch]))
           for k in ("facies", "poro", "perm")}
    out["params"] = [b["params"] for b in batch]
    return out


ds = ReservoirShardDataset("/path/to/SiliciclasticReservoirs", split="train")
loader = DataLoader(ds, batch_size=8, num_workers=4, collate_fn=collate)
Recommended preprocessing for flow-matching training
- `facies` → `[0, 1]` float: just `cube.astype(np.float32)`
- `poro` → standardized: `(poro - 0.15) / 0.10` keeps it roughly N(0, 1)-ish for sand cells
- `perm` → log-perm: `np.log10(np.maximum(perm, 1e-3))`, then maybe standardize
- `layer_type` → categorical embedding (8 classes). Use `layer_type.split(":")[0]` for family-level (3 classes) or the full string for fine-grained (8 classes).
- Other slim columns → continuous embeddings, with `null` → learned sentinel or per-family mask.
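The voxel-array transforms in this list can be sketched as a single function. The constants come from the list itself; stacking the three fields channel-first into one array is an assumption made here for illustration, and `preprocess` is a hypothetical helper.

```python
import numpy as np

def preprocess(facies: np.ndarray, poro: np.ndarray, perm: np.ndarray) -> np.ndarray:
    """Apply the recommended transforms to one sample's voxel arrays."""
    facies_f = facies.astype(np.float32)                  # {0, 1} as float
    poro_std = (poro.astype(np.float32) - 0.15) / 0.10    # constants from the list
    # Clamp before the log so zero-permeability cells map to -3 instead of -inf.
    log_perm = np.log10(np.maximum(perm.astype(np.float32), 1e-3))
    # Channel-first stacking is an illustrative choice, not a dataset convention.
    return np.stack([facies_f, poro_std, log_perm], axis=0)

# Toy arrays standing in for one sample of each file.
facies = np.zeros((64, 64, 32), dtype=np.int8)
poro = np.full((64, 64, 32), 0.25, dtype=np.float16)
perm = np.zeros((64, 64, 32), dtype=np.float16)
x = preprocess(facies, poro, perm)
print(x.shape, x.dtype)
```

Casting to float32 before the arithmetic avoids doing the standardization and the logarithm in float16 precision.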
Geological caveats
- Channel topology is tree-like, not anastomosing. Avulsions split channels but never re-merge. The braided / jigsaw look in CB_LABYRINTH / CB_JIGSAW comes from heavily overlapping stamps in the cube, not from explicit topological merging. For facies-based generative training this doesn't matter; for downstream flow simulation, be aware.
- Per-cell poro/perm follow a Walker-1992 upward-fining ramp (highest at thalweg base, decaying upward and laterally toward banks) plus a per-event Kozeny-Carman-coupled draw plus a per-realization "regional rock quality" multiplier. Channels in the same cube vary in quality (different events had different sediment supply); different cubes span the full quality range.
- Channels have FACIES_PROPS-driven base poro/perm values that are conservative midpoints from fluvial-reservoir literature (Pyrcz & Deutsch 2002, Allen 1965, Walker 1992); they're not from a specific reservoir analog.
- Delta has no mouth bars painted (`paint_mouth_bars=False`). The delta is a distributary network, not a clinoform with bar deposits.
- Cubes are post-cropped (8 cells trimmed from each lateral edge, 9 from top + 9 from bottom of z) to remove engine boundary artifacts. The internal engine ran on a slightly larger grid before cropping.
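The post-crop described in the last caveat can be written as a plain slice. The trim widths come from the text; the (80, 80, 50) engine grid is only what those widths imply for a (64, 64, 32) result and is not stated in the source, and `post_crop` is a hypothetical helper.

```python
import numpy as np

LATERAL_TRIM = 8    # cells removed from each x and y edge
VERTICAL_TRIM = 9   # cells removed from both the top and the bottom of z

def post_crop(engine_cube: np.ndarray) -> np.ndarray:
    """Trim an (x, y, z) engine grid down to the published cube."""
    return engine_cube[LATERAL_TRIM:-LATERAL_TRIM,
                       LATERAL_TRIM:-LATERAL_TRIM,
                       VERTICAL_TRIM:-VERTICAL_TRIM]

# Implied engine grid: 64 + 2*8 = 80 laterally, 32 + 2*9 = 50 vertically.
engine = np.zeros((80, 80, 50), dtype=np.int8)
cropped = post_crop(engine)
print(cropped.shape)
```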
See DATASHEET.md for full documentation.
Citation
@misc{SiliciclasticReservoirs_2026,
author = {Anonymous},
title = {{SiliciclasticReservoirs}: 1M Synthetic 3D Reservoir Geology Cubes for Conditional Generative Modeling},
year = {2026},
publisher = {HuggingFace},
howpublished = {\url{https://huggingface.co/datasets/AnonymouScientist/SiliciclasticReservoirs}}
}
License
Creative Commons Attribution 4.0 International (CC-BY-4.0). Use, share, modify, redistribute; just attribute the source.
Contact / Issues
File issues at the dataset repository on HuggingFace.
Structured Schema (Zero-Fabrication)
| Feature Key | Data Type |
|---|---|
| `layer_type` | string |
| `shard_dir` | string |
| `sample_idx` | int64 |

Estimated Rows: 1,000,000
Dataset Transparency Report
Technical metadata sourced from upstream repositories.
Identity & Source
- id: hf-dataset--anonymouscientist--siliciclasticreservoirs
- slug: anonymouscientist--siliciclasticreservoirs
- source: huggingface
- author: AnonymouScientist
- license: CC-BY-4.0
- tags: task_categories:image-segmentation, task_categories:image-classification, task_categories:text-to-image, task_categories:other, language:en, license:cc-by-4.0, size_categories:1m<n<10m, format:parquet, modality:text, modality:3d, library:datasets, library:pandas, library:polars, library:mlcroissant, region:us, geology, earth-sciences, reservoir, synthetic, voxel, 3d, flow-matching, diffusion, generative-modeling, subsurface, petroleum, groundwater, carbon-storage, conditional-generation
Engagement & Metrics
- downloads: 32,969
- stars: 0
- forks: null
Data indexed from public sources. Updated daily.