Hnm Search Data
| Entity Passport | |
| Registry ID | hf-dataset--rajeev-gupta--hnm-search-data |
| Provider | huggingface |
Cite this dataset
Academic & Research Attribution
@misc{hf_dataset__rajeev_gupta__hnm_search_data,
author = {Rajeev Gupta},
title = {Hnm Search Data Dataset},
year = {2026},
howpublished = {\url{https://huggingface.co/datasets/rajeev-gupta/hnm-search-data}},
note = {Accessed via Free2AITools Knowledge Fortress}
}
Technical Deep Dive
Why this score?
The Nexus Index for Hnm Search Data aggregates Popularity (P:0), Velocity (V:0), and Credibility (C:0). The Utility score (U:0) represents deployment readiness, context efficiency, and structural reliability within the Nexus ecosystem.
Dataset Specification
configs:
- config_name: articles
  data_files:
  - split: train
    path: data/raw/articles.csv
- config_name: customers
  data_files:
  - split: train
    path: data/raw/customers.csv
- config_name: transactions
  data_files:
  - split: train
    path: data/raw/transactions_train.csv
dataset_info:
- config_name: articles
  features:
  - name: article_id
    dtype: int64
  - name: product_code
    dtype: int64
  - name: prod_name
    dtype: string
  - name: product_type_no
    dtype: int64
  - name: product_type_name
    dtype: string
  - name: product_group_name
    dtype: string
  - name: graphical_appearance_no
    dtype: int64
  - name: graphical_appearance_name
    dtype: string
  - name: colour_group_code
    dtype: int64
  - name: colour_group_name
    dtype: string
  - name: perceived_colour_value_id
    dtype: int64
  - name: perceived_colour_value_name
    dtype: string
  - name: perceived_colour_master_id
    dtype: int64
  - name: perceived_colour_master_name
    dtype: string
  - name: department_no
    dtype: int64
  - name: department_name
    dtype: string
  - name: index_code
    dtype: string
  - name: index_name
    dtype: string
  - name: index_group_no
    dtype: int64
  - name: index_group_name
    dtype: string
  - name: section_no
    dtype: int64
  - name: section_name
    dtype: string
  - name: garment_group_no
    dtype: int64
  - name: garment_group_name
    dtype: string
  - name: detail_desc
    dtype: string
- config_name: customers
  features:
  - name: customer_id
    dtype: string
  - name: FN
    dtype: float64
  - name: Active
    dtype: float64
  - name: club_member_status
    dtype: string
  - name: fashion_news_frequency
    dtype: string
  - name: age
    dtype: float64
  - name: postal_code
    dtype: string
- config_name: transactions
  features:
  - name: t_dat
    dtype: string
  - name: customer_id
    dtype: string
  - name: article_id
    dtype: int64
  - name: price
    dtype: float64
  - name: sales_channel_id
    dtype: int64
task_categories:
- text-ranking
- text-retrieval
- text-classification
language:
- en
pretty_name: 'H&M Search Queries and Personalized Results '
size_categories:
- 10M<n<100M
tags:
- fashion
- e-commerce
- customer-behavior
- tabular
- recommendation-systems
- search
- ranking
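The dtypes declared for the transactions config in the specification above can be passed to pandas when reading the raw CSV directly. The following is a minimal sketch, not the author's loading code: the `TRANSACTION_DTYPES` name and the sample row are hypothetical, and only the column names and dtypes come from the specification.

```python
import io

import pandas as pd

# Column names and dtypes copied from the "transactions" config above.
TRANSACTION_DTYPES = {
    "t_dat": "string",
    "customer_id": "string",
    "article_id": "int64",
    "price": "float64",
    "sales_channel_id": "int64",
}

# Hypothetical sample row standing in for data/raw/transactions_train.csv.
SAMPLE_TRANSACTIONS = """t_dat,customer_id,article_id,price,sales_channel_id
2020-01-01,abc123,108775015,0.5,2
"""

# Passing explicit dtypes keeps pandas from guessing the column types.
transactions = pd.read_csv(io.StringIO(SAMPLE_TRANSACTIONS), dtype=TRANSACTION_DTYPES)
print(transactions.dtypes)
```

For the real file, the same mapping could be passed as `pd.read_csv("data/raw/transactions_train.csv", dtype=TRANSACTION_DTYPES)`, assuming the repository has been downloaded locally.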
HnM Search Dataset Created from Recommendations Dataset
This synthetic dataset uses the H&M recommendations dataset as its base, available from:
- https://huggingface.co/datasets/einrafh/hnm-fashion-recommendations-data (Use of this dataset is subject to the terms and conditions set forth on the original distribution page. This dataset is intended for non-commercial and research use.)
- https://www.kaggle.com/competitions/h-and-m-personalized-fashion-recommendations/data (DATA ACCESS AND USE: Non-Commercial Purposes & Academic Research.)
The base dataset is a recommendations dataset whose transactions data records the articles purchased by each user. This dataset adds the search queries that a user may have issued before buying an article, along with the candidate results for each query.
The license for our additions is https://cdla.dev/permissive-2-0/
Search Queries Dataset
- queries.csv (253685 rows): List of queries for transactions.
- qrels.csv (253685 rows): List of positive and negative article IDs that were retrieved for each query.
Base Dataset
- articles.csv (105542 rows): List of unique products/articles with their properties/features.
- customers.csv (1371980 rows): List of unique customers/users with their properties/features.
- transactions_train.csv (31788324 rows): List of historical transactions/purchases of different articles by customers.
Dataset Structure & Components
All search query data is located in the data/search/ directory.
- data/search/queries.csv
  Queries generated from individual transactions (transactions_train.csv).
  (253685 rows, 3 columns: query_id, transaction_id, and query_text)
- data/search/qrels.csv
  Candidate results for each query: positives (from the transaction) and close negative article_ids (from articles.csv).
  (253685 rows, 3 columns: query_id, positive_ids, and negative_ids (space separated))
All raw (recommendations) data is located in the data/raw/ directory.
- data/raw/transactions_train.csv
  A historical record of all purchase transactions. This file serves as a central table connecting customers with the articles they purchased.
  (31,788,324 rows, 5 columns)
- data/raw/customers.csv
  This dimension table contains attributes for each unique customer.
  (1,371,980 rows, 7 columns)
- data/raw/articles.csv
  This dimension table contains highly detailed attributes for each unique product (article).
  (105,542 rows, 25 columns)
- data/raw/images/
  This directory contains product images, organized into subdirectories based on the first 3 digits of the article_id.
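The image layout described above can be sketched as a small path helper. Two details here are assumptions that the text does not state: that an article_id is zero-padded to 10 digits before taking its first 3 digits (the int64 column drops any leading zero), and that images are stored as `.jpg` files named after the padded ID. The `image_path` function itself is hypothetical.

```python
from pathlib import PurePosixPath


def image_path(article_id: int, root: str = "data/raw/images") -> PurePosixPath:
    """Build the expected image path for an article.

    Assumptions (not confirmed by the dataset card): IDs are zero-padded
    to 10 digits and each image is a JPEG named after the padded ID.
    """
    padded = str(article_id).zfill(10)
    # The subdirectory is the first 3 digits of the padded article_id.
    return PurePosixPath(root) / padded[:3] / f"{padded}.jpg"


print(image_path(108775015))  # data/raw/images/010/0108775015.jpg
```

It is worth checking the result against a few files in data/raw/images/ before relying on it, since not every article necessarily has an image.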
Relationships Between Search Data
These files can be joined to create a comprehensive dataset for analysis:
- query_id joins queries.csv with qrels.csv, giving each textual query and its corresponding result articles.
- transaction_id (from queries.csv) links each query to the details of the corresponding transaction in transactions_train.csv.
- positive_ids and negative_ids (from qrels.csv) join with articles.csv to get the details of the result articles, both the positives (which the user purchased) and the negatives.
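The query_id and article joins described above can be sketched with pandas as follows. The rows are hypothetical stand-ins for the real files, and the cast of positive_ids to int64 assumes each value holds a single ID that matches the integer article_id column in articles.csv. The transaction_id join is left out because the text does not say which column of transactions_train.csv it refers to.

```python
import pandas as pd

# Hypothetical sample rows; the real files live under data/search/ and data/raw/.
queries = pd.DataFrame({
    "query_id": ["q1", "q2"],
    "transaction_id": ["t1", "t2"],
    "query_text": ["black strap top", "blue slim jeans"],
})
qrels = pd.DataFrame({
    "query_id": ["q1", "q2"],
    "positive_ids": ["108775015", "110065001"],
    "negative_ids": ["108775044 110065002", "108775015"],
})
articles = pd.DataFrame({
    "article_id": [108775015, 108775044, 110065001, 110065002],
    "prod_name": ["Strap top", "Strap top", "Slim jeans", "Slim jeans"],
})

# 1) Attach the candidate IDs to each query text via query_id.
search = queries.merge(qrels, on="query_id", how="inner", validate="one_to_one")

# 2) Look up the purchased (positive) article for each query.
#    Assumption: positive_ids holds one ID per row, so it can be cast to int64.
search["article_id"] = search["positive_ids"].astype("int64")
positives = search.merge(articles, on="article_id", how="left")

print(positives[["query_text", "article_id", "prod_name"]])
```

A left join is used in the second step so that a query is kept even if its article is missing from the lookup table.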
Data Schema
The data schemas for transactions_train.csv, customers.csv, and articles.csv can be obtained from https://huggingface.co/datasets/einrafh/hnm-fashion-recommendations-data.
Here is the schema for the search data.
queries.csv
| column | Description | Type |
|---|---|---|
| query_id | Unique ID for the query (Primary Key) | object (String) |
| transaction_id | Unique ID for the transaction (Foreign Key) | object (String) |
| query_text | Text of the query | object (String) |
qrels.csv
| column | Description | Type |
|---|---|---|
| query_id | ID for the query (Foreign Key) | object (String) |
| positive_ids | ID for the positive result (Foreign Key) which the user clicked/purchased | object (String) |
| negative_ids | Space-separated list of IDs for the negative results (Foreign Key) which the user didn't click/purchase | object (String) |
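Because negative_ids is a space-separated string, it usually needs to be split before use. The sketch below turns qrels rows into one labeled (query_id, article_id) pair per row, a common shape for ranking and retrieval work. The sample CSV text and the `qrels_to_long` and `_explode` helpers are hypothetical; positive_ids is split the same way as negative_ids, which is harmless if it only ever holds a single ID.

```python
import io

import pandas as pd

# Hypothetical sample standing in for data/search/qrels.csv.
SAMPLE_QRELS = """query_id,positive_ids,negative_ids
q1,108775015,108775044 110065002
q2,110065001,108775015
"""


def _explode(qrels: pd.DataFrame, column: str, label: int) -> pd.DataFrame:
    """Split a space-separated ID column into one row per ID, tagged with a label."""
    out = qrels[["query_id", column]].rename(columns={column: "article_id"})
    out = out.assign(article_id=out["article_id"].str.split()).explode("article_id")
    # Rows with no IDs at all become NaN after explode and are dropped.
    return out.dropna(subset=["article_id"]).assign(label=label)


def qrels_to_long(qrels: pd.DataFrame) -> pd.DataFrame:
    """Return one (query_id, article_id, label) row per candidate; label 1 = positive."""
    return pd.concat(
        [_explode(qrels, "positive_ids", 1), _explode(qrels, "negative_ids", 0)],
        ignore_index=True,
    )


# Read every column as a string, matching the object (String) types in the schema.
qrels = pd.read_csv(io.StringIO(SAMPLE_QRELS), dtype=str)
long_qrels = qrels_to_long(qrels)
print(long_qrels)
```

Keeping the IDs as strings at this stage avoids silently changing them; they can be cast to int64 later when joining with articles.csv.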
Source
The base dataset is provided to the public by H&M Group through the Kaggle platform for analysis and research purposes. We have added search queries over the base dataset.
- Platform: Kaggle, H&M Personalized Fashion Recommendations
License
The use of this dataset is subject to the terms and conditions stated on its original distribution page. This dataset is intended for non-commercial and research purposes.
Dataset Transparency Report
Verified data manifest for traceability and transparency.
Identity & Source
- id: hf-dataset--rajeev-gupta--hnm-search-data
- source: huggingface
- author: Rajeev Gupta
- tags: task_categories:text-ranking, task_categories:text-retrieval, task_categories:text-classification, language:en, size_categories:10m, format:csv, modality:image, modality:tabular, modality:text, library:datasets, library:pandas, library:polars, library:mlcroissant, region:us, fashion, e-commerce, customer-behavior, tabular, recommendation-systems, search, ranking
Engagement & Metrics
- likes: 0
- downloads: 116,277