AI Papers Catalog
Browse and discover the latest AI research papers
arxiv--2511.04831v1
NVIDIA
We present Isaac Lab, the natural successor to Isaac Gym, which extends the paradigm of GPU-native robotics simulation into the era of large-scale multi-modal …
arxiv--2511.00088v2
NVIDIA
End-to-end architectures trained via imitation learning have advanced autonomous driving by scaling model size and data, yet performance remains brittle in …
arxiv--2509.25149v1
NVIDIA
Large Language Models (LLMs) today are powerful problem solvers across many domains, and they continue to get stronger as they scale in model size, training set …
arxiv--2512.20856v1
NVIDIA
We introduce the Nemotron 3 family of models - Nano, Super, and Ultra. These models deliver strong agentic, reasoning, and conversational capabilities. The Nemotron …
arxiv--2512.20848v1
NVIDIA
We present Nemotron 3 Nano 30B-A3B, a Mixture-of-Experts hybrid Mamba-Transformer language model. Nemotron 3 Nano was pretrained on 25 trillion text tokens, including …
arxiv--2511.03929v2
NVIDIA
We introduce Nemotron Nano V2 VL, the latest model in the Nemotron vision-language series, designed for strong real-world document understanding, long video comprehension …
arxiv--2511.00062v1
NVIDIA
We introduce Cosmos-Predict2.5, the latest generation of the Cosmos World Foundation Models for Physical AI. Built on a flow-based architecture, Cosmos-Predict2.5 …
arxiv--2512.09773v1
Romain Mussard
We introduce Stylized Meta-Album (SMA), a new image classification meta-dataset comprising 24 datasets (12 content datasets and 12 stylized datasets), designed …
arxiv--2511.17158v1
Caroline Malhaire
Objectives: To evaluate the association between pretreatment MRI descriptors and breast cancer (BC) pathological complete response (pCR) to neoadjuvant chemotherapy …
hf-dataset--mlfoundations--mint-1t-arxiv
mlfoundations
Dataset card metadata: license: cc-by-4.0; task categories: image-to-text, text-generation; language: en; tags: multimodal; pretty name: MINT-1T; size category: 100B …
arxiv--2601.03798v1
Taisiia Tikhomirova
Understanding where transformer language models encode psychologically meaningful aspects of meaning is essential for both theory and practice. We conduct a systematic …
arxiv--2512.22106v1
Zubair Shah
Neural network pruning is widely used to reduce model size and computational cost. Yet, most existing methods treat sparsity as an externally imposed constraint …
arxiv--2512.21506v1
Aiwei Zhang
As wearable sensing becomes increasingly pervasive, a key challenge remains: how can we generate natural language summaries from raw physiological signals such as …
arxiv--2512.20082v2
Chaithra
Financial sentiment analysis plays a crucial role in informing investment decisions, assessing market risk, and predicting stock price trends. Existing works in …
arxiv--2512.17066v1
Suhaib Abdurahman
Human conflict is often attributed to threats against material conditions and symbolic values, yet it remains unclear how they interact and which dominates. …
arxiv--2512.16891v1
Haichao Zhang
Video Large Language Models (VLLMs) unlock world-knowledge-aware video understanding through pretraining on internet-scale data and have already shown promise …
arxiv--2512.13478v6
Kei Saito
Current AI systems exhibit a fundamental limitation: they resolve ambiguity prematurely. This premature semantic collapse, collapsing multiple valid interpretations …
arxiv--2512.10443v1
Sabtain Ahmad
Clustered Federated Learning (CFL) has emerged as a powerful approach for addressing data heterogeneity and ensuring privacy in large distributed IoT environments …
arxiv--2512.09831v1
Chainarong Amornbunchornvej
This paper develops a geometric framework for modeling belief, motivation, and influence across cognitively heterogeneous agents. Each agent is represented by a …
arxiv--2512.07801v3
Raunak Jain
LLM-based agents are increasingly deployed for expert decision support, yet human-AI teams in high-stakes settings do not yet reliably outperform the best individual …
arxiv--2511.22181v1
Maitrayee Keskar
We present a method for trajectory planning for autonomous driving, learning image-based context embeddings that align with motion prediction frameworks and …
arxiv--2511.16964v1
Kirill Nagaitsev
Maximizing performance on available GPU hardware is an ongoing challenge for modern AI inference systems. Traditional approaches include writing custom GPU kernels …
arxiv--2511.14098v1
Adit Jain
In this paper, we model and analyze how a network of interacting LLMs performs collaborative question-answering (CQA) in order to estimate a ground truth given …
arxiv--2511.12529v1
Sanchaita Hazra
Large Language Models have seen expanding application across domains, yet their effectiveness as assistive tools for scientific writing -- an endeavor requiring …