Knowledge Retrieval & Data Analysis
5,392 Entities Indexed
Find embedding models, RAG solutions, and AI for document analysis, Q&A systems, and data insights.
Top Picks
hf-dataset--g4kmu--t2-ragbench
by G4KMU
hf-model--gh-model--andrew-jang--raghub
by Unknown
hf-model--gh-model--applied-machine-learning-lab--awesome-personalized-rag-agent
by Unknown
hf-model--gh-model--appvision-ai--fast-bert
by Unknown
hf-model--gh-model--athina-ai--rag-cookbooks
by Unknown
---
license: cc-by-4.0
pretty_name: T2-RAGBench
task_categories:
  - table-question-answering
  - question-answering
language:
  - en
tags:
  - pdf
  - finance
  - rag
arxiv: 2506.12071
configs:
  - config_name: FinQA
    data_files:
      - split: train
        path: data/FinQA/train/*
      - split: dev
        path: data/FinQA/dev/*
      - split: test
        path: data/FinQA/test/*
  - config_name: ConvFinQA
    data_files:
      - split: turn_0
        path: data/ConvFinQA/*
  - config_name: TAT-DQA
    data_files:
      - split: train
        path: data/TAT-DQA/train/*
      - split: dev
        p...
Welcome to **RAGHub**, a living collection of **new and emerging frameworks, projects, and resources** in the **Retrieval-Augmented Generation (RAG)** ecosystem. This is a **community-driven project for r/RAG**, where we aim to catalog the rapid growth of RAG tools and projects that are pushing the boundaries of the field. Each day, it feels like a new tool or framework emerges, and choosing the right one is becoming more of an art than a science. Is the framework from three months ago still ...
Awesome-Personalized-RAG-Agent This is the official repository of the paper "*A Survey of Personalization: From RAG to Agent*", ar...
Supports Python 3.6 and 3.7. **New: Learning Rate Finder for text classification training.** **Supports the LAMB optimizer for faster training** (see https://arxiv.org/abs/1904.00962 for the LAMB paper). **Supports BERT and XLNet for both multi-class and multi-label text classification.** Fast-Bert is a deep learning library that allows developers and data scientists to train and deploy BERT- and XLNet-based mo...
>If you find this repository helpful, please consider giving it a star⭐️ Welcome to the comprehensive collection of advanced + agentic Retrieval-Augmented Generation (RAG) techniques. RAG is a popular method that improves accuracy and relevance by finding the right information from reliable sources and transforming it into useful answers. This repository covers the most effective advanced + agentic RAG techniques with clear implementations and explanations. The mai...
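The retrieve-then-generate idea these cookbooks build on can be sketched with a toy bag-of-words retriever. Everything below is hypothetical and illustrative; real RAG stacks use dense embeddings and an LLM for the generation step:

```python
from collections import Counter
import math

def cosine(a: Counter, b: Counter) -> float:
    # Cosine similarity between two bag-of-words vectors.
    dot = sum(a[t] * b[t] for t in a)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

def retrieve(query: str, docs: list[str], k: int = 2) -> list[str]:
    # Rank documents by similarity to the query and keep the top k.
    q = Counter(query.lower().split())
    ranked = sorted(docs, key=lambda d: cosine(q, Counter(d.lower().split())),
                    reverse=True)
    return ranked[:k]

docs = [
    "RAG retrieves supporting passages before generation.",
    "BERT is a pretrained language model.",
    "Vector databases store dense embeddings for search.",
]
context = retrieve("how does RAG retrieval work", docs, k=1)
print(context[0])  # → "RAG retrieves supporting passages before generation."
```

The retrieved passages would then be handed to a generator model; the "advanced + agentic" techniques in the repository mostly vary this retrieval step (reranking, query rewriting, multi-hop retrieval).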
**Updates on 2022.11.07** - We are releasing the expanded text dataset (v2022.3Q) used to train KcELECTRA v2022. - It grows from 11GB to 45GB and from 90M to 340M documents, roughly a 4x increase over the v1 dataset. **Updates on 2022.10.08** - The KcELECTRA-base-v2022 (formerly "dev") model has been renamed. - It improves on the previous KcELECTRA-base (v2021) by about 1%p on most downstream tasks. **Updates on 2022.09.14** - The preprocessing code was partially changed to follow the emoji v2.0.0 update. **Updates on 2021.04.07** - KcELECTRA was relea...
An end-to-end closed-domain question answering system, built on top of the HuggingFace transformers library. **⛔ [NOT MAINTAINED] This repository is no longer maintained, but is kept around for educational purposes.** If you want a maintained alternative to cdQA, check out: If you are interested in understanding how the system works and its implementation, we wrote an article on Medium with a high-level explanation. We also made a presentati...
A curated resource map of tools, frameworks, techniques, and learning materials for building Retrieval-Augmented Generation (RAG) systems. This repository catalogs the RAG ecosystem and provides links to authoritative sources, tutorials, and implementations to help you explore and build RAG applications. **Retrieval-Augmented Generation (RAG)** is a sophisticated technique in Generative AI that enhances Large Language Models (LLMs) by dynamically retrieving and incorporating relevant context ...
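The "retrieving and incorporating relevant context" step usually ends with the retrieved passages being placed into the model's prompt. A minimal, hypothetical template (the function name and wording are illustrative, not from any particular framework) might look like:

```python
def build_prompt(question: str, passages: list[str]) -> str:
    # Number each retrieved passage so the model can cite it.
    context = "\n\n".join(f"[{i + 1}] {p}" for i, p in enumerate(passages))
    return (
        "Answer the question using only the context below.\n\n"
        f"Context:\n{context}\n\n"
        f"Question: {question}\n"
        "Answer:"
    )

prompt = build_prompt("What is RAG?", ["Doc one.", "Doc two."])
print(prompt)
```

The resulting string is what gets sent to the LLM; grounding the answer in numbered passages also makes citation and attribution straightforward.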
In this repository the MDZ Digital Library team (dbmdz) at the Bavarian State Library open sources more BERT models 🎉 * 13.12.2021: Public release of Historic Language Model for Dutch. * 06.12.2021: Public release of smaller multilingual Historic Language Models. * 18.11.2021: Public release of multilingual and monolingual Historic Language Models. * 24.09.2021: Public release of cased/uncased Turkish ELECTRA and ConvBERT models, trained on mC4 corpus. * 17.08.2021: Public release of re-t...
A curated list of AI-powered web search software, focusing on the intersection of Large Language Models (LLMs) and web search capabilities. The list is organized as a timeline, containing software that support the following use cases: - Web search with LLM summarization and follow-up questions. - LLM chat with web search support via function calling. - Agent-driven web research for generation of reports. | Initial Commit | Software | Preview | | -----------------------------------------------...
⚡️ BGE: One-Stop Retrieval Toolkit For Search and RAG
A simple, easy-to-hack GraphRAG implementation (Python >=3.9.11).
No description available.
BERT, released by Google in 2018, is now used across virtually all natural language processing tasks thanks to its high performance and convenience. BERT is a **pretrained language model**: by pre-training on large amounts of text, it acquires linguistic knowledge that transfers to a wide variety of tasks. Transferring this knowledge has been shown to yield very high performance with far less training data than before. To apply BERT to a task such as text classification, we fine-tune it. For example, for classifying whether a movie review is positive or negative, fine-tuning proceeds as follows: 1. Feed the review text into the pretrained BERT. 2. Use BERT's output to classify the review as positive or negative. 3. Compute the classification loss and backpropagate it through BERT...
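The three fine-tuning steps above can be illustrated with a toy stand-in: fixed vectors play the role of BERT's encoder output, and a single logistic layer acts as the classifier. This is purely illustrative (a real setup would use the transformers and torch libraries and would also update BERT's own weights):

```python
import math
import random

random.seed(0)

# Step 1 stand-in: pretend these 4-dim vectors are BERT's [CLS] outputs
# for one positive and one negative review (in reality they come from the encoder).
data = [([1.0, 0.8, -0.2, 0.1], 1),    # positive review
        ([-0.9, -0.7, 0.3, 0.0], 0)]   # negative review

w = [random.uniform(-0.1, 0.1) for _ in range(4)]
b = 0.0
lr = 0.5

def forward(x):
    # Step 2: classify from the encoder output with a logistic layer.
    z = sum(wi * xi for wi, xi in zip(w, x)) + b
    return 1 / (1 + math.exp(-z))

for _ in range(200):
    for x, y in data:
        p = forward(x)
        # Step 3: cross-entropy loss gradient, propagated back to the weights
        # (full fine-tuning would also flow this gradient into BERT itself).
        g = p - y
        for i in range(4):
            w[i] -= lr * g * x[i]
        b -= lr * g

print([round(forward(x)) for x, _ in data])  # → [1, 0]
```

After training, the positive example scores near 1 and the negative near 0; fine-tuning BERT works the same way, just with the encoder's millions of parameters included in step 3.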
This package leverages the power of the 🤗Tokenizers library (built with Rust) to process the input text. It then uses TensorFlow.js to run the DistilBERT-cased model fine-tuned for Question Answering (87.1 F1 score on SQuAD v1.1 dev set, compared to 88.7 for BERT-base-cased). DistilBERT is used by default, but you can use other models available in the 🤗Transformers library in one additional line of code! It can run models in SavedModel and TFJS formats locally, as well as remote models than...
A blazing fast inference solution for text embeddings models. Benchmark for BAAI/bge-base-en-v1.5 on an NVIDI...
No description available.
RobBERT is the state-of-the-art Dutch BERT model. It is a large pre-trained general Dutch language model that can be fine-tuned on a given dataset to perform any text classification, regression or token-tagging task. As such, it has been successfully used by many researchers and practitioners for achieving state-of-the-art pe...
BertViz Visualize Attention in Transformer Models Quick Tour • Getting Started • Colab Tutorial • Paper BertViz is an interactive tool for visualizing attention in Transformer language models. It can be run inside ...
No description available.
Your LLMs deserve better input. Reader does two things: - **Read**: it converts any URL into an **LLM-friendly** input, giving improved output for your agent and RAG systems at no cost. - **Search**: it searches the web for a given query, so your LLMs can access the latest world knowledge from the web. Check out the live demo, or just visit the **Read** and **Search** URLs directly.
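Assuming Reader's publicly documented endpoints (prefix a page URL with r.jina.ai to read it; send a query to s.jina.ai to search), the request URLs can be built like this:

```python
from urllib.parse import quote

def read_url(target: str) -> str:
    # Read: prefix the page URL with the r.jina.ai endpoint.
    return "https://r.jina.ai/" + target

def search_url(query: str) -> str:
    # Search: URL-encode the query and send it to the s.jina.ai endpoint.
    return "https://s.jina.ai/" + quote(query)

print(read_url("https://example.com/post"))
print(search_url("retrieval augmented generation"))
```

Fetching either URL with any HTTP client returns LLM-friendly text instead of raw HTML.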
> "A playful and whimsical vector art of a Stochastic Tigger, wearing a t-shirt with a "GPT" text printed logo, surrounded by colorful geometric shapes. –ar 1:1 –upbeta" > > — Prompts and logo art was produced with PromptPerfect & Stable...
💫 StarVector: Generating Scalable Vector Graphics Code from Images and Text