Knowledge Retrieval & Data Analysis
5,392 Entities Indexed
Find embedding models, RAG solutions, and AI for document analysis, Q&A systems, and data insights.
Top Picks
hf-dataset--g4kmu--t2-ragbench
by G4KMU
hf-model--gh-model--andrew-jang--raghub
by Unknown
hf-model--gh-model--applied-machine-learning-lab--awesome-personalized-rag-agent
by Unknown
hf-model--gh-model--appvision-ai--fast-bert
by Unknown
hf-model--gh-model--athina-ai--rag-cookbooks
by Unknown
---
license: cc-by-4.0
pretty_name: T2-RAGBench
task_categories:
  - table-question-answering
  - question-answering
language:
  - en
tags:
  - pdf
  - finance
  - rag
arxiv: 2506.12071
configs:
  - config_name: FinQA
    data_files:
      - split: train
        path: data/FinQA/train/*
      - split: dev
        path: data/FinQA/dev/*
      - split: test
        path: data/FinQA/test/*
  - config_name: ConvFinQA
    data_files:
      - split: turn_0
        path: data/ConvFinQA/*
  - config_name: TAT-DQA
    data_files:
      - split: train
        path: data/TAT-DQA/train/*
      - split: dev
        p...
Welcome to **RAGHub**, a living collection of **new and emerging frameworks, projects, and resources** in the **Retrieval-Augmented Generation (RAG)** ecosystem. This is a **community-driven project for r/RAG**, where we aim to catalog the rapid growth of RAG tools and projects that are pushing the boundaries of the field. Each day, it feels like a new tool or framework emerges, and choosing the right one is becoming more of an art than a science. Is the framework from three months ago still ...
Awesome-Personalized-RAG-Agent This is the official repository of the paper "*A Survey of Personalization: From RAG to Agent*", ar...
Supports Python 3.6 and 3.7. **New: Learning Rate Finder for text classification training.** **Supports the LAMB optimizer for faster training** (see https://arxiv.org/abs/1904.00962 for the LAMB paper). **Supports BERT and XLNet for both multi-class and multi-label text classification.** Fast-Bert is a deep learning library that allows developers and data scientists to train and deploy BERT- and XLNet-based mo...
>If you find this repository helpful, please consider giving it a star⭐️ Welcome to the comprehensive collection of advanced + agentic Retrieval-Augmented Generation (RAG) techniques. RAG is a popular method that improves accuracy and relevance by finding the right information from reliable sources and transforming it into useful answers. This repository covers the most effective advanced + agentic RAG techniques with clear implementations and explanations. The mai...
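The retrieve-then-generate idea these cookbooks build on can be sketched with a toy bag-of-words retriever. Everything below is hypothetical and illustrative; real RAG stacks use dense embeddings and an LLM for the generation step:

```python
from collections import Counter
import math

def cosine(a: Counter, b: Counter) -> float:
    # Cosine similarity between two bag-of-words vectors.
    dot = sum(a[t] * b[t] for t in a)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

def retrieve(query: str, docs: list[str], k: int = 2) -> list[str]:
    # Rank documents by similarity to the query and keep the top k.
    q = Counter(query.lower().split())
    ranked = sorted(docs, key=lambda d: cosine(q, Counter(d.lower().split())),
                    reverse=True)
    return ranked[:k]

docs = [
    "RAG retrieves supporting passages before generation.",
    "BERT is a pretrained language model.",
    "Vector databases store dense embeddings for search.",
]
context = retrieve("how does RAG retrieval work", docs, k=1)
print(context[0])  # → "RAG retrieves supporting passages before generation."
```

The retrieved passages would then be handed to a generator model; the "advanced + agentic" techniques in the repository mostly vary this retrieval step (reranking, query rewriting, multi-hop retrieval).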
**Updates on 2022.11.07** - We are releasing the expanded text dataset (v2022.3Q) used to train KcELECTRA v2022. - It grows from 11GB to 45GB and from 90M to 340M documents, roughly a 4x increase over the v1 dataset. **Updates on 2022.10.08** - The KcELECTRA-base-v2022 (formerly "dev") model has been renamed. - It improves on the previous KcELECTRA-base (v2021) by about 1%p on most downstream tasks. **Updates on 2022.09.14** - The preprocessing code was partially changed to follow the emoji v2.0.0 update. **Updates on 2021.04.07** - KcELECTRA was relea...
An end-to-end closed-domain question answering system, built on top of the HuggingFace transformers library. **⛔ [NOT MAINTAINED] This repository is no longer maintained, but is kept around for educational purposes.** If you want a maintained alternative to cdQA, check out: If you are interested in understanding how the system works and its implementation, we wrote an article on Medium with a high-level explanation. We also made a presentati...
A curated resource map of tools, frameworks, techniques, and learning materials for building Retrieval-Augmented Generation (RAG) systems. This repository catalogs the RAG ecosystem and provides links to authoritative sources, tutorials, and implementations to help you explore and build RAG applications. **Retrieval-Augmented Generation (RAG)** is a sophisticated technique in Generative AI that enhances Large Language Models (LLMs) by dynamically retrieving and incorporating relevant context ...
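The "retrieving and incorporating relevant context" step usually ends with the retrieved passages being placed into the model's prompt. A minimal, hypothetical template (the function name and wording are illustrative, not from any particular framework) might look like:

```python
def build_prompt(question: str, passages: list[str]) -> str:
    # Number each retrieved passage so the model can cite it.
    context = "\n\n".join(f"[{i + 1}] {p}" for i, p in enumerate(passages))
    return (
        "Answer the question using only the context below.\n\n"
        f"Context:\n{context}\n\n"
        f"Question: {question}\n"
        "Answer:"
    )

prompt = build_prompt("What is RAG?", ["Doc one.", "Doc two."])
print(prompt)
```

The resulting string is what gets sent to the LLM; grounding the answer in numbered passages also makes citation and attribution straightforward.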
In this repository the MDZ Digital Library team (dbmdz) at the Bavarian State Library open sources more BERT models 🎉 * 13.12.2021: Public release of Historic Language Model for Dutch. * 06.12.2021: Public release of smaller multilingual Historic Language Models. * 18.11.2021: Public release of multilingual and monolingual Historic Language Models. * 24.09.2021: Public release of cased/uncased Turkish ELECTRA and ConvBERT models, trained on mC4 corpus. * 17.08.2021: Public release of re-t...
A curated list of AI-powered web search software, focusing on the intersection of Large Language Models (LLMs) and web search capabilities. The list is organized as a timeline, containing software that support the following use cases: - Web search with LLM summarization and follow-up questions. - LLM chat with web search support via function calling. - Agent-driven web research for generation of reports. | Initial Commit | Software | Preview | | -----------------------------------------------...
⚡️ BGE: One-Stop Retrieval Toolkit For Search and RAG
A simple, easy-to-hack GraphRAG implementation (Python >=3.9.11).
No description available.
BERT, released by Google in 2018, is now used across virtually all natural language processing tasks thanks to its high performance and convenience. BERT is a **pretrained language model**: by pre-training on large amounts of text, it acquires linguistic knowledge that transfers to a wide variety of tasks. Transferring this knowledge has been shown to yield very high performance with far less training data than before. To apply BERT to a task such as text classification, we fine-tune it. For example, for classifying whether a movie review is positive or negative, fine-tuning proceeds as follows: 1. Feed the review text into the pretrained BERT. 2. Use BERT's output to classify the review as positive or negative. 3. Compute the classification loss and backpropagate it through BERT...
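The three fine-tuning steps above can be illustrated with a toy stand-in: fixed vectors play the role of BERT's encoder output, and a single logistic layer acts as the classifier. This is purely illustrative (a real setup would use the transformers and torch libraries and would also update BERT's own weights):

```python
import math
import random

random.seed(0)

# Step 1 stand-in: pretend these 4-dim vectors are BERT's [CLS] outputs
# for one positive and one negative review (in reality they come from the encoder).
data = [([1.0, 0.8, -0.2, 0.1], 1),    # positive review
        ([-0.9, -0.7, 0.3, 0.0], 0)]   # negative review

w = [random.uniform(-0.1, 0.1) for _ in range(4)]
b = 0.0
lr = 0.5

def forward(x):
    # Step 2: classify from the encoder output with a logistic layer.
    z = sum(wi * xi for wi, xi in zip(w, x)) + b
    return 1 / (1 + math.exp(-z))

for _ in range(200):
    for x, y in data:
        p = forward(x)
        # Step 3: cross-entropy loss gradient, propagated back to the weights
        # (full fine-tuning would also flow this gradient into BERT itself).
        g = p - y
        for i in range(4):
            w[i] -= lr * g * x[i]
        b -= lr * g

print([round(forward(x)) for x, _ in data])  # → [1, 0]
```

After training, the positive example scores near 1 and the negative near 0; fine-tuning BERT works the same way, just with the encoder's millions of parameters included in step 3.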
This package leverages the power of the 🤗Tokenizers library (built with Rust) to process the input text. It then uses TensorFlow.js to run the DistilBERT-cased model fine-tuned for Question Answering (87.1 F1 score on SQuAD v1.1 dev set, compared to 88.7 for BERT-base-cased). DistilBERT is used by default, but you can use other models available in the 🤗Transformers library in one additional line of code! It can run models in SavedModel and TFJS formats locally, as well as remote models than...
A blazing fast inference solution for text embeddings models. Benchmark for BAAI/bge-base-en-v1.5 on an NVIDI...
No description available.
RobBERT is the state-of-the-art Dutch BERT model. It is a large pre-trained general Dutch language model that can be fine-tuned on a given dataset to perform any text classification, regression or token-tagging task. As such, it has been successfully used by many researchers and practitioners for achieving state-of-the-art pe...
BertViz Visualize Attention in Transformer Models Quick Tour • Getting Started • Colab Tutorial • Paper BertViz is an interactive tool for visualizing attention in Transformer language models. It can be run inside ...
No description available.
Your LLMs deserve better input. Reader does two things: - **Read**: it converts any URL into an **LLM-friendly** input, giving improved output for your agent and RAG systems at no cost. - **Search**: it searches the web for a given query, so your LLMs can access the latest world knowledge from the web. Check out the live demo, or just visit the **Read** and **Search** URLs directly.
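Assuming Reader's publicly documented endpoints (prefix a page URL with r.jina.ai to read it; send a query to s.jina.ai to search), the request URLs can be built like this:

```python
from urllib.parse import quote

def read_url(target: str) -> str:
    # Read: prefix the page URL with the r.jina.ai endpoint.
    return "https://r.jina.ai/" + target

def search_url(query: str) -> str:
    # Search: URL-encode the query and send it to the s.jina.ai endpoint.
    return "https://s.jina.ai/" + quote(query)

print(read_url("https://example.com/post"))
print(search_url("retrieval augmented generation"))
```

Fetching either URL with any HTTP client returns LLM-friendly text instead of raw HTML.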
> "A playful and whimsical vector art of a Stochastic Tigger, wearing a t-shirt with a "GPT" text printed logo, surrounded by colorful geometric shapes. –ar 1:1 –upbeta" > > — Prompts and logo art was produced with PromptPerfect & Stable...
💫 StarVector: Generating Scalable Vector Graphics Code from Images and Text