arXiv.org: Latest Entries and Ratings - Hatena Bookmark

Seven Failure Points When Engineering a Retrieval Augmented Generation System
3 users
arxiv.org

Software engineers are increasingly adding semantic search capabilities to applications using a strategy known as Retrieval Augmented Generation (RAG). A RAG system involves finding documents that semantically match a query and then passing the documents to a large language model (LLM) such as ChatGPT to extract the right answer. RAG systems aim to: a) reduce the problem of hallucinat
- Technology
- 2024/05/17 11:50
- Read later
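The retrieve-then-generate loop described above can be sketched in a few lines. This is a minimal illustration under two assumptions. First, a toy bag-of-words cosine similarity stands in for a learned embedding model. Second, `retrieve` and `build_prompt` are hypothetical helpers, and the actual LLM call is omitted.

```python
import math
from collections import Counter


def embed(text: str) -> Counter:
    # Toy "embedding": lowercase bag-of-words counts (stand-in for a real embedding model).
    return Counter(text.lower().split())


def cosine(a: Counter, b: Counter) -> float:
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def retrieve(query: str, docs: list[str], k: int = 2) -> list[str]:
    # Rank documents by similarity to the query and keep the top k.
    q = embed(query)
    return sorted(docs, key=lambda d: cosine(q, embed(d)), reverse=True)[:k]


def build_prompt(query: str, context: list[str]) -> str:
    # Hypothetical prompt template: the retrieved passages ground the LLM's answer.
    joined = "\n".join(f"- {c}" for c in context)
    return f"Answer using only the context below.\nContext:\n{joined}\nQuestion: {query}"


docs = [
    "the capital of france is paris",
    "bananas are rich in potassium",
    "paris hosts the louvre museum",
]
top = retrieve("what is the capital of france", docs, k=1)
prompt = build_prompt("what is the capital of france", top)
```

The retrieval step is one place such a system can go wrong, for example when the relevant document is not ranked within the top k.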

Evaluation of Retrieval-Augmented Generation: A Survey
2 users
arxiv.org

Retrieval-Augmented Generation (RAG) has emerged as a pivotal innovation in natural language processing, enhancing generative models by incorporating external information retrieval. Evaluating RAG systems, however, poses distinct challenges due to their hybrid structure and reliance on dynamic knowledge sources. We consequently conducted an extensive survey and proposed an analysis framework for be
- Learning
- 2024/05/15 08:07
- Read later
https://arxiv.org/pdf/2312.17149
2 users
arxiv.org
- Learning
- 2024/05/08 14:18
Better & Faster Large Language Models via Multi-token Prediction
2 users
arxiv.org

Large language models such as GPT and Llama are trained with a next-token prediction loss. In this work, we suggest that training language models to predict multiple future tokens at once results in higher sample efficiency. More specifically, at each position in the training corpus, we ask the model to predict the following n tokens using n independent output heads, operating on top of a shared m
- Learning
- 2024/05/01 20:25
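The training signal described above has each position predict the next n tokens through n independent output heads. It can be sketched as a loss computation. This minimal pure-Python sketch assumes that each head already outputs a probability distribution over a toy vocabulary. It does not model the shared trunk, and the names `multi_token_targets` and `multi_token_loss` are hypothetical.

```python
import math


def multi_token_targets(tokens: list[int], n: int) -> list[list[int]]:
    # For each position t, the targets are tokens t+1 .. t+n (positions with a full window only).
    return [tokens[t + 1 : t + 1 + n] for t in range(len(tokens) - n)]


def multi_token_loss(head_probs: list[list[list[float]]], targets: list[list[int]]) -> float:
    # head_probs[t][i] is head i's distribution at position t; head i predicts offset i+1.
    # Sum the cross-entropy of every head, then average over positions.
    total = 0.0
    for probs_t, targets_t in zip(head_probs, targets):
        total += sum(-math.log(probs_t[i][tok]) for i, tok in enumerate(targets_t))
    return total / len(targets)


VOCAB = 5


def peaked(correct: int) -> list[float]:
    # Toy head output: probability 0.5 on the correct token, the rest spread evenly.
    return [0.5 if v == correct else 0.125 for v in range(VOCAB)]


tokens = [0, 1, 2, 3, 4]
targets = multi_token_targets(tokens, n=2)
head_probs = [[peaked(tok) for tok in tgt] for tgt in targets]
loss = multi_token_loss(head_probs, targets)
```

With `n=1` this reduces to the ordinary next-token cross-entropy.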
KAN: Kolmogorov-Arnold Networks
11 users
arxiv.org

Inspired by the Kolmogorov-Arnold representation theorem, we propose Kolmogorov-Arnold Networks (KANs) as promising alternatives to Multi-Layer Perceptrons (MLPs). While MLPs have fixed activation functions on nodes ("neurons"), KANs have learnable activation functions on edges ("weights"). KANs have no linear weights at all -- every weight parameter is replaced by a univariate function parametriz
- Technology
- 2024/05/01 16:37
- Machine learning
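The core idea can be sketched as a single layer: learnable univariate functions sit on the edges, and each node simply sums them. This is a simplified illustration, not the paper's implementation. The paper parametrizes edge functions as splines; here each edge function is a piecewise-linear interpolation over a fixed grid, which is a swapped-in, simpler parametrization. `KANLayer` is a hypothetical name.

```python
from bisect import bisect_right


class KANLayer:
    """One KAN-style layer: output_j = sum_i phi_{j,i}(x_i), each phi a learnable 1-D function."""

    def __init__(self, grid: list[float], coeffs: list[list[list[float]]]):
        # grid: sorted knot positions shared by all edges.
        # coeffs[j][i][k]: value of edge function phi_{j,i} at grid[k] (the learnable parameters).
        self.grid = grid
        self.coeffs = coeffs

    def _phi(self, values: list[float], x: float) -> float:
        # Piecewise-linear interpolation between knots, clamped at the grid ends.
        g = self.grid
        if x <= g[0]:
            return values[0]
        if x >= g[-1]:
            return values[-1]
        k = bisect_right(g, x) - 1
        t = (x - g[k]) / (g[k + 1] - g[k])
        return (1 - t) * values[k] + t * values[k + 1]

    def __call__(self, x: list[float]) -> list[float]:
        # No linear weight matrix: every edge applies its own function, and nodes just sum.
        return [sum(self._phi(edge, xi) for edge, xi in zip(row, x)) for row in self.coeffs]


grid = [-1.0, 0.0, 1.0]
identity = [-1.0, 0.0, 1.0]  # phi(x) = x on [-1, 1]
square = [1.0, 0.0, 1.0]     # coarse piecewise-linear approximation of x**2
layer = KANLayer(grid, [[identity, identity], [square, identity]])
out = layer([-0.5, 0.25])
```

Training would adjust the entries of `coeffs`, that is, the shape of each edge function, rather than scalar weights.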
Mills' constant is irrational
2 users
arxiv.org

Let $\lfloor x\rfloor$ denote the integer part of $x$. In 1947, Mills constructed a real number $\xi$ greater than $1$ such that $\lfloor \xi^{3^k} \rfloor$ is always a prime number for every positive integer $k$. We define Mills' constant as the smallest real number $\xi$ satisfying this property. In this article, we determine that Mills' constant is irrational. Moreover, we also obtain partial r
- Learning
- 2024/05/01 15:58
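The defining property can be checked numerically for the first few k. This sketch assumes the commonly quoted decimal value of Mills' constant, 1.3063778838630806904686144926..., which is known to be the smallest such constant only if the Riemann hypothesis holds. The truncated value is accurate enough for k ≤ 3, but not for much larger k.

```python
from decimal import Decimal, getcontext

getcontext().prec = 60  # ample working precision for the small exponents used here

# Truncated decimal expansion of Mills' constant (value conditional on the Riemann hypothesis).
MILLS = Decimal("1.3063778838630806904686144926")


def is_prime(n: int) -> bool:
    # Trial division; adequate for the small numbers produced below.
    if n < 2:
        return False
    d = 2
    while d * d <= n:
        if n % d == 0:
            return False
        d += 1
    return True


# floor(xi ** (3 ** k)) for k = 1, 2, 3; int() truncates, which equals floor for positive values.
values = [int(MILLS ** (3 ** k)) for k in range(1, 4)]
all_prime = all(is_prime(v) for v in values)
```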
Building a Large Japanese Web Corpus for Large Language Models
3 users
arxiv.org

Open Japanese large language models (LLMs) have been trained on the Japanese portions of corpora such as CC-100, mC4, and OSCAR. However, these corpora were not created for the quality of Japanese texts. This study builds a large Japanese web corpus by extracting and refining text from the Common Crawl archive (21 snapshots of approximately 63.4 billion pages crawled between 2020 and 2023). This c
- Learning
- 2024/04/30 17:29
From LLM to NMT: Advancing Low-Resource Machine Translation with Claude
2 users
arxiv.org

We show that Claude 3 Opus, a large language model (LLM) released by Anthropic in March 2024, exhibits stronger machine translation competence than other LLMs. Though we find evidence of data contamination with Claude on FLORES-200, we curate new benchmarks that corroborate the effectiveness of Claude for low-resource machine translation into English. We find that Claude has remarkable res
- Technology
- 2024/04/23 21:56
The Instruction Hierarchy: Training LLMs to Prioritize Privileged Instructions
2 users
arxiv.org

Today's LLMs are susceptible to prompt injections, jailbreaks, and other attacks that allow adversaries to overwrite a model's original instructions with their own malicious prompts. In this work, we argue that one of the primary vulnerabilities underlying these attacks is that LLMs often consider system prompts (e.g., text from an application developer) to be the same priority as text from untrus
- Technology
- 2024/04/23 17:34
- Read later
http://arxiv.org/pdf/2309.04188
2 users
arxiv.org
- Learning
- 2024/04/23 15:47
- Read later
Phi-3 Technical Report: A Highly Capable Language Model Locally on Your Phone
8 users
arxiv.org

We introduce phi-3-mini, a 3.8 billion parameter language model trained on 3.3 trillion tokens, whose overall performance, as measured by both academic benchmarks and internal testing, rivals that of models such as Mixtral 8x7B and GPT-3.5 (e.g., phi-3-mini achieves 69% on MMLU and 8.38 on MT-bench), despite being small enough to be deployed on a phone. The innovation lies entirely in our dataset
- Technology
- 2024/04/23 11:46
- Read later
A Survey on Retrieval-Augmented Text Generation for Large Language Models
4 users
arxiv.org

Retrieval-Augmented Generation (RAG) merges retrieval methods with deep learning advancements to address the static limitations of large language models (LLMs) by enabling the dynamic integration of up-to-date external information. This methodology, focusing primarily on the text domain, provides a cost-effective solution to the generation of plausible but incorrect responses by LLMs, thereby enha
- Learning
- 2024/04/18 20:02
Leave No Context Behind: Efficient Infinite Context Transformers with Infini-attention
2 users
arxiv.org

This work introduces an efficient method to scale Transformer-based Large Language Models (LLMs) to infinitely long inputs with bounded memory and computation. A key component in our proposed approach is a new attention technique dubbed Infini-attention. The Infini-attention incorporates a compressive memory into the vanilla attention mechanism and builds in both masked local attention and long-te
- Technology
- 2024/04/13 01:46
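The compressive memory works as a linear-attention-style associative store. Below is a minimal pure-Python sketch of one memory update and one retrieval for a single head. It assumes the ELU+1 feature map and the update and retrieval rules as we read them in the Infini-attention paper:

- M <- M + sigma(K)^T V
- z <- z + sum sigma(K)
- A_mem = sigma(Q) M / (sigma(Q) z)

The local attention path and the learned gate that mixes the two paths are omitted. All names are hypothetical.

```python
import math


def elu_plus_one(x: float) -> float:
    # sigma(x) = ELU(x) + 1, which keeps features strictly positive.
    return x + 1.0 if x > 0 else math.exp(x)


def feature(vec: list[float]) -> list[float]:
    return [elu_plus_one(v) for v in vec]


class CompressiveMemory:
    """Fixed-size memory: a d_k x d_v matrix M plus a length-d_k normalization vector z."""

    def __init__(self, d_k: int, d_v: int):
        self.M = [[0.0] * d_v for _ in range(d_k)]
        self.z = [0.0] * d_k

    def update(self, keys: list[list[float]], values: list[list[float]]) -> None:
        # M <- M + sigma(K)^T V ;  z <- z + sum_t sigma(K_t)
        for k, v in zip(keys, values):
            sk = feature(k)
            for i, ski in enumerate(sk):
                self.z[i] += ski
                for j, vj in enumerate(v):
                    self.M[i][j] += ski * vj

    def retrieve(self, query: list[float]) -> list[float]:
        # A_mem = sigma(q) M / (sigma(q) . z); assumes at least one update has happened.
        sq = feature(query)
        denom = sum(a * b for a, b in zip(sq, self.z))
        d_v = len(self.M[0])
        return [sum(sq[i] * self.M[i][j] for i in range(len(sq))) / denom for j in range(d_v)]


mem = CompressiveMemory(d_k=3, d_v=2)
mem.update(keys=[[0.5, -1.0, 2.0]], values=[[4.0, -3.0]])
recalled = mem.retrieve([0.5, -1.0, 2.0])
```

The memory stays d_k x d_v no matter how many segments are written into it. This is what keeps memory and computation bounded as the input grows.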
Physics of Language Models: Part 3.3, Knowledge Capacity Scaling Laws
7 users
arxiv.org

Scaling laws describe the relationship between the size of language models and their capabilities. Unlike prior studies that evaluate a model's capability via loss or benchmarks, we estimate the number of knowledge bits a model stores. We focus on factual knowledge represented as tuples, such as (USA, capital, Washington D.C.) from a Wikipedia page. Through multiple controlled datasets, we establi
- Learning
- 2024/04/10 22:16
No "Zero-Shot" Without Exponential Data: Pretraining Concept Frequency Determines Multimodal Model Performance
2 users
arxiv.org

Web-crawled pretraining datasets underlie the impressive "zero-shot" evaluation performance of multimodal models, such as CLIP for classification/retrieval and Stable-Diffusion for image generation. However, it is unclear how meaningful the notion of "zero-shot" generalization is for such multimodal models, as it is not known to what extent their pretraining datasets encompass the downstream conce
- Learning
- 2024/04/09 11:39
ORPO: Monolithic Preference Optimization without Reference Model
2 users
arxiv.org

While recent preference alignment algorithms for language models have demonstrated promising results, supervised fine-tuning (SFT) remains imperative for achieving successful convergence. In this paper, we study the crucial role of SFT within the context of preference alignment, emphasizing that a minor penalty for the disfavored generation style is sufficient for preference-aligned SFT. Building
- Technology
- 2024/04/05 09:52
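Below is a minimal sketch of a loss in this spirit. It assumes the odds-ratio formulation from the ORPO paper, which is not visible in the truncated abstract above. The loss is the usual SFT negative log-likelihood on the preferred response plus lambda times -log sigmoid(log odds(y_w) - log odds(y_l)). Here odds(y) = P(y) / (1 - P(y)), and P(y) is the length-normalized sequence likelihood. No reference model appears anywhere. The default lambda = 0.1 and the function names are illustrative.

```python
import math


def seq_prob(token_logprobs: list[float]) -> float:
    # Length-normalized likelihood: exp(mean per-token log-probability).
    return math.exp(sum(token_logprobs) / len(token_logprobs))


def log_odds(p: float) -> float:
    return math.log(p / (1.0 - p))


def orpo_loss(chosen_logprobs: list[float], rejected_logprobs: list[float], lam: float = 0.1) -> float:
    # SFT term: negative mean log-likelihood of the chosen response.
    sft = -sum(chosen_logprobs) / len(chosen_logprobs)
    # Odds-ratio penalty: -log sigmoid(log odds(chosen) - log odds(rejected)).
    ratio = log_odds(seq_prob(chosen_logprobs)) - log_odds(seq_prob(rejected_logprobs))
    odds_ratio_term = -math.log(1.0 / (1.0 + math.exp(-ratio)))
    return sft + lam * odds_ratio_term


# Chosen response is likely (0.8 per token), rejected one unlikely (0.2 per token).
example = orpo_loss([math.log(0.8)] * 3, [math.log(0.2)] * 3)
```

The penalty term shrinks as the model favors the chosen response over the rejected one more strongly. The SFT term keeps ordinary likelihood training on the chosen response.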
ReALM: Reference Resolution As Language Modeling
10 users
arxiv.org

Reference resolution is an important problem, one that is essential to understand and successfully handle context of different kinds. This context includes both previous turns and context that pertains to non-conversational entities, such as entities on the user's screen or those running in the background. While LLMs have been shown to be extremely powerful for a variety of tasks, their use in ref
- Technology
- 2024/04/03 15:14
Language Models are Super Mario: Absorbing Abilities from Homologous Models as a Free Lunch
3 users
arxiv.org

In this paper, we unveil that Language Models (LMs) can acquire new capabilities by assimilating parameters from homologous models without retraining or GPUs. We first introduce DARE to set most delta parameters (i.e., the disparity between fine-tuned and pre-trained parameters) to zeros without affecting the abilities of Supervised Fine-Tuning (SFT) LMs, which randomly Drops delta parameters with
- Technology
- 2024/04/02 15:06
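DARE (Drop And REscale) can be sketched on a flat list of parameters. Each delta between fine-tuned and pre-trained weights is dropped with probability p. Surviving deltas are rescaled by 1/(1 - p), so the expected delta is unchanged. The rescaling step comes from the DARE paper; the truncated abstract above cuts off before it. The function name `dare` and the list-of-floats representation are assumptions made for illustration.

```python
import random


def dare(base: list[float], finetuned: list[float], p: float, seed: int = 0) -> list[float]:
    """Return base + a sparsified, rescaled delta, dropping each delta with probability p."""
    if not 0.0 <= p < 1.0:
        raise ValueError("drop rate p must be in [0, 1)")
    rng = random.Random(seed)
    merged = []
    for b, f in zip(base, finetuned):
        delta = f - b
        # Drop: zero the delta with probability p.  Rescale: divide survivors by (1 - p).
        kept = 0.0 if rng.random() < p else delta / (1.0 - p)
        merged.append(b + kept)
    return merged


base = [0.0, 1.0, 2.0, 3.0]
finetuned = [0.1, 1.2, 1.9, 3.4]
sparse = dare(base, finetuned, p=0.9, seed=42)
```

Because no retraining is involved, this runs on a CPU. Sparsified deltas from several fine-tuned models can then be added onto a shared base.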
Jamba: A Hybrid Transformer-Mamba Language Model
3 users
arxiv.org

We present Jamba, a new base large language model based on a novel hybrid Transformer-Mamba mixture-of-experts (MoE) architecture. Specifically, Jamba interleaves blocks of Transformer and Mamba layers, enjoying the benefits of both model families. MoE is added in some of these layers to increase model capacity while keeping active parameter usage manageable. This flexible architecture allows reso
- Technology
- 2024/04/01 22:38
JMultiWOZ: A Large-Scale Japanese Multi-Domain Task-Oriented Dialogue Dataset
2 users
arxiv.org

Dialogue datasets are crucial for deep learning-based task-oriented dialogue system research. While numerous English language multi-domain task-oriented dialogue datasets have been developed and contributed to significant advancements in task-oriented dialogue systems, such a dataset does not exist in Japanese, and research in this area is limited compared to that in English. In this study, toward
- Learning
- 2024/03/28 08:14
NonlinearSolve.jl: High-Performance and Robust Solvers for Systems of Nonlinear Equations in Julia
2 users
arxiv.org

Efficiently solving nonlinear equations underpins numerous scientific and engineering disciplines, yet scaling these solutions for complex system models remains a challenge. This paper presents NonlinearSolve.jl - a suite of high-performance open-source nonlinear equation solvers implemented natively in the Julia programming language. NonlinearSolve.jl distinguishes itself by offering a unified AP
- Technology
- 2024/03/26 16:21
The Elements of Differentiable Programming
18 users
arxiv.org

Artificial intelligence has recently experienced remarkable advances, fueled by large models, vast datasets, accelerated hardware, and, last but not least, the transformative power of differentiable programming. This new programming paradigm enables end-to-end differentiation of complex computer programs (including those with control flows and data structures), making gradient-based optimization o
- Technology
- 2024/03/23 16:06
- Read later
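End-to-end differentiation through ordinary control flow can be illustrated with forward-mode automatic differentiation using dual numbers, one standard technique in this area. This is a minimal sketch: the `Dual` class is hypothetical and supports only the operations used here.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class Dual:
    """A dual number val + dot*eps with eps**2 == 0; `dot` carries the derivative."""

    val: float
    dot: float = 0.0

    def __add__(self, other: Dual | float) -> Dual:
        o = other if isinstance(other, Dual) else Dual(other)
        return Dual(self.val + o.val, self.dot + o.dot)

    __radd__ = __add__

    def __mul__(self, other: Dual | float) -> Dual:
        o = other if isinstance(other, Dual) else Dual(other)
        # Product rule: (a + a'eps)(b + b'eps) = ab + (ab' + a'b)eps.
        return Dual(self.val * o.val, self.val * o.dot + self.dot * o.val)

    __rmul__ = __mul__


def f(x: Dual) -> Dual:
    # An ordinary program with a loop and a branch; differentiation follows the executed path.
    y = x
    for _ in range(2):
        y = y * x  # after the loop, y = x**3
    if x.val > 0:
        y = y + 2 * x  # f(x) = x**3 + 2x on this branch
    return y


def derivative(fn, x: float) -> float:
    # Seed dot = 1 to obtain d fn / dx in a single forward pass.
    return fn(Dual(x, 1.0)).dot
```

For example, `derivative(f, 2.0)` evaluates 3·2² + 2 = 14 on the positive branch. Forward mode yields one directional derivative per pass. Gradient-based training with many parameters typically uses reverse mode instead.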
Chronos: Learning the Language of Time Series
2 users
arxiv.org

We introduce Chronos, a simple yet effective framework for pretrained probabilistic time series models. Chronos tokenizes time series values using scaling and quantization into a fixed vocabulary and trains existing transformer-based language model architectures on these tokenized time series via the cross-entropy loss. We pretrained Chronos models based on the T5 family (ranging from 20M to 710M
- Learning
- 2024/03/22 13:12
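The tokenization step, which scales real values and then quantizes them into a fixed vocabulary, can be sketched as below. The sketch assumes mean scaling (dividing by the mean absolute value of the series), which is our reading of the Chronos setup. It also uses uniform bins over a fixed range. The bin count, the range, and the function names are illustrative placeholders, not the paper's settings.

```python
def mean_scale(series: list[float]) -> tuple[list[float], float]:
    # Divide by the mean absolute value so series of different magnitudes share one vocabulary.
    scale = sum(abs(v) for v in series) / len(series) or 1.0
    return [v / scale for v in series], scale


def quantize(values: list[float], low: float, high: float, n_bins: int) -> list[int]:
    # Uniform bins over [low, high]; out-of-range values are clipped into the edge bins.
    width = (high - low) / n_bins
    return [min(n_bins - 1, max(0, int((v - low) // width))) for v in values]


def dequantize(tokens: list[int], low: float, high: float, n_bins: int, scale: float) -> list[float]:
    # Map each token back to its bin centre, then undo the scaling.
    width = (high - low) / n_bins
    return [(low + (t + 0.5) * width) * scale for t in tokens]


LOW, HIGH, N_BINS = -15.0, 15.0, 1000  # illustrative settings only

series = [10.0, 12.0, 9.0, 14.0, 11.0]
scaled, scale = mean_scale(series)
tokens = quantize(scaled, LOW, HIGH, N_BINS)
reconstructed = dequantize(tokens, LOW, HIGH, N_BINS, scale)
```

Once values are token ids, an unmodified language-model architecture can be trained on them with cross-entropy. Forecasts are turned back into numbers with `dequantize`.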
Evolutionary Optimization of Model Merging Recipes
9 users
arxiv.org

We present a novel application of evolutionary algorithms to automate the creation of powerful foundation models. While model merging has emerged as a promising approach for LLM development due to its cost-effectiveness, it currently relies on human intuition and domain knowledge, limiting its potential. Here, we propose an evolutionary approach that overcomes this limitation by automatically disc
- Learning
- 2024/03/21 09:47
RAFT: Adapting Language Model to Domain Specific RAG
6 users
arxiv.org

Pretraining Large Language Models (LLMs) on large corpora of textual data is now a standard paradigm. When using these LLMs for many downstream applications, it is common to additionally bake in new knowledge (e.g., time-critical news, or private domain knowledge) into the pretrained model either through RAG-based-prompting, or fine-tuning. However, the optimal methodology for the model to gain su
- Technology
- 2024/03/19 00:15
MM1: Methods, Analysis & Insights from Multimodal LLM Pre-training
5 users
arxiv.org

In this work, we discuss building performant Multimodal Large Language Models (MLLMs). In particular, we study the importance of various architecture components and data choices. Through careful and comprehensive ablations of the image encoder, the vision language connector, and various pre-training data choices, we identified several crucial design lessons. For example, we demonstrate that for la
- Technology
- 2024/03/17 18:13
Large Language Models Are Neurosymbolic Reasoners
2 users
arxiv.org

A wide range of real-world applications is characterized by their symbolic nature, necessitating a strong capability for symbolic reasoning. This paper investigates the potential application of Large Language Models (LLMs) as symbolic reasoners. We focus on text-based games, significant benchmarks for agents with natural language capabilities, particularly in symbolic tasks like math, map reading,
- Learning
- 2024/03/13 12:59
Is Cosine-Similarity of Embeddings Really About Similarity?
2 users
arxiv.org

Cosine-similarity is the cosine of the angle between two vectors, or equivalently the dot product between their normalizations. A popular application is to quantify semantic similarity between high-dimensional objects by applying cosine-similarity to a learned low-dimensional feature embedding. This can work better but sometimes also worse than the unnormalized dot-product between embedded vectors
- Learning
- 2024/03/12 15:29
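The abstract notes that cosine similarity and the unnormalized dot product can disagree, and this is easy to see numerically. The toy vectors below are our own, not the paper's experiments. Rescaling the embedding dimensions by a diagonal matrix D on one side and by D^-1 on the other leaves the dot product unchanged but changes the cosine similarity. The paper argues that learned embeddings from some regularized linear models are only determined up to such a rescaling.

```python
import math


def dot(a: list[float], b: list[float]) -> float:
    return sum(x * y for x, y in zip(a, b))


def cosine(a: list[float], b: list[float]) -> float:
    # Cosine similarity = dot product of the two L2-normalized vectors.
    return dot(a, b) / (math.sqrt(dot(a, a)) * math.sqrt(dot(b, b)))


user = [1.0, 2.0]
item = [3.0, 1.0]
d = [4.0, 0.5]  # arbitrary per-dimension rescaling D

user_scaled = [u * s for u, s in zip(user, d)]  # user @ D
item_scaled = [v / s for v, s in zip(item, d)]  # item @ D^-1

dots = (dot(user, item), dot(user_scaled, item_scaled))
cosines = (cosine(user, item), cosine(user_scaled, item_scaled))
```

Both dot products equal 5, while the cosine similarity drops from about 0.71 to about 0.57.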
Stealing Part of a Production Language Model
5 users
arxiv.org

We introduce the first model-stealing attack that extracts precise, nontrivial information from black-box production language models like OpenAI's ChatGPT or Google's PaLM-2. Specifically, our attack recovers the embedding projection layer (up to symmetries) of a transformer model, given typical API access. For under US$20, our attack extracts the entire projection matrix of OpenAI's Ada and Ba
- Technology
- 2024/03/12 13:42
- Security
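Recovering the projection layer rests on one observation. Every logit vector equals W·h for some hidden state h of dimension d, so a stack of many logit vectors has rank at most d. The simulation below assumes direct access to full logit vectors from a toy model; a real attack has to work around API restrictions. It finds the rank by Gaussian elimination, whereas the paper inspects singular values. All names and sizes are hypothetical.

```python
import random


def matrix_rank(rows: list[list[float]], tol: float = 1e-9) -> int:
    # Gaussian elimination with partial pivoting; counts pivots whose magnitude exceeds tol.
    m = [row[:] for row in rows]
    rank, n_cols = 0, len(m[0])
    for col in range(n_cols):
        pivot = max(range(rank, len(m)), key=lambda r: abs(m[r][col]), default=None)
        if pivot is None or abs(m[pivot][col]) < tol:
            continue
        m[rank], m[pivot] = m[pivot], m[rank]
        for r in range(rank + 1, len(m)):
            factor = m[r][col] / m[rank][col]
            m[r] = [a - factor * b for a, b in zip(m[r], m[rank])]
        rank += 1
    return rank


rng = random.Random(0)
VOCAB, HIDDEN, QUERIES = 12, 4, 20  # toy sizes: vocabulary larger than hidden dimension

# Secret final projection W (VOCAB x HIDDEN) of a simulated model.
W = [[rng.gauss(0, 1) for _ in range(HIDDEN)] for _ in range(VOCAB)]


def query_logits() -> list[float]:
    # Each simulated "API call" returns logits = W @ h for an unseen hidden state h.
    h = [rng.gauss(0, 1) for _ in range(HIDDEN)]
    return [sum(w * x for w, x in zip(row, h)) for row in W]


logit_matrix = [query_logits() for _ in range(QUERIES)]
recovered_hidden_dim = matrix_rank(logit_matrix)
```

The rank reveals the hidden dimension. The span of the logit vectors also pins down W up to an invertible d x d transformation, which is consistent with the "up to symmetries" caveat in the abstract.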
