GitHub - FMInference/FlexGen: Running large language models on a single GPU for throughput-oriented scenarios.

Category: Technology

Source: github.com/FMInference

38 users bookmarked this

3 comments on this article


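stealthinu: Apparently, with FlexGen, even a large LLM can run practical inference with roughly the GPU memory of a single 3090. It rearranges the order of the compute loops so that the model runs even with little GPU memory. Amazing!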

2023/02/23
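The loop-reordering idea mentioned in the comment above can be illustrated with a toy model. This is only a sketch under stated assumptions, not FlexGen's actual code: the function `count_weight_loads` and its parameters are hypothetical, and the model assumes that only one layer's weights fit in GPU memory at a time, so switching layers always costs one weight transfer.

```python
def count_weight_loads(num_layers, num_batches, layer_outer):
    """Count weight transfers to the GPU for a given loop order.

    Toy assumption (not taken from FlexGen): GPU memory holds the
    weights of exactly one layer, so every change of layer costs
    one load.
    """
    if layer_outer:
        # Layer-by-layer: run every batch through a layer before moving on.
        order = [(layer, batch)
                 for layer in range(num_layers)
                 for batch in range(num_batches)]
    else:
        # Batch-by-batch: run one batch through all layers, then the next.
        order = [(layer, batch)
                 for batch in range(num_batches)
                 for layer in range(num_layers)]

    loads = 0
    resident = None  # index of the layer whose weights are on the GPU
    for layer, _batch in order:
        if layer != resident:
            loads += 1
            resident = layer
    return loads


batch_outer_loads = count_weight_loads(4, 8, layer_outer=False)
layer_outer_loads = count_weight_loads(4, 8, layer_outer=True)
print(batch_outer_loads, layer_outer_loads)  # prints "32 4"
```

In this toy model, putting the layer loop on the outside cuts weight transfers from 32 to 4, at the cost of keeping the intermediate activations of all 8 batches around between layers.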

misshiki: “FlexGen aims to lower the resource requirements of LLM inference down to a single commodity GPU (e.g., T4, 3090) and to allow flexible deployment across various hardware setups.”

Tag: deep learning

2023/02/21

kns_1234: "FlexGen, which can run large language models at high performance even when GPU memory is limited (e.g., a 16GB T4 or a 24GB RTX3090)"

2023/02/21


GitHub - FMInference/FlexGen: Running large language models on a single GPU for throughput-oriented scenarios.

In recent years, large language models (LLMs) have shown great performance across a wide range of tasks. Increasingly, LLMs have been applied not only to interactive applications (such as chat), but also to many "back-of-house" tasks. These tasks include benchmarking, information extraction, data wrangling, and form processing. One key characteristic of these applications is that they are throughp