[B! Doc2Vec] sakidatsumono's bookmarks

sakidatsumono id:sakidatsumono

sakidatsumono's bookmarks about Doc2Vec (3)

Gensim: topic modelling for humans
models.keyedvectors – Store and query word vectors. This module implements word vectors, and more generally sets of vectors keyed by lookup tokens/ints, and various similarity look-ups. Since trained word vectors are independent from the way they were trained (Word2Vec, FastText etc), they can be represented by a standalone structure, as implemented in this module. The structure is called “KeyedVec
sakidatsumono 2019/04/21
Doc2Vec
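Doc2Vec with gensim - Machine Learning / Natural Language Processing Study Notes
What is Doc2Vec? Doc2Vec is a technique for turning documents of arbitrary length into vectors. It yields distributed representations of documents and text. *By measuring the similarity between these vectors, you can classify documents or find similar ones. In Word2Vec's CBoW, the only input was the word IDs as one-hot encodings, whereas Doc2Vec takes as input the word IDs with a paragraph ID added. The figure below gives the idea; it is excerpted from the paper [1405.4053] Distributed Representations of Sentences and Documents. For a summary in Japanese, this one is easy to follow: 【論文紹介】Distributed Representations of Sentences and Documents from Tomofumi Yoshida www.slideshare.net Word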
sakidatsumono 2019/04/15
Doc2Vec
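The excerpt above mentions measuring the similarity between document vectors. As a rough sketch of one common choice, cosine similarity, here is a plain-Python version. The function name `cosine_similarity` and the three example vectors are made up for illustration; they are not taken from the bookmarked article or from gensim.

```python
import math


def cosine_similarity(u, v):
    """Cosine of the angle between two equal-length vectors."""
    if len(u) != len(v):
        raise ValueError("vectors must have the same length")
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        raise ValueError("cosine similarity is undefined for a zero vector")
    return dot / (norm_u * norm_v)


# Hypothetical 3-dimensional document vectors (illustrative values only).
doc_a = [1.0, 2.0, 0.0]
doc_b = [2.0, 4.0, 0.0]  # points in the same direction as doc_a
doc_c = [0.0, 0.0, 3.0]  # orthogonal to doc_a

sim_ab = cosine_similarity(doc_a, doc_b)
sim_ac = cosine_similarity(doc_a, doc_c)
```

Vectors pointing the same way score close to 1 and orthogonal vectors score 0, so ranking documents by this value is one way to look for "similar" documents once each document has a vector.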
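How Doc2Vec Works, and a Tutorial on Computing Document Similarity with gensim
The titles of the similar content turn out to be full of female artists. Apparently Ayumi Hamasaki is Japan's Lady Gaga. Drawbacks of Bag-of-words and advantages of Doc2Vec: Bag-of-words is a vector representation whose elements are the occurrence counts of the words in a document. For example, suppose we have a document tokenized as { I, have, a, pen, I, have, an, apple }. Let us vectorize this document with Bag-of-words. If the vector's components are ordered I, have, a, pen, an, apple, the document is represented as [2, 2, 1, 1, 1, 1]. Because this only counts occurrences, it is a simple and computationally cheap way to obtain a representation. So what is the problem with Bag-of-words? With Bag-of-words, the order in which words appear is not taken into account, and if the same words are used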
sakidatsumono 2017/11/10
Doc2Vec

Word2Vec

NLP
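The Bag-of-words example in the excerpt above can be reproduced in a few lines. This is a minimal sketch using only the Python standard library; the helper name `bag_of_words` is hypothetical, and the token list and component order are the ones given in the excerpt.

```python
from collections import Counter

# Tokenized document and component order from the excerpt.
tokens = ["I", "have", "a", "pen", "I", "have", "an", "apple"]
vocabulary = ["I", "have", "a", "pen", "an", "apple"]


def bag_of_words(tokens, vocabulary):
    """Count how often each vocabulary word occurs in the token list."""
    counts = Counter(tokens)
    return [counts[word] for word in vocabulary]


vector = bag_of_words(tokens, vocabulary)

# Reversing the token order leaves the counts unchanged, which is the
# drawback the excerpt points to: word order is not captured.
reversed_vector = bag_of_words(list(reversed(tokens)), vocabulary)
```

Here `vector` matches the [2, 2, 1, 1, 1, 1] given in the excerpt, and `reversed_vector` is identical to it even though the words appear in a different order.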