[B! word2vec] yubessy's Bookmarks

Combing LDA and Word Embeddings for topic modeling

yubessy 2018/09/16


fastText++: Batteries Included

yubessy 2018/09/05


Introduction to Word Embeddings

yubessy 2018/08/25


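Understanding How Word2vec Works Through Pictures - Qiita

Do you know how Word2vec works? Word2vec is easy to try with gensim or TensorFlow, so many of you have probably used it. But relatively few people understand how it actually works. Even the original papers do not explain the internals in detail, to the point that explanatory papers have been written about them. This article explains Skip-Gram, one of the Word2vec models, using pictures, with the aim of conveying the overall idea. First, I will explain what kind of model Skip-Gram is. Note: readers are assumed to understand the basics of neural networks. What kind of model is it? Skip-Gram is a neural network model. It is a two-layer neural network with only a single hidden layer. The units in adjacent layers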
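As a rough illustration of the two-layer structure described in the excerpt above, the following Python sketch generates (center, context) training pairs and runs one forward pass. The one-hot input selects a row of the input weight matrix (the single hidden layer), and the output layer turns that hidden vector into a softmax over the vocabulary. The function names, the window size, and the toy weights are assumptions made for illustration. They are not taken from the Qiita article, whose excerpt is cut off before the details.

```python
import math


def skipgram_pairs(tokens, window=2):
    """Return (center, context) pairs for words within `window` positions."""
    pairs = []
    for i, center in enumerate(tokens):
        start = max(0, i - window)
        stop = min(len(tokens), i + window + 1)
        for j in range(start, stop):
            if j != i:
                pairs.append((center, tokens[j]))
    return pairs


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def skipgram_forward(center_index, w_in, w_out):
    """Predict a context-word distribution for one center word.

    w_in:  V x D input weights; row i is the hidden vector of word i.
    w_out: V x D output weights; row k scores word k as a context word.
    """
    hidden = w_in[center_index]  # a one-hot input just selects one row
    scores = [sum(h * w for h, w in zip(hidden, row)) for row in w_out]
    return softmax(scores)


# Toy data (hypothetical values, chosen only for illustration).
tokens = ["the", "quick", "brown", "fox"]
pairs = skipgram_pairs(tokens, window=1)

w_in = [[0.1, 0.2], [0.0, -0.1], [0.3, 0.1], [-0.2, 0.4]]
w_out = [[0.2, 0.0], [0.1, 0.1], [-0.1, 0.3], [0.0, 0.2]]
probs = skipgram_forward(1, w_in, w_out)  # distribution for "quick"
```

Training would adjust `w_in` and `w_out` so that the predicted distribution puts more mass on the context words observed in `pairs`. That step is omitted here.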

yubessy 2018/08/10

word2vec
NLP


From word2vec to doc2vec: an approach driven by Chinese restaurant process

yubessy 2018/07/21

word2vec


Text Embedding Models Contain Bias. Here's Why That Matters.

Text Embedding Models Contain Bias. Here's Why That Matters. Posted by Ben Packer, Yoni Halpern, Mario Guajardo-Céspedes & Margaret Mitchell (Google AI) As Machine Learning practitioners, when faced with a task, we usually select or train a model primarily based on how well it performs on that task. For example, say we're building a system to classify whether a movie review is positive or negative

yubessy 2018/04/14

word2vec


Breaking the Softmax Bottleneck: A High-Rank RNN Language Model

We formulate language modeling as a matrix factorization problem, and show that the expressiveness of Softmax-based models (including the majority of neural language models) is limited by a Softmax bottleneck. Given that natural language is highly context-dependent, this further implies that in practice Softmax with distributed word embeddings does not have enough capacity to model natural languag

yubessy 2017/11/18


Stop Using word2vec | Stitch Fix Technology – Multithreaded

Stop Using word2vec When I started playing with word2vec four years ago I needed (and luckily had) tons of supercomputer time. But because of advances in our understanding of word2vec, computing word vectors now takes fifteen minutes on a single run-of-the-mill computer with standard numerical libraries. Word vectors are awesome but you don’t need a neural network – and definitely don’t need deep

yubessy 2017/11/09


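Computing the Distance Between Sentences with Word Mover's Distance - Ahogrammer

Computing the similarity between sentences and documents is an important task in natural language processing. Similar-sentence and similar-document computation has many applications, such as plagiarism detection, related-article search, and absorbing the variety of question phrasings in question answering. One method for computing the distance between documents is Word Mover's Distance. Word Mover's Distance was proposed in 2015. It is notable for giving good results on short texts such as those on Twitter. Specifically, it computes the distance between documents using distributed word representations obtained with Word2vec, GloVe, and similar methods. The aim of this article is to try out Word Mover's Distance. Specifically, Word Mover's Distance is computed using gensim, a Python library that can compute distributed word representations and similar documents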

yubessy 2017/04/25

Didn't know WMD was this easy to use in gensim. Handy.


king - man + woman is queen; but why?

Intro word2vec is an algorithm that transforms words into vectors, so that words with similar meanings end up laying close to each other. Moreover, it allows us to use vector arithmetics to work with analogies, for example, the famous king - man + woman = queen. I will try to explain how it works, with special emphasis on the meaning of vector differences, at the same time omitting as many technic
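The analogy arithmetic mentioned in the excerpt can be sketched in a few lines of Python. The vectors below are hand-made 2-D toy values, not real word2vec output, and the `analogy` helper is a hypothetical name. Excluding the three query words from the candidates is a common convention and is assumed here.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def analogy(a, b, c, vectors):
    """Return the word whose vector is closest to vectors[a] - vectors[b] + vectors[c]."""
    target = [x - y + z for x, y, z in zip(vectors[a], vectors[b], vectors[c])]
    # Skip the query words themselves so they cannot win trivially.
    candidates = (w for w in vectors if w not in (a, b, c))
    return max(candidates, key=lambda w: cosine(target, vectors[w]))


# Hand-made toy vectors (hypothetical): axis 0 ~ "royalty", axis 1 ~ "gender".
toy_vectors = {
    "king": [1.0, 1.0],
    "queen": [1.0, -1.0],
    "man": [0.0, 1.0],
    "woman": [0.0, -1.0],
    "apple": [-1.0, 0.2],
}

result = analogy("king", "man", "woman", toy_vectors)
```

With these toy values the difference `king - man` isolates the "royalty" axis, and adding `woman` lands exactly on the toy `queen` vector. Real embeddings only approximate this.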

yubessy 2017/01/13


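Word Mover's Distance: Applying word2vec to Document Distance - yubessy.hatenablog.com

I read a paper on a method that computes the distance (dissimilarity) between documents using the distributed word representations obtained with word2vec. Since I had the chance, I will try explaining it. [1] Kusner, Matt J., et al. "From word embeddings to document distances." Proceedings of the 32nd International Conference on Machine Learning (ICML 2015). 2015. TL;DR This paper proposes Word Mover's Distance (WMD), a method for computing the distance between documents. In short, the proposed method is as follows. Distance between documents A and B = the total cost, in the lowest-cost case, of transforming A into B by matching the words of A to the words of B. Matching word x to word y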
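The "lowest-cost matching" idea in the TL;DR can be sketched in Python for one restricted case. This is not the general algorithm from the paper: it assumes both documents have the same number of tokens, each carrying equal weight, in which case the optimal transport reduces to a one-to-one matching that can be found by brute force over permutations. Following Kusner et al. [1], the cost of matching two words is taken to be the Euclidean distance between their vectors. The function name and the toy vectors are hypothetical.

```python
import itertools
import math


def euclidean(u, v):
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))


def wmd_equal_length(doc_a, doc_b, vectors):
    """WMD for two documents of n tokens each, every token weighted 1/n.

    Assumption: with uniform weights on both sides, an optimal transport
    plan is a one-to-one matching, so trying every permutation is exact.
    This is only practical for very short documents.
    """
    n = len(doc_a)
    if n == 0 or n != len(doc_b):
        raise ValueError("documents must be non-empty and of equal length")
    best_total = min(
        sum(euclidean(vectors[a], vectors[b]) for a, b in zip(doc_a, perm))
        for perm in itertools.permutations(doc_b)
    )
    return best_total / n


# Hand-made 2-D vectors (hypothetical, not real word2vec output).
toy_vectors = {
    "obama": [0.0, 0.0], "president": [0.0, 1.0],
    "speaks": [3.0, 0.0], "greets": [3.0, 1.0],
    "media": [6.0, 0.0], "press": [6.0, 1.0],
}

doc_a = ["obama", "speaks", "media"]
doc_b = ["president", "greets", "press"]
distance = wmd_equal_length(doc_a, doc_b, toy_vectors)
```

For documents of different lengths or with repeated words, the matching becomes a fractional transport plan and needs a linear-programming or optimal-transport solver instead of a permutation search.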

yubessy 2017/01/10

Spent the whole first three days of the New Year writing this


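Understanding the Neural Network Training Process of Word2Vec · けんごのお屋敷

Word2Vec is well known as a method that, as the name suggests, captures the meaning of words by representing them as vectors. Recently there also appears to be research applying Word2Vec to collaborative filtering (called Item2Vec), so the tool is proving useful beyond the boundaries of natural language processing. I actually wanted to implement Item2Vec and was trying to understand how Word2Vec works, but I had not come across any Japanese article that dug into the internals of Word2Vec. So, late as it may be, I am writing this up on my blog to organize my own understanding. Note that this article reflects what I worked out myself from reading the Word2Vec source code and several papers. It may contain mistakes, so please bear that in mind. If you find a mistake, I would appreciate it if you pointed it out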

yubessy 2016/08/25


http://arxiv.org/pdf/1402.3722.pdf

arXiv:1402.3722v1 [cs.CL] 15 Feb 2014 word2vec Explained: Deriving Mikolov et al.’s Negative-Sampling Word-Embedding Method Yoav Goldberg and Omer Levy {yoav.goldberg,omerlevy}@gmail.com February 14, 2014 The word2vec software of Tomas Mikolov and colleagues has gained a lot of traction lately, and provides state-of-the-art word embeddings. The learning models behind the software are described in tw

yubessy 2016/08/18

word2vec


The Amazing Power of Word Vectors - KDnuggets

The Amazing Power of Word Vectors A fantastic overview of several now-classic papers on word2vec, the work of Mikolov et al. at Google on efficient vector representations of words, and what you can do with them. For today’s post, I’ve drawn material not just from one paper, but from five! The subject matter is ‘word2vec’ – the work of Mikolov et al. at Google on efficient vector representations of

yubessy 2016/05/27


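Research Using Word2Vec: Removing the "Gender Binary" from Words Through Vector-Space Operations | POSTD

In my previous post, I gave an overview of a new class of language models, word embedding models (WEMs), and introduced an R package that makes basic WEM operations easy to run. I wrote that post mostly for the digital humanities community. In this post, I describe in detail a single word2vec model trained on about 14 million ratemyprofessors.com reviews of faculty members. What is notable about this model is that it allows concrete study of how far machine learning models of this kind can help when analyzing gendered language. I hope this post will also interest readers who have no interest in training machine learning models. I include some code, but feel free to skip it. Now, the previous post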

yubessy 2015/12/02

word2vec


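Playing with a パソコン工房 PC, Round 2! Morphological Analysis with Rakuten MA, Together with あんちべ-san - Hatena News

(The story so far) Last time, after パソコン工房 asked us to run a PR campaign for its statistical-computing PCs, the editorial team played around with word2vec together with Hatena engineers. Readers liked it, so we got carried away and decided to do a second round. This time we have the full cooperation of the statistician who took the internet by storm with "Subtracting 'breasts' from KanColle's Kaga with word2vec"! There is also a giveaway announcement at the end of the article. (Note: this is a PR article by 株式会社ユニットコム.) Do you remember the previous article? ▽ Python - Perl + Java = ? Playing with "word2vec" using Hatena Blog data and a パソコン工房 PC - Hatena News. It was an article about パソコン工房's statistical-computing PC × Hatena Blog data × word2vec. We have picked out a few reactions from the bookmark comments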

yubessy 2015/02/18
