[B! Internet][抽出 (extraction)] tsupo's bookmarks

tsupo id:tsupo

tsupo's bookmarks tagged Internet and 抽出 (extraction) (4)

gist: 18326 — GitHub
tsupo 2008/10/22
via http://ido.nu/kuma/2008/10/22/pure-javascript-implemented-flash-swf-file-parser/ // "Loads every SWF referenced by an embed tag on a web page and lists the JPEGs contained in them"

tombloo

javascript

flash

jpeg

抽出 (extraction)

Internet

tool
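Link
Web page main-text extraction (nakatani @ cybozu labs)
A follow-up to "Automatic categorization of web pages". As I wrote last time, main-text extraction is one of the keys to the web page categorization performed in Pathtraq. This time I am releasing that main-text extraction module and giving a rough explanation of the techniques it uses. Using the module is extremely simple: require it and pass the HTML you want to analyze to the analyse method. The character encoding is UTF-8. [Addendum] I forgot to mention something important. The module has been tested on Ruby 1.8.5, but it does nothing special, so it should work on any 1.8.x. $KCODE="u" # character encoding is utf-8 require 'extractcontent.rb' # specify option values opt = {:waste_expressions => /お問い合わせ|会社概要/} ExtractCont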
tsupo 2007/11/29
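Web pages come with (omitted) loads of "junk" other than the main text, so the focus is less on extracting the main text than on "how to remove the junk" / Section-targeting support is important ← the power of money is great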

textmining

pathtraq

抽出 (extraction)

scraping

Internet

summarySite
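Link
Analyzing the page and adding tags when posting to del.icio.us - higeorange's blog
Dance Party As in the image above, this is a Greasemonkey script that analyzes the page with tagthe.net and adds important-looking words as suggestions. Whether it is usable depends on how accurate tagthe.net is. Reference: About the tagthe.net API. Addendum: I made a UserJavascript that runs in Opera. It is not exactly the same, though. I wonder whether the original script would also work if you install the userjs that lets Opera run the GM-something functions. http://www14.plala.or.jp/operairc/customize/userjavascript/deliciousTagtheNet.js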
tsupo 2007/01/09
I thought it was a reincarnation of tagyu, but it is made by different people. Although it says it supports UTF-8, it does not seem to work on Japanese pages.

tag

tagging

形態素解析 (morphological analysis)

特徴語 (characteristic words)

抽出 (extraction)

API

webServices

socialBookmark

Internet

computer
Link
tagthe.net - Webservice that tags your resources
Welcome to tagthe.net! This is a simple webservice that helps you in tagging textual content on and off the web. There are two ways of using it: by simply pasting a URL or a text in the fields below or uploading a file; by using the REST API. tagthe.net then returns a set of tags based on the textual content you specified. The service is mainly designed for developers, building applications that mak
tsupo 2007/01/09
A service that suggests tag candidates. A reincarnation of http://www.tagyu.com/ ? The top pages look very similar. Incidentally, Tagyu appears to have already shut down.

tag

tagging

抽出 (extraction)

形態素解析 (morphological analysis)

タグ候補 (tag candidates)

socialBookmark

Internet

computer
Link