[B! pig] manboubird's Bookmarks

manboubird id:manboubird

manboubird's bookmarks about pig (71)

ASF JIRA
manboubird 2017/02/20
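Link
Case Study: Improving the Hadoop Web UI and Permission Management to Streamline Data Utilization
Data users want to work with ever-growing volumes of data quickly and efficiently. At the same time, systems that have been in use for a long time can be too costly to adapt to those needs. This talk describes how Dwango's Hadoop-based analytics platform overhauled its Web UI for internal users and its Hadoop permission management to meet these needs.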
manboubird 2016/12/25
clouderaWorld

authorization

hadoop

slide

dowango

dataInfrastructure

hdfs

acl

pig

webUi
Link
Christopher Olston - Publications
Papers by Christopher Olston Research community reports Machine learning Enterprise data management Continuous data processing workflows Data pipeline programming & debugging Large-scale data processing Web monitoring & crawling Web search Web application scalability Distributed data monitoring Data visualization Research community reports Numerous co-authors. The Beckman Report on Database Resear
manboubird 2016/06/25
researcher

pig

dataManagement

dataflow

dataIntegration

crawling

search

web
Link
Sessionization in SQL, Hive, Pig and Python — Dataiku Academy 7.0 documentation
Why sessionization? Sessionization is the act of turning event-based data into sessions, the ordered list of a user’s actions in completing a task. It is widely used in several domains, such as: Web analytics. This is the most common use, where a session is composed of a user’s actions during one particular visit to the website. You can think of this as a buying session on an e-business website fo
manboubird 2016/05/20
sessionization

hive

pig

python

sessionAnalysis
Link
GitHub - Netflix/Surus
manboubird 2016/04/09
surus

pig

hive

udf

netflix

pmml

anomalyDetection

predictiveAnalysis
Link
Introducing Surus and ScorePMML
Today we’re announcing a new Netflix-OSS project called Surus. Over the next year we plan to release a handful of our internal user defined functions (UDF’s) that have broad adoption across Netflix. The use cases for these functions are varied in nature (e.g. scoring predictive models, outlier detection, pattern matching, etc.) and together extend the analytical capabilities of big data. The first
manboubird 2016/04/09
surus

pig

hive

udf

netflix

pmml

anomalyDetection

predictiveAnalysis
Link
GitHub - miguno/avro-hadoop-starter: Example MapReduce jobs in Java, Hive, Pig, and Hadoop Streaming that work on Avro data.
manboubird 2015/07/30
hive

pig

avro

hadoop
Link
Cloudera Blog
manboubird 2015/07/18
cloudera

pig

tuning
Link
Automated install of CDH5 Hadoop on your laptop with Ansible
manboubird 2014/09/22
ansible

cdh

setup

yarn

hive

Mahout

pig

Spark
Link
February 2014 HUG : Pig On Tez
manboubird 2014/09/09
pig

tez

slide
Link
Using Avro in MapReduce jobs with Hadoop, Pig, Hive
Apache Avro is a very popular data serialization format in the Hadoop technology stack. In this article I show code examples of MapReduce jobs in Java, Hadoop Streaming, Pig and Hive that read and/or write data in Avro format. We will use a small, Twitter-like data set as input for our example MapReduce jobs. Requirements Prerequisites Example data Avro schema Avro data files Preparing the input d
manboubird 2014/09/07
avro

hive

pig
Link
Cloudera Blog
Enterprises see embracing AI as a strategic imperative that will enable them to stay relevant in increasingly competitive markets. However, it remains difficult to quickly build these capabilities given the challenges with finding readily available talent and resources to get started rapidly on the AI journey. Cloudera recently signed a strategic collaboration agreement with Amazon […] Read blog p
manboubird 2014/09/07
pig

Spark
Link
PayPal Behavioral Analytics on Hadoop
manboubird 2014/07/27
paypal

timeSeriesAnalysis

reporting

slide

analytics

pig
Link
[PIG-2167] CUBE operation in Pig - ASF JIRA
Computing aggregates over a cube of several dimensions is a common operation in data warehousing. The standard SQL syntax is "GROUP relation BY dim1, dim2, dim3 WITH CUBE" – which in addition to all dim1-2-3, produces aggregations for just dim1, just dim1 and dim2, etc. NULL is generally used to represent "all". A presentation by Arnab Nandi describes how one might implement efficient cubing in Ma
manboubird 2014/06/27
cube

pig
Link
13 Things You Didn’t Know You Could Do with Pig | Mortar Blog | Data Science at Scale
manboubird 2014/06/17
pig

tips
Link
Bits and pieces
manboubird 2014/06/15
cascalog

pig

comparizon

sql

mysql
Link
Cloudera Blog
Enterprises see embracing AI as a strategic imperative that will enable them to stay relevant in increasingly competitive markets. However, it remains difficult to quickly build these capabilities given the challenges with finding readily available talent and resources to get started rapidly on the AI journey. Cloudera recently signed a strategic collaboration agreement with Amazon […] Read blog p
manboubird 2014/06/11
anomalyDetection

pig
Link
https://media.blackhat.com/us-13/US-13-Hanif-Binarypig-Scalable-Malware-Analytics-in-Hadoop-Slides.pdf
manboubird 2014/06/11
anomalyDetection

pig

slide
Link
BH Whitepaper.docx
manboubird 2014/06/11
anomalyDetection

pig

paper
Link
Setting up spork with spark 0.8.1 - Apache Spark Documentation
manboubird 2014/05/28
Spark

pig

spork