Papers by Christopher Olston Research community reports Machine learning Enterprise data management Continuous data processing workflows Data pipeline programming & debugging Large-scale data processing Web monitoring & crawling Web search Web application scalability Distributed data monitoring Data visualization Research community reports Numerous co-authors. The Beckman Report on Database Resear
Why sessionization?¶ Sessionization is the act of turning event-based data into sessions, the ordered list of a user’s actions in completing a task. It is widely used in several domains, such as: Web analytics. This is the most common use, where a session is composed of a user’s actions during one particular visit to the website. You can think of this as a buying session on a e-business website fo
Today we’re announcing a new Netflix-OSS project called Surus. Over the next year we plan to release a handful of our internal user defined functions (UDF’s) that have broad adoption across Netflix. The use cases for these functions are varied in nature (e.g. scoring predictive models, outlier detection, pattern matching, etc.) and together extend the analytical capabilities of big data. The first
Apache Avro is a very popular data serialization format in the Hadoop technology stack. In this article I show code examples of MapReduce jobs in Java, Hadoop Streaming, Pig and Hive that read and/or write data in Avro format. We will use a small, Twitter-like data set as input for our example MapReduce jobs. Requirements Prerequisites Example data Avro schema Avro data files Preparing the input d
Enterprises see embracing AI as a strategic imperative that will enable them to stay relevant in increasingly competitive markets. However, it remains difficult to quickly build these capabilities given the challenges with finding readily available talent and resources to get started rapidly on the AI journey. Cloudera recently signed a strategic collaboration agreement with Amazon […] Read blog p
Thomas Wolf "An Introduction to Transfer Learning and Hugging Face"Fwdays
Computing aggregates over a cube of several dimensions is a common operation in data warehousing. The standard SQL syntax is "GROUP relation BY dim1, dim2, dim3 WITH CUBE" – which in addition to all dim1-2-3, produces aggregations for just dim1, just dim1 and dim2, etc. NULL is generally used to represent "all". A presentation by Arnab Nandi describes how one might implement efficient cubing in Ma
Enterprises see embracing AI as a strategic imperative that will enable them to stay relevant in increasingly competitive markets. However, it remains difficult to quickly build these capabilities given the challenges with finding readily available talent and resources to get started rapidly on the AI journey. Cloudera recently signed a strategic collaboration agreement with Amazon […] Read blog p
リリース、障害情報などのサービスのお知らせ
最新の人気エントリーの配信
処理を実行中です
j次のブックマーク
k前のブックマーク
lあとで読む
eコメント一覧を開く
oページを開く