[B! presto] yass's bookmarks

yass id:yass

yass's bookmarks tagged presto (27)

Amazon Athena, a managed service based on Presto
Slides from a talk at Presto Meetup 201706. https://techplay.jp/event/621143
yass 2017/08/26
Tags: Athena, presto
A Benchmark Test on Presto, Spark Sql and Hive on Tez
We examined the performance of Presto, Spark SQL, and Hive on Tez, measuring the execution speed of common query patterns and more on datasets ranging from tens of thousands to billions of records. We conducted a benchmark test on mainstream big data SQL engines, including Presto, Spark SQL, and Hive on Tez. We focused on performance over medium-sized data (from tens of GB to 1 TB), which covers the majority of use cases in most services.
yass 2016/11/26
Tags: presto, Spark SQL, tez, Hive, benchmark, hadoop
Presto anatomy
yass 2015/09/23
Tags: presto
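Thoughts on Presto - wyukawa's diary
I have been running Presto for about a year, so I thought I would write down some things I have noticed. Presto is without doubt a great OSS product, and anyone using Hive loses nothing by installing it. Its advantages are as follows. Compared with Hive, it processes data in memory, so it is fast and well suited to ad hoc queries. It is stable. Its architecture has no storage of its own, so upgrades are easy. Development is active: the release pace has slowed somewhat compared with before, but there is still a new version at least once every three weeks, and when you report a bug, a version with the fix is released within a few days. Development is open: pull requests are accepted and code review is thorough. The code is clean, and I personally consider it a showcase of modern Java. Judging from recent changes, Presto seems to be prioritizing stability, which for an administrator like me means less operational load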
yass 2015/09/09
Tags: presto, hadoop
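SQL on Hadoop comparison [Evaluation report as of November 2014]
Slides from Impala Meetup 2014/10/31 @Tokyo. [Note] The evaluation results presented in these slides date from 2014. The software involved is evolving and improving quickly, and current versions differ substantially in features and performance. If you are interested in services or system integration based on the latest SQL on Hadoop developments, please contact NTT DATA Platform Systems Business Division, OSS Professional Services (email: hadoop [AT] kits.nttdata.co.jp).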
yass 2014/11/05
Tags: Hadoop, benchmark, comparison, hive, Impala, sql, presto, impala, tez
SQL on Hadoop in Taiwan
yass 2014/09/27
Tags: presto, hadoop, sql
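A quick write-up on Presto, Ansible, and related topics - wyukawa's diary
Today I would like to write briefly about Presto, Ansible, and related topics. I cannot go into much depth, so please bear with me. In my environment we use Presto, running on the same nodes as the DataNodes and NodeManagers. The main use case is running ad hoc queries: when we want to build a particular report, we use it to check what the data contains. We used to do this with Hive, but Hive runs MapReduce jobs and is slow (although local mode is sometimes enough), whereas Presto is fast, which is nice. That said, this is partly because my environment has small data; running a select over hundreds of GB of compressed data would probably be slow even with Presto. Another understated plus is that the Presto CLI displays column names, so you can immediately tell which value belongs to which column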
yass 2014/08/03
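" I used to think a separate RDBMS for aggregated data was necessary, but for cases where you simply select from aggregated data, Presto is fast enough, so I have started to think we might not need an aggregation RDBMS at all."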

Tags: ansible, presto, hive, hadoop
War of the Hadoop SQL engines. And the winner is ...? - Sonra
War of the Hadoop SQL engines. And the winner is …? You may have wondered why we were quiet over the last couple of weeks? Well, we locked ourselves into the basement and did some research and a couple of projects and PoCs on Hadoop, Big Data, and distributed processing frameworks in general. We were also looking at Clickstream data and Web Analytics solutions. Over the next couple of weeks we wil
yass 2014/07/28
" Right now I would run both batch style queries (ETL) and interactive queries on Hive Tez as Hive offers the richest SQL feature set, especially analytic functions and supports a wide set of file formats. "

Tags: hadoop, sql, hive, tez, impala, presto, spark, infinidb, drill
MPP on Hadoop, Redshift, BigQuery - Go ahead!
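The pressure on Twitter to "hurry up and write about the rough differences in how to use the currently popular MPP systems!" is intense, so here is a quick write-up. This article is based on my own experience and on what I have heard from users at study meetups, so not all of it comes from first-hand experience (especially BigQuery). The SAs at each vendor may be able to tell you about better approaches or more detail. I have never used on-premises commercial MPPs, so no comment on those. Presto is the main focus for MPP on Hadoop because it is what I use most right now; I expect Impala and other MPP-on-Hadoop systems to be similar. Of course the implementations differ, so fill in those gaps yourself as appropriate. Premise: you are developing an application and building an analytics platform for it from scratch. Quick summary: if you can set up a place to accumulate the data, then being able to query it directly with Pre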
yass 2014/07/24
Tags: BigQuery, RedShift, Impala, Presto, Hadoop, mpp
Cloudera Blog
Riding the wave of the generative AI revolution, third-party large language model (LLM) services like ChatGPT and Bard have swiftly emerged as the talk of the town, converting AI skeptics to evangelists and transforming the way we interact with technology. For proof of this megatrend, look no further than the instant success of ChatGPT, […]
yass 2014/05/30
" Shark required more memory than available in the cluster to run the Reporting and Deep Analytics queries on RDDs (and thus those queries could not be completed) "

Tags: impala, hive, tez, shark, spark, presto, parquet, orcfile, benchmark, hadoop
Read Data in Parquet File Format by zhenxiao · Pull Request #1147 · prestodb/presto
yass 2014/05/16
Tags: presto, parquet
Netflix running Presto in the AWS Cloud
yass 2014/05/16
Tags: Netflix, presto, SequenceFile, parquet, hadoop
Presto in the cloud
yass 2014/05/16
Tags: Qubole, presto, hadoop, cloud
Announcing General Availability of Presto-as-a-Service | Qubole
yass 2014/04/30
Tags: Qubole, hadoop, presto, prestodb
CrateDB – The Enterprise Database for Time Series, Documents, and Vectors
/* Based on device data, this query returns the average * of the battery level for every hour for each device_id */ WITH avg_metrics AS ( SELECT device_id, DATE_BIN('1 hour'::INTERVAL, time, 0) AS period, AVG(battery_level) AS avg_battery_level FROM devices.readings GROUP BY 1, 2 ORDER BY 1, 2 ) SELECT period, t.device_id, manufacturer, avg_battery_level FROM avg_metrics t, devices.info i WHERE t.
yass 2014/04/19
" Crate Data is a distributed system that runs on one machine or a cluster of machines. Crate comes in one complete install package. It includes solid established open source components (Presto, Elasticsearch, Lucene, Netty) "

Tags: sql, presto, netty, lucene, elasticsearch, distributed
Presto Performance - Qubole Engineering Posts - Quora
Presto is an open source distributed SQL query engine, developed by Facebook. Presto was designed and written from the ground up for interactive analytics and approaches the speed of commercial data warehouses. Qubole started its Presto-as-a-Service program a few weeks ago to make it easily acces...
yass 2014/04/14
" Presto showed a speedup of 2-7.5x over Hive for these queries. "

Tags: presto, hadoop, hive, Qubole
What are the main differences between Facebook, Presto, and Amplab Shark?
Answer (1 of 2): 1. Primary Use Case: While both are intended for analytics, Shark's primary use case is providing SQL to an (extremely fast) in-memory database, with support also for on-disk (or abstract) data sources. Presto is designed to be a fast SQL engine for the latter, and does not have ...
yass 2014/02/14
" Presto has implemented some approximate aggregation operators with hard-coded characteristics (99% confidence intervals, fixed sampling, see BlinkDB "

Tags: presto, shark, spark, impala, comparison, blinkdb
Presto-as-a-Service: Interactive SQL Execution on AWS
yass 2014/02/10
" It already has 2,000 stars and 350 forks on GitHub, making it more popular than similar projects such as Impala."

Tags: presto, hadoop, qubole
https://www.xtendsys.net/blog/post-1
yass 2014/02/09
Tags: presto, facebook, hadoop, hive
Hardware requirements for Presto
Most people are running Trino (formerly PrestoSQL) on the Hadoop nodes they already have. At Facebook we typically run Presto on a few nodes within the Hadoop cluster to spread out the network load. Generally, I'd go with the industry standard ratios for a new cluster: 2 cores and 2-4 gig of memory for each disk, with 10 gigabit networking if you can afford it. After you have a few machines (4+),
yass 2014/01/24
" At Facebook / We run our JVMs with a 16 gigabyte heap to leave most memory available for OS buffers / On the machines we run Presto we don't run MapReduce tasks / Most of the Presto machines we are on have 16 real cores and we use processor affinity to limit Presto to 12 cores "

Tags: presto, Facebook, hardware, server
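The per-disk ratios in the "Hardware requirements for Presto" excerpt are plain arithmetic. As a minimal sketch, assuming a single worker node and the quoted rule of thumb (2 cores and 2-4 GB of memory per disk), the snippet below scales those ratios by disk count. The helper `size_node`, the `NodeSizing` type, and the 12-disk example are hypothetical illustrations, not part of the source or of any Presto tooling.

```python
from dataclasses import dataclass

# Rule-of-thumb ratios quoted in the excerpt: per disk, 2 cores and 2-4 GB RAM.
CORES_PER_DISK = 2
MIN_GB_PER_DISK = 2
MAX_GB_PER_DISK = 4


@dataclass(frozen=True)
class NodeSizing:
    """Hypothetical container for one node's suggested resources."""
    cores: int
    min_memory_gb: int
    max_memory_gb: int


def size_node(disks: int) -> NodeSizing:
    """Hypothetical helper: scale the per-disk ratios by the node's disk count."""
    if disks <= 0:
        raise ValueError("disks must be a positive integer")
    return NodeSizing(
        cores=CORES_PER_DISK * disks,
        min_memory_gb=MIN_GB_PER_DISK * disks,
        max_memory_gb=MAX_GB_PER_DISK * disks,
    )


if __name__ == "__main__":
    # Hypothetical worker node with 12 data disks.
    print(size_node(12))
```

For a 12-disk node this yields 24 cores and 24-48 GB of memory. The Facebook notes quoted above (a 16 GB JVM heap, Presto pinned to 12 of 16 cores) suggest real deployments also reserve headroom for the OS and other processes, which this sketch does not model.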