kimutansk's Bookmarks - Hatena Bookmark

Alerting 101: Timeseries checks


kimutansk 2018/02/14

Criteria for choosing between alert types. I like the points on when each one fits and when you should use something else instead.

metrics


Comparison of the Open Source OLAP Systems for Big Data: ClickHouse, Druid, and Pinot

In this post I want to compare ClickHouse, Druid, and Pinot, the three open source data stores that run analytical queries over big volumes of data with interactive latencies. Warning: this post is pretty big, you may want to read just the “Summary” section in the end. Sources of Information: I learned the implementation details of ClickHouse from Alexey Zatelepin, one of the core developers. The be

kimutansk 2018/02/13

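Druid and Pinot are more tightly integrated with the Hadoop ecosystem and more feature-rich (data tiering, expanding tables on the fly, and so on), but ClickHouse is simpler to get started with.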


Dynamometer: Scale Testing HDFS on Minimal Hardware with Maximum Fidelity

kimutansk 2018/02/12

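So it loads the HDFS FSImage and audit logs from a production cluster, reproduces the access patterns, and runs a high-fidelity performance test of the NameNode. The idea itself looks applicable in many other places.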

HDFS


KIP-101 - Alter Replication Protocol to use Leader Epoch rather than High Watermark for Truncation - Apache Kafka - Apache Software Foundation

kimutansk 2018/02/05

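Even with this applied, the cases where messages are lost are reduced but still exist, depending on the acks setting. Also, it may not take full effect unless the message format is updated too, not just the inter-broker protocol...?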

kafka


Timely (and Stateful) Processing with Apache Beam

In a prior blog post, I introduced the basics of stateful processing in Apache Beam, focusing on the addition of state to per-element processing. So-called timely processing complements stateful processing in Beam by letting you set timers to request a (stateful) callback at some point in the future. What can you do with timers in Beam? Here are some examples: You can output data buffered in state

kimutansk 2018/02/04

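What I like about the Beam documentation in general is that it properly spells out what is required to do each thing. Saying "this is not that easy, and you need these prerequisites to do it" matters.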

beam


Stateful processing with Apache Beam

Beam lets you process unbounded, out-of-order, global-scale data with portable high-level pipelines. Stateful processing is a new feature of the Beam model that expands the capabilities of Beam, unlocking new use cases and new efficiencies. In this post, I will guide you through stateful processing in Beam: how it works, how it fits in with the other features of the Beam model, what you might use

kimutansk 2018/02/03

Whether or not you end up using Beam, the idea that things are managed along these viewpoints and boundaries is interesting to read about.

stream
beam


How to select the first row of each group?

Window functions: Something like this should do the trick: import org.apache.spark.sql.functions.{row_number, max, broadcast} import org.apache.spark.sql.expressions.Window val df = sc.parallelize(Seq( (0,"cat26",30.9), (0,"cat13",22.1), (0,"cat95",19.6), (0,"cat105",1.3), (1,"cat67",28.5), (1,"cat4",26.8), (1,"cat13",12.6), (1,"cat23",5.3), (2,"cat56",39.6), (2,"cat40",29.7), (2,"cat187",27.9), (

kimutansk 2018/01/29

Having these various ways of writing it collected under a single question, with examples, is quite helpful.

spark
sql
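
A minimal sketch of what "first row of each group" means, written in plain Python rather than Spark so it runs without a cluster. It is not the answer's code: the field names `hour`, `category`, and `total_value` are hypothetical (the excerpt is cut off before the column names appear), and only the two groups that are fully visible in the excerpt are used as data. Keeping the row with the largest value per group gives the same result as ranking rows within each group by value in descending order and keeping rank 1; on ties this sketch keeps the row seen first.

```python
from typing import Dict, Iterable, List, NamedTuple


class Row(NamedTuple):
    # Hypothetical field names; the excerpt does not show the real column names.
    hour: int
    category: str
    total_value: float


def first_row_per_group(rows: Iterable[Row]) -> List[Row]:
    """Return, for each hour, the row with the largest total_value."""
    best: Dict[int, Row] = {}
    for row in rows:
        current = best.get(row.hour)
        # Strict comparison: on a tie, the earlier row is kept.
        if current is None or row.total_value > current.total_value:
            best[row.hour] = row
    # Sort by group key so the output order is deterministic.
    return [best[hour] for hour in sorted(best)]


# Rows for groups 0 and 1, copied from the visible part of the excerpt.
rows = [
    Row(0, "cat26", 30.9), Row(0, "cat13", 22.1),
    Row(0, "cat95", 19.6), Row(0, "cat105", 1.3),
    Row(1, "cat67", 28.5), Row(1, "cat4", 26.8),
    Row(1, "cat13", 12.6), Row(1, "cat23", 5.3),
]

top_rows = first_row_per_group(rows)
for row in top_rows:
    print(row.hour, row.category, row.total_value)
```

In Spark itself the window-function version would be the natural choice, since it stays inside the DataFrame API; the dictionary pass above only illustrates the semantics.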


Building The Analytics Team At Wish Part 4— Recruiting

kimutansk 2018/01/28

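A four-part series on what a data analytics platform team of data engineers and analysts should do, and what to watch out for when growing it. Interesting. Ending with an offer via BigBrother is a fitting touch, lol.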


Foundations of streaming SQL [Strata NYC 2017]

Foundations of streaming SQL or: how I learned to love stream & table theory Tyler Akidau Apache Beam PMC Software Engineer at Google @takidau Covering ideas from across the Apache Beam, Apache Calcite, Apache Kafka, and Apache Flink communities, with thoughts and contributions from Julian Hyde, ...

kimutansk 2018/01/21

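Explaining MapReduce as a Table > Stream > Table flow is easy to follow. What I wonder about is how AtWatermark gets determined, but since there is no single best answer, I should read the documentation properly too...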


Should You Put Several Event Types in the Same Kafka Topic? | Confluent

Should You Put Several Event Types in the Same Kafka Topic? If you adopt a streaming platform such as Apache Kafka, one of the most important questions to answer is: what topics are you going to use? In particular, if you have a bunch of different events that you want to publish to Kafka as messages, do you put them in the same topic, or do you split them across different topics? The most importan

kimutansk 2018/01/19

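On how to split topics. The content is useful, for example that whether ordering must be preserved is the most important criterion. Also, so the schema application strategy is now selectable in Schema Registry?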

kafka


10 Principles for Streaming Services | Confluent

kimutansk 2018/01/16

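Name topics after entities, have a single service do the writing, separate publishers from subscribers, apply the inverse Conway maneuver to design the organization and its communication, and so on. Interesting.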

stream


Kafka 2018 - Securing Kafka the Right Way

kimutansk 2018/01/16

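I had completely misread this, but going through SSL on the consume path loses zero copy. And looking closely at the benchmark in these slides, broker CPU usage shoots up while data is being consumed... that is alarming.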


How To Size Your Apache Flink Cluster Back-of-the-Envelope Calculation

kimutansk 2018/01/15

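An approach like this, where you derive the performance ceiling from network traffic ("how far can we actually push it?") and work down from there, is also interesting.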

flink
stream
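
A rough sketch of the kind of network-based back-of-the-envelope estimate the comment describes. Every number below is a hypothetical input chosen for illustration, not a figure from the article, and the function names are made up for this example. It only accounts for inbound source traffic spread evenly across machines; shuffles, state access, and checkpoint traffic would add to the load and are left out.

```python
def per_machine_mb_per_s(messages_per_s: int, bytes_per_message: int, machines: int) -> float:
    """Inbound traffic each machine must absorb, in MB/s (1 MB = 1,000,000 bytes)."""
    total_bytes_per_s = messages_per_s * bytes_per_message
    return total_bytes_per_s / machines / 1_000_000


def link_utilization(mb_per_s: float, link_gbit_per_s: float) -> float:
    """Fraction of a network link consumed by the given traffic."""
    # 1 Gbit/s = 1000 Mbit/s = 125 MB/s.
    capacity_mb_per_s = link_gbit_per_s * 1000 / 8
    return mb_per_s / capacity_mb_per_s


# Hypothetical workload: 1,000,000 messages/s of 2,000 bytes each, on 5 machines
# that each have a 10 Gbit/s network link.
inbound = per_machine_mb_per_s(messages_per_s=1_000_000, bytes_per_message=2_000, machines=5)
utilization = link_utilization(inbound, link_gbit_per_s=10)

print(f"inbound per machine: {inbound:.0f} MB/s")
print(f"link utilization from inbound traffic alone: {utilization:.0%}")
```

With these assumed inputs the inbound stream alone takes about a third of each link, which leaves the remaining budget for whatever the job adds on top.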


https://jobs.zalando.com/tech/blog/rock-solid-kafka/index.html

kimutansk 2018/01/10

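When a host in a ZooKeeper cluster died, it used to be a pain if you were running entirely on physical hardware, but with ENIs or a suitable virtualization platform you can handle this.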


Alerting 101: Status checks

In our Monitoring 101 series, we introduced a high-level framework for monitoring and alerting on metrics and events from your applications and infrastructure. In this series we’ll go a bit deeper on alerting specifics, breaking down several different alert types. In this post we cover four types of status checks that poll or ping a particular component to verify if it is up or down: Host checksSe

kimutansk 2018/01/09

Looks useful for checking which of these four categories each check item belongs to, and whether the ones you need are all covered. Host/Service/Process/Network, I see.

monitoring


Monitoring 101: Investigating performance issues


kimutansk 2018/01/09

Drilling down in the order Work -> Resources -> Events, mapping from Resources to the Work built on top of them, and the things you must not forget all feel convincing.

monitoring


Monitoring 101: Alerting on what matters


kimutansk 2018/01/09

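In the "Page on symptoms" part, my head gets a bit tangled between "page" as an alert level and "page" as a collection of information... Still, the perspective on deciding levels and on what deserves special treatment is right.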

monitoring


Monitoring 101: Collecting the right data


kimutansk 2018/01/08

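For monitoring, rather than just shipping the defaults: design with grouping in mind, decide on an appropriate granularity, group by tagging, and make the response clear by classifying levels.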

monitoring


Blog Posts | February 15, 2024 - October 25, 2022 | @lightbend


kimutansk 2018/01/08

A Scala wrapper for Kafka Streams. As a Scala user I welcome it, though it is interesting that Lightbend is the one putting this out.

kafka
stream


Performance Optimizations in Apache Impala

Apache Impala is a modern, open-source MPP SQL engine architected from the ground up for the Hadoop data processing environment. Impala provides low latency and high concurrency for BI/analytic read-mostly queries on Hadoop, not delivered by batch frameworks such as Hive or SPARK. Impala is written from the ground up in C++ and Java. It maintains Hadoop’s flexibility by utilizing standard componen

kimutansk 2018/01/05

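These slides are interesting. Beyond what Impala itself does, a fair amount of the content applies fairly generally to SQL engines. The final part on how cache misses affect performance is good too.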



kimutansk's bookmarks (6,279)
