[B! pandas] yubessy's bookmarks

yubessy id:yubessy

yubessy's bookmarks about pandas (18)

Vega-Altair: Declarative Visualization in Python — Vega-Altair 5.2.0 documentation
Vega-Altair is a declarative visualization library for Python. Its simple, friendly and consistent API, built on top of the powerful Vega-Lite grammar, empowers you to spend less time writing code and more time exploring your data.
yubessy 2019/05/12
visualization

Python

pandas
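Vaex, an out-of-core framework that makes data processing done in pandas 100x faster - フリーランチ食べたい ("I want a free lunch")
TL;DR This is an introduction to Vaex, which can process data out-of-core and on multiple cores. For string methods I measured an average speedup of more than 100x (up to 1000x in the author's benchmark). Outside string processing, speedups of several times to several tens of times look achievable. This post only compares performance; I plan to write an explanatory article separately. A framework 1000x faster than pandas? This week I read an interesting article. Extracting only the important parts, it says: string processing became extremely fast in a recent Vaex update, and with 32 cores it is 1000x faster than pandas (towardsdatascience.com). 1000x, really? That's the natural reaction. I didn't even know Vaex existed, so I looked into it. Incidentally, while researching it I noticed that the author of that article is the author of Vaex. It's not that I'm doubting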
yubessy 2019/04/14
python

pandas
multiprocessing in python - sharing large object (e.g. pandas dataframe) between multiple processes
I am using Python multiprocessing, more precisely

    from multiprocessing import Pool
    p = Pool(15)
    args = [(df, config1), (df, config2), ...] #list of args - df is the same object in each tuple
    res = p.map_async(func, args) #func is some arbitrary function
    p.close()
    p.join()

This approach has a huge memory consumption; eating up pretty much all my RAM (at which point it gets extremely slow, hence mak
yubessy 2018/09/06
Python

pandas

multiprocessing
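A common way to avoid shipping the same DataFrame with every task is to hand it to each worker once, through the `initializer`/`initargs` arguments of `multiprocessing.Pool`, and then send only the small per-task config. This is a minimal sketch, assuming a Unix-like system where the `fork` start method is available; `work`, `run_parallel` and the config dicts are hypothetical stand-ins for the question's `func`, `config1`, `config2`, and so on.

```python
import multiprocessing as mp

import pandas as pd

# Per-worker global, filled once by the initializer instead of once per task.
_shared_df = None


def _init_worker(df: pd.DataFrame) -> None:
    global _shared_df
    _shared_df = df


def work(config: dict) -> float:
    # Hypothetical task: sum one column and scale it by a factor from the config.
    return float(_shared_df[config["column"]].sum() * config["factor"])


def run_parallel(df: pd.DataFrame, configs: list[dict], processes: int = 2) -> list[float]:
    # Assumes the "fork" start method (Unix only): workers inherit df at startup,
    # and only the small config dicts are pickled for each task.
    ctx = mp.get_context("fork")
    with ctx.Pool(processes, initializer=_init_worker, initargs=(df,)) as pool:
        return pool.map(work, configs)


if __name__ == "__main__":
    frame = pd.DataFrame({"x": range(1_000), "y": range(1_000)})
    print(run_parallel(frame, [{"column": "x", "factor": 1.0}, {"column": "y", "factor": 2.0}]))
```

With `fork`, pages of the parent's DataFrame are shared copy-on-write until a worker writes to them, so read-only tasks like this one should not multiply memory use the way per-task arguments do.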
http://machinelearningexp.com/data-science-performance-of-python-vs-pandas-vs-numpy/
yubessy 2017/07/20
Python

NumPy

pandas
pandasql: Make python speak SQL
Introduction One of my favorite things about Python is that users get the benefit of observing the R community and then emulating the best parts of it. I'm a big believer that a language is only as helpful as its libraries and tools. This post is about pandasql, a Python package we (Yhat) wrote that emulates the R package sqldf. It's a small but mighty library comprised of just 358 lines of code.
yubessy 2016/10/15
Python

pandas

read later
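pandasql's README describes it as using SQLite syntax, and the same idea can be approximated by hand with the standard library's `sqlite3` plus pandas' `DataFrame.to_sql` and `pd.read_sql_query`. The sketch below is not pandasql's own `sqldf` API; the helper `query_frames` and the sample data are hypothetical.

```python
import sqlite3

import pandas as pd


def query_frames(sql: str, **frames: pd.DataFrame) -> pd.DataFrame:
    """Run `sql` against DataFrames loaded as tables into an in-memory SQLite DB."""
    conn = sqlite3.connect(":memory:")
    try:
        for name, frame in frames.items():
            # Each keyword argument becomes a table with the same name.
            frame.to_sql(name, conn, index=False)
        return pd.read_sql_query(sql, conn)
    finally:
        conn.close()


orders = pd.DataFrame({"customer": ["a", "b", "a"], "amount": [10, 5, 7]})
totals = query_frames(
    "SELECT customer, SUM(amount) AS total FROM orders GROUP BY customer ORDER BY customer",
    orders=orders,
)
```

Copying every frame into SQLite on each call is simple but costs time and memory for large frames; for repeated queries you would keep one connection open instead.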
GitHub - c-bata/pandas-validator: Validation Library for pandas' DataFrame and Series.
yubessy 2016/08/27
was looking for something like this

pandas

Python
Efficiently create sparse pivot tables in pandas?
I'm working turning a list of records with two columns (A and B) into a matrix representation. I have been using the pivot function within pandas, but the result ends up being fairly large. Does pandas support pivoting into a sparse format? I know I can pivot it and then turn it into some kind of sparse representation, but isn't as elegant as I would like. My end goal is to use it as the input for
yubessy 2016/07/07
pandas

sparse
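One way to get a sparse pivot without ever materializing the dense table is to map both columns to integer codes with `pd.Categorical` and feed those codes straight into a SciPy sparse matrix. This is a minimal sketch, assuming SciPy is installed, that the A/B columns have no missing values (a missing value would get code -1), and that the desired cell value is a count of (A, B) pairs, i.e. a sparse `pd.crosstab`; `sparse_pivot` and the sample records are hypothetical.

```python
import numpy as np
import pandas as pd
from scipy import sparse


def sparse_pivot(df: pd.DataFrame, row: str, col: str):
    """Count (row, col) pairs into a CSR matrix without building a dense pivot."""
    rows = pd.Categorical(df[row])
    cols = pd.Categorical(df[col])
    data = np.ones(len(df), dtype=np.int64)
    # Duplicate (row, col) coordinates are summed when converting COO -> CSR.
    matrix = sparse.coo_matrix(
        (data, (rows.codes, cols.codes)),
        shape=(len(rows.categories), len(cols.categories)),
    ).tocsr()
    return matrix, rows.categories, cols.categories


records = pd.DataFrame({"A": ["u1", "u1", "u2", "u3", "u1"], "B": ["x", "y", "x", "z", "x"]})
matrix, row_labels, col_labels = sparse_pivot(records, "A", "B")
```

The returned category indexes give the row and column labels, so the CSR matrix can be passed directly to libraries that accept SciPy sparse input.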
Useful Pandas Snippets | Computers are for People
Even after almost two years of working with Pandas, the incredibly useful Python data analysis library, I still need to look up syntax for some common tasks. Finally got around to putting everything on a single “useful Pandas snippets” cheat sheet: these are essential tools for munging federal budget data.
yubessy 2016/05/02
Python

pandas
Jupyter Notebook Viewer
%load_ext watermark
%watermark -a 'Sebastian Raschka' -v -d -p pandas
yubessy 2016/05/02
Python

pandas

read later
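Customizing DataFrame display with Python Jupyter + pandas - StatsFragments
pandas v0.17.1 was released the other day. It is mainly a bug-fix release for v0.17.0, but it also adds the following feature, which I want to summarize here. HTML display customization: in Jupyter, a pandas DataFrame is automatically rendered as HTML, and you can now apply a wide range of CSS to that HTML flexibly. This entry gives examples slightly different from those in the linked official documentation. Style -- pandas documentation; the Jupyter Notebook created by @TomAugspurger (one of the committers). Important: as the official documentation also notes, this addition is in development / experimental as of v0.17.1, so breaking changes may happen in the future. If you have requests or notice anything, GitHub issu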
yubessy 2016/02/10
Jupyter

pandas

Python
12 Useful Pandas Techniques in Python for Data Manipulation
Introduction Python is fast becoming the preferred language in data science – and for good reason(s). It provides the larger ecosystem of a programming language and the depth of good scientific computation libraries. If you are starting to learn Python, have a look at learning path on Python. Among its scientific computation libraries, I found Pandas to be the most useful for data science operatio
yubessy 2016/01/08
pandas

tips
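How to use Treasure Data from Jupyter - Qiita
I was thinking of writing something about Jupyter + Pandas-TD when a great introductory article by Mr. Aruga of Cookpad appeared! Riding that wave, here I'd like to introduce a few ways to use Pandas-TD. Pandas-TD started as a way to combine Pandas and Treasure Data for fast data access, but lately its development has focused more on making interactive data exploration easier. The typical example is magic functions, which come in handy when you open Jupyter and want to run a query right away. If you are going to spend time on a data analysis, it is better to program with plain Pandas functions, but writing Python code for every quick log investigation is tedious. Automating whatever can be automated and getting the desired result as concisely as possible is what magic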
yubessy 2015/12/15
read later

Jupyter

TresureData

pandas

try later
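Mastering the plotting features of Python pandas - StatsFragments
pandas provides an API for visualization, so basic plots such as line charts and bar charts are available through a simple API. General usage is described in the official documentation: Visualization — pandas 0.17.1 documentation. These features are provided by a thin wrapper around matplotlib. Here I describe how to get somewhat more elaborate output than the plots in the documentation by adding a processing step on the pandas side. Note: some of these presentations do not really suit the sample data, but please bear with them as plotting examples. Importing packages:
    import matplotlib.pyplot as plt
    plt.style.use('ggplot')
    import matplotlib as mpl
    m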
yubessy 2015/11/16
read later

pandas

visualization

matplotlib

ggplot

Python
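A minimal sketch of "add a processing step on the pandas side, then plot": reshape long-format data into one column per series with `pivot_table`, then call `DataFrame.plot.bar(stacked=True)`, which delegates to matplotlib. This assumes matplotlib is installed; the headless Agg backend is selected so it runs without a display, and `plot_stacked` and the sales data are hypothetical.

```python
import matplotlib

matplotlib.use("Agg")  # headless backend; drop this line when working in Jupyter

import matplotlib.pyplot as plt
import pandas as pd

plt.style.use("ggplot")  # same style the article sets up


def plot_stacked(long_df: pd.DataFrame):
    # Reshape: one row per month, one column per product, summed sales as values.
    wide = long_df.pivot_table(index="month", columns="product", values="sales", aggfunc="sum")
    ax = wide.plot.bar(stacked=True, rot=0)
    ax.set_ylabel("sales")
    return ax


long_df = pd.DataFrame({
    "month": ["Jan", "Jan", "Feb", "Feb", "Mar", "Mar"],
    "product": ["A", "B", "A", "B", "A", "B"],
    "sales": [3, 1, 4, 1, 5, 9],
})
ax = plot_stacked(long_df)
plt.close("all")
```

Note that `pivot_table` sorts the index, so months appear alphabetically here; reindex `wide` with an explicit month order if calendar order matters.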
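Major changes in pandas 0.17.0 - StatsFragments
pandas 0.17.0 was released the other day, on 10/9. Among recent releases it adds an unusually large number of features. The important changes are listed as highlights in the release notes, but those are written for people who already use pandas fairly well. So here I describe the changes I think matter for lighter users, focusing on the following three points that affect user programs: unification of the sort API (sort_values / sort_index); improvements to the duplicate-removal API (drop_duplicates / duplicated); addition of the .plot accessor. Setup:
    import numpy as np
    import pandas as pd
    np.__version__
    # '1.10.1'
    pd.__version__
    # u'0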
yubessy 2015/10/18
pandas

python
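The first two API changes listed in this entry can be sketched as follows: since 0.17.0, `sort_values` sorts by column values and `sort_index` by the index (replacing the older `DataFrame.sort`), and `drop_duplicates` / `duplicated` take `keep='first' | 'last' | False` (replacing `take_last`). The sample frame is hypothetical.

```python
import pandas as pd

df = pd.DataFrame({"key": ["b", "a", "b", "c"], "val": [2, 1, 3, 4]})

# Sort by a column's values, then restore the original order via the index.
by_val = df.sort_values("val", ascending=False)
by_index = by_val.sort_index()

# keep="first": keep the first row of each duplicated key (the default).
first = df.drop_duplicates("key", keep="first")
# keep="last": keep the last row of each duplicated key.
last = df.drop_duplicates("key", keep="last")
# keep=False: drop every row whose key appears more than once.
unique_only = df.drop_duplicates("key", keep=False)

# duplicated() returns the boolean mask that drop_duplicates() uses.
mask_first = df.duplicated("key")
mask_all = df.duplicated("key", keep=False)
```

Both methods return new objects and preserve the surviving rows' original index labels.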
GitHub - adamhajari/spyre: a web application framework for python
yubessy 2015/07/27
pandas

Python

spyre

try later
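Python pandas: 3 tips for maintaining performance - StatsFragments
When you handle moderately large data with pandas, processing speed starts to matter. The official documentation describes using Cython or Numba to improve performance: Enhancing Performance — pandas 0.16.2 documentation. However, bringing in Cython or Numba when you only want to try something quickly is a hassle, yet you don't want things to be painfully slow either. For those cases, I want to organize the points that help preserve pandas' native performance as much as possible. In pandas as elsewhere, the right fix for a performance problem depends on where the bottleneck is. pandas internally branches its processing finely by data type and conditions, both for speed and for edge-case handling, so it is hard to offer a method that always makes things faster. With that premise in mind, the following, viewed from the internal implementation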
yubessy 2015/07/12
Python pandas: 3 tips for maintaining performance

pandas

python
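The article's three tips are cut off in this excerpt, so the following does not reproduce them. It is a sketch of one general rule that avoids Cython or Numba entirely: express the computation as whole-column (vectorized) operations instead of calling a Python function per row with `DataFrame.apply(axis=1)`. The column names and the discount rule are hypothetical.

```python
import numpy as np
import pandas as pd


def total_price_rowwise(df: pd.DataFrame) -> pd.Series:
    # Slow path: one Python-level function call per row.
    return df.apply(
        lambda r: r["price"] * r["qty"] * (0.9 if r["qty"] >= 10 else 1.0), axis=1
    )


def total_price_vectorized(df: pd.DataFrame) -> pd.Series:
    # Fast path: NumPy-backed column arithmetic, with np.where for the condition.
    discount = np.where(df["qty"] >= 10, 0.9, 1.0)
    return df["price"] * df["qty"] * discount


rng = np.random.default_rng(0)
df = pd.DataFrame({
    "price": rng.uniform(1, 100, 10_000),
    "qty": rng.integers(1, 20, 10_000),
})
```

Timing both functions (for example with `%timeit` in Jupyter) on the same frame is a quick way to check whether row-wise `apply` is the bottleneck before reaching for Cython or Numba.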
10 Minutes to Pandas — pandas 0.19.2 documentation
yubessy 2015/05/29
pandas

python

read later
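Table of contents for Python pandas entries - StatsFragments
This is a table of contents of the pandas-related entries on this blog. I was recently welcomed as a junior member of the pandas development team and the PyData group, and I want to put more effort into improving the package itself. If you have questions about usage, please @ me on Twitter. The emoji in the table of contents mean the following: 🔰: entries that, once you know them, let you do most basic operations. 🚧: information that is somewhat outdated as of v0.16.0 and needs updating to reflect feature improvements. 🚫: the feature has been deprecated, and an alternative approach will be needed in the future. Basics: Simple data manipulation with Python pandas 🔰; Summary of grouping/aggregation/transformation in Python pandas 🔰. Also, comparison entries corresponding to the above: R {dplyr}, {tidyr} R's data.tab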
yubessy 2015/04/29
pandas

python