[B! python] hagino_3000's bookmarks

Jupyter Notebook Viewer

hagino_3000 2021/09/02

"Probabilistic-Programming-and-Bayesian-Methods-for-Hackers" MCMC, PyMC

python

GX: a proactive, collaborative data quality platform

Have confidence in your data, no matter what. Built on the strength of a robust worldwide data quality community, the Great Expectations platform is revolutionizing data quality and collaboration. A shared understanding of your data: getting everyone on the same page is essential to deriving business value from data. Great Expectations offers an intuitive approach to testing data that automatically ge
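Great Expectations frames data quality as tests ("expectations") run against your data. The stdlib-only sketch below illustrates that idea. It is not the GX API: the helpers `expect_not_null` and `expect_between` and the sample rows are hypothetical.

```python
from typing import Any, Dict, List

Row = Dict[str, Any]


def expect_not_null(rows: List[Row], column: str) -> Dict[str, Any]:
    """Hypothetical check: every row has a non-None value in `column`."""
    bad = [i for i, row in enumerate(rows) if row.get(column) is None]
    return {"check": f"not_null({column})", "success": not bad, "failed_rows": bad}


def expect_between(rows: List[Row], column: str, low: float, high: float) -> Dict[str, Any]:
    """Hypothetical check: every non-None value of `column` lies in [low, high]."""
    bad = [
        i
        for i, row in enumerate(rows)
        if row.get(column) is not None and not (low <= row[column] <= high)
    ]
    return {"check": f"between({column}, {low}, {high})", "success": not bad, "failed_rows": bad}


# Made-up sample data: one missing age and one implausible age.
rows: List[Row] = [
    {"user_id": 1, "age": 34},
    {"user_id": 2, "age": None},
    {"user_id": 3, "age": 150},
]

results = [expect_not_null(rows, "age"), expect_between(rows, "age", 0, 120)]
for r in results:
    print(r["check"], "OK" if r["success"] else f"FAILED rows {r['failed_rows']}")
```

Run as-is, this prints `not_null(age) FAILED rows [1]` and `between(age, 0, 120) FAILED rows [2]`. Each check returns a structured result instead of raising an exception, so a report can list every failure at once.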

hagino_3000 2021/08/19

data profiling

python

Tackling data science legacy code #reprotech

Slides from my talk at Repro Tech #7 Practical AI Supported by NAVITIME, held on 2019/04/04. https://repro-tech.connpass.com/event/124326/

hagino_3000 2019/04/08

python

I made a Flake8 and Mypy plugin for Python - Memo

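tl;dr I made a Vim plugin that runs Flake8 and Mypy checks. There are already who-knows-how-many Flake8 plugins, but none of them suited me:
- pyflakes-vim (GitHub - kevinw/pyflakes-vim: on the fly Python checking in Vim with PyFlakes) has long been deprecated and can't parse Python 3 type-hint syntax.
- The existing Flake8 plugins (Syntastic, neomake) don't run a check until you save, and Syntastic is slow on top of that.
- khuno.vim checks in real time without saving, but it manages the error list in its own buffer instead of using QuickFix.

So in the end I wrote my own. GitHub - heavenshell/v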

hagino_3000 2017/10/18

This is nice.

python

pytest-quickcheck

Classifiers: Development Status :: 4 - Beta; Intended Audience :: Developers; Operating System :: MacOS :: MacOS X, Microsoft :: Windows, POSIX; Programming Language :: Python, Python :: 2.7, Python :: 3, Python :: 3.7, Python :: 3.8, Python :: 3.9, Python :: 3.10; Topic :: Software Development :: Libraries, Software Development :: Quality Assurance, Software Development :: Testing, Utilities. Requirements: Python 2.7 or 3.7 and
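pytest-quickcheck brings QuickCheck-style randomized testing to pytest. The excerpt does not show the plugin's API, so the sketch below shows the underlying idea with the standard library only. It generates many random inputs and reports the first one that breaks a property. `check_property`, `random_int_list` and the two properties are hypothetical examples, not part of pytest-quickcheck.

```python
import random
from typing import Callable, List, Optional


def random_int_list(rng: random.Random, max_len: int = 20) -> List[int]:
    """Hypothetical input generator: a random-length list of random ints."""
    return [rng.randint(-1000, 1000) for _ in range(rng.randint(0, max_len))]


def check_property(
    prop: Callable[[List[int]], bool], runs: int = 200, seed: int = 0
) -> Optional[List[int]]:
    """Run `prop` on `runs` random inputs; return the first counterexample, or None."""
    rng = random.Random(seed)  # seeded so failures are reproducible
    for _ in range(runs):
        xs = random_int_list(rng)
        if not prop(xs):
            return xs
    return None


def reverse_twice_is_identity(xs: List[int]) -> bool:
    """A true property: reversing twice gives back the original list."""
    return list(reversed(list(reversed(xs)))) == xs


def sort_is_identity(xs: List[int]) -> bool:
    """A deliberately false property: sorting never changes a list."""
    return sorted(xs) == xs


print(check_property(reverse_twice_is_identity))  # None: no counterexample found
print(check_property(sort_is_identity))           # some unsorted list
```

A fixed seed trades some randomness across runs for reproducible failures. Real tools usually also "shrink" a counterexample to a minimal one, which this sketch omits.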

hagino_3000 2017/09/16

python

GitHub - cookiecutter/cookiecutter-django: Cookiecutter Django is a framework for jumpstarting production-ready Django projects quickly.

hagino_3000 2017/05/17

python

Python pandas: handling missing values, outliers and discretization - StatsFragments

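Data preprocessing involves several steps. The book 「データ分析プロセス」 (Data Analysis Process) covers in detail which data characteristics preprocessing must account for, such as missing values, and how to handle them. However, the book's samples are in R, so it isn't obvious how to do the same things in Python. I want to do them with pandas. (データ分析プロセス (シリーズ Useful R 2), by 福島真太朗 and 金明哲, 共立出版, published 2015/06/25, hardcover.) That said, pandas itself has no statistical or machine-learning preprocessing methods. Python also has fewer packages for statistical preprocessing than R, and many methods can't be used unless you implement them yourself. This post skips such methods and focuses on the preprocessing and visualization that pandas can do. It also doesn't explain the methods themselves, so for details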
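Here is a minimal pandas sketch of the three preprocessing steps the entry names. It assumes a small made-up numeric column `x`. Median imputation, the 1.5 × IQR outlier rule and the bin edges are illustrative choices, not necessarily the methods the article uses.

```python
import numpy as np
import pandas as pd

# Made-up data with one missing value and one extreme value.
df = pd.DataFrame({"x": [1.0, 2.0, np.nan, 4.0, 100.0, 3.0]})

# Missing values: count them, then fill with the median of the observed values.
n_missing = int(df["x"].isna().sum())
df["x_filled"] = df["x"].fillna(df["x"].median())

# Outliers: flag values outside [Q1 - 1.5*IQR, Q3 + 1.5*IQR].
q1, q3 = df["x_filled"].quantile([0.25, 0.75])
iqr = q3 - q1
df["is_outlier"] = ~df["x_filled"].between(q1 - 1.5 * iqr, q3 + 1.5 * iqr)

# Discretization: cut into labeled, right-closed bins (0, 2], (2, 5], (5, inf).
df["x_bin"] = pd.cut(df["x_filled"], bins=[0, 2, 5, np.inf], labels=["low", "mid", "high"])

print(n_missing)
print(df)
```

Here `fillna` uses the median of the observed values (3.0). Only 100.0 falls outside the IQR fence, and `pd.cut` returns a categorical column.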

hagino_3000 2016/02/07

My top 5 "new" Python modules of 2015 | POSTD

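Recently I showed this module to my wife, and she was amazed by how simple and practical it is. joblib: I had known about joblib for a while, but I didn't really understand it and took it for a grab bag of assorted features. That impression hasn't changed much, but it turns out to be a genuinely useful module. A colleague at Flowminder recommended joblib to me again, and I went on to use it widely in my data-analysis code. Let me introduce its features. joblib has three main features: caching, parallelization, and persistence (saving and loading data). To be honest, I haven't used the parallel-programming features yet, but I have used the other two frequently. The caching feature lets you easily "memoize" a function with a simple decorator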

hagino_3000 2016/01/31

python

Luigi逆引きリファレンス - Qiita

import luigi

class MyTask(luigi.Task):
    date = luigi.DateParameter()

    def requires(self):
        return MyDependentTask(self.date)

    def run(self):
        with self.output().open('w') as output:
            with self.input().open('r') as input:
                for line in input:
                    ret = do_something(line)
                    output.write(ret)
                    output.write('\n')

    def output(self):
        return luigi.LocalTarget('./out2_{0}.txt'.format(self.date.isoformat()))

class MyDepen
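The Luigi excerpt relies on three hooks. `requires()` names upstream tasks, `output()` names the target a task produces, and `run()` does the work. The stdlib-only sketch below imitates that contract to show how a scheduler can decide what to run: a task counts as complete once its output exists, and its dependencies run first. It is a hypothetical toy, not Luigi's implementation; `Task`, `build`, `Extract`, `Transform` and the in-memory `STORE` are invented here.

```python
from typing import Dict, List

STORE: Dict[str, str] = {}  # stands in for output files on disk


class Task:
    def requires(self) -> List["Task"]:
        return []

    def output(self) -> str:
        raise NotImplementedError

    def run(self) -> None:
        raise NotImplementedError

    def complete(self) -> bool:
        # Mirrors Luigi's default notion: done once the output target exists.
        return self.output() in STORE


def build(task: Task, log: List[str]) -> None:
    """Depth-first: complete every dependency, then run the task itself."""
    if task.complete():
        return
    for dep in task.requires():
        build(dep, log)
    task.run()
    log.append(type(task).__name__)


class Extract(Task):
    def __init__(self, date: str):
        self.date = date

    def output(self) -> str:
        return f"raw_{self.date}.txt"

    def run(self) -> None:
        STORE[self.output()] = "a\nb\n"


class Transform(Task):
    def __init__(self, date: str):
        self.date = date

    def requires(self) -> List[Task]:
        return [Extract(self.date)]

    def output(self) -> str:
        return f"out2_{self.date}.txt"

    def run(self) -> None:
        raw = STORE[self.requires()[0].output()]
        STORE[self.output()] = "".join(line.upper() + "\n" for line in raw.splitlines())


log: List[str] = []
build(Transform("2016-01-20"), log)
build(Transform("2016-01-20"), log)  # no-op: the output already exists
print(log)  # ['Extract', 'Transform']
```

Using "output exists" as the completion test is what makes reruns cheap and idempotent. Deleting an output forces that task and anything downstream to be rebuilt on the next run.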

hagino_3000 2016/01/20

I put together a memo of assorted notes on Python's Luigi.

python

GitHub - amontalenti/elements-of-python-style: Goes beyond PEP8 to discuss what makes Python code feel great. A Strunk & White for Python.

hagino_3000 2016/01/08

Looks good. For the "Flat is better than nested" example, though, I think I'd still write it the nested way.

python

Solving integer and linear programming problems in Python (PuLP edition) - Kazuhiro KOBAYASHI

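Implementation method 2: Relying entirely on the PuLP documentation, I tried a different implementation of the set partitioning problem (for the vehicle routing problem). This version may be easier to use in some cases. The formulation is the same as in part 1, so please refer to the description there. (Jump to implementation method 1, written earlier.) The code below has been line-wrapped for display, so it won't run if you copy and paste it as-is; remove the extra line breaks as needed. First, the set partitioning problem is defined as follows:

def DefMasterProblem(vehicles, vehicle_feasible_routes, cargos, route_cost, vartype):
    feasible_routes={}
    for v in vehicles:
        feasible_routes+=vehicle_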
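The excerpt defers the formulation to part 1, which isn't included here. For reference, a textbook set-partitioning master problem for vehicle routing looks like the following. The symbols are assumptions and may not match the article's notation: $R_v$ is the set of feasible routes for vehicle $v$, $c_r$ is the cost of route $r$, $a_{ir} = 1$ if route $r$ carries cargo $i$ (else $0$), and $x_r$ is a binary route-selection variable.

```latex
\begin{aligned}
\min_{x} \quad & \sum_{v \in V} \sum_{r \in R_v} c_r \, x_r \\
\text{s.t.} \quad & \sum_{v \in V} \sum_{r \in R_v} a_{ir} \, x_r = 1 && \forall i \in I \quad \text{(each cargo carried exactly once)} \\
& \sum_{r \in R_v} x_r \le 1 && \forall v \in V \quad \text{(at most one route per vehicle)} \\
& x_r \in \{0, 1\} && \forall v \in V,\ r \in R_v
\end{aligned}
```

The equality constraint is what makes this set *partitioning* rather than set *covering*, where it would be $\ge 1$.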

hagino_3000 2015/12/30

python
pulp

Python pandas: 3 tips for preserving performance - StatsFragments

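When you handle moderately large data in pandas, processing speed starts to matter. The official documentation describes using Cython or Numba to improve performance (Enhancing Performance — pandas 0.16.2 documentation). Reaching for Cython or Numba when you just want to try something quickly is a hassle, yet you don't want things to be painfully slow either. For those cases, I want to organize the points that help preserve pandas' own performance as much as possible. Not just in pandas, the right fix for a performance problem depends on where the bottleneck is. For speed and edge-case handling, pandas internally splits its processing into many paths depending on data types and conditions, so it's hard to offer a rule like "always do this and it gets faster!" With that caveat, the following, viewed from the internal implementation,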
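The three tips themselves are cut off in this excerpt. As a general illustration, not one of the article's tips, the sketch below computes the same column two ways: with a row-wise `apply`, and with a vectorized expression that stays in pandas' fast numeric code paths. Both are timed. The column names and data are made up, and timings will vary by machine.

```python
import time

import numpy as np
import pandas as pd

rng = np.random.default_rng(0)
n = 20_000
df = pd.DataFrame({"price": rng.random(n), "qty": rng.integers(1, 10, n)})

# Row-wise apply: builds a Series per row and calls a Python lambda each time.
t0 = time.perf_counter()
total_apply = df.apply(lambda row: row["price"] * row["qty"], axis=1)
t_apply = time.perf_counter() - t0

# Vectorized: a single multiplication over whole numeric columns.
t0 = time.perf_counter()
total_vec = df["price"] * df["qty"]
t_vec = time.perf_counter() - t0

print(f"apply: {t_apply:.4f}s  vectorized: {t_vec:.4f}s")
print(np.allclose(total_apply, total_vec))  # True
```

Both paths give the same numbers. The difference is that `apply(axis=1)` pays Python-level overhead once per row, while the vectorized form runs one loop in compiled code.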

hagino_3000 2015/10/19

PyConJP 2015: I gave talks on pandas and Dask - StatsFragments

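On the 10th and 11th I took part in PyCon JP. Thank you to everyone who attended and to all the staff. The slides are below. pandas internals: an explanation of pandas' internal implementation with a view to performance, plus a few tips. I may translate it at some point. (speakerdeck.com) Dask: a lightweight parallel/distributed framework (LT) (speakerdeck.com) Source material: each talk builds on the following entries, with new content added. sinhrks.hatenablog.com sinhrks.hatenablog.com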

hagino_3000 2015/10/15

What Python can learn from Erlang - 2015.2

What can we learn from Erlang for building reliable high-concurrency services in Python? This talk shows some techniques used in Erlang and how they can be used to solve problems more efficiently in Python.

hagino_3000 2015/10/10

pycon
python

Welcome to Invoke! — Invoke documentation

Welcome to Invoke! Invoke is a Python (2.7 and 3.4+) library for managing shell-oriented subprocesses and organizing executable Python code into CLI-invokable tasks. It draws inspiration from various sources (make/rake, Fabric 1.x, etc.) to arrive at a powerful & clean feature set. To find out what's new in this version of Invoke, please see the changelog. The project maintainer keeps a roadmap on

hagino_3000 2015/09/16

A Fabric-like deployment tool

Differences between Python versions and how to bridge them - CUBE SUGAR CONTAINER

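What this article aims for: Python currently has two major versions in use side by side, the 2.x series and the 3.x series, which are partly incompatible. Given that, this article has two goals. The first is to summarize the differences between the 2.x and 3.x series. Few places currently collect the differences between each version, and I wanted a single page where I could see all of them at a glance. The second is to summarize how to write source code that absorbs the differences between 2.x and 3.x. Knowledge on this isn't well organized on the Web either; for this, the python-future package is currently the hot option. The supported versions are: 2.x series: 2.6 and 2.7; 3.x series: 3.3 and 3.4. Before getting into the main topic, a summary of the recent state of Python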
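A common way to write one codebase for both series has two parts. First, opt into Python 3 behavior with `from __future__ import print_function, division` at the top of each module. Second, alias names that differ between versions. The sketch below shows only the aliasing half. It is a generic stdlib idiom, not python-future's API, and the names `PY2`, `text_type`, `zip_` and `to_text` are illustrative choices. It runs as-is on Python 3.

```python
import sys

PY2 = sys.version_info[0] == 2

try:
    text_type = unicode  # Python 2: the unicode text type  # noqa: F821
except NameError:
    text_type = str      # Python 3: str is already unicode

try:
    from itertools import izip as zip_  # Python 2: lazy zip lives in itertools
except ImportError:
    zip_ = zip                            # Python 3: built-in zip is already lazy


def to_text(value, encoding="utf-8"):
    """Return `value` as a text string on either major version."""
    if isinstance(value, bytes):
        return value.decode(encoding)
    return text_type(value)


print(PY2, to_text(b"caf\xc3\xa9"), list(zip_([1, 2], "ab")))
```

The `try`/`except` form detects features rather than checking version numbers, so the same file needs no branching on `sys.version_info` for these names.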

hagino_3000 2015/09/07

python

Large Scale Non-Linear Learning (Pygotham 2015)

Out of core learning with scikit-learn, and why not to use a cluster.

hagino_3000 2015/08/18

Vote: The technology behind an ad network's data analysis team | PyCon JP 2015 in TOKYO

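# What is an ad network?
An ad network is a system in Internet display advertising that bundles multiple advertisers and multiple media (ad slots) together to deliver ads. As terms like "ad tech" and "programmatic trading" suggest, online advertising uses a wide range of technologies.

## The data analysis team and Python
The data analysis team's mission is to raise profitability by improving the delivery algorithms and by using data analysis to find effective measures. In that work we use Python in many places, from data collection through algorithm experiments to measuring results. Using the following tasks as examples, I will show how we put Python to use:
- Building the analytics platform
- Data collection
- Delivery optimization
- CTR prediction
- Click data analysis
- CPC optimization
- The multi-armed bandit problem
- Reporting

We mainly use the following:
- N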
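The proposal lists the multi-armed bandit problem among its tasks but doesn't say which algorithm the team uses. As a generic illustration, here is epsilon-greedy, one of the simplest bandit strategies, choosing among ad creatives by observed click-through rate. The arm names, the hidden click probabilities and the `epsilon` value are made up for the simulation.

```python
import random
from typing import Dict, Tuple


def epsilon_greedy(
    true_ctrs: Dict[str, float], n_rounds: int = 20_000, epsilon: float = 0.1, seed: int = 42
) -> Tuple[Dict[str, int], Dict[str, int]]:
    """Simulate epsilon-greedy ad selection against hidden click probabilities."""
    rng = random.Random(seed)
    arms = list(true_ctrs)
    shows = {a: 0 for a in arms}
    clicks = {a: 0 for a in arms}

    def observed_ctr(a: str) -> float:
        return clicks[a] / shows[a] if shows[a] else 0.0

    for _ in range(n_rounds):
        if rng.random() < epsilon:
            arm = rng.choice(arms)             # explore: show a random ad
        else:
            arm = max(arms, key=observed_ctr)  # exploit: best observed CTR so far
        shows[arm] += 1
        if rng.random() < true_ctrs[arm]:      # simulated user click
            clicks[arm] += 1
    return shows, clicks


shows, clicks = epsilon_greedy({"ad_A": 0.05, "ad_B": 0.20, "ad_C": 0.10})
print(shows)
print({a: round(clicks[a] / shows[a], 3) for a in shows})
```

`epsilon` sets the trade-off: a fixed 10% of impressions keeps exploring every ad, so estimates stay fresh, at the cost of always spending some traffic on weaker creatives.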

hagino_3000 2015/07/16

My proposal was accepted.

python

The Python Challenge

The first programming riddle on the net. There are currently 33 levels. Click here to get challenged. What people have said about us: "These sorts of things are in my opinion the best way to learn a language." (brberg at Media Cloisters); "It's the best web site of the year so far." (Andy Todd at halfcooked); "Addictive way to learn the ins and outs of Python.. a must for all programmers!" (salimma at s