Anna Herlihy - Monary: Really Fast Analysis with MongoDB and NumPy

PyData NYC 2014
MongoDB is a scalable, flexible way to store large data sets. Python and NumPy provide a comprehensive toolkit for analysis. But they don't work well together: the official Python driver for MongoDB is inefficient at loading MongoDB data into NumPy arrays. Enter Monary. It's a fast, specialized driver written in C that copies data directly from MongoDB documents into NumPy arrays. This talk will provide an introduction to Monary, and practical demonstrations of Monary's speed benefits and uses. We'll use Monary to store data about millions of New York taxi rides in MongoDB, and we'll analyze it using scientific Python tools to find surprising outcomes about stingy riders and long-suffering drivers. The combination of MongoDB, Monary, and NumPy is very powerful: it's a data analysis pipeline that is scalable, convenient, and completely free and open source.
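To make the "documents into NumPy arrays" idea concrete, here is a minimal sketch of the column-oriented layout the description refers to. It does not use Monary or MongoDB at all: the documents are hypothetical in-memory dicts, the field names (`fare`, `tip`, `distance`) are assumptions made for illustration, and the copy loop is plain Python, whereas Monary is described as doing this copy in C. It only shows the end result, one typed array per field, and why that shape is convenient for analysis.

```python
import numpy as np

# Hypothetical taxi-ride documents, shaped like the dicts a
# general-purpose driver hands back (field names are assumptions).
docs = [
    {"fare": 7.5, "tip": 1.5, "distance": 1.2},
    {"fare": 12.0, "tip": 0.0, "distance": 3.4},
    {"fare": 30.0, "tip": 6.0, "distance": 9.8},
]


def to_columns(documents, fields, dtype="float64"):
    """Copy each named field into its own preallocated NumPy array."""
    columns = {name: np.empty(len(documents), dtype=dtype) for name in fields}
    for i, doc in enumerate(documents):
        for name in fields:
            columns[name][i] = doc[name]
    return columns


columns = to_columns(docs, ["fare", "tip", "distance"])

# Once the data is columnar, analysis is a single vectorized expression.
tip_rate = columns["tip"] / columns["fare"]
print(tip_rate)
```

With one array per field, a question such as "what fraction of the fare do riders tip?" becomes array arithmetic rather than a loop over dicts.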

Comments

Could it not be the case that there are fewer trips on weekends simply because more cab drivers work Monday-Friday, i.e. cab availability is lower?

paulmcgrath

Wow, the reason for the fast speed is a Python dict. Great!!

PyMongo: 150,000/s
Monary: 1,700,000/s

KyunghoonKim