Statistical Data Analysis in Python, SciPy2013 Tutorial, Part 1 of 4

Presenter: Christopher Fonnesbeck

Description

This tutorial will introduce the use of Python for statistical data analysis, using data stored as Pandas DataFrame objects. Much of the work involved in analyzing data resides in importing, cleaning and transforming data in preparation for analysis. Therefore, the first half of the course consists of a 2-part overview of basic and intermediate Pandas usage that will show how to effectively manipulate datasets in memory. This includes tasks like indexing, alignment, join/merge methods, date/time types, and handling of missing data. Next, we will cover plotting and visualization using Pandas and Matplotlib, focusing on creating effective visual representations of your data while avoiding common pitfalls. Finally, participants will be introduced to methods for statistical data modeling using some of the advanced functions in NumPy, SciPy and Pandas. This will include fitting your data to probability distributions, estimating relationships among variables using linear and non-linear models, and a brief introduction to Bayesian methods. Each section of the tutorial will involve hands-on manipulation and analysis of sample datasets, to be provided to attendees in advance.
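
As a rough illustration of the in-memory manipulation described above (join/merge, selection, missing data, and grouping), here is a minimal sketch. The two small tables and all of their column names are hypothetical and are not the tutorial's sample datasets.

```python
import numpy as np
import pandas as pd

# Hypothetical sample data (not the tutorial's datasets).
patients = pd.DataFrame({
    "patient": [1, 2, 3, 4],
    "group": ["control", "treated", "control", "treated"],
})
measurements = pd.DataFrame({
    "patient": [1, 2, 3],
    "value": [10.0, np.nan, 30.0],
})

# Join/merge: a left join keeps every patient; patient 4 has no
# measurement, so its value becomes NaN after the merge.
merged = patients.merge(measurements, on="patient", how="left")

# Selection and subsetting with a boolean mask.
treated = merged.loc[merged["group"] == "treated"]

# Missing data: count the gaps, then fill them with the observed mean.
n_missing = int(merged["value"].isna().sum())
filled = merged["value"].fillna(merged["value"].mean())

# GroupBy: mean value per group (NaN entries are skipped).
group_means = merged.groupby("group")["value"].mean()
```

Filling with the mean is only one of several options; dropping incomplete rows with `dropna()` is another common choice.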

The target audience for the tutorial includes all new Python users, though we recommend that users also attend the NumPy and IPython session in the introductory track.

Outline

Introduction to Pandas (45 min)

Importing data
Series and DataFrame objects
Indexing, data selection and subsetting
Hierarchical indexing
Reading and writing files
Date/time types
String conversion
Missing data
Data summarization

Data Wrangling with Pandas (45 min)

Indexing, selection and subsetting
Reshaping DataFrame objects
Pivoting
Alignment
Data aggregation and GroupBy operations
Merging and joining DataFrame objects

Plotting and Visualization (45 min)

Time series plots
Grouped plots
Scatterplots
Histograms
Visualization pro tips

Statistical Data Modeling (45 min)

Fitting data to probability distributions
Linear models
Spline models
Time series analysis
Bayesian models
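
To give a flavor of the first two modeling topics in the outline, the sketch below fits a normal distribution and a straight line to simulated data. It uses NumPy only, with made-up parameters and a seeded generator; `np.random.default_rng` needs a much newer NumPy than the minimum listed for the tutorial, and the tutorial itself may use different functions.

```python
import numpy as np

rng = np.random.default_rng(0)  # seeded so the run is repeatable

# Fitting data to a probability distribution: for a normal distribution,
# the maximum likelihood estimates are the sample mean and the
# population-style standard deviation (ddof=0).
sample = rng.normal(loc=5.0, scale=2.0, size=10_000)  # simulated data
mu_hat = sample.mean()
sigma_hat = sample.std(ddof=0)

# Linear model: least-squares fit of y = slope * x + intercept.
x = np.linspace(0.0, 10.0, 200)
y = 2.0 * x + 1.0 + rng.normal(scale=0.1, size=x.size)  # simulated data
slope, intercept = np.polyfit(x, y, deg=1)  # highest power first

print(f"normal fit: mu={mu_hat:.2f}, sigma={sigma_hat:.2f}")
print(f"linear fit: slope={slope:.2f}, intercept={intercept:.2f}")
```

The estimates should land close to the simulation parameters (mean 5, standard deviation 2, slope 2, intercept 1), which is a quick sanity check that the fit recovers known values.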

Required Packages

Python 2.7 or higher (including Python 3)
pandas 0.11.1 or higher, and its dependencies
NumPy 1.6.1 or higher
matplotlib 1.0.0 or higher
pytz
IPython 0.12 or higher
pyzmq
tornado
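
One way to check which of these packages are present before the session is sketched below. It assumes Python 3.8 or newer (for `importlib.metadata`) and assumes the PyPI distribution names shown in `REQUIRED`; it only reports installed versions and does not compare them against the minimums listed above.

```python
from importlib.metadata import PackageNotFoundError, version

# Distribution names as published on PyPI (an assumption; "ipython" and
# "pyzmq" are the install names for the IPython and zmq modules).
REQUIRED = ["pandas", "numpy", "matplotlib", "pytz", "ipython", "pyzmq", "tornado"]


def installed_versions(names=REQUIRED):
    """Map each distribution name to its installed version, or None if absent."""
    found = {}
    for name in names:
        try:
            found[name] = version(name)
        except PackageNotFoundError:
            found[name] = None
    return found


if __name__ == "__main__":
    for name, ver in installed_versions().items():
        print(f"{name:12s} {ver or 'NOT INSTALLED'}")
```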

Comments

The actual tutorial starts at 15:50. Upvote to save humanity some time.

robinsretrorestoration

Awesome, simple and fast-paced tutorial for using pandas. Saved me a lot of time!

HingolCalero

Very good tutorial. I learned pandas using this tutorial, which saved me a lot of time. Thanks for uploading the videos!

Aprilreadygo

Excellent tutorial. Very helpful in learning more about Pandas.

xyznumber

A very good tutorial. The video quality is also high.

jiym

For statistics, R is better for me. What do you think?

eltorofuertos

This is a really good tutorial for beginners. Where can one get the data to follow this tutorial?

elijahatuku

Help me out, guys. I'm a beginner "programmer" (I don't deserve that title yet). I am interested in learning data analysis in Python. This talk seems interesting, but what are the prerequisites for it? Can a newbie follow and learn much from it? I don't want to invest time in it just to find that most of it is way over my head.

mustafaadam

Running !cat data/microbiome.csv says that 'cat' is not recognized as an internal or external command.

anshitsingh

Too much interruption/interaction from/with the audience :/

ZetaReticulli