Thomas J. Fan - Parallelism in Numerical Python Libraries | PyData NYC 2022

Python libraries such as NumPy, SciPy, and scikit-learn can run computational routines on multiple CPU cores. These libraries implement parallelism through a wide range of programming interfaces. We will learn when and how to use these interfaces by examining how Python libraries implement parallelism. Specifically, we will discuss high-level interfaces such as Python's threading and multiprocessing modules, and we will cover lower-level parallel primitives such as pthreads and OpenMP. Some libraries use an ahead-of-time compiler like Cython or a just-in-time compiler like Numba to parallelize their computational routines. Throughout this talk, we will explore each interface's advantages, disadvantages, and potential issues for writing parallelized code. When multiple forms of parallelism run simultaneously, controlling how many cores your program uses is essential to prevent oversubscription, where more threads are active than there are cores to run them. We will learn to use context managers with threadpoolctl, environment variables, and library-specific APIs to control parallelism. This talk is for an intermediate audience that wants to understand parallelism in the PyData stack.
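As a rough illustration of the environment-variable approach mentioned above, here is a minimal sketch that is not taken from the talk itself. The helper names threads_per_worker and limit_native_threads are hypothetical. The variables OMP_NUM_THREADS, OPENBLAS_NUM_THREADS, and MKL_NUM_THREADS are the ones commonly read by OpenMP, OpenBLAS, and MKL when they start up, so they generally need to be set before the numerical library is imported.

```python
import os

# Environment variables commonly read by native thread pools at load time.
THREAD_ENV_VARS = ("OMP_NUM_THREADS", "OPENBLAS_NUM_THREADS", "MKL_NUM_THREADS")


def threads_per_worker(n_workers, n_cores=None):
    """Hypothetical helper: split the cores evenly across worker processes.

    Each worker always gets at least one thread, so the total can exceed the
    core count only when there are more workers than cores.
    """
    if n_workers < 1:
        raise ValueError("n_workers must be at least 1")
    if n_cores is None:
        n_cores = os.cpu_count() or 1
    return max(1, n_cores // n_workers)


def limit_native_threads(n_threads, environ=os.environ):
    """Hypothetical helper: cap native thread pools via environment variables.

    Call this before importing NumPy, SciPy, or scikit-learn; libraries that
    are already loaded may not re-read these variables.
    """
    for name in THREAD_ENV_VARS:
        environ[name] = str(n_threads)
    return {name: environ[name] for name in THREAD_ENV_VARS}


if __name__ == "__main__":
    # Example: 4 worker processes on an assumed 8-core machine.
    budget = threads_per_worker(n_workers=4, n_cores=8)
    print(limit_native_threads(budget, environ={}))
```

For limits that should apply only to one region of an already-running program, the threadpoolctl package offers a context manager, used as `with threadpool_limits(limits=1): ...`, which is the runtime counterpart to setting these variables up front.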
Bio:
Thomas J. Fan
Thomas J. Fan is a Staff Software Engineer at Quansight Labs and a maintainer of scikit-learn, an open-source machine learning library for Python. Previously, Thomas worked at Columbia University to improve interoperability between scikit-learn and AutoML systems. He is a maintainer of skorch, a neural network library that wraps PyTorch. Thomas has a Master's in Mathematics from NYU and a Master's in Physics from Stony Brook University.
===
PyData is an educational program of NumFOCUS, a 501(c)(3) non-profit organization in the United States. PyData provides a forum for the international community of users and developers of data analysis tools to share ideas and learn from each other. The global PyData network promotes discussion of best practices, new approaches, and emerging technologies for data management, processing, analytics, and visualization. PyData communities approach data science using many languages, including (but not limited to) Python, Julia, and R.
PyData conferences aim to be accessible and community-driven, with novice to advanced level presentations. PyData tutorials and talks bring attendees the latest project features along with cutting-edge use cases.
00:00 Welcome!
00:10 Help us add time stamps or captions to this video! See the description for details.