Here’s a Better Way to Manage Configs in Your Data Science Projects

Dealing with configuration management neatly in your project is going to save you a lot of time. In this video, I show you how to use Hydra to make managing configurations a breeze.

🎓 Courses:

👍 If you enjoyed this content, give this video a like. If you want to watch more of my upcoming videos, consider subscribing to my channel!

👀 Channel code reviewer board:
- Yoriz
- Ryan Laursen
- Sybren A. Stüvel
- Dale Hagglund

🔖 Chapters:
0:00 Intro
0:51 Explaining the example
5:01 What makes dealing with configs hard?
6:30 Accessing configuration settings
8:59 Configuration settings with Hydra
24:09 Using multiple files and folders
27:11 Tips for dealing with configuration settings

#arjancodes #softwaredesign #python

DISCLAIMER - The links in this description might be affiliate links. If you purchase a product or service through one of those links, I may receive a small commission. There is no additional charge to you. Thanks for supporting my channel so I can continue to provide you with free content each week!
Comments

For those looking for a solution that doesn't require you to create the store, work with decorators, etc. while still allowing dynamic, nested configs, I would recommend looking into OmegaConf. It's the library that Hydra is built on top of, and it allows for defaults, nested configurations, multiple files, etc., but you can then create your own method or class to customize how configs are read and passed around the program.

Vanessa-vzcw

Arjan, I really appreciate that you cover topics that no one else does and/or you do so in a very in depth way. The level of professionalism that you bring to these topics is unlike anything else out there and I’m loving it.

cetilly

I’ve seen a few people saying they think the overhead is more work than benefit. I would probably agree for some workflows, but I have to say that for machine learning, or any kind of work with tons of configuration parameters and experimentation, this library is a lifesaver. It cleans up the code a lot.

kayb

I am usually on board with what you have to say. But here you define the configuration both in the YAML file and in the data class, which seems like a lot of added code to solve this problem.

niklase

Great video!
I agree with Hydra being a bit too convoluted; that's why I limit my projects to working with OmegaConf (which is also what Hydra uses under the hood). You should give it a try.

Xaelum

I'm a big fan of JSON config files coupled with Pydantic to cast the configuration into a type-friendly structure. As a previous video of yours emphasized, this also lets the config be validated and/or cast (e.g. automatic conversion from strings to Path types).

mauricepasternak
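
The JSON-plus-typed-structure idea in the comment above could look roughly like the sketch below. It deliberately uses a standard-library dataclass as a stand-in for Pydantic, so the casting and validation are written by hand; the `Settings` class, its fields, and the JSON content are all hypothetical.

```python
import json
from dataclasses import dataclass
from pathlib import Path


@dataclass
class Settings:
    """Hypothetical typed config; a stdlib stand-in for a Pydantic model."""

    data_dir: Path
    epoch_count: int

    def __post_init__(self) -> None:
        # Cast and validate by hand; Pydantic would derive this from the type hints.
        self.data_dir = Path(self.data_dir)
        self.epoch_count = int(self.epoch_count)
        if self.epoch_count <= 0:
            raise ValueError("epoch_count must be positive")


# Hypothetical JSON config; in a project this would be read from a file.
raw = json.loads('{"data_dir": "data/raw", "epoch_count": "10"}')
settings = Settings(**raw)
```

After loading, `settings.data_dir` is a `Path` and `settings.epoch_count` is an `int`, even though both arrived as strings.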

Nice video as per usual!

One small thing which can be improved related to pathlib is the following syntactic sugar:

data_path = Path(root_dir) / data_file

this way you don't have to use f-strings and it's a bit clearer in my opinion.

mrswats
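
A small side-by-side of the pathlib suggestion above, as a sketch. The values of `root_dir` and `data_file` are made up for illustration and stand in for whatever the config provides.

```python
from pathlib import Path

# Hypothetical values standing in for settings read from a config file.
root_dir = "data"
data_file = "train.csv"

# f-string version: builds a plain string with a hard-coded separator.
data_path_str = f"{root_dir}/{data_file}"

# pathlib version: the / operator joins the parts into a Path object.
data_path = Path(root_dir) / data_file

# The Path object also exposes the components without string slicing.
file_name = data_path.name
extension = data_path.suffix
```

Both versions point at the same location, but the `Path` result can be passed on directly to code that expects path objects.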

Excellent observation! I am a configuration engineer (honestly, that is really just my title), but... yes indeed, configurations at the top push values down into lower base classes, which leads to functions with lots of args. This does reduce cohesion and can make your code a bit... noisy. Then again... globals littered around everywhere introduce coupling...

kayakMike

I'm disappointed that Pydantic isn't used or mentioned in this video. That seems like a perfect fit for configuration PLUS type checking.

Define your configuration structure as a Pydantic model, load a configparser, JSON or yaml file, and just go. Type hints are automatic. Seems obvious. 😊

traal

When I saw the title I got excited to hopefully find something new, but I don't know if this video scratched that itch.
I'd have hoped for a comparison of different approaches. Putting configuration constants in a file is not great, but how about a global settings.py file with a Config object, possibly a dataclass, that you can import into each module? How about passing it as arguments via dependency injection? What specifically makes Hydra preferable? I guess what I had hoped for was higher-level concepts and design principles; instead we got a showcase for one very specific implementation.

What I currently use is a settings.py file that is user editable (it could also be YAML or JSON files, it doesn't matter all that much). Then there's a config.py file that has the config_factory, which can create different objects for different configurations, e.g. production vs development environments. It also uses python-dotenv to get "secrets" that should never end up in a code repository. The main project file then imports the config_factory and creates a global object that can be imported into every sub-module.

frustbox
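
The settings/config-factory setup described above might be sketched like this. Everything here is an assumption for illustration: the `Config` fields, the environment names, and the `API_KEY` variable are hypothetical, and plain `os.environ` stands in for python-dotenv so the example needs only the standard library.

```python
import os
from dataclasses import dataclass


@dataclass(frozen=True)
class Config:
    debug: bool
    data_dir: str
    api_key: str  # secret: read from the environment, never committed


def config_factory(env: str) -> Config:
    """Build the Config for a named environment (names are hypothetical)."""
    # python-dotenv would populate os.environ from a .env file before this point.
    api_key = os.environ.get("API_KEY", "")
    if env == "production":
        return Config(debug=False, data_dir="/srv/data", api_key=api_key)
    if env == "development":
        return Config(debug=True, data_dir="./data", api_key=api_key)
    raise ValueError(f"unknown environment: {env!r}")


# Created once in the main module; sub-modules would import this object.
# The environment name is hard-coded here to keep the sketch self-contained.
config = config_factory("development")
```

Making the dataclass frozen is one way to keep the shared global object from being modified by the modules that import it.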

opinion: the most elegant solution is to put any config in a config.py file. Any (more dynamic) execution time variables could be passed via command-line arguments (e.g. using the fire lib)

walterppk
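
A rough sketch of the "config.py plus command-line arguments" approach from the comment above. It uses the standard-library `argparse` in place of the fire library mentioned there, and the constant names and default values are hypothetical.

```python
import argparse

# Defaults that would live in config.py (hypothetical names and values).
EPOCH_COUNT = 10
LR = 0.005


def parse_args(argv=None) -> argparse.Namespace:
    """Expose the run-time settings as flags, falling back to the config defaults."""
    parser = argparse.ArgumentParser()
    parser.add_argument("--epoch-count", type=int, default=EPOCH_COUNT)
    parser.add_argument("--lr", type=float, default=LR)
    return parser.parse_args(argv)


# Passing a list instead of reading sys.argv keeps the example self-contained.
args = parse_args(["--lr", "0.01"])
```

Here `args.lr` comes from the command line while `args.epoch_count` falls back to the value in the config module.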

Hi Arjan, great video as always. The cuts when you speak seem a little jarring, more like a stop motion video from time to time. Maybe it's just me, but I thought I should point it out. Great content as always!

SarveshShah

to me it does not look very convenient to decorate a function just to have the configuration variables load in
it clutters the code and decorating also messes with the debug trace

Tweakimp

Why not just use configparser? It comes with Python's standard library and is also structured.

kjeldgaard
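
For reference, a minimal configparser sketch of the kind the question above has in mind. The section and key names are made up for illustration, and the config text is inlined so the example runs without a file on disk.

```python
import configparser

# Hypothetical config text; in a project this would live in a file such as
# config.ini and be loaded with parser.read("config.ini").
CONFIG_TEXT = """
[paths]
root_dir = data
data_file = train.csv

[params]
epoch_count = 10
lr = 0.005
"""

parser = configparser.ConfigParser()
parser.read_string(CONFIG_TEXT)

# Values are stored as strings, so the typed getters do the conversion.
epoch_count = parser.getint("params", "epoch_count")
lr = parser.getfloat("params", "lr")
root_dir = parser.get("paths", "root_dir")
```

One trade-off to keep in mind: INI sections hold flat key/value pairs, so nested configuration has to be emulated with naming conventions or extra sections.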

The biggest issue with this approach is that it's usually impractical to define a schema for such configuration files. You need a lot of flexibility in DS projects, so many config fields are often determined on the fly. Say you choose a specific optimizer that works well with different learning rate schedules. Just go over the Keras/Torch schedules and see the variety of arguments they accept in their constructors. There is no way to build a consistent schema for all of them. One solution would be to have a different dataclass for each schedule, but then your main config needs some huge awkward Union.
It's much easier to just use dicts for all nested configurations and unpack them with double stars when calling constructors and factory methods.

pawelkubik
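
The dict-unpacking approach described above could be sketched as follows. The two schedule classes are hypothetical stand-ins for the Keras/Torch ones, chosen only to show constructors with different arguments.

```python
from dataclasses import dataclass


# Hypothetical schedule classes, each with a different set of
# constructor arguments.
@dataclass
class StepSchedule:
    step_size: int
    gamma: float = 0.1


@dataclass
class CosineSchedule:
    t_max: int
    eta_min: float = 0.0


SCHEDULES = {"step": StepSchedule, "cosine": CosineSchedule}


def build_schedule(config: dict):
    """Look up the class by name and unpack the remaining keys as keyword arguments."""
    kwargs = dict(config)  # copy so the caller's dict is not mutated
    name = kwargs.pop("name")
    return SCHEDULES[name](**kwargs)


schedule = build_schedule({"name": "cosine", "t_max": 50})
```

No shared schema is needed, at the cost of later feedback: a misspelled key only surfaces as a `TypeError` when the constructor is called.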

Love your videos. And wow, that microphone is in a very very good mood!

fuba

Since configuration is essentially partial function application, I often use an OOP version of partial application for configuring scripts.
Each distinct script part is a separate class with __init__ and __call__ methods only.
The __init__ method accepts an untyped dict or reads json/toml/yaml from a specified location and assigns the data to typed fields.
Then the __call__ method becomes an efficient partially applied function.
With such a setup you can easily have multiple configurations for the same script, either from separate config files or created on the fly.
And if you need a new configuration approach, you just create a classmethod for it.
Hydra looks like a fancy and overcomplicated way to do that same thing.

Daniel_Zhu_af
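
One possible reading of the pattern in the comment above, as a sketch: the `TrainStep` class, its fields, and the `from_json` classmethod are hypothetical names, not something taken from the video.

```python
import json
from pathlib import Path


class TrainStep:
    """Hypothetical script part: configured once, then called like a function."""

    def __init__(self, config: dict) -> None:
        # Assign untyped dict entries to typed fields.
        self.epoch_count: int = int(config["epoch_count"])
        self.lr: float = float(config["lr"])

    @classmethod
    def from_json(cls, path: Path) -> "TrainStep":
        # Alternative constructor: read the same settings from a JSON file.
        return cls(json.loads(path.read_text()))

    def __call__(self, sample_count: int) -> str:
        # Only the run-time argument is left to supply.
        return f"{sample_count} samples, {self.epoch_count} epochs, lr={self.lr}"


train = TrainStep({"epoch_count": 3, "lr": "0.01"})
summary = train(100)
```

Each new way of supplying configuration becomes another classmethod, while `__call__` stays unchanged.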

Really appreciate you introducing Hydra. It does look a bit too involved for what you get out of it. I also do NOT like YAML. I have worked as an AWS Cloud Architect for 7 years and hate it. I prefer anything but YAML. It is unnecessary and just provides extra syntax to deal with. Many of the frameworks for clouds are now starting to create libraries that output the JSON and YAML so you don't have to write any YAML.

For now, I'm a python-dotenv, ConfigParser, and Pathlib fan. I write a good README file that explains the parameters to configure or alter. And then I have a config.py file that handles and parses all my text/JSON files, which I then import into my files as needed in the project.

paul_devos

How does Hydra work with pytest? That is, I'd like Hydra to provide a configuration object to a test without pytest thinking that the argument is supposed to be a fixture.

nickeldan

Really awesome video; exactly what I was looking for. Thanks Arjan!

cameronball