Test Multiple Variables at Once to Optimize Anything

In this video I explore a multivariate experimental method called orthogonal (Taguchi) arrays.

Thanks to Guillotined Chemistry and Xoltri for their contributions that helped make this video happen. You'll find lots of useful info on their channels, especially in reference to making senko hanabi:

Thank you so much to those of you who support this channel on Patreon! Your support really helps give me confidence to spend my time researching projects that are of value for more than just video views. Shoutout to my top patrons: Eugene Pakhomov, Peter Gordon, Evan Hughes, Teague Lasser, Matthias S., Michel Pastor, PabloXIII, Parker Jones, Simone Chiesi, Steve C, Yanko Yankulov, Walter Montalvo, Carl Katzenberger, Damián Arrillaga, Dan L, Edward Unthank, Gusbear, Jon Hartmann, Kejie YU, Kirk Werklund, Lisa L, Mark Roth, PabloXIII, Santiago Perez, Steve C, Thibaud Peverelli, Tristan Tonks, WilSkarlet, Yanko Yankulov

Thanks everyone for watching!
Comments

As a programmer, I also want to highlight the method of trying the same thing again, without changing anything, and expecting different results.

guntereisenherz

I'm gonna go out on a limb here and guess that this is the only video in the entire universe that covers boiling eggs and making Senko Hanabi in the same video.

isavedtheuniverse

I spent 20 years working on materials and process development in the semiconductor industry, and these design of experiments (DOE) methods are foundational to quickly identifying meaningful improvements to manufacturing processes. Great work articulating and demonstrating these methods.

nlabanok

As an engineer who spent many years in R&D, I just wanted to say thanks. Introducing system optimization and DOE concepts can help people understand that there are so many tools available to them. And, as you illustrated, their use is not confined just to labs and manufacturing. Your clear examples and approachable teaching style make your content so impactful to the community. Thanks again.

mhuebner

Another commenter has mentioned Bayesian Optimization; I just wanted to expand a bit on what the difference is, and why you might prefer a more general optimization framework over Taguchi arrays. First, it's important to mention that Taguchi arrays only account for pairwise interactions: for any two variables we choose, all possible combinations of those two variables will appear in the array. In the Wikipedia article on Orthogonal Arrays, they generalize this idea to choosing t variables, where t=2 is the Taguchi array and t=n is the full factorial analysis. But we could also generate arrays for intermediate values of t. These arrays would be helpful if we expect up to t variables to interact together to create an effect, which is often the case. But any time we choose t<n, we lose the ability to detect certain effects.
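
A minimal sketch (in Python, purely as an illustration and not anything from the video): the standard L9 orthogonal array for up to four 3-level factors, plus a check of the pairwise property just described, that for any two columns you pick, all nine level combinations appear exactly once.

```python
from itertools import combinations, product

# Standard L9 orthogonal array: 9 runs, 4 factors, 3 levels each (0, 1, 2).
L9 = [
    [0, 0, 0, 0],
    [0, 1, 1, 1],
    [0, 2, 2, 2],
    [1, 0, 1, 2],
    [1, 1, 2, 0],
    [1, 2, 0, 1],
    [2, 0, 2, 1],
    [2, 1, 0, 2],
    [2, 2, 1, 0],
]

# Orthogonality check: every ordered pair of levels shows up exactly once
# for every pair of columns.
for i, j in combinations(range(4), 2):
    pairs = [(row[i], row[j]) for row in L9]
    assert len(set(pairs)) == 9
    assert set(pairs) == set(product(range(3), repeat=2))
print("every pair of columns covers all 9 level combinations exactly once")
```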

Second of all, there is an interesting question of how many experiments to do for each entry in the array. If you imagine that you were making millions of sparklers, would it really make sense to only make 9 different versions of the formula? At some point there are diminishing returns for repeating the same formula more times, and it might be more valuable to spend those resources on experiments that could detect (weaker or less common) interaction effects of larger numbers of variables. Just like you can average over the Taguchi arrays, you could still average over these experiments to measure the effects of a single variable at a time, if the changes in the other variables "cancel each other out". Conversely, if you are running an experiment with a large number of variables, you might not have enough experiments to fill out every entry in the Taguchi array (i.e. you would run fewer than one experiment per row).
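
To make the averaging concrete, here is a sketch continuing the L9 example above, with invented placeholder scores: the main effect of each factor is estimated by averaging the scores of the runs that share each level of that factor.

```python
import numpy as np

# L9 design from above, plus invented scores for the 9 runs (placeholder data).
L9 = np.array([
    [0, 0, 0, 0], [0, 1, 1, 1], [0, 2, 2, 2],
    [1, 0, 1, 2], [1, 1, 2, 0], [1, 2, 0, 1],
    [2, 0, 2, 1], [2, 1, 0, 2], [2, 2, 1, 0],
])
scores = np.array([5.0, 6.5, 4.0, 7.0, 8.5, 6.0, 3.5, 5.5, 4.5])

# Main effect of each factor: the average score at each of its three levels.
for factor in range(L9.shape[1]):
    level_means = [scores[L9[:, factor] == level].mean() for level in range(3)]
    print(f"factor {factor}: level means = {np.round(level_means, 2)}")
```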

Third of all, when you are using Taguchi arrays, you are forced to discretize your variables into a reasonable number of bins. You addressed this in the video, when you talked about how it only tells you directionally how to change the formula, but not how far you can go with the changes. But ultimately, what you care about is optimizing the continuous value. Actually, choosing discrete bins can be quite harmful to the optimization process. Imagine that the lampblack parameter was the ONLY parameter that had any effect on your experiment. You would end up running 9 different experiments, and at the end of the day, the only information you would have is which of the three different lampblack levels was the best. But really, you could have varied the level of lampblack (and all the other parameters) in all 9 experiments, so that you would have information about 9 different levels of lampblack, and you would be able to optimize the level of lampblack much more accurately with the same number of experiments. Note that we can have 9 different values for all parameters at the same time, using the same 9 experiments. Doing this makes it harder to directly measure the effect of each variable, since the other variables are not controlled, but you can still do it approximately. (In other words, you are trading off how well you can explain why a certain combination of parameters gives better results, for the ability to find better combinations of parameters more quickly.)
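
A minimal sketch of the "vary every parameter in every run" idea using a Latin hypercube (SciPy's qmc module); the parameter names and ranges below are invented purely for illustration. Every parameter takes a different value in each of the 9 runs instead of being held to 3 bins.

```python
import numpy as np
from scipy.stats import qmc

# Hypothetical parameters and ranges (mass fractions), invented for illustration.
names = ["lampblack", "sulfur", "potassium_nitrate", "charcoal", "binder"]
lows  = np.array([0.05, 0.10, 0.50, 0.05, 0.01])
highs = np.array([0.15, 0.20, 0.70, 0.15, 0.05])

# 9 runs in which every parameter is varied continuously across its range.
sampler = qmc.LatinHypercube(d=len(names), seed=0)
runs = qmc.scale(sampler.random(n=9), lows, highs)

for k, row in enumerate(runs, start=1):
    print(f"run {k}: " + ", ".join(f"{n}={v:.3f}" for n, v in zip(names, row)))
```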

Once you've generalized your experiments in this way (continuously valued parameters rather than discrete bins), the final difference is that when you are doing Bayesian Optimization/hyperparameter tuning, you're going to have some "smarts" behind choosing the next experiment to run, based on the results of the experiments you have run *so far*. For example, imagine we are tuning a large number of variables and are going to run 100 experiments total, and after the first 50 experiments you can see that a lower level of sulfur always made the sparkler completely fail. Would it still make sense to continue experimenting with lower levels of sulfur for the remaining 50 experiments? Probably not, because it's probably going to make those experiments fail too, and you won't be able to learn as much about the other variables from those experiments. That is basically the idea that these "smarter" optimizers apply: they are going to focus experiments in areas of the parameter space that seem promising, and spend fewer experiments on ideas that seem unlikely to succeed based on past experiments.

This all might seem super abstract or complex, but the reality is that you can just treat it as an off-the-shelf black box! There are open-source tools (optuna, hyperopt, bayesian-optimization) that implement everything I discussed and more; you can just feed in your parameter ranges and your experiment data, and it will suggest the next experiment or set of experiments for you to run. Just keep feeding in your data, and it will guide you to optimal parameter values!
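
A minimal sketch of what that loop can look like with optuna's ask/tell interface; the parameter names, ranges, and the stand-in scoring function are all invented, and in practice the score would come from building and rating an actual sparkler.

```python
import optuna


def score_formulation(lampblack, sulfur, kno3):
    # Stand-in for actually making and rating a sparkler; replace this with
    # your real measurement. A fake smooth score keeps the example runnable.
    return -((lampblack - 0.10) ** 2 + (sulfur - 0.17) ** 2 + (kno3 - 0.62) ** 2)


study = optuna.create_study(direction="maximize")

for _ in range(20):  # however many sparklers you are willing to make
    trial = study.ask()
    lampblack = trial.suggest_float("lampblack", 0.05, 0.15)
    sulfur = trial.suggest_float("sulfur", 0.10, 0.20)
    kno3 = trial.suggest_float("potassium_nitrate", 0.50, 0.70)
    study.tell(trial, score_formulation(lampblack, sulfur, kno3))

print("best parameters so far:", study.best_params)
```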

jeremybub

My father (an engineer) tried to explain this to me (a chemist) a couple of years ago. Unfortunately, he didn't understand how it worked and couldn't explain the process. This video was extremely informative and makes me want to study up on Design of Experiments, since I have heard that it is an extremely useful field for scientists. Thanks for sharing this with us!

DucktorThallium

I have used these methods in industry. Your presentation and explanation of the methods are really wonderful. I can see this inspiring many young scientists to improve their methods and develop better ways of doing nearly anything.

ronwoodward

Senko, microspheres, and process improvement all in one video is such a heavy hitter; this is peak YouTube content

ostahlarune

I could use this to improve my air cannons in the future. I've always wondered how much air pressure affected velocity, and to what extent. But along with that there's also tank volume, projectile mass, barrel length, valve response time, flow rate, and how tightly the projectile sits in the barrel. Now I can test all of these variables without doing hundreds of tests!

derrick

I'm an industrial quality analyst and I am OBSESSED with Genichi Taguchi, especially his Loss Function. It's joked in QC that you have to have a doctorate in mathematics in order to fully understand his ideas, even though he wasn't a doctor, just an ingenious engineer. He's one of the founding fathers of Quality Control, along with Kaoru Ishikawa (inventor of the Fishbone Diagram), Dr. W. Edwards Deming (the "father of quality control"), Dr. Walter Shewhart (the inventor of the Control Chart and a mentor to Deming), etc.

Thanks for helping me understand Taguchi's experimentation method! I was never going to get it by just reading; the math is just insane.

hedgeearthridge

This is one of the best YouTube channels. The clear method of communication, worthwhile projects, and deep insights like this make me feel like I get more from these videos than most others. Keep it up, I love what you're doing.

lambda_calc

As a process engineer, I've completed plenty of DOEs ... we use a statistical software package called JMP. Among other things, JMP will design and analyze DOEs, telling you which of your factors are significant. It even has prediction sliders, letting you virtually adjust the levels for each factor.

brianmrzyglod

I feel so guilty for never commenting about design of experiments (DOE) before. I use these daily in research; the Taguchi array is the one I use to introduce people to DOE. There are so many different methods, especially for non-linear (non-orthogonal) analysis. Another cool one is the Mixture DOE, which is mainly for recipes.

arshadmohammed

As an egg farmer who has tested egg peeling extensively, I can confirm you are correct about egg age being the biggest factor. The factor you missed is cracking the eggs before cooling. My theory is that flash cooling only loosens the shell if it is broken and can move as it contracts. Some say it's because water can get in. But either way, cracking before cooling seems to help.

RebelCowboysRVs

According to J. Kenji Lopez-Alt who ran a ton of experiments on egg peeling, the number 1 variable that makes eggs easier to peel is the temperature of the water in the pot when you start boiling your eggs. If you place the eggs in a cold pot of water and slowly bring it up to temperature, they will tend to be much more difficult to peel. Always boil the water first, then place the eggs into the boiling water.

--sql

I don't know who's gonna read this, but I'll leave it out here. Systems design engineer here. I do DoEs day in and day out; the Taguchi method is an awesome tool to narrow down the design search space. When there are a handful of input Xs for a measured output Y that you can control and intuitively understand, this is all you need to optimize <anything>. When there are a large number of Xs, there's something engineers do called parameter sensitivity/pareto analysis. There are many fancy ways to build this pareto chart, but I try to keep the intuitive part alive: I min-max normalize the inputs and output between 0 and 1 and do a multiple linear regression. The coefficient of each X gives its impact on Y, and its sign gives the direction. This helps eliminate less influential inputs to further reduce the dimensionality of the search space. Also, a cool way to visualize them is a parallel-coordinates chart 😁
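
A minimal sketch of that normalize-then-regress sensitivity analysis in Python with NumPy; the data below is random placeholder data, not from any real process.

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.uniform(size=(50, 4))                                # 50 runs, 4 input Xs
y = 3.0 * X[:, 0] - 1.0 * X[:, 2] + rng.normal(scale=0.1, size=50)

def minmax(a):
    # Scale each column to the [0, 1] range.
    return (a - a.min(axis=0)) / (a.max(axis=0) - a.min(axis=0))

Xn, yn = minmax(X), minmax(y)
A = np.column_stack([Xn, np.ones(len(Xn))])                  # add intercept term
coefs, *_ = np.linalg.lstsq(A, yn, rcond=None)

# Coefficient magnitude = impact of each X on Y; sign = direction of the effect.
for name, c in zip(["x1", "x2", "x3", "x4"], coefs[:-1]):
    print(f"{name}: {c:+.3f}")
```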

naviinprabhu

2^k factorial designs and design of experiments are amazing. Sadly, people (especially in industry) are not willing to make the effort to utilize these methods, especially when the tests are time-consuming or expensive. You did a great job explaining the benefits.

askquestionstrythings

Multivariate analysis was one of my favorite things about my STEM classes. It's a shame it's been missing from hobby science!

Jinakaks

How did I not know about this? It's so useful. Thanks for enlightening me.

MaxWithTheSax

I know absolutely nothing about the method you used to improve your sparklers... However, the second I saw your score sheet at @20:00, alarm bells started going off inside my head. You are scoring your sparklers on non-uniform parameters, which in itself is not a mistake, but because your values are so different you are effectively washing out all contribution from effects which have a numerically smaller measurement. This is a very important factor to consider and correct for when using something like PCA on a larger dataset. Imagine, for example, that we want to guess a person's likelihood of getting heart disease based on their eye color, height, weight and age. If we measure height in millimeters rather than centimeters we will have HUGE values which will overshadow the weight in kg and the age in years! Even if we only measure in centimeters or meters we still have vastly different scales. Furthermore, we can't assign numbers to eye color as 1=green and 2=blue, because then blue eyes are more important than green just because we chose so!
This means that in order to have a well-rounded sparkler you must standardize your scoreboard no matter how you measure success. You do this by subtracting the mean value of a score column from itself and then dividing by the standard deviation. Do this for each column of results and every column will have the same weight in your scoreboard.
This would allow "climbing ember" scores to have an equal impact on your evaluation compared to the "bursting sparks" score no matter how big or small the numbers you assign are.
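
A minimal sketch of that standardization step in Python with NumPy; the score sheet below is invented for illustration, not the one from the video.

```python
import numpy as np

# Invented scores for 9 formulations; the three criteria are measured on very
# different scales.
# columns: climbing embers (count), bursting sparks (count), burn time (s)
scores = np.array([
    [3, 120, 45],
    [5, 200, 50],
    [2,  80, 40],
    [4, 150, 55],
    [6, 220, 60],
    [1,  60, 35],
    [3, 140, 52],
    [5, 180, 48],
    [4, 160, 58],
], dtype=float)

# Subtract each column's mean and divide by its standard deviation, so every
# criterion carries equal weight in the combined score.
standardized = (scores - scores.mean(axis=0)) / scores.std(axis=0)
overall = standardized.sum(axis=1)
print(np.round(standardized, 2))
print("overall:", np.round(overall, 2))
```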

dankelpuff