Hanna Meyer: 'Machine-learning based modelling of spatial and spatio-temporal data'

Remote sensing is a key method for bridging the gap between local observations and spatially comprehensive estimates of environmental variables. For such spatial or spatio-temporal predictions, machine learning algorithms have proven to be a promising tool for identifying nonlinear relationships between locally measured and remotely sensed variables. While easy access to user-friendly machine learning libraries fosters their use in the environmental sciences, applying these methods is far from trivial. This holds especially true for spatio-temporal data, since dependencies in space and time bear the risk of overfitting and of considerable misinterpretation of model performance.
In this introductory lecture I will introduce the idea of using machine learning for the (remote sensing based) monitoring of the environment and show how such models can be applied in R via the caret package. In this context, error assessment is a crucial topic, and I will show the importance of "target-oriented" spatial cross-validation strategies when working with spatio-temporal data to avoid an overoptimistic view of model performance. As spatio-temporal machine-learning models are highly prone to overfitting caused by misleading predictor variables, I will introduce a forward feature selection method from the CAST package that works in conjunction with target-oriented cross-validation.
In summary, this talk aims to show how "basic" spatial machine-learning tasks can be performed in R, but also what needs to be considered for more complex spatio-temporal prediction tasks in order to produce scientifically valuable results. Based on this talk, we will go into a practical session on Tuesday, where machine-learning algorithms will be applied to two different spatial and spatio-temporal prediction tasks.
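
The idea behind the "target-oriented" cross-validation mentioned above can be illustrated with a minimal sketch: all observations from one location are kept in the same fold, so a model is always tested on locations it has not seen during training. The code below is a language-agnostic illustration in plain Python, not the caret or CAST implementation; the function name `leave_location_out_folds` and the station data are hypothetical.

```python
import random
from collections import defaultdict


def leave_location_out_folds(location_ids, k, seed=0):
    """Return k (train_rows, test_rows) pairs of row indices.

    All rows that share a location id end up in the same test fold,
    so no location appears in both the training and the test set.
    """
    unique_locations = sorted(set(location_ids))
    if k > len(unique_locations):
        raise ValueError("k cannot exceed the number of distinct locations")

    # Shuffle the locations (not the rows), then deal them round-robin into folds.
    rng = random.Random(seed)
    rng.shuffle(unique_locations)
    fold_of_location = {loc: i % k for i, loc in enumerate(unique_locations)}

    rows_by_fold = defaultdict(list)
    for row, loc in enumerate(location_ids):
        rows_by_fold[fold_of_location[loc]].append(row)

    folds = []
    for fold in range(k):
        test_rows = rows_by_fold[fold]
        held_out = set(test_rows)
        train_rows = [r for r in range(len(location_ids)) if r not in held_out]
        folds.append((train_rows, test_rows))
    return folds


# Hypothetical data: four stations, each measured at three time steps.
stations = ["A", "A", "A", "B", "B", "B", "C", "C", "C", "D", "D", "D"]
folds = leave_location_out_folds(stations, k=2)

for train_rows, test_rows in folds:
    train_stations = sorted({stations[r] for r in train_rows})
    test_stations = sorted({stations[r] for r in test_rows})
    print("train:", train_stations, "test:", test_stations)
```

A random split of the rows would instead place measurements from the same station in both the training and the test set, which is the situation the description warns can lead to an overoptimistic performance estimate.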

Comments

I think there was a misunderstanding in the last question asked at the end (50:31):
There are, of course, no data available for the response variable in the more remote areas of Antarctica. The question was how a different approach to cross-validation would give better predictions for those areas.

perfectmoments

Awesome video. Big shout-out from Brazil.

theforester_

Thanks for sharing, it's helpful for me!

gezahagnnegash

Thanks for posting, very helpful and interesting.

ritwek

In the end, two factors are mixed here: the features and the CV method. It is therefore not possible to tell what the effect of the CV method alone is.

In the lecture, the problem seems to lie with the coordinate features, which cause overfitting. The solution indeed addressed this through feature selection with FFS, which removed those features from training. So whichever CV method is used, the cause of the overfitting is the features and not the CV method, at least in this case.

Only if a model trained with spatial CV and with the coordinate features still did not overfit would it be possible to conclude that the CV method is indeed the cause of, and the solution to, this problem.

natannvw

This seems to be completely disconnected from the field of climate informatics and all the sophisticated methods used there. There is no mention of physically informed, deterministic models, which already make good global predictions, or of anything regarding data assimilation. It seems weird to ignore this. The talk boils down to quite simple things: we have observations, and we model them with simple ML models because they can deal with complex relationships. We validate these algorithms appropriately. Not much more than that, when a key issue, *what exactly it is you are trying to model* aside from tree species, is right there for discussion.

TheSwordfish-gr