Data Preprocessing with MATLAB

Data preprocessing is the task of cleaning and transforming raw data to make it suitable for analysis and modeling. Preprocessing steps include data cleaning, data normalization, and data transformation. The goal of data preprocessing is to improve both the accuracy and efficiency of downstream analysis and modeling.

Raw data often includes missing values and outliers, which can lead to erroneous conclusions during analysis. You can use MATLAB® to apply preprocessing techniques such as filling missing data, removing outliers, and smoothing, making it easier to see attributes of the data such as magnitude, frequency, and periodicity.
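As a minimal sketch of these cleaning steps, using the standard MATLAB functions `fillmissing`, `filloutliers`, and `smoothdata` (the signal below is synthetic, invented for illustration):

```matlab
% Synthetic signal with gaps and a spike (illustrative data only)
t = (0:99)';
x = sin(2*pi*t/25) + 0.1*randn(100,1);
x([10 42 57]) = NaN;     % simulate missing samples
x(75) = 8;               % simulate an outlier

xFilled = fillmissing(x, 'linear');            % interpolate over the gaps
xClean  = filloutliers(xFilled, 'linear');     % detect and replace outliers
xSmooth = smoothdata(xClean, 'movmean', 7);    % 7-sample moving-average smoothing

plot(t, x, '.', t, xSmooth, '-')
legend('raw', 'preprocessed')
```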

Data preprocessing techniques can be grouped into three main categories: data cleaning, data transformation, and structural operations. These steps can be applied in any order, and often iteratively.
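One operation from each category can be sketched with standard MATLAB functions (the data here is invented for illustration):

```matlab
% Cleaning: remove rows that contain missing values
A = [1 NaN; 2 5; 3 6];
Aclean = rmmissing(A);                 % drops the row containing NaN

% Transformation: rescale variables to comparable ranges
B = normalize([100 200 300]');         % z-score normalization by default

% Structural: resample irregular timestamps onto a regular grid
tt = timetable(datetime(2024,1,1) + minutes([0 3 7 12])', (1:4)');
ttReg = retime(tt, 'regular', 'linear', 'TimeStep', minutes(5));
```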

Choosing the right data preprocessing approach is not always obvious. MATLAB provides both interactive capabilities (apps and Live Editor tasks) and high-level functions that make it easy to try different methods and determine which is right for your data. Iterating through different configurations and selecting the optimal settings will help you prepare your data for further analysis.
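Because the high-level functions share a common calling pattern, iterating through different methods programmatically is straightforward. A sketch comparing several built-in smoothing methods on a synthetic signal:

```matlab
x = cumsum(randn(200,1));                      % synthetic random-walk signal
methods = ["movmean" "movmedian" "gaussian" "loess"];
hold on
for m = methods
    plot(smoothdata(x, m, 15))                 % same call, different method
end
legend(methods)
hold off
```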

--------------------------------------------------------------------------------------------------------

© 2024 The MathWorks, Inc. MATLAB and Simulink are registered trademarks of The MathWorks, Inc.
Comments

Very good presentation. I hope "Steve" gets his act together soon!

nandi

The way we store data is (needlessly, imo) flawed, tracing back to pen-and-paper notations, as "Steve" failed to do. We often assume a time series with a constant sampling interval. There is no annotation, especially regarding timestamps, the source of the data, the manipulations (interpolation, filtering, etc.) that may have been applied to the data, and statistical limitations inherent to the data itself (e.g. uncertainty). This can lead to major boo-boos. To give a concrete example: if a "smoothing" filter is used, as illustrated at 8:01, without annotation, it may cause problems because this filter is not causal. The smoothed data may provide evidence of an event *before* the event happens. Without annotation, this will cause headaches.

Agreed that annotation would bloat data by several-X, but with our progressive (over)reliance on data, we ought to find a rigorous and standardized way to do this.

AdityaMehendale

I'd hate to have been "Steve".

AdityaMehendale

Let us know your follow-up data processing questions. I can connect you with the correct answers.

HansScharler