SQL Tutorial : Finding and Resolving Missing Data

preview_player
Показать описание

---

In this lesson, you are going to look at how to identify and resolve missing data.

Missing data are a common occurrence. In SQL, missing data is represented as NULL.

Since NULL is not a value, you cannot use the EQUAL sign or any other operators to find or compare NULLs, instead you must use IS NULL and IS NOT NULL.

To check if there is a missing value in a column, you can use IS NOT NULL.

For example, to return all rows without any missing values in InternetUse, you can add the clause WHERE InternetUse IS NOT NULL as shown here on the last line of the query.

To return all rows where InternetUse is null, use the WHERE InternetUse IS NULL as shown here in this query.

Note that there is a difference between a blank value and a NULL value. Sometimes, missing values may be formatted as blanks. A blank is two single quotes with no space in between and is only found in a text field. An empty row value will show up as a blank in your query. It does not contain a value, but it will not show up as NULL. The best way to find if a column contains a blank value, is to look for a column where the Length which is represented as LEN greater than 0

As a blank value is a field with nothing in it, you can exclude blank values by returning rows only where the length of the field is greater than 0. To do this, you can use the LEN function with a left parenthesis and a right parenthesis as shown here to only return columns which have a character length greater than zero on the relevant column.

Sometimes, you want to replace missing values in a column with a specific value. To do so, you can use the ISNULL function.

You pass the column name and the value to replace to ISNULL and this will replace the missing values in the column with the given value, as shown here. Note that you can replace all missing values with only one value.

You can also use ISNULL to replace missing values in one column based on a different column. Here we replace all missing values in TradeGDPPercent with the corresponding row values from ImportGoodPercent. The replacement will only be performed if the value in TradeGDPPercent is null.

Another way to replace NULL values is to use the COALESCE statement. You can pass multiple values to COALESCE, and it will always return the first non-NULL value. It is similar to using a series of if-else statements to return the first non-missing value.

Here's a query to show how COALESCE works.
For the first row, since the value in TradeGDPPercent is NULL and the value in ImportGoodPercent is not NULL, NewPercent is given the value from the ImportGoodPercent column. For the second row, both TradeGDPPercent and ImportGoodPercent are NULL, so NewPercent column is assigned the value N/A. Finally, in the third row, both TradeGDPPercent and ImportGoodPercent are not NULL, so the first not NULL value, the value from TradeGDPPercent column is assigned to NewPercent.

Time to put this into practice

#DataCamp #SQLServerTutorial #IntermediateSQLServer #SQL
Рекомендации по теме