Python Tutorial: Why unit test?

---
Hello and welcome to Unit Testing for Data Science in Python! My name is Dibya. I am a Test Automation Engineer and I will be your instructor for this course.

Consider this question. Suppose we have just implemented a Python function. How can we test whether our implementation is correct?

The easiest way is to open an interpreter, call the function with a few arguments and check whether each return value is correct. If it is, we can accept the implementation and move on. Right?

While testing in the interpreter is easy, it is actually very inefficient. This will become clear if we think about the big picture of a function's life cycle in a data science project.

The life cycle of a function typically looks like this.

We implement the function and then test it. If the tests pass, we accept the implementation.

If the tests fail, we fix any bugs that we found and test again.

Later, we might get a new feature request or we may be asked to refactor the function.

So we implement the new feature or refactor the code, and then test it again.

Alternatively, someone might discover a previously unseen bug. In that case, we fix that bug and test again.

Notice how many times a function needs to be tested during its life cycle. Every time we modify the function, either to fix bugs or to implement new features, we have to test it.

If the project continues for a few years, we might end up testing the function about a hundred times, maybe more.

Let's look at an example function. It's called row_to_list(). It takes a single argument, which is a Python string.

The string is a single row of a data file that records the area of each house and its market price.

The string contains the housing area in square feet, followed by a single tab, followed by the housing price in dollars and ending with a newline character.

If the string follows this format, the function returns a list of length 2, containing the housing area and the housing price.

But this data file is not clean, and some of its rows do not follow this format. The third row is missing the area, while the penultimate row is missing the tab between the area and the price. For these invalid rows, the function should return None.
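To make this behavior concrete, here is a minimal sketch of how such a function might look. It is not the implementation from the course repository: the example rows are made up, and returning the two values as strings is an assumption for illustration.

```python
def row_to_list(row):
    """Split one row of the data file into [area, price].

    Illustrative sketch only. Assumes a valid row is the area, one tab,
    the price and a final newline, and that both values stay strings.
    """
    # A valid row must end with a newline character.
    if not row.endswith("\n"):
        return None
    # Drop the trailing newline, then split on the tab separator.
    fields = row[:-1].split("\t")
    # Exactly two non-empty fields are required: area and price.
    if len(fields) != 2 or "" in fields:
        return None
    return fields


# Hypothetical rows, made up for this example.
print(row_to_list("2,081\t314,942\n"))  # ['2,081', '314,942']
print(row_to_list("\t293,410\n"))       # None (missing area)
print(row_to_list("1,463238,765\n"))    # None (missing tab)
```

The three calls at the bottom are exactly the kind of check we would otherwise type into the interpreter by hand.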

To test this function, we will have to try all the arguments that we listed and check if the function returns the correct value.

Remember that a function needs to be tested about 100 times in its life cycle. Assuming each round of testing takes 5 minutes, that is 500 minutes, so we will spend more than 8 hours just testing this function in the interpreter.
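The estimate can be checked with a quick calculation. The numbers are the rough assumptions stated above, not measurements, and the variable names are ours.

```python
# Rough assumptions from the discussion above.
tests_per_life_cycle = 100      # times the function gets retested
minutes_per_manual_test = 5     # time for one round in the interpreter

manual_minutes = tests_per_life_cycle * minutes_per_manual_test
manual_hours = manual_minutes / 60

print(manual_minutes)           # 500
print(round(manual_hours, 1))   # 8.3
```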

Unit tests automate this repetitive and tedious testing process. Over the life cycle of a single function, they reduce the testing time from about 8 hours to about 1 hour.
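As a first taste of what automation looks like, here is a minimal sketch in which each manual check becomes a small test function built around an assert statement. The stand-in row_to_list(), the example rows and the test names are all assumptions for illustration, not the course's actual code. Plain functions named test_... are also a style that a test runner such as pytest can collect.

```python
def row_to_list(row):
    # Stand-in for the function under test (an assumption, not the
    # course's actual implementation).
    if not row.endswith("\n"):
        return None
    fields = row[:-1].split("\t")
    return fields if len(fields) == 2 and "" not in fields else None


def test_valid_row():
    # A well-formed row yields the area and the price.
    assert row_to_list("2,081\t314,942\n") == ["2,081", "314,942"]


def test_missing_area():
    # A row with no area before the tab is invalid.
    assert row_to_list("\t293,410\n") is None


def test_missing_tab():
    # A row with no tab between area and price is invalid.
    assert row_to_list("1,463238,765\n") is None


if __name__ == "__main__":
    # Run every check with one command instead of retyping the calls.
    for test in (test_valid_row, test_missing_area, test_missing_tab):
        test()
    print("all tests passed")
```

Once the checks live in a file like this, rerunning them after every bug fix or new feature takes one command rather than another session of typing calls into the interpreter.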

Now imagine how much time we would save when we unit test all the functions in a project!

This makes unit testing a must-have skill for productive data scientists.

This course will teach you how to write unit tests. We have created an example data science project which predicts housing prices from the housing area using linear regression, as we can see in the plot.

The complete code for this project, including actual implementations for functions like row_to_list(), is available in a public GitHub repository. We will share the link to the repository at the end of the course.

Here is the folder structure of this project.

During this course, we are going to write a complete unit test suite for this example project. It's going to be fun and exciting.

This will prepare you to write unit tests for your own projects.

Sounds good? Then let's get started by practicing the concepts covered in this lesson.
