Theory: Building a data science team

---
In this lesson, you'll learn how to build and structure your data team to meet your organization's needs.
You might be surprised to learn that "Data Science" isn't a single field; it's actually three different jobs: Data Engineer, Data Analyst, and Machine Learning Scientist.
Let's explore each one.
Data engineers control the flow of information: they build specialized data storage systems and the infrastructure to ensure that the data is easy to obtain and process.
Most data engineers are very familiar with SQL, which they use to store and manage big data.
They also use a programming language such as Java, Scala, or Python to process data and automate data-related tasks.
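To make that concrete, here is a minimal sketch of the kind of task a data engineer might automate: cleaning raw records and loading them into a SQL table. The `orders` table, the sample CSV text, and the `load_orders` function are all hypothetical, made up for illustration, and Python's built-in SQLite stands in for whatever storage system a real team would use.

```python
import csv
import io
import sqlite3

# Hypothetical raw export; the second row is missing its amount.
RAW_CSV = """order_id,customer,amount
1,alice,20.50
2,bob,
3,carol,15.00
"""

def load_orders(conn, csv_text):
    """Create the orders table if needed, skip incomplete rows, load the rest."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS orders "
        "(order_id INTEGER PRIMARY KEY, customer TEXT, amount REAL)"
    )
    rows = []
    for record in csv.DictReader(io.StringIO(csv_text)):
        if not record["amount"]:
            continue  # drop rows with a missing amount
        rows.append(
            (int(record["order_id"]), record["customer"], float(record["amount"]))
        )
    conn.executemany("INSERT INTO orders VALUES (?, ?, ?)", rows)
    conn.commit()
    return len(rows)

conn = sqlite3.connect(":memory:")  # in-memory database, nothing written to disk
loaded = load_orders(conn, RAW_CSV)
print(loaded)  # number of rows that passed the cleaning step
```

In practice this step would run on a schedule against a production database, but the shape is the same: read, clean, store.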
Data analysts describe the present via data. They do this with dashboards, hypothesis tests, and visualization. They often have some background in statistics or computer science, but tend to have less engineering experience than data engineers and less math experience than machine learning scientists.
Data analysts use spreadsheets to perform simple analyses on small quantities of data.
They use SQL, the same language used by data engineers, for larger analyses. While data engineers build and configure SQL storage solutions, data analysts use existing databases to consume and summarize data.
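As a rough illustration of what "consume and summarize" can look like, here is a small sketch of an analyst-style SQL query run from Python. The `sales` table and its values are invented for this example, and the in-memory SQLite database stands in for a database that a data engineer would already have set up.

```python
import sqlite3

# Hypothetical existing database, simulated here with a few sample rows.
sales_db = sqlite3.connect(":memory:")
sales_db.execute("CREATE TABLE sales (region TEXT, amount REAL)")
sales_db.executemany(
    "INSERT INTO sales VALUES (?, ?)",
    [("north", 100.0), ("north", 50.0), ("south", 75.0)],
)

# The analyst's part: summarize the data with a single SQL query.
summary = sales_db.execute(
    "SELECT region, COUNT(*) AS orders, SUM(amount) AS revenue "
    "FROM sales GROUP BY region ORDER BY region"
).fetchall()

for region, orders, revenue in summary:
    print(region, orders, revenue)
```

The query groups rows by region and reports an order count and total revenue for each, which is the kind of summary that often ends up in a dashboard.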
Analysts also use Business Intelligence, or BI, Tools, such as Tableau, Power BI, or Looker, to create dashboards and share their analyses.
Machine learning is perhaps the buzziest part of Data Science; it's used to extrapolate what's likely to be true from what we already know.
Machine learning scientists use labeled training data to build models that classify larger, unrulier data.
Machine learning can tell us how much money a stock might be worth next week, which images contain a car, or what sentiments are expressed by a set of Tweets.
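For a feel of how training data turns into predictions, here is a deliberately tiny sketch of a sentiment classifier. It is a toy word-counting approach, not a method a production team would use, and the training sentences, the `train` function, and the `predict` function are all hypothetical names created for this example.

```python
from collections import Counter

# Hypothetical labeled training data: (text, sentiment).
TRAINING_DATA = [
    ("i love this product", "positive"),
    ("what a great day", "positive"),
    ("i hate waiting", "negative"),
    ("this is terrible service", "negative"),
]

def train(examples):
    """Count how often each word appears under each label."""
    counts = {}
    for text, label in examples:
        counts.setdefault(label, Counter()).update(text.lower().split())
    return counts

def predict(model, text):
    """Pick the label whose training words overlap most with the text.

    Ties go to whichever label was seen first during training.
    """
    words = text.lower().split()
    scores = {
        label: sum(word_counts[w] for w in words)
        for label, word_counts in model.items()
    }
    return max(scores, key=scores.get)

model = train(TRAINING_DATA)
print(predict(model, "great product"))
print(predict(model, "terrible waiting"))
```

Real models are built with libraries in Python or R and trained on far more data, but the idea is the same: learn patterns from labeled examples, then apply them to new, unlabeled text.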
Machine learning scientists use either Python or R to create their predictive models. Both are great programming languages for data science, and a candidate who knows one language can likely read code in the other language.
Remember, programming languages aren't as difficult to learn as spoken languages. If someone knows how to speak French, it might take them years to learn to speak German.
Programming languages are more similar to power tools. If you know how to use a power drill, you don't necessarily know how to use an electric saw, but you can probably learn with a little training!
To recap:
data engineers store and maintain data,
data analysts visualize and describe data,
and machine learning scientists model and predict with data.
Each position uses a slightly different set of tools to achieve its goals.
Once you've hired some data professionals, there are three main ways you can structure your data team: isolated, embedded, or hybrid.
An isolated data team can contain one or more types of data employees, and it sits apart from other teams such as engineering or product.
This is a great structure for training new team members and quickly changing which project each member is working on.
Alternatively, it can be helpful to use an embedded model where each data employee is part of a squad, which also contains engineers and product managers.
This model lets each data employee gain experience on a specific business project, making them a valuable expert.
The hybrid model looks similar to the embedded model, but adds regular syncs that bring together all data employees across all squads.
This additional layer of organization allows for uniform data processes and career development, regardless of which project an employee is assigned to.
You're now familiar with three key data team members and three types of team structures. Let's practice!
#DataCamp #DataScienceforBusiness #DataScience