System Design: Building Data Intensive Applications Lesson #012- MapReduce Querying

Hello Everyone,
Here is System Design: Building Data Intensive Applications Lesson #012- MapReduce Querying:
Timestamps:
Concept: 0:00
Concept Example: 8:13
Practice Problem 1: 12:41
Practice Problem 2: 16:18
If you have any questions, feel free to put them in the comment section below. Thank you!
Best,
Gunnar
=========================================================///
Practice Problem 1: Monthly Report of Dolphin Sightings
Scenario: Your database has observed marine life across several years, including sharks, dolphins, and whales. You need to generate a report detailing the number of dolphin sightings per month.
Solution Guide:
1. Write the Map Function: Start from the map function used for sharks. It emits the year-month key along with the count of animals observed in each document. The filtering on the `family` attribute is done by the query parameter (step 3), so the map function only ever sees dolphin documents.
2. Adjust the Reduce Function: No changes are necessary for the reduce function, as its job is simply to sum the values of observations that share the same key.
3. Modify the Query Parameter: Change the query parameter to filter for dolphins (`{ family: "Dolphins" }`) instead of sharks.
The adapted MapReduce operation would look like this:
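// Assumption: the collection name (observations) and the field names
// (observationTimestamp, numAnimals) are not given in this description.
db.observations.mapReduce(
    function map() {
        var year = this.observationTimestamp.getFullYear();
        var month = this.observationTimestamp.getMonth() + 1; // getMonth() is zero-based
        emit(year + "-" + month, this.numAnimals); // key: year-month, value: dolphins observed
    },
    function reduce(key, values) {
        return Array.sum(values); // total dolphins for this year-month
    },
    {
        query: { family: "Dolphins" },
        out: "monthlyDolphinReport"
    }
);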
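To see what this operation computes without a MongoDB instance, the query, map, and reduce steps can be mimicked in plain JavaScript. This is only a minimal sketch: the sample documents, the field names `observationTimestamp` and `numAnimals`, and the helper `runMapReduce` are assumptions made for illustration, and the helper is not how MongoDB implements MapReduce internally.

```javascript
// Hypothetical sample documents; the field names are assumed, not taken from the lesson.
const observations = [
  { observationTimestamp: new Date(2023, 0, 5), family: "Dolphins", numAnimals: 3 },
  { observationTimestamp: new Date(2023, 0, 20), family: "Dolphins", numAnimals: 4 },
  { observationTimestamp: new Date(2023, 1, 2), family: "Sharks", numAnimals: 1 },
  { observationTimestamp: new Date(2023, 1, 9), family: "Dolphins", numAnimals: 2 },
];

// Illustrative helper (not a MongoDB API): filter, map, group by key, then reduce.
function runMapReduce(docs, mapFn, reduceFn, filterFn) {
  const groups = new Map();
  const emit = (key, value) => {
    if (!groups.has(key)) groups.set(key, []);
    groups.get(key).push(value);
  };
  docs.filter(filterFn).forEach((doc) => mapFn(doc, emit));
  const result = {};
  for (const [key, values] of groups) {
    result[key] = reduceFn(key, values);
  }
  return result;
}

// Map: emit a year-month key with the number of animals in the observation.
function mapMonthly(doc, emit) {
  const year = doc.observationTimestamp.getFullYear();
  const month = doc.observationTimestamp.getMonth() + 1; // getMonth() is zero-based
  emit(year + "-" + month, doc.numAnimals);
}

// Reduce: sum all values emitted for one key.
function sumValues(key, values) {
  return values.reduce((total, v) => total + v, 0);
}

const monthlyDolphinReport = runMapReduce(
  observations,
  mapMonthly,
  sumValues,
  (doc) => doc.family === "Dolphins"
);
console.log(monthlyDolphinReport); // { '2023-1': 7, '2023-2': 2 }
```

The shark document is dropped by the filter before the map step runs, which mirrors the role of the `query` parameter in the MongoDB call.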
Practice Problem 2: Quarterly Report of Whale Sightings
Scenario: Generate a detailed report on whale sightings, grouped by species and quarter.
Solution Guide: The goal is to aggregate whale sightings by species and quarter. Broken down into its MapReduce components, the entire operation in MongoDB could be set up like this:
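// Assumption: the collection name (observations) and the field names
// (observationTimestamp, species) are not given in this description.
db.observations.mapReduce(
    // Map Function: Emit the year-quarter-species as key and count 1 for each sighting
    function() {
        var year = this.observationTimestamp.getFullYear();
        // getMonth() is zero-based: months 0-2 are Q1, 3-5 are Q2, 6-8 are Q3, 9-11 are Q4
        var quarter = Math.floor(this.observationTimestamp.getMonth() / 3) + 1;
        var key = year + "-Q" + quarter + "-" + this.species;
        emit(key, 1); // Emit count of 1 for each sighting
    },
    // Reduce Function: Sum up all sightings for each key
    function(key, values) {
        return Array.sum(values);
    },
    {
        query: { family: "Whales" }, // Filter for whale observations
        out: "quarterlyWhaleReport" // Specify the output collection
    }
);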
Explanation of the Code:
- Map Function: This function is executed once for each document (in this case, each whale sighting) that matches the specified query. It constructs a composite key from the year, quarter (calculated from the month), and the species of the whale. This key is used to group sightings. The function then emits this key along with a value of `1` to signify a single sighting.
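- Reduce Function: MongoDB calls this function for keys that have more than one emitted value, passing an array of the values emitted for that key. It may call the function more than once for the same key, feeding an earlier result back in as one of the values, so the function must return the same kind of value it receives. The purpose of the reduce function here is simple: sum these values, which satisfies that requirement. The result is the total number of sightings for each species of whale in each quarter.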
- Query Filter: By specifying `{ family: "Whales" }`, we ensure that the MapReduce operation only considers documents related to whales, ignoring any other types of marine life observed.
- Output: The result of this MapReduce operation is stored in a collection named `quarterlyWhaleReport`. Each document in this collection represents the total sightings of a particular species of whale for a specific quarter of a particular year, as determined by the composite key.
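Two details from the explanation above can be checked in plain JavaScript: how a quarter is derived from the month, and why summing is a safe reduce function. MongoDB may pass the output of one reduce call back into another reduce call for the same key, so reducing in several steps has to give the same answer as reducing once. The sketch below is illustrative only; the field names `observationTimestamp` and `species`, the key format, and the helper names are assumptions.

```javascript
// Quarter from a zero-based month index: 0-2 -> 1, 3-5 -> 2, 6-8 -> 3, 9-11 -> 4.
function quarterOf(date) {
  return Math.floor(date.getMonth() / 3) + 1;
}

// Composite key in the form year-Qquarter-species (the exact format is an assumption).
function whaleKey(doc) {
  return doc.observationTimestamp.getFullYear() + "-Q" +
    quarterOf(doc.observationTimestamp) + "-" + doc.species;
}

// Reduce: sum all values for one key.
function sumValues(key, values) {
  return values.reduce((total, v) => total + v, 0);
}

// Hypothetical sighting on 14 November 2022.
const sighting = {
  observationTimestamp: new Date(2022, 10, 14),
  family: "Whales",
  species: "Humpback",
};
const sampleKey = whaleKey(sighting);
console.log(sampleKey); // 2022-Q4-Humpback

// Re-reduce check: reducing partial results must match reducing everything at once.
const emitted = [1, 1, 1, 1, 1];
const allAtOnce = sumValues(sampleKey, emitted);
const inTwoSteps = sumValues(sampleKey, [
  sumValues(sampleKey, emitted.slice(0, 2)),
  sumValues(sampleKey, emitted.slice(2)),
]);
console.log(allAtOnce, inTwoSteps); // 5 5
```

A reduce function that counted the length of `values` instead of summing them would fail this check, because the second-level call would see two partial results rather than five sightings.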