@jemoka / Jemoka Knowledge Base / raw/concept/kbhmapreduce.md
---
title: "MapReduce"
source: https://www.jemoka.com/posts/kbhmapreduce/
---

MapReduce is a distributed algorithm.

https://www.psc.edu/wp-content/uploads/2023/07/A-Brief-History-of-Big-Data.pdf

- Map: \((in\_key, in\_value) \Rightarrow list(out\_key, intermediate\_value)\).
- Reduce: group the map outputs by \(out\_key\), then \((out\_key, list(intermediate\_value)) \Rightarrow list(out\_value)\).

## example of MapReduce

Say you want to count word frequencies in a set of documents.

Map: \((document\_name, document\_contents) \Rightarrow list(word, \#\ occurrences)\)

This step can be distributed across multiple processors: each processor counts the word frequencies in a single document. The input is now broken into independent groups that can be handled divide-and-conquer style.

Reduce: \((word, list(occurrences\_per\_document)) \Rightarrow (word, sum)\)

For each word, we just add up the per-document occurrence counts that the nodes output.
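
The word-count example can be sketched in a few lines of Python. This is a minimal single-process illustration, not a real distributed implementation: the names `map_fn`, `shuffle`, `reduce_fn`, and `word_count` are made up for this note, and lowercasing plus splitting on whitespace is an assumed definition of "word". The grouping by \(out\_key\) is written out as its own `shuffle` step so it is visible.

```python
from collections import Counter, defaultdict


def map_fn(document_name, document_contents):
    """Map: (document_name, document_contents) -> list of (word, occurrences).

    Assumption: a "word" is a lowercased, whitespace-separated token.
    """
    counts = Counter(document_contents.lower().split())
    return list(counts.items())


def shuffle(map_outputs):
    """Group every intermediate value under its out_key (here, the word)."""
    grouped = defaultdict(list)
    for pairs in map_outputs:
        for word, count in pairs:
            grouped[word].append(count)
    return grouped


def reduce_fn(word, occurrences_per_document):
    """Reduce: (word, list of per-document counts) -> (word, sum)."""
    return (word, sum(occurrences_per_document))


def word_count(documents):
    # Each map_fn call only looks at one document, so in a real system
    # these calls could run on different processors or nodes.
    map_outputs = [map_fn(name, text) for name, text in documents.items()]
    grouped = shuffle(map_outputs)
    return dict(reduce_fn(word, counts) for word, counts in grouped.items())


documents = {
    "a.txt": "the cat sat",
    "b.txt": "the dog sat on the mat",
}
totals = word_count(documents)
print(totals["the"])  # 3
```

Because no `map_fn` call depends on another, the list comprehension in `word_count` is the part that a real system would spread over many machines; the reduce step for one word likewise never needs the counts of another word.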