@jemoka / Jemoka Knowledge Base / wiki/concepts/rdd.md
---
title: "Resilient Distributed Dataset"
type: concept
related: [Vector, Spark, Rdd]
source: https://www.jemoka.com/posts/kbhrdd/
confidence: high
status: active
---

Importantly, our data has to live in something called an RDD, a "Resilient Distributed Dataset". An RDD describes a dataset conceptually; the data is not actually loaded when the RDD is defined.

An RDD is backed by a single vector-like data store underneath, though there are special RDDs that store key-value information.

In Spark, an RDD is stored as a graph of operations, which is traced back and evaluated only when a computational step actually needs the result.

## Pair RDD

A Pair RDD is an RDD in which every entry is a pair: each entry has a key and a value.
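The two ideas above (an RDD as a graph of operations that is only traced back when a result is needed, and a Pair RDD as a collection of key-value entries) can be illustrated with a small toy model. This is a minimal sketch in plain Python, not Spark's implementation or API: `ToyRDD`, `reduce_by_key`, `to_pair`, and the sample data are all hypothetical names chosen for illustration, and the sketch ignores distribution and fault recovery entirely.

```python
from typing import Callable, Iterable, Iterator, Optional


class ToyRDD:
    """Hypothetical stand-in for an RDD: one node in a graph of operations."""

    def __init__(self,
                 source: Optional[Iterable] = None,
                 parent: Optional["ToyRDD"] = None,
                 transform: Optional[Callable[[Iterator], Iterator]] = None):
        self.source = source        # only set on the root node
        self.parent = parent        # the node this one was derived from
        self.transform = transform  # the recorded step, not yet executed

    def map(self, fn):
        # Record the step; nothing is computed here.
        return ToyRDD(parent=self,
                      transform=lambda items: (fn(x) for x in items))

    def reduce_by_key(self, fn):
        # Only meaningful when entries are (key, value) pairs,
        # i.e. when this node plays the role of a "pair RDD".
        def step(items):
            acc = {}
            for key, value in items:
                acc[key] = fn(acc[key], value) if key in acc else value
            return iter(acc.items())
        return ToyRDD(parent=self, transform=step)

    def _iterate(self):
        # Trace back through the graph to the source, then apply each step.
        if self.parent is None:
            return iter(self.source)
        return self.transform(self.parent._iterate())

    def collect(self):
        # The "action": the only place where evaluation is triggered.
        return list(self._iterate())


evaluated = []

def to_pair(word):
    evaluated.append(word)  # records when this function actually runs
    return (word, 1)        # a key-value entry


words = ToyRDD(source=["a", "b", "a", "c", "a"])
counts = words.map(to_pair).reduce_by_key(lambda x, y: x + y)

ran_before_collect = len(evaluated)  # still 0: only the graph was built
result = counts.collect()            # evaluation happens here

print(ran_before_collect, result)
```

In this sketch, chaining `map` and `reduce_by_key` only builds up parent links; `to_pair` does not run until `collect` walks the chain back to the source list. The real Spark API has similarly named operations, but their behavior across partitions and machines is far more involved than this toy suggests.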