---
title: "Term-Document Matrix"
type: concept
related: [Term Document Matrix]
source: https://www.jemoka.com/posts/kbhterm_document_matrix/
confidence: high
status: active
---

A Term-Document Matrix is a boolean matrix whose rows are “terms” (the search keywords) and whose columns are “documents”. Each element \((x,y)\) is \(1\) if document \(y\) contains term \(x\), and \(0\) otherwise.

To perform a search, we take the row of each query term and apply a boolean operation to it: the complement for a NOT term, otherwise the identity. We then AND all of the resulting rows together. The positions of the \(1\)s in the resulting boolean vector are the matching documents.

Notably, storing this matrix directly is intractable, because its size (number of terms times number of documents) blows up. However, the matrix is QUITE SPARSE, since each document contains only a small fraction of all terms. So, ideally we store only its nonzero entries.
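A minimal Python sketch of the search procedure above, under stated assumptions: a hypothetical four-document toy corpus, whitespace tokenization, and a query given as a list of required terms plus a list of NOT terms. `build_matrix` and `query` (illustrative names) build the dense 0/1 matrix as a dict from term to row and AND the rows together, complementing the NOT rows. `build_sparse` and `sparse_query` (also hypothetical) keep only the nonzero entries, as a set of document ids per term, and answer the same query with set intersection and difference.

```python
# Hypothetical toy corpus; a document's id is its index in this list.
DOCS = [
    "brutus killed caesar",
    "caesar married calpurnia",
    "brutus and caesar argued",
    "calpurnia warned caesar",
]


def build_matrix(docs):
    """Dense boolean term-document matrix: one row (list of 0/1) per term."""
    token_sets = [set(doc.split()) for doc in docs]  # assumes whitespace tokens
    vocab = sorted(set().union(*token_sets))
    return {term: [1 if term in toks else 0 for toks in token_sets] for term in vocab}


def query(matrix, n_docs, include, exclude=()):
    """AND the rows of `include` terms with the complemented rows of `exclude` terms."""
    zeros = [0] * n_docs  # row for a term that appears in no document
    result = [1] * n_docs  # start with every document
    for term in include:  # identity on the row
        row = matrix.get(term, zeros)
        result = [r & x for r, x in zip(result, row)]
    for term in exclude:  # complement of the row (NOT)
        row = matrix.get(term, zeros)
        result = [r & (1 - x) for r, x in zip(result, row)]
    return [doc_id for doc_id, hit in enumerate(result) if hit]


def build_sparse(docs):
    """Sparse form: store only the 1 entries, as term -> set of document ids."""
    postings = {}
    for doc_id, doc in enumerate(docs):
        for term in set(doc.split()):
            postings.setdefault(term, set()).add(doc_id)
    return postings


def sparse_query(postings, n_docs, include, exclude=()):
    """Same query on the sparse form: AND is intersection, AND NOT is difference."""
    result = set(range(n_docs))
    for term in include:
        result &= postings.get(term, set())
    for term in exclude:
        result -= postings.get(term, set())
    return sorted(result)


if __name__ == "__main__":
    m = build_matrix(DOCS)
    p = build_sparse(DOCS)
    # brutus AND caesar AND NOT calpurnia
    print(query(m, len(DOCS), ["brutus", "caesar"], ["calpurnia"]))  # [0, 2]
    print(sparse_query(p, len(DOCS), ["brutus", "caesar"], ["calpurnia"]))  # [0, 2]
    print(len(m) * len(DOCS), sum(len(s) for s in p.values()))  # 32 13
```

On this toy corpus the dense matrix has 8 × 4 = 32 cells, but only 13 of them are \(1\), and those 13 entries are all the sparse form stores. Both forms return the same documents, since intersecting sets of document ids is the sparse counterpart of ANDing rows.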