---
title: "NLP Semantics Timeline"
source: https://www.jemoka.com/posts/kbhnlp_semantics_timeline/
---

- 1990 static word embeddings
- 2003 neural language models
- 2008 multi-task learning
- 2015 attention
- 2017 transformer
- 2018 trainable contextual word embeddings + large scale pretraining
- 2019 prompt engineering

## Motivating Attention

Given a sequence of embeddings: \(x_1, x_2, \dots, x_{n}\)

For each \(x_{i}\), the goal of attention is to produce a new embedding of \(x_{i}\), named \(a_{i}\), based on its dot product similarity with every word up to and including itself.

Let's define:

\begin{equation}
score(x_{i}, x_{j}) = x_{i} \cdot x_{j}
\end{equation}

Which means that we can write:

\begin{equation}
a_{i} = \sum_{j \leq i}^{} \alpha_{i,j} x_{j}
\end{equation}

where:

\begin{equation}
\alpha_{i,j} = softmax \qty(score(x_{i}, x_{j}) )
\end{equation}

with the softmax taken over all positions \(j \leq i\), so that the weights \(\alpha_{i,j}\) for a fixed \(i\) sum to 1.

The resulting \(a_{i}\) is the output of our attention.

## Attention

From the above, we call the input embeddings \(x_{j}\) the values, and we create a separate set of embeddings, called keys, against which we measure similarity. We call the word for which we want the new target embedding the query (i.e. \(x_{i}\) from above).
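The motivating formulation (dot product scores, a softmax over positions \(j \leq i\), then a weighted sum of the \(x_{j}\)) can be sketched in NumPy. This is a minimal illustration only: the function name `causal_dot_product_attention` is hypothetical, and the inputs play the role of query, key, and value at once, since separate key and query embeddings are not yet defined at this point.

```python
import numpy as np


def causal_dot_product_attention(x: np.ndarray):
    """Sketch of a_i = sum_{j <= i} alpha_{i,j} x_j (hypothetical helper).

    x: array of shape (n, d), one embedding per row.
    Returns (a, alpha): a has shape (n, d), alpha has shape (n, n).
    """
    n = x.shape[0]

    # scores[i, j] = x_i . x_j
    scores = x @ x.T

    # Keep only positions j <= i; everything after i is masked out.
    mask = np.tril(np.ones((n, n), dtype=bool))
    scores = np.where(mask, scores, -np.inf)

    # Row-wise softmax, shifted by the row maximum for numerical stability.
    scores = scores - scores.max(axis=1, keepdims=True)
    weights = np.exp(scores)
    alpha = weights / weights.sum(axis=1, keepdims=True)

    # Each a_i is a weighted sum of the embeddings at or before position i.
    a = alpha @ x
    return a, alpha


if __name__ == "__main__":
    x = np.array([[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]])
    a, alpha = causal_dot_product_attention(x)
    print(alpha)  # lower-triangular, each row sums to 1
    print(a)
```

Because the first position can only attend to itself, \(a_{1} = x_{1}\) under this formulation; later positions mix in earlier embeddings in proportion to their dot product similarity.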