---
title: "NLP Semantics Timeline"
source: https://www.jemoka.com/posts/kbhnlp_semantics_timeline/
---

- 1990: static word embeddings
- 2003: neural language models
- 2008: multi-task learning
- 2015: attention
- 2017: transformer
- 2018: trainable contextual word embeddings + large scale pretraining
- 2019: prompt engineering

Motivating Attention

Given a sequence of embeddings: \(x_1, x_2, \ldots, x_{n}\)
For each \(x_{i}\), the goal of attention is to produce a new embedding \(a_{i}\) based on its dot-product similarity with every word up to and including itself (that is, all \(x_{j}\) with \(j \leq i\)).
Let's define:
\begin{equation} score(x_{i}, x_{j}) = x_{i} \cdot x_{j} \end{equation}
Which means that we can write:
\begin{equation} a_{i} = \sum_{j \leq i}^{} \alpha_{i,j} x_{j} \end{equation}
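where the softmax is taken over all \(j \leq i\), so that the weights for each \(i\) sum to 1:
\begin{equation} \alpha_{i,j} = softmax \qty(score(x_{i}, x_{j}) ) = \frac{\exp \qty(score(x_{i}, x_{j}))}{\sum_{k \leq i} \exp \qty(score(x_{i}, x_{k}))} \end{equation}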
The resulting \(a_{i}\) is the output of our attention.
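As a rough illustration, the three equations above can be sketched in plain Python. The function names (`softmax`, `dot`, `simple_causal_attention`) and the toy embeddings are assumptions made for this example; nothing here is tied to a particular library, and a real implementation would use batched matrix operations instead of loops.

```python
import math

def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def dot(u, v):
    """score(x_i, x_j) = x_i . x_j"""
    return sum(a * b for a, b in zip(u, v))

def simple_causal_attention(xs):
    """Return a_i = sum_{j <= i} alpha_{i,j} x_j for every x_i in xs."""
    outputs = []
    for i, x_i in enumerate(xs):
        prefix = xs[: i + 1]                        # x_1 .. x_i (j <= i)
        scores = [dot(x_i, x_j) for x_j in prefix]  # similarity to each earlier word
        alphas = softmax(scores)                    # weights sum to 1 over j <= i
        a_i = [
            sum(alpha * x_j[d] for alpha, x_j in zip(alphas, prefix))
            for d in range(len(x_i))
        ]
        outputs.append(a_i)
    return outputs

# Toy 2-dimensional embeddings (made up for illustration).
xs = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
attended = simple_causal_attention(xs)
```

Note that the first output equals the first input, since \(x_{1}\) can only attend to itself; later outputs are weighted mixtures of everything up to their own position.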
Attention

From the above, we call the input embeddings \(x_{j}\) the values, and we create a separate set of embeddings called keys, against which we measure similarity. We call the word for which we want a new embedding the query (i.e. \(x_{i}\) from above).
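To make the three roles concrete, here is a minimal sketch in which the query is scored against the keys and the resulting weights are applied to the values. The function name `attend` and the example key vectors are hypothetical; the keys are simply supplied by hand here, since how they are produced is not covered above.

```python
import math

def attend(query, keys, values):
    """Weight each value by softmax(query . key) and sum the results."""
    # Similarity is measured between the query and the keys, not the values.
    scores = [sum(q * k for q, k in zip(query, key)) for key in keys]
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    weights = [e / total for e in exps]
    # The output is a weighted sum of the values.
    output = [
        sum(w * v[d] for w, v in zip(weights, values))
        for d in range(len(values[0]))
    ]
    return output, weights

values = [[1.0, 0.0], [0.0, 1.0]]   # the input embeddings x_j
keys = [[2.0, 0.0], [0.0, 2.0]]     # hypothetical keys, one per value
query = [1.0, 0.0]                  # the word we want a new embedding for

output, weights = attend(query, keys, values)
```

Because the query aligns with the first key, most of the weight falls on the first value. Restricting `keys` and `values` to positions at or before the query's position recovers the \(j \leq i\) constraint from the motivating example.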