Lecture 1
neuron: a function whose output (its activation) records how strongly a feature is present
connection: a weight (just a number)
propagation: multiply each incoming activation by its weight, sum them up, add the bias, then squash with ReLU (the modern way) or Sigmoid (the older way)
Deep Learning itself is just building one huge function.
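A minimal sketch of that idea in Python/NumPy (the layer sizes and random weights here are made up purely for illustration):

import numpy as np

def relu(x):
    # ReLU: the modern squashing function, max(0, x)
    return np.maximum(0.0, x)

def layer(a, W, b):
    # one layer: weighted sum of the previous activations, plus a bias, then ReLU
    return relu(W @ a + b)

# made-up sizes: 4 inputs -> 3 hidden neurons -> 2 outputs
rng = np.random.default_rng(0)
W1, b1 = rng.normal(size=(3, 4)), np.zeros(3)
W2, b2 = rng.normal(size=(2, 3)), np.zeros(2)

def network(x):
    # the whole network is just one big function from inputs to outputs
    return layer(layer(x, W1, b1), W2, b2)

print(network(np.array([1.0, 0.5, -0.3, 0.8])))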
Lecture 2
Cost: a single number measuring how badly the whole network performs on your data (e.g. the average squared difference between its outputs and the desired outputs)
Finding the global minimum of the cost function is hard, but it's always feasible to roll downhill to a local minimum
The gradient used for gradient descent is calculated with backpropagation
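A toy sketch of gradient descent (the cost function here is a made-up stand-in; for a real network the gradient would come from backpropagation):

import numpy as np

def cost(w):
    # made-up one-parameter cost function, minimum at w = 3
    return (w - 3.0) ** 2 + 1.0

def grad(w):
    # derivative of the toy cost; backpropagation computes this for a real network
    return 2.0 * (w - 3.0)

w = 0.0      # start somewhere
lr = 0.1     # learning rate: size of each downhill step
for step in range(50):
    w -= lr * grad(w)   # step in the direction of steepest descent

print(w, cost(w))  # w has rolled down to the (local) minimum near 3.0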
Lecture 3
backpropagation: calculates which changes to the weights & biases will decrease the cost function most rapidly; the calculation works backwards through the network, each layer's gradients built from the layer after it
backpropagation needs a huge amount of data to train the model, so it's best to randomly shuffle the data, divide it into mini-batches, and do one backpropagation/gradient-descent step per mini-batch
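A sketch of that mini-batch loop, assuming a stand-in model (a simple linear model trained with mean squared error, so the gradient is written out by hand instead of via real backpropagation); the data, sizes, and helper name minibatches are all made up:

import numpy as np

def minibatches(data, labels, batch_size, rng):
    # shuffle once per epoch, then yield mini-batches one by one
    order = rng.permutation(len(data))
    for start in range(0, len(data), batch_size):
        idx = order[start:start + batch_size]
        yield data[idx], labels[idx]

# made-up toy data: 1000 examples, 4 features, scalar targets
rng = np.random.default_rng(0)
X = rng.normal(size=(1000, 4))
y = X @ np.array([1.0, -2.0, 0.5, 3.0])

w = np.zeros(4)   # the "weights" of the stand-in model
lr = 0.05
for epoch in range(5):
    for xb, yb in minibatches(X, y, batch_size=32, rng=rng):
        err = xb @ w - yb                  # prediction error on this mini-batch
        grad_w = 2 * xb.T @ err / len(xb)  # gradient of mean squared error w.r.t. w
        w -= lr * grad_w                   # one gradient-descent step per mini-batch

print(w)  # should approach [1, -2, 0.5, 3]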
Lecture 5
large language model: generates the next most likely word based on the previous input (the context of the human-AI conversation)
training: pre-training on text, then reinforcement learning with human feedback
transformers: attention, then forward propagation (an MLP block), then attention again, and so on
finally, pick the output word whose probability best fits the context
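A sketch of that generation loop; fake_model is a hypothetical placeholder that returns made-up next-word probabilities, standing in for a real language model:

import numpy as np

def fake_model(context, vocab_size, rng):
    # placeholder for a real model: returns made-up probabilities for the next word
    p = rng.random(vocab_size)
    return p / p.sum()

rng = np.random.default_rng(0)
vocab = ["the", "cat", "sat", "on", "mat", "."]
context = [0]  # start with "the"

for _ in range(5):
    probs = fake_model(context, len(vocab), rng)  # distribution over the next word
    nxt = int(rng.choice(len(vocab), p=probs))    # sample (or pick the most likely) word
    context.append(nxt)                           # append and repeat with the longer context

print(" ".join(vocab[i] for i in context))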
Lecture 6
Attention: the word vectors communicate with each other to update their values based on context
propagation: the vectors then pass through a feed-forward network and on to the next attention layer
the weights are what gets trained, not the data
embedding layer: a huge matrix holding a feature vector for each word
unembedding layer: uses the last vector to produce the probabilities of the next possible word to generate
softmax: an algorithm that turns a vector of scores into a probability distribution, pushing the largest values toward 1 and the smallest toward 0
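A sketch tying the embedding, unembedding, and softmax steps together; the vocabulary, matrix sizes, and random values are made up for illustration:

import numpy as np

def softmax(scores):
    # exponentiate and normalize so the entries sum to 1;
    # subtracting the max first keeps the exponentials from overflowing
    e = np.exp(scores - scores.max())
    return e / e.sum()

# made-up sizes: vocabulary of 6 words, feature vectors of dimension 8
rng = np.random.default_rng(0)
vocab = ["the", "cat", "sat", "on", "mat", "."]
embedding = rng.normal(size=(len(vocab), 8))     # embedding layer: one feature row per word
unembedding = rng.normal(size=(8, len(vocab)))   # unembedding layer: vector -> vocabulary scores

last_vector = embedding[vocab.index("cat")]      # stand-in for the last vector of the context
scores = last_vector @ unembedding               # one score per possible next word
probs = softmax(scores)                          # the largest score gets most of the probability
print(dict(zip(vocab, probs.round(3))))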
Lecture 7
query: what the word is looking for in the context
key: what each word offers, so queries can check whether it is relevant
value: the information that actually gets added to the word's vector once it is judged relevant
use softmax to normalize the attention scores; before that, give every word that shouldn't be attended to (e.g. later words, for causal masking) a score of negative infinity so it ends up with weight zero
it is hard to enlarge the context size, since the attention computation grows with the square of the context length
multi-head attention is how GPT scales out: many heads run in parallel, each with its own query/key/value maps
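A minimal single-head self-attention sketch with causal masking; the matrix sizes and random weights are made up for illustration:

import numpy as np

def softmax(scores):
    e = np.exp(scores - scores.max(axis=-1, keepdims=True))
    return e / e.sum(axis=-1, keepdims=True)

def attention_head(X, Wq, Wk, Wv):
    # X: one vector per word in the context (context_len x d_model)
    Q = X @ Wq            # queries: what each word is looking for
    K = X @ Wk            # keys: what each word offers
    V = X @ Wv            # values: what gets passed along if relevant
    scores = Q @ K.T / np.sqrt(K.shape[1])
    # causal mask: a word may not attend to later words, so set those scores to -inf
    mask = np.triu(np.ones_like(scores, dtype=bool), k=1)
    scores[mask] = -np.inf
    weights = softmax(scores)   # softmax turns the scores into attention weights
    return weights @ V          # each word's output is a weighted mix of the values

# made-up sizes: 5 words, model dimension 8, head dimension 4
rng = np.random.default_rng(0)
X = rng.normal(size=(5, 8))
Wq, Wk, Wv = (rng.normal(size=(8, 4)) for _ in range(3))
print(attention_head(X, Wq, Wk, Wv).shape)   # (5, 4)

Multi-head attention just runs many such heads in parallel, each with its own Wq/Wk/Wv, and combines their outputs back into the word vectors.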
Lecture 8
nearly perpendicular vectors: each (almost) perpendicular direction can carry its own meaning, and in high dimensions you can fit far more nearly perpendicular vectors than there are dimensions, so a whole family of meanings can coexist
linear, ReLU, linear: an MLP block is just the simplest kind of neural network
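A sketch of that linear-ReLU-linear block acting on a single word vector; the model and hidden dimensions are made up for illustration:

import numpy as np

def relu(x):
    return np.maximum(0.0, x)

def mlp_block(x, W1, b1, W2, b2):
    # linear -> ReLU -> linear: the simple feed-forward block inside a transformer
    return W2 @ relu(W1 @ x + b1) + b2

# made-up sizes: model dimension 8, hidden dimension 32
rng = np.random.default_rng(0)
W1, b1 = rng.normal(size=(32, 8)), np.zeros(32)
W2, b2 = rng.normal(size=(8, 32)), np.zeros(8)

x = rng.normal(size=8)                      # one word's vector
print(mlp_block(x, W1, b1, W2, b2).shape)   # (8,) — same size as the input vector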