[Paper Short Review] Do sequence-to-sequence VAEs learn global features of sentences?
Keypoints
- VAE architecture.
- Classification-based evaluation, but with quite an interesting training step.
- It remains unclear how the latent code acts.
- It uses a δ-VAE, but why? Apparently just to prevent posterior collapse.
Questions and Answers
Which model does it use?
A VAE based on a seq2seq LSTM autoencoder.
A sentence of $L$ words $x = (x_1, x_2, \dots, x_L)$ is embedded into $L$ vectors $(e_1, \dots, e_L)$, then encoded:

$h_1, \dots, h_L = \mathrm{LSTM}(e_1, \dots, e_L)$
Next, the latent vector is generated from the last hidden vector:

$\mu = L_1 h_L, \quad \sigma^2 = \exp(L_2 h_L), \quad q_\phi(z \mid x) = \mathcal{N}(z \mid \mu, \mathrm{diag}(\sigma^2))$
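A minimal PyTorch sketch of this encoder, only to make the equations concrete; the layer names (embed, to_mu, to_logvar) and dimensions are my assumptions, not the authors' code.

```python
import torch.nn as nn

class Seq2SeqVAEEncoder(nn.Module):
    def __init__(self, vocab_size, emb_dim=256, hid_dim=512, z_dim=32):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, emb_dim)
        self.lstm = nn.LSTM(emb_dim, hid_dim, batch_first=True)
        self.to_mu = nn.Linear(hid_dim, z_dim)      # L1 in the equations above
        self.to_logvar = nn.Linear(hid_dim, z_dim)  # L2: sigma^2 = exp(logvar)

    def forward(self, x):
        e = self.embed(x)      # (batch, L) token ids -> (batch, L, emb_dim)
        h, _ = self.lstm(e)    # h: (batch, L, hid_dim)
        h_last = h[:, -1]      # last hidden vector h_L
        return self.to_mu(h_last), self.to_logvar(h_last)
```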
Then the decoding step:

$h'_1, \dots, h'_L = \mathrm{LSTM}([e_{\mathrm{BOS}}; z], [e_1; z], \dots, [e_L; z])$
Finally,

$p_\theta(x_{i+1} \mid x_1, \dots, x_i, z) = \mathrm{softmax}(w h'_i + b)$
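A matching decoder sketch: the concatenation $[e_i; z]$ is realized by repeating $z$ at every time step, and $z$ itself would be sampled with the reparameterization trick ($z = \mu + \sigma \odot \epsilon$). Again, layer names and sizes are my own assumptions.

```python
import torch
import torch.nn as nn

class Seq2SeqVAEDecoder(nn.Module):
    def __init__(self, vocab_size, emb_dim=256, hid_dim=512, z_dim=32):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, emb_dim)
        self.lstm = nn.LSTM(emb_dim + z_dim, hid_dim, batch_first=True)
        self.out = nn.Linear(hid_dim, vocab_size)   # w, b in the softmax

    def forward(self, x_in, z):
        # x_in: (batch, L) tokens shifted right to start with BOS
        # z:    (batch, z_dim) latent code, concatenated at every step
        e = self.embed(x_in)                              # (batch, L, emb_dim)
        z_rep = z.unsqueeze(1).expand(-1, e.size(1), -1)  # (batch, L, z_dim)
        h, _ = self.lstm(torch.cat([e, z_rep], dim=-1))   # (batch, L, hid_dim)
        return self.out(h)  # logits; softmax gives p(x_{i+1} | x_1..x_i, z)
```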
The objective function is the ELBO, a lower bound on the marginal log-likelihood:

$\mathrm{ELBO}(x, \theta, \phi) = -D_{\mathrm{KL}}(q_\phi(z \mid x) \,\|\, p(z)) + \mathbb{E}_{q_\phi}[\log p_\theta(x \mid z)]$
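Both terms have a simple form here: the reconstruction term is a token-level cross-entropy and the KL between a diagonal Gaussian posterior and a standard normal prior is available in closed form. A sketch, assuming the decoder logits and gold targets from above:

```python
import torch
import torch.nn.functional as F

def elbo(logits, targets, mu, logvar):
    # Reconstruction term E_q[log p_theta(x | z)], summed over all tokens
    rec = -F.cross_entropy(logits.reshape(-1, logits.size(-1)),
                           targets.reshape(-1), reduction="sum")
    # KL(q_phi(z|x) || N(0, I)) in closed form for a diagonal Gaussian
    kl = -0.5 * torch.sum(1 + logvar - mu.pow(2) - logvar.exp())
    return rec - kl   # maximize this (minimize its negative during training)
```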
What data is used?
Four small versions of labeled datasets (topic or sentiment), each ∼70MB:
- AG News
- Amazon
- Yahoo
- Yelp
Dealing with posterior collapse.
The objective function is modified using the free-bits formulation of the δ-VAE. For a desired rate $\lambda$, the KL term is replaced with

$\max(\lambda,\ D_{\mathrm{KL}}(q_\phi(z \mid x) \,\|\, p(z)))$
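A sketch of how this clamping could look in code; clamping the total KL (rather than per dimension) is my assumption, not a detail taken from the paper.

```python
import torch

def free_bits_kl(mu, logvar, lam=8.0):
    # KL(q_phi(z|x) || N(0, I)) for a diagonal Gaussian posterior
    kl = -0.5 * torch.sum(1 + logvar - mu.pow(2) - logvar.exp())
    # max(lambda, KL): no gradient pushes the KL below the target rate lambda,
    # so the latent code must carry at least that much information
    return torch.clamp(kl, min=lam)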
Contribution
- Measures which words benefit most from the latent information.
Experiments
References
[1] Tom Bosc and Pascal Vincent. 2020. Do Sequence-to-Sequence VAEs Learn Global Features of Sentences? In Empirical Methods in Natural Language Processing (EMNLP).
[2] Ali Razavi, Aaron van den Oord, Ben Poole, and Oriol Vinyals. 2019. Preventing Posterior Collapse with delta-VAEs. In International Conference on Learning Representations (ICLR).