

[Paper Short Review] Do sequence-to-sequence VAEs learn global features of sentences


Keypoints

  • VAE architecture.
  • Classification, but with a quite interesting training step.
  • It is again unclear how the latent code acts.
  • It uses the δ-VAE. Why? Just to prevent posterior collapse.

Questions and Answers

Which model does it use?

A VAE based on a seq2seq LSTM autoencoder.

$L$ words: $x = (x_1, x_2, \ldots, x_L)$, embedded into $L$ vectors $(e_1, \ldots, e_L)$:

$$h_1, \ldots, h_L = \mathrm{LSTM}(e_1, \ldots, e_L)$$

Next, generate the latent vector from the last hidden vector:

$$\mu = L_1 h_L, \quad \sigma^2 = \exp(L_2 h_L), \quad q_\phi(z \mid x) = \mathcal{N}(z \mid \mu, \mathrm{diag}(\sigma^2))$$
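A minimal PyTorch sketch of this encoder (for illustration only; the layer names `embed`, `lstm`, `to_mu`, `to_logvar` and the sizes are assumptions, not from the paper):

```python
import torch
import torch.nn as nn

class Encoder(nn.Module):
    def __init__(self, vocab_size, emb_dim=256, hid_dim=512, z_dim=32):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, emb_dim)   # x_1..x_L -> e_1..e_L
        self.lstm = nn.LSTM(emb_dim, hid_dim, batch_first=True)
        self.to_mu = nn.Linear(hid_dim, z_dim)           # plays the role of L_1
        self.to_logvar = nn.Linear(hid_dim, z_dim)       # plays the role of L_2

    def forward(self, x):                                # x: (B, L) token ids
        e = self.embed(x)                                # (B, L, emb_dim)
        h, _ = self.lstm(e)                              # h_1, ..., h_L
        h_last = h[:, -1]                                # last hidden vector h_L
        mu = self.to_mu(h_last)
        sigma2 = torch.exp(self.to_logvar(h_last))       # diag(sigma^2)
        return mu, sigma2                                # parameters of q_phi(z|x)
```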

and the decoding step:

$$h_0, h_1, \ldots, h_L = \mathrm{LSTM}([e_{\mathrm{BOS}}; z], [e_1; z], \ldots, [e_L; z])$$

finally,
$$p_\theta(x_{i+1} \mid x_1, \ldots, x_i, z) = \mathrm{softmax}(w h_i + b)$$
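A matching decoder sketch under the same assumptions: $z$ is concatenated to every input embedding, and the input sequence is shifted with a BOS token (teacher forcing):

```python
import torch
import torch.nn as nn

class Decoder(nn.Module):
    def __init__(self, vocab_size, emb_dim=256, hid_dim=512, z_dim=32):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, emb_dim)
        self.lstm = nn.LSTM(emb_dim + z_dim, hid_dim, batch_first=True)
        self.out = nn.Linear(hid_dim, vocab_size)        # w, b of the softmax layer

    def forward(self, x_in, z):
        # x_in: (B, L) = (BOS, x_1, ..., x_{L-1}); z: (B, z_dim)
        e = self.embed(x_in)                             # (B, L, emb_dim)
        z_rep = z.unsqueeze(1).expand(-1, e.size(1), -1) # repeat z at every step
        h, _ = self.lstm(torch.cat([e, z_rep], dim=-1))  # inputs [e_i; z]
        return self.out(h)                               # logits for x_1, ..., x_L
```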

The objective function is the ELBO, a lower bound on the marginal log-likelihood:

$$\mathrm{ELBO}(x, \theta, \phi) = -D_{\mathrm{KL}}(q_\phi(z \mid x) \,\|\, p(z)) + \mathbb{E}_{q_\phi}[\log p_\theta(x \mid z)]$$
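A hedged sketch of evaluating this ELBO with a single reparameterized sample and the closed-form Gaussian KL (standard formulas, nothing paper-specific; `logits` are assumed to come from the decoder sketch above):

```python
import torch
import torch.nn.functional as F

def elbo(logits, x_target, mu, sigma2):
    # E_q[log p_theta(x|z)]: token-level log-likelihood, summed over the sentence
    rec = -F.cross_entropy(logits.transpose(1, 2), x_target, reduction="none").sum(-1)
    # D_KL(N(mu, diag(sigma2)) || N(0, I)) in closed form, summed over z dimensions
    kl = 0.5 * (sigma2 + mu ** 2 - 1.0 - torch.log(sigma2)).sum(-1)
    return (rec - kl).mean()    # average ELBO over the batch
```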

What data does it use?

Four small versions of labeled datasets for topic or sentiment classification (70MB):

  • AG News
  • Amazon
  • Yahoo
  • Yelp

How does it deal with posterior collapse?

The objective function is modified using the free-bits formulation of the δ-VAE [2]. For a desired rate λ, the KL term in the ELBO is replaced with:

$$\max(\lambda, D_{\mathrm{KL}}(q_\phi(z \mid x) \,\|\, p(z)))$$
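A small sketch of this free-bits clamp, assuming per-example scalars `rec` and `kl` and a target rate `lam` (illustrative names):

```python
import torch

def free_bits_objective(rec, kl, lam=3.0):
    # rec: E_q[log p_theta(x|z)] per example; kl: D_KL(q_phi(z|x) || p(z)) per example
    kl_term = torch.clamp(kl, min=lam)   # max(lambda, KL): no gradient once KL < lambda
    return (rec - kl_term).mean()        # maximize this (or minimize its negative)
```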

Contribution

  • Measure which words benefit most from the latent information (see the sketch below).
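One heavily hedged way such a measurement could look: compare each word's log-probability when the decoder is given the posterior mean of $z$ versus the prior mean. This comparison is an assumption for illustration, not the paper's exact protocol, and it reuses the hypothetical `Decoder` sketch above:

```python
import torch
import torch.nn.functional as F

@torch.no_grad()
def per_word_latent_gain(decoder, x_in, x_target, mu):
    logits_post = decoder(x_in, mu)                      # z = posterior mean
    logits_prior = decoder(x_in, torch.zeros_like(mu))   # z = prior mean (zero vector)
    lp_post = -F.cross_entropy(logits_post.transpose(1, 2), x_target, reduction="none")
    lp_prior = -F.cross_entropy(logits_prior.transpose(1, 2), x_target, reduction="none")
    return lp_post - lp_prior                            # (B, L): per-word gain from z
```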

Experiments

References

[1] Tom Bosc and Pascal Vincent. 2020. Do Sequence-to-Sequence VAEs Learn Global Features of Sentences? In Proceedings of EMNLP.

[2] Ali Razavi, Aaron van den Oord, Ben Poole, and Oriol Vinyals. 2019. Preventing Posterior Collapse with delta-VAEs. In International Conference on Learning Representations (ICLR).