Slot-Gate Paper review

August 24, 2021

이전 리뷰 -> joint lerning for intent detection and slot filling 참고

introduction

slot-gate 는 Slot Filling, intent detection을 동시에 학습하는 joint learning 을 위한 논문

본 논문은 slot과 intent는 강한 relation이 있음으로 global optimization을 통해 보다 나은 symantic frame 결과를 얻기 위해 slot과 intent의 attention vector사이의 관계를 학습하는데 초점을 두었다.

본 논문의 구조는 intent detection의 정보를 Slot prediction 정보에 공급해서 사용하는 intent2slot architecture 이다.

Proposed Approach

attention 기반의 BiLSTM로 이루어진 모델이다.

$x = (x_1, …,x_T)$ -> word sequence

$\overset{\rightarrow}{h_i}$ = forward hidden state

$\overset{\leftarrow}{h_i}$ = backward hidden state

slot filling

final hidden state $h_i$ = [$\overset{\rightarrow}{h_i}$, $\overset{\leftarrow}{h_i}$] -> concat

input $x$는 outputd인 $y = (y_1^S,…,y_T^S)$ 와 매핑됨

\[c_i^S = \sum_{j=1}^T a_{i,j}^S h_j\]

$c_i^S$ = slot context vector

$a_i^S$ = attention weigth

${h_j}$ = hiddenstate

\[a_{i,j}^S = {\exp(e_{i,j}) \over \sum_{k=1}^T\exp(e_{i,k})}\] \[e_{i,k} = \sigma(W_{he}^Sh_k)\]

$\sigma$ = activate func

$W_{he}^S$ = weight matrix of feedfoward neural network

\[y_i^S = \mathbf{softmax}(W_{hy}^S(h_i + c_i^S)) \qquad (1)\]

$y_i^S$ = slot label i-th word

$W_{hy}^S$ = weight matrix

Intent Prediction

\[y^I = \mathbf{softmax}(W_{hy}^I(h_T + c^I))\]

intent prediction 할땐 마지막 hidden state 만 사용

$c^I$ = intent context vector

Slot-Gated Mechanism

slot-gate에서는 slot과 intent의 relation을 이용해 성능을 향상시키는 방법론

\[g = \sum v \cdot tanh(c_i^S + W \cdot c^I)\]

$g$ = joint context vector

$v$ = trainable vector

$W$ = matrix respectively -> 파라미터

\[y_i^S = \mathbf{softmax}(W_{hy}^S(h_i + c_i^S \cdot g))\]

아까 위의 1번 공식에서 slot context vector에 slot gate 에서 나온 $g$를 곱해주는것

아래는 slot gate의 성능을 어텐션 메커니즘과 비교하기 위한 slot gate

\[g = \sum v \cdot tanh(h_i + W \cdot c^I)\] \[y_i^S = \mathbf{softmax}(W_{hy}^S(h_i + h_i \cdot g))\]

이 공식은 hidden state 값만 사용하니 slot gate의 이점이 아닌 어텐션 메커니즘을 보여준다.

Joint Optimization

\[p(y^S, y^I | x)\] \[= p(y^I | x) \prod_{t=1}^T p(y_t^S|x)\] \[= p(y^I|x_1,...,x_T)\prod_{t=1}^T p(y_t^S|x_1,...,x_T)\]

slot과 intent를 가장 잘 뽑아내는 확률을 최대화 하도록 학습시킨다.

Experiment

실험은 ATIS 데이터셋과 SNIPS 데이터셋으로 진행한다. ATIS 데이터셋은 항공권 예약 발화 데이터로 120개의 slot label과 21개의 intent를 가지고 있다.

Snips 데이터셋은 사람이 기계에게 시키는 발화 데이터로 72개의 slot label과 7개의 intent를 가지고 있다.

실험에 따르면 본 논문이 제안한 모델은 각각 ATIS와 Snips 데이터셋에 적용한 attention 모델에 비해 4.2%, 1.9% 정도의 성능 개선으로 문장수준의 frame 정확도를 크게 향상시켰다.

Conclusion

slot-gate 는 Slot Filling, intent detection을 동시에 학습하는 joint learning 을 위한 논문이다.

본 논문의 구조는 intent detection의 정보를 Slot prediction 정보에 공급해서 사용하는 intent2slot architecture 이다.

Share on

Twitter Facebook LinkedIn

momo

Slot-Gate Paper review

introduction

Proposed Approach

slot filling

Intent Prediction

Slot-Gated Mechanism

Joint Optimization

Experiment

Conclusion

Share on

Leave a comment

You may also enjoy

Llama Guard paper review

Proxy tuning Paper review

SOLAR Paper review

[네이버 클로바 스튜디오 X 포텐데이] 참여 후기