
MARL-DRONE

DRON

Deep Reinforcement Opponent Network

Approach

  • Jointly learn a policy and the behavior of opponents within a single DQN.
  • Use a Mixture-of-Experts architecture to discover different strategy patterns of opponents.

Stated Questions

  • How to combine the two networks?
  • What supervision signal to use?
    • Predict Q-values only, since the goal is the best reward rather than accurately simulating opponents.
    • Also predict extra information about the opponent when it is available, e.g., the type of its strategy (a hedged loss sketch follows this list).
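As a rough sketch of that second option, the extra supervision can enter as an auxiliary term next to the usual Q-learning loss. This is a hypothetical PyTorch-style snippet; the function name, the cross-entropy choice for the strategy-type label, and the weight `lam` are all assumptions, not the paper's exact formulation.

```python
import torch.nn.functional as F

# Hypothetical multitask objective: q_pred/q_target come from the usual
# DQN TD computation; opp_logits is the network's prediction of the
# opponent's strategy type, supervised only when a label is available.
def dron_multitask_loss(q_pred, q_target, opp_logits, opp_type, lam=0.5):
    td_loss = F.smooth_l1_loss(q_pred, q_target)      # standard DQN loss
    aux_loss = F.cross_entropy(opp_logits, opp_type)  # extra opponent supervision
    return td_loss + lam * aux_loss
```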

Contents

  • Two critical questions in opponent modeling are what variable(s) to model and how to use the predicted information.
  • To account for changing behavior, uncertainty in the opponent's strategy is modeled instead of classifying it into a set of stereotypes.
  • Domain knowledge is often required when prediction of the opponent is separated from learning the dynamics of the world; therefore, a policy is learned jointly with a probabilistic model of the opponent.
  • DRON consists of a Q-network (N_Q) that evaluates actions for a state and an opponent network (N_o) that learns a representation of the opponent's policy π (a wiring sketch follows this list).
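A minimal sketch of how the two modules might be wired together in the simplest, concatenation-based variant. All layer sizes, module names, and the opponent feature input `phi` are illustrative assumptions, not the paper's exact architecture.

```python
import torch
import torch.nn as nn

class DRONConcat(nn.Module):
    """Sketch of the concatenation variant: N_Q embeds the state, N_o embeds
    the observed opponent features phi, and Q-values are predicted from the
    concatenated representation. Sizes are illustrative assumptions."""

    def __init__(self, state_dim, opp_dim, n_actions, hidden=64):
        super().__init__()
        self.state_net = nn.Sequential(nn.Linear(state_dim, hidden), nn.ReLU())   # N_Q side
        self.opponent_net = nn.Sequential(nn.Linear(opp_dim, hidden), nn.ReLU())  # N_o side
        self.q_head = nn.Linear(2 * hidden, n_actions)

    def forward(self, state, phi):
        h_s = self.state_net(state)   # world/state representation
        h_o = self.opponent_net(phi)  # opponent representation
        return self.q_head(torch.cat([h_s, h_o], dim=-1))  # joint Q-values
```

Because the two embeddings only meet at a final linear layer, this variant treats them largely independently, which is exactly the weakness noted in (a) below.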
Models and Multitasking

(Figure: the two DRON architectures, (a) DRON-concat and (b) DRON-MOE)

(a) DRON-concat simply concatenates the state and opponent representations, so it ignores the interaction between the world and the opponent.

(b) DRON-MOE knows that Q-values have different distributions depending on φ; each expert network captures one type of opponent strategy.
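For contrast with the concatenation sketch above, here is a minimal Mixture-of-Experts version in the same hypothetical PyTorch style; the number of experts, the layer sizes, and the softmax gating are my assumptions about one reasonable implementation.

```python
import torch
import torch.nn as nn

class DRONMoE(nn.Module):
    """Sketch of the MoE variant: a gating network conditioned on the opponent
    representation mixes K expert Q-networks, so each expert can specialize in
    the Q-value distribution induced by one opponent strategy type."""

    def __init__(self, state_dim, opp_dim, n_actions, n_experts=3, hidden=64):
        super().__init__()
        self.opponent_net = nn.Sequential(nn.Linear(opp_dim, hidden), nn.ReLU())
        self.experts = nn.ModuleList([
            nn.Sequential(nn.Linear(state_dim, hidden), nn.ReLU(),
                          nn.Linear(hidden, n_actions))
            for _ in range(n_experts)])
        self.gate = nn.Linear(hidden, n_experts)  # mixture weights from phi

    def forward(self, state, phi):
        h_o = self.opponent_net(phi)                  # opponent embedding
        w = torch.softmax(self.gate(h_o), dim=-1)     # (B, K) expert weights
        q = torch.stack([e(state) for e in self.experts], dim=1)  # (B, K, A)
        return (w.unsqueeze(-1) * q).sum(dim=1)       # mixed Q-values (B, A)
```

Here the mixture weights depend only on the opponent representation, so the combination of experts changes as the inferred opponent strategy changes.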


Experiments

  • Soccer game
  • Trivia game (quiz bowl)
(Figures: results of experiment 1 and experiment 2)
