Contextual Bandit (Thompson Sampling)

최영민85 2018. 10. 17. 17:01

MLE VS MAP

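Maximum Likelihood Estimation (MLE) and Maximum A Posteriori (MAP) estimation are both methods for estimating the parameters of a probability distribution or a graphical model.

- MLE: chooses the parameters that maximize the likelihood function.

- MAP: chooses the parameters that maximize the posterior, which is proportional to the product of the likelihood and the prior.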
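To make the difference concrete, here is a minimal sketch for a Bernoulli parameter. The function names and the Beta(2, 2) default prior are assumptions chosen for illustration; the MAP formula is the mode of the Beta posterior and is only valid when both prior parameters are greater than 1.

```python
def bernoulli_mle(successes, trials):
    """MLE of a Bernoulli parameter: the value maximizing the likelihood."""
    return successes / trials


def bernoulli_map(successes, trials, alpha=2.0, beta=2.0):
    """MAP estimate under a Beta(alpha, beta) prior (assumes alpha, beta > 1).

    The posterior is Beta(alpha + successes, beta + trials - successes),
    and the MAP estimate is the mode of that posterior.
    """
    return (successes + alpha - 1) / (trials + alpha + beta - 2)


# 3 successes in 4 trials: the prior pulls the MAP estimate toward 0.5.
print(bernoulli_mle(3, 4))  # 0.75
print(bernoulli_map(3, 4))  # about 0.667
```

With a uniform Beta(1, 1) prior the two estimates coincide, which is one way to see that MLE is MAP with a flat prior.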

Thompson Sampling

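Thompson sampling chooses which arm to pull based on a probability distribution over the reward each arm yields.

The probabilities here are Bayesian: there is a prior, a likelihood, and a posterior that is proportional to the product of the two.

The procedure starts from a prior distribution over each arm's reward. A random sample is drawn from each arm's reward prior, the arm with the largest sampled value is selected, and the observed reward feedback is folded into that arm's posterior.

After that, a reward is sampled from each arm's posterior (for an arm that has not been pulled yet, this is still its prior), the arm with the largest sampled value is selected, and its posterior is updated. This step is repeated.

The complexity of the method is therefore determined by how easily the posterior can be computed.

Many likelihoods have a conjugate prior (https://en.wikipedia.org/wiki/Conjugate_prior); when such a pair is used, the posterior belongs to the same family as the prior and is easy to determine.

Thompson sampling for Bernoulli bandits uses a Bernoulli distribution as the likelihood and a Beta distribution as the prior; by conjugacy, the posterior is again a Beta distribution.

As a result, the posterior over an arm's reward does not have to be derived separately when moving to the next step; updating the Beta parameters is enough.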
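The loop described above can be sketched as follows. This is a minimal simulation, not a production implementation: `true_probs` is a hypothetical list of reward probabilities used only to generate feedback, and the uniform Beta(1, 1) prior is an assumption.

```python
import random


def thompson_bernoulli(true_probs, n_rounds=2000, seed=0):
    """Beta-Bernoulli Thompson sampling on simulated arms.

    true_probs are hypothetical reward probabilities used only to simulate
    feedback; the selection rule never reads them directly.
    """
    rng = random.Random(seed)
    # Beta(1, 1) prior (uniform) for every arm, stored as [alpha, beta].
    params = [[1.0, 1.0] for _ in true_probs]
    pulls = [0] * len(true_probs)
    for _ in range(n_rounds):
        # Draw one sample per arm from its current posterior (or prior).
        samples = [rng.betavariate(a, b) for a, b in params]
        arm = max(range(len(samples)), key=samples.__getitem__)
        reward = 1 if rng.random() < true_probs[arm] else 0
        # Conjugate update: a success increments alpha, a failure increments beta.
        params[arm][0] += reward
        params[arm][1] += 1 - reward
        pulls[arm] += 1
    return params, pulls


params, pulls = thompson_bernoulli([0.2, 0.5, 0.7])
print(pulls)  # most pulls should go to the last arm
```

Because the posterior stays a Beta distribution, the whole update is two additions per round.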

Thompson sampling for contextual bandits works on the same principle, except that the reward distribution is modeled from the available information (X), for example with linear regression or logistic regression. With linear regression the likelihood is normal, and using the conjugate normal prior makes the posterior easy to compute. With logistic regression or a factorization machine (FM), however, the posterior has no simple closed form, so the Laplace approximation is used to obtain a normal posterior.
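For the linear regression case, the conjugate update can be sketched with a deliberately simplified model: one scalar context `x` and one scalar weight per arm, with known noise variance. The class name `GaussianArm`, the function `choose_arm`, and these simplifications are all assumptions for illustration; a real contextual bandit would use a weight vector and a covariance matrix.

```python
import random


class GaussianArm:
    """Posterior over a scalar weight w in the model reward = w * x + noise."""

    def __init__(self, prior_precision=1.0, noise_var=1.0):
        self.precision = prior_precision  # prior is N(0, 1 / prior_precision)
        self.weighted_sum = 0.0           # running sum of x * reward / noise_var
        self.noise_var = noise_var

    @property
    def mean(self):
        return self.weighted_sum / self.precision

    def sample(self, rng):
        # One posterior draw of w from N(mean, 1 / precision).
        return rng.gauss(self.mean, self.precision ** -0.5)

    def update(self, x, reward):
        # Normal prior + normal likelihood gives a normal posterior in closed form.
        self.precision += x * x / self.noise_var
        self.weighted_sum += x * reward / self.noise_var


def choose_arm(arms, x, rng):
    """Thompson step: sample w for each arm, pick the largest predicted reward."""
    scores = [arm.sample(rng) * x for arm in arms]
    return max(range(len(arms)), key=scores.__getitem__)


rng = random.Random(0)
arms = [GaussianArm(), GaussianArm()]
true_w = [0.5, 2.0]  # hypothetical weights used only to simulate rewards
for _ in range(500):
    x = rng.uniform(0.5, 1.5)
    k = choose_arm(arms, x, rng)
    arms[k].update(x, true_w[k] * x + rng.gauss(0.0, 1.0))
print([round(arm.mean, 2) for arm in arms])
```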

Laplace Approximation

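The Laplace approximation approximates an arbitrary posterior with a normal distribution.

Here the mean of the normal distribution in (3) is the point w0 that maximizes log f(w), i.e. the mode, and the variance is the inverse of the negative second derivative of log f(w) evaluated at w0.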
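A small numerical sketch of this recipe, assuming a one-dimensional target whose first and second derivatives of log f(w) are available. The helper name `laplace_approximation` and the choice of an unnormalized Beta(8, 4) density as the example target are illustrative assumptions.

```python
def laplace_approximation(log_f_grad, log_f_hess, w_init, n_steps=50):
    """Return (mean, variance) of the normal approximation to f(w).

    Newton's method finds the mode w0, where the first derivative of
    log f(w) is zero; the variance is -1 / (second derivative at w0).
    """
    w = w_init
    for _ in range(n_steps):
        w -= log_f_grad(w) / log_f_hess(w)
    return w, -1.0 / log_f_hess(w)


# Example target: an unnormalized Beta(a, b) density,
# log f(w) = (a - 1) * log(w) + (b - 1) * log(1 - w).
a, b = 8.0, 4.0
grad = lambda w: (a - 1) / w - (b - 1) / (1 - w)
hess = lambda w: -(a - 1) / w ** 2 - (b - 1) / (1 - w) ** 2

mean, var = laplace_approximation(grad, hess, w_init=0.5)
print(mean, var)  # the mode is (a - 1) / (a + b - 2) = 0.7
```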

1.

2. Compute a truncated Taylor expansion of log f(w) centred at the mode.
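For reference, the standard one-dimensional form of this expansion can be written as below. The symbols $w_0$ (the mode) and $A$ (the curvature at the mode) are naming assumptions here, and the first-order term is absent because the derivative of $\ln f(w)$ is zero at the mode.

```latex
\ln f(w) \approx \ln f(w_0) - \frac{A}{2}\,(w - w_0)^2,
\qquad
A = -\left.\frac{d^2}{dw^2} \ln f(w)\right|_{w = w_0}

q(w) = \mathcal{N}\!\left(w \mid w_0,\; A^{-1}\right)
```

Exponentiating the quadratic gives an unnormalized Gaussian, which is why the approximating distribution has mean $w_0$ and variance $A^{-1}$.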

Thompson Sampling for Logistic regression

(Reference: http://tech.adroll.com/blog/data-science/2017/03/06/thompson-sampling-bayesian-factorization-machines.html)

Common Notation

a training set of $N$ examples $(x_{i}, y_{i})$, where the feature vector $x_{i}$ is a $D$-dimensional vector.

1.

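2. F is the function that combines the weights with the features (a dot product in the linear case); its form differs between the linear, FM, and FFM models.

3. The log-likelihood can be defined as follows; Mu depends on the choice of F in item 2.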
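The notation above can be sketched in code as follows. Several things here are assumptions rather than the post's own definitions: labels are taken to be in {0, 1}, Mu is taken to be the sigmoid of F, the FM is second-order, and the function names are made up for illustration.

```python
import math


def linear_score(x, w0, w):
    """F for a linear model: bias plus the dot product of weights and features."""
    return w0 + sum(wj * xj for wj, xj in zip(w, x))


def fm_score(x, w0, w, V):
    """F for a second-order factorization machine.

    V[j] is the K-dimensional latent vector of feature j; the pairwise term
    sum_{j<k} <V[j], V[k]> * x[j] * x[k] is computed in O(D * K).
    """
    score = linear_score(x, w0, w)
    for f in range(len(V[0])):
        s = sum(V[j][f] * x[j] for j in range(len(x)))
        s_sq = sum((V[j][f] * x[j]) ** 2 for j in range(len(x)))
        score += 0.5 * (s * s - s_sq)
    return score


def log_likelihood(scores, labels):
    """Bernoulli log-likelihood with mu_i = sigmoid(F(x_i)), labels in {0, 1}."""
    total = 0.0
    for f_i, y_i in zip(scores, labels):
        mu = 1.0 / (1.0 + math.exp(-f_i))
        total += y_i * math.log(mu) + (1 - y_i) * math.log(1.0 - mu)
    return total


x = [1.0, 1.0]
print(linear_score(x, 0.5, [1.0, 1.0]))
print(fm_score(x, 0.0, [0.0, 0.0], [[1.0, 2.0], [3.0, 4.0]]))
```

Only the score function changes between models; the log-likelihood is the same once Mu is computed.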

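Laplace Approximation

1. In Bayesian logistic regression, the prior p(w) on every parameter is assumed to be normal. The function F from Common Notation has parameters W and V (denoted Z), and each of them is assumed to follow a normal distribution.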

Bayesian Logistic Regression

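By the Laplace approximation, the posterior is approximated by a normal distribution.

3. The mean and variance are obtained from the first and second derivatives of ln f(w), respectively: the mean is the point where the first derivative is zero, and the variance is the inverse of the negative second derivative at that point.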

Bayesian Logistic Regression-Thompson Sampling

Initialize the prior on each weight $w_{j}$ with $m_{j} = 0$ , $s_{j} = \lambda$ .

For each new batch of training data

Find the weights that maximize the equation by numerical optimization.

Compute $A_{j}$ for each weight according to equation.

Update the weight distribution:
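The steps above can be sketched as follows, in the spirit of the diagonal approximation commonly used for Bayesian logistic regression. This is not taken from the post: labels in {0, 1}, plain gradient ascent as the numerical optimizer, and treating $s_{j} = \lambda$ as a precision (inverse variance) are all assumptions, and the class and method names are hypothetical.

```python
import math
import random


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


class BayesianLogisticRegression:
    """Diagonal-Gaussian posterior over weights, updated batch by batch."""

    def __init__(self, dim, lam=1.0):
        self.m = [0.0] * dim  # posterior means, initialized to m_j = 0
        self.q = [lam] * dim  # ASSUMPTION: s_j = lambda is treated as a precision

    def update(self, X, y, n_iters=500):
        """MAP fit by gradient ascent, then a Laplace update of (m, q)."""
        dim = len(self.m)
        w = list(self.m)
        # Step size 1 / L, where L bounds the curvature of the concave objective.
        step = 1.0 / (max(self.q) + 0.25 * sum(xj * xj for x in X for xj in x))
        for _ in range(n_iters):
            # Gradient of the log-prior plus the log-likelihood.
            grad = [-self.q[j] * (w[j] - self.m[j]) for j in range(dim)]
            for x, label in zip(X, y):
                err = label - sigmoid(sum(wj * xj for wj, xj in zip(w, x)))
                for j in range(dim):
                    grad[j] += err * x[j]
            w = [w[j] + step * grad[j] for j in range(dim)]
        # Add the curvature A_j of the negative log-likelihood at the mode.
        for x in X:
            p = sigmoid(sum(wj * xj for wj, xj in zip(w, x)))
            for j in range(dim):
                self.q[j] += x[j] * x[j] * p * (1.0 - p)
        self.m = w

    def sample_weights(self, rng):
        """Thompson draw: w_j ~ N(m_j, 1 / q_j), independently per weight."""
        return [rng.gauss(m, q ** -0.5) for m, q in zip(self.m, self.q)]


rng = random.Random(0)
model = BayesianLogisticRegression(dim=2)
X = [[1.0, 1.0], [1.0, -1.0]] * 20  # toy batch: second feature decides the label
y = [1, 0] * 20
model.update(X, y)
sampled = model.sample_weights(rng)
print(model.m, model.q)
```

To act on a context, one would score each candidate with the sampled weights and pick the highest; each new batch then reuses the current `(m, q)` as its prior.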

Bayesian Factorization Machines

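By the Laplace approximation, the posterior is approximated by a normal distribution.

3. The mean and variance are obtained from the first and second derivatives of ln f(w), respectively: the mean is the point where the first derivative is zero, and the variance is the inverse of the negative second derivative at that point.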

Bayesian Factorization Machines-Thompson Sampling

Initialize the prior on each weight $w_{j}$ with $m_{j} = 0$ , $s_{j} = λ_{w}$ .
Initialize the prior on each $V$ element $v_{j k}$ with $m_{j k} = 0$ , $s_{j k} = λ_{V}$ .

For each new batch of training data

Find the parameters that maximize the equation by numerical optimization.

Compute $A_{j}$ for each weight according to equation

Compute $A_{j k}$ for each element of $V$

Update the weight distribution:
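Once the per-parameter means and spreads are available, the Thompson step for an FM amounts to drawing one set of parameters and scoring candidates with it. The sketch below shows only that step and omits the fitting; it assumes a second-order FM without a bias term, independent Gaussians per parameter, and that $s_{j}$ and $s_{j k}$ are precisions. All function names are hypothetical.

```python
import random


def sample_fm_parameters(w_mean, w_prec, V_mean, V_prec, rng):
    """Draw w_j ~ N(m_j, 1/s_j) and v_jk ~ N(m_jk, 1/s_jk) independently."""
    w = [rng.gauss(m, s ** -0.5) for m, s in zip(w_mean, w_prec)]
    V = [[rng.gauss(m, s ** -0.5) for m, s in zip(m_row, s_row)]
         for m_row, s_row in zip(V_mean, V_prec)]
    return w, V


def fm_score(x, w, V):
    """Second-order FM score (no bias): linear term plus pairwise interactions."""
    score = sum(wj * xj for wj, xj in zip(w, x))
    for f in range(len(V[0])):
        s = sum(V[j][f] * x[j] for j in range(len(x)))
        s_sq = sum((V[j][f] * x[j]) ** 2 for j in range(len(x)))
        score += 0.5 * (s * s - s_sq)
    return score


def choose_candidate(candidates, w_mean, w_prec, V_mean, V_prec, rng):
    """Thompson step: one joint parameter draw, then pick the best-scoring candidate."""
    w, V = sample_fm_parameters(w_mean, w_prec, V_mean, V_prec, rng)
    scores = [fm_score(x, w, V) for x in candidates]
    return max(range(len(candidates)), key=scores.__getitem__)


rng = random.Random(0)
w_mean, w_prec = [0.0, 0.0, 0.0], [1.0, 1.0, 1.0]
V_mean = [[1.0], [1.0], [-1.0]]
V_prec = [[1.0], [1.0], [1.0]]
candidates = [[1.0, 1.0, 0.0], [1.0, 0.0, 1.0]]
print(choose_candidate(candidates, w_mean, w_prec, V_mean, V_prec, rng))
```

With low precision the choice varies from draw to draw, which is the exploration; as the precisions grow with data, the draws concentrate around the means and the choice becomes nearly greedy.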

References:

- https://ufal.mff.cuni.cz/~jurcicek/NPFL108-BI-2014LS/04-approximate-inference-laplace-approximation.pdf

- https://wiseodd.github.io/techblog/2017/01/01/mle-vs-map/

- https://www.linkedin.com/pulse/bayesian-bandits-via-thompson-sampling-dave-golland-phd/

- http://proceedings.mlr.press/v23/agrawal12/agrawal12.pdf

- http://norman3.github.io/prml/docs/chapter03/3.html

- http://tech.adroll.com/blog/data-science/2017/03/06/thompson-sampling-bayesian-factorization-machines.html

- https://papers.nips.cc/paper/4321-an-empirical-evaluation-of-thompson-sampling.pdf
