Thompson Sampling for CPA model

최영민85 2018. 10. 18. 11:56

In a CPA (cost-per-action) payment model, the expected value of an impression is:
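Here is a sketch of the usual form. It assumes the advertiser pays a fixed amount CPA per conversion, and that a conversion can only follow a click on the impression with features x:

```latex
\mathbb{E}[\text{value} \mid x] = \mathrm{CPA} \times P(\text{click} \mid x) \times P(\text{conversion} \mid \text{click}, x)
```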

Model

The likelihood of the training set is given by:
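Assume the click model is a logistic regression with weight vector W_ck (written w below), feature vectors x_i, and binary click labels y_i in {0, 1} for i = 1, ..., N. The likelihood then takes the standard Bernoulli form:

```latex
P(y_i = 1 \mid x_i, w) = \sigma(w^\top x_i) = \frac{1}{1 + e^{-w^\top x_i}}

P(\mathcal{D} \mid w) = \prod_{i=1}^{N} \sigma(w^\top x_i)^{y_i} \bigl(1 - \sigma(w^\top x_i)\bigr)^{1 - y_i}
```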

Choose a simple prior on W_ck:
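Take, for example, an isotropic Gaussian prior with precision \lambda, an assumed hyperparameter. Combined with the logistic likelihood, it gives the negative log-posterior that the Laplace approximation works with, up to an additive constant:

```latex
w \sim \mathcal{N}(0, \lambda^{-1} I)

-\log P(w \mid \mathcal{D}) = \frac{\lambda}{2} \lVert w \rVert^2 - \sum_{i=1}^{N} \Bigl[ y_i \log \sigma(w^\top x_i) + (1 - y_i) \log\bigl(1 - \sigma(w^\top x_i)\bigr) \Bigr] + \text{const}
```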

We then use the Laplace approximation, which approximates the posterior with a Gaussian distribution.

The mean of this Gaussian is the (local) minimum of the negative log-posterior, i.e. the MAP estimate.

Its covariance is the inverse of the Hessian of the negative log-posterior at that minimum:
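Below is a minimal NumPy sketch of this step for the click model. It assumes a logistic likelihood and a Gaussian prior N(0, I/lam). Newton's method finds the minimum of the negative log-posterior. The covariance is the inverse Hessian there, \Sigma^{-1} = \lambda I + \sum_i p_i (1 - p_i) x_i x_i^\top with p_i = \sigma(\hat{w}^\top x_i). The function name `laplace_logistic` is hypothetical.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def laplace_logistic(X, y, lam=1.0, n_iter=50, tol=1e-8):
    """Laplace approximation N(w_map, cov) to the posterior of a
    logistic-regression weight vector under a N(0, I/lam) prior."""
    n, d = X.shape
    w = np.zeros(d)
    for _ in range(n_iter):
        p = sigmoid(X @ w)
        # Gradient and Hessian of the negative log-posterior
        grad = X.T @ (p - y) + lam * w
        hess = X.T @ (X * (p * (1 - p))[:, None]) + lam * np.eye(d)
        step = np.linalg.solve(hess, grad)  # Newton step
        w -= step
        if np.max(np.abs(step)) < tol:
            break
    # Covariance = inverse Hessian evaluated at the mode
    p = sigmoid(X @ w)
    hess = X.T @ (X * (p * (1 - p))[:, None]) + lam * np.eye(d)
    return w, np.linalg.inv(hess)

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 3))
y = (rng.random(200) < sigmoid(X @ np.array([1.0, -2.0, 0.5]))).astype(float)
w_map, cov = laplace_logistic(X, y, lam=1.0)
```

Because the prior makes the objective strictly convex, Newton's method typically converges in a handful of iterations here.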

However, sampling every weight from this full-covariance posterior can be slow and would increase prediction time, so we need to go further.
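The cost difference is easy to see. Drawing from a full-covariance Gaussian needs a Cholesky factorization of the d×d covariance: O(d^3) once, then O(d^2) per draw. A diagonal Gaussian costs only O(d) per draw. Keeping only per-weight precisions is one common workaround. It appears here as an assumed example, not necessarily the step this post takes. Both function names are hypothetical.

```python
import numpy as np

def sample_full(mean, cov, rng):
    # Full covariance: O(d^3) Cholesky factorization, then O(d^2) per draw
    L = np.linalg.cholesky(cov)
    return mean + L @ rng.standard_normal(mean.shape[0])

def sample_diag(mean, precision_diag, rng):
    # Diagonal approximation: independent coordinates, O(d) per draw
    return mean + rng.standard_normal(mean.shape[0]) / np.sqrt(precision_diag)
```

For the diagonal case, `precision_diag` could be the diagonal of the Hessian from the Laplace step. Note that this ignores correlations between weights.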

Let

The likelihood of the training set is given by:
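The parameter names suggest that W_cv models whether a click eventually converts and W_d models the delay before the conversion. Under that assumption, one standard form is the delayed-feedback model (Chapelle, KDD 2014). It uses a logistic conversion probability p(x) and an exponential delay with rate \mu(x). Here y_i = 1 marks a conversion observed after delay t_i, and y_i = 0 marks a click with no conversion observed yet after elapsed time e_i:

```latex
p(x) = \sigma(W_{cv}^\top x), \qquad \mu(x) = \exp(W_d^\top x)

P(\mathcal{D} \mid W_{cv}, W_d) = \prod_{i:\, y_i = 1} p(x_i)\, \mu(x_i)\, e^{-\mu(x_i) t_i} \prod_{i:\, y_i = 0} \Bigl[ 1 - p(x_i) + p(x_i)\, e^{-\mu(x_i) e_i} \Bigr]
```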

Choose a simple prior on W_cv and W_d:

We then use the Laplace approximation, which approximates the posterior with a Gaussian distribution.

The mean of this Gaussian is the (local) minimum of the negative log-posterior, i.e. the MAP estimate.

Its covariance is the inverse of the Hessian of the negative log-posterior at that minimum:
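The NumPy sketch below computes what the Laplace approximation of (W_cv, W_d) needs. It assumes a delayed-feedback likelihood with a logistic conversion probability and an exponential delay rate, plus a Gaussian prior with precision `lam` on both weight vectors. `times` holds the conversion delay for converted clicks and the elapsed time for the others. Any gradient-based optimizer can find the mode using `grad`; the covariance is then the inverse of `hessian_fd` at that mode. All names are hypothetical. The finite-difference Hessian is a convenience, not an exact derivation.

```python
import numpy as np

def neg_log_posterior(params, X, converted, times, lam=1.0):
    """Negative log-posterior of (w_cv, w_d) under a delayed-feedback model."""
    d = X.shape[1]
    w_cv, w_d = params[:d], params[d:]
    p = 1.0 / (1.0 + np.exp(-X @ w_cv))  # P(eventual conversion | x)
    rate = np.exp(X @ w_d)                # exponential delay rate
    # Converted: log(p * rate * exp(-rate * delay))
    ll_conv = np.log(p) + X @ w_d - rate * times
    # Not converted yet: log(1 - p + p * exp(-rate * elapsed))
    ll_pend = np.log(1.0 - p + p * np.exp(-rate * times))
    ll = np.where(converted, ll_conv, ll_pend).sum()
    return -ll + 0.5 * lam * params @ params

def grad(params, X, converted, times, lam=1.0):
    d = X.shape[1]
    w_cv, w_d = params[:d], params[d:]
    p = 1.0 / (1.0 + np.exp(-X @ w_cv))
    rate = np.exp(X @ w_d)
    q = np.exp(-rate * times)
    g = 1.0 - p + p * q
    # Per-sample derivatives of the log-likelihood w.r.t. the two linear scores
    ds = np.where(converted, 1.0 - p, p * (1.0 - p) * (q - 1.0) / g)
    du = np.where(converted, 1.0 - rate * times, -p * q * rate * times / g)
    return -np.concatenate([X.T @ ds, X.T @ du]) + lam * params

def hessian_fd(params, X, converted, times, lam=1.0, eps=1e-5):
    # Central differences of the analytic gradient, then symmetrized
    k = params.shape[0]
    H = np.empty((k, k))
    for j in range(k):
        e = np.zeros(k)
        e[j] = eps
        H[:, j] = (grad(params + e, X, converted, times, lam)
                   - grad(params - e, X, converted, times, lam)) / (2 * eps)
    return 0.5 * (H + H.T)
```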

However, sampling every weight from this full-covariance posterior can be slow and would increase prediction time, so we need to go further.

Let

1. Sample a set of parameters according to the posterior after seeing the training data:

\theta' \sim P(\theta \mid x_{1:N}, a_{1:N}, r_{1:N})

2. Choose the action that is optimal with respect to the sampled set of parameters:

\hat{a} = \arg\max_{a} \mathbb{E}[r \mid x, a, \theta']
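The sketch below applies these two steps to ad selection under the CPA model. It assumes a diagonal Gaussian posterior over the click-model weights, with mean `w_mean` and per-weight precision `w_prec`. For brevity it also assumes each candidate ad has a known conversion rate. The names `thompson_select`, `cpa_bids` and `cvr` are hypothetical. Each action's value is CPA × P(click) × P(conversion | click), evaluated at the sampled weights.

```python
import numpy as np

def thompson_select(ad_features, cpa_bids, cvr, w_mean, w_prec, rng):
    # Step 1: sample theta' from the (approximate) posterior
    w_sample = w_mean + rng.standard_normal(w_mean.shape[0]) / np.sqrt(w_prec)
    # Step 2: pick the ad with the highest expected value under theta'
    p_click = 1.0 / (1.0 + np.exp(-ad_features @ w_sample))
    values = cpa_bids * p_click * cvr
    return int(np.argmax(values))
```

With a very tight posterior the choice becomes greedy. With a wide one, ads with uncertain click rates win more often, which is the exploration that Thompson sampling provides.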
