Towards minimax policies for online linear optimization with bandit feedback