Optimization while Learning
In a typical data-driven decision-making approach, the decision maker first estimates a model from data (learning) and then optimizes the decision based on the learned model (optimization). When decisions affect the data-collection process, however, this estimate-and-optimize scheme may fail to find the truly optimal decision, because parts of the model remain unknown. In this situation, the decision maker has to learn the model by trial and error, making experimental decisions in order to collect informative data. In the DO lab, we study such decision-making principles, drawing in particular on techniques developed in the operations research community, such as stochastic programming, simulation, and uncertainty quantification.
Keywords: bandit algorithms, Thompson sampling, reinforcement learning, optimal learning
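As an illustration of learning by trial and error, the following is a minimal sketch of Thompson sampling on a Bernoulli bandit. It is a textbook example rather than the lab's own method, and it rests on assumptions not stated above: rewards are 0 or 1, each arm starts with a uniform Beta(1, 1) prior, and the names `thompson_sampling`, `true_means`, `horizon`, and `seed` are hypothetical.

```python
import random


def thompson_sampling(true_means, horizon, seed=0):
    """Run Thompson sampling on a Bernoulli bandit and return pull counts per arm.

    true_means: success probability of each arm (unknown to the learner,
                used here only to simulate rewards).
    horizon:    number of decisions to make.
    """
    rng = random.Random(seed)
    k = len(true_means)
    successes = [0] * k  # observed rewards of 1 per arm
    failures = [0] * k   # observed rewards of 0 per arm
    pulls = [0] * k

    for _ in range(horizon):
        # Draw one plausible mean per arm from its Beta posterior
        # (Beta(1, 1) prior updated with the observed counts).
        samples = [
            rng.betavariate(1 + successes[i], 1 + failures[i]) for i in range(k)
        ]
        # Act as if the sampled model were true: pick the best sampled arm.
        arm = max(range(k), key=samples.__getitem__)

        # Observe a simulated reward; only the chosen arm yields data.
        reward = 1 if rng.random() < true_means[arm] else 0
        successes[arm] += reward
        failures[arm] += 1 - reward
        pulls[arm] += 1

    return pulls


pulls = thompson_sampling([0.2, 0.5, 0.8], horizon=2000)
print(pulls)
```

Because data arrive only for the arm that is chosen, the sampling step is what drives exploration: an arm with few observations has a wide posterior and is still selected occasionally, while most pulls concentrate on the arm that appears best.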
Risk-sensitive Optimal Control
Decision makers often want to be conservative in the face of uncertainty, optimizing performance in adverse scenarios rather than average performance. The conditional value-at-risk (CVaR) is one such performance measure; it has been widely adopted in financial applications because of its intuitive interpretation and favorable mathematical properties. Unlike in static decision-making problems, however, dynamic optimization of a CVaR objective is not trivial. In the DO lab, we aim to develop a systematic decision-making framework for risk-sensitive optimal control problems, based in particular on a game-theoretic formulation.
Keywords: convex risk measure, conditional value-at-risk, optimal control
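To make the CVaR concrete in the static case, the sketch below estimates it from a sample of losses using the Rockafellar-Uryasev representation, CVaR_alpha(L) = min over t of { t + E[(L - t)^+] / (1 - alpha) }. It assumes that larger values are worse (losses rather than rewards) and that all samples are equally likely. The function name `empirical_cvar` and the example data are hypothetical, and the sketch does not address the dynamic, game-theoretic setting described above.

```python
def empirical_cvar(losses, alpha):
    """Estimate CVaR at level alpha from equally weighted loss samples.

    Uses the Rockafellar-Uryasev form: min_t { t + E[(L - t)^+] / (1 - alpha) }.
    For an empirical distribution the objective is piecewise linear and convex
    in t with kinks at the sample points, so it suffices to search over them.
    """
    if not losses:
        raise ValueError("losses must be non-empty")
    if not 0.0 <= alpha < 1.0:
        raise ValueError("alpha must be in [0, 1)")

    n = len(losses)

    def objective(t):
        # Average excess loss above the threshold t.
        expected_excess = sum(max(x - t, 0.0) for x in losses) / n
        return t + expected_excess / (1.0 - alpha)

    return min(objective(t) for t in losses)


losses = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
mean_loss = sum(losses) / len(losses)
cvar_80 = empirical_cvar(losses, alpha=0.8)
print(mean_loss, cvar_80)
```

With these ten samples, the CVaR at level 0.8 is the average of the worst 20% of outcomes (the losses 9 and 10), so it is approximately 9.5, compared with a mean loss of 5.5. At alpha = 0 the same function reduces to the ordinary mean.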
Business Applications
We are interested in applying and customizing the above methodologies for specific business applications, in particular algorithmic trading and online advertising. Algorithmic trading concerns not only how to predict future prices, but also how to manage portfolios and how to execute transactions in financial markets. Online ad exchanges share similar features with financial markets: they are marketplaces where advertisers and publishers trade ad impressions in real time. In these applications, it is important for decision makers to adapt to changes in market conditions.