Explicit optimal strategy under some assumptions
Bertsimas, Dimitris, and Andrew W. Lo. "Optimal control of execution costs." Journal of Financial Markets 1.1 (1998): 1-50. Different price-impact functions are assumed, and the optimal execution scheme is obtained by solving the resulting dynamic program. Depending on the parameters, the optimal strategy is either to reduce the position evenly, to sell everything at the beginning, or to sell everything at the end. Link: /doi/abs/10.1080/14697688.2015.1032543
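As a toy illustration of why the even split shows up as an optimum (a sketch under an assumed linear temporary impact, not the paper's dynamic program): if trading v shares costs an extra eta*v per share, the total impact cost of a schedule is eta * Σ v_k², and by convexity equal slices minimize that sum.

```python
# Toy illustration, not the paper's dynamic program: under an assumed
# linear temporary impact (extra cost eta * v per share when trading v
# shares), the impact cost of a schedule v_1..v_N is eta * sum(v_k^2),
# which convexity says is minimized by the even split.
def impact_cost(schedule, eta=1e-6):
    return eta * sum(v * v for v in schedule)

even = [2500] * 4                 # sell 10,000 shares evenly
front = [7000, 1000, 1000, 1000]  # front-loaded alternative
print(impact_cost(even), impact_cost(front))  # even split costs less
```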
Optimal execution with limit and market orders under fill uncertainty. International Journal of Financial Engineering 4.02n03 (2017): 1750020. Also considers trading with both limit orders and market orders. Link: /content/pdf/10.1007/s11579-016-0162-z.pdf
Books
Cartea Á, Jaimungal S, Penalva J. Algorithmic and High-Frequency Trading [M]. Cambridge University Press, 2015. Builds on trade execution and emphasizes the relevant mathematical tools.
Guéant O. The Financial Mathematics of Market Liquidity: From Optimal Execution to Market Making [M]. CRC Press, 2016. Starting from the Almgren-Chriss model, it covers the corresponding extensions and practical problems. Strongly recommended.
Incorporating estimates of latent market variables
Casgrain P, Jaimungal S. Trading algorithms with learning in latent alpha models [J]. Mathematical Finance, 2019, 29(3): 735-772. Market participants act differently depending on order flow and price trends, so we can use historical data to learn the posterior distribution of prices in each regime, which in turn helps us execute trades (or arbitrage) better. The final result can be viewed as adding a correction term to the Almgren-Chriss model that reflects our expectations about the future.
Kakade S, Kearns M, Mansour Y, et al. Competitive algorithms for VWAP and limit order trading [C]. 2004. From an online-learning perspective, several models are proposed for trading at the VWAP price. Why care about VWAP execution? When a large shareholder needs to reduce a position, to avoid the price moves caused by selling directly, they generally hand the shares to a broker, who then sells them off in pieces; the agreed transaction price is usually the VWAP, so the broker needs to trade as close to VWAP as possible. Link: /science/article/pii/S0378426607003226
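For concreteness, VWAP is simply total traded notional divided by total traded volume; a minimal sketch over a hypothetical trade list:

```python
def vwap(trades):
    """Volume-weighted average price of a list of (price, volume) trades."""
    notional = sum(price * vol for price, vol in trades)
    volume = sum(vol for _, vol in trades)
    return notional / volume

# hypothetical trades: (price, volume)
print(vwap([(10.0, 100), (10.2, 300), (9.9, 100)]))  # 10.1
```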
Time-weighted average price (TWAP) execution
For symmetry we can introduce the other weighted-average benchmark, TWAP, which is relatively simple to target: if market impact is ignored, it is achieved by splitting the order across the time steps and selling evenly.
It can be shown that TWAP trading is optimal in the following two situations: the market price is a Brownian motion and the price impact is constant; or there is no penalty for trading late (in practice, trading late means facing greater risk) but the penalty for any quantity left unfilled at the end is large.
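The even split described above can be sketched as follows (spreading the integer remainder over the earliest steps is an implementation choice I am assuming):

```python
def twap_schedule(total_qty: int, n_steps: int) -> list[int]:
    """Split total_qty shares evenly over n_steps intervals; any
    integer remainder is spread over the earliest steps so the child
    orders sum exactly to total_qty."""
    base, rem = divmod(total_qty, n_steps)
    return [base + (1 if i < rem else 0) for i in range(n_steps)]

print(twap_schedule(10_000, 8))  # eight equal child orders of 1250 shares
```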
II. Reinforcement learning methods
Reinforcement learning based on traditional models
Hendricks D, Wilcox D. A reinforcement learning extension to the Almgren-Chriss framework for optimal trade execution [C]// 2014 IEEE Conference on Computational Intelligence for Financial Engineering & Economics (CIFEr). IEEE, 2014: 457-464. Covered earlier in this column. Link: /sol3/papers.cfm?abstract_id=3374766
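For reference, the Almgren-Chriss closed form that this line of work extends: with risk aversion λ, volatility σ and temporary impact η, the optimal holdings decay as x(t) = X·sinh(κ(T−t))/sinh(κT) with κ = sqrt(λσ²/η). A sketch using the continuous-time approximation (the parameter values below are made up):

```python
import math

def ac_holdings(X, T, n, lam, sigma, eta):
    """Sample the continuous-time Almgren-Chriss optimal holdings
    x(t) = X * sinh(kappa * (T - t)) / sinh(kappa * T),
    kappa = sqrt(lam * sigma**2 / eta), at n + 1 equally spaced times."""
    kappa = math.sqrt(lam * sigma ** 2 / eta)
    return [X * math.sinh(kappa * (T - T * k / n)) / math.sinh(kappa * T)
            for k in range(n + 1)]

# made-up parameters: liquidate a unit position over one period
traj = ac_holdings(X=1.0, T=1.0, n=10, lam=0.1, sigma=0.3, eta=0.01)
# trajectory starts at the full position, liquidates to zero, and
# sells faster early on (the classic risk-averse front-loading)
```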
Ning B, Lin F H T, Jaimungal S. Double deep Q-learning for optimal execution [J]. arXiv preprint arXiv:1812.06600, 2018. A reinforcement-learning solution using DDQN, tested on US equities.
Risk-sensitive compact decision trees for autonomous execution in presence of simulated market response [J]. arXiv preprint arXiv:1906.02312, 2019. ICML-19 workshop. Builds a simulator that reflects the market impact of market orders; tabular Q-learning is used to learn a decision-tree-based model, and feature selection is used to filter the features. Together these give a model that helps decide when to place a market order and when to place a limit order. Link: https://arxiv.org/pdf/1906.02312.pdf
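A minimal tabular Q-learning sketch in the spirit of the above, on a toy execution problem with state (steps left, inventory) and made-up rewards; this is not the paper's simulator or decision-tree model:

```python
import random
from collections import defaultdict

random.seed(0)
Q = defaultdict(float)               # Q[(state, action)] -> value
alpha, gamma, eps = 0.1, 0.95, 0.2   # step size, discount, exploration

def step(state, action):
    """Toy environment: action 1 sells one lot for reward 1, action 0
    waits; inventory left over at the end is penalized quadratically."""
    t, inv = state
    sold = 1 if (action == 1 and inv > 0) else 0
    nxt = (t - 1, inv - sold)
    reward = float(sold)
    done = nxt[0] == 0
    if done:
        reward -= nxt[1] ** 2        # penalty on unfinished inventory
    return nxt, reward, done

for _ in range(3000):                # epsilon-greedy Q-learning episodes
    state, done = (5, 3), False      # 5 steps to sell 3 lots
    while not done:
        if random.random() < eps:
            a = random.randrange(2)
        else:
            a = max((0, 1), key=lambda x: Q[state, x])
        nxt, r, done = step(state, a)
        target = r if done else r + gamma * max(Q[nxt, 0], Q[nxt, 1])
        Q[state, a] += alpha * (target - Q[state, a])
        state = nxt
# the learned values prefer selling the last lot over holding it
```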
Akbarzadeh N, Tekin C, van der Schaar M. Online learning in limit order book trade execution [J]. IEEE Transactions on Signal Processing, 2018, 66(17): 4626-4641. Approaches the problem from an online-learning perspective and analyzes regret with dynamic-programming-style methods. Link: http://repository.bilkent.edu.tr/bitstream/handle/11693/50289/Bilkent-research-paper.pdf?sequence=1
Wei H, Wang Y, Mangu L, et al. Model-based reinforcement learning for predictions and control for limit order books [J]. arXiv preprint arXiv:1910.03743, 2019. An article just covered in this column: it uses a model-based reinforcement learning algorithm to learn a world model directly, then lets the reinforcement learning policy learn by interacting with that world model. Link: https://arxiv.org/pdf/1910.03743.pdf
Karpe M, Fang J, Ma Z, et al. Multi-agent reinforcement learning in a realistic limit order book market simulation [J]. arXiv preprint arXiv:2006.05574, 2020. The multi-agent part appears to be used to generate the actions of other market participants from historical data, while the optimal strategy itself is still learned with the single-agent DDQN method. They open-sourced a simulation environment that supports multiple agents. Link: https://arxiv.org/pdf/2006.05574.pdf
Schnaubelt M. Deep reinforcement learning for the optimal placement of cryptocurrency limit orders [J]. European Journal of Operational Research, 2022, 296(3): 993-1006. Studies how to place limit orders for cryptocurrencies. Comparing PPO with DDQN, the author finds PPO better, and discusses important factors such as the current liquidity cost and queue imbalance. Link: https://www.econstor.eu/bitstream/10419/216206/1/1696077540.pdf
Reinforcement learning+trade execution (theses)
Hu R. Optimal order execution based on reinforcement learning [D]. KTH, Sweden, 2016. Master's thesis. The algorithm is direct dynamic programming on the value function, but it provides a fairly detailed simulation environment and algorithm pseudocode. Link: https://www.diva-portal.org/smash/get/diva2:963057/FULLTEXT01.pdf
Optimal order execution with deep reinforcement learning [D]. HEC Montréal, Canada, 2019. Master's thesis. Uses the TD3 and DDPG algorithms, but the experiments are based on artificially generated data (a skew-normal Brownian motion). Link: https://biblos.hec.ca/biblio/memoires/m2019a628776.pdf
Breiter M. Application of deep reinforcement learning in order execution [D]. Engineering Science, University of Toronto, 2020. Undergraduate thesis. Based on the A3C algorithm, it uses a teacher-student network for transfer learning and takes short-term market impact into account. Link: https://mbreiter.github.io/doc/thesis.pdf
Reinforcement learning+risk preferences
Robust risk-sensitive reinforcement learning agents for trading markets
Deep equal risk pricing of financial derivatives with non-translation invariant risk measures
Reinforcement learning+market-making strategies
Optimal market making based on reinforcement learning
Using multi-agent reinforcement learning to optimize market making
Deep reinforcement learning for market making
Deep recurrent Q-networks for market making
Robust market making via adversarial reinforcement learning
Market making via reinforcement learning
Reinforcement learning+portfolio management
Deep stock trading: a hierarchical reinforcement learning framework for portfolio optimization and order execution
Robo-advising: enhancing investment with inverse optimization and deep reinforcement learning
Large-scale continuous-time mean-variance portfolio allocation via reinforcement learning