A Three-Phase Foundation Model for Tax-Aware Personalized Portfolio Management
arXiv:2606.30997v1. Announce Type: new. Abstract: We present a three-phase deep reinforcement learning system for personalized portfolio management that addresses three limitations shared by all prior financial RL work: 1) ticker lock-in, 2) monolithic objectives, and 3) static user models. Phase 1...
This paper from arXiv introduces a three-phase deep reinforcement learning (RL) system designed to tackle personalized portfolio management, a domain where prior financial AI models have consistently fallen short. The authors identify three critical gaps in existing work: “ticker lock-in” (models trained on a fixed set of assets that cannot adapt to new securities), “monolithic objectives” (using a single reward function that ignores the trade-offs between risk, return, and taxes), and “static user models” (treating all investors as identical). Their proposed system addresses these by decomposing the problem into three sequential phases: first, a market representation phase that learns dynamic embeddings for any ticker; second, a tax-aware policy optimization phase that incorporates capital gains and loss harvesting into the reward signal; and third, a personalization phase that fine-tunes the policy to individual user preferences and constraints.
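To make the second phase more concrete, here is a minimal Python sketch of how realized capital gains and harvested losses might enter a reward signal. It is an illustration, not the paper's formulation: the `Lot` class, the function names, the flat 20% tax rate, and the choice to treat a net realized loss as an immediate tax credit are all assumptions, and real tax rules (holding periods, wash sales, loss carryforwards) are ignored.

```python
from dataclasses import dataclass


@dataclass
class Lot:
    """A hypothetical tax lot: shares bought together at one price."""
    shares: float
    cost_basis: float  # price paid per share


def realized_gain(lot: Lot, shares_sold: float, price: float) -> float:
    """Gain (positive) or loss (negative) realized by selling part of a lot."""
    if shares_sold < 0 or shares_sold > lot.shares:
        raise ValueError("shares_sold must be between 0 and lot.shares")
    return shares_sold * (price - lot.cost_basis)


def tax_aware_reward(pre_tax_pnl: float,
                     realized_gains: list[float],
                     tax_rate: float = 0.20) -> float:
    """Pre-tax profit minus tax on net realized gains.

    Assumption: a net realized loss yields a negative tax (a credit),
    which is how harvested losses raise the reward in this toy model.
    """
    net_realized = sum(realized_gains)
    tax_due = tax_rate * net_realized
    return pre_tax_pnl - tax_due


# Example: sell a winner alone, then a winner together with a loser.
winner = realized_gain(Lot(shares=10, cost_basis=100.0), 5, 120.0)
loser = realized_gain(Lot(shares=10, cost_basis=50.0), 10, 40.0)
reward_without_harvest = tax_aware_reward(50.0, [winner])
reward_with_harvest = tax_aware_reward(50.0, [winner, loser])
```

In this sketch, realizing the loss alongside the gain reduces the tax charged against the reward, which is the basic incentive behind loss harvesting.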
Why this matters
The financial services industry has long struggled to deliver personalized advice at scale. Robo-advisors typically rely on rule-based tax-loss harvesting and static asset allocation models, which are brittle and do not adapt well to changing market conditions or individual tax situations. This research instead treats portfolio management as a sequential decision-making problem that can be optimized end-to-end with RL, while explicitly modeling the tax implications that make real-world investing complex. The ability to handle new tickers without retraining is particularly significant: it means the model could, in principle, be deployed across global markets without being locked into a predefined universe of stocks. For high-net-worth individuals and tax-sensitive investors, even a modest improvement in after-tax returns could translate into substantial real-world value.
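One way to see why avoiding ticker lock-in is possible at all: if an asset is represented by features computed from its own price history rather than by a row in a per-ticker lookup table, any security with a price series maps into the same input space. The sketch below uses three hand-picked statistics as a stand-in for learned embeddings; the function names and feature choices are hypothetical and not taken from the paper.

```python
import math
from statistics import mean, pstdev


def log_returns(prices: list[float]) -> list[float]:
    """Period-over-period log returns of a positive price series."""
    return [math.log(b / a) for a, b in zip(prices, prices[1:])]


def ticker_features(prices: list[float]) -> list[float]:
    """Fixed-length description of an asset, independent of its symbol.

    Returns [mean log return, volatility of log returns, total log return].
    These hand-crafted features are an assumption standing in for the
    learned embeddings a real system would use.
    """
    if len(prices) < 3:
        raise ValueError("need at least three prices")
    returns = log_returns(prices)
    return [mean(returns), pstdev(returns), math.log(prices[-1] / prices[0])]


# A ticker never seen before still yields a vector of the same shape.
known_asset = ticker_features([100.0, 101.0, 99.5, 102.0])
new_asset = ticker_features([20.0, 20.4, 21.0, 20.8, 21.5])
```

Because the representation depends only on observed prices, a downstream policy does not need a new parameter for each added symbol.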
Implications for AI practitioners
For those building applied RL systems, this paper offers a practical blueprint for decomposing a complex, multi-objective problem into manageable phases. The three-phase architecture (representation learning, constrained optimization, and personalization) is a pattern that could generalize beyond finance to domains like energy trading, supply chain management, or personalized healthcare recommendations. The explicit handling of tax-awareness is also a reminder that domain-specific constraints (like tax codes) are not noise to be ignored but structured knowledge that can be encoded into reward functions and state representations. Practitioners should note that the system likely requires high-quality historical market data and realistic tax simulations for training, which may be a barrier for smaller teams. Additionally, the personalization phase suggests a need for user modeling that goes beyond simple risk tolerance questionnaires, likely requiring behavioral data or explicit preference elicitation.
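As a rough illustration of encoding user preferences into a reward function, the sketch below scalarizes three objectives (return, risk, and tax cost) with per-user weights. The `UserPreferences` fields, the linear form, and the example numbers are assumptions made for illustration; the paper's personalization phase may work quite differently.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class UserPreferences:
    """Hypothetical per-user weights on competing objectives."""
    risk_aversion: float    # penalty per unit of risk
    tax_sensitivity: float  # penalty per unit of tax paid


def personalized_reward(period_return: float,
                        risk: float,
                        tax_cost: float,
                        prefs: UserPreferences) -> float:
    """Linear scalarization: return minus weighted risk and tax penalties."""
    if prefs.risk_aversion < 0 or prefs.tax_sensitivity < 0:
        raise ValueError("preference weights must be non-negative")
    return (period_return
            - prefs.risk_aversion * risk
            - prefs.tax_sensitivity * tax_cost)


# The same market outcome scores differently for different users.
balanced = UserPreferences(risk_aversion=1.0, tax_sensitivity=1.0)
cautious = UserPreferences(risk_aversion=5.0, tax_sensitivity=1.0)
reward_balanced = personalized_reward(0.05, 0.02, 0.01, balanced)
reward_cautious = personalized_reward(0.05, 0.02, 0.01, cautious)
```

A fixed linear weighting is the simplest possible choice; it does not capture hard constraints or preferences that change over time, which is where richer user modeling would come in.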
Key Takeaways
- The paper targets three persistent limitations in financial RL (ticker lock-in, monolithic objectives, and static user models) using a modular three-phase architecture.
- Tax-aware portfolio management is a high-value, under-explored application where RL may outperform traditional rule-based methods, especially for affluent investors.
- The three-phase decomposition (representation → optimization → personalization) is a reusable design pattern for complex, constrained RL problems in other industries.
- Practitioners will need access to granular market data and realistic tax simulation environments to replicate or extend this work.