From Clicks to Intent: Cross-Platform Session Embeddings with LLM-Distilled Taxonomy for Financial Services Recommendations
arXiv:2606.26277v1 Announce Type: cross Abstract: Sequential user behavior modeling is widely adopted in industrial recommender systems; however, significant gaps remain in financial services, where pre-login web interactions and authenticated in-app experiences differ drastically. Specifically,...
What Happened
A new arXiv paper (2606.26277v1) tackles a persistent blind spot in recommender systems: the chasm between anonymous web browsing and authenticated in-app behavior in financial services. The authors propose a cross-platform session embedding framework that uses an LLM-distilled taxonomy to bridge these two distinct behavioral contexts. Specifically, they address the problem that pre-login clicks (e.g., browsing interest rates, comparing loan products) and post-login actions (e.g., initiating transfers, checking balances) occur in separate data silos with different user identifiers and interaction patterns. Their solution involves distilling a hierarchical taxonomy from large language models to create unified session embeddings that capture user intent across platforms, enabling more coherent recommendations that follow users from discovery to conversion.
Why It Matters
Financial services recommendations have long lagged behind e-commerce or media recommendations for structural reasons. User sessions are fragmented—a prospect might research mortgage rates on a public website, then later log into a mobile app to apply. Traditional session-based models treat these as disconnected events, losing the critical intent signal. This paper’s approach matters for three reasons:
First, it directly addresses the cold-start problem in financial products. New visitors have no authenticated history, yet their pre-login browsing contains rich intent signals (e.g., spending 10 minutes on "retirement planning" pages). By embedding these sessions into a taxonomy distilled from LLMs, the system can infer product affinity without requiring login.
Second, the LLM-distilled taxonomy is the most distinctive component. Rather than relying on manually curated product categories (which become stale), the authors use LLMs to generate and refine a hierarchical taxonomy from behavioral data. This reduces maintenance overhead and lets the taxonomy adapt to emerging financial products (e.g., crypto savings accounts, BNPL services).
Third, the cross-platform unification has direct business implications. Financial institutions see substantial drop-off between browsing and application, on the order of 60-80% as a rough, unsourced estimate. Better recommendations that bridge this gap could materially improve conversion rates, especially for high-value products like mortgages or investment accounts.
Implications for AI Practitioners
For engineers building recommender systems in regulated industries, this work offers a practical blueprint. The key technical insight is that session embeddings can be made platform-agnostic by aligning them through a shared taxonomy, rather than requiring user identity reconciliation. This can sidestep many of the privacy and compliance issues around cross-device tracking.
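To make the alignment idea concrete, here is a minimal sketch, not the paper's implementation. It assumes a tiny hand-written two-level taxonomy and hypothetical event names for the web and app platforms (`TAXONOMY_PARENT`, `WEB_EVENT_TO_NODE`, `APP_EVENT_TO_NODE`, and `parent_weight` are all illustrative). Each session becomes a normalized vector over taxonomy nodes, so sessions from different platforms can be compared without any shared user identifier.

```python
import math
from collections import Counter

# Hypothetical two-level taxonomy: leaf intent -> parent intent.
TAXONOMY_PARENT = {
    "mortgage.rates": "mortgage",
    "mortgage.application": "mortgage",
    "savings.rates": "savings",
    "payments.transfer": "payments",
}

# Hypothetical platform-specific event names mapped to shared taxonomy leaves.
WEB_EVENT_TO_NODE = {
    "/rates/mortgage": "mortgage.rates",
    "/compare/home-loans": "mortgage.rates",
    "/rates/savings": "savings.rates",
}
APP_EVENT_TO_NODE = {
    "start_mortgage_application": "mortgage.application",
    "initiate_transfer": "payments.transfer",
}

# Fixed dimension order shared by every platform: all leaves and all parents.
DIMENSIONS = sorted(set(TAXONOMY_PARENT) | set(TAXONOMY_PARENT.values()))


def embed_session(events, event_to_node, parent_weight=0.5):
    """Return a unit-length vector of taxonomy-node weights for one session."""
    counts = Counter()
    for event in events:
        node = event_to_node.get(event)
        if node is None:
            continue  # events outside the taxonomy are ignored
        counts[node] += 1.0
        # Credit the parent node so sibling intents still overlap.
        counts[TAXONOMY_PARENT[node]] += parent_weight
    vector = [counts[d] for d in DIMENSIONS]
    norm = math.sqrt(sum(v * v for v in vector))
    return [v / norm for v in vector] if norm else vector


def cosine(a, b):
    """Cosine similarity for vectors that are already unit length (or zero)."""
    return sum(x * y for x, y in zip(a, b))


web = embed_session(["/rates/mortgage", "/compare/home-loans"], WEB_EVENT_TO_NODE)
app_mortgage = embed_session(["start_mortgage_application"], APP_EVENT_TO_NODE)
app_transfer = embed_session(["initiate_transfer"], APP_EVENT_TO_NODE)
print(cosine(web, app_mortgage) > cosine(web, app_transfer))
```

In this toy setup the anonymous mortgage-browsing session is closer to the in-app mortgage application than to a transfer, purely because both map to the shared `mortgage` parent node. A production system would presumably use learned embeddings rather than counts.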
Practitioners should note the distillation process: the authors use LLMs to generate candidate taxonomy nodes from session sequences, then refine them through human feedback or automated validation. This hybrid approach balances the breadth of LLM knowledge with domain-specific accuracy, a pattern increasingly relevant for financial AI.
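The generate-then-validate loop can be sketched as follows. This is an illustrative outline under stated assumptions, not the authors' pipeline: `propose` stands in for an LLM call, `keyword_proposer` is a hypothetical stub so the example runs offline, and the automated validation here is a simple support threshold (`min_support`), which is only one of many possible checks.

```python
from collections import Counter
from typing import Callable, List, Sequence


def distill_taxonomy_nodes(
    sessions: Sequence[Sequence[str]],
    propose: Callable[[Sequence[str]], List[str]],
    min_support: int = 2,
) -> List[str]:
    """Collect candidate labels per session and keep those seen often enough."""
    support = Counter()
    for session in sessions:
        # Normalize labels and count each one at most once per session.
        labels = {label.strip().lower() for label in propose(session) if label.strip()}
        support.update(labels)
    return sorted(label for label, n in support.items() if n >= min_support)


def keyword_proposer(session: Sequence[str]) -> List[str]:
    """Hypothetical stand-in for an LLM that labels a session's intent."""
    joined = " ".join(session)
    labels = []
    if "mortgage" in joined:
        labels.append("Mortgage research")
    if "retire" in joined:
        labels.append("Retirement planning")
    if "crypto" in joined:
        labels.append("Crypto savings")
    return labels


sessions = [
    ["/rates/mortgage", "/compare/mortgage"],
    ["/mortgage/calculator"],
    ["/retirement/planning"],
    ["/crypto/savings", "/retirement/ira"],
]
nodes = distill_taxonomy_nodes(sessions, keyword_proposer, min_support=2)
print(nodes)
```

Candidates that appear in only one session (here, the crypto label) are dropped by the threshold; in practice these are the cases where human review would likely be most useful.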
However, the paper likely assumes access to substantial compute for LLM inference and embedding generation. Teams with limited resources may need to explore smaller distilled models or pre-computed taxonomies. Additionally, the approach requires careful handling of session boundaries: a financial "session" can span days for complex products, so temporal segmentation is nontrivial.
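One common way to handle session boundaries is an inactivity-gap rule, sketched below. The paper's segmentation method is not described here, so this is an assumption for illustration; `split_sessions` and the specific gap values are hypothetical. The point is that the same event stream yields different sessions depending on the gap chosen, which is why the threshold may need to vary by product.

```python
from datetime import datetime, timedelta


def split_sessions(events, max_gap):
    """Group (timestamp, name) events into sessions split on inactivity gaps."""
    sessions = []
    previous = None
    for timestamp, name in sorted(events, key=lambda e: e[0]):
        # Start a new session on the first event or after a long pause.
        if previous is None or timestamp - previous > max_gap:
            sessions.append([])
        sessions[-1].append((timestamp, name))
        previous = timestamp
    return sessions


events = [
    (datetime(2024, 1, 1, 10, 0), "view_mortgage_rates"),
    (datetime(2024, 1, 1, 10, 20), "open_mortgage_calculator"),
    (datetime(2024, 1, 3, 9, 0), "start_mortgage_application"),
]

short_gap = split_sessions(events, max_gap=timedelta(minutes=30))
long_gap = split_sessions(events, max_gap=timedelta(days=3))
print(len(short_gap), len(long_gap))
```

With a 30-minute gap the application two days later becomes its own session and loses the research context; with a multi-day gap all three events stay together.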
Key Takeaways
- Cross-platform session unification using LLM-distilled taxonomies can bridge the gap between anonymous browsing and authenticated behavior, addressing a longstanding problem in financial recommendations.
- The LLM-distilled taxonomy approach reduces manual curation overhead while adapting to evolving financial product landscapes, though it likely requires significant compute resources.
- Privacy is easier to preserve with this design, since it aligns sessions through a behavioral taxonomy rather than user identity matching, which is critical in regulated financial environments.
- Practical adoption depends on session boundary handling and compute budgeting, making this most immediately relevant for large financial institutions with existing ML infrastructure.