Personalization Engine Design
This document defines the design principles, data flow, and Redis-based caching strategy for the Newsfork news personalization engine.
1. Overview
The personalization engine collects user click and dwell-time signals, estimates each user's interest vector from them, and adjusts news feed rankings accordingly.
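The following rough illustration shows the final step. It re-ranks a feed by blending each article's existing score with the article's affinity to the user's interest vector. The design does not specify a ranking formula. The `Article` type, `base_score`, the feature-key format, and the blend factor `ALPHA` are all hypothetical.

```python
from dataclasses import dataclass

ALPHA = 0.5  # hypothetical blend between base relevance and personal interest


@dataclass
class Article:
    article_id: str
    base_score: float    # assumed unpersonalized score (e.g. recency or popularity)
    features: list[str]  # category/tag/entity keys, same keys as the interest vector


def personalize(articles: list[Article], interests: dict[str, float]) -> list[Article]:
    """Re-rank articles by blending the base score with interest affinity."""

    def score(a: Article) -> float:
        # Affinity is the sum of the user's weights for the article's features.
        affinity = sum(interests.get(f, 0.0) for f in a.features)
        return (1 - ALPHA) * a.base_score + ALPHA * affinity

    # sorted() is stable even with reverse=True, so ties keep their feed order.
    return sorted(articles, key=score, reverse=True)
```

With an empty interest vector the ranking reduces to the base order. This keeps the fallback behavior of a cache miss well-defined.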
2. Interest Extraction Logic
2.1 Signal Definitions
| Signal | Description | Effect on Weights |
|---|---|---|
| Click | Whether the article was clicked | A click increases the weights of the article's topics and entities |
| Dwell time | Time spent on the article page (seconds) | Counts as a positive signal when it exceeds the threshold |
2.2 Weight Update Rules
- Click: Increases the weights of the article's categories, tags, and entities by a fixed percentage.
- Dwell time: If the dwell time exceeds the configured threshold (e.g., 30 seconds), the weights are increased in the same way as for a click. Short dwells are either ignored or cause the relevant weights to decay.
- Decay: Interest weights are periodically reduced over time, so recent actions carry more influence than older ones.
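Here is a minimal sketch of these rules. It assumes that the interest vector maps feature keys (category, tag, or entity) to weights in [0, 1]. Two interpretations are assumptions:

- "A fixed percentage" is modeled as moving the weight toward 1.0 by `CLICK_RATE` of the remaining gap, so a zero weight can still grow.
- Decay is exponential, with a hypothetical one-week half-life.

Short dwells take the "ignore" option. All constant names are illustrative.

```python
from dataclasses import dataclass, field

# Hypothetical tuning parameters; the design does not fix these values.
CLICK_RATE = 0.10               # fraction of the remaining headroom added per click
DWELL_THRESHOLD_S = 30.0        # example threshold from the design text
HALF_LIFE_S = 7 * 24 * 3600.0   # assumed decay half-life (one week)


@dataclass
class InterestVector:
    # Feature key (e.g. "category:sports") -> weight in [0, 1].
    weights: dict[str, float] = field(default_factory=dict)

    def _boost(self, features: list[str]) -> None:
        for f in features:
            w = self.weights.get(f, 0.0)
            # Move the weight toward 1.0 by a fixed percentage of the remaining gap.
            self.weights[f] = w + CLICK_RATE * (1.0 - w)

    def on_click(self, features: list[str]) -> None:
        self._boost(features)

    def on_dwell(self, features: list[str], seconds: float) -> None:
        # Dwells above the threshold count like a click; short dwells are ignored here.
        if seconds > DWELL_THRESHOLD_S:
            self._boost(features)

    def decay(self, elapsed_s: float) -> None:
        # Exponential time decay: every weight halves once per HALF_LIFE_S.
        factor = 0.5 ** (elapsed_s / HALF_LIFE_S)
        self.weights = {f: w * factor for f, w in self.weights.items()}
```

The headroom form keeps weights bounded without clipping, and repeated clicks on the same topic show diminishing returns.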
3. Redis Cache Strategy
3.1 Storage Structure
- Key pattern: `user:{userId}:interests`, holding each user's interest vector (a hash or a sorted set).
- TTL: Set a TTL on inactive users' keys to limit memory usage.
- Write: Apply incremental updates to Redis as events (clicks and dwells) arrive; periodically aggregate them in batches and synchronize with persistent storage.
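The write path can be sketched with redis-py-style hash commands. Each event increments per-feature fields with `HINCRBYFLOAT` and then refreshes the key's TTL, so only inactive users' keys expire.

To keep the example runnable without a server, `FakeRedis` is a hypothetical in-memory stand-in. It mimics `hincrbyfloat`, `expire`, and `hgetall` on a client created with `decode_responses=True`. The additive `delta` and the 30-day TTL are assumptions, and batch synchronization with persistent storage is omitted.

```python
INTEREST_TTL_S = 30 * 24 * 3600  # hypothetical: expire after 30 days without events


def interest_key(user_id: str) -> str:
    # Key pattern from the design: user:{userId}:interests
    return f"user:{user_id}:interests"


class FakeRedis:
    """Hypothetical in-memory stand-in for a redis-py client (decode_responses=True).

    Only the commands used below are modeled; TTLs are recorded, not enforced.
    """

    def __init__(self) -> None:
        self.hashes: dict[str, dict[str, str]] = {}
        self.ttls: dict[str, int] = {}

    def hincrbyfloat(self, name: str, key: str, amount: float = 1.0) -> float:
        h = self.hashes.setdefault(name, {})
        value = float(h.get(key, "0")) + amount
        h[key] = repr(value)  # Redis stores hash values as strings
        return value

    def expire(self, name: str, time: int) -> bool:
        if name not in self.hashes:
            return False
        self.ttls[name] = time
        return True

    def hgetall(self, name: str) -> dict[str, str]:
        return dict(self.hashes.get(name, {}))


def record_event(client, user_id: str, features: list[str], delta: float) -> None:
    """Apply one click/dwell event as incremental updates, then refresh the TTL."""
    key = interest_key(user_id)
    for feature in features:
        client.hincrbyfloat(key, feature, delta)
    # Each write pushes the expiry forward, so only inactive users' keys age out.
    client.expire(key, INTEREST_TTL_S)
```

Against a real server, you can pass a `redis.Redis(decode_responses=True)` client in place of `FakeRedis`. Issuing the commands through `client.pipeline()` would then send each event's updates in one round trip. A percentage-based rule, unlike this additive one, needs a read-modify-write. It could run as a Lua script (`EVAL`) to stay atomic.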
3.2 Read Strategy
- On each personalized feed request, read the user's interest vector from Redis.
- On a cache miss, fall back to a default profile or an empty vector, and rebuild the user's vector asynchronously from the event history.
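The read path, including the fallback, might look like the sketch below. `load_interests`, `schedule_rebuild`, and `DEFAULT_PROFILE` are hypothetical names. The `hgetall` parameter can be any callable with the semantics of Redis `HGETALL` with decoded responses, which returns an empty mapping for a missing key.

```python
from typing import Callable, Mapping

# Hypothetical default profile served on a cache miss (empty vector by default).
DEFAULT_PROFILE: dict[str, float] = {}


def load_interests(
    hgetall: Callable[[str], Mapping[str, str]],
    schedule_rebuild: Callable[[str], None],
    user_id: str,
    default_profile: Mapping[str, float] = DEFAULT_PROFILE,
) -> dict[str, float]:
    """Read the interest vector for a feed request, falling back on a cache miss."""
    raw = hgetall(f"user:{user_id}:interests")
    if raw:
        # Hash values come back as strings; convert them to float weights.
        return {feature: float(weight) for feature, weight in raw.items()}
    # Cache miss: answer immediately with the fallback and rebuild in the background
    # from event history, so the feed request never waits on the recovery job.
    schedule_rebuild(user_id)
    return dict(default_profile)
```

In practice, `schedule_rebuild` would enqueue a job rather than do the work inline. That keeps feed latency independent of history size.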
4. Overall Data Flow
The Mermaid diagram below illustrates the data flow of the personalization engine.
5. Summary
- Input: Click events, dwell time.
- Processing: Weight-based interest extraction, time decay.
- Cache: Redis per-user interest vector, TTL, and fallback policy.
- Output: Personalized news feed ranking.
This design is a draft; during implementation, its change history will be tracked starting from version 1.0.0.