How ALS Collaborative Filtering Powers Real-Time Recommendations
A technical walkthrough of Alternating Least Squares matrix factorisation for implicit feedback, and how NeuronSearchLab applies it in production.
Most recommendation systems are not recommendation systems. They are popularity lists dressed up with a user identifier. Show everyone "trending items", personalise the title, call it ML.
Real personalisation means learning what each user actually prefers — not what everyone else clicked on. Collaborative filtering with Alternating Least Squares (ALS) is one of the most principled ways to do that at scale with implicit feedback. This post explains how it works, the maths behind it, and how we implement it at NeuronSearchLab.
The Problem: You Have Signals, Not Ratings
Most platforms collect implicit feedback: views, clicks, purchases. Users do not tell you they like something — they show you by interacting with it. This is fundamentally different from a 5-star rating, and it requires a different approach.
Implicit signals have two properties that matter:
- No negatives. A user who hasn't clicked an item might hate it, or might simply not have seen it. You cannot treat absence as negative feedback.
- Confidence varies. A purchase signals stronger preference than a single view. Repeated views signal something too.
The seminal paper by Hu, Koren and Volinsky (2008) — "Collaborative Filtering for Implicit Feedback Datasets" — formalises exactly this. NeuronSearchLab's baseline model follows that framework directly.
Matrix Factorisation: The Core Idea
You have a matrix R of shape (n_users, n_items) where each entry r_ui is the number of times user u interacted with item i. Most entries are zero.
The goal is to decompose this sparse matrix into two dense matrices:
- X — user factors, shape (n_users, factors)
- Y — item factors, shape (n_items, factors)

such that X @ Y.T ≈ R.
Each user and item is represented as a vector in the same latent space. Users who interacted with similar items end up with similar vectors. Recommendation is then fast: compute scores = Y @ x_u and return the top-N.
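To make that concrete, here is a toy sketch of the scoring step with random factors (the sizes and values are illustrative only, not the production configuration):

```python
import numpy as np

rng = np.random.default_rng(0)
n_users, n_items, factors = 4, 6, 3  # toy sizes for illustration

X = rng.normal(size=(n_users, factors))  # user factors
Y = rng.normal(size=(n_items, factors))  # item factors

# Scoring one user is a single matrix-vector product
scores = Y @ X[0]  # shape (n_items,)
top_n = np.argsort(scores)[::-1][:3]  # indices of the 3 best-scoring items
```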
From Counts to Confidence and Preference
The key insight in the HKV framework is how to handle implicit data. For each (u, i) pair:
Preference p_ui is binary:
p_ui = 1 if r_ui > 0
p_ui = 0 if r_ui = 0
Confidence c_ui encodes how sure we are:
c_ui = 1 + α × r_ui
The α hyperparameter scales raw interaction count into confidence. In our implementation, α = 40.0 by default. A single view gives c = 41; ten views gives c = 401. Items the user has never seen have c = 1 (low confidence, not zero — absence is not proof of dislike).
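In NumPy the mapping is one line each; a small illustrative example with made-up counts:

```python
import numpy as np

alpha = 40.0  # default confidence scaling
r_u = np.array([0.0, 1.0, 10.0, 3.0])  # raw interaction counts for one user

p_u = (r_u > 0).astype(float)  # binary preference
c_u = 1.0 + alpha * r_u  # confidence: 1 for unseen, 41 for one view, 401 for ten
```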
The model then minimises:
L = Σ_{u,i} c_ui × (p_ui − x_u · y_i)² + λ(Σ_u ||x_u||² + Σ_i ||y_i||²)
The first term fits preferences weighted by confidence. The second is L2 regularisation (λ = 0.01) to prevent overfitting.
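As a sanity check, the objective can be evaluated directly on a small dense matrix. This is a sketch for illustration only; production code never materialises the full confidence matrix:

```python
import numpy as np

def implicit_als_loss(X, Y, R, alpha=40.0, lam=0.01):
    """Evaluate the implicit-feedback objective on a small dense count matrix."""
    P = (R > 0).astype(float)  # binary preferences p_ui
    C = 1.0 + alpha * R  # confidences c_ui
    err = P - X @ Y.T  # per-entry prediction error
    fit = np.sum(C * err ** 2)  # confidence-weighted squared error
    reg = lam * (np.sum(X ** 2) + np.sum(Y ** 2))  # L2 penalty on both factor sets
    return fit + reg
```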
Why "Alternating" Least Squares
The joint objective is not convex, so you cannot optimise X and Y simultaneously. But if you fix Y and optimise X, each user update becomes an independent least-squares problem with a closed-form solution — and vice versa for items.
The update rule for a single user u:
x_u = (Y^T C^u Y + λI)^{-1} × Y^T C^u p_u
Where C^u = diag(c_u1, ..., c_uI) — a diagonal confidence matrix for that user.
In practice, Y^T C^u Y is expensive to compute naively (it's (factors × n_items) × (n_items × factors)). The trick is:
Y^T C^u Y = Y^T Y + Y^T (C^u − I) Y
Y^T Y is precomputed once per ALS step. C^u − I is sparse (only non-zero for items the user interacted with), so the second term only touches the small set of items with interactions.
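The identity is easy to verify numerically. A sketch with made-up sizes and a toy confidence vector in which only two items were interacted with:

```python
import numpy as np

rng = np.random.default_rng(1)
n_items, factors = 8, 4
Y = rng.normal(size=(n_items, factors))

c_u = np.ones(n_items)  # confidence 1 for unseen items
c_u[[1, 4]] = 41.0  # two interacted items (one view each at alpha = 40)

YtY = Y.T @ Y  # precomputed once per ALS step
full = Y.T @ np.diag(c_u) @ Y  # naive form, touches all n_items rows

idx = np.flatnonzero(c_u > 1)  # sparse part: interacted items only
sparse = YtY + Y[idx].T @ (Y[idx] * (c_u[idx] - 1.0)[:, None])
# full and sparse agree to floating-point precision
```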
Here is the inner loop from our implementation:
YtY = Y.T @ Y  # precomputed once per ALS step
eye = np.eye(factors)
for u in range(n_users):
    start, end = confidence.indptr[u], confidence.indptr[u + 1]  # CSR row bounds
    item_ids = confidence.indices[start:end]  # sparse: only interacted items
    c_u = confidence.data[start:end]  # confidence values ≥ 1
    Y_u = Y[item_ids]  # (nnz, factors)
    delta_c = c_u - 1.0  # (c_ui - 1), sparse part
    A = YtY + Y_u.T @ (Y_u * delta_c[:, None]) + regularization * eye
    b = Y_u.T @ c_u  # Y^T C^u p_u (p_ui = 1 for seen items)
    X[u] = np.linalg.solve(A, b)
Forming A costs O(k² × nnz_u) per user, where nnz_u is the number of items that user interacted with — typically a small number even in large catalogues. The k × k solve adds O(k³), which is negligible for modest factor counts.
We alternate between updating all users (fixing items) and updating all items (fixing users) for a fixed number of rounds set by the iterations parameter (default: 15). That's it.
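Putting the pieces together, here is a tiny dense reference implementation of the alternation. It is illustration only: it materialises a full confidence matrix per row, which the sparse inner loop deliberately avoids, and the helper name als_implicit_dense is ours for this sketch:

```python
import numpy as np

def als_implicit_dense(R, factors=3, iterations=15, alpha=40.0, lam=0.01, seed=0):
    """Reference ALS for implicit feedback on a small dense count matrix R."""
    n_users, n_items = R.shape
    rng = np.random.default_rng(seed)
    X = rng.normal(scale=0.1, size=(n_users, factors))
    Y = rng.normal(scale=0.1, size=(n_items, factors))
    P = (R > 0).astype(float)  # binary preferences
    C = 1.0 + alpha * R  # confidences
    reg = lam * np.eye(factors)

    for _ in range(iterations):
        # Fix Y, solve each user's closed-form least-squares problem
        for u in range(n_users):
            Cu = np.diag(C[u])
            X[u] = np.linalg.solve(Y.T @ Cu @ Y + reg, Y.T @ Cu @ P[u])
        # Fix X, solve each item's problem symmetrically
        for i in range(n_items):
            Ci = np.diag(C[:, i])
            Y[i] = np.linalg.solve(X.T @ Ci @ X + reg, X.T @ Ci @ P[:, i])
    return X, Y
```

On a toy matrix, the learned scores X @ Y.T move toward 1 for observed pairs, which is exactly what the confidence weighting intends.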
Recommendation at Inference
Once trained, serving a recommendation is a single dot product:
scores = item_factors @ user_factors[user_idx] # (n_items,)
We mask out items the user has already interacted with and return the top-N by score. For a catalogue of 100k items and 64 latent factors, this is a few hundred microseconds.
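A minimal sketch of the serving step, including the mask (the recommend helper and its signature are ours for illustration, not the production API):

```python
import numpy as np

def recommend(user_idx, user_factors, item_factors, seen, n=10):
    """Score all items for one user, mask seen items, return top-N indices."""
    scores = item_factors @ user_factors[user_idx]  # (n_items,)
    scores[list(seen)] = -np.inf  # never re-recommend seen items
    kth = min(n, len(scores) - 1)
    top = np.argpartition(-scores, kth)[:n]  # top-N candidates, unordered
    return top[np.argsort(-scores[top])]  # sorted best-first
```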
Our REST API exposes this as:
GET /recommendations/{user_id}?n=10
Response:
{
  "user_id": 42,
  "recommendations": [
    {"item_id": 1337, "score": 0.94},
    {"item_id": 892, "score": 0.87},
    ...
  ],
  "model": "als_baseline",
  "source": "als"
}
For new users not in the training data, we fall back to popularity — but that's a separate story (see the cold-start post in this series).
How We Train and Deploy
The pipeline looks like this:
- Ingest events — the ingestion service collects UserEvent objects (view, click, purchase) and stores them in a normalised interaction matrix.
- Train — run ALSModel.fit(interactions) on the full or recent interaction history.
- Evaluate offline — compute NDCG@10, Hit Rate@10, and Coverage@10 on a held-out 20% test split before promoting.
- Promote — swap the model pickle at models/als_baseline.pkl; the service loads it on startup.
- Serve — FastAPI handles single-user and batch requests with the in-memory model.
Training is fast enough to retrain daily on modest hardware. With 64 factors and 15 iterations, a matrix with 500k users and 50k items trains in a few minutes on CPU.
What the Model Learns (and What It Doesn't)
ALS learns latent structure from co-occurrence. It discovers that users who bought hiking boots also tend to look at trekking poles, without anyone ever defining that relationship. This generalises well to long-tail items that are underrepresented in handcrafted rules.
What it does not learn:
- Item content — it knows nothing about what an item actually is. Two items with identical descriptions but different interaction patterns get different vectors.
- Temporal patterns — the baseline model treats all historical interactions equally. Recent interactions and seasonal trends require additional handling.
- New users and items — the cold-start problem. No interactions = no personalisation (yet).
These are known limitations, and they drive our roadmap. The baseline gives you strong collaborative signal cheaply. Layer content features and online updates on top, and you get something significantly more powerful.
Key Parameters
Key defaults and impact:
- factors (default 64): latent vector dimensionality. Higher is more expressive but slower.
- iterations (default 15): ALS rounds. Diminishing returns typically appear after ~20.
- alpha (default 40.0): confidence scaling for implicit feedback. Higher gives more weight to observed items.
- regularization (default 0.01): L2 penalty. Increase when the model overfits popular items.
Tune alpha first — it has the most impact on recommendation quality for implicit data. factors and iterations are secondary.
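These defaults map naturally onto a small config object. The ALSConfig class below is a hypothetical sketch, not the actual constructor signature in our codebase:

```python
from dataclasses import dataclass

@dataclass
class ALSConfig:
    factors: int = 64  # latent dimensionality
    iterations: int = 15  # ALS rounds
    alpha: float = 40.0  # implicit-feedback confidence scaling
    regularization: float = 0.01  # L2 penalty

# Tuning alpha first, per the guidance above
cfg = ALSConfig(alpha=80.0)
```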
Try It
The recommendation API is available to test today. Boot the service:
docker compose up recommender
Get recommendations for a user:
curl 'http://localhost:8000/recommendations/42?n=5'
Batch mode (useful for pre-computation or real-time feed generation):
curl -X POST http://localhost:8000/recommendations/batch \
-H "Content-Type: application/json" \
-d '{"user_ids": [1, 2, 3, 42], "n": 10}'
The next post in this series covers how we handle cold-start — what happens when a user has no history and how we bridge the gap between "new user" and "personalised experience".