Haseeb Tariq

Haseeb Tariq

Data Scientist
Utrecht, The Netherlands

Challenges

The majority of the consumer base of Coolblue is not very tech-savvy. Therefore, sometimes what happens is — a person comes to the website; starts exploring a product (group) she is interested in; loses interest after wandering randomly from page to page; and eventually halts the search without buying anything.

Solution

We came up with this idea, to train a model — that is able to learn the recurring sequences of page visits from the (more) tech-savvy people on Coolblue. If the model is able to learn those sequences, we can then guide the less informed (about certain products) people more effectively. This would help some people find (and buy) the products they are looking for, more conveniently.

The model is trained on a recurrent neural network (RNN) with the Long short-term memory (LSTM) architecture. The code was implemented in Tensorflow (compiled with GPU support). The data processing, transformation, and encoding were sped up using Dask and h5py.

Brief explanation of the design:

Suppose a customer visits the following (n=) 7 product pages:

customer_id, sequence
1, [A, B, C, D, E, F, G]
This sequence is transformed into:
X(customer_id, sequence), Y
(1, [A]), B
(1, [A, B]), C
(1, [A, B, C]), D
(1, [A, B, C, D]), E
(1, [A, B, C, D, E]), F
(1, [A, B, C, D, E, F]), G
And finally:
X(customer_id, sequence), Y
Observation 1
X,                      Y
[0, 0, 0, 0, 0, 0, 0]   B
[0, 0, 0, 0, 0, 0, 0]
[0, 0, 0, 0, 0, 0, 0]
[A, 0, 0, 0, 0, 0, 0]

Observation 2
X,                      Y
[0, 0, 0, 0, 0, 0, 0]   C
[0, 0, 0, 0, 0, 0, 0]
[0, 0, 0, 0, 0, 0, 0]
[A, B, 0, 0, 0, 0, 0]

…

Observation 6 (n-1)
X,                       Y
[0, 0, 0, 0, 0, 0, 0]    G
[A, B, C, D, 0, 0, 0]
[B, C, D, E, 0, 0, 0]
[C, D, E, F, 0, 0, 0]
Note that:
  • A, B, C, … are represented as 1's, and match the specific position of the alphabet in the 1-hot encoded vector
  • Within a vector we have no sense of time
  • The sense of time is given from the 4 different windows depicting four different time steps (window length is set to 4)