It appears that some details are missing from the methodology.

Nov 15, 2019

What are the preprocessing methods for the text? Is it just tokenization, with no cleaning or normalization?
When applying KPCA to OOV words, where do these words come from? Are you splitting the data into train/test/validation sets, or are you taking them from another dataset entirely?
When you apply w2v to the data, is it only on your test set or on some other combination of embeddings? This needs to be defined clearly.
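
To make the OOV question concrete, here is a minimal sketch of one possible definition: a word is OOV if it appears in the test split but never in the training split. The whitespace tokenizer, the `find_oov` helper, and the toy sentences are all assumptions for illustration, not the author's actual pipeline.

```python
def tokenize(text):
    # Assumed preprocessing: lowercase and split on whitespace only,
    # with no further cleaning or normalization.
    return text.lower().split()


def find_oov(train_texts, test_texts):
    # Hypothetical helper: the vocabulary is built from the training split only.
    vocab = {tok for text in train_texts for tok in tokenize(text)}
    # OOV words are test tokens that never occur in the training split.
    test_tokens = {tok for text in test_texts for tok in tokenize(text)}
    return sorted(test_tokens - vocab)


# Toy data, invented for this example.
train = ["the cat sat", "the dog ran"]
test = ["the cat barked", "a dog ran"]
oov = find_oov(train, test)
print(oov)
```

If the OOV words instead come from a different dataset, or the vocabulary is that of a pretrained embedding rather than the training split, the resulting set would differ, which is why the definition matters.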

Written by Dr. Ori Cohen