Nov 15, 2019
It appears that some details are missing from the description of the approach.
- What preprocessing is applied to the text? Is it just tokenization, with no cleaning or normalization?
- When applying KPCA to OOV words, where do these words come from? Are you splitting the data into train/validation/test sets, or are the OOV words drawn from another dataset entirely?
- When you apply w2v to the data, is it applied only to your test set, or is some other combination of embeddings used? This needs to be defined clearly.
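
To make the OOV question concrete, here is a minimal sketch of the kind of definition I am asking for. It is not the article's method. The `preprocess` and `oov_words` helpers and the toy sentences are hypothetical, and the preprocessing shown (lowercasing plus whitespace tokenization) is only an assumption about what "just tokenization" might mean. The point is that the OOV set depends entirely on which data builds the vocabulary and which data is evaluated.

```python
def preprocess(text):
    """Hypothetical preprocessing: lowercase, then split on whitespace."""
    return text.lower().split()


def vocabulary(sentences):
    """Collect the set of distinct tokens in a list of tokenized sentences."""
    return {token for sentence in sentences for token in sentence}


def oov_words(train_sentences, eval_sentences):
    """Tokens seen in the evaluation data but never in the training data."""
    return vocabulary(eval_sentences) - vocabulary(train_sentences)


# Toy data, invented purely for illustration.
train = [preprocess("The cat sat"), preprocess("A dog ran")]
held_out = [preprocess("The zebra ran")]

print(oov_words(train, held_out))
```

With a different split, or with the held-out sentences taken from another dataset, the same function returns a different OOV set, which is why the source of these words needs to be stated.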