How to keep special characters together in word_tokenize
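
The question does not say which special characters are involved, so the following is only a minimal sketch of one possible approach, not NLTK's `word_tokenize` itself. It uses the standard-library `re` module to build a tokenizer that tries a list of protected sequences before falling back to ordinary word and punctuation splitting. The function name `tokenize_keeping` and the example sequences (`C++`, `<br>`, `:)`) are hypothetical and chosen purely for illustration.

```python
import re

def tokenize_keeping(text, protected):
    """Split text into tokens, keeping every string in `protected` intact.

    Hypothetical helper for illustration; it does not call NLTK.
    """
    # Try longer protected sequences first so a longer one is not
    # shadowed by a shorter one that shares its prefix.
    alternatives = sorted(protected, key=len, reverse=True)
    protected_part = "|".join(re.escape(p) for p in alternatives)

    # Fallback: a run of word characters, or any single non-space character.
    fallback = r"\w+|\S"
    pattern = f"{protected_part}|{fallback}" if protected_part else fallback
    return re.findall(pattern, text)

text = "I use C++ and <br> tags :)"

# Without protected sequences, special characters are split apart.
plain = tokenize_keeping(text, [])
# With them, each listed sequence stays together as one token.
kept = tokenize_keeping(text, ["C++", "<br>", ":)"])

print(plain)  # ['I', 'use', 'C', '+', '+', 'and', '<', 'br', '>', 'tags', ':', ')']
print(kept)   # ['I', 'use', 'C++', 'and', '<br>', 'tags', ':)']
```

If staying within NLTK is a requirement, `nltk.tokenize.RegexpTokenizer` accepts a pattern of the same shape, and `nltk.tokenize.MWETokenizer` can re-merge tokens after `word_tokenize` has split them. Which of these fits best depends on the actual input, which the question does not show.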