Extract text from messy data