Feature extraction and classification of text data by combining two-stage feature selection algorithm and improved machine learning algorithm