Search results “Discretization in data mining wikipedia free”
More Data Mining with Weka (5.6: Summary)
 
07:27
More Data Mining with Weka: online course from the University of Waikato Class 5 - Lesson 6: Summary http://weka.waikato.ac.nz/ Slides (PDF): http://goo.gl/rDuMqu https://twitter.com/WekaMOOC http://wekamooc.blogspot.co.nz/ Department of Computer Science University of Waikato New Zealand http://cs.waikato.ac.nz/
Views: 3192 WekaMOOC
Weka Text Classification for First Time & Beginner Users
 
59:21
59-minute beginner-friendly tutorial on text classification in WEKA. After sections 1-2 all text has been converted to numbers and categories, so sections 3-5 apply to many other kinds of data analysis in WEKA, not specifically text classification.

5 main sections, plus an introduction and a conclusion:
0:00 Introduction (5 minutes)
5:06 TextDirectoryLoader (3 minutes)
8:12 StringToWordVector (19 minutes)
27:37 AttributeSelect (10 minutes)
37:37 Cost Sensitivity and Class Imbalance (8 minutes)
45:45 Classifiers (14 minutes)
59:07 Conclusion (20 seconds)

Some notable sub-sections:
- Section 1 -
5:49 TextDirectoryLoader Command (1 minute)
- Section 2 -
6:44 ARFF File Syntax (1 minute 30 seconds)
8:10 Vectorizing Documents (2 minutes)
10:15 WordsToKeep setting/Word Presence (1 minute 10 seconds)
11:26 OutputWordCount setting/Word Frequency (25 seconds)
11:51 DoNotOperateOnAPerClassBasis setting (40 seconds)
12:34 IDFTransform and TFTransform settings/TF-IDF score (1 minute 30 seconds)
14:09 NormalizeDocLength setting (1 minute 17 seconds)
15:46 Stemmer setting/Lemmatization (1 minute 10 seconds)
16:56 Stopwords setting/Custom Stopwords File (1 minute 54 seconds)
18:50 Tokenizer setting/NGram Tokenizer/Bigrams/Trigrams/Alphabetical Tokenizer (2 minutes 35 seconds)
21:25 MinTermFreq setting (20 seconds)
21:45 PeriodicPruning setting (40 seconds)
22:25 AttributeNamePrefix setting (16 seconds)
22:42 LowerCaseTokens setting (1 minute 2 seconds)
23:45 AttributeIndices setting (2 minutes 4 seconds)
- Section 3 -
28:07 AttributeSelect for reducing dataset to improve classifier performance/InfoGainEval evaluator/Ranker search (7 minutes)
- Section 4 -
38:32 CostSensitiveClassifier/Adding cost sensitivity to base classifier (2 minutes 20 seconds)
42:17 Resample filter/Example of undersampling majority class (1 minute 10 seconds)
43:27 SMOTE filter/Example of oversampling the minority class (1 minute)
- Section 5 -
45:34 Training vs. Testing Datasets (1 minute 32 seconds)
47:07 Naive Bayes Classifier (1 minute 57 seconds)
49:04 Multinomial Naive Bayes Classifier (10 seconds)
49:33 K Nearest Neighbor Classifier (1 minute 34 seconds)
51:17 J48 (Decision Tree) Classifier (2 minutes 32 seconds)
53:50 Random Forest Classifier (1 minute 39 seconds)
55:55 SMO (Support Vector Machine) Classifier (1 minute 38 seconds)
57:35 Supervised vs Semi-Supervised vs Unsupervised Learning/Clustering (1 minute 20 seconds)

The Classifiers section introduces six of WEKA's popular classifiers for text mining (not all of them): 1) Naive Bayes, 2) Multinomial Naive Bayes, 3) K Nearest Neighbor, 4) J48, 5) Random Forest and 6) SMO. Each StringToWordVector setting is shown, e.g. tokenizer, outputWordCounts, normalizeDocLength, TF-IDF, stopwords, stemmer, etc. These settings control how documents are represented as document vectors. Automatically converting 2,000 text files (plain text documents) into an ARFF file with TextDirectoryLoader is shown. Also shown is AttributeSelect, a way of improving classifier performance by reducing the dataset. The Cost-Sensitive Classifier is shown as a way of assigning weights to different types of guesses. Resample and SMOTE are shown as ways of undersampling the majority class and oversampling the minority class. Introductory tips are shared throughout, e.g. distinguishing supervised learning (which is most of data mining) from semi-supervised and unsupervised learning, making identically formatted training and testing datasets, how to easily subset outliers with the Visualize tab, and more.

----------
Update March 24, 2014: Some people asked where to download the movie review data. It is named Polarity_Dataset_v2.0 and is shared on Bo Pang's Cornell Ph.D. student page http://www.cs.cornell.edu/People/pabo/movie-review-data/ (Bo Pang is now a Senior Research Scientist at Google)
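
To make the idea of document vectors concrete, here is a minimal sketch in plain Python of what word presence, word counts and an IDF weighting look like. It is not WEKA code and does not reproduce StringToWordVector exactly: the function names tokenize and vectorize are hypothetical, the parameters words_to_keep, output_word_counts and idf_transform only loosely mirror the settings named in the description, and the IDF weight used is the textbook count * log(N / document frequency), which is an assumption about the weighting rather than a statement of WEKA's internals.

```python
import math
import re
from collections import Counter


def tokenize(text, lowercase=True):
    """Split text into alphabetic tokens (a very simple alphabetic tokenizer)."""
    if lowercase:
        text = text.lower()
    return re.findall(r"[A-Za-z]+", text)


def vectorize(docs, words_to_keep=1000, output_word_counts=False,
              idf_transform=False):
    """Turn a list of strings into (vocabulary, list of document vectors).

    Hypothetical helper for illustration only; the option names are loosely
    modelled on the settings listed in the video description.
    """
    tokenized = [tokenize(doc) for doc in docs]

    # Keep the most frequent words overall, then sort them alphabetically
    # so that the attribute order is stable.
    totals = Counter(tok for toks in tokenized for tok in toks)
    vocab = sorted(word for word, _ in totals.most_common(words_to_keep))

    # Document frequency: in how many documents each word occurs.
    doc_freq = Counter(tok for toks in tokenized for tok in set(toks))
    n_docs = len(docs)

    vectors = []
    for toks in tokenized:
        counts = Counter(toks)
        row = []
        for word in vocab:
            # Word presence (0/1) by default, raw counts if requested.
            value = counts[word] if output_word_counts else int(counts[word] > 0)
            if idf_transform:
                # Textbook IDF weighting (an assumption, see the note above).
                value = value * math.log(n_docs / doc_freq[word])
            row.append(value)
        vectors.append(row)
    return vocab, vectors
```

Calling vectorize(["Good good movie", "bad movie"]) returns one vector per document over the shared vocabulary; switching on output_word_counts or idf_transform changes only the values in each vector, not its length.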
Views: 127772 Brandon Weinberg
Distance Measures STAT
 
20:09
Subject: Statistics (STAT) Paper: Multivariate Analysis Content Writer: Mr. Souvik Bandyopadhyay
Views: 220 Vidya-mitra
The collection and applications of big data - Full interview with Diane Schanzenbach | VIEWPOINT
 
38:55
The modern economy has never been more reliant on data. Businesses, governments, and families must navigate the complexities of a world made possible by new technologies and innovative business practices. Without reliable information about the economic and social environment, it is impossible in many instances to make sensible choices. Diane Schanzenbach and Michael Strain discuss the uses and benefits of the economic and social data that government agencies collect. AEI & Hamilton Project report – The Vital Role of Government-Collected Data: https://goo.gl/UOfB0m Michael Strain is Director of Economic Policy Studies and Resident Scholar at American Enterprise Institute: https://goo.gl/RQl1na Diane Schanzenbach is Director of the Hamilton Project at the Brookings Institution: https://goo.gl/VTN2zf Subscribe to AEI's YouTube Channel https://www.youtube.com/user/AEIVideos?sub_confirmation=1 Like us on Facebook https://www.facebook.com/AEIonline Follow us on Twitter https://twitter.com/AEI For more information http://www.aei.org Thumbnail photo credit: BY - Eric Fischer https://goo.gl/GX4xjh Photos marked "BY" are used under Creative Commons Attribution License: https://creativecommons.org/licenses/by/2.0/ Third-party photos, graphics, and video clips in this video may have been cropped or reframed. Music in this video may have been recut from its original arrangement and timing. In the event this video uses Creative Commons assets: If not noted in the description, titles for Creative Commons assets used in this video can be found at the link provided after each asset. The use of third-party photos, graphics, video clips, and/or music in this video does not constitute an endorsement from the artists and producers licensing those materials. AEI operates independently of any political party and does not take institutional positions on any issues. AEI scholars, fellows, and their guests frequently take positions on policy and other issues. 
When they do, they speak for themselves and not for AEI or its trustees or other scholars or employees. More information on AEI research integrity can be found here: http://www.aei.org/about/ #aei #news #politics #government #education #data #bigdata #census #business #economy #economics #taxes