NLP with Python for Machine Learning Essential Training

With more data publicly available and a growing focus on unstructured text, knowing how to clean, process, and analyze text data is highly valuable. If you have some experience with Python and an interest in natural language processing (NLP), this course can give you the knowledge you need to tackle complex problems using machine learning. Instructor Derek Jedamski provides a quick summary of basic NLP concepts, covers advanced data cleaning and vectorization techniques, and then takes a deep dive into building machine learning classifiers. In that final part, Derek shows how to build two different types of machine learning models, random forest and gradient boosting, as well as how to evaluate and test variations of those models.

Topics include:

  • Explain the definition of NLP.
  • Describe the process of tokenizing.
  • Identify the purpose of vectorizing.
  • Recognize the outcomes of lemmatizing.
  • Summarize the characteristics of TF-IDF.
  • Define accuracy in terms of evaluation metrics.
  • Recall three benefits of using ensemble methods.
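
To make the tokenizing, vectorizing, and TF-IDF topics above more concrete, here is a minimal sketch in plain Python. It is not taken from the course, which works with NLTK and library vectorizers; the stop-word list, the sample sentences, and the function names clean_text and tf_idf are all assumptions made up for illustration. The weighting is the textbook tf * log(N / df) form, so numbers from a library such as scikit-learn will differ because of smoothing and normalization.

```python
import math
import re
import string
from collections import Counter

# Tiny illustrative stop-word list (an assumption; NLTK ships a much longer one).
STOP_WORDS = {"a", "an", "the", "is", "to", "and", "of", "you", "for"}


def clean_text(text):
    """Remove punctuation, lowercase, tokenize, and drop stop words."""
    no_punct = "".join(ch for ch in text if ch not in string.punctuation)
    tokens = re.split(r"\W+", no_punct.lower())
    return [tok for tok in tokens if tok and tok not in STOP_WORDS]


def tf_idf(documents):
    """Return one {term: weight} dict per document.

    Uses the textbook weighting tf * log(N / df), where tf is the term's
    share of the document's tokens and df is its document frequency.
    """
    tokenized = [clean_text(doc) for doc in documents]
    n_docs = len(tokenized)
    # Document frequency: in how many documents each term appears.
    doc_freq = Counter(term for tokens in tokenized for term in set(tokens))
    weights = []
    for tokens in tokenized:
        counts = Counter(tokens)
        total = len(tokens)
        weights.append({
            term: (count / total) * math.log(n_docs / doc_freq[term])
            for term, count in counts.items()
        })
    return weights


# Hypothetical sample messages, not the course dataset.
docs = [
    "Free entry! Win a prize today.",
    "Are you free for lunch today?",
    "Win cash, win prizes!",
]
for doc, vec in zip(docs, tf_idf(docs)):
    print(clean_text(doc), vec)
```

In this sketch a term that appears in fewer documents (such as "entry") receives a higher weight than one shared across documents (such as "free"), which is the behavior the TF-IDF topic refers to.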

Course information

  • English title: NLP with Python for Machine Learning Essential Training
  • Duration: 4 hours 14 minutes
  • Subtitles: English

Course contents

  1. Welcome
  2. What you should know
  3. What tools do you need?
  4. Using the exercise files
  5. What are NLP and NLTK?
  6. NLTK setup and overview
  7. Reading in text data
  8. Exploring the dataset
  9. What are regular expressions?
  10. Learning how to use regular expressions
  11. Regular expression replacements
  12. Machine learning pipeline
  13. Implementation: Removing punctuation
  14. Implementation: Tokenization
  15. Implementation: Removing stop words
  16. Introducing stemming
  17. Using stemming
  18. Introducing lemmatizing
  19. Using lemmatizing
  20. Introducing vectorizing
  21. Count vectorization
  22. N-gram vectorizing
  23. Inverse document frequency weighting
  24. Introducing feature engineering
  25. Feature creation
  26. Feature evaluation
  27. Identifying features for transformation
  28. Box-Cox power transformation
  29. What is machine learning?
  30. Cross-validation and evaluation metrics
  31. Introducing random forest
  32. Building a random forest model
  33. Random forest with holdout test set
  34. Random forest model with grid search
  35. Evaluate random forest model performance
  36. Introducing gradient boosting
  37. Gradient-boosting grid search
  38. Evaluate gradient-boosting model performance
  39. Model selection: Data prep
  40. Model selection: Results
  41. Next steps
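
The later lessons in the outline above (cross-validation and evaluation metrics, then evaluating the random forest and gradient-boosting models) revolve around scoring a classifier's predictions. As a rough sketch of what such metrics measure, the plain-Python function below computes accuracy, precision, and recall for one positive class. The function name evaluate, the "spam" and "ham" labels, and the sample predictions are hypothetical and are not drawn from the course materials.

```python
def evaluate(y_true, y_pred, positive="spam"):
    """Compute accuracy, precision, and recall for one positive class."""
    pairs = list(zip(y_true, y_pred))
    # True positives, false positives, and false negatives for `positive`.
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    correct = sum(1 for t, p in pairs if t == p)
    return {
        # Share of all predictions that match the true label.
        "accuracy": correct / len(pairs) if pairs else 0.0,
        # Of everything predicted positive, how much really was positive.
        "precision": tp / (tp + fp) if tp + fp else 0.0,
        # Of everything truly positive, how much was found.
        "recall": tp / (tp + fn) if tp + fn else 0.0,
    }


# Hypothetical labels and predictions for six messages.
y_true = ["spam", "ham", "ham", "spam", "ham", "spam"]
y_pred = ["spam", "ham", "spam", "ham", "ham", "spam"]
print(evaluate(y_true, y_pred))
```

Comparing model variations, as in the grid search lessons, amounts to computing scores like these for each candidate configuration and keeping the one that performs best on held-out data.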
