An incredible amount of unstructured text data is generated every day by social media, web pages, and a variety of other sources. But without the ability to tame and harness that data, you'll be unable to glean any value from it. In this course, learn how to translate messy text data into powerful insights using Python. Instructor Derek Jedamski begins with a quick review of foundational NLP concepts, including how to clean text data and build a model on top of vectorized text. He then jumps into more complex topics such as word2vec, doc2vec, and recurrent neural networks. To wrap up the course, he lends these concepts a real-world context by applying them to a machine learning problem.


  • English title: Advanced NLP with Python for Machine Learning
  • Duration: 2 hours 14 minutes
  • Subtitles: English


  1. Leveraging the power of messy text data
  2. What you should know
  3. What tools you need
  4. Using the exercise files
  5. What is NLP?
  6. NLTK setup
  7. Reading text data into Python
  8. Cleaning text data
  9. Vectorize text using TF-IDF
  10. Building a model on top of vectorized text
  11. What is word2vec?
  12. What makes word2vec powerful?
  13. How to implement word2vec
  14. How to prep word vectors for modeling
  15. What is doc2vec?
  16. What makes doc2vec powerful?
  17. How to implement doc2vec
  18. How to prep document vectors for modeling
  19. What is a neural network?
  20. What is a recurrent neural network?
  21. What makes RNNs so powerful for NLP problems?
  22. Preparing data for an RNN
  23. How to implement a basic RNN
  24. Prep the data for modeling
  25. Build a model on TF-IDF vectors
  26. Build a model on word2vec embeddings
  27. Build a model on doc2vec embeddings
  28. Build an RNN model
  29. Compare all methods using key performance metrics
  30. Key takeaways for advanced NLP modeling techniques
  31. How to continue advancing your skills
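
For a rough sense of what the review lessons on cleaning text data and TF-IDF vectorization cover, here is a minimal sketch in plain Python. It is not taken from the course materials: the stopword set, the sample messages, and the function names `clean_text` and `tfidf_vectors` are all assumptions made for illustration, and the weighting is the plain unsmoothed form (term frequency times log of inverse document frequency).

```python
import math
import re
import string
from collections import Counter

# Hypothetical, tiny stopword set for illustration only;
# a real project would use a fuller list such as the one NLTK provides.
STOPWORDS = {"a", "an", "the", "is", "to", "and", "of", "in", "it"}


def clean_text(text):
    """Lowercase, strip punctuation, tokenize, and drop stopwords."""
    text = "".join(ch for ch in text.lower() if ch not in string.punctuation)
    tokens = re.split(r"\W+", text)
    return [tok for tok in tokens if tok and tok not in STOPWORDS]


def tfidf_vectors(docs):
    """Return (vocabulary, vectors) for a list of tokenized documents.

    Each weight is tf * log(N / df), with no smoothing or normalization.
    """
    vocab = sorted({tok for doc in docs for tok in doc})
    n_docs = len(docs)
    # Document frequency: number of documents that contain each token.
    df = Counter(tok for doc in docs for tok in set(doc))
    vectors = []
    for doc in docs:
        counts = Counter(doc)
        total = len(doc) or 1  # guard against empty documents
        vectors.append(
            [(counts[tok] / total) * math.log(n_docs / df[tok]) for tok in vocab]
        )
    return vocab, vectors


# Made-up sample messages, not data from the course.
messages = [
    "Free prize! Claim the prize now.",
    "Are we meeting in the morning?",
    "Claim your free ticket",
]
docs = [clean_text(m) for m in messages]
vocab, vectors = tfidf_vectors(docs)

for message, vector in zip(messages, vectors):
    top = max(range(len(vocab)), key=lambda i: vector[i])
    print(f"{message!r}: highest-weighted term is {vocab[top]!r}")
```

Library implementations such as scikit-learn's `TfidfVectorizer` apply smoothing and L2 normalization by default, so their numbers will differ from this sketch even on the same input.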