Jul 2017
1h 5m

MLG 019 Natural Language Processing 2

OCDevel
About this episode

Try a walking desk to stay healthy while you study or work!

Notes and resources at ocdevel.com/mlg/19

Classical NLP Techniques:

  • Origins and Phases in NLP History: NLP initially relied on hardcoded linguistic rules; the field pivoted with the introduction of machine learning, first shallow learning algorithms and later deep learning, which is the current standard.

  • Importance of Classical Methods: Traditional methods are still worth knowing; they provide historical context and a foundation for understanding NLP tasks, and they can outperform deep learning when datasets are small or compute is limited.

  • Edit Distance and Stemming:

    • Levenshtein Distance: Used for spelling corrections by measuring the minimal edits needed to transform one string into another.
    • Stemming: Simplifying a word to its base form. The Porter Stemmer is a common algorithm used.
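As an illustration (not code from the episode), Levenshtein distance can be computed with a short dynamic program; the Porter Stemmer itself is more involved, but implementations ship with libraries such as NLTK:

```python
def levenshtein(a: str, b: str) -> int:
    """Minimum number of single-character insertions, deletions,
    and substitutions needed to turn string a into string b."""
    # prev[j] holds the edit distance between a[:i-1] and b[:j]
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,         # deletion
                            curr[j - 1] + 1,     # insertion
                            prev[j - 1] + cost)) # substitution
        prev = curr
    return prev[-1]
```

For example, `levenshtein("kitten", "sitting")` is 3: two substitutions and one insertion.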
  • Language Models:

    • A language model judges how plausible a sentence is by calculating the joint probability of its word sequence.
    • Use n-grams to construct language models; longer n-grams increase accuracy at the expense of memory and computational power.
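A minimal sketch of an unsmoothed bigram model (function names here are illustrative, not from the episode): the chain rule is approximated by conditioning each word only on its predecessor.

```python
from collections import Counter

def train_bigram_lm(sentences):
    """Count unigrams and bigrams over tokenized sentences,
    padding each with <s> and </s> boundary markers."""
    unigrams, bigrams = Counter(), Counter()
    for sent in sentences:
        tokens = ["<s>"] + sent + ["</s>"]
        unigrams.update(tokens[:-1])
        bigrams.update(zip(tokens, tokens[1:]))
    return unigrams, bigrams

def sentence_prob(sent, unigrams, bigrams):
    """Joint probability via the chain rule with a bigram (Markov)
    assumption: P(w1..wn) ~= product of P(w_i | w_{i-1})."""
    tokens = ["<s>"] + sent + ["</s>"]
    prob = 1.0
    for prev, word in zip(tokens, tokens[1:]):
        if unigrams[prev] == 0:
            return 0.0
        prob *= bigrams[(prev, word)] / unigrams[prev]
    return prob
```

In practice a real model would add smoothing so unseen bigrams don't zero out the whole product.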
  • Naive Bayes for Classification:

    • Ideal for tasks like spam detection, document classification, and sentiment analysis.
    • Relies on a 'bag of words' model, reducing each document to word frequency counts and disregarding word order.
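A bag-of-words naive Bayes classifier can be sketched in a few lines (a generic illustration with made-up helper names, assuming multinomial counts and add-one smoothing):

```python
import math
from collections import Counter

def train_nb(docs, labels):
    """Fit multinomial naive Bayes: per-class priors plus raw
    word counts over a bag-of-words representation."""
    classes = set(labels)
    priors = {c: labels.count(c) / len(labels) for c in classes}
    word_counts = {c: Counter() for c in classes}
    vocab = set()
    for doc, label in zip(docs, labels):
        word_counts[label].update(doc)
        vocab.update(doc)
    return priors, word_counts, vocab

def predict_nb(doc, priors, word_counts, vocab):
    """Pick argmax_c of log P(c) + sum over words of log P(w|c);
    word order is ignored entirely."""
    scores = {}
    for c, prior in priors.items():
        total = sum(word_counts[c].values())
        score = math.log(prior)
        for word in doc:
            # Laplace smoothing keeps unseen words from zeroing the product
            score += math.log((word_counts[c][word] + 1) / (total + len(vocab)))
        scores[c] = score
    return max(scores, key=scores.get)
```

Log probabilities are used instead of raw products to avoid numeric underflow on long documents.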
  • Part of Speech Tagging and Named Entity Recognition:

    • Methods: Maximum entropy models, hidden Markov models.
    • Challenges: Feature engineering for parts of speech, complexity in named entity recognition.
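For the HMM approach, decoding the most likely tag sequence is done with the Viterbi algorithm. A minimal sketch, assuming hand-specified (purely hypothetical) start, transition, and emission probabilities:

```python
def viterbi(words, states, start_p, trans_p, emit_p):
    """Most likely tag sequence under an HMM: maximizes
    P(tags, words) = P(t1) * P(w1|t1) * product of P(ti|t_{i-1}) * P(wi|ti)."""
    # V[i][s] = best probability of any tag path ending in state s at word i
    V = [{s: start_p[s] * emit_p[s].get(words[0], 0.0) for s in states}]
    back = [{}]
    for i in range(1, len(words)):
        V.append({})
        back.append({})
        for s in states:
            best_prev = max(states, key=lambda p: V[i - 1][p] * trans_p[p][s])
            V[i][s] = (V[i - 1][best_prev] * trans_p[best_prev][s]
                       * emit_p[s].get(words[i], 0.0))
            back[i][s] = best_prev
    # Trace back from the best final state
    last = max(states, key=lambda s: V[-1][s])
    path = [last]
    for i in range(len(words) - 1, 0, -1):
        path.append(back[i][path[-1]])
    return list(reversed(path))
```

In a real tagger the probability tables are estimated from a tagged corpus rather than written by hand.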
  • Generative vs. Discriminative Models:

    • Generative Models: Estimate the joint probability distribution; useful with less data.
    • Discriminative Models: Focus on decision boundaries between classes.
  • Topic Modeling with LDA:

    • Latent Dirichlet Allocation (LDA) helps identify topics within large sets of documents by clustering words into topics, allowing for mixed membership of topics across documents.
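LDA is usually fit with a library, but the core collapsed Gibbs sampler can be sketched compactly (an illustrative toy, not production code; all names and hyperparameter values are assumptions):

```python
import random
from collections import defaultdict

def lda_gibbs(docs, n_topics, alpha=0.1, beta=0.01, iters=100, seed=0):
    """Collapsed Gibbs sampling for LDA: repeatedly resample each token's
    topic from P(z=k | rest) proportional to
    (n_dk + alpha) * (n_kw + beta) / (n_k + V*beta)."""
    rng = random.Random(seed)
    V = len({w for d in docs for w in d})
    n_dk = [[0] * n_topics for _ in docs]               # topic counts per doc
    n_kw = [defaultdict(int) for _ in range(n_topics)]  # word counts per topic
    n_k = [0] * n_topics                                # total tokens per topic
    z = []                                              # current token topics
    for d, doc in enumerate(docs):                      # random initialization
        zs = []
        for w in doc:
            k = rng.randrange(n_topics)
            zs.append(k)
            n_dk[d][k] += 1; n_kw[k][w] += 1; n_k[k] += 1
        z.append(zs)
    for _ in range(iters):
        for d, doc in enumerate(docs):
            for i, w in enumerate(doc):
                k = z[d][i]                             # remove token's counts
                n_dk[d][k] -= 1; n_kw[k][w] -= 1; n_k[k] -= 1
                weights = [(n_dk[d][t] + alpha) * (n_kw[t][w] + beta)
                           / (n_k[t] + V * beta) for t in range(n_topics)]
                k = rng.choices(range(n_topics), weights=weights)[0]
                z[d][i] = k                             # add back under new topic
                n_dk[d][k] += 1; n_kw[k][w] += 1; n_k[k] += 1
    return n_dk, n_kw
```

The per-document topic counts `n_dk` give the mixed-membership view: each document is a distribution over topics rather than a single label.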
  • Search and Similarity Measures:

    • Utilize TF-IDF to transform documents into vectors, weighting each term by its frequency in the document and down-weighting terms that appear across many documents in the corpus.
    • Employ cosine similarity for measuring semantic similarity between document vectors.
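Both steps can be sketched over tokenized documents (a generic illustration using the basic tf * log(N/df) weighting; real libraries apply extra smoothing):

```python
import math
from collections import Counter

def tfidf_vectors(docs):
    """Turn tokenized documents into sparse TF-IDF dicts: term frequency
    times log(N / document frequency). Terms in every document get weight 0."""
    n = len(docs)
    df = Counter()
    for doc in docs:
        df.update(set(doc))  # document frequency counts each doc once
    vectors = []
    for doc in docs:
        tf = Counter(doc)
        vectors.append({t: tf[t] * math.log(n / df[t]) for t in tf})
    return vectors

def cosine(u, v):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(u[t] * v.get(t, 0.0) for t in u)
    norm_u = math.sqrt(sum(x * x for x in u.values()))
    norm_v = math.sqrt(sum(x * x for x in v.values()))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)
```

Ranking a corpus by `cosine(query_vector, doc_vector)` is the skeleton of classical TF-IDF search.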