
Learn Information Extraction with Skip-Gram Architecture

Why Information Extraction?

Mastering Information Extraction offers several advantages in the field of natural language processing and text analysis:

  1. Knowledge extraction: Information Extraction enables the extraction of valuable insights and structured knowledge from unstructured text data, allowing for a deeper understanding of textual information.
  2. Enhanced data understanding: By analyzing how words and phrases occur together in text data, Information Extraction helps uncover correlations and semantic associations that are not obvious from reading alone and that can support decision-making.
  3. Automation and efficiency: Automated Information Extraction techniques save time and effort by automatically processing large volumes of text data, enabling faster analysis and decision-making.
  4. Real-world applications: Information Extraction has wide-ranging applications in various domains, including information retrieval, sentiment analysis, question answering systems, knowledge graph construction, and more.

What will be covered in this tutorial

In this tutorial on Information Extraction, we will cover the following topics:

  • Understanding Information Extraction: Exploring the concept and significance of Information Extraction in extracting valuable insights from unstructured text data.
  • Extracting knowledge from patterns: Learning techniques for identifying and extracting valuable knowledge by analyzing patterns and relationships within text data.
  • Word representation: Understanding the importance of representing words in a meaningful way, such as numerical vectors, to capture their semantic and contextual information.
  • Skip-Gram architecture: Introducing the Skip-Gram architecture, a neural network model commonly used for learning word embeddings and capturing the relationships between words in a text corpus.
  • Discovering word relationships: Uncovering surprising relationships and associations between words using techniques like word embeddings, which enable us to understand how words relate to each other in a meaningful and context-aware manner.

What is Information Extraction?

Information Extraction is the task of automatically extracting structured information from unstructured and/or semi-structured machine-readable documents (source: Wikipedia).

Let’s try some different approaches.

Approach 1: Extract Knowledge from Patterns

Given pieces of knowledge that are known to belong together, look for the patterns in which they appear in text.

This is actually a powerful approach. Assume you know that Amazon was founded in 1994 and Facebook was founded in 2004.

A pattern could be “When {company} was founded in {year},”
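To make the idea concrete, here is a minimal sketch of how such a template could be turned into a regular expression and applied to text. The sentences in `text` are hypothetical and written only for this illustration, and the group names `company` and `year` are my own choice.

```python
import re

# Hypothetical sentences, written only to illustrate the idea.
text = (
    "When Facebook was founded in 2004, social networks were small. "
    "When Google was founded in 1998, search looked different. "
    "When Microsoft was founded in 1975, few people owned a computer."
)

# The template "When {company} was founded in {year}," as a regular expression.
pattern = re.compile(r"When (?P<company>[A-Z]\w+) was founded in (?P<year>\d{4}),")

# Every match yields a (company, year) pair, including ones we did not know before.
facts = [(m.group("company"), m.group("year")) for m in pattern.finditer(text)]
print(facts)
# [('Facebook', '2004'), ('Google', '1998'), ('Microsoft', '1975')]
```

One known fact suggests the pattern, and the pattern then returns the other pairs that are written the same way.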

Let’s try this in real life.

import pandas as pd
import re

# Reading a knowledge base (here only one entry in the csv file)
books = pd.read_csv('', header=None)

# Convert to a list
book_list = books.values.tolist()

# Read some content (here a web page)
with open('') as f:
    corpus = f.read()

corpus = corpus.replace('\n', ' ').replace('\t', ' ')

# Look for places where both known values occur close together, to find patterns
for val1, val2 in book_list:
    print(val1, '-', val2)
    for i in range(0, len(corpus) - 100, 20):
        pattern = corpus[i:i + 100]
        if val1 in pattern and val2 in pattern:
            print('-:', pattern)

This gives the following.

1984 - George Orwell
-: ge-orwell-with-a-foreword-by-thomas-pynchon/">1984</a></h2>   <h2 class="author">by George Orwell</h
-: eword-by-thomas-pynchon/">1984</a></h2>   <h2 class="author">by George Orwell</h2>    <div class="de
-: hon/">1984</a></h2>   <h2 class="author">by George Orwell</h2>    <div class="desc">We were pretty c
The Help - Kathryn Stockett
-: /the-help-by-kathryn-stockett/">The Help</a></h2>   <h2 class="author">by Kathryn Stockett</h2>    <
-: -stockett/">The Help</a></h2>   <h2 class="author">by Kathryn Stockett</h2>    <div class="desc">Thi

This gives you an idea of some patterns.

prefix = re.escape('/">')
middle = re.escape('</a></h2>   <h2 class="author">by ')
suffix = re.escape('</h2>    <div class="desc">')

regex = f"{prefix}(.{{0,50}}?){middle}(.{{0,50}}?){suffix}"
results = re.findall(regex, corpus)

for result in results:
    print(result)

This gives the following pattern matches, most of which are new knowledge.

[('War and Peace', 'Leo Tolstoy'),
 ('Song of Solomon', 'Toni Morrison'),
 ('Ulysses', 'James Joyce'),
 ('The Shadow of the Wind', 'Carlos Ruiz Zafon'),
 ('The Lord of the Rings', 'J.R.R. Tolkien'),
 ('The Satanic Verses', 'Salman Rushdie'),
 ('Don Quixote', 'Miguel de Cervantes'),
 ('The Golden Compass', 'Philip Pullman'),
 ('Catch-22', 'Joseph Heller'),
 ('1984', 'George Orwell'),
 ('The Kite Runner', 'Khaled Hosseini'),
 ('Little Women', 'Louisa May Alcott'),
 ('The Cloud Atlas', 'David Mitchell'),
 ('The Fountainhead', 'Ayn Rand'),
 ('The Picture of Dorian Gray', 'Oscar Wilde'),
 ('Lolita', 'Vladimir Nabokov'),
 ('The Help', 'Kathryn Stockett'),
 ("The Liar's Club", 'Mary Karr'),
 ('Moby-Dick', 'Herman Melville'),
 ("Gravity's Rainbow", 'Thomas Pynchon'),
 ("The Handmaid's Tale", 'Margaret Atwood')]

Approach 2: Skip-Gram Architecture

One-Hot Representation

  • Represents a word as a vector with a single 1, and with all other values as 0
  • Maybe not useful to have with
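As a small illustration of one-hot vectors, the sketch below builds them for a made-up four-word vocabulary (the `vocab` list and helper names are assumptions for this example). One property that may explain why this representation is of limited use: any two different words have a dot product of 0, so the vectors say nothing about which words are similar.

```python
# A made-up vocabulary, used only for this illustration.
vocab = ["king", "queen", "man", "woman"]

def one_hot(word):
    """Return a vector with a 1 at the word's position and 0 everywhere else."""
    return [1 if w == word else 0 for w in vocab]

def dot(a, b):
    """Dot product of two equally long vectors."""
    return sum(x * y for x, y in zip(a, b))

print(one_hot("king"))                         # [1, 0, 0, 0]
print(one_hot("woman"))                        # [0, 0, 0, 1]
print(dot(one_hot("king"), one_hot("queen")))  # 0
print(dot(one_hot("king"), one_hot("man")))    # 0
```

Note also that the vector length equals the vocabulary size, so it grows with every new word.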

Distributed Representation

  • Representation of meaning distributed across multiple values

How to define words as vectors

  • A word is defined by the words that surround it
  • Based on the context
  • What words happen to show up around it
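The idea of describing a word by its surroundings can be sketched by simply counting neighbours. The sentence below is made up for the illustration, and `context_counts` is a hypothetical helper, not part of any library.

```python
from collections import Counter, defaultdict

def context_counts(tokens, window=1):
    """Count, for every word, which words appear within `window` positions of it."""
    contexts = defaultdict(Counter)
    for i, word in enumerate(tokens):
        start = max(0, i - window)
        end = min(len(tokens), i + window + 1)
        for j in range(start, end):
            if j != i:
                contexts[word][tokens[j]] += 1
    return contexts

# A made-up sentence in which "king" and "queen" appear in the same surroundings.
tokens = "the king rules the land the queen rules the sea".split()
contexts = context_counts(tokens)

print(contexts["king"])
print(contexts["queen"])
print(contexts["king"] == contexts["queen"])  # True
```

Here "king" and "queen" end up with identical neighbour counts, which is the kind of signal a word-vector model can pick up on.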


word2vec

  • Model for generating word vectors

Skip-Gram Architecture

  • Neural network architecture for predicting context words given a target word
    • Given a word, which words show up around it in a context?
  • Example
    • Given a target word (input word), train the network to predict its context words (right side)
    • Then the weights from the input node (target word) to the hidden layer (5 weights) give a representation
    • Hence, the word will be represented by a vector
    • The number of hidden nodes determines how big the vector is (here 5)
  • Idea is as follows
    • Each input word gets weights to the hidden layer
    • The hidden layer is trained
    • Then each word is represented by its weights to the hidden layer
  • Intuition
    • If two words have similar contexts (they show up in the same places), then they must be similar, and their representations are a small distance from each other

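The steps above can be sketched in plain Python. This is a toy version under several assumptions of mine: a made-up sentence as corpus, a full softmax over the whole vocabulary, plain stochastic gradient descent, and 5 hidden nodes to match the example. Real word2vec implementations use much larger corpora and faster training tricks, so treat this only as an illustration of where the word vectors come from.

```python
import math
import random

def train_skipgram(tokens, dim=5, window=2, epochs=200, lr=0.05, seed=0):
    """Train a tiny skip-gram model with a full softmax (illustration only)."""
    rng = random.Random(seed)
    vocab = sorted(set(tokens))
    index = {w: i for i, w in enumerate(vocab)}
    size = len(vocab)
    # w_in[i]: weights from input word i to the hidden layer (its word vector).
    w_in = [[rng.uniform(-0.5, 0.5) for _ in range(dim)] for _ in range(size)]
    # w_out[k]: weights from the hidden layer to output (context) word k.
    w_out = [[rng.uniform(-0.5, 0.5) for _ in range(dim)] for _ in range(size)]
    # All (target, context) index pairs within the window.
    pairs = [(index[tokens[i]], index[tokens[j]])
             for i in range(len(tokens))
             for j in range(max(0, i - window), min(len(tokens), i + window + 1))
             if j != i]
    losses = []
    for _ in range(epochs):
        total = 0.0
        for t, c in pairs:
            hidden = w_in[t]  # the hidden layer is just the target word's weights
            scores = [sum(a * b for a, b in zip(hidden, w_out[k])) for k in range(size)]
            top = max(scores)
            exps = [math.exp(s - top) for s in scores]
            norm = sum(exps)
            probs = [e / norm for e in exps]  # predicted context-word probabilities
            total -= math.log(probs[c])       # cross-entropy loss for the true context
            grad_hidden = [0.0] * dim
            for k in range(size):
                err = probs[k] - (1.0 if k == c else 0.0)
                for d in range(dim):
                    grad_hidden[d] += err * w_out[k][d]
                    w_out[k][d] -= lr * err * hidden[d]
            for d in range(dim):
                hidden[d] -= lr * grad_hidden[d]  # updates w_in[t] in place
        losses.append(total / len(pairs))
    vectors = {w: w_in[index[w]] for w in vocab}
    return vectors, losses

# A made-up corpus, far too small for meaningful vectors.
tokens = "the king rules the land and the queen rules the land".split()
vectors, losses = train_skipgram(tokens)
print(len(vectors["king"]))    # 5 values, one per hidden node
print(losses[0], losses[-1])   # average loss in the first and last epoch
```

After training, `vectors["king"]` holds the five weights from the input node for "king" to the hidden layer, which is exactly the representation described in the list above.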
import numpy as np
from scipy.spatial.distance import cosine

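# Load word vectors from a text file: each line is a word followed by its vector values
with open('') as f:
    words = {}
    lines = f.readlines()
    for line in lines:
        row = line.split()
        word = row[0]
        vector = np.array([float(x) for x in row[1:]])
        words[word] = vector

def distance(word1, word2):
    # Cosine distance between two vectors (0 means they point in the same direction)
    return cosine(word1, word2)

def closest_words(word):
    # `word` is a vector; return the 10 words whose vectors are closest to it
    distances = {w: distance(word, words[w]) for w in words}
    return sorted(distances, key=lambda w: distances[w])[:10]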

This will amaze you. But first let’s see what it does.

distance(words['king'], words['queen'])

This gives 0.19707422881543946, a number that does not tell us much on its own.

distance(words['king'], words['pope'])

This gives 0.42088794105426874. Again, not much value on its own, although it does say that king is further from pope than from queen.
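To get a feeling for the scale of these numbers, here is a small sketch of cosine distance written out by hand, using the usual definition of 1 minus the cosine of the angle between the vectors. The two-dimensional vectors are made up for the illustration.

```python
import math

def cosine_distance(u, v):
    """1 minus the cosine of the angle between u and v."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return 1.0 - dot / (norm_u * norm_v)

print(cosine_distance([1, 0], [2, 0]))   # 0.0  same direction
print(cosine_distance([1, 0], [0, 3]))   # 1.0  orthogonal
print(cosine_distance([1, 0], [-1, 0]))  # 2.0  opposite direction
```

So the distance ranges from 0 to 2, only the direction of the vectors matters, and smaller means more similar.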

closest_words(words['king'] - words['man'] + words['woman'])




Why do I say wow?

Well, king – man + woman becomes queen.

Isn’t that amazing?
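The arithmetic behind this can be followed with hand-made vectors. The numbers below are hypothetical and chosen by me so that the first value roughly means "royalty" and the second "gender". They are not taken from the vector file used above, where the relationship only holds approximately.

```python
import math

# Hand-made 2-D vectors (hypothetical): [royalty, gender].
toy = {
    "king":  [1.0,  1.0],
    "queen": [1.0, -1.0],
    "man":   [0.0,  1.0],
    "woman": [0.0, -1.0],
}

def cosine_distance(u, v):
    """1 minus the cosine of the angle between u and v."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return 1.0 - dot / (norm_u * norm_v)

def closest(vector):
    """Return all toy words, sorted from closest to furthest from `vector`."""
    return sorted(toy, key=lambda w: cosine_distance(vector, toy[w]))

# king - man + woman, computed value by value.
target = [k - m + w for k, m, w in zip(toy["king"], toy["man"], toy["woman"])]
ranking = closest(target)

print(target)      # [1.0, -1.0]
print(ranking[0])  # queen
```

Subtracting man removes the "male" part of king, and adding woman puts the "female" part in its place, which lands on queen.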

Want to learn more?

This was the last lesson of the 15 machine learning projects.

This is part of a FREE 10h Machine Learning course with Python.

  • 15 video lessons – which explain Machine Learning concepts, demonstrate models on real data, introduce projects and show a solution (YouTube playlist).
  • 30 Jupyter Notebooks – with the full code and explanation from the lectures and projects (GitHub).
  • 15 projects – with step guides to help you structure your solutions, and solutions explained at the end of the video lessons (GitHub).
