“Poetry is, at its core, the art of identifying and manipulating linguistic similarity.” – Allison Parrish
Believe it or not, computers, without any labeled data, are capable of understanding the relationships between everyday things. By this I mean a computer can tell you that a shirt is more similar to a shoe than it is to a car.
In the past
In the past, to enable a computer to understand the relationships between various things, you had to provide it with a labeled dataset.
| Item  | Weight (kg) | Volume (liters) | Average length of use (years) |
| ----- | ----------- | --------------- | ----------------------------- |
| Shirt | 0.2         | 1               | 2                             |
| Shoe  | 0.5         | 2               | 4                             |
| Car   | 1500        | 8000            | 12                            |
In this table you can see that the shirt and the shoe are much more similar to each other than either is to the car. A computer, with some simple math, can see this too.
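To make that concrete, here's a minimal sketch in plain Python using the values from the table above. It measures similarity with Euclidean distance, which is just one of several distance metrics you could pick:

```python
import math

# Hand-labeled features from the table: (weight kg, volume liters, years of use)
items = {
    "shirt": [0.2, 1, 2],
    "shoe":  [0.5, 2, 4],
    "car":   [1500, 8000, 12],
}

def euclidean(a, b):
    """Straight-line distance between two feature vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

print(euclidean(items["shirt"], items["shoe"]))  # small number: very similar
print(euclidean(items["shirt"], items["car"]))   # huge number: very different
```

The smaller the distance, the more similar the two items are, so the shirt–shoe pair wins easily.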
Now
Now, computers can understand semantics, the meaning behind a word, without labeled datasets. A computer can tell you that a shoe is more similar to a shirt than to a car with nothing but the words themselves as input.
Computers can also perform linguistic operations like this:
(King – Man) + Woman ≈ Queen
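You can try this yourself with the gensim library, which can download a set of pretrained word vectors and do the arithmetic for you. A sketch, assuming you're okay with it fetching the (relatively small) 50-dimensional GloVe vectors on first run:

```python
import gensim.downloader as api

# Downloads pretrained 50-dimensional GloVe word vectors the first time it runs
model = api.load("glove-wiki-gigaword-50")

# (king - man) + woman ≈ ?
result = model.most_similar(positive=["king", "woman"], negative=["man"], topn=1)
print(result)  # the top match should be 'queen' with a high similarity score
```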
Amazing!
But how? Let me introduce you to embeddings.
Embeddings
Simply put, an embedding is a numerical way of representing a word. For example, if I embedded the word cat, it would look something like this: [0.94, -0.223, 0.54, …, 0.45, 1.34]. This is a vector of 300+ numbers (the exact size depends on the embedding model you use).
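You can pull one of these vectors out of a pretrained model and look at it. A sketch using the same gensim GloVe model as above (this particular model uses 50 dimensions rather than 300+, which keeps the download small):

```python
import gensim.downloader as api

model = api.load("glove-wiki-gigaword-50")

vec = model["cat"]   # the embedding vector for the word "cat"
print(vec.shape)     # (50,) -- this model represents each word with 50 floats
print(vec[:5])       # the first few numbers of the vector
```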
Once you have this array of floats, you can plot it! Our brains can't picture such a high-dimensional space, but it's no problem for a computer.
Once the computer plots the word in vector space, it can understand the relationships between words based on their locations. In this higher-dimensional space you'd see cat, dog, and rabbit plotted close together; you'd also see car, motorcycle, and RV clustered in another region of the space. This multidimensional space has hundreds of axes that we can't yet label linguistically, but together they properly represent the word or phrase at hand.
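You can probe those neighborhoods directly by comparing word pairs. Cosine similarity (closer to 1 means closer together in the space) is the usual measure, and gensim exposes it as a one-liner. Again a sketch with the same GloVe model:

```python
import gensim.downloader as api

model = api.load("glove-wiki-gigaword-50")

# Words in the same "neighborhood" score high; unrelated words score lower.
print(model.similarity("cat", "dog"))         # high: the animal region
print(model.similarity("car", "motorcycle"))  # high: the vehicle region
print(model.similarity("cat", "car"))         # noticeably lower: different regions
```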
That’s it!
That’s it! Embeddings are a crazily complex beast of computation that represents words as vectors. You can calculate the distance between two vectors to determine their similarity!
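If you'd rather compute that similarity yourself instead of relying on a library, cosine similarity is just a dot product divided by the two vectors' lengths (and 1 minus it gives you a distance). A minimal numpy sketch, using the sample cat vector from above truncated to five dimensions:

```python
import numpy as np

def cosine_similarity(a, b):
    """Cosine of the angle between two vectors: 1.0 means same direction."""
    return np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b))

# Works on any pair of embedding vectors of the same length
a = np.array([0.94, -0.223, 0.54, 0.45, 1.34])
b = np.array([0.91, -0.20, 0.50, 0.40, 1.30])
print(cosine_similarity(a, b))  # close to 1.0: nearly identical directions
```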
But how do you train a model that makes embeddings? What are some real-world use cases for embeddings? How accessible are they?
Stay tuned!
