Moving Earth, Word, and Concept

Photo by Nadine Shaabana on Unsplash

Distance as a measure of difference


This article discusses three measures of distance: (1) the Earth Mover’s Distance (EMD; Rubner et al., 1998); (2) the Word Mover’s Distance (WMD; Kusner et al., 2015); and (3) the Concept Mover’s Distance (CMD; Stoltz & Taylor, 2019). These measures build on one another: the CMD stems from the WMD, which stems from the EMD. The progression is not quite linear, however, because each work builds indirectly on the previous one to serve a different purpose, and so the movement from one work to the next is itself interesting to consider. For this reason, this article discusses both the distance measures themselves and the progression from one to the next.

Earth Mover’s Distance

The Earth Mover’s Distance (EMD) is presented by Rubner et al. (1998) as a distance measure for improving image database search. The measure is described using a metaphor in which soil distributed in some way is used to fill holes distributed another way, but the case considered in the paper is not so literal. More specifically, taking image database search as a use case, Rubner et al. show that the EMD can be calculated between pairs of images and that a lower EMD indicates higher similarity. The analysis focuses on color and texture as pointwise and region-spanning properties of images, respectively, but the analysis of texture is limited to images of uniform texture. The discussion ties these properties to their importance to human perception and concludes that the EMD provides an intuitive measure of image similarity. To exhibit the potential of the EMD for navigating large sets of images, multidimensional scaling is used to plot images in two dimensions such that the information provided by the EMD is preserved.
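The soil-and-holes metaphor can be made concrete with a small sketch. The code below is not taken from Rubner et al.; it covers only the one-dimensional special case in which piles and holes sit on a line and the total amount of soil equals the total capacity of the holes. Under those assumptions, the minimum work can be found by sweeping left to right and tracking how much soil must cross each gap. The function name `emd_1d` and the example values are hypothetical.

```python
def emd_1d(positions, supply, demand):
    """Earth Mover's Distance on a line, assuming equal total supply and demand.

    positions: sorted coordinates shared by the piles and the holes
    supply:    amount of soil available at each position
    demand:    capacity of the hole at each position
    """
    if list(positions) != sorted(positions):
        raise ValueError("positions must be sorted")
    total = sum(supply)
    if total <= 0 or abs(total - sum(demand)) > 1e-9:
        raise ValueError("this sketch assumes equal, positive total mass")

    work = 0.0
    carried = 0.0  # net soil that has to cross the gap to the right of position i
    for i in range(len(positions) - 1):
        carried += supply[i] - demand[i]
        work += abs(carried) * (positions[i + 1] - positions[i])

    # Normalize the total work by the amount of soil moved.
    return work / total


# Piles of 3 and 1 units at positions 0 and 2; holes of 2 units at positions 1 and 3.
print(emd_1d([0, 1, 2, 3], [3, 0, 1, 0], [0, 2, 0, 2]))  # 1.5
```

In this example, two units travel from position 0 to 1, one unit from 0 to 3, and one unit from 2 to 3, for a total work of 6 spread over 4 units of soil. In higher dimensions, as with color or texture features, no such sweep exists and the problem is solved as a transportation problem instead.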

Rubner et al. build on existing measures of the distance between histograms, and one of the main contributions of the paper is its use of image “signatures” rather than full histograms. A signature is defined by clustering the features of an image (e.g., color features, texture features) and representing the image as a set of bins (to borrow histogram terminology), where each bin is given by a cluster center and the size of that cluster. In other words, a signature is an alternative to a histogram in which the bins are defined by the data rather than a priori. Signatures are more compact than histograms, which improves the computational efficiency of the distance calculations, and their use also reduces the risk of over- or underestimating a distance compared with previous methods. Further, Rubner et al. report that the EMD allows for partial matches and that it is a “true metric” when the total weights of two signatures are equal.
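For reference, the EMD between two signatures can be written as a transportation problem. The notation here is chosen for this article and may differ from the paper’s: signature P has m clusters with centers p_i and weights w_{p_i}, signature Q has n clusters with centers q_j and weights w_{q_j}, d_{ij} is the ground distance between p_i and q_j, and f_{ij} is the amount of weight moved from p_i to q_j.

```latex
\begin{aligned}
\min_{f_{ij} \ge 0} \quad & \sum_{i=1}^{m} \sum_{j=1}^{n} f_{ij}\, d_{ij} \\
\text{subject to} \quad
  & \sum_{j=1}^{n} f_{ij} \le w_{p_i}, \qquad 1 \le i \le m, \\
  & \sum_{i=1}^{m} f_{ij} \le w_{q_j}, \qquad 1 \le j \le n, \\
  & \sum_{i=1}^{m} \sum_{j=1}^{n} f_{ij}
    = \min\Bigl( \sum_{i=1}^{m} w_{p_i},\ \sum_{j=1}^{n} w_{q_j} \Bigr),
\end{aligned}
\qquad
\mathrm{EMD}(P, Q) = \frac{\sum_{i,j} f^{*}_{ij}\, d_{ij}}{\sum_{i,j} f^{*}_{ij}}
```

Here f^* denotes an optimal flow. The inequality constraints are what permit partial matches: when one signature is heavier than the other, only as much weight as the lighter signature holds needs to be moved. When the total weights are equal, every constraint holds with equality.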

Word Mover’s Distance

In light of the algebraic properties of word representations highlighted by Mikolov et al. (2013), Kusner et al. (2015) present the Word Mover’s Distance (WMD) to extend the EMD from image retrieval to document classification and retrieval. Each document is treated as a bag of words, and each word is represented by a vector derived from an embedding algorithm such as word2vec. The distance between two documents is then the minimum cumulative distance that the embedded words of one document must travel to reach the embedded words of the other. Compared with the EMD, the WMD operates over a different type of data, but the distance calculation is much the same, and the same optimization machinery can be used. Furthermore, similar to the color case considered by Rubner et al., Kusner et al. consider a document as a point cloud of words (but what might be considered the texture of a document is left to the imagination).

In line with the image signatures presented by Rubner et al., Kusner et al. show that computational requirements can be reduced in the document retrieval context. They do so by leveraging the word centroid distance, which is the distance between the averaged word vectors of two documents and which places a lower bound on the WMD. However, the WMD as presented does not first bin the words in a document to create a document signature. In fact, the interpretability of the WMD, which stems from the possibility of considering pointwise movement from one document to another, is presented as one of the greatest benefits of using the measure.
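The relationship between the WMD and the word centroid distance can be illustrated with a toy sketch. Everything specific in it is an assumption made for illustration: the two-dimensional “embeddings” are hand-picked rather than learned, the ground distance is Euclidean, and the helper names `wmd_equal_size` and `wcd` are hypothetical. The sketch also handles only a special case, in which both documents contain the same number of distinct words with equal weight; under that assumption an optimal flow is a one-to-one matching, so a brute-force search over matchings stands in for a general transportation solver.

```python
import math
from itertools import permutations

# Hypothetical 2-D vectors chosen by hand; real embeddings are learned
# and typically have hundreds of dimensions.
EMBEDDINGS = {
    "obama": (1.0, 4.0), "president": (1.2, 3.6),
    "speaks": (4.0, 1.0), "greets": (4.3, 1.4),
    "media": (6.0, 5.0), "press": (6.2, 4.7),
    "illinois": (2.0, 7.0), "chicago": (2.3, 7.4),
    "band": (9.0, 9.0), "gave": (8.0, 2.0),
    "concert": (9.5, 8.5), "japan": (9.0, 6.0),
}


def wmd_equal_size(doc_a, doc_b):
    """WMD for two documents with n distinct words each, every word weighted 1/n."""
    n = len(doc_a)
    if n == 0 or len(doc_b) != n or len(set(doc_a)) != n or len(set(doc_b)) != n:
        raise ValueError("this sketch needs two documents of n distinct words each")
    vectors_a = [EMBEDDINGS[w] for w in doc_a]
    # Try every one-to-one matching and keep the cheapest total travel distance.
    best = min(
        sum(math.dist(va, EMBEDDINGS[wb]) for va, wb in zip(vectors_a, perm))
        for perm in permutations(doc_b)
    )
    return best / n  # each word carries weight 1/n


def wcd(doc_a, doc_b):
    """Word centroid distance: distance between the mean word vectors."""
    def centroid(doc):
        vectors = [EMBEDDINGS[w] for w in doc]
        return tuple(sum(axis) / len(vectors) for axis in zip(*vectors))
    return math.dist(centroid(doc_a), centroid(doc_b))


doc_1 = ["obama", "speaks", "media", "illinois"]
doc_2 = ["president", "greets", "press", "chicago"]
doc_3 = ["band", "gave", "concert", "japan"]

for other in (doc_2, doc_3):
    print(other, "WCD:", round(wcd(doc_1, other), 3),
          "WMD:", round(wmd_equal_size(doc_1, other), 3))
```

With these made-up vectors, the first two documents share no words yet end up close together, while the third is far away, and in both comparisons the centroid distance does not exceed the WMD. Because the centroid distance is much cheaper to compute, it can be used to rule out distant documents before the full calculation is attempted.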

Concept Mover’s Distance

In the presentations of the EMD and WMD, the closeness between items is taken to indicate their similarity, and this notion of similarity is taken as a useful way to perform retrieval tasks. The Concept Mover’s Distance (CMD) presented by Stoltz & Taylor (2019), by slight contrast, assumes that there is analytical value to such a measure of similarity. More specifically, Stoltz & Taylor differentiate the CMD from the WMD through their use of an “ideal pseudo document” against which documents can be analyzed. This pseudo document is defined by the analyst according to the needs of the study, and according to Stoltz & Taylor, this approach has the following benefits: (1) it captures the structure of concepts well; (2) it is robust to document length and the pruning of sparse terms; and (3) it can be used regardless of whether the concept of interest is present in the document.
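The idea of measuring a document against an analyst-defined pseudo document can be sketched in its simplest form, where the pseudo document consists of a single concept word. This is not the authors’ implementation, and it makes several assumptions: the two-dimensional vectors are hand-picked, the ground distance is Euclidean, and the function name `distance_to_concept` is hypothetical. With only one destination word, every unit of a document’s mass must travel to it, so the transport cost reduces to the frequency-weighted mean distance between the document’s words and the concept.

```python
import math
from collections import Counter

# Hypothetical 2-D vectors chosen by hand for illustration only.
EMBEDDINGS = {
    "death": (0.0, 0.0), "grave": (0.5, 0.3), "murder": (0.4, -0.6),
    "king": (3.0, 2.0), "love": (4.0, 5.0),
    "feast": (5.0, 4.0), "dance": (5.5, 4.5),
}


def distance_to_concept(tokens, concept):
    """Cost of moving a document onto a one-word pseudo document."""
    counts = Counter(t for t in tokens if t in EMBEDDINGS)
    total = sum(counts.values())
    if total == 0:
        raise ValueError("no token in the document has an embedding")
    target = EMBEDDINGS[concept]
    # Each word carries its relative frequency as mass, all of it bound for the concept.
    return sum(
        (count / total) * math.dist(EMBEDDINGS[word], target)
        for word, count in counts.items()
    )


tragedy = "king murder grave murder king".split()
comedy = "feast dance love dance king".split()

print("tragedy:", round(distance_to_concept(tragedy, "death"), 3))
print("comedy: ", round(distance_to_concept(comedy, "death"), 3))
```

Neither toy document contains the word “death”, yet the first sits closer to it than the second, which mirrors the third benefit listed above. A pseudo document made of several terms would no longer reduce to a weighted mean and would call for an actual transport calculation.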

To exhibit the analytical power of the CMD, Stoltz & Taylor examine three hypotheses: (1) Jaynes’s (1976) hypothesis about consciousness (or its lack) in the Iliad, Odyssey, and King James Version of the Bible; (2) a hypothesis that the number of deaths in Shakespearean plays correlates with engagement with the concept of death; and (3) following Lakoff’s (2002) theory of models of morality in United States politics, a hypothesis about engagement with the concepts of “strict father” and “nurturing parent” in State of the Union Addresses. They show that the CMD produces values that align with expectation. Importantly, Stoltz & Taylor note that the CMD approach is useful when there is an existing theory to test, and they do not comment on the physicality of the CMD.


The three measures discussed here aim to define the distance between a pair of items as a way to quantify difference, but in stepping from one to the next, the physicality of distance is weakened. More specifically, when compared with the EMD, which relies on a relatively direct connection to human perception, the WMD largely defers to the high quality of the word embeddings and the validity of classification benchmarks to support its ability to measure semantic distance. This deference may be reasonable given the specific type of complexity that characterizes text data, but the physicality of the measure relative to the data is weakened nonetheless. Furthermore, in going from WMD to CMD, the destination against which a source can be measured is no longer observed but rather constructed as an ideal, a practice that seems at this point more art than science. The shifts from one measure to the next do not necessarily denigrate the potential of such approaches to measuring difference, as the potential stands relative to the requirements of the task at hand. Still, going from the notion of moving earth to fill holes to the EMD itself and then to WMD and CMD involves a layering of abstraction that must be considered when evaluating the meaning of difference.


Jaynes, J. (1976). The Origins of Consciousness in the Breakdown of the Bicameral Mind. Houghton Mifflin.

Kusner, M. J., Sun, Y., Kolkin, N. I., & Weinberger, K. Q. (2015). From Word Embeddings To Document Distances. Proceedings of the 32nd International Conference on Machine Learning. International Conference on Machine Learning, Lille, France.

Lakoff, G. (2002). Moral Politics: How Liberals and Conservatives Think. Chicago, IL: The University of Chicago Press.

Mikolov, T., Chen, K., Corrado, G., & Dean, J. (2013). Efficient Estimation of Word Representations in Vector Space. http://arxiv.org/abs/1301.3781

Rubner, Y., Tomasi, C., & Guibas, L. J. (1998). A metric for distributions with applications to image databases. Sixth International Conference on Computer Vision (IEEE Cat. No.98CH36271), 59–66. https://doi.org/10.1109/ICCV.1998.710701

Stoltz, D. S., & Taylor, M. A. (2019). Concept Mover’s Distance: Measuring concept engagement via word embeddings in texts. Journal of Computational Social Science, 2(2), 293–313. https://doi.org/10.1007/s42001-019-00048-6

Moving Earth, Word, and Concept was originally published in Towards Data Science on Medium.
