Geographic flow of music: the blog post

Last week my supervisor Pádraig Cunningham and I posted a paper called "The Geographic Flow of Music" on arXiv. We examined a few topics, but the main point of enquiry was to see whether some cities are consistently ahead of others in their musical preferences. To answer this question, we adapted a methodology used to find leadership in bird flocks, which was developed in this paper by Nagy et al.

I'm surprised to see that there have already been a number of blog posts written on the topic, and that newspaper blogs are starting to pick up on the work as well. By and large the coverage does a good job of summarizing the work, but it leaves out some interesting aspects and in some cases overstates things. I'll try to provide an accurate summary here. Let me just begin by mentioning three caveats that are elaborated on below: 1) this is all based on last.fm users, who are likely not representative of music listeners as a whole; 2) we have not yet shown our model to be predictive; 3) we don't know the mechanism that causes some cities to be ahead of others (it may or may not be influence).

Artist space
We extracted our data from last.fm's Geo API, which provides weekly charts for about 200 cities over the last three years. These charts indicate not only the ranking of the top 500 artists, but also how many unique users listened to each of the top 500 artists every week. We can think of how a city spreads its preferences over artists as a position in "artist space"---if two cities listen to the same artists at the same rates, then they are close in this space.
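To make "artist space" a bit more concrete, here is a minimal Python sketch. The cities, artists, and listener counts are made up, the helper names `position` and `distance` are hypothetical, and using each artist's share of listeners as the coordinate is my simplifying assumption for illustration rather than the exact preprocessing from the paper.

```python
import math

# Hypothetical weekly charts: unique listeners per artist in each city.
charts = {
    "Dublin":   {"Artist A": 120, "Artist B": 60, "Artist C": 20},
    "Cork":     {"Artist A": 60,  "Artist B": 30, "Artist C": 10},
    "Montreal": {"Artist A": 10,  "Artist B": 40, "Artist D": 150},
}

def position(chart, artists):
    """Return the city's share of listeners for each artist (assumed coordinates)."""
    total = sum(chart.values())
    return [chart.get(artist, 0) / total for artist in artists]

def distance(p, q):
    """Euclidean distance between two positions in artist space."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(p, q)))

# One axis per artist that appears in any chart.
artists = sorted({artist for chart in charts.values() for artist in chart})
positions = {city: position(chart, artists) for city, chart in charts.items()}

for a, b in [("Dublin", "Cork"), ("Dublin", "Montreal")]:
    print(a, b, round(distance(positions[a], positions[b]), 3))
```

In this toy data Dublin and Cork listen to the same artists at the same rates, so they land on the same point even though Dublin has twice as many listeners, while Montreal sits far away from both.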

For example, if we take every pair of cities and calculate the distances between them in this space, we notice that some of them bunch together in clusters. Here's a dendrogram that reveals these clusters.

(Note that this dendrogram is different from the one in the paper. Depending on the subset of cities included, some things change---I wanted to show a different one here to give a feel for which elements of the structure are stable and which are unstable. Technical details: the dendrogram is based on average linkage clustering of Euclidean distances, computed after normalizing each city's position in artist space by its Euclidean norm.)
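For readers who want to see what that technical note amounts to, here is a naive sketch of average linkage clustering in plain Python. The listener counts are invented, the function names are hypothetical, and this is only an illustration of the general technique, not the code that produced the dendrogram. It records the order in which clusters merge, which is the information a dendrogram draws.

```python
import math
from itertools import combinations

def unit(v):
    """Scale a vector so that its Euclidean norm is 1."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]

def euclidean(p, q):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(p, q)))

def average_linkage(points):
    """Naive agglomerative clustering with average linkage.

    `points` maps a label to a vector. Returns the merges in order, each as
    (cluster_a, cluster_b, distance), where clusters are tuples of labels.
    """
    vectors = {name: unit(v) for name, v in points.items()}

    def linkage(pair):
        # Average distance over all cross-cluster pairs of cities.
        a, b = pair
        total = sum(euclidean(vectors[i], vectors[j]) for i in a for j in b)
        return total / (len(a) * len(b))

    clusters = [(name,) for name in vectors]
    merges = []
    while len(clusters) > 1:
        a, b = min(combinations(clusters, 2), key=linkage)
        merges.append((a, b, linkage((a, b))))
        clusters = [c for c in clusters if c not in (a, b)] + [a + b]
    return merges

# Hypothetical listener counts over three artists.
cities = {
    "NYC": [90, 10, 5],
    "Chicago": [80, 15, 5],
    "Toronto": [10, 90, 20],
    "Montreal": [15, 80, 25],
}
merges = average_linkage(cities)
for a, b, d in merges:
    print(a, "+", b, "at distance", round(d, 3))
```

With these toy numbers the two similar pairs merge first and the two resulting clusters join last, at a much larger distance. For real data a library routine would be a better choice than this cubic-time loop.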

You can interpret the above diagram as follows: cities whose branches join up closer to the left are more similar. If you take a close look, there are a number of interesting points: while many of the pairs of cities that are closest together are geographically close to each other, there are some clusters of cities that have similar musical taste despite large physical distances. New York City, Chicago, San Francisco, Austin, Seattle and Portland form a tight cluster that spans the entire nation. I'd say these are the "cool kids"---trendy cities with similar tastes (similarly, Vancouver, Toronto, and Montreal separate themselves from the rest of Canada).

Leaders and followers

Leadership network for the twenty most active cities in Canada and the USA. Arrows point from followers to leaders. Height corresponds to PageRank, size to weighted in-degree. Edge width is determined by "lagged correlation." Gray edges have a lag time of 1, 2, or 3 weeks; blue edges represent lags of 4 or 5 weeks.

With the idea of "artist space" established, you can now imagine how cities move in this space as time passes and different artists become popular. Pádraig and I wanted to know if some cities follow the movements of other cities. To check this, we first set some "lag time," let's say one month. We then see if the movements of one city resemble the movements of some other city one month earlier. We check for a high "lagged correlation" for all pairs of cities, and based on the results, decide whether to draw an edge indicating that one follows the other. In fact, we try several sizes for the lag, and choose the one that yields the highest similarity. The details are a bit technical, but not too bad; see the paper. This method is based on Nagy et al.'s method for finding leadership in pigeon flocks.
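Here is a rough Python sketch of the lag-scanning idea. It is written in the spirit of a directional correlation: it compares week-to-week movement vectors using cosine similarity, which is my assumption for illustration, and the paper's exact definition may differ. The trajectories are invented and all function names are hypothetical.

```python
import math

def displacements(trajectory):
    """Week-to-week movement vectors for a list of positions."""
    return [[b - a for a, b in zip(p, q)]
            for p, q in zip(trajectory, trajectory[1:])]

def cosine(u, v):
    """Cosine similarity; defined as 0 when either vector has no movement."""
    nu = math.sqrt(sum(x * x for x in u))
    nv = math.sqrt(sum(x * x for x in v))
    if nu == 0 or nv == 0:
        return 0.0
    return sum(x * y for x, y in zip(u, v)) / (nu * nv)

def lagged_correlation(leader, follower, lag):
    """Mean similarity between follower moves and leader moves `lag` weeks earlier."""
    lead_moves = displacements(leader)
    follow_moves = displacements(follower)
    # Pair the leader's move in week t with the follower's move in week t + lag.
    pairs = list(zip(lead_moves, follow_moves[lag:]))
    if not pairs:
        return 0.0
    return sum(cosine(u, v) for u, v in pairs) / len(pairs)

def best_lag(leader, follower, lags=range(1, 6)):
    """Try lags of 1 to 5 weeks and keep the one with the highest correlation."""
    return max(lags, key=lambda lag: lagged_correlation(leader, follower, lag))

# Hypothetical 2-D trajectories: the follower repeats the leader two weeks late.
leader = [[0, 0], [1, 0], [1, 1], [0, 1], [0, 2],
          [1, 2], [2, 2], [2, 1], [2, 0], [3, 0]]
follower = [leader[0], leader[0]] + leader[:-2]

lag = best_lag(leader, follower)
print(lag, lagged_correlation(leader, follower, lag))
```

In this toy case the scan recovers the two-week delay. To build a network you would run this for every ordered pair of cities and draw an edge from follower to leader only when the best correlation is high enough; where to put that cutoff is a separate decision that this sketch does not make.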

The results are interesting. First of all, the leaders vary depending on the genre. Secondly, the big cities are not in the lead. I was expecting the above-mentioned cluster of NYC and San Francisco to be leaders, but that's just not what the data shows. Also, if you compare this diagram of leadership to the dendrogram above, you'll notice that the heavy flows do not take place between cities in the same cluster. I'm not sure what to make of that. It could be that these cities are already synchronized in their musical taste, so the flow of new information about music occurs mostly between clusters rather than within them (along the lines of Granovetter's theory of weak ties).

Make no mistake about it: although we've tried to develop a rigorous methodology, these are preliminary findings. There are a few important caveats.

Caveat #1: It's all based on Last.fm data :-/
Before we even consider possible methodological problems, one important caveat to keep in mind is that our data is from last.fm, so we can't safely generalize our findings to the average music listener. To what market does this analysis apply? Well, we've got to think about who uses last.fm. Honestly, I don't know much about that, but I guess it's mostly music enthusiasts. Also, although most people may not have heard much about last.fm for a while, it's still a popular service. The number of active last.fm users is a matter of conjecture, but it's in the tens of millions. I was surprised to find that last.fm users are spread over many continents---it's not only popular in western Europe and North America, but also in some South American, Eastern European, and Asian cities.

Caveat #2: We haven't yet shown our model of flow to be predictive
While there are some encouraging signs that our model is not just an artifact of the method we employed (see paper for details), the most convincing validation for me would be to see that the model helps make predictions about what will be popular in any given city in the future. I've got an idea of how to perform this validation, and plan to do so in the near future.

Caveat #3: We just indicate which cities are ahead, we don't know why (leaders may not be 'influencers')
We don't know what mechanism causes some cities to appear to follow other cities. It could be that followers genuinely imitate leaders---i.e., the leading cities exert influence upon the cities that follow them. Or it could be that some other force (e.g., the music industry or tour dates) is causing this to happen.


The Sociograph