Text analysis of migration news articles

Leslie Root and I did some exploratory text analysis of migration-related newspaper articles in the US. We analyzed almost 10,000 articles over the period 2008-2017, looking at how topics and sentiment about migration have changed.

Our initial analysis suggests that sentiment in migration news coverage has changed over time, and has become more negative since 2013. Major topics in migration news include political campaigns, the economy, illegal immigrants, Europe, and more recently, Donald Trump. Topics and language appear to be influenced by election cycles. Additionally, in recent years there has been increased focus on Mexican and Muslim migrants.

All analysis was done in R using the awesome package tidytext and following ‘Text Mining with R: A Tidy Approach’ by Julia Silge and David Robinson. Code can be found here.


We searched ProQuest for newspaper articles that included the terms ‘population’ and ‘immigra*’. This resulted in a total of 9,869 articles in the years 2008-2017. The search returned articles from the New York Times, Los Angeles Times, Wall Street Journal (WSJ) and WSJ Online. There is a drop in the number of articles in 2017 because we couldn’t collect articles for the whole year.

number of articles by year

Important words and combinations

The plots below show the most ‘important words’ for articles in each year. We define important words using the term frequency–inverse document frequency (tf-idf) measure. It counts the number of times a word appears in a document, but down-weights words that appear across many documents, so that words that are common everywhere do not dominate.
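As a rough illustration of the measure (this is not the code used for the analysis, which was written in R with tidytext), here is a minimal Python sketch of one common tf-idf definition: term frequency is a word's count divided by the document length, and inverse document frequency is the natural log of the number of documents divided by the number of documents containing the word. The toy corpus, with one made-up "document" per year, and the function name `tf_idf` are assumptions for illustration only.

```python
import math
from collections import Counter

# Hypothetical toy corpus: one made-up "document" per year (not the real articles).
docs = {
    "2015": "europe migrants border europe crisis immigration",
    "2016": "trump campaign border immigration vote",
    "2017": "trump police muslim immigration border",
}

def tf_idf(docs):
    """Return {doc: {word: score}} with tf = count / doc length, idf = ln(N / df)."""
    counts = {name: Counter(text.split()) for name, text in docs.items()}
    n_docs = len(counts)
    # Document frequency: the number of documents that contain each word.
    df = Counter(word for c in counts.values() for word in c)
    scores = {}
    for name, c in counts.items():
        total = sum(c.values())
        scores[name] = {
            word: (n / total) * math.log(n_docs / df[word]) for word, n in c.items()
        }
    return scores

scores = tf_idf(docs)
for year, word_scores in scores.items():
    # Show the two highest-scoring words for each year.
    top = sorted(word_scores.items(), key=lambda kv: kv[1], reverse=True)[:2]
    print(year, top)
```

Words that occur in every document, such as "immigration" and "border" here, get a score of zero, which is why tf-idf surfaces the words that set one year apart from the others.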

Throughout all years there is a strong political focus, with Obama, McCain, Clinton, Sanders and Trump all appearing as important words, as well as words like vote/r, campaign and republican. Other common words include hispanic/latino, law, workers and border. The words in 2015 show themes relating to the European migration crisis. In the most recent year, the words ‘police’ and ‘muslim’ have become important.

important words

We also looked at words preceding the words ‘immigrant’ and ‘migrant’ to see what the most important combinations were. In general the most common preceding word was ‘illegal’. However, looking at changes over time suggests a shift to using alternative words such as ‘undocumented’ and ‘unauthorized’. The number of instances of ‘undocumented immigrant/migrant’ overtook ‘illegal immigrant/migrant’ in 2017. Various ethnicities were also important preceding words, including Mexican, Latino, Hispanic, Asian and Muslim.
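To make the preceding-word idea concrete, here is a small Python sketch (again, not the original R code) that counts the word immediately before each occurrence of immigrant(s) or migrant(s). The sample text, the `TARGETS` set, and the function name `preceding_words` are hypothetical, and the tokenizer is deliberately simple: it lowercases the text and ignores sentence boundaries.

```python
import re
from collections import Counter

# Target words whose preceding word we want to count (an assumed set).
TARGETS = {"immigrant", "immigrants", "migrant", "migrants"}

def preceding_words(text):
    """Count the word that comes immediately before each target word."""
    words = re.findall(r"[a-z']+", text.lower())
    # Pair every word with its successor, i.e. walk over the bigrams.
    return Counter(prev for prev, word in zip(words, words[1:]) if word in TARGETS)

# Made-up sample text, not taken from the article dataset.
sample = (
    "Illegal immigrants and undocumented immigrants were discussed. "
    "One undocumented migrant spoke; a Mexican immigrant did too."
)

counts = preceding_words(sample)
print(counts.most_common())
```

Running the same count separately for each year's articles would give the kind of over-time comparison described above.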

words preceding immigrant

Focusing on the ethnicities and religions most commonly mentioned before the words immigrant or migrant, there appears to have been a shift in the discussion over time. Muslim immigrants are the group most commonly discussed in the news, and mentions have broadly been increasing since 2013. In addition, there appears to have been a shift towards referring to Mexican immigrants and away from using the terms Latino and Hispanic.

ethnicities mentioned


We also looked at how sentiment in migration news articles is changing over time. The plot below shows sentiment for all articles in each year, using the AFINN measure. This lexicon assigns each word a score between -5 and 5, with negative scores indicating negative sentiment and positive scores indicating positive sentiment.
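The sketch below shows the general shape of lexicon-based scoring in Python. It is not the original R code, and it makes two assumptions: the mini-lexicon uses made-up scores in the AFINN style rather than the real AFINN values, and yearly sentiment is summarised as the average score per matched word, which may differ from the aggregation used for the plot.

```python
import re
from collections import defaultdict

# Hypothetical mini-lexicon in the AFINN style (integer scores from -5 to 5).
# These values are made up for illustration; they are not copied from AFINN.
LEXICON = {"attack": -3, "illegal": -3, "crisis": -2, "growth": 2, "welcome": 2}

# Hypothetical (year, text) pairs standing in for articles.
articles = [
    (2013, "economic growth and a welcome reform"),
    (2015, "crisis after the attack"),
    (2015, "growth despite the crisis"),
]

def sentiment_by_year(articles, lexicon):
    """Average lexicon score per matched word, grouped by year."""
    matched = defaultdict(list)
    for year, text in articles:
        for word in re.findall(r"[a-z']+", text.lower()):
            if word in lexicon:  # words outside the lexicon are ignored
                matched[year].append(lexicon[word])
    return {year: sum(vals) / len(vals) for year, vals in matched.items()}

by_year = sentiment_by_year(articles, LEXICON)
print(by_year)
```

Because only words in the lexicon contribute, the result depends heavily on which words the lexicon covers and how it scores them.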

Sentiment is negative for all years, but the level has been changing. After initially dropping, sentiment increased during Obama’s second term, but has since decreased. News articles from 2017 display the most negative sentiment of the whole period.

sentiment over time

Looking at the top five negative words over time, as we saw in the word analysis above, we can see a shift away from the use of ‘illegal’ towards the use of ‘undocumented’. The use of ‘attack’ peaked in 2015, corresponding to the Paris attacks.

sentiment over time


We used latent Dirichlet allocation (LDA) to create an eight-topic model for all migration articles. The plots below show the terms that are most common within each topic. Broadly, the topics can be thought of as covering:

  • Hispanics/Latinos in LA
  • Europe
  • Art and museums
  • Workers, growth and the economy
  • Communities
  • Trump-related politics
  • Illegal, border, law enforcement
  • Elections

eight topics

The proportion of documents in five of the above topics by year shows that, unsurprisingly, the Trump-related topic has spiked in the last two years. The election topic peaks in line with election cycles. Talk of illegality and enforcement has spiked in the last year, while talk of the economy peaked in 2013, which is also when sentiment was the least negative.
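For readers who want to reproduce this kind of summary, here is a minimal Python sketch of the bookkeeping (not the original R code). It assumes each article has already been assigned to a single dominant topic by a fitted topic model. That assignment rule, the topic labels, and the data below are all hypothetical.

```python
from collections import Counter, defaultdict

# Hypothetical (year, dominant_topic) pairs, one per article.
assignments = [
    (2013, "economy"), (2013, "economy"), (2013, "elections"), (2013, "europe"),
    (2016, "trump"), (2016, "elections"), (2016, "trump"), (2016, "economy"),
]

def topic_shares(assignments):
    """Return {year: {topic: share of that year's articles}}."""
    by_year = defaultdict(Counter)
    for year, topic in assignments:
        by_year[year][topic] += 1
    return {
        year: {topic: n / sum(c.values()) for topic, n in c.items()}
        for year, c in by_year.items()
    }

shares = topic_shares(assignments)
for year in sorted(shares):
    print(year, shares[year])
```

Dividing by the yearly total matters here because the number of articles differs from year to year, so raw counts would not be comparable.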

topics over time


Coverage of migration issues in major news outlets in the US is generally negative, and it seems to have become even more negative over the past few years. Coverage is driven by election cycles, and while many of the articles focus on illegal immigrants, there appears to have been a shift in terminology in recent years from ‘illegal’ to ‘undocumented’ and ‘unauthorized’.

In future work we want to build on this initial analysis by gathering a more complete dataset of US news, and to look at sentiment in public opinion surveys to see how it relates to changes in media coverage.