• Published: 26 Sep 2016

  • Filed under: general, datascience, r-english, política

When Trump visits... tweets from his trip to Mexico

I’m sure many of my fellow Mexicans will remember the historically ill-advised (to say the least) decision by our President to invite Donald Trump for a meeting.

Talking with some colleagues, we couldn’t help but notice that maybe in another era this decision would have been good policy. The problem, some concluded, was the influence of social media today. In fact, the Trump debacle did cause an outcry among leading political voices online.

I wanted to investigate this further, and thankfully for me, I’ve been using R to collect tweets from a catalog of leading political personalities in Mexico for a personal business project.

Here is a short descriptive look at what the 65 Twitter accounts I’m following tweeted between August 27th and September 5th (the Donald announced his visit on August 30th). I’m sorry I can’t share the dataset, but you get the idea with the code…

library(dplyr)
library(stringr)
library(lubridate)  # month(), day(), hour()
library(knitr)      # kable()

# 42 of the 65 accounts tweeted between those dates.
d %>% 
  summarise("n" = n_distinct(NOMBRE))
#   n
#  42

We can see how mentions of Trump spike just about the time the visit was announced…

# Bucket tweets by hour and count Trump mentions in each bucket.
byhour <- d %>% 
  mutate("MONTH" = as.numeric(month(T_CREATED)), 
         "DAY" = as.numeric(day(T_CREATED)), 
         "HOUR" = as.numeric(hour(T_CREATED)), 
         # str_count() counts occurrences, so one tweet can add more than 1
         "TRUMP_MENTION" = str_count(TXT, pattern = "Trump|TRUMP|trump")) %>% 
  group_by(MONTH, DAY, HOUR) %>% 
  summarise("N" = n(), 
            "TRUMP_MENTIONS" = sum(TRUMP_MENTION)) %>%
  mutate("PCT_MENTIONS" = TRUMP_MENTIONS/N*100) %>%
  arrange(desc(MONTH), desc(DAY), HOUR) %>%
  # rebuild an hourly timestamp for the x axis
  mutate("CHART_DATE" = as.POSIXct(paste0("2016-",MONTH,"-",DAY," ", HOUR, ":00")))
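Since the dataset can’t be shared, here is a minimal sketch of the same hourly bucketing on a made-up data frame. The `toy` rows are invented for illustration, and `floor_date()` plus a case-insensitive `regex()` are substitutions for the `paste0()` rebuild and the `"Trump|TRUMP|trump"` pattern used above, so results on real data could differ slightly. It also shows why a percentage built from `str_count()` can exceed 100, while one built from `str_detect()` cannot.

```r
library(dplyr)
library(stringr)
library(lubridate)

# Hypothetical stand-in for `d`; only the column names follow the post.
toy <- tibble(
  T_CREATED = as.POSIXct(c("2016-08-30 21:05:00",
                           "2016-08-30 21:40:00",
                           "2016-08-31 09:15:00"), tz = "UTC"),
  TXT = c("Trump, Trump y otra vez Trump",
          "Otro tema",
          "La visita de trump")
)

trump_rx <- regex("trump", ignore_case = TRUE)

toy_byhour <- toy %>%
  mutate(CHART_DATE  = floor_date(T_CREATED, unit = "hour"),  # hourly bucket
         TRUMP_COUNT = str_count(TXT, trump_rx),    # occurrences per tweet
         HAS_TRUMP   = str_detect(TXT, trump_rx)) %>%  # TRUE/FALSE per tweet
  group_by(CHART_DATE) %>%
  summarise(N            = n(),
            PCT_BY_COUNT = sum(TRUMP_COUNT) / n() * 100,
            PCT_BY_TWEET = mean(HAS_TRUMP) * 100,
            .groups = "drop")

print(toy_byhour)
```

In the first hour, two tweets carry three occurrences of the word, so the count-based figure is 150% while the tweet-based figure is 50%. Which of the two is the right denominator depends on what the chart is meant to say.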

library(ggplot2)  
library(eem)
ggplot(byhour, 
       aes(x = CHART_DATE, 
           y = PCT_MENTIONS)) + 
        geom_line(colour=eem_colors[1]) + 
        theme_eem()+
        labs(x = "Time", 
             y = "Trump mentions \n (% of Tweets)")

Trump tweets by Mexican officials, percent

The peak of mentions (as a percentage of tweets) was September 1st at 6 am (75%). But in terms of the number of tweets, it is much more obvious that the outcry followed the announcement and later visit of the candidate:

ggplot(byhour, 
       aes(x = CHART_DATE, 
           y = TRUMP_MENTIONS)) + 
        geom_line(colour=eem_colors[1]) + 
        theme_eem()+
        labs(x = "Time", 
             y = "Trump mentions \n (# of Tweets)")

Trump tweets by Mexican officials, total

We can also (sort-of) identify the effect of these influencers tweeting. I’m going to add up the followers, who are potential viewers, of each tweet mentioning Trump, by hour.

byaudience <- d %>% 
  mutate("MONTH" = as.numeric(month(T_CREATED)), 
         "DAY" = as.numeric(day(T_CREATED)), 
         "HOUR" = as.numeric(hour(T_CREATED)), 
         "TRUMP_MENTION" = str_count(TXT, pattern = "Trump|TRUMP|trump")) %>% 
  filter(TRUMP_MENTION > 0) %>%
  group_by(MONTH, DAY, HOUR) %>% 
  summarise("TWEETS" = n(), 
            "AUDIENCE" = sum(U_FOLLOWERS)) %>%
  arrange(desc(MONTH), desc(DAY), HOUR) %>%
  mutate("CHART_DATE" = as.POSIXct(paste0("2016-",MONTH,"-",DAY," ", HOUR, ":00")))
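One caveat behind the "sort-of": summing `U_FOLLOWERS` per tweet counts an account’s followers once for every tweet it sends in that hour. The sketch below uses invented rows (the `toy` data frame and the `AUDIENCE_UNIQUE` column are hypothetical, not part of the original analysis) to compare the per-tweet sum with a variant that counts each account once per hour.

```r
library(dplyr)
library(lubridate)

# Hypothetical rows standing in for the Trump-mentioning tweets in `d`.
toy <- tibble(
  NOMBRE      = c("A", "A", "B"),
  U_FOLLOWERS = c(1000, 1000, 250),
  T_CREATED   = as.POSIXct(c("2016-08-31 10:05:00",
                             "2016-08-31 10:50:00",
                             "2016-08-31 10:20:00"), tz = "UTC")
)

toy_audience <- toy %>%
  mutate(CHART_DATE = floor_date(T_CREATED, unit = "hour")) %>%
  group_by(CHART_DATE) %>%
  summarise(
    TWEETS          = n(),
    # per-tweet sum, as in the post: account A is counted twice
    AUDIENCE        = sum(U_FOLLOWERS),
    # assumption: count each account once per hour
    AUDIENCE_UNIQUE = sum(U_FOLLOWERS[!duplicated(NOMBRE)]),
    .groups = "drop"
  )

print(toy_audience)
```

Neither figure accounts for followers shared between accounts, so both remain an upper bound on the potential audience.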


ggplot(byaudience, 
       aes(x = CHART_DATE, 
           y = AUDIENCE)) + 
        geom_line(colour=eem_colors[1]) + 
        theme_eem()+
        labs(x = "Time", 
             y = "Potential audience \n (# of followers)")

Total audience of Trump tweets

So clearly, I’m stating the obvious. People were talking. But how did the conversation develop? Let’s first see the type of tweets (RTs vs drafted individually):

bytype <- d %>% 
  mutate("TRUMP_MENTION" = str_count(TXT, pattern = "Trump|TRUMP|trump")) %>%
  # only the tweets that mention trump
  filter(TRUMP_MENTION>0) %>%
  group_by(T_ISRT) %>% 
  summarise("count" = n())
kable(bytype)
# T_ISRT   count
# FALSE      313
# TRUE       164
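A quick check of the retweet share, as a sketch. The 164 and 313 come from the table above; 1389 retweets out of 3833 tweets are the overall figures the post reports for the whole collection. The `prop.test()` call is an addition for illustration, not something the original analysis ran. It assumes the 477 Trump tweets are part of the 3833, compares them against the remaining tweets so the two groups do not overlap, and treats tweets as independent, which is a strong assumption for accounts that retweet each other.

```r
# Counts from the table above and the overall figures reported in the post.
rt_trump <- 164
n_trump  <- 313 + 164   # 477 tweets mentioning Trump
rt_all   <- 1389
n_all    <- 3833

share_trump <- rt_trump / n_trump
share_all   <- rt_all / n_all
print(round(100 * c(trump = share_trump, overall = share_all), 1))

# Remaining tweets (no Trump mention), so the two groups are disjoint.
rt_other <- rt_all - rt_trump
n_other  <- n_all - n_trump

# Two-sample test of equal proportions (base R, stats package).
result <- prop.test(x = c(rt_trump, rt_other), n = c(n_trump, n_other))
print(result$p.value)
```

The shares come out at about 34.4% and 36.2%; the p-value is only a rough guide given the independence assumption.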

About 1 in 3 was an RT (164 of 477, roughly 34%). Compared with tweets overall (1389 RTs out of 3833, roughly 36%), this is not much of a difference, so it wasn’t necessarily an influencer pushing the discourse. In terms of who was mentioned most in the tweets, it was our President in the spotlight:

bymentionchain <- d %>% 
  mutate("TRUMP_MENTION" = str_count(TXT, pattern = "Trump|TRUMP|trump")) %>%
  # count tweets by number of Trump mentions and mention chain (no Trump filter is applied here)
  group_by(TRUMP_MENTION, MENTION_CHAIN) %>% 
  summarise("count" = n()) %>% 
  ungroup() %>% 
  mutate("GROUPED_CHAIN" = ifelse(grepl(pattern = "EPN", 
                                        x = MENTION_CHAIN), 
                                  "EPN", MENTION_CHAIN)) %>% 
  mutate("GROUPED_CHAIN" = ifelse(grepl(pattern = "realDonaldTrump", 
                                        x = MENTION_CHAIN), 
                                  "realDonaldTrump", GROUPED_CHAIN))
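The two chained `ifelse()` calls set a precedence: a chain that mentions both accounts ends up labelled "realDonaldTrump", because the second `mutate()` overwrites the first. A minimal sketch of the same grouping with `case_when()` makes that order explicit. The example chains are invented (including the `some_account` handle), and the `_|.|_` separator is taken from the axis label in the plot code.

```r
library(dplyr)
library(stringr)

# Hypothetical mention chains; "ND" is the placeholder filtered out in the plot.
chains <- tibble(
  MENTION_CHAIN = c("EPN",
                    "EPN_|.|_realDonaldTrump",
                    "realDonaldTrump",
                    "ND",
                    "some_account")
)

grouped <- chains %>%
  mutate(GROUPED_CHAIN = case_when(
    # checked first, so it wins when both handles appear in a chain
    str_detect(MENTION_CHAIN, fixed("realDonaldTrump")) ~ "realDonaldTrump",
    str_detect(MENTION_CHAIN, fixed("EPN"))             ~ "EPN",
    TRUE                                                ~ MENTION_CHAIN
  ))

print(grouped)
```

Swapping the first two conditions would credit mixed chains to "EPN" instead, which would change the bar heights in the chart.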
                                  
ggplot(order_axis(bymentionchain %>% 
                    filter(count>10 & GROUPED_CHAIN!="ND"), 
                  axis = GROUPED_CHAIN, 
                  column = count), 
       aes(x = GROUPED_CHAIN_o, 
           y = count)) + 
  geom_bar(stat = "identity") + 
  theme_eem() + 
  labs(x = "Mention chain \n (separated by _|.|_ )", y = "Tweets")

Mentions

How about the actual people who tweeted? It seems news anchor Joaquin Lopez-Doriga and security analyst Alejandro Hope were the most vocal about the visit (out of the influencers I’m following).

bytweetstar <- d %>% 
  mutate("TRUMP_MENTION" = ifelse(str_count(TXT, pattern = "Trump|TRUMP|trump")<1,0,1)) %>%
  group_by(TRUMP_MENTION, NOMBRE) %>% 
  summarise("count" = n_distinct(TXT))
## plot with ggplot2
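The plotting step is not shown in the post. As a rough sketch of what it could look like, the snippet below ranks accounts by distinct Trump tweets and draws a horizontal bar chart. The `toy_star` data frame and its account names are made up, and the plain ggplot2 styling is an assumption (the post’s charts use `theme_eem()`).

```r
library(dplyr)
library(ggplot2)

# Hypothetical summary in the same shape as `bytweetstar`.
toy_star <- tibble(
  TRUMP_MENTION = c(1, 1, 1, 0, 0),
  NOMBRE        = c("Account A", "Account B", "Account C",
                    "Account A", "Account B"),
  count         = c(40, 12, 25, 90, 30)
)

# Keep only tweets that mention Trump and take the two most active accounts.
top_tweeters <- toy_star %>%
  filter(TRUMP_MENTION == 1) %>%
  arrange(desc(count)) %>%
  head(2)

# Horizontal bars, ordered by count.
p <- ggplot(top_tweeters,
            aes(x = reorder(NOMBRE, count), y = count)) +
  geom_col() +
  coord_flip() +
  labs(x = "Account", y = "Distinct tweets mentioning Trump")
```

Calling `print(p)` renders the chart; with the real `bytweetstar`, `ungroup()` before ranking would avoid surprises from the leftover grouping.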

Mentions

I also grouped each person by political affiliation, and it confirms the notion that the conversation on the eve of the visit, at least among this very small subset of Twitter accounts, was driven by those with no party affiliation or in the “PAN” (opposition party).

byafiliation <- d %>% 
  mutate("MONTH" = as.numeric(month(T_CREATED)), 
         "DAY" = as.numeric(day(T_CREATED)), 
         "HOUR" = as.numeric(hour(T_CREATED)), 
         "TRUMP_MENTION" = ifelse(str_count(TXT, pattern = "Trump|TRUMP|trump")>0,1,0)) %>% 
  group_by(MONTH, DAY, HOUR, TRUMP_MENTION, AFILIACION) %>% 
  summarise("TWEETS" = n()) %>%
  arrange(desc(MONTH), desc(DAY), HOUR) %>%
  mutate("CHART_DATE" = as.POSIXct(paste0("2016-",MONTH,"-",DAY," ", HOUR, ":00")))
  
 ggplot(byafiliation, 
       aes(x = CHART_DATE, 
           y = TWEETS, 
           group = AFILIACION, 
           fill = AFILIACION)) + 
  geom_bar(stat = "identity") + 
  theme_eem() + 
  scale_fill_eem(20) + 
  facet_grid(TRUMP_MENTION ~.) +
  labs(x = "Time", y = "Tweets \n (By mention of Trump)")

Mentions

However, it’s interesting to note that there is a small spike from the accounts affiliated with the PRI (party in power) on the day after his visit (Sept. 1st). Maybe they were trying to drive the conversation to another place?