Competitive Pokemon

In this project, I try to answer the question: what makes a good competitive Pokemon team? Pokemon is currently in its 7th generation, whose latest release, Ultra Sun and Ultra Moon, came out in November 2017.

Pokemon has a huge competitive community, and many applications let you build teams and battle other players.

A huge number of variables go into battling. Here are some of them:

Scraping

We want to make a good competitive Pokemon team, so we should look at what was successful in previous competitions.

We scrape the tournament standings of Video Game Championships (VGC) events from the official Pokemon website, restricted to the Masters division (we exclude the Junior and Senior divisions). These tournaments use double battles, which change the dynamic compared to single battles (e.g. the move Protect becomes more important, since it lets a Pokemon shield itself from its ally's multi-target moves like Earthquake).

library(rvest)    # read_html(), html_nodes(), html_node(), html_text()
library(stringr)  # str_squish(), str_extract()
library(dplyr)    # %>%, mutate(), left_join(), group_by(), summarize()

website <- "https://www.pokemon.com/us/play-pokemon/"
# standings pages for the 2018 Masters division events
pages <- c("internationals/2018/latin-america/vg-masters/",
"sindelfingen-regionals-2018/vg-masters/",
"portland-regionals-2018/vg-masters/",
"charlotte-regionals-2018/vg-masters/",
"costa-mesa-regionals-2018/vg-masters/",
"collinsville-regionals-2018/vg-masters/",
"malmo-regionals-2018/vg-masters/",
"internationals/2018/oceania/vg-masters/",
"leipzig-regionals-2018/vg-masters/",
"dallas-regionals-2018/vg-masters/",
"memphis-regionals-2018/vg-masters/",
"san-jose-regionals-2018/vg-masters/",
"internationals/2018/europe/vgc-masters/",
"vancouver-regionals-2018/vg-masters/",
"daytona-regionals-2018/vg-masters/",
"hartford-regionals-2018/vg-masters/",
"bremen-regionals-2018/vg-masters/",
"fort-wayne-regionals-2018/vg-masters/"
)
# CSS selectors for the standings markup
team <- "div.team"
trainer <- ".banner h2"
trainer_rank <- ".banner p"
pokemon_selector <- "div.body .row-wrapper .pokemon"
teams <- data.frame()
competing_pokemon <- data.frame()
# text of every node matching `selector`, trimmed with internal whitespace collapsed
scrape_vector <- function(nodes, selector) nodes %>%
  html_nodes(selector) %>%
  html_text() %>%
  str_squish()

In our scraping, we try to collect the information above for competitive Pokemon, except for the items marked with asterisks. Individual values (IVs) and effort values (EVs) factor into a Pokemon's final stats in addition to its Base Stats. While this is an important calculation, in the competitive scene we can assume that IVs are maxed out and the full EV allowance is spent, with the EVs distributed roughly the way each Pokemon's Base Stats are. IVs and EVs are generally hidden in the games, and the VGC tournament standings do not publish them, so we cannot track them.
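To make the role of IVs, EVs, and nature concrete, here is a minimal R sketch of the standard stat formula used since Generation 3, evaluated at level 50 as in VGC. The helper `calc_stat` and its arguments are illustrative and not part of the original analysis; the nature multiplier is passed in tenths so the final rounding down stays exact.

```r
# Hypothetical helper: Gen 3+ stat formula (not part of the original scraper).
# base: Base Stat; iv: 0-31; ev: 0-252; level: VGC battles use 50.
# nature10: nature multiplier in tenths (11 = boosting, 9 = hindering, 10 = neutral).
calc_stat <- function(base, iv = 31, ev = 0, level = 50, is_hp = FALSE, nature10 = 10) {
  core <- ((2 * base + iv + ev %/% 4) * level) %/% 100
  if (is_hp) {
    core + level + 10                 # HP ignores nature
  } else {
    ((core + 5) * nature10) %/% 10    # other stats: add 5, apply nature, round down
  }
}

# Tapu Koko (base HP 70, base Speed 130) at level 50
calc_stat(70, is_hp = TRUE)              # 145
calc_stat(130, ev = 252, nature10 = 11)  # 200: full Speed EVs with a Timid nature
```

Even with identical IVs and EVs, the Base Stat dominates the result, which is why Base Stats are the attribute we track.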

The Pokemon website turned out to be surprisingly easy to scrape: the URLs follow the same pattern, every standings page uses the same markup, and the pages provide enough information to work with.
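Because the URLs share one pattern, each standings URL is just the base address pasted onto a page path, and the tournament year can be read straight out of that path with a regular expression. A base-R sketch of those two steps; the helper names `standings_url` and `page_year` are hypothetical.

```r
# Hypothetical helpers illustrating the shared URL pattern.
standings_url <- function(base, page) paste0(base, page)

# Extract the first four-digit year beginning with "20" from a page path.
page_year <- function(page) {
  as.integer(regmatches(page, regexpr("20[0-9]{2}", page)))
}

website <- "https://www.pokemon.com/us/play-pokemon/"
standings_url(website, "portland-regionals-2018/vg-masters/")
page_year("internationals/2018/oceania/vg-masters/")  # 2018
```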

rows <- 0  # running team count, used to give every team a unique id across pages

for (page in pages) {
  webpage <- read_html(paste0(website, page))
  # year in URL used to determine what year the team was used
  year <- as.integer(str_extract(page, "20[0-9]{2}"))

  # assemble teams with columns: year, trainer, trainer_rank, team_id
  team_nodes <- html_nodes(webpage, team)
  num_teams <- length(team_nodes)
  trainers <- scrape_vector(team_nodes, trainer)
  # html_node() (singular) returns exactly one result per team, keeping ranks aligned with trainers
  trainer_ranks <- team_nodes %>%
    html_node(trainer_rank) %>%
    html_text() %>%
    str_squish()
  page_frame <- data.frame(
    year = year,
    trainer = trainers,
    trainer_rank = trainer_ranks,
    team_id = rows:(rows + num_teams - 1))
  # assemble pokemon on teams with columns: move1-4, name, team_id, held_item, nature, ability
  pokemon_nodes <- team_nodes %>%
    html_nodes(pokemon_selector)
  num_pokemons <- length(pokemon_nodes)
  pokemon_names <- scrape_vector(pokemon_nodes, ".banner")
  pokemon_abilities <- scrape_vector(pokemon_nodes, ".ability li")
  pokemon_held_items <- scrape_vector(pokemon_nodes, ".held-item li")
  pokemon_natures <- scrape_vector(pokemon_nodes, ".nature li")
  # four moves per Pokemon: fill one column per Pokemon, then transpose to one row per Pokemon
  pokemon_frame <- scrape_vector(pokemon_nodes, ".moves ul li") %>%
    matrix(nrow = 4, ncol = num_pokemons) %>%
    t() %>%
    as.data.frame()
  colnames(pokemon_frame) <- c("move1", "move2", "move3", "move4")
  pokemon_frame$name <- pokemon_names
  # assumes every team lists exactly six Pokemon
  pokemon_frame$team_id <- floor((0:(num_pokemons - 1)) / 6) + rows
  pokemon_frame$held_item <- pokemon_held_items
  pokemon_frame$nature <- pokemon_natures
  pokemon_frame$ability <- pokemon_abilities
  rows <- rows + num_teams

  # append the new data on this page to the teams and competing pokemon
  teams <- rbind(teams, page_frame)
  competing_pokemon <- rbind(competing_pokemon, pokemon_frame)
}
head(teams)
##   year                       trainer                     trainer_rank
## 1 2018 Carson Confer (United States)        Masters Division Champion
## 2 2018  Alberto Lara (United States)       Masters Division Runner-Up
## 3 2018             Eric Ríos (Spain)    Masters Division Semifinalist
## 4 2018        Melvin Keh (Singapore)    Masters Division Semifinalist
## 5 2018            Alex Gomez (Spain) Masters Division Quarterfinalist
## 6 2018  Jean Paul Lopez Buiza (Peru) Masters Division Quarterfinalist
##   team_id
## 1       0
## 2       1
## 3       2
## 4       3
## 5       4
## 6       5
head(competing_pokemon)
##          move1        move2          move3        move4
## 1      Protect       Encore   Helping Hand        Scald
## 2     Fake Out     Ice Beam    Energy Ball   Hydro Pump
## 3 Hidden Power  Sludge Bomb      Rock Tomb  Earth Power
## 4      Protect Sucker Punch      Iron Head    Knock Off
## 5      Protect  Volt Switch Dazzling Gleam      Thunder
## 6      Protect Swords Dance       Bug Bite Bullet Punch
##                     name team_id    held_item  nature        ability
## 1               Politoed       0 Eject Button   Sassy        Drizzle
## 2               Ludicolo       0   Waterium Z   Timid     Swift Swim
## 3 Landorus Therian Forme       0 Assault Vest  Modest     Intimidate
## 4                Bisharp       0   Focus Sash Adamant        Defiant
## 5              Tapu Koko       0     Life Orb   Timid Electric Surge
## 6                 Scizor       0    Scizorite Adamant    Light Metal
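The scraping loop leans on two reshaping steps: the flat vector of moves (four per Pokemon, in page order) is filled column-wise into a 4-row matrix and transposed so each row holds one Pokemon's moves, and team ids come from integer-dividing each Pokemon's position by six, which assumes every team lists exactly six Pokemon. A toy illustration using the first two Pokemon's moves from the output above and a made-up offset of 10 teams already seen:

```r
# Moves for two Pokemon, in the order they appear on a standings page.
moves <- c("Protect", "Encore", "Helping Hand", "Scald",
           "Fake Out", "Ice Beam", "Energy Ball", "Hydro Pump")
num_pokemons <- length(moves) / 4

# Column-wise fill gives one column per Pokemon; t() turns that into one row per Pokemon.
move_frame <- as.data.frame(t(matrix(moves, nrow = 4, ncol = num_pokemons)),
                            stringsAsFactors = FALSE)
colnames(move_frame) <- c("move1", "move2", "move3", "move4")

# Positions 0-5 belong to the first team on the page, 6-11 to the second, and so on,
# offset by the number of teams already scraped (hypothetical value).
rows <- 10
team_ids <- (0:11) %/% 6 + rows
```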

Pokemon Stat Data

Now that we have our teams, we need the remaining characteristics mentioned earlier: Base Stats and typing.

There are many Pokedex datasets, but each has its own issues. Some do not provide all the attributes we want. Others spread the attributes across multiple spreadsheets. Most commonly, they do not cover every Pokemon currently available: some exclude the 7th generation entirely, which can also affect our attributes, since Base Stats occasionally change between generations.

So we get our data from PokeAPI, the RESTful API with the fewest problems. While it has Generation 7 data, it does not include the new Pokemon from Ultra Sun and Ultra Moon, so I entered those manually below. Luckily, only two of the new Pokemon were used competitively (Naganadel and Stakataka).
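The download loop further down walks the `stats` field of each PokeAPI response, where every entry holds a `base_stat` and a nested `stat$name`. Assuming that shape (it is the one the loop relies on), the flattening can also be written compactly with `vapply()`. This sketch uses a hand-built stand-in for the parsed JSON, filled with Aegislash's Shield Forme numbers from the table below, so it runs offline; `flatten_stats` and `stat_columns` are hypothetical names.

```r
# Hand-built stand-in for a parsed PokeAPI response (only the fields used here).
fake_json <- list(
  id = 681,
  stats = list(
    list(base_stat = 60,  stat = list(name = "hp")),
    list(base_stat = 50,  stat = list(name = "attack")),
    list(base_stat = 150, stat = list(name = "defense")),
    list(base_stat = 50,  stat = list(name = "special-attack")),
    list(base_stat = 150, stat = list(name = "special-defense")),
    list(base_stat = 60,  stat = list(name = "speed"))
  )
)

# API stat name -> column name used in this analysis.
stat_columns <- c("hp" = "hp", "attack" = "atk", "defense" = "def",
                  "special-attack" = "spatk", "special-defense" = "spdef",
                  "speed" = "speed")

flatten_stats <- function(json) {
  values <- vapply(json$stats, function(s) s$base_stat, numeric(1))
  api_names <- vapply(json$stats, function(s) s$stat$name, character(1))
  names(values) <- stat_columns[api_names]
  values[c("hp", "atk", "def", "spatk", "spdef", "speed")]  # fixed column order
}

flatten_stats(fake_json)
```

Returning a named vector in a fixed order avoids depending on the order in which the API happens to list the stats.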

Another issue was formes: many Pokemon come in multiple 'formes'. For some Pokemon, like Landorus Therian Forme and Alolan Ninetales, the scraped names map to names our API understands. However, some Pokemon change formes in battle because of their abilities; competitively, only Aegislash and Mimikyu are known to do this. They pose a difficult decision, since their Base Stats can change during battle, so I decided to query only the forme they are in when they enter battle.
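The name clean-up in the code that follows can be summarized as one base-R function: lowercase the name, drop the word "forme"/"form", turn whitespace into hyphens, and append the starting forme for Aegislash and Mimikyu. This is a simplified, hypothetical helper (`to_api_name` is not from the original code); it skips the Rotom special case and uses `\\s+` so repeated spaces collapse into a single hyphen.

```r
# Forme each Pokemon is in when it enters battle.
starting_formes <- c(aegislash = "shield", mimikyu = "disguised")

to_api_name <- function(name) {
  x <- tolower(name)
  x <- gsub("form(e?)", "", x)  # "landorus therian forme" -> "landorus therian "
  x <- trimws(x)
  x <- gsub("\\s+", "-", x)     # "landorus therian" -> "landorus-therian"
  forme <- starting_formes[x]   # NA when the Pokemon has no listed starting forme
  unname(ifelse(is.na(forme), x, paste(x, forme, sep = "-")))
}

to_api_name(c("Landorus Therian Forme", "Aegislash", "Tapu Koko"))
```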

# Aegislash and Mimikyu have abilities (Stance Change and Disguise) that change their forme during battle.
# We'll assume the forme they start battle in.
form_pokemon <- data.frame(
  api_name = c("aegislash", "mimikyu"),
  form_name = c("shield", "disguised")
)
competing_pokemon <- competing_pokemon %>%
  mutate(api_name=name %>%
           tolower() %>%
           gsub("form(e?)", "", .) %>%
           gsub("rotom$", "", .) %>%
           str_trim() %>%
           gsub("\\s", "-", .)) %>%
  left_join(form_pokemon) %>%
  mutate(api_name=ifelse(is.na(form_name), api_name, paste(api_name, form_name, sep="-")))
## Joining, by = "api_name"
## Warning: Column `api_name` joining character vector and factor, coercing
## into character vector
unique_pokemon <- competing_pokemon %>%
  select(name, api_name) %>%
  group_by(name=gsub("(F|f)orme", "Form", name), api_name) %>%
  summarize(num_teams=n())
library(rjson)  # fromJSON(file = url) downloads and parses a JSON document

pokemon_datum <- data.frame()
for (pokemon in unique_pokemon$api_name) {
  # two pokemon, naganadel and stakataka, were introduced in a later version than our API covers, so we enter their details manually
  if (pokemon == "naganadel") {
    poke_data <- data.frame(
      id = 804,
      api_name = pokemon,
      type1 = "poison",
      type2 = "dragon",
      hp = 73,
      atk = 73,
      def = 73,
      spatk = 127,
      spdef = 73,
      speed = 121
    )
  } else if (pokemon == "stakataka") {
    poke_data <- data.frame(
      id = 805,
      api_name = pokemon,
      type1 = "rock",
      type2 = "steel",
      hp = 61,
      atk = 131,
      def = 211,
      spatk = 53,
      spdef = 101,
      speed = 13
    )
  } else {
    url <- paste0("https://pokeapi.co/api/v2/pokemon/", pokemon)
    # if a request fails, report it and skip this Pokemon instead of reusing the previous Pokemon's JSON
    pokemon_json <- tryCatch(fromJSON(file = url), error = function(e) {
      message("Failure: ", pokemon)
      NULL
    })
    if (is.null(pokemon_json)) next
    poke_data <- data.frame(
      id = pokemon_json$id,
      api_name = pokemon,
      type1 = pokemon_json$types[[1]]$type$name,
      type2 = if (length(pokemon_json$types) == 1) NA else pokemon_json$types[[2]]$type$name
    )
    # copy each base stat into the column name used in this analysis
    for (stat in pokemon_json$stats) {
      stat_name <- stat$stat$name
      if (stat_name == "special-defense") {
        poke_data$spdef <- stat$base_stat
      } else if (stat_name == "special-attack") {
        poke_data$spatk <- stat$base_stat
      } else if (stat_name == "speed") {
        poke_data$speed <- stat$base_stat
      } else if (stat_name == "defense") {
        poke_data$def <- stat$base_stat
      } else if (stat_name == "attack") {
        poke_data$atk <- stat$base_stat
      } else if (stat_name == "hp") {
        poke_data$hp <- stat$base_stat
      }
    }
  }
  # rbind() matches columns by name, so the differing column orders line up
  pokemon_datum <- rbind(pokemon_datum, poke_data)
}
head(pokemon_datum, 30)
##       id         api_name    type1   type2 speed spdef spatk def atk  hp
## 1    681 aegislash-shield    ghost   steel    60   150    50 150  50  60
## 2    142       aerodactyl   flying    rock   130    75    60  65 105  80
## 3    591        amoonguss   poison   grass    30    80    85  70  85 114
## 4    752        araquanid      bug   water    42   132    50  92  70  68
## 5     59         arcanine     fire    <NA>    95    80   100  80 110  90
## 6    531           audino   normal    <NA>    50    86    60  86  60 103
## 7    184        azumarill    fairy   water    50    80    60  80  50 100
## 8    625          bisharp    steel    dark    70    70    60 100 125  65
## 9      9        blastoise    water    <NA>    78   105    85 100  83  79
## 10   257         blaziken fighting    fire    80    70   110  70 120  80
## 11   286          breloom fighting   grass    70    60    60  80 130  60
## 12   794         buzzwole fighting     bug    79    53    53 139 139 107
## 13   323         camerupt   ground    fire    40    75   105  70 100  70
## 14   565       carracosta     rock   water    32    65    83 133 108  74
## 15   797       celesteela   flying   steel    61   101   107 103 101  97
## 16   609       chandelure     fire   ghost    80    90   145  90  55  60
## 17     6        charizard   flying    fire   100    85   109  78  84  78
## 18    35         clefairy    fairy    <NA>    35    65    60  48  45  70
## 19   488        cresselia  psychic    <NA>    85   130    75 120  70 120
## 20   149        dragonite   flying  dragon    80   100   100  95 134  91
## 21   780           drampa   dragon  normal    36    91   135  85  60  78
## 22   426         drifblim   flying   ghost    80    54    90  44  80 150
## 23   133            eevee   normal    <NA>    55    65    45  50  55  55
## 24   196           espeon  psychic    <NA>   110    95   130  60  65  65
## 25   530        excadrill    steel  ground    88    65    50  60 135 110
## 26 10114  exeggutor-alola   dragon   grass    45    75   125  85 105  95
## 27   598       ferrothorn    steel   grass    20   116    54 131  94  74
## 28   671          florges    fairy    <NA>    75   154   112  68  65  78
## 29   445         garchomp   ground  dragon   102    85    80  95 130 108
## 30   282        gardevoir    fairy psychic    80   115   125  65  65  68
unique_pokemon <- unique_pokemon %>%
  left_join(pokemon_datum)
## Joining, by = "api_name"
## Warning: Column `api_name` joining character vector and factor, coercing
## into character vector
write.csv(unique_pokemon, "pokemon.csv", row.names = FALSE)

Superficial Analysis

Pokemon Frequency

In our first plot, we look at the most used Pokemon. Tapu Koko, introduced in Pokemon Sun and Moon, takes the top spot, with its sister Pokemon Tapu Fini and Tapu Bulu also near the top. Tapu Koko is an Electric/Fairy type whose signature ability, Electric Surge, sets up Electric Terrain as soon as it enters battle, powering up Electric-type moves used by grounded Pokemon and preventing grounded Pokemon from falling asleep. Second place goes to Landorus Therian Forme, a Ground/Flying type. It gets the ability Intimidate, which lowers the Attack stat of the opposing Pokemon when it enters battle, and it learns Earthquake, which hits harder because Landorus is itself a Ground type (the same-type attack bonus). Snorlax also cracks the top of the list because it has only one type weakness (Fighting) and access to the move Belly Drum, which maximizes its Attack stat at the cost of half its HP.
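A usage ranking like the one in that plot can be computed by counting how many team slots each Pokemon occupies. A minimal base-R sketch; the `slots` vector is made-up sample data standing in for `competing_pokemon$name`.

```r
# Made-up sample of team slots (in the analysis: competing_pokemon$name).
slots <- c("Tapu Koko", "Landorus Therian Forme", "Tapu Koko",
           "Snorlax", "Tapu Fini", "Tapu Koko", "Landorus Therian Forme")

usage <- sort(table(slots), decreasing = TRUE)        # team slots per Pokemon
usage_pct <- round(100 * usage / length(slots), 1)    # share of all team slots
head(usage_pct, 3)
```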

Type Frequency

Calculating the percentage of team slots in which each type occurs reveals some telling patterns. The percentages sum to more than 100% because a Pokemon can have two types, so the total can reach at most 200%.
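In this calculation each team slot contributes its first type and, when present, its second type, while the denominator stays the number of slots; that is why the total can exceed 100% but never 200%. A base-R sketch with four made-up slots (Tapu Koko, Tapu Fini, Landorus Therian Forme, Snorlax); `type_percentages` is a hypothetical helper.

```r
type_percentages <- function(type1, type2) {
  all_types <- c(type1, type2)
  all_types <- all_types[!is.na(all_types)]   # single-typed Pokemon have NA as type2
  counts <- table(all_types)
  pct <- 100 * as.vector(counts) / length(type1)  # divide by slots, not by types
  names(pct) <- names(counts)
  sort(pct, decreasing = TRUE)
}

type1 <- c("fairy", "fairy", "ground", "normal")
type2 <- c("electric", "water", "flying", NA)
pct <- type_percentages(type1, type2)
pct
sum(pct)  # 175: above 100% because three of the four slots are dual-typed
```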

Fairy is by far the most common type. This is likely because Dragon-type Pokemon usually have high Attack stats and are generally weak to Fairy-type moves. Dark types are also weak to Fairy, and they are used relatively commonly in competitive play.

To counter Fairy types, teams also tend to include a Steel type, since Fairy is weak to Steel, and Steel types generally have access to moves that set up 'entry hazards'. Ground and Fire types are also very common, since they counter Steel types.
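The counter chain in the last two paragraphs (Fairy beats Dragon and Dark, Steel beats Fairy, Ground and Fire beat Steel) can be checked against a small lookup table. This is a hand-entered slice of the Gen 7 type chart covering only those matchups; `type_chart` and `effectiveness` are hypothetical names, and any pair not listed returns NA rather than a guessed value.

```r
# Attacking type > defending type -> damage multiplier (partial chart only).
type_chart <- c(
  "fairy>dragon" = 2, "fairy>dark" = 2, "fairy>steel" = 0.5,
  "dragon>fairy" = 0, "dark>fairy" = 0.5,
  "steel>fairy" = 2,
  "fire>steel" = 2, "ground>steel" = 2
)

effectiveness <- function(attacking, defending) {
  unname(type_chart[paste(attacking, defending, sep = ">")])
}

effectiveness("steel", "fairy")   # 2
effectiveness("dragon", "fairy")  # 0: Fairy is immune to Dragon
```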

Uncommon types include Bug, likely because Bug types tend to have low Base Stat Totals; Ice, probably because of its many weaknesses to commonly used types; and Fighting, probably because Normal types, with fewer weaknesses, offer better bulk.

Another interesting pattern is that Ghost types appear more often in the higher ranks, while Psychic types appear less often there.
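One way to compare a type's presence across placements is the share of team slots at each placement that carry that type in either type slot. A base-R sketch with made-up slots and simplified placement labels; `team_slots` and `type_share_by_rank` are hypothetical.

```r
# Made-up team slots: one row per Pokemon, tagged with its trainer's placement.
team_slots <- data.frame(
  trainer_rank = c("Champion", "Champion", "Top 8", "Top 8", "Top 8", "Top 8"),
  type1 = c("ghost", "fairy", "psychic", "ghost", "water", "psychic"),
  type2 = c("steel", NA, NA, "fairy", NA, "fairy"),
  stringsAsFactors = FALSE
)

# Fraction of slots at each placement whose first or second type matches `type`.
type_share_by_rank <- function(df, type) {
  has_type <- df$type1 == type | (!is.na(df$type2) & df$type2 == type)
  tapply(has_type, df$trainer_rank, mean)
}

type_share_by_rank(team_slots, "ghost")
type_share_by_rank(team_slots, "psychic")
```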


Hypothesis 1: Base Stat Total is related to rank

Not supported. The p-value for the rank coefficient is about 0.146, above the conventional 0.05 significance threshold, so we fail to reject the null hypothesis that Base Stat Total is unrelated to rank.

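# Encode placements numerically and compute each Pokemon's Base Stat Total.
# Note: this coding is not in finishing order (Quarterfinalist = 2, Runner-Up = 3),
# and every other placement, including Semifinalist, is coded 4.
mutated <- fully_joined %>%
  mutate(rank=ifelse(trainer_rank == "Masters Division Champion", 1,
                     ifelse(trainer_rank=="Masters Division Quarterfinalist", 2,
                            ifelse(trainer_rank == "Masters Division Runner-Up", 3, 4))),
         total_stat=hp+atk+def+spatk+spdef+speed)
# simple linear regression of Base Stat Total on rank
regression <- lm(data=mutated, total_stat~rank)
regression %>% broom::tidy()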
##          term   estimate std.error statistic   p.value
## 1 (Intercept) 531.827257  5.359973 99.222008 0.0000000
## 2        rank   2.896412  1.990644  1.455013 0.1460298
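If placements should be coded in finishing order instead, `match()` against an ordered vector of labels does it in one step, and the slope's p-value can be read from `summary(lm(...))`. A base-R sketch, not the original analysis: treating every unlisted placement as one level below Quarterfinalist is an assumption, and the "Masters Division Top 16" label is hypothetical.

```r
# Placements in finishing order.
placement_order <- c("Masters Division Champion",
                     "Masters Division Runner-Up",
                     "Masters Division Semifinalist",
                     "Masters Division Quarterfinalist")

encode_rank <- function(trainer_rank) {
  rank <- match(trainer_rank, placement_order)
  rank[is.na(rank)] <- length(placement_order) + 1L  # assumption: all other placements share one level
  rank
}

# p-value of the slope in a simple linear regression of y on x.
slope_p_value <- function(y, x) {
  fit <- lm(y ~ x)
  summary(fit)$coefficients["x", "Pr(>|t|)"]
}

encode_rank(c("Masters Division Champion", "Masters Division Quarterfinalist",
              "Masters Division Top 16"))  # 1 4 5
```

With the scraped data, this would be called as `slope_p_value(mutated$total_stat, encode_rank(mutated$trainer_rank))`; as before, a p-value above 0.05 means we fail to reject the null hypothesis.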