Introduction to lingtypology

1. What is `lingtypology`?

simple tool for creating interactive and static maps in R
you can use the result on your website and in your projects
API to some linguistic databases (Glottolog, WALS, PHOIBLE, AUTOTYP and others)

2. Package installation

Get the stable version from CRAN:

install.packages("lingtypology")

… or get the development version from GitHub:

# install.packages("devtools") # run this first if devtools is not installed
devtools::install_github("ropensci/lingtypology")

If you run into problems with dependencies, try installing one of the older versions:

devtools::install_version("lingtypology", version = "1.0.12")

According to Misha Voronov, a correct installation on Debian requires some additional system packages:

apt-get install libcurl4-openssl-dev
apt-get install libssl-dev

Load the library:

library(lingtypology)

2. Glottolog functions

This package is based on the Glottolog database (v. 2.7), so lingtypology has several functions for accessing data from that database.

2.1 Command name syntax

Most functions in lingtypology follow the same naming pattern: what you need.what you have. Most of them take a language name as input.

aff.lang() — get affiliation by language
area.lang() — get macro area by language
country.lang() — get country by language
iso.lang() — get ISO 639-3 code by language
gltc.lang() — get glottocode (identifier for a language in the Glottolog database) by language
lat.lang() — get latitude by language
long.lang() — get longitude by language

Some of them help to define a vector of languages.

lang.aff() — get language by affiliation
lang.iso() — get language by ISO 639-3 code
lang.gltc() — get language by glottocode

Additionally there are some functions to convert glottocodes to ISO 639-3 codes and vice versa:

gltc.iso() — get glottocode by ISO 639-3 code
iso.gltc() — get ISO 639-3 code by glottocode

The Glottolog database (v. 2.7) provides lingtypology with language names, ISO codes, genealogical affiliation, macro area, countries, coordinates, and much other information. This set of functions does not aim to cover every possible combination of inputs and outputs. Check out the additional information that is preserved in the version of the Glottolog database used in lingtypology:

names(glottolog.original)

 [1] "language"           "iso"                "glottocode"        
 [4] "longitude"          "latitude"           "affiliation"       
 [7] "area"               "alternate names"    "affiliation-HH"    
[10] "country"            "dialects"           "language status"   
[13] "language use"       "location"           "population numeric"
[16] "typology"           "writing"

Using R functions for data manipulation you can create your own database for your purpose.
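
As a minimal sketch of that idea, the code below filters a small data frame and keeps two columns. The data frame toy_glottolog is a hypothetical stand-in with a few hand-written rows; it only reuses column names from the names(glottolog.original) output above. In practice you would presumably apply the same subsetting to glottolog.original itself.

```r
# Toy stand-in for glottolog.original: hypothetical rows, real column names.
toy_glottolog <- data.frame(
  language = c("Adyghe", "Kabardian", "Russian", "Aduge"),
  iso      = c("ady", "kbd", "rus", NA),
  area     = c("Eurasia", "Eurasia", "Eurasia", "Africa"),
  stringsAsFactors = FALSE
)

# Keep only Eurasian languages with a known ISO code,
# and only the columns needed for the custom database.
my_db <- toy_glottolog[
  toy_glottolog$area == "Eurasia" & !is.na(toy_glottolog$iso),
  c("language", "iso")
]
my_db
```

The same pattern (a logical row filter plus a vector of column names) works for any of the columns listed above.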

2.2 Using base functions

All functions introduced in the previous section are regular functions, so they can take the following objects as input:

a regular string

iso.lang("Adyghe")

Adyghe 
 "ady"

lang.iso("ady")

     ady 
"Adyghe"

country.lang("Adyghe")

                                                                                                                 Adyghe 
"Turkey, United States, Israel, Australia, Egypt, Macedonia, France, Russia, Netherlands, Germany, Syria, Jordan, Iraq"

lang.aff("West Caucasian")

[1] "Adyghe"    "Abkhaz"    "Abaza"     "Ubykh"     "Kabardian"

a vector of strings

area.lang(c("Adyghe", "Aduge"))

   Adyghe     Aduge 
"Eurasia"  "Africa"

lang <- c("Adyghe", "Russian")
aff.lang(lang)

                                       Adyghe 
"North Caucasian, West Caucasian, Circassian" 
                                      Russian 
                "Indo-European, Slavic, East"

the output of other functions. For example, let’s get a vector of ISO codes for the Circassian languages:

iso.lang(lang.aff("Circassian"))

   Adyghe Kabardian 
    "ady"     "kbd"

The behavior of most functions is rather predictable, but the function country.lang has an additional feature. By default this function takes a vector of languages and returns a vector of countries. But if you set the argument intersection = TRUE, then the function returns a vector of countries where all languages from the query are spoken.

country.lang(c("Udi", "Laz"))

                                                       Udi 
               "Russia, Georgia, Azerbaijan, Turkmenistan" 
                                                       Laz 
"Turkey, Georgia, France, United States, Germany, Belgium"

country.lang(c("Udi", "Laz"), intersection = TRUE)

[1] "Georgia"
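
To make the idea behind intersection = TRUE concrete, here is a base R sketch that computes the same kind of result by hand. It is not the package's implementation, only an illustration; the country strings are copied from the country.lang() output above, and the variable names are made up for this example.

```r
# Country strings copied from the country.lang() output above.
countries <- c(
  Udi = "Russia, Georgia, Azerbaijan, Turkmenistan",
  Laz = "Turkey, Georgia, France, United States, Germany, Belgium"
)

# Split each comma-separated string into a vector of country names.
country_sets <- strsplit(countries, ", ", fixed = TRUE)

# Keep only the countries that occur in every vector.
shared <- Reduce(intersect, country_sets)
shared
```

For these two strings the only country that survives the intersection is Georgia, which matches the output of country.lang() with intersection = TRUE.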

2.3 Spell Checker: look carefully at warnings!

There are some functions that take country names as input. Unfortunately, some countries have alternative names. In order to save users the trouble of having to figure out the exact name stored in the database (for example Ivory Coast or Cote d’Ivoire), all official country names and standard abbreviations are stored in the database:

lang.country("Cape Verde")

[1] "Kabuverdianu" "Portuguese"

lang.country("Cabo Verde")

[1] "Kabuverdianu" "Portuguese"

head(lang.country("USA"))

[1] "Holikachuk"       "Hopi"             "Palewyami Yokuts"
[4] "Finnish"          "Mbum"             "Lower Sorbian"

All functions that take a vector of languages include a kind of spell checker. If a language from a query is absent from the database, the function returns a warning message containing a set of candidates with the minimal Levenshtein distance to the language from the query.

aff.lang("Adyge")

Adyge 
   NA
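
The candidate search can be sketched in base R with utils::adist(), which computes the (generalized) Levenshtein distance between strings. This is an illustration of the technique, not the package's own code: the helper suggest_language() and the short vector known_languages are hypothetical names introduced for this example.

```r
# Hypothetical helper: return the known names closest to the query
# in terms of Levenshtein distance.
suggest_language <- function(query, known) {
  # adist() returns a 1 x length(known) matrix; take its only row.
  d <- utils::adist(query, known)[1, ]
  known[d == min(d)]
}

# A small, hand-picked list standing in for the full set of language names.
known_languages <- c("Adyghe", "Aduge", "Abkhaz", "Russian")

suggest_language("Adyge", known_languages)
```

Here both "Adyghe" (one insertion) and "Aduge" (one substitution) are at distance 1 from "Adyge", so both are returned as candidates.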

2.4 Changes in the Glottolog database

Unfortunately, the Glottolog database (v. 2.7) is not perfect for all my tasks, so I changed it a little. After an issue raised by Robert Forkel, I added the argument glottolog.source, so that everybody has access to both the “original” and the “modified” (default) Glottolog versions:

is.glottolog(c("Abkhaz", "Abkhazian"), glottolog.source = "original")

[1] FALSE  TRUE

is.glottolog(c("Abkhaz", "Abkhazian"), glottolog.source = "modified")

[1]  TRUE FALSE

Task 2.5: Celtic languages

How many Celtic languages are in the database?

Task 2.6: Austronesian languages

How many Austronesian languages are in the database?

Task 2.7: Russian and Standard Arabic

In which country, according to the database, are both Russian and Standard Arabic spoken?

3. Map creation

3.1 Base map

The most important part of the lingtypology package is the function map.feature:

map.feature(c("Adyghe", "Kabardian", "Polish", "Russian", "Bulgarian"))

3.2 Set features

The goal of this package is to allow linguists to map language features. A list of languages and the corresponding features can be stored in a data.frame as follows:

df <- data.frame(language = c("Adyghe", "Kabardian", "Polish", "Russian", "Bulgarian"),
                 features = c("polysynthetic", "polysynthetic", "fusional", "fusional", "fusional"))
df

Now we can draw a map:

map.feature(languages = df$language,
            features = df$features)

There are several types of variables in R, and map.feature works differently depending on the variable type. I will use the built-in dataset ejective_and_n_consonants, which contains 19 languages from the UPSyD database. This dataset has three variables: the categorical variable ejectives indicates whether a language has any ejective sounds, and the numeric variables consonants and vowels contain the number of consonants and vowels (based on the UPSyD database). We can create two maps, one with a categorical variable and one with a numeric variable:

map.feature(languages = ejective_and_n_consonants$language,
            features = ejective_and_n_consonants$ejectives) # categorical

map.feature(languages = ejective_and_n_consonants$language,
            features = ejective_and_n_consonants$consonants) # numeric
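
Because the variable type matters, it can help to check it before mapping. The base R sketch below uses a hypothetical data frame toy with made-up values (it is not the built-in dataset) to show how to inspect a column's type and how to turn a numeric column into a categorical one when the numbers are really labels.

```r
# Hypothetical data: the column names and values are invented for illustration.
toy <- data.frame(
  language    = c("Adyghe", "Kabardian", "Polish"),
  has_feature = c("yes", "yes", "no"),  # categorical
  n_items     = c(3, 5, 2),             # numeric
  stringsAsFactors = FALSE
)

# Inspect the types of the columns you plan to pass as features.
is.numeric(toy$has_feature)
is.numeric(toy$n_items)

# If numeric codes should be treated as categories, convert them explicitly.
toy$n_items_cat <- as.character(toy$n_items)
class(toy$n_items_cat)
```

Passing the converted column instead of the numeric one should then be handled as a categorical feature, under the assumption that map.feature decides on the basis of the R type of its features argument.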

There are two possible ways to show the world map: with the Atlantic Ocean or with the Pacific Ocean in the middle. If you don’t need the default Pacific view, use the map.orientation parameter (thanks to @languageSpaceLabs and @tzakharko for the idea):

map.feature(languages = ejective_and_n_consonants$language,
            features = ejective_and_n_consonants$consonants,
            map.orientation = "Atlantic")

3.3 Set labels

An alternative way to add some short text to a map is to use the label option.

map.feature(languages = df$language,
            features = df$features,
            label = df$language)

There are some additional arguments for customization: label.fsize sets the font size, label.position controls the label position, and label.hide controls the appearance of the label. If label.hide is TRUE, the labels are displayed on mouseover (as on the previous map); if FALSE, the labels are always displayed (as on the next map).

map.feature(languages = df$language, 
            features = df$features,
            label = df$language,
            label.fsize = 20,
            label.position = "left",
            label.hide = FALSE)

Task 3.4

Create a map with Chukchi, French, Khana and Nii and add labels that don’t disappear.

Task 3.5

Create a map of Bodish languages and add a minimap with the argument minimap = TRUE.

Introduction to `lingtypology`

G. Moroz
Presentation link: tinyurl.com/y9lbrzf6
