Title: | Language Mapping and Geospatial Analysis of Linguistic and Cultural Data |
---|---|
Description: | Streamlined workflows for geolinguistic analysis, including: accessing global linguistic and cultural databases, data import, data entry, data cleaning, data exploration, mapping, visualization and export. |
Authors: | Sietze Norder, Rui Dong |
Maintainer: | Rui Dong <[email protected]> |
License: | GPL (>= 3) |
Version: | 0.0.113 |
Built: | 2024-11-12 04:27:45 UTC |
Source: | https://github.com/glottospace/glottospace |
This function restructures glottolog data, and optionally adds/removes data. If you want more flexibility in choosing which data to add/remove, you can use glottoboosterflex().
glottobooster( glottologdata = NULL, space = TRUE, addfamname = TRUE, addisolates = TRUE, L1only = TRUE, addfamsize = TRUE, addfamsizerank = TRUE, rename = TRUE )
glottologdata |
data from glottolog, can be downloaded with glottoget("glottolog"). |
space |
Return spatial object? |
addfamname |
Add column with family names? |
addisolates |
Add column to identify isolates? |
L1only |
Keep only L1 languages (remove bookkeeping, unclassifiable, sign languages, etc.). |
addfamsize |
Add column with family size? |
addfamsizerank |
Add column with family size rank? |
rename |
Rename columns "id" to "glottocode" and "iso639p3code" to "isocode" |
This function is used to generate 'glottobase' (the reference dataset used throughout the glottospace R package). The default options generate 'glottobase', which can be loaded directly using glottoget("glottobase").
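As a rough illustration of what the family size, family size rank, and isolate columns amount to, the following base R sketch derives them for a toy table. It is not the package's implementation: the toy values, the rank direction (rank 1 for the smallest family), and the definition of an isolate as a family with one member are all assumptions made for this example.

```r
# Toy table standing in for glottolog data (hypothetical values).
langs <- data.frame(
  glottocode = c("aaaa1111", "bbbb2222", "cccc3333", "dddd4444", "eeee5555", "ffff6666"),
  family     = c("FamA", "FamA", "FamA", "FamB", "FamB", "FamC"),
  stringsAsFactors = FALSE
)

# Family size: number of languages sharing the same family name.
fam_counts <- table(langs$family)
langs$family_size <- as.integer(fam_counts[langs$family])

# Family size rank (assumption: rank 1 = smallest family, ties share a rank).
sizes <- sort(unique(langs$family_size))
langs$family_size_rank <- match(langs$family_size, sizes)

# Isolates (assumption: families with a single member).
langs$isolate <- langs$family_size == 1L

print(langs)
```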
glottologdata object, either a spatial object (class: sf) or a data.frame.
Other <glottobooster>:
glottoboosterflex()
glottologdata <- glottoget("glottolog")
glottobase <- glottobooster(glottologdata)
This function first checks whether a dataset is glottodata or glottosubdata, and depending on the result calls glottocheck_data or glottocheck_subdata.
glottocheck(glottodata, diagnostic = TRUE, checkmeta = TRUE)
glottodata |
User-provided glottodata |
diagnostic |
If TRUE (default) a data viewer will be opened to show the levels of each variable (including NAs), and a data coverage plot will be shown. |
checkmeta |
Should metadata be checked as well? |
It subsequently checks whether:
one column exists with the name "glottocode"
there are rows without a glottocode (missing IDs)
there are rows with duplicated glottocodes (duplicate IDs)
all variables have at least two levels
all glottocodes are valid
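The first four checks can be sketched in base R as follows. This is only an illustration on a hypothetical toy table, not the code used by glottocheck; the validity check against glottolog is left out because it needs the reference database.

```r
# Toy glottodata table with deliberate problems (hypothetical values).
toy <- data.frame(
  glottocode = c("yucu1253", "tani1257", NA, "yucu1253"),
  var001     = c("a", "b", "a", "b"),
  var002     = c("x", "x", "x", "x"),
  stringsAsFactors = FALSE
)

# Exactly one column named "glottocode"?
has_id_col <- sum(colnames(toy) == "glottocode") == 1

# Rows without a glottocode (missing IDs).
n_missing <- sum(is.na(toy$glottocode))

# Glottocodes that occur more than once (duplicate IDs).
dup_ids <- unique(toy$glottocode[duplicated(toy$glottocode) & !is.na(toy$glottocode)])

# Variables with fewer than two levels (NA is not counted as a level here).
vars     <- setdiff(colnames(toy), "glottocode")
n_levels <- vapply(toy[vars], function(x) length(unique(x[!is.na(x)])), integer(1))
too_few  <- names(n_levels)[n_levels < 2]
```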
Diagnostic messages highlighting potential issues with glottodata or glottosubdata.
glottodata <- glottoget("demodata")
glottocheck(glottodata, diagnostic = FALSE)
This function cleans glottodata/glottosubdata and returns a simplified glottodata/glottosubdata object containing only the cleaned data table and a structure table.
glottoclean( glottodata, tona = NULL, tofalse = NULL, totrue = NULL, id = NULL, glottosample = FALSE, one_level_drop = TRUE )
glottodata |
glottodata (either a list or a data.frame) |
tona |
Optional additional values to recode to NA (besides default) |
tofalse |
Optional additional values to recode to FALSE (besides default) |
totrue |
Optional additional values to recode to TRUE (besides default) |
id |
By default, glottoclean looks for a column named 'glottocode'. If the id is in a different column, this should be specified. |
glottosample |
Should the sample table be used to subset the data? |
one_level_drop |
A logical value to denote whether or not to drop variables with a single value, the default value is TRUE. |
This function recodes a number of built-in default values. For example, if the column type is 'symm' or 'asymm', values such as "No" and 0 are recoded to FALSE. Values such as "?" are recoded to NA.
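The kind of recoding described here can be sketched in base R. The helper name recode_logical and the default value sets below are hypothetical and chosen for illustration; the actual defaults used by glottoclean may differ.

```r
# Hypothetical helper: recode a raw column to logical values.
recode_logical <- function(x,
                           tofalse = c("No", "no", "N", "0", "FALSE"),
                           totrue  = c("Yes", "yes", "Y", "1", "TRUE"),
                           tona    = c("?", "NA", "")) {
  x <- as.character(x)
  out <- rep(NA, length(x))      # everything starts as (logical) NA
  out[x %in% totrue]  <- TRUE
  out[x %in% tofalse] <- FALSE
  # Values that are in none of the three sets also stay NA, with a warning.
  unknown <- !(x %in% c(totrue, tofalse, tona)) & !is.na(x)
  if (any(unknown)) {
    warning("Unrecognised values recoded to NA: ",
            paste(unique(x[unknown]), collapse = ", "))
  }
  out
}

raw <- c("Yes", "No", "0", "?", "1", NA)
recoded <- recode_logical(raw)
```

Extra values passed through tona, tofalse, or totrue would simply extend the corresponding set in a sketch like this.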
A cleaned-up and simplified version of the original glottodata object
glottodata <- glottoget("demodata", meta = TRUE)
glottodata <- glottoclean(glottodata)

glottosubdata <- glottoget("demosubdata", meta = TRUE)
glottosubdata <- glottoclean(glottosubdata)
Checks whether a set of glottocodes exist in glottolog (checked at the level of L1 languages)
glottocode_exists(glottocode)
glottocode |
A glottocode or character vector of glottocodes |
A logical vector
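Conceptually, the result is one TRUE/FALSE per input code. A minimal base R sketch of that behaviour, using a small stand-in reference vector instead of the real glottolog list (the names known_codes and codes_exist are hypothetical):

```r
# Stand-in for the list of L1 glottocodes in glottolog (assumption: tiny subset).
known_codes <- c("yucu1253", "tani1257", "wari1268")

# Hypothetical helper: membership test, one logical value per input code.
codes_exist <- function(glottocode, reference = known_codes) {
  glottocode %in% reference
}

result <- codes_exist(c("yucu1253", "abcd1234"))
```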
glottocode_exists(c("yucu1253"))
glottocode_exists(c("yucu1253", "abcd1234"))
This function is mainly intended for 'messy' datasets that are not in glottodata/glottosubdata structure.
glottoconvert( data, var = NULL, glottocodes = NULL, table = NULL, glottocolumn = NULL, glottosubcolumn = NULL, ref = NULL, page = NULL, remark = NULL, contributor = NULL, varnamecol = NULL )
data |
A dataset that should be converted into glottodata/glottosubdata. This will generally be an excel file loaded with glottoget(). The dataset will be converted into glottodata if:
Otherwise, glottospace will attempt to convert the dataset into glottosubdata. This works if:
|
var |
Character string that distinguishes those columns which contain variable names. |
glottocodes |
Optional character vector of glottocodes. If no glottocodes are supplied, glottospace will search for them in the sample table. |
table |
In case dataset consists of multiple tables, indicate which table contains the data that should be converted. |
glottocolumn |
column name or column id with glottocodes (optional, provide if glottocodes are not stored in a column called 'glottocode') |
glottosubcolumn |
Column name or column id with glottosubcodes (optional, provide if glottosubcodes are not stored in a column called 'glottosubcode') |
ref |
Character string that distinguishes those columns which contain references. |
page |
Character string that distinguishes those columns which contain page numbers. |
remark |
Character string that distinguishes those columns which contain remarks. |
contributor |
Character string that distinguishes those columns which contain contributors. |
varnamecol |
In case the dataset contains a structure table, but the varnamecol is not called 'varname', its name should be specified. |
A glottodata or glottosubdata object (either a list or data.frame)
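The idea behind the var argument (a character string that marks the variable columns) can be illustrated in base R. This sketch only shows the column selection step on a made-up table; it is not how glottoconvert is implemented, and the column names and values are hypothetical.

```r
# Made-up messy table: two variable columns marked with "var_", one unrelated column.
messy <- data.frame(
  glottocode = c("yucu1253", "tani1257"),
  var_feat1  = c("a", "b"),
  var_feat2  = c("no", "yes"),
  redundant  = 1:2,
  stringsAsFactors = FALSE
)

var_prefix <- "var_"

# Keep the id column plus every column whose name starts with the prefix.
is_var <- startsWith(colnames(messy), var_prefix)
tidy <- messy[, c("glottocode", colnames(messy)[is_var])]

# Strip the prefix so that only the variable names remain.
colnames(tidy) <- sub(paste0("^", var_prefix), "", colnames(tidy))
```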
# Create a messy dataset:
glottodata <- glottoget("demodata")
glottodata <- cbind(glottodata, data.frame("redundant" = c(1:6)))

# In this messy dataset there's no way to determine which columns contain the relevant variables...
# Therefore we manually add a character string to distinguish the relevant columns:
colnames(glottodata)[2:3] <- paste0("var_", colnames(glottodata)[2:3])
glottoconverted <- glottoconvert(glottodata, var = "var_")
Creates glottodata/glottosubdata and optionally saves it as an Excel file.
glottocreate( glottocodes, variables, meta = TRUE, filename = NULL, simplify = TRUE, groups = NULL, n = NULL, levels = NULL, check = FALSE, maintainer = NULL, email = NULL, citation = NULL, url = NULL )
glottocodes |
Character vector of glottocodes |
variables |
Either a vector with variable names, or a single number indicating the total number of variable columns to be generated |
meta |
Should metatables be created? |
filename |
Optional name of excel file where to store glottodata |
simplify |
By default, if a glottodata table is created without metadata, the data will be returned as a data.frame (instead of placing the data inside a list of length 1) |
groups |
Character vector of group names (only for glottosubdata) |
n |
Optional, number of records to be assigned to each group (only for glottosubdata) |
levels |
Optional character vector with levels across all variables |
check |
Should glottocodes be checked? Default is FALSE because the check takes a long time to run. |
maintainer |
Name of the person/organization maintaining the data (optional, added to readme tab) |
email |
Email address of maintainer/contact person (optional, added to readme tab) |
citation |
How to cite the data (optional, added to readme tab) |
url |
Link to a webpage (optional, added to readme tab). |
By default, glottodata will be created. In case a groups argument is provided, glottosubdata will be created.
glottodata has one table for all languages (and a number of metatables if meta = TRUE), with one row per glottocode. glottosubdata has one table for each language (and a number of metatables if meta = TRUE), with one row per glottosubcode.
Run glottoget("demodata") or glottoget("demosubdata") to see examples.
In case you already have your own dataset and want to convert it into glottodata, use: glottoconvert().
A glottodata or glottosubdata object (either with or without metadata). The output can be a list or a data.frame.
# Creates glottodata table without metadata tables
glottocreate(glottocodes = c("yucu1253", "tani1257"), variables = 3, meta = FALSE)

# Creates glottodata table with metadata tables (stored in a list):
glottocreate(glottocodes = c("yucu1253", "tani1257"), variables = 3)

# Creates glottosubdata table (stored in a list)
glottocreate(glottocodes = c("yucu1253", "tani1257"), variables = 3, groups = c("a", "b"))

# Create glottodata table and add some information to the readme table:
glottocreate(glottocodes = c("yucu1253", "tani1257"), variables = 3, maintainer = "Your name", email = "[email protected]")
Add sample table to glottodata or glottosubdata
glottocreate_addsample(glottodata)
glottodata |
glottodata or glottosubdata |
glottodata/glottosubdata with a sample table
glottodata <- glottoget("demodata")
glottocreate_addsample(glottodata)
Add structure table to glottodata or glottosubdata
glottocreate_addstructure(glottodata)
glottodata |
glottodata or glottosubdata |
glottodata/glottosubdata with a structure table
glottodata <- glottoget("demodata")
glottocreate_addstructure(glottodata)
Calculate distances between languages
glottodist(glottodata, metric = "gower")
glottodata |
glottodata or glottosubdata, either with or without structure table. |
metric |
either "gower" or "anderberg" |
object of class dist
The function “glottodist” returns a “dist” object with respect to either Gower distance or Anderberg dissimilarity.
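The Anderberg dissimilarity is defined as follows.

Consider a categorical dataset $L$ containing $N$ objects $X_1, \dots, X_N$ defined over a set of $d$ categorical features, where $A_k$ denotes the $k$-th feature. The feature $A_k$ takes $n_k$ values in the given dataset, which are denoted by $\mathcal{A}_k$. We regard 'NA' as a new value. The value of feature $A_k$ in object $X_i$ is written $x_{ik}$.

We also use the following notations:

$f_k(x)$: The number of times feature $A_k$ takes the value $x$ in the dataset $L$. If $x \notin \mathcal{A}_k$, $f_k(x) = 0$.

$\hat{p}_k(x)$: The sample frequency of feature $A_k$ to take the value $x$ in the dataset $L$. $\hat{p}_k(x) = \frac{f_k(x)}{N}$.

The Anderberg dissimilarity of $X_i$ and $X_j$ is defined in the form of:

$$d(X_i, X_j) = \frac{D}{D + S}$$

where

$$D = \sum_{k:\, x_{ik} \neq x_{jk}} w_k \, \delta_{ij}^{(k)} \, \tau_{ij}^{(k)} \left(\frac{1}{2\,\hat{p}_k(x_{ik})\,\hat{p}_k(x_{jk})}\right) \frac{2}{n_k(n_k + 1)}$$

and

$$S = \sum_{k:\, x_{ik} = x_{jk}} w_k \, \delta_{ij}^{(k)} \left(\frac{1}{\hat{p}_k(x_{ik})}\right)^{2} \frac{2}{n_k(n_k + 1)}$$

The number $w_k$ gives the weight of the $k$-th feature, and the number $\delta_{ij}^{(k)}$ is equal to either $0$ or $1$. It is equal to $0$ when the type of the $k$-th feature is asymmetric binary and both values $x_{ik}$ and $x_{jk}$ are $0$, or when either value of the $k$-th feature is missing; otherwise, it is equal to $1$.

When $x_{ik} \neq x_{jk}$ and the type of $A_k$ is "ordered", $\tau_{ij}^{(k)}$ is equal to the normalized difference of $x_{ik}$ and $x_{jk}$; otherwise $\tau_{ij}^{(k)}$ is equal to $1$.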
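To make the definition more concrete, here is a base R sketch of the plain Anderberg measure in the form given by Boriah et al. (2008), turned into a dissimilarity as mismatch weight divided by total weight. It is a simplified illustration under stated assumptions, not the glottodist implementation: all features are treated as nominal, there are no missing values, every feature has weight 1, and the function name anderberg_dissim is hypothetical.

```r
# Hypothetical helper: unweighted Anderberg dissimilarity for nominal features.
anderberg_dissim <- function(dat) {
  dat <- as.data.frame(lapply(dat, as.character), stringsAsFactors = FALSE)
  N <- nrow(dat)
  d <- ncol(dat)

  # Sample frequencies p_k(x) = f_k(x) / N, stored as named numeric vectors.
  p_hat <- lapply(dat, function(col) {
    tab <- table(col)
    setNames(as.numeric(tab) / N, names(tab))
  })
  # Number of distinct values n_k per feature.
  n_k <- vapply(dat, function(col) length(unique(col)), integer(1))

  out <- matrix(0, N, N)
  for (i in seq_len(N)) {
    for (j in seq_len(N)) {
      if (i == j) next
      D <- 0  # accumulated weight of mismatching features
      S <- 0  # accumulated weight of matching features
      for (k in seq_len(d)) {
        xi <- dat[i, k]
        xj <- dat[j, k]
        scale_k <- 2 / (n_k[[k]] * (n_k[[k]] + 1))
        if (xi == xj) {
          S <- S + (1 / p_hat[[k]][[xi]])^2 * scale_k
        } else {
          D <- D + 1 / (2 * p_hat[[k]][[xi]] * p_hat[[k]][[xj]]) * scale_k
        }
      }
      out[i, j] <- D / (D + S)
    }
  }
  as.dist(out)
}

toy <- data.frame(f1 = c("a", "a", "b"), f2 = c("x", "y", "y"))
res <- anderberg_dissim(toy)
```

In this toy table, objects 1 and 3 differ on every feature and therefore get the maximum dissimilarity, while the other two pairs agree on one feature and differ on the other.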
Anderberg M.R. (1973). Cluster analysis for applications. Academic Press, New York.
Boriah S., Chandola V., Kumar V. (2008). Similarity measures for categorical data: A comparative evaluation. In: Proceedings of the 8th SIAM International Conference on Data Mining, SIAM, p. 243-254.
glottodata <- glottoget("demodata", meta = TRUE)
glottodist <- glottodist(glottodata = glottodata, metric = "anderberg")

glottosubdata <- glottoget("demosubdata", meta = TRUE)
glottodist <- glottodist(glottodata = glottosubdata)
Calculate construction-based distances between languages
glottodist_subdata( glottosubdata, metric = NULL, index_type = NULL, avg_idx = NULL, fixed_idx = NULL )
glottosubdata |
a glottosubdata object |
metric |
either "gower" or "anderberg" |
index_type |
either "mci" or "ri" or "fmi" |
avg_idx |
the feature indices over which the average of distances is computed, it must be given when index_type is either "ri" or "fmi". |
fixed_idx |
the feature indices over which the distance of two constructions is computed, it must be given when index_type is either "ri" or "fmi". |
object of class dist
The function “glottodist_subdata” takes a glottosubdata object as input and returns a “dist” object containing the construction-based distances between languages. We refer to the observations of each language as constructions.

The distance between a construction in a language $A$ and a construction in a language $B$ is determined by the argument “metric”, whose value is either “gower” or “anderberg”.

When “index_type” is “mci”, it returns the “matching constructions index”.

When “index_type” is “ri”, it returns the “relative index”. Here one set of indices is the subset of variables given by the argument “avg_idx”, and the other is the subset of variables given by the argument “fixed_idx”; the restricted constructions are defined as the constructions restricted to “fixed_idx”.

When “index_type” is “fmi”, it returns the “form-meaning index”, which is likewise computed from the subsets given by “avg_idx” and “fixed_idx”.
glottosubdata_cnstn <- glottoget(glottodata = "demosubdata_cnstn")
glottodist_subdata(glottosubdata = glottosubdata_cnstn, metric = "gower", index_type = "mci")
glottodist_subdata(glottosubdata = glottosubdata_cnstn, metric = "gower", index_type = "ri", avg_idx = 1:4, fixed_idx = 5:7)
glottodist_subdata(glottosubdata = glottosubdata_cnstn, index_type = "fmi", avg_idx = 1:4, fixed_idx = 5:7)
By default, languages are filtered from the glottolog data. If the user provides glottodata, that dataset is filtered instead.
glottofilter( glottodata = NULL, glottocode = NULL, location = NULL, name = NULL, family = NULL, family_id = NULL, continent = NULL, country = NULL, sovereignty = NULL, macroarea = NULL, expression = NULL, isocodes = NULL, colname = NULL, select = NULL, drop = NULL )
glottodata |
A glottodata table |
glottocode |
A character vector of glottocodes |
location |
A character vector with a location (either a continent, country, macroarea, or sovereignty) |
name |
A character vector of language names |
family |
A character vector of language families |
family_id |
A character vector of language family IDs |
continent |
A character vector of continents |
country |
A character vector of countries |
sovereignty |
Sovereignty |
macroarea |
Glottolog macroarea |
expression |
A logical expression |
isocodes |
A character vector of iso639p3codes |
colname |
A column name |
select |
Character vector of things to select (only if colname is provided) |
drop |
Character vector of things to drop (only if colname is provided) |
A subset of the original glottodata table (data.frame or sf) containing only filtered languages.
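Filtering by a logical expression can be pictured as evaluating that expression inside the data table and keeping the rows where it is TRUE. The base R sketch below shows one way to do this with a hypothetical helper, filter_expr, on a made-up table; whether glottofilter works this way internally is an assumption.

```r
# Hypothetical helper: keep rows for which an unquoted expression is TRUE.
filter_expr <- function(dat, expression) {
  expr <- substitute(expression)                           # capture without evaluating
  keep <- eval(expr, envir = dat, enclos = parent.frame()) # evaluate using the columns
  dat[!is.na(keep) & keep, , drop = FALSE]
}

# Made-up table (values are illustrative only).
toy <- data.frame(
  name        = c("L1", "L2", "L3", "L4"),
  family      = c("Arawakan", "Tucanoan", "Arawakan", "Quechuan"),
  family_size = c(2L, 1L, 2L, 1L),
  stringsAsFactors = FALSE
)

by_family <- filter_expr(toy, family %in% c("Arawakan", "Tucanoan"))
by_size   <- filter_expr(toy, family_size > 1)
```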
glottofiltermap()
points <- glottofilter(location = "Australia")
points <- glottofilter(glottocode = "wari1268")
points <- glottofilter(family = "Indo-European")
points <- glottofilter(continent = "South America")
points <- glottofilter(family = "Indo-European", continent = "South America")
points <- glottofilter(country = c("Colombia", "Venezuela"))
points <- glottofilter(expression = family %in% c("Arawakan", "Tucanoan"))
points <- glottofilter(expression = family_size > 2)
points <- glottofilter(colname = "family", drop = "Indo-European")
Select languages by drawing or clicking on a map. The output should be assigned to a new object. In case you want to select languages based on a (non-spatial) condition, you might want to use glottofilter() instead.
glottofiltermap(glottodata = NULL, mode = NULL, ...)
glottodata |
Spatial glottodata object |
mode |
You can choose here whether you want to interactively select languages by clicking on them (mode = 'click', default) or by drawing a shape around them (mode = 'draw'). |
... |
Additional arguments to pass to glottofilter |
A set of languages selected from the original glottodata object
## Not run:
# Interactive selection by clicking on languages:
selected <- glottofiltermap(continent = "South America")
glottomap(selected)

# Interactive selection by drawing a shape:
selected <- glottofiltermap(continent = "South America", mode = "draw")
glottomap(selected)
## End(Not run)
Load locally stored glottodata, download databases from online sources, or load built-in demo data
glottoget( glottodata = NULL, meta = FALSE, download = FALSE, dirpath = NULL, url = NULL, seed = NULL )
glottodata |
options are:
|
meta |
In case 'glottodata' is demodata/demosubdata: by default, meta sheets are not loaded. Use meta=TRUE if you want to include them. |
download |
By default internally stored versions of global databases are used. Specify download = TRUE in case you want to download the latest version from a remote server. |
dirpath |
Optional, if you want to store a global CLDF dataset in a specific directory, or load it from a specific directory. |
url |
Zenodo url, something like this: "https://zenodo.org/api/records/3260727" |
seed |
Random seed used when loading the phoible dataset. If not provided, glottoget will randomly choose one language for each duplicated glottocode. |
A glottodata or glottosubdata object (a data.frame or list, depending on which glottodata is requested)
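The role of the seed argument can be pictured with a small base R sketch: when a glottocode occurs several times, one row is drawn at random, and fixing the seed presumably makes that draw reproducible. The helper pick_one_per_code and the toy table are hypothetical and are not taken from the package.

```r
# Hypothetical helper: keep one randomly chosen row per glottocode.
pick_one_per_code <- function(dat, id = "glottocode", seed = NULL) {
  if (!is.null(seed)) set.seed(seed)
  idx_by_code <- split(seq_len(nrow(dat)), dat[[id]])
  keep <- vapply(idx_by_code, function(idx) {
    # A single index is returned as is; sample() is only used for real choices.
    if (length(idx) == 1L) idx else sample(idx, 1L)
  }, integer(1))
  dat[sort(unname(keep)), , drop = FALSE]
}

# Made-up table in which two glottocodes are duplicated.
toy <- data.frame(
  glottocode = c("aaaa1111", "aaaa1111", "bbbb2222", "cccc3333", "cccc3333", "cccc3333"),
  inventory  = 1:6,
  stringsAsFactors = FALSE
)

first  <- pick_one_per_code(toy, seed = 42)
second <- pick_one_per_code(toy, seed = 42)
```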
Other <glottodata>:
glottosave()
glottoget("glottolog")
Join glottodata with other objects, datasets, or databases.
glottojoin(glottodata, with = NULL, id = NULL, na.rm = FALSE, type = "left")
glottodata |
glottodata or glottosubdata |
with |
Optional: glottodata (class data.frame), a dist object (class dist), or the name of a glottodatabase ("glottobase" or "glottospace") |
id |
By default, data is joined by a column named "glottocode" or "glottosubcode". In case you want to join using another column, the column name should be specified. |
na.rm |
Only used when joining with a dist object. By default NAs are kept. |
type |
In case two glottodata objects are joined, you can specify the type of join: "left" (default), "right", "full", or "inner" |
glottodata or glottosubdata, either with or without metatables. Object is returned as a data.frame or list, depending on the input.
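The four join types follow the usual relational meaning. As an illustration that does not depend on glottospace, the same distinctions can be reproduced with base R's merge() on two made-up tables keyed by glottocode; the tables and variable names are hypothetical.

```r
# Two made-up tables sharing some, but not all, glottocodes.
left <- data.frame(
  glottocode = c("yucu1253", "tani1257", "wari1268"),
  var001     = c("a", "b", "a"),
  stringsAsFactors = FALSE
)
right <- data.frame(
  glottocode = c("tani1257", "wari1268", "abcd1234"),
  var002     = c(TRUE, FALSE, TRUE),
  stringsAsFactors = FALSE
)

join_left  <- merge(left, right, by = "glottocode", all.x = TRUE)  # all rows of left
join_right <- merge(left, right, by = "glottocode", all.y = TRUE)  # all rows of right
join_full  <- merge(left, right, by = "glottocode", all = TRUE)    # rows of either table
join_inner <- merge(left, right, by = "glottocode")                # rows present in both
```

Note that merge() sorts the result by the key column, so row order may differ from the input.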
glottosplit
glottodata <- glottoget("demodata")
glottodata_space <- glottojoin(glottodata, with = "glottospace")
glottodata_base <- glottojoin(glottodata, with = "glottobase")

# Join with a dist object
glottodata <- glottoget("demodata", meta = TRUE)
dist <- glottodist(glottodata)
glottodata_dist <- glottojoin(glottodata, with = dist)

# Join glottosubdata tables:
glottosubdata <- glottocreate(glottocodes = c("yucu1253", "tani1257"), variables = 3, groups = c("a", "b"), n = 2, meta = FALSE)
glottodatatable <- glottojoin(glottodata = glottosubdata)
With this function you can easily create static and dynamic maps from glottodata (by setting type to 'static' or 'dynamic'). Alternatively, by specifying type = "filter", you can interactively select languages by drawing a shape around them (mode = "draw"; default) or by clicking on them (mode = "click"). See ?glottofiltermap for more details.
glottomap( glottodata = NULL, color = NULL, label = NULL, type = NULL, ptsize = NULL, alpha = NULL, lbsize = NULL, palette = NA, rivers = FALSE, nclass = NULL, filename = NULL, projection = NULL, glotto_title = NULL, mode = NULL, basemap = "country", ... )
glottodata |
Optional, user-provided glottodata. In case no glottodata is provided, you can pass arguments directly to glottofilter. |
color |
glottovar, column name, or column index to be used to color features (optional). See 'Details' below. |
label |
glottovar, column name, or column index to be used to label features (optional). See 'Details' below. |
type |
One of: "static", "dynamic", or "filter". Default is "static". |
ptsize |
Size of points between 0 and 1 |
alpha |
Transparency of points between 0 (very transparent) and 1 (not transparent) |
lbsize |
Size of labels between 0 and 1 |
palette |
Color palette, see glottocolpal("all") for possible options, and run glottocolpal("turbo") to see what it looks like (replace it with palette name). Alternatively, you could also run tmaptools::palette_explorer(), RColorBrewer::display.brewer.all(), ?viridisLite::viridis, or scales::show_col(viridisLite::viridis(n=20)) |
rivers |
Do you want to plot rivers? |
nclass |
Preferred number of classes (default is 5) |
filename |
Optional filename if you want to save resulting map |
projection |
For static maps, you can choose one of the following: 'eqarea' (equal-area Eckert IV, default), 'pacific' (Pacific-centered), or any other Coordinate Reference System, specified using an EPSG code (https://epsg.io/), for example: "ESRI:54009". |
glotto_title |
Optional, the title of legend, the default value is the name of the argument color. |
mode |
In case type = "filter", you can choose here whether you want to interactively select languages by clicking on them (mode = 'click', default) or by drawing a shape around them (mode = 'draw'). |
basemap |
The default basemap is "country", which gives the borders of countries. Alternatively, the basemap can be set to be "hydro-basin", this gives global hydro-basins (Level 03). |
... |
Additional parameters to glottofilter |
If no glottodata object is provided, then you have the following options for the 'color' and 'label' arguments: 'glottocode', 'name', 'macroarea', 'isocode', 'countries', 'family_id', 'classification', 'parent_id', 'family', 'isolate', 'family_size', 'family_size_rank', 'country', 'sovereignty', 'type', 'geounit', 'continent', 'adm0_a3', ...
a map created from a glotto(sub)data object and can be saved with glottosave()
## Not run:
glottomap(country = "Netherlands")

glottopoints <- glottofilter(continent = "South America")
glottopols <- glottospace(glottopoints, method = "voronoi")
glottomap(glottodata = glottopols, color = "family_size_rank")
glottomap(glottodata = glottopols, color = "family", palette = "turbo", type = "dynamic", label = "name")

glottodata <- glottoget()
families <- dplyr::count(glottodata, family, sort = TRUE)

# highlight 10 largest families:
glottodata <- glottospotlight(glottodata = glottodata, spotcol = "family", spotlight = families$family[1:10], spotcontrast = "family")

# Or, place 10 largest families in background
glottodata <- glottospotlight(glottodata = glottodata, spotcol = "family", spotlight = families$family[-c(1:10)], spotcontrast = "family")
glottomap(glottodata, color = "legend")

# Interactive selection by clicking on languages:
selected <- glottomap(continent = "South America", type = "filter")
glottomap(selected)

# Interactive selection by drawing a shape:
selected <- glottomap(continent = "South America", type = "filter", mode = "draw")
glottomap(selected)
## End(Not run)
Plot a persistence diagram of the Rips filtration for a set of language points
glottomap_persist_diagram(glottodata, maxscale)
glottodata |
A glottodata object of class sf with geometry type 'POINT' |
maxscale |
A number, the maximum value of the Rips filtration; the default unit is "100km" |
a ggplot2 map
glottopoints <- glottofilter(continent = "South America")
awk <- glottopoints[glottopoints$family == "Arawakan", ]
glottomap_persist_diagram(awk, maxscale = 15)
Map the Rips filtration of a set of language points, optionally as an animated GIF
glottomap_rips_filt( glottodata, r = 0, maxscale, is_animate = FALSE, length.out = 20, movie.name = "filtration.gif" )
glottodata |
A glottodata object of class sf with geometry type 'POINT' |
r |
A number, the radius of the buffers around all points in glottodata; the default unit is "100km" |
maxscale |
A number, the maximum value of the Rips filtration; the default unit is "100km" |
is_animate |
If TRUE, a GIF file is generated; if FALSE, a tmap plot is generated. The default value is FALSE |
length.out |
The number of images to be generated in the GIF file when 'is_animate = TRUE'; the default value is 20 |
movie.name |
name of the GIF file, the default value is "filtration.gif" |
A tmap object if 'is_animate = FALSE'; a GIF file if 'is_animate = TRUE'.
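The idea of a Rips filtration can be illustrated without any spatial packages: as the scale grows, more pairs of points become linked, and separate groups of points merge. The base R sketch below counts connected groups at several scales for made-up planar coordinates. It assumes that two points are linked once their distance is at most the scale (some conventions use twice the buffer radius instead) and uses plain Euclidean distance rather than geographic distance in units of 100 km; the function name n_components is hypothetical.

```r
# Made-up planar coordinates: a group of three points and a group of two.
pts  <- rbind(c(0, 0), c(1, 0), c(0, 1), c(5, 5), c(6, 5))
dmat <- as.matrix(dist(pts))

# Hypothetical helper: number of connected groups when points at
# distance <= scale are linked.
n_components <- function(dmat, scale) {
  n <- nrow(dmat)
  comp <- seq_len(n)                     # each point starts in its own group
  for (i in seq_len(n - 1)) {
    for (j in (i + 1):n) {
      if (dmat[i, j] <= scale && comp[i] != comp[j]) {
        comp[comp == comp[j]] <- comp[i] # merge the two groups
      }
    }
  }
  length(unique(comp))
}

scales <- c(0, 1, 2, 8)
counts <- vapply(scales, function(s) n_components(dmat, s), integer(1))
```

A persistence diagram summarises the same process by recording the scale at which each group (or higher-dimensional feature) appears and the scale at which it merges into another.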
glottopoints <- glottofilter(continent = "South America")
awk <- glottopoints[glottopoints$family == "Arawakan", ]
glottomap_rips_filt(glottodata = awk, r = 6, maxscale = 8)

## Not run:
glottomap_rips_filt(glottodata = awk, r = 6, maxscale = 8, is_animate = TRUE)
## End(Not run)
Match a vector of language names to glottocodes and names
glottomatch(namevec, glottodata = NULL, tolerance = NULL)
namevec |
Vector of language names |
glottodata |
Optional, where to search for matches. If kept empty, the entire glottolog database will be searched. You could also search within a specific area. |
tolerance |
Optional, search tolerance. |
a data.frame with exact or closest matches, and their glottocodes.
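Approximate name matching of this kind can be sketched with edit distances from base R's utils::adist(). This is an illustration only: the matching method and the meaning of tolerance inside glottomatch are not documented here, so the helper match_names, its tolerance (a maximum edit distance), and the short candidate list are assumptions.

```r
# Hypothetical helper: return all candidates within a given edit distance.
match_names <- function(namevec, candidates, tolerance = 2) {
  # Case-insensitive edit distances, one row per query name.
  dists <- utils::adist(tolower(namevec), tolower(candidates))
  lapply(seq_along(namevec), function(i) {
    hits <- which(dists[i, ] <= tolerance)
    data.frame(
      query    = rep(namevec[i], length(hits)),
      match    = candidates[hits],
      distance = dists[i, hits],
      stringsAsFactors = FALSE
    )
  })
}

# Short stand-in list of language names.
candidates <- c("Yucuna", "Quechua", "Quichua", "Tanimuca")
res <- match_names(c("yucuni", "quechui"), candidates, tolerance = 2)
```

With this list, "yucuni" has a single candidate within the tolerance, whereas "quechui" has two.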
glottodata <- glottofilter(continent = "South America")

# Finds a single match
glottomatch(namevec = "yucuni", glottodata = glottodata)

# Finds multiple matches
glottomatch(namevec = "quechui", glottodata = glottodata)
This is a wrapper around the monoMDS function in the vegan package.
glottonmds(glottodist = NULL, k = NULL, na.rm = FALSE, row2id = NULL)
glottodist |
A glottodist object |
k |
Number of dimensions. Either 2 or 3 for nmds. |
na.rm |
Whether NAs should be removed (default is FALSE) |
row2id |
In case of nmds, specify what each row contains (either 'glottocode' or 'glottosubcode') |
a glottonmds object which can be plotted using glottoplot(glottonmds = ). See ?monoMDS for more details.
glottodata <- glottoget("demodata", meta = TRUE)
glottodist <- glottodist(glottodata = glottodata)
glottonmds <- glottonmds(glottodist, k = 2, row2id = "glottocode")
glottoplot(glottonmds = glottonmds)
This function offers different types of visualizations for linguistic data and linguistic distances.
glottoplot( glottodata = NULL, glottodist = NULL, type = NULL, glottonmds = NULL, color = NULL, ptsize = NULL, label = NULL, filename = NULL, palette = NULL, k = NULL, na.rm = FALSE, row2id = NULL, preventoverlap = FALSE, alpha = NULL, colorvec = NULL, expand = NULL, lbsize = NULL, ptshift = NULL, lbshift = NULL )
glottodata |
glottodata table |
glottodist |
A dist object created with |
type |
The type of plot: "heatmap", "nmds", or "missing". Default is heatmap if nothing is provided. |
glottonmds |
A glottonmds object created with |
color |
Name of variable to be used to color features (optional). See 'Details' below. |
ptsize |
Size of points between 0 and 1 (optional) |
label |
Name of variable to be used to label features (optional). See 'Details' below. |
filename |
Optional filename if output should be saved. |
palette |
Name of color palette, use glottocolpal("all") to see the options |
k |
Number of dimensions. Either 2 or 3 for nmds. |
na.rm |
Whether NAs should be removed (default is FALSE) |
row2id |
In case of nmds, specify what each row contains (either 'glottocode' or 'glottosubcode') |
preventoverlap |
For nmds with 2 dimensions, should overlap between data points be prevented? |
alpha |
For nmds with 2 dimensions: Transparency of points between 0 (very transparent) and 1 (not transparent) |
colorvec |
Vector specifying colors for individual values and legend order (non-matching values are omitted), for example: c("Arawakan" = "rosybrown1", "Yucuna" = "red", "Tucanoan" = "lightskyblue1", "Tanimuca-Retuarã" = "blue", "Naduhup" = "gray70", "Kakua-Nukak" = "gray30"). See the 'values' argument in ggplot2::scale_color_manual() for details. |
expand |
Optionally expand one or all of the axes. Default is c(0,0,0,0), referring to xmin, xmax, ymin, ymax, respectively. If you want to change the maximum of the x-axis, you would do: c(0,1,0,0). |
lbsize |
Label size (optional) |
ptshift |
(optional) If preventoverlap is TRUE, how much should points be shifted? |
lbshift |
(optional) If preventoverlap is TRUE, how much should labels be shifted? |
If no glottodata object is provided, then you have the following options for the 'color' and 'label' arguments: 'glottocode', 'name', 'macroarea', 'isocode', 'countries', 'family_id', 'classification', 'parent_id', 'family', 'isolate', 'family_size', 'family_size_rank', 'country', 'sovereignty', 'type', 'geounit', 'continent', 'adm0_a3'.
a visualization of a glotto(sub)data, glottodist or glottonmds object, which can be saved with glottosave()
# Plot glottodist as nmds:
glottodata <- glottoget("demodata", meta = TRUE)
glottodist <- glottodist(glottodata = glottodata)
# glottoplot(glottodist = glottodist, type = "nmds",
#   k = 2, color = "family", label = "name", row2id = "glottocode")

# To create a stress/scree plot, you can run:
# goeveg::dimcheckMDS(matrix = as.matrix(glottodist), k = k)

# Plot missing data:
glottodata <- glottoget("demodata", meta = TRUE)
glottodata <- glottosimplify(glottodata)
glottoplot(glottodata = glottodata, type = "missing")
Recode character columns to TRUE/FALSE
glottorecode_logical(glottodata, structure, totrue = NULL, tofalse = NULL)
glottodata |
glottodata list |
structure |
structure table |
totrue |
values to recode to TRUE |
tofalse |
values to recode to FALSE |
glottodata <- glottoget("demodata", meta = TRUE)
glottorecode_logical(glottodata,
  totrue = c("y", "Y", 1), tofalse = c("n", "N", 0),
  structure = glottodata[["structure"]])
glottosubdata <- glottoget("demosubdata", meta = TRUE)
glottorecode_logical(glottosubdata,
  totrue = c("y", "Y", 1), tofalse = c("n", "N", 0),
  structure = glottosubdata[["structure"]])
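What such a recoding does to a single column can be sketched in base R. The helper `recode_logical()` and the toy vector are hypothetical, and turning unmatched values into NA is an assumption made for this sketch; glottorecode_logical() may treat them differently.

```r
# Hypothetical helper, not part of glottospace: recode one character vector.
recode_logical <- function(x, totrue, tofalse) {
  out <- rep(NA, length(x))                    # logical NA by default
  out[x %in% as.character(totrue)] <- TRUE
  out[x %in% as.character(tofalse)] <- FALSE
  out                                          # unmatched values stay NA (assumption)
}

answers <- c("y", "N", "1", "0", "maybe", NA)
recoded <- recode_logical(answers,
                          totrue = c("y", "Y", 1),
                          tofalse = c("n", "N", 0))
```

Note that c("y", "Y", 1) is coerced to a character vector, so the number 1 is compared as the string "1".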
Recode missing values to NA
glottorecode_missing(glottodata, tona)
glottodata |
glottodata |
tona |
Optional, additional values to recode to NA |
glottodata <- glottoget("demodata", meta = TRUE)
glottorecode_missing(glottodata, tona = "?")
glottosubdata <- glottoget("demosubdata", meta = TRUE)
glottorecode_missing(glottosubdata, tona = "?")
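The effect of recoding extra values to NA can be shown on a plain data frame in base R. The table below is a toy stand-in with invented values, and the set of strings in `tona` is an assumption; this is not the package's implementation.

```r
# Toy table (values invented for illustration)
df <- data.frame(
  glottocode = c("aaaa1111", "bbbb2222"),
  var001 = c("?", "a"),
  var002 = c("NA", "b"),
  stringsAsFactors = FALSE
)

# Strings to treat as missing (assumed for this sketch)
tona <- c("?", "NA", "")

# Replace every listed value by NA, column by column
df[] <- lapply(df, function(col) {
  col[col %in% tona] <- NA
  col
})
```

Assigning to `df[]` keeps the object a data frame with its original column names.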
If no filename is provided, the name of the glottodata object will be used.
glottosave(glottodata, filename = NULL)
glottodata |
User-provided glottodata |
filename |
Filename either with or without file extension |
If no file extension is provided, a sensible default file extension is chosen. Dynamic maps (tmap) are saved in .html format, static maps (tmap) are saved as .png. Spatial data (sf) are saved as geopackage (.GPKG) by default, but .shp is also possible.
No object is returned; the file is saved locally at the specified location.
glottoget_glottodata
Other <glottodata>:
glottoget()
glottodata <- glottoget("demodata", meta = FALSE)
# Saves as .xlsx
glottosave(glottodata, filename = file.path(tempdir(), "glottodata"))
glottospacedata <- glottospace(glottodata)
# Saves as .GPKG
glottosave(glottospacedata, filename = file.path(tempdir(), "glottodata"))
glottomap <- glottomap(glottodata)
# Saves as .png
glottosave(glottomap, filename = file.path(tempdir(), "glottomap"))
# Saves as .html
glottomap <- glottomap(glottodata, type = "dynamic", filename = file.path(tempdir(), "glottomap"))
Search within glottodata for languages, glottocodes, etc.
glottosearch( search, glottodata = NULL, partialmatch = TRUE, columns = NULL, tolerance = NULL )
search |
Character string to search for; this can be the name of a language or family, a glottocode, or an isocode. |
glottodata |
Any linguistic or cultural dataset. Default is to search within glottobase. |
partialmatch |
By default, partial matches will be returned as well. In case you only want exact matches, this argument should be set to FALSE. |
columns |
By default, the entire dataset is searched, but optionally the search can be limited to specific columns. |
tolerance |
In case partialmatch is TRUE: what is the maximum difference between search term and match? Default is 0.1 |
A subset of glottodata that matches search conditions (object returned as a data.frame/tibble)
glottosearch(search = "Yucuni")
glottosearch(search = "Yucuni", columns = "name")
glottosearch(search = "Yucuni", columns = c("name", "family"))
With glottosimplify, the structure of a glottodata object is simplified by removing tables and properties
glottosimplify( glottodata, droplist = TRUE, dropmeta = TRUE, dropspatial = TRUE, submerge = TRUE, dropunits = FALSE )
glottodata |
glottodata or glottosubdata. |
droplist |
By default, if only one sheet is loaded, the data will be returned as a data.frame (instead of placing the data inside a list of length 1) |
dropmeta |
By default all metadata is removed. |
dropspatial |
By default spatial properties are removed. |
submerge |
By default, glottosubdata tables are merged into a single glottodata table. |
dropunits |
By default units are kept. |
a simplified version of the original dataset, either a data.frame/tibble or a list (depending on the selected options)
glottodata <- glottoget("demodata", meta = TRUE)
glottosimplify(glottodata)
This function takes glottodata (either with or without metadata) and turns it into spatial points or polygons.
glottospace(glottodata, method = NULL, radius = NULL)
glottodata |
A glottodata table, or list of a glottodata table and metadata table(s) |
method |
Interpolation method, either "buffer" or "voronoi" (synonymous with "thiessen") |
radius |
If method is "buffer", the radius in km around the points. If method is "thiessen", a buffer will be created into the ocean, which is particularly relevant for island languages. |
A spatial version of glottodata. In case glottodata has metadata, only glottodata will be converted to spatial (but all metadata tables are kept). Object returned as sf object, or a list of which the first element is an sf object, depending on the input.
glottodata <- glottoget("demodata", meta = TRUE)
glottopols <- glottospace(glottodata, method = "voronoi")

glottodata <- glottofilter(country = "Netherlands")
glottopols <- glottospace(glottodata, method = "buffer", radius = 20)
glottomap(glottopols)

glottodata <- glottofilter(continent = "South America")
glottopols <- glottospace(glottodata, method = "thiessen")
glottomap(glottopols)

glottodata <- glottofilter(country = "Philippines")
glottopols <- glottospace(glottodata, radius = 100, method = "thiessen")
glottomap(glottopols)
Usually, you will run this function twice, once to split metadata from glottodata, and a second time to join it again.
glottosplitmergemeta(glottodata, splitted = NULL)
glottodata |
glottodata |
splitted |
if provided, the second element of the list will be joined with glottodata |
A list of length 2 in case only glottodata is provided, and a merged glottodata object otherwise.
glottojoin
glottosimplify
glottodata <- glottoget("demodata", meta = TRUE)
splitted <- glottosplitmergemeta(glottodata)
merged <- glottosplitmergemeta(glottodata = glottodata, splitted = splitted)
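The split-then-merge pattern can be sketched with plain lists in base R. Everything here is an assumption made for illustration: the helpers `split_meta()` and `merge_meta()` are hypothetical, the toy tables contain invented values, and the element name "glottodata" for the main table is assumed rather than taken from the package.

```r
# Toy stand-in for a glottodata list with metadata (all values invented)
toy <- list(
  glottodata = data.frame(glottocode = c("aaaa1111", "bbbb2222"),
                          var001 = c("a", "b")),
  structure  = data.frame(varname = "var001", type = "factor"),
  sample     = data.frame(glottocode = c("aaaa1111", "bbbb2222"),
                          group = c("A", "B"))
)

# Split: element 1 is the data table, element 2 holds the metadata tables
split_meta <- function(x) {
  list(x[["glottodata"]], x[names(x) != "glottodata"])
}

# Merge: put a (possibly modified) data table back in front of the metadata
merge_meta <- function(data, splitted) {
  c(list(glottodata = data), splitted[[2]])
}

splitted <- split_meta(toy)
data_only <- splitted[[1]]
data_only$var001 <- toupper(data_only$var001)  # work on the data table alone
merged <- merge_meta(data_only, splitted)
```

The point of the pattern is that intermediate steps can operate on a plain table while the metadata tables are carried along untouched.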
This function creates two separate color scales: one for points to highlight, and a second for the remaining background points. It also creates a legend. This is useful for preparing the data for visualizations such as maps or other plots.
glottospotlight(glottodata, spotcol, spotlight, spotcontrast = NULL)
glottodata |
User-provided glottodata |
spotcol |
Name of the column that contains the data to put in the spotlights (as well as remaining background data). |
spotlight |
Selection of data to put in the spotlights. |
spotcontrast |
Optional column to contrast between data points in the spotlight. |
A glottodata object with columns added to be used in visualization.
glottodata <- glottofilter(country = c("Netherlands", "Germany", "Belgium"))
glottodata <- glottospotlight(glottodata = glottodata, spotcol = "country", spotlight = "Netherlands")
glottomap(glottodata, color = "legend")
This function takes a dist object and performs a Permutational Multivariate Analysis of Variance (PERMANOVA). It can be used to test whether two or more groups are significantly different from each other (by specifying the 'comparison' argument with either 'overall' or 'pairwise').
glottostat_dist_permanova( glottodist = NULL, glottodata = NULL, comparison = NULL, sample = NULL, permutations = NULL, by = NULL )
glottodist |
a dist object |
glottodata |
glottodata that contains a sample table |
comparison |
Either "overall" or "pairwise" |
sample |
sample table (optional). By default, searches for sample table in glottodata/glottosubdata. |
permutations |
Number of permutations (default is 999) |
by |
the column name of "sample", over which to compute the permanova. |
The argument 'by' is the name of a column in the sample table, which can be provided either by a "sample" sheet in glottodata or by the argument 'sample'. The default value of 'by' is "group". The function uses 'by' to do the comparisons. The function calls vegan::adonis2(); type ?adonis2 for more details.
glottodata <- glottoget("demodata", meta = TRUE)
glottodist <- glottodist(glottodata, metric = "gower")
glottostat_dist_permanova(glottodist = glottodist, glottodata = glottodata, comparison = "pairwise")
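To make the logic of an overall PERMANOVA on a distance matrix concrete, here is a minimal base R sketch of the one-factor pseudo-F statistic and its permutation p-value. It is not the code run by glottostat_dist_permanova() or vegan::adonis2(): the helper `pseudo_f()`, the simulated data, and the grouping vector are assumptions for illustration only.

```r
# Hypothetical helper: one-factor pseudo-F from a dist object and group labels
pseudo_f <- function(d, group) {
  m <- as.matrix(d)^2                      # squared distances
  n <- nrow(m)
  group <- as.factor(group)
  a <- nlevels(group)
  # Total sum of squares: squared distances over all pairs, divided by n
  ss_total <- sum(m[upper.tri(m)]) / n
  # Within-group sum of squares, summed over groups
  ss_within <- sum(vapply(levels(group), function(g) {
    idx <- which(group == g)
    sub <- m[idx, idx, drop = FALSE]
    sum(sub[upper.tri(sub)]) / length(idx)
  }, numeric(1)))
  ss_among <- ss_total - ss_within
  (ss_among / (a - 1)) / (ss_within / (n - a))
}

# Simulated data: two groups of 10 observations with shifted means
set.seed(1)
x <- rbind(matrix(rnorm(20, mean = 0), ncol = 2),
           matrix(rnorm(20, mean = 3), ncol = 2))
group <- rep(c("A", "B"), each = 10)
d <- dist(x)

# Observed statistic, then the same statistic under shuffled group labels
f_obs <- pseudo_f(d, group)
permutations <- 999
f_perm <- replicate(permutations, pseudo_f(d, sample(group)))
p_value <- (sum(f_perm >= f_obs) + 1) / (permutations + 1)
```

A pairwise comparison would repeat the same calculation on each pair of groups, using the corresponding subset of the distance matrix.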
A temporary version of glottostat_dist_permanova
glottostat_dist_permanova_mci( glottodist = NULL, glottodata = NULL, comparison = NULL, sample = NULL, permutations = NULL, by = NULL )
glottodist |
a dist object |
glottodata |
a glottodata |
comparison |
comparison |
sample |
sample |
permutations |
permutations |
by |
by |
This function takes a glottodata or glottosubdata object and performs a Permutational Multivariate Analysis of Variance (PERMANOVA). It can be used to test whether two or more groups are significantly different from each other (by specifying the 'comparison' argument with either 'overall' or 'pairwise'). The function uses the 'group' column in the sample table to do the comparisons. Before running the analysis, a distance matrix is constructed from the glotto(sub)data object using glottodist(). The function calls vegan::adonis2(), type ?adonis2 for more details.
glottostat_permanova( glottodata, comparison = NULL, sample = NULL, permutations = NULL, metric = "gower" )
glottodata |
glottodata or glottosubdata |
comparison |
Either "overall" or "pairwise" |
sample |
sample table (optional). By default, searches for sample table in glottodata/glottosubdata. |
permutations |
Number of permutations (default is 999) |
metric |
Either "gower" or "anderberg" |
glottodata <- glottoget("demodata", meta = TRUE)
glottostat_permanova(glottodata, comparison = "pairwise")

# Use subgroup (or another column in the sample table) as group
glottodata[["sample"]][,"group"] <- NULL # delete old 'group' column
glottodata[["sample"]][,"group"] <- glottodata[["sample"]][,"subgroup"]
glottostat_permanova(glottodata, comparison = "pairwise")

glottosubdata <- glottoget("demosubdata", meta = TRUE)
glottostat_permanova(glottodata = glottosubdata, comparison = "pairwise")
Convert a non-spatial phoible dataset to a spatial (sf) object
phoible_param_sf(phoible_data)
phoible_data |
A non-spatial phoible dataset |
an sf object
phoible_sf <- phoible_param_sf(glottospace::phoible_raw)