The list of abbreviations is an obligatory part of linguistic articles that nobody reads. These lists contain definitions of abbreviations used in the article (e.g. the names of corpora or sign languages), but also a list of linguistic glosses — abbreviations used in interlinear glossed examples. There is a document proposing standardized glossing rules (Comrie, Haspelmath, and Bickel 2008), which ends with a list of 84 standard abbreviations. A much bigger list of standard abbreviations is present on Wikipedia. However, researchers can deviate from the proposed abbreviations and use their own instead.
The following list of abbreviations, which I came across in a published article, makes it clear that there is room for improvement in compiling such lists:
NOM = nominative, GEN = nominative, DAT = nominative, ACC = accusative, VOC = accusative, LOC = accusative, INS = accusative, PL = plural, SG = singular
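The obvious errors in such a list (one definition reused for several glosses) can be spotted mechanically. The following is a minimal base R sketch, not part of lingglosses; the variable names are illustrative.

```r
# The abbreviation list quoted above, copied verbatim as one string.
abbr <- "NOM = nominative, GEN = nominative, DAT = nominative, ACC = accusative, VOC = accusative, LOC = accusative, INS = accusative, PL = plural, SG = singular"

# Split into "GLOSS = definition" pairs, then into two columns.
pairs <- strsplit(strsplit(abbr, ", ", fixed = TRUE)[[1]], " = ", fixed = TRUE)
abbr_df <- data.frame(gloss = vapply(pairs, `[`, character(1), 1),
                      definition = vapply(pairs, `[`, character(1), 2),
                      stringsAsFactors = FALSE)

# A definition shared by more than one gloss is suspicious.
counts <- table(abbr_df$definition)
suspicious <- abbr_df[abbr_df$definition %in% names(counts[counts > 1]), ]
print(suspicious)
```

Only PL and SG pass this check; the seven case glosses are flagged because they share two definitions between them.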
Besides the obvious errors, this list contains more problems that I would like to point out:
The main goal of the lingglosses R package is to provide an option for creating:
.html output of rmarkdown (Xie, Allaire, and Grolemund 2018) (see footnote 1);
You can install the stable version of the package from CRAN:
install.packages("lingglosses")
You can also install the development version of lingglosses from GitHub with:
# install.packages("remotes")
remotes::install_github("agricolamz/lingglosses")
In order to use the package you need to load it with the library() call:
library(lingglosses)
You can go through the examples in this tutorial or you can create a lingglosses example from the rmarkdown template (File > New File > R Markdown… > From Template > lingglosses Document).
gloss_example()
The main function of the lingglosses package is gloss_example(). This function has the following arguments:
transliteration;
glosses;
free_translation;
comment;
grammaticality;
annotation (see footnote 2);
line_length.
HL | H | L | H |
eze | a | za | a |
np | prfx | root | sfx |
'Eze swept... (Igbo, from [@goldsmith79: 209])' |
All arguments except the last one are self-explanatory.
gloss_example(transliteration = "bur-e-**ri** c'in-ne-sːu-w",
glosses = "fly-NPST-**INF** know-HAB-NEG-M",
free_translation = "I cannot fly. (Zilo Andi, East Caucasian)",
comment = "(lit. do not know how to)",
annotation = "Бурери цIиннессу.",
grammaticality = "*")
Бурери | цIиннессу. | |
* | bur-e-ri | c’in-ne-sːu-w |
fly-npst-inf | know-hab-neg-m | |
'I cannot fly. (Zilo Andi, East Caucasian)' | ||
(lit. do not know how to) |
In this first example you can see that:
the italicization of the transliteration can be switched off (with italic_transliteration = FALSE; see footnote 3);
markdown syntax can be used within the example (e.g. **a** for bold);
Since the function arguments’ names are optional in R, users can omit them as long as they follow the order of the arguments (you can always find the correct order in ?gloss_example):
gloss_example("bur-e-**ri** c'in-ne-sːu-w",
"fly-NPST-**INF** know-HAB-NEG-M",
"I cannot fly. (Zilo Andi, East Caucasian)",
"(lit. do not know how to)",
"Бурери цIиннессу.",
"*")
Бурери | цIиннессу. | |
* | bur-e-ri | c’in-ne-sːu-w |
fly-npst-inf | know-hab-neg-m | |
'I cannot fly. (Zilo Andi, East Caucasian)' | ||
(lit. do not know how to) |
It is possible to number and call your examples using the standard rmarkdown tool for generating lists (@):
(@) my first example
(@) my second example
(@) my third example
renders as:
In order to reference examples in the text you need to give them names:
(@my_ex) example for referencing
With the names settled you can reference the example (4) in the text using the following code: (@my_ex).
So this kind of example referencing can be used with lingglosses examples like in (5) and (6). The only important details are:
set the chunk option echo = FALSE (or specify it for all code chunks with the following command at the beginning of the document: knitr::opts_chunk$set(echo = FALSE));
… ((@...)) and the code chunk with lingglosses code.
bur-e-ri | c’in-ne-sːu |
fly-npst-inf | know-hab-neg |
'I cannot fly. (Zilo Andi, East Caucasian)' | |
(lit. do not know how to) |
bur-e-ri | c’in-ne-sːu |
fly-npst-inf | know-hab-neg |
'I cannot fly.' | |
(lit. do not know how to) |
Sometimes people gloss morpheme by morpheme (this is especially useful for polysynthetic languages). This is also possible in lingglosses. You can annotate slots with the annotation argument; see footnote 2 for the details.
gloss_example("s- z- á- la- nəq'wa -wa -dzə -j -ɕa -t'",
"1SG.ABS POT 3SG.N.IO LOC pass IPF LOC 3SG.M.IO seem(AOR) DCL",
"It seemed to him that I would be able to pass there.")
s- | z- | á- | la- | nəq’wa | -wa | -dzə | -j | -ɕa | -t’ |
1sg.abs | pot | 3sg.n.io | loc | pass | ipf | loc | 3sg.m.io | seem(aor) | dcl |
'It seemed to him that I would be able to pass there.' |
The gloss extraction algorithm implemented in lingglosses is case sensitive, so if you want to keep an uppercase item from being treated as a gloss, you can wrap it in curly brackets:
gloss_example("den=no he.ʃː-qi hartʃ'on-k'o w-uʁi w-uk'o.",
"{I}=ADD DEM.M-INS watch-CVB M-stand.AOR M-be.AOR",
"And I stood there, watching him.")
den=no | he.ʃː-qi | hartʃ’on-k’o | w-uʁi | w-uk’o. |
I=add | dem.m-ins | watch-cvb | m-stand.aor | m-be.aor |
'And I stood there, watching him.' |
In the example above {I} is just the English word I, which will be escaped and will not appear in the gloss list as a marker of class I.
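To make the effect of the curly brackets concrete, here is a rough base R sketch of a case-sensitive extraction step. It is an illustration under simplifying assumptions, not the algorithm that lingglosses actually uses; the helper name extract_glosses_sketch() is hypothetical.

```r
# Hypothetical helper, not part of lingglosses: collect uppercase items from a
# gloss line, ignoring anything wrapped in curly brackets.
extract_glosses_sketch <- function(gloss_line) {
  # Drop escaped material such as {I} before looking for glosses.
  unescaped <- gsub("\\{[^}]*\\}", "", gloss_line)
  # Split on spaces and on the usual gloss delimiters.
  items <- unlist(strsplit(unescaped, "[-=.: ()]+"))
  # Keep items that contain an uppercase letter and no lowercase letters.
  is_gloss <- grepl("[[:upper:]]", items) & !grepl("[[:lower:]]", items)
  sort(unique(items[is_gloss]))
}

found <- extract_glosses_sketch("{I}=ADD DEM.M-INS watch-CVB M-stand.AOR M-be.AOR")
print(found)
```

With the gloss line from the example above, the escaped I is skipped, while M (used without brackets) is still collected as a gloss.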
It makes sense to avoid single quotes for quotations, since they can cause trouble for the package’s functions, and to use backslash-escaped double quotes instead, as in the following example:
gloss_example("\"a-pi ɲɯ-ɕpaʁ-a\" ti ɲɯ-ŋu",
"1SG.POSS-elder.sibling SENS-be.thirsty-1SG say:FACT SENS-be",
"She said: \"Sister, I am thirsty.\"")
“a-pi | ɲɯ-ɕpaʁ-a” | ti | ɲɯ-ŋu |
1sg.poss-elder.sibling | sens-be.thirsty-1sg | say:fact | sens-be |
'She said: "Sister, I am thirsty."' |
After a while I was asked to make it possible to add single-line examples:
gloss_example("Learn to value yourself, which means: to fight for your happiness. (Ayn Rand)",
line_length = 100)
Learn | to | value | yourself, | which | means: | to | fight | for | your | happiness. | (Ayn | Rand) |
Sometimes examples are too long and do not fit onto the page. In that case you need to add the argument results='asis' to your chunk. gloss_example() will then automatically split your example into multiple rows.
gloss_example('za-s jaːluʁ **wo-b** **qa-b-ɨ**; turs-ubɨ qal-es-di ǯiqj-eː jaːluʁ-**o-b** **qa-b-ɨ**',
'1SG.OBL-DAT shawl.3 AUX-3 PRF-3-bring.PFV woolen_sock-PL NPL.bring-PL-A.OBL place-IN shawl.3-AUX-3 PRF-3-bring.PFV',
'(they) **brought** me a shawl; instead of (lit. in place of bringing) woolen socks, (they) **brought** a shawl.',
'(Woolen socks are considered to be more valuable than a shawl.)')
za-s | jaːluʁ | wo-b | qa-b-ɨ; | turs-ubɨ | qal-es-di |
1sg.obl-dat | shawl.3 | aux-3 | prf-3-bring.pfv | woolen_sock-pl | npl.bring-pl-a.obl |
ǯiqj-eː | jaːluʁ-o-b | qa-b-ɨ |
place-in | shawl.3-aux-3 | prf-3-bring.pfv |
‘(they) brought me a shawl; instead of (lit. in place of bringing) woolen socks, (they) brought a shawl.’ | ||
(Woolen socks are considered to be more valuable than a shawl.) |
If you are not satisfied with the result of the automatic split you can change the value of the line_length argument (the default value is 70, which means 70 characters of the longest line).
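The idea behind a character limit per row can be sketched with a simple greedy split. This is only an illustration in base R, not the package's implementation: the helper split_rows_sketch() is hypothetical, and it measures a single tier of words, whereas a real glossed example would have to take the longer of each word and its gloss.

```r
# Hypothetical helper, not part of lingglosses: greedily assign words to rows
# so that no row exceeds `line_length` characters.
split_rows_sketch <- function(words, line_length = 70) {
  row_id <- numeric(length(words))
  current_row <- 1
  used <- 0
  for (i in seq_along(words)) {
    # One extra character accounts for the space before a non-initial word.
    needed <- nchar(words[i]) + (used > 0)
    if (used > 0 && used + needed > line_length) {
      # Start a new row; the first word of a row needs no leading space.
      current_row <- current_row + 1
      used <- 0
      needed <- nchar(words[i])
    }
    row_id[i] <- current_row
    used <- used + needed
  }
  unname(split(words, row_id))
}

sentence <- "Learn to value yourself, which means: to fight for your happiness. (Ayn Rand)"
words <- strsplit(sentence, " ", fixed = TRUE)[[1]]
rows <- split_rows_sketch(words, line_length = 30)
print(vapply(rows, paste, character(1), collapse = " "))
```

Lowering line_length produces more and shorter rows; with the default of 70 this sentence would be split into two rows.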
It is possible to add an audio recording to the example using the audio_path argument. It can be either a path to a file or a URL.
gloss_example("á-ɕa",
"DEF-brother",
"This brother",
audio_path = "abaza_brother.wav")
á-ɕa |
def-brother |
'This brother' |
You can hear the recording if you click on the note icon above. If you do not like the icon, you can change it to any text using the audio_label argument.
Adding video is also possible:
gloss_example("PIECE",
"piece",
video_path = "USL_piece.mp4")
PIECE |
piece |
<video src="USL_piece.mp4" controls="TRUE" width="320" height="240"></video> |
There are additional arguments video_width and video_height for width and height.
When an example is small, the author may not want to put it in a separate paragraph, but prefer to display it as part of the running text. This can be achieved using standard rmarkdown inline code. The result of R code can be inserted into the rmarkdown document using backticks and a small r; for example, `r 2+2` will be rendered as 4. Currently lingglosses cannot automatically detect whether code was provided via a code chunk or inline. So if you want to use an in-text glossed example and want its glosses to appear in the list, you can write it using gloss_example() with the intext = TRUE argument. Here is a Turkish example from DeLancey (1997): Kemal gel-miş (Kemal come-mir), which was produced with the following inline code:
`r gloss_example("Kemal gel-miş", "Kemal come-MIR", intext = TRUE)`
In the third section I show how you can create a semi-automatically compiled list of abbreviations for your document. As an example I provide the list for this exact document. Even though the mir gloss appears only in this exact section in the in-text example above, it appears in the lists presented in the third section.
add_gloss()
Sometimes glosses are used in other environments besides examples, e.g. in a table or in the text. So if you want to use in-text glosses and want them to appear in the gloss list, you can add them using the add_gloss() function. As an example I adapted part of the verbal inflection paradigm of Andi (East Caucasian) from Table 2 (Verhees 2019: 199):
 | aff | neg |
---|---|---|
aor | -∅ | -sːu |
msd | -r | -sːu-r |
hab | -do | -do-sːu |
fut | -dja | -do-sːja |
inf | -du | -du-sːu |
that is generated using the following markdown code (see footnotes 4 and 5):
| | `r add_gloss("AFF")` | `r add_gloss("NEG")` |
|----------------------|----------------------|----------------------|
| `r add_gloss("AOR")` | -∅ | *-sːu* |
| `r add_gloss("MSD")` | *-r* | *-sːu-r* |
| `r add_gloss("HAB")` | *-do* | *-do-sːu* |
| `r add_gloss("FUT")` | *-dja* | *-do-sːja* |
| `r add_gloss("INF")` | *-du* | *-du-sːu* |
In the third section I show you how to create a semi-automatically compiled abbreviation list for your document. As an example I provide the list of abbreviations for this exact document. Even though the fut and msd glosses appear only in this exact section in the table above, they appear in the lists presented in the third section.
Unfortunately, gloss extraction implemented in lingglosses is case sensitive. That makes it hard to use for the glossing of Sign Languages, because:
I will illustrate these problems with an example from Russian Sign Language (Kimmelman 2012: 421):
gloss_example(glosses = c("LH: {CHAIR} ________",
"RH: {} CL:{SIT}.{ON}"),
free_translation = "The cat sits on the chair",
comment = "[RSL; Eks3–12]",
drop_transliteration = TRUE)
lh: | CHAIR | ________ |
rh: | cl:SIT.ON | |
'The cat sits on the chair' | ||
[RSL; Eks3–12] |
The capitalization that is not used for morphemic glossing is enclosed in curly brackets, so that lingglosses does not treat these items as glosses. Two separate gloss lines for different hands are provided as a vector with two elements (see the c() function for vector creation). It is important to provide the drop_transliteration = TRUE argument, otherwise internal tests within the gloss_example() function will fail.
It is also possible to use pictures in a transliteration line, see an example from Kazakh-Russian Sign Language (Kuznetsova et al. 2021: 51) (pictures are used with the permission of the author Anna Kuznetsova):
gloss_example("![](when.png) ![](mom.png) ![](tired.png)",
c("br_raise_______ {} {}",
"chin_up_______ {} {}",
"{WHEN} {MOM} {TIRED}"),
"When was mom tired?")
br_raise_______ | ||
chin_up_______ | ||
WHEN | MOM | TIRED |
'When was mom tired?' |
The first line corresponds to pictures in markdown format that should be located in the same folder (otherwise you need to specify the path to them, e.g. ![](images/your_plot.png)). The next three lines correspond to different lines in the example with some non-manual articulation: as before, all glossing lines are stored as a vector of strings. The user can replace {} with _______ in order to show the scope of non-manual articulation.
After you have finished your text, you can call the make_gloss_list() function in order to automatically create a list of abbreviations.
make_gloss_list()
1sg — first person singular; 3 — third person; 3sg — third person singular; a — agent-like argument of canonical transitive verb; abs — absolutive; add — additive; aff — affix; aor — aorist; aux — auxiliary; cl — clitic; cvb — converb; dat — dative; dcl — declarative; def — definite; dem — demonstrative; fact — factitive; fut — future; hab — habitual; imp — imperative; in — in a container; inf — infinitive; ins — instrumental; io — indirect object; ipf — imperfective; lh — left hand; loc — locative; m — masculine; mir — mirative; msd — masdar; n — neuter; neg — negation; np — noun phrase; npl — neutral plural; npst — non-past; obl — oblique; pfv — perfective; pl — plural; poss — possessive; pot — potential; prf — perfect; prfx — prefix; rh — right hand; root — root; sbjv — subjunctive; sens — sensory evidential; sfx — suffix
This function works with the built-in dataset glosses_df, which is compiled from the Leipzig Glossing Rules, the Wikipedia page, and articles from the open access journal Glossa (see footnote 6). Everybody can download and change this dataset for their own purposes. I would be grateful if you left your proposals for changes to this dataset in the issue tracker on GitHub.
It is possible that the user is not satisfied with the result of the make_gloss_list() function. In this case there are two possible strategies. The first strategy is to copy the result of make_gloss_list(), modify it and paste it into your rmarkdown document. The second strategy is useful when, for example, you work on a volume dedicated to a particular group of languages and want to ensure that glosses are the same across all articles: you can compile your own table with the columns gloss and definition_en and use it within the make_gloss_list() function. As you can see, all glosses specified in the my_abbreviations dataset changed their definitions in the output below:
my_abbreviations <- data.frame(gloss = c("NPST", "HAB", "INF", "NEG"),
definition_en = c("non-past tense", "habitual aspect", "infinitive", "negation marker"))
make_gloss_list(my_abbreviations)
1sg — first person singular; 3 — third person; 3sg — third person singular; a — agent-like argument of canonical transitive verb; abs — absolutive; add — additive; aff — affix; aor — aorist; aux — auxiliary; cl — clitic; cvb — converb; dat — dative; dcl — declarative; def — definite; dem — demonstrative; fact — factitive; fut — future; hab — habitual aspect; imp — imperative; in — in a container; inf — infinitive; ins — instrumental; io — indirect object; ipf — imperfective; lh — left hand; loc — locative; m — masculine; mir — mirative; msd — masdar; n — neuter; neg — negation marker; np — noun phrase; npl — neutral plural; npst — non-past tense; obl — oblique; pfv — perfective; pl — plural; poss — possessive; pot — potential; prf — perfect; prfx — prefix; rh — right hand; root — root; sbjv — subjunctive; sens — sensory evidential; sfx — suffix
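The override behaviour visible in this output can be imitated in base R. The sketch below is an assumption about the general idea (user definitions take precedence over defaults), not the package's code; override_definitions() is a hypothetical helper and default_defs is a toy stand-in for the built-in definitions.

```r
# Toy stand-in for the default definitions (values copied from the output above).
default_defs <- data.frame(
  gloss = c("HAB", "INF", "NEG", "NPST", "PL"),
  definition_en = c("habitual", "infinitive", "negation", "non-past", "plural"),
  stringsAsFactors = FALSE
)

# User-supplied table, as in the example above.
my_abbreviations <- data.frame(
  gloss = c("NPST", "HAB", "INF", "NEG"),
  definition_en = c("non-past tense", "habitual aspect", "infinitive", "negation marker"),
  stringsAsFactors = FALSE
)

# Hypothetical helper: definitions from `custom` replace those in `defaults`
# wherever the gloss matches; all other rows are left untouched.
override_definitions <- function(defaults, custom) {
  hit <- match(defaults$gloss, custom$gloss)
  found <- !is.na(hit)
  defaults$definition_en[found] <- custom$definition_en[hit[found]]
  defaults
}

merged <- override_definitions(default_defs, my_abbreviations)
print(merged)
```

Glosses that are absent from the user table (here PL) keep their default definition, which matches what the output above shows for the glosses outside my_abbreviations.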
Unfortunately, some glosses can have multiple meanings in different traditions (e.g. ass can be either an associative plural or assertive mood). By default make_gloss_list() shows only some entries that were chosen by the author of this package. You can see all the possibilities if you add the argument all_possible_variants = TRUE. As you can see, there are multiple possible values for aff, ass, cl, imp, in, ins, and prf:
make_gloss_list(all_possible_variants = TRUE)
1sg — first person singular; 3 — third person; 3sg — third person singular; a — agent-like argument of canonical transitive verb; abs — absolutive; add — additive; aff — affirmative; aff — affix; aor — aorist; ass — assertive; ass — associative; aux — auxiliary; cl — classifier; cl — clitic; cvb — converb; dat — dative; dcl — declarative; def — definite; dem — demonstrative; fact — factative; fact — factitive; fact — factive; fut — future; hab — habitual; imp — imperative; imp — imperfect; imp — imperfective; imp — impersonal; in — in a container; in — inclusive; in — inessive; inf — infinitive; ins — instantiated; ins — instrumental; io — indirect object; ipf — imperfective; lh — left hand; loc — locative; m — masculine; mir — mirative; msd — masdar; n — neuter; neg — negation; np — noun phrase; npl — neutral plural; npst — non-past; obl — oblique; pfv — perfective; pl — plural; poss — possessive; poss — possessor; pot — potential; prf — perfect; prf — perfective; prfx — prefix; rh — right hand; root — root; sbjv — subjunctive; sens — sensive; sens — sensory evidential; sfx — suffix
Note that problematic glosses (those which lack a definition or are duplicated) are colored. This can be switched off by adding the argument annotate_problematic = FALSE:
make_gloss_list(all_possible_variants = TRUE, annotate_problematic = FALSE)
1sg — first person singular; 3 — third person; 3sg — third person singular; a — agent-like argument of canonical transitive verb; abs — absolutive; add — additive; aff — affirmative; aff — affix; aor — aorist; ass — assertive; ass — associative; aux — auxiliary; cl — classifier; cl — clitic; cvb — converb; dat — dative; dcl — declarative; def — definite; dem — demonstrative; fact — factative; fact — factitive; fact — factive; fut — future; hab — habitual; imp — imperative; imp — imperfect; imp — imperfective; imp — impersonal; in — in a container; in — inclusive; in — inessive; inf — infinitive; ins — instantiated; ins — instrumental; io — indirect object; ipf — imperfective; lh — left hand; loc — locative; m — masculine; mir — mirative; msd — masdar; n — neuter; neg — negation; np — noun phrase; npl — neutral plural; npst — non-past; obl — oblique; pfv — perfective; pl — plural; poss — possessive; poss — possessor; pot — potential; prf — perfect; prf — perfective; prfx — prefix; rh — right hand; root — root; sbjv — subjunctive; sens — sensive; sens — sensory evidential; sfx — suffix
In case you want to remove some glosses from the list, you can use the argument remove_glosses:
make_gloss_list(remove_glosses = c("1SG", "3SG"))
3 — third person; a — agent-like argument of canonical transitive verb; abs — absolutive; add — additive; aff — affix; aor — aorist; ass — associative; aux — auxiliary; cl — clitic; cvb — converb; dat — dative; dcl — declarative; def — definite; dem — demonstrative; fact — factitive; fut — future; hab — habitual; imp — imperative; in — in a container; inf — infinitive; ins — instrumental; io — indirect object; ipf — imperfective; lh — left hand; loc — locative; m — masculine; mir — mirative; msd — masdar; n — neuter; neg — negation; np — noun phrase; npl — neutral plural; npst — non-past; obl — oblique; pfv — perfective; pl — plural; poss — possessive; pot — potential; prf — perfect; prfx — prefix; rh — right hand; root — root; sbjv — subjunctive; sens — sensory evidential; sfx — suffix
It is really important not to treat the results of the make_gloss_list() function as carved in stone: once the list is compiled you can copy, modify and paste it into your document. You can try to spend time improving the output of the function, but at the final stage it is probably faster to correct it manually.
Right now there is no direct way of knitting lingglosses to .docx format. You can knit by adding the argument always_allow_html: true to your yaml file, however the result will not be ideal. You can work around this by copying and pasting from the .html version:
Knitting to both .pdf and .docx outputs is possible, but there are some known restrictions:
So if you want to avoid these problems, the best solution is to use one of the latex glossing packages listed in the first footnote and the package glossaries for automatic compilation of glosses.
glosses_df dataset
As mentioned above, the make_gloss_list() function’s definitions are based on the glosses_df dataset.
str(glosses_df)
## 'data.frame': 1342 obs. of 4 variables:
## $ gloss : chr "&" "1" "1DU" "1O" ...
## $ definition_en: chr "coordination marker" "first person" "first dual" "first person object" ...
## $ source : chr "lingglosses" "Leipzig Glossing Rules" "lingglosses" "lingglosses" ...
## $ weight : num 1 1 1 1 1 1 1 1 1 1 ...
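A table with this structure makes it easy to look for glosses that have several candidate definitions. The following base R sketch uses a small hand-made data frame with the same gloss and definition_en columns; its rows are copied from the outputs shown earlier rather than read from the package, and the object names are illustrative.

```r
# Toy data frame with the same column names as glosses_df (only two columns).
toy_glosses <- data.frame(
  gloss = c("ASS", "ASS", "CL", "CL", "NEG", "PRF", "PRF"),
  definition_en = c("assertive", "associative", "classifier", "clitic",
                    "negation", "perfect", "perfective"),
  stringsAsFactors = FALSE
)

# Count the definitions per gloss and keep the glosses with more than one.
n_defs <- tapply(toy_glosses$definition_en, toy_glosses$gloss, length)
ambiguous <- names(n_defs[n_defs > 1])

# Show every competing definition for the ambiguous glosses.
print(toy_glosses[toy_glosses$gloss %in% ambiguous, ])
```

The same two steps should carry over to the real dataset by replacing toy_glosses with glosses_df once the package is loaded.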
Most definitions are too general on purpose: asc, for example, is defined as associative, which can be associative case, associative plural, associative mood, or associated motion. Since the user can easily replace the output with their own definitions, this is not a problem for the lingglosses package. However, it would make things easier, and more comparable and reproducible, if linguists created a unified database of glosses, similar to the concepticon (Johann Mattis List et al. 2021) (Max Ionov mentioned to me that this could be Ontologies of Linguistic Annotation). If you think that some glosses and definitions should be changed, do not hesitate to open an issue on the GitHub page of the lingglosses project.
There are several alternatives to the lingglosses infrastructure for interlinear glossed examples that might be interesting for the reader:
Xigt (Goodman et al. 2015);
pyigt (Johann-Mattis List, Sims, and Forkel 2021).
Only some of them (ODIN, Xigt, scription and pyigt) are attempts towards creating a standard for databases of interlinear glossed examples. I also want to mention the paper by Round et al. (2020), where the authors provide a script for the automated identification and parsing of interlinear glossed text from scanned page images. The motivation for creating a cross-linguistic database of interlinear glossed examples is the following:
The lingglosses package makes an attempt to go in this direction and provides the ability to extract examples in a table format that can be further transformed into other formats. Each interlinear glossed example can easily be represented as a table using the convert_to_df() function.
convert_to_df(transliteration = "bur-e-**ri** c'in-ne-sːu",
glosses = "fly-NPST-**INF** know-HAB-NEG",
free_translation = "I cannot fly.",
comment = "(lit. do not know how to)",
annotation = "Бурери цIиннессу.")
This table lists all the parameters that could be useful for a database, and has the following columns:
id: unique identifier through the whole table;
example_id: unique identifier of particular examples;
word_id: unique identifier of the word in the example (delimited with spaces and other punctuation);
morpheme_id: unique identifier of the morpheme within the word (delimited with - or =);
transliteration: language material;
gloss: glosses;
delimiter: delimiters (space, - or =);
transliteration_orig: original string with transliteration;
glosses_orig: original string with glosses;
free_translation: original string with the free translation;
comment: original string with a comment.
When you use the gloss_example() function, a table of the structure described above is added to the database, so in the end you can extract it by saving the output of the get_examples_db() function to a file:
get_examples_db()
Of course one can just use a subset of some columns:
unique(get_examples_db()[, c("example_id", "transliteration_orig", "glosses_orig")])
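The long format described above (one row per morpheme, with word and morpheme identifiers) can be approximated in a few lines of base R. This is a simplified sketch and not the package's convert_to_df(): the helper to_long_table_sketch() is hypothetical, it only produces a subset of the columns, and it splits words on spaces and morphemes on - or = only.

```r
# Hypothetical helper, not part of lingglosses: one row per morpheme.
to_long_table_sketch <- function(transliteration, glosses, example_id = 1L) {
  words_t <- strsplit(transliteration, " ", fixed = TRUE)[[1]]
  words_g <- strsplit(glosses, " ", fixed = TRUE)[[1]]
  stopifnot(length(words_t) == length(words_g))
  rows <- lapply(seq_along(words_t), function(w) {
    # Morphemes are delimited with - or =.
    morph_t <- strsplit(words_t[w], "[-=]")[[1]]
    morph_g <- strsplit(words_g[w], "[-=]")[[1]]
    stopifnot(length(morph_t) == length(morph_g))
    data.frame(example_id = example_id,
               word_id = w,
               morpheme_id = seq_along(morph_t),
               transliteration = morph_t,
               gloss = morph_g,
               stringsAsFactors = FALSE)
  })
  long <- do.call(rbind, rows)
  long$id <- seq_len(nrow(long))
  long[, c("id", "example_id", "word_id", "morpheme_id", "transliteration", "gloss")]
}

# "\u02d0" is the IPA length mark (U+02D0), written as an escape for portability.
long <- to_long_table_sketch("bur-e-ri c'in-ne-s\u02d0u", "fly-NPST-INF know-HAB-NEG")
print(long)
```

A table like this can be filtered or reshaped with ordinary data frame tools, which is the point of storing examples in a tabular form.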
1. If you want to render a .pdf version you can either use latex and multiple linguistic packages developed for it (see e.g. gb4e, langsci, expex, philex), or you can render .html first and convert it to .pdf afterwards.
2. I used annotation for representing orthography, but it is also possible to use this tier for the annotation of words, as in the Igbo example shown after the list of gloss_example() arguments.
3. Sometimes it makes sense to set this option once for the whole document using the following code: options("lingglosses.italic_transliteration" = FALSE).
4. The table generated with markdown is visually poor. There are a lot of other ways to generate a table in R: kable() from knitr, the kableExtra package, the DT package and many others.
5. It is easier to generate Markdown or Latex tables with Libre Office or MS Excel and then use an online table generator website like https://www.tablesgenerator.com/.
6. The script for collecting glosses is available here. The list was manually corrected and merged with glosses from other sources. These glosses are marked in the glosses_df dataset as lingglosses in the source column.