1 Introduction

The list of abbreviations is an obligatory part of linguistic articles that nobody reads. These lists contain definitions of abbreviations used in the article (e.g. the names of corpora or sign languages), but also a list of linguistic glosses — abbreviations used in interlinear glossed examples. There is a document proposing standardized glossing rules (Comrie, Haspelmath, and Bickel 2008), which ends with a list of 84 standard abbreviations. A much bigger list of standard abbreviations is present on Wikipedia. However, researchers can deviate from the proposed abbreviations and use their own instead.

The following list of abbreviations, which I came across in a published article, makes it clear that there is room for improvement in compiling such lists:

NOM = nominative, GEN = nominative, DAT = nominative, ACC = accusative, VOC = accusative, LOC = accusative, INS = accusative, PL = plural, SG = singular

Besides the obvious errors, this list contains more problems that I would like to point out:

the lack of alphabetic order;
some abbreviations used in the article (sbjv, imp) are absent in the list.
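Both problems are easy to detect mechanically. The following is a minimal base R sketch, not part of lingglosses: check_abbreviations() is a hypothetical helper, and the vector of glosses used in the article is assumed from the list above plus the two missing ones.

```r
# Hypothetical helper (not part of lingglosses): report two common problems
# in a hand-written abbreviation list.
check_abbreviations <- function(listed, used) {
  listed <- toupper(listed)
  used <- toupper(unique(used))
  list(
    # TRUE when the list is already in alphabetical order
    is_sorted = identical(listed, sort(listed, method = "radix")),
    # glosses that occur in the examples but are absent from the list
    missing = setdiff(used, listed)
  )
}

listed <- c("NOM", "GEN", "DAT", "ACC", "VOC", "LOC", "INS", "PL", "SG")
used <- c(listed, "SBJV", "IMP")  # assumed: glosses found in the article
report <- check_abbreviations(listed, used)
report$is_sorted  # FALSE
report$missing    # "SBJV" "IMP"
```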

The main goal of the lingglosses R package is to provide an option for creating:

interlinear glossed linguistic examples for an .html output of rmarkdown (Xie, Allaire, and Grolemund 2018)¹;
a semi-automatically compiled list of glosses.

You can install the stable version of the package from CRAN:

install.packages("lingglosses")

You can also install the development version of lingglosses from GitHub with:

# install.packages("remotes")
remotes::install_github("agricolamz/lingglosses")

In order to use the package you need to load it with the library() call:

library(lingglosses)

You can go through the examples in this tutorial or you can create a lingglosses example from the rmarkdown template (File > New File > R Markdown… > From Template > lingglosses Document).

2 Create glossed examples with `gloss_example()`

2.1 Basic usage

The main function of the lingglosses package is gloss_example(). This function has the following arguments:

transliteration;
glosses;
free_translation;
comment;
grammaticality;
annotation²;
line_length.

All arguments except the last one are self-explanatory.

gloss_example(transliteration = "bur-e-**ri** c'in-ne-sːu-w",
              glosses = "fly-NPST-**INF** know-HAB-NEG-M",
              free_translation = "I cannot fly. (Zilo Andi, East Caucasian)",
              comment = "(lit. do not know how to)",
              annotation = "Бурери цIиннессу.",
              grammaticality = "*")

	Бурери	цIиннессу.
＊	bur-e-ri	c’in-ne-sːu-w
	fly-npst-inf	know-hab-neg-m
(lit. do not know how to)
‘I cannot fly. (Zilo Andi, East Caucasian)’

In this first example you can see that:

the transliteration line is italic by default (if you do not want it, just add the argument italic_transliteration = FALSE)³;
you can use standard markdown syntax (e.g. **a** for bold);
the free translation line is automatically framed with quotation marks.

Since the function arguments’ names are optional in R, users can omit them as long as they follow the order of the arguments (you can always find the correct order in ?gloss_example):

gloss_example("bur-e-**ri** c'in-ne-sːu-w",
              "fly-NPST-**INF** know-HAB-NEG-M",
              "I cannot fly. (Zilo Andi, East Caucasian)",
              "(lit. do not know how to)",
              "Бурери цIиннессу.",
              "*")

	Бурери	цIиннессу.
＊	bur-e-ri	c’in-ne-sːu-w
	fly-npst-inf	know-hab-neg-m
(lit. do not know how to)
‘I cannot fly. (Zilo Andi, East Caucasian)’

It is possible to number and reference your examples using the standard rmarkdown tool for generating lists (@):

(@) my first example
(@) my second example
(@) my third example

renders as:

(1) my first example
(2) my second example
(3) my third example

In order to reference examples in the text you need to give them names:

(@my_ex) example for referencing

(4) example for referencing

With the names settled you can reference the example (4) in the text using the following code (@my_ex).

So this kind of example referencing can be used with lingglosses examples like in (5) and (6). The only important details are:

change your code chunk argument to echo = FALSE (or specify it for all code chunks with the following command at the beginning of the document: knitr::opts_chunk$set(echo = FALSE));
do not put an empty line between the reference line (with (@...)) and the code chunk with lingglosses code.

bur-e-ri c’in-ne-sːu

fly-npst-inf know-hab-neg

(lit. do not know how to)

‘I cannot fly. (Zilo Andi, East Caucasian)’
Zilo Andi, East Caucasian

bur-e-ri c’in-ne-sːu

fly-npst-inf know-hab-neg

(lit. do not know how to)

‘I cannot fly.’

Sometimes people gloss morpheme by morpheme (this is especially useful for polysynthetic languages). It is also possible in lingglosses. You can annotate slots with the annotation argument, see footnote 2 for the details.

Abaza, West Caucasian (Arkadiev and Lander 2020: example 5.2)

gloss_example("s- z- á- la- nəq'wa -wa -dzə -j -ɕa -t'",
              "1SG.ABS POT 3SG.N.IO LOC pass IPF LOC 3SG.M.IO seem(AOR) DCL",
              "It seemed to him that I would be able to pass there.")

s-	z-	á-	la-	nəq’wa	-wa	-dzə	-j	-ɕa	-t’
1sg.abs	pot	3sg.n.io	loc	pass	ipf	loc	3sg.m.io	seem(aor)	dcl
‘It seemed to him that I would be able to pass there.’

The gloss extraction algorithm implemented in lingglosses is case sensitive: capitalized items are treated as glosses. If you want a capitalized word not to be treated as a gloss, you can escape it with curly brackets:

Kvankhidatli Andi (Verhees 2019: 203)

gloss_example("den=no he.ʃː-qi hartʃ'on-k'o w-uʁi w-uk'o.",
              "{I}=ADD DEM.M-INS watch-CVB M-stand.AOR M-be.AOR",
              "And I stood there, watching him.")

den=no	he.ʃː-qi	hartʃ’on-k’o	w-uʁi	w-uk’o.
I=add	dem.m-ins	watch-cvb	m-stand.aor	m-be.aor
‘And I stood there, watching him.’

In the example above {I} is just the English word I, which is escaped and therefore does not appear in the gloss list as a marker of class I.
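To make the effect of the curly brackets concrete, here is a rough base R sketch of case-sensitive extraction. It is only an illustration under my own assumptions, not the code used inside the package: extract_glosses() is a hypothetical helper, and the set of delimiters it splits on is assumed.

```r
# Hypothetical helper, not the package's internal code: collect the
# upper-case pieces of a gloss line, skipping anything inside {...}.
extract_glosses <- function(gloss_line) {
  # drop escaped material such as {I} before looking for glosses
  unescaped <- gsub("\\{[^}]*\\}", "", gloss_line)
  # split on spaces and on the delimiters -, =, . and :
  pieces <- unlist(strsplit(unescaped, "[-=.: ]+"))
  # keep pieces that consist only of capital letters and digits
  is_gloss <- grepl("^[A-Z0-9]+$", pieces, perl = TRUE)
  sort(unique(pieces[is_gloss]), method = "radix")
}

escaped <- extract_glosses("{I}=ADD DEM.M-INS watch-CVB M-stand.AOR M-be.AOR")
escaped  # "ADD" "AOR" "CVB" "DEM" "INS" "M"
```

Without the curly brackets the word I would pass the same test and end up in the list next to the real glosses.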

It makes sense to avoid single quotes for quotations, since they can cause trouble for the package’s functions. Use double quotes escaped with a backslash instead, as in the following example:

Kunbzang Japhug (Jacques 2021: 1143)

gloss_example("\"a-pi ɲɯ-ɕpaʁ-a\" ti ɲɯ-ŋu",
              "1SG.POSS-elder.sibling SENS-be.thirsty-1SG say:FACT SENS-be",
              "She said: \"Sister, I am thirsty.\"")

“a-pi	ɲɯ-ɕpaʁ-a”	ti	ɲɯ-ŋu
1sg.poss-elder.sibling	sens-be.thirsty-1sg	say:fact	sens-be
‘She said: “Sister, I am thirsty.”’

After a while I was asked to make it possible to add single-line examples:

gloss_example("Learn to value yourself, which means: to fight for your happiness. (Ayn Rand)",
              line_length = 100)

Learn

value

yourself,

which

means:

fight

for

your

happiness.

(Ayn

Rand)

2.2 Multiline examples

Sometimes examples are too long and do not fit onto the page. In that case you need to add the argument results='asis' to your chunk. gloss_example() will then automatically split your example into multiple rows.

Mishlesh Tsakhur, East Caucasian (Maisak and Tatevosov 2007: 386)

gloss_example('za-s jaːluʁ **wo-b** **qa-b-ɨ**; turs-ubɨ qal-es-di ǯiqj-eː jaːluʁ-**o-b** **qa-b-ɨ**', 
               '1SG.OBL-DAT shawl.3 AUX-3 PRF-3-bring.PFV woolen_sock-PL NPL.bring-PL-A.OBL place-IN shawl.3-AUX-3 PRF-3-bring.PFV',
               '(they) **brought** me a shawl; instead of (lit. in place of bringing) woolen socks, (they) **brought** a shawl.',
               '(Woolen socks are considered to be more valuable than a shawl.)')

za-s	jaːluʁ	wo-b	qa-b-ɨ;	turs-ubɨ	qal-es-di
1sg.obl-dat	shawl.3	aux-3	prf-3-bring.pfv	woolen_sock-pl	npl.bring-pl-a.obl

ǯiqj-eː	jaːluʁ-o-b	qa-b-ɨ
place-in	shawl.3-aux-3	prf-3-bring.pfv
(Woolen socks are considered to be more valuable than a shawl.)
‘(they) brought me a shawl; instead of (lit. in place of bringing) woolen socks, (they) brought a shawl.’

If you are not satisfied with the result of the automatic split, you can change the value of the line_length argument (the default value is 70, which means that the longest line of each row may contain up to 70 characters).
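The idea behind such a split can be sketched in base R as a greedy row filler. This is my own simplified illustration, not the package's implementation: split_into_rows() is a hypothetical helper, and the rule that a word slot is as wide as the longer of the word and its gloss (plus one separating space) is an assumption.

```r
# Hypothetical helper, not the package's internal code: assign each word to
# a row so that no row exceeds `line_length` characters.
split_into_rows <- function(transliteration, glosses, line_length = 70) {
  words <- strsplit(transliteration, " +")[[1]]
  gl <- strsplit(glosses, " +")[[1]]
  stopifnot(length(words) == length(gl))
  # width of a slot: the longer of word and gloss, plus a separating space
  widths <- pmax(nchar(words), nchar(gl)) + 1
  row <- integer(length(words))
  current <- 1L
  used <- 0
  for (i in seq_along(words)) {
    if (used > 0 && used + widths[i] > line_length) {
      current <- current + 1L  # start a new row
      used <- 0
    }
    used <- used + widths[i]
    row[i] <- current
  }
  split(words, row)
}

rows <- split_into_rows("a-b c-d e-f g-h", "AA-BB CC-DD EE-FF GG-HH",
                        line_length = 12)
rows  # two rows with two words each
```

A smaller line_length gives more, shorter rows, which is the knob you turn when the automatic split does not look right.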

2.3 Add audio and video

It is possible to add a soundtrack to the example using the audio_path argument. It can be either a path to a file or a URL.

Abaza, West Caucasian (my field recording)

gloss_example("á-ɕa",
              "DEF-brother",
              "This brother",
              audio_path = "abaza_brother.wav")

á-ɕa

def-brother

‘This brother’ ♪

You can hear the recording if you click on the note icon above. If you do not like the icon, you can change it to any text using the audio_label argument.

Adding video is also possible:

Ukrainian Sign Language (video from https://www.spreadthesign.com)

gloss_example("PIECE",
              "piece",
              video_path = "USL_piece.mp4")

PIECE

piece

There are additional arguments video_width and video_height for setting the width and height of the video.

2.4 In-text examples

When an example is small, the author may not want to put it in a separate paragraph, but prefer to display it as part of the running text. This can be achieved with standard rmarkdown inline code. The result of R code can be inserted into an rmarkdown document using backticks and a small r; for example, `r 2+2` will be rendered as 4. Currently lingglosses cannot automatically detect whether code was provided via a code chunk or inline. So if you want to use an in-text glossed example and want its glosses to appear in the list, write it using gloss_example() with the intext = TRUE argument. Here is a Turkish example from DeLancey (1997): Kemal gel-miş (Kemal come-mir), which was produced with the following inline code:

`r gloss_example("Kemal gel-miş", "Kemal come-MIR", intext = TRUE)`

In the third section I show how you can create a semi-automatically compiled list of abbreviations for your document. As an example I provide the list for this exact document. Even though the mir gloss appears only in this exact section in the in-text example above, it appears in the lists presented in the third section.

2.5 Stand-alone glosses with `add_gloss()`

Sometimes glosses are used in other environments besides examples, e.g. in a table or in the text. So if you want to use in-text glosses and want them to appear in the glosses list, it is possible to add them using the add_gloss() function. As an example I adapted part of the verbal inflection paradigm of Andi (East Caucasian) from Table 2 (Verhees 2019: 199):

	aff	neg
aor	-∅	-sːu
msd	-r	-sːu-r
hab	-do	-do-sːu
fut	-dja	-do-sːja
inf	-du	-du-sːu

that is generated using the following markdown⁴ code⁵:

|                      | `r add_gloss("AFF")` | `r add_gloss("NEG")` |
|----------------------|----------------------|----------------------|
| `r add_gloss("AOR")` | -∅                   | *-sːu*               |
| `r add_gloss("MSD")` | *-r*                 | *-sːu-r*             |
| `r add_gloss("HAB")` | *-do*                | *-do-sːu*            |
| `r add_gloss("FUT")` | *-dja*               | *-do-sːja*           |
| `r add_gloss("INF")` | *-du*                | *-du-sːu*            |

In the third section I show you how to create a semi-automatically compiled abbreviation list for your document. As an example I provide the list of abbreviations for this exact document. Even though the fut and msd glosses appear only in this exact section in the table above, they appear in the lists presented in the third section.

2.6 Glossing Sign languages

Unfortunately, gloss extraction implemented in lingglosses is case sensitive. That makes it hard to use for the glossing of Sign Languages, because:

Sign linguists gloss lexical items with capitalized English translations;
Sign language glosses are sometimes split into two lines, each of which is associated with one hand (or even more if you want to account for non-manual markers);
Sign language glosses should be somehow aligned with video/pictures (see the fascinating signglossR by Calle Börstell);
There can be empty space in glosses;
There can be some placeholders that correspond to an utterance by one articulator (e.g. a hand), which are held stationary in the signing space during the articulation made by another articulator.

I will illustrate these problems with an example from Russian Sign Language (Kimmelman 2012: 421):

gloss_example(glosses = c("LH: {CHAIR} ________",
                          "RH: {} CL:{SIT}.{ON}"),
              free_translation = "The cat sits on the chair", 
              comment = "[RSL; Eks3–12]",
              drop_transliteration = TRUE)

lh:	CHAIR	________
rh:		cl:SIT.ON
[RSL; Eks3–12]
‘The cat sits on the chair’

Capitalization that is not used for morphemic glossing is enclosed in curly brackets, so that lingglosses does not treat these items as glosses. The two separate gloss lines for the two hands are provided as a vector with two elements (see the c() function for vector creation). It is important to provide the drop_transliteration = TRUE argument, otherwise internal tests within the gloss_example() function will fail.

It is also possible to use pictures in a transliteration line, see an example from Kazakh-Russian Sign Language (Kuznetsova et al. 2021: 51) (pictures are used with the permission of the author Anna Kuznetsova):

gloss_example("![](when.png) ![](mom.png) ![](tired.png)",
              c("br_raise_______ {} {}",
                "chin_up_______ {} {}",
                "{WHEN} {MOM} {TIRED}"),
              "When was mom tired?")


br_raise_______
chin_up_______
WHEN	MOM	TIRED
‘When was mom tired?’

The first line corresponds to pictures in markdown format that should be located in the same folder (otherwise you need to specify the path to them, e.g. ![](images/your_plot.png)). The next three lines correspond to different lines in the example with some non-manual articulation: as before, all glossing lines are stored as a vector of strings. The user can replace {} with _______ in order to show the scope of non-manual articulation.

3 Create semi-automatic compiled abbreviation list

After you have finished your text, you can call the make_gloss_list() function in order to automatically create a list of abbreviations.

make_gloss_list()

1sg — first person singular; 3 — third person; 3sg — third person singular; a — agent-like argument of canonical transitive verb; abs — absolutive; add — additive; aff — affix; aor — aorist; aux — auxiliary; cl — clitic; cvb — converb; dat — dative; dcl — declarative; def — definite; dem — demonstrative; fact — factitive; fut — future; hab — habitual; imp — imperative; in — in a container; inf — infinitive; ins — instrumental; io — indirect object; ipf — imperfective; lh — left hand; loc — locative; m — masculine; mir — mirative; msd — masdar; n — neuter; neg — negation; np — noun phrase; npl — neutral plural; npst — non-past; obl — oblique; pfv — perfective; pl — plural; poss — possessive; pot — potential; prf — perfect; prfx — prefix; rh — right hand; root — root; sbjv — subjunctive; sens — sensory evidential; sfx — suffix

This function works with the built-in dataset glosses_df, which is compiled from the Leipzig Glossing Rules, the Wikipedia page, and articles from the open access journal Glossa⁶. Everybody can download and change this dataset for their own purposes. I would be grateful if you left your proposals for changes to this dataset in the issue tracker on GitHub.

It is possible that the user is not satisfied with the result of the make_gloss_list() function. In this case there are two possible strategies. The first strategy is to copy the result of make_gloss_list(), modify it and paste it into your rmarkdown document. The second strategy is useful when you work on a volume dedicated to a particular group of languages and want to ensure that glosses are the same across all articles: you can compile your own table with the columns gloss and definition_en and use it within the make_gloss_list() function. As you can see, all glosses specified in the my_abbreviations dataset changed their definitions in the output below:

my_abbreviations <- data.frame(gloss = c("NPST", "HAB", "INF", "NEG"),
                               definition_en = c("non-past tense", "habitual aspect", "infinitive", "negation marker"))
make_gloss_list(my_abbreviations)

1sg — first person singular; 3 — third person; 3sg — third person singular; a — agent-like argument of canonical transitive verb; abs — absolutive; add — additive; aff — affix; aor — aorist; aux — auxiliary; cl — clitic; cvb — converb; dat — dative; dcl — declarative; def — definite; dem — demonstrative; fact — factitive; fut — future; hab — habitual aspect; imp — imperative; in — in a container; inf — infinitive; ins — instrumental; io — indirect object; ipf — imperfective; lh — left hand; loc — locative; m — masculine; mir — mirative; msd — masdar; n — neuter; neg — negation marker; np — noun phrase; npl — neutral plural; npst — non-past tense; obl — oblique; pfv — perfective; pl — plural; poss — possessive; pot — potential; prf — perfect; prfx — prefix; rh — right hand; root — root; sbjv — subjunctive; sens — sensory evidential; sfx — suffix
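Conceptually, the user's table takes precedence over the built-in definitions. The base R sketch below illustrates that precedence on a toy table; it is not how make_gloss_list() is implemented, and both override_defs() and default_defs are hypothetical names introduced only for this illustration.

```r
# Toy stand-in for the built-in definitions (values copied from the first list)
default_defs <- data.frame(
  gloss = c("HAB", "INF", "MIR", "NEG", "NPST"),
  definition_en = c("habitual", "infinitive", "mirative", "negation", "non-past"),
  stringsAsFactors = FALSE
)
my_abbreviations <- data.frame(
  gloss = c("NPST", "HAB", "INF", "NEG"),
  definition_en = c("non-past tense", "habitual aspect", "infinitive",
                    "negation marker"),
  stringsAsFactors = FALSE
)

# Hypothetical helper: user-supplied rows win over default rows
override_defs <- function(defaults, custom) {
  kept <- defaults[!defaults$gloss %in% custom$gloss, ]
  merged <- rbind(custom, kept)
  merged <- merged[order(merged$gloss, method = "radix"), ]
  rownames(merged) <- NULL
  merged
}

merged <- override_defs(default_defs, my_abbreviations)
merged
```

Glosses that are absent from the user's table (here MIR) keep their default definition.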

Unfortunately, some glosses can have multiple meanings in different traditions (e.g. ass can be either an associative plural or assertive mood). By default make_gloss_list() shows only the entries that were chosen by the author of this package. You can see all the possibilities if you add the argument all_possible_variants = TRUE. As you can see, there are multiple possible values for aff, ass, cl, fact, imp, in, ins, poss, prf, and sens:

make_gloss_list(all_possible_variants = TRUE)

1sg — first person singular; 3 — third person; 3sg — third person singular; a — agent-like argument of canonical transitive verb; abs — absolutive; add — additive; aff — affirmative; aff — affix; aor — aorist; ass — assertive; ass — associative; aux — auxiliary; cl — classifier; cl — clitic; cvb — converb; dat — dative; dcl — declarative; def — definite; dem — demonstrative; fact — factative; fact — factitive; fact — factive; fut — future; hab — habitual; imp — imperative; imp — imperfect; imp — imperfective; imp — impersonal; in — in a container; in — inclusive; in — inessive; inf — infinitive; ins — instantiated; ins — instrumental; io — indirect object; ipf — imperfective; lh — left hand; loc — locative; m — masculine; mir — mirative; msd — masdar; n — neuter; neg — negation; np — noun phrase; npl — neutral plural; npst — non-past; obl — oblique; pfv — perfective; pl — plural; poss — possessive; poss — possessor; pot — potential; prf — perfect; prf — perfective; prfx — prefix; rh — right hand; root — root; sbjv — subjunctive; sens — sensive; sens — sensory evidential; sfx — suffix

You may notice that problematic glosses (those which lack a definition or are duplicated) are colored. This can be switched off by adding the argument annotate_problematic = FALSE:

make_gloss_list(all_possible_variants = TRUE, annotate_problematic = FALSE)
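The two conditions that make a gloss problematic can be checked by hand on any definitions table. Here is a small base R sketch on a toy table; the table and the made-up gloss XYZ are assumptions for illustration, and the package's own check may differ in detail.

```r
# Toy definitions table; "XYZ" is a made-up gloss with no definition
defs <- data.frame(
  gloss = c("AFF", "AFF", "AOR", "ASS", "ASS", "XYZ"),
  definition_en = c("affirmative", "affix", "aorist", "assertive",
                    "associative", NA),
  stringsAsFactors = FALSE
)

# a gloss is duplicated when it occurs in more than one row
is_duplicated <- defs$gloss %in% defs$gloss[duplicated(defs$gloss)]
# a gloss is undefined when its definition is missing or empty
is_undefined <- is.na(defs$definition_en) | defs$definition_en == ""

defs$problematic <- is_duplicated | is_undefined
defs[defs$problematic, ]  # every row except AOR
```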

In case you want to remove some glosses from the list, you can use the argument remove_glosses:

make_gloss_list(remove_glosses = c("1SG", "3SG"))

3 — third person; a — agent-like argument of canonical transitive verb; abs — absolutive; add — additive; aff — affix; aor — aorist; ass — associative; aux — auxiliary; cl — clitic; cvb — converb; dat — dative; dcl — declarative; def — definite; dem — demonstrative; fact — factitive; fut — future; hab — habitual; imp — imperative; in — in a container; inf — infinitive; ins — instrumental; io — indirect object; ipf — imperfective; lh — left hand; loc — locative; m — masculine; mir — mirative; msd — masdar; n — neuter; neg — negation; np — noun phrase; npl — neutral plural; npst — non-past; obl — oblique; pfv — perfective; pl — plural; poss — possessive; pot — potential; prf — perfect; prfx — prefix; rh — right hand; root — root; sbjv — subjunctive; sens — sensory evidential; sfx — suffix

It is really important that one should not treat the results of the make_gloss_list() function as carved in stone: once it is compiled you can copy, modify and paste it in your document. You can try to spend time improving the output of the function, but at the final stage it is probably faster to correct it manually.

4 Other output formats

Knitting to both .pdf and .docx outputs is possible, but there are some known restrictions:

markdown bold and italic annotations do not work;
example numbers appear above the example;
there is no non-breaking space in the list of glosses.

So if you want to avoid these problems, the best solution is to use one of the LaTeX glossing packages listed in the first footnote and the package glossaries for automatic compilation of glosses.

5 About the `glosses_df` dataset

As mentioned above, the make_gloss_list() function’s definitions are based on the glosses_df dataset.

str(glosses_df)

## 'data.frame':    1342 obs. of  4 variables:
##  $ gloss        : chr  "1" "2" "3" "&" ...
##  $ definition_en: chr  "first person" "second person" "third person" "coordination marker" ...
##  $ source       : chr  "Leipzig Glossing Rules" "Leipzig Glossing Rules" "Leipzig Glossing Rules" "lingglosses" ...
##  $ weight       : num  1 1 1 1 1 1 1 1 1 1 ...

Most definitions are too general on purpose: asc, for example, is defined as associative, which can be associative case, associative plural, associative mood, or associated motion. Since the user can easily replace the output with their own definitions, this is not a problem for the lingglosses package. However, it would make things easier, and more comparable and reproducible, if linguists created a unified database of glosses, similar to the Concepticon (List et al. 2021) (Max Ionov mentioned to me that this could be Ontologies of Linguistic Annotation). If you think that some glosses and definitions should be changed, do not hesitate to open an issue on the GitHub page of the lingglosses project.

6 Towards a database of interlinear glossed examples

There are several alternatives to lingglosses for working with interlinear glossed examples that might be interesting for the reader:

multiple packages for glossing in LaTeX:
- gb4e,
- langsci,
- expex,
- philex
ODIN project (Lewis and Xia 2010) (it looks like this project is no longer active);
a JavaScript library Leipzig.js;
a Python library Xigt (Goodman et al. 2015);
the scription format and the scription2dlx JavaScript library (Hieber 2020);
a Python library pyigt (List, Sims, and Forkel 2021).

Only some of them (ODIN, Xigt, scription and pyigt) are attempts at creating a standard for databases of interlinear glossed examples. I would also like to mention the paper by Round et al. (2020), in which the authors provide a script for the automated identification and parsing of interlinear glossed text from scanned page images. The motivation for creating a cross-linguistic database of interlinear glossed examples is the following:

Prevent linguistic facts from disappearing when projects fail (for example, the field notes of a researcher who did not manage to finish their work: an article, dictionary, grammar, etc.);
Fight publication bias, which leaves some linguistic facts unpublished because they do not support the author’s main idea;
Make linguistic work more reproducible and linguistic facts reusable (cf. the human genome database, biodiversity databases or astronomical catalogues).

The lingglosses package attempts to move in this direction and provides the ability to extract examples in a table format that can be further transformed into other formats. Each interlinear glossed example can easily be represented as a table using the convert_to_df() function.

convert_to_df(transliteration = "bur-e-**ri** c'in-ne-sːu",
              glosses = "fly-NPST-**INF** know-HAB-NEG",
              free_translation = "I cannot fly.",
              comment = "(lit. do not know how to)",
              annotation = "Бурери цIиннессу.")

This table lists all the parameters that could be useful for a database, and has the following columns:

id — unique identifier through the whole table;
example_id — unique identifier of particular examples;
word_id — unique identifier of the word in the example (delimited with spaces and other punctuation);
morpheme_id — unique identifier of the morpheme within the word (delimited with - or =);
transliteration — language material;
gloss — glosses;
delimiter — delimiters: space, - or =;
transliteration_orig — original string with transliteration;
glosses_orig — original string with glosses;
free_translation — original string with the free translation;
comment — original string with a comment.
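To show how the word_id and morpheme_id counters relate to the delimiters, here is a simplified base R sketch that builds a morpheme-per-row table from one example. It is not convert_to_df() itself: example_to_table() is a hypothetical helper that covers only a few of the columns listed above and ignores markdown formatting.

```r
# Hypothetical helper, not convert_to_df() itself: split one example into
# a morpheme-per-row table with word and morpheme counters.
example_to_table <- function(transliteration, glosses) {
  words <- strsplit(transliteration, " +")[[1]]
  word_glosses <- strsplit(glosses, " +")[[1]]
  stopifnot(length(words) == length(word_glosses))
  rows <- lapply(seq_along(words), function(w) {
    # morphemes within a word are delimited with - or =
    morphemes <- strsplit(words[w], "[-=]")[[1]]
    morpheme_glosses <- strsplit(word_glosses[w], "[-=]")[[1]]
    stopifnot(length(morphemes) == length(morpheme_glosses))
    data.frame(word_id = w,
               morpheme_id = seq_along(morphemes),
               transliteration = morphemes,
               gloss = morpheme_glosses,
               stringsAsFactors = FALSE)
  })
  result <- do.call(rbind, rows)
  result$id <- seq_len(nrow(result))  # running identifier over all rows
  result[, c("id", "word_id", "morpheme_id", "transliteration", "gloss")]
}

tbl <- example_to_table("bur-e-ri c'in-ne-sːu", "fly-NPST-INF know-HAB-NEG")
tbl  # six rows: three morphemes in each of the two words
```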

When you use the gloss_example() function, a table of the structure described above is added to the database, so in the end you can extract it and save the output of the get_examples_db() function to a file:

get_examples_db()

Of course one can just use a subset of some columns:

unique(get_examples_db()[, c("example_id", "transliteration_orig", "glosses_orig")])
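Saving the table is ordinary base R. The sketch below uses a hand-built stand-in data frame so that it runs on its own; in a real document you would pass the output of get_examples_db() instead. The file location and the choice of CSV are my assumptions, any tabular format would do.

```r
# Stand-in for the table returned by get_examples_db(); in a real document
# you would use the function's output instead of building it by hand.
examples_db <- data.frame(
  example_id = c(1, 1, 1),
  word_id = c(1, 2, 2),
  morpheme_id = c(1, 1, 2),
  transliteration = c("Kemal", "gel", "miş"),
  gloss = c("Kemal", "come", "MIR"),
  transliteration_orig = "Kemal gel-miş",
  glosses_orig = "Kemal come-MIR",
  stringsAsFactors = FALSE
)

# write the table as UTF-8 so that IPA and other non-ASCII symbols survive
out_file <- tempfile(fileext = ".csv")
write.csv(examples_db, out_file, row.names = FALSE, fileEncoding = "UTF-8")

# read it back to check that nothing was lost
restored <- read.csv(out_file, fileEncoding = "UTF-8", stringsAsFactors = FALSE)
nrow(restored)  # 3
```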

References

Arkadiev, P., and Y. Lander. 2020. “The Northwest Caucasian Languages.” In The Oxford Handbook of the Languages of the Caucasus, 369–446.

Comrie, B., M. Haspelmath, and B. Bickel. 2008. “The Leipzig Glossing Rules: Conventions for Interlinear Morpheme-by-Morpheme Glosses.”

DeLancey, S. 1997. “Mirativity: The Grammatical Marking of Unexpected Information.” Linguistic Typology 1 (1): 33–52.

Goldsmith, J. 1979. “The Aims of Autosegmental Phonology.” In Current Approaches to Phonological Theory, edited by D. A. Dinnsen, 202–22. Indiana University Press Bloomington, IN.

Goodman, M. W., J. Crowgey, F. Xia, and E. M. Bender. 2015. “Xigt: Extensible Interlinear Glossed Text for Natural Language Processing.” Language Resources and Evaluation 49 (2): 455–85.

Hieber, Daniel W. 2020. “Digitallinguistics/Scription: V0.7.0.” Zenodo. https://doi.org/10.5281/zenodo.3937864.

Jacques, Guillaume. 2021. A Grammar of Japhug. Vol. 1. Language Science Press.

Kimmelman, V. 2012. “Word Order in Russian Sign Language.” Sign Language Studies 12 (3): 414–45.

Kuznetsova, A., A. Imashev, M. Mukushev, A. Sandygulova, and V. Kimmelman. 2021. “Using Computer Vision to Analyze Non-Manual Marking of Questions in KRSL.” In Proceedings of the 1st International Workshop on Automatic Translation for Signed and Spoken Languages (AT4SSL), 49–59. Association for Machine Translation in the Americas. https://aclanthology.org/2021.mtsummit-at4ssl.6.

Lewis, William D, and Fei Xia. 2010. “Developing ODIN: A Multilingual Repository of Annotated Language Data for Hundreds of the World’s Languages.” Literary and Linguistic Computing 25 (3): 303–19.

List, Johann Mattis, Christoph Rzymski, Simon Greenhill, Nathanael Schweikhard, Kristina Pianykh, Annika Tjuka, Carolin Hundt, and Robert Forkel, eds. 2021. Concepticon 2.5.0. Leipzig: Max Planck Institute for Evolutionary Anthropology. https://concepticon.clld.org/.

List, Johann-Mattis, Nathaniel A. Sims, and Robert Forkel. 2021. “Toward a Sustainable Handling of Interlinear-Glossed Text in Language Documentation.” ACM Trans. Asian Low-Resour. Lang. Inf. Process. 20 (2). https://doi.org/10.1145/3389010.

Maisak, T., and S. Tatevosov. 2007. “Beyond Evidentiality and Mirativity: Evidence from Tsakhur.” In L’Énonciation médiatisée II, 377–406.

Round, E., M. Ellison, J. Macklin-Cordes, and S. Beniamine. 2020. “Automated Parsing of Interlinear Glossed Text from Page Images of Grammatical Descriptions.” In Proceedings of the 12th Language Resources and Evaluation Conference, 2878–83.

Verhees, S. 2019. “General Converbs in Andi.” Studies in Language. International Journal Sponsored by the Foundation “Foundations of Language” 43 (1): 195–230.

Xie, Y., J. J. Allaire, and G. Grolemund. 2018. R Markdown: The Definitive Guide. CRC Press.

¹ If you want to render a .pdf version you can either use LaTeX and multiple linguistic packages developed for it (see e.g. gb4e, langsci, expex, philex), or you can render .html first and convert it to .pdf afterwards.
² I used annotation for representing orthography, but it is also possible to use this tier for the annotation of words, like here:

HL H L H

eze a za a

np prfx root sfx

‘Eze swept… (Igbo, from (Goldsmith 1979: 209))’

³ Sometimes it makes sense to set this option once for the whole document using the following code: options("lingglosses.italic_transliteration" = FALSE).
⁴ The table generated with markdown is visually poor. There are a lot of other ways to generate a table in R: kable() from knitr, the kableExtra package, the DT package and many others.
⁵ It is easier to generate Markdown or LaTeX tables with Libre Office or MS Excel and then use an online table generator website like https://www.tablesgenerator.com/.
⁶ The script for collecting glosses is available here. The list was manually corrected and merged with glosses from other sources. Glosses of this kind are marked in the glosses_df dataset as lingglosses in the source column.

Introduction to `lingglosses`

George Moroz

2025-03-05


