This repo contains the inflection tables for Gulf Arabic (ISO 639-3 `afb`).

- `afb`: entries based on lemmas that appear in the Annotated Gumar Corpus.
- `afb.args`: a UniMorph 4.0 compatible version of `afb`.
- `afb.gloss`: English glosses for the lemmas in `afb`.
- `README.md`: this file.
- The inflections of most of the verb lemmas were generated through the CamelTools morphological generator component (demo, API) (Obeid et al., 2020). The morphological database used is CALIMA-GLF (Khalifa et al., 2017).
- The forms for all nominal lemmas and some verbs are those that appear in the Annotated Gumar Corpus (Khalifa et al., 2018), so their paradigms might not be complete. Additionally, to reduce noisy entries arising from possible errors in the gold annotations, we used Morph/POS statistics from the same corpus to filter out incorrect forms as much as possible.
- The POS and morphological features are then mapped to UniMorph according to the current schema (Sylak-Glassman, 2016).
- The core POS of the lemmas are Verbs, Nouns, and Adjectives.
- The total number of lemmas is 6,707, with the following POS distribution:
  - `V`: 2,183 (32.6%) lemmas
  - `N`: 3,003 (44.8%) lemmas
  - `ADJ`: 1,520 (22.7%) lemmas
- The Annotated Gumar Corpus (Khalifa et al., 2018). The corpus can be found here.
- Clitics were not included or marked in the inflection tables. The only clitic included is the determiner `Al+`, in order to be consistent with the other Arabic varieties in UniMorph.
- All the lemmas are diacritized. However, only the verb forms coming from CALIMA-GLF are diacritized. Removing all the diacritics is straightforward and can be done through a simple regex. Alternatively, CamelTools provides a dediacritization utility: an API and a CLI.
- All nominals with `Al+` are tagged with `DEF` for definiteness. All nominals without `Al+` are listed twice: once as `INDF` and once as `PSSD`. That is because in most cases possession marking is not overt due to the orthography.
- All verbs are by default in the active voice.
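
The diacritic removal mentioned in the notes can be sketched with a single regular expression. This is a minimal illustration, not the CamelTools utility: the character set (the Arabic marks U+064B to U+0652, plus the superscript alef U+0670) and the function name `strip_diacritics` are assumptions, and the set may need adjusting to match the diacritics actually present in the data.

```python
import re

# Assumed diacritic set: fathatan through sukun (U+064B-U+0652)
# plus the superscript alef (U+0670).
_DIACRITICS = re.compile(r"[\u064B-\u0652\u0670]")


def strip_diacritics(text: str) -> str:
    """Return `text` with the assumed set of Arabic diacritics removed."""
    return _DIACRITICS.sub("", text)


# Example: dediacritize a lemma and one of its verb forms.
for word in ["بَركَن", "تبَركِنُون"]:
    print(word, "->", strip_diacritics(word))
```

Base letters are left untouched, so the same function can be applied to a whole line of an inflection table without affecting the feature column.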
Salam Khalifa and Nizar Habash (CAMeL Lab @ NYU Abu Dhabi)
The complete inflection table for the verb lemma بَركَن 'park (a vehicle)'
بَركَن بَركَنتَوا V;PFV;PL;2
بَركَن تبَركِن V;IPFV;FEM;SG;3
بَركَن بَركَنَوا V;PFV;PL;3
بَركَن بَركِنَوا V;IMP;PL;2
بَركَن بَركَن V;PFV;MASC;SG;3
بَركَن بَركَنت V;PFV;MASC;SG;2
بَركَن بَركَنت V;PFV;SG;1
بَركَن بَركَنَّا V;PFV;PL;1
بَركَن بَركَنَت V;PFV;FEM;SG;3
بَركَن تبَركِنُون V;IPFV;PL;2
بَركَن اَبَركِن V;IPFV;SG;1
بَركَن تبَركِنِين V;IPFV;FEM;SG;2
بَركَن يبَركِنُون V;IPFV;PL;3
بَركَن بَركَنتِي V;PFV;FEM;SG;2
بَركَن بَركِن V;MASC;IMP;SG;2
بَركَن يبَركِن V;IPFV;MASC;SG;3
بَركَن تبَركِن V;IPFV;MASC;SG;2
بَركَن نبَركِن V;IPFV;PL;1
بَركَن بَركِنِي V;FEM;IMP;SG;2
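
To work with a table like the one above programmatically, each row can be split into lemma, inflected form, and a set of UniMorph features. The sketch below assumes whitespace-separated columns and forms without internal spaces; `parse_rows`, `index_by_features`, and the three sample rows are illustrative only.

```python
from typing import Dict, FrozenSet, List, Tuple

Row = Tuple[str, str, FrozenSet[str]]


def parse_rows(text: str) -> List[Row]:
    """Parse 'lemma form FEAT;FEAT;...' lines into (lemma, form, features)."""
    rows: List[Row] = []
    for line in text.splitlines():
        parts = line.split()
        if len(parts) != 3:  # skip blank or malformed lines
            continue
        lemma, form, feats = parts
        rows.append((lemma, form, frozenset(feats.split(";"))))
    return rows


def index_by_features(rows: List[Row]) -> Dict[FrozenSet[str], List[str]]:
    """Map each feature bundle to the forms that realize it."""
    index: Dict[FrozenSet[str], List[str]] = {}
    for _lemma, form, feats in rows:
        index.setdefault(feats, []).append(form)
    return index


sample = """\
بَركَن بَركَنت V;PFV;MASC;SG;2
بَركَن بَركَنت V;PFV;SG;1
بَركَن يبَركِن V;IPFV;MASC;SG;3
"""
index = index_by_features(parse_rows(sample))
# Feature order does not matter because bundles are stored as sets.
print(index[frozenset({"V", "IPFV", "SG", "3", "MASC"})])
```

Storing the features as a set rather than a string keeps lookups robust to the varying feature order seen in the table (for example `V;MASC;IMP;SG;2` next to `V;IMP;PL;2`).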
The complete inflection table for the noun lemma سِيّارَة 'car'
سِيّارَة السيارة N;DEF;FEM;SG
سِيّارَة سياير N;INDF;FEM;PL
سِيّارَة سيار N;INDF;FEM;SG
سِيّارَة سيار N;FEM;SG;PSSD
سِيّارَة سيارتين N;INDF;FEM;DU
سِيّارَة السياير N;DEF;FEM;PL
سِيّارَة سياير N;FEM;PL;PSSD
سِيّارَة سيارتين N;FEM;DU;PSSD
سِيّارَة السيارتين N;DEF;FEM;DU
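
The definiteness convention described in the notes (forms with `Al+` get `DEF`; forms without it are listed once as `INDF` and once as `PSSD`) can be sketched as follows. This illustrates the rule only and is not the authors' pipeline: it assumes `Al+` surfaces as the written prefix ال, which would misfire on words whose stem happens to begin with those letters, and `definiteness_tags` is a hypothetical helper.

```python
from typing import List

AL_PREFIX = "\u0627\u0644"  # ال, assumed surface form of the determiner Al+


def definiteness_tags(form: str) -> List[str]:
    """Return the definiteness/possession tags a nominal form is listed under."""
    if form.startswith(AL_PREFIX):
        return ["DEF"]
    # Possession is usually not overt in the orthography, so both readings are kept.
    return ["INDF", "PSSD"]


for form in ["السيارة", "سياير"]:
    print(form, definiteness_tags(form))
```

This matches the pattern in the table above, where each form without the determiner appears in two rows and each form with it appears in one.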