This repo contains the inflection tables for Egyptian Arabic (ISO 639-3 `arz`).

- `arz`: entries based on lemmas that appear in the Egyptian Penn Arabic Treebank (ARZATB).
- `arz.args`: a UniMorph 4.0 compatible version of `arz`.
- `arz.gloss`: English glosses for the lemmas in `arz`.
- `README.md`: this file.

- The inflections of all the lemmas were generated with the CamelTools morphological generator component (demo, API) (Obeid et al., 2020). The morphological database used is CALIMA-ARZ (Habash et al., 2012). CALIMA-ARZ was designed as an analyzer rather than a generator, and some portions of it were produced by imperfect automatic extensions. As a result, it overgenerated implausible forms. We used Morph/POS statistics from ARZATB to eliminate as many incorrect forms as possible.
- The POS and morphological features were then mapped to UniMorph according to the current schema (Sylak-Glassman, 2016).
- The core POS of the lemmas are Verbs, Nouns, and Adjectives.
- The total number of lemmas is 6,347, with the following POS distribution:
  - `V`: 1,323 (20.8%) lemmas
  - `N`: 3,248 (51.2%) lemmas
  - `ADJ`: 1,776 (28.0%) lemmas
- Egyptian Penn Arabic Treebank (ARZATB): all the lemmas that appear in both ARZATB (Maamouri et al., 2014) and CALIMA-ARZ.
- Clitics were not included or marked in the inflection tables. The only clitic included is the determiner `Al+`, in order to be consistent with the ARZATB.
- All the lemmas and the inflected forms are fully diacritized following the same convention in the ARZATB. Removing all the diacritics is straightforward and can be done through a simple regex. Alternatively, CamelTools provides a dediacritization utility: an API and a CLI.
- All nominals with `Al+` are tagged with `DEF` for definiteness. All nominals without `Al+` appear twice: once as `INDF` and once as `PSSD`. That is because in most cases possession marking is not overt due to the orthography.
- All verbs are by default in the active voice.
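
The regex-based diacritic removal mentioned in the notes above could look roughly like the following. This is a minimal sketch, not the CamelTools utility: the function name `strip_diacritics` is hypothetical, and the character range (U+064B to U+0652 plus the superscript alef U+0670) is an assumption about which marks count as diacritics in these tables.

```python
import re

# Assumption: the diacritics to drop are the Arabic combining marks
# U+064B..U+0652 (tanween, short vowels, shadda, sukun) plus the
# superscript alef U+0670. Adjust the class if the data uses other marks.
_DIACRITICS = re.compile("[\u064B-\u0652\u0670]")


def strip_diacritics(text: str) -> str:
    """Return `text` with the assumed set of Arabic diacritics removed."""
    return _DIACRITICS.sub("", text)


if __name__ == "__main__":
    # Lemmas taken from the example tables in this README.
    for word in ("سِمِع", "كُوَيِّس"):
        print(strip_diacritics(word))
```

Only the combining marks are deleted, so the base letters and any non-Arabic text (such as the feature tags) pass through unchanged.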
Salam Khalifa and Nizar Habash (CAMeL Lab @ NYU Abu Dhabi)
The complete inflection table for the verb lemma سِمِع 'listen'
سِمِع اَسْمَع V;IPFV;SG;1
سِمِع اِسْمَع V;MASC;IMP;SG;2
سِمِع اِسْمَعُوا V;MASC;IMP;PL;2
سِمِع اِسْمَعِي V;FEM;IMP;SG;2
سِمِع تِسْمَع V;IPFV;MASC;SG;2
سِمِع تِسْمَع V;IPFV;FEM;SG;3
سِمِع تِسْمَعُوا V;IPFV;MASC;PL;2
سِمِع تِسْمَعِي V;IPFV;FEM;SG;2
سِمِع سِمِع V;PFV;MASC;SG;3
سِمِع سِمِعت V;PFV;SG;1
سِمِع سِمِعت V;PFV;MASC;SG;2
سِمِع سِمِعتُوا V;PFV;MASC;PL;2
سِمِع سِمِعتِي V;PFV;FEM;SG;2
سِمِع سِمِعنا V;PFV;PL;1
سِمِع سِمْعُوا V;PFV;MASC;PL;3
سِمِع سِمْعِت V;PFV;FEM;SG;3
سِمِع نِسْمَع V;IPFV;PL;1
سِمِع يِسْمَع V;IPFV;MASC;SG;3
سِمِع يِسْمَعُوا V;IPFV;MASC;PL;3
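
Rows like the ones above can be loaded into a lookup keyed by feature bundle. The sketch below assumes each row has exactly three whitespace-separated columns (lemma, inflected form, semicolon-separated tags), which matches the rows shown here but may differ from the actual file delimiter; `load_paradigm` is a hypothetical helper name.

```python
def load_paradigm(text: str) -> dict:
    """Map each feature bundle (as a frozenset of tags) to its form.

    Assumes three whitespace-separated columns per row:
    lemma, inflected form, and semicolon-separated tags.
    """
    paradigm = {}
    for line in text.splitlines():
        if not line.strip():
            continue  # skip blank lines
        _lemma, form, tags = line.split()
        paradigm[frozenset(tags.split(";"))] = form
    return paradigm


# Three rows copied from the verb table above.
sample = """\
سِمِع اَسْمَع V;IPFV;SG;1
سِمِع سِمِعنا V;PFV;PL;1
سِمِع نِسْمَع V;IPFV;PL;1
"""

paradigm = load_paradigm(sample)
first_plural_imperfective = paradigm[frozenset({"V", "IPFV", "PL", "1"})]
print(first_plural_imperfective)
```

Using a `frozenset` as the key makes the lookup independent of tag order, which is not fixed across the tables (compare `ADJ;DEF;MASC;SG` with `ADJ;MASC;SG;PSSD`).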
The complete inflection table for the adjective lemma كُوَيِّس 'good'
كُوَيِّس الكُوَيِّس ADJ;DEF;MASC;SG
كُوَيِّس الكُوَيِّسَة ADJ;DEF;FEM;SG
كُوَيِّس الكُوَيِّسِين ADJ;DEF;MASC;PL
كُوَيِّس كُوَيِّس ADJ;INDF;MASC;SG
كُوَيِّس كُوَيِّس ADJ;MASC;SG;PSSD
كُوَيِّس كُوَيِّسَة ADJ;INDF;FEM;SG
كُوَيِّس كُوَيِّسَة ADJ;FEM;SG;PSSD
كُوَيِّس كُوَيِّسِين ADJ;INDF;MASC;PL
كُوَيِّس كُوَيِّسِين ADJ;MASC;PL;PSSD
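
The adjective table illustrates the definiteness convention: forms with the determiner carry `DEF`, and bare forms appear once as `INDF` and once as `PSSD`. A small consistency check along those lines might look like this. It is a sketch under the assumption that rows are three whitespace-separated columns; `definiteness_by_form` is a hypothetical helper name.

```python
from collections import defaultdict

DEFINITENESS_TAGS = {"DEF", "INDF", "PSSD"}


def definiteness_by_form(text: str) -> dict:
    """Collect the definiteness tags seen for each inflected form."""
    tags_by_form = defaultdict(set)
    for line in text.splitlines():
        if not line.strip():
            continue  # skip blank lines
        _lemma, form, tags = line.split()
        tags_by_form[form] |= DEFINITENESS_TAGS & set(tags.split(";"))
    return dict(tags_by_form)


# Masculine singular rows copied from the adjective table above.
rows = """\
كُوَيِّس الكُوَيِّس ADJ;DEF;MASC;SG
كُوَيِّس كُوَيِّس ADJ;INDF;MASC;SG
كُوَيِّس كُوَيِّس ADJ;MASC;SG;PSSD
"""

result = definiteness_by_form(rows)
for form, tags in result.items():
    # Each form should be either definite only, or both INDF and PSSD.
    status = "ok" if tags in ({"DEF"}, {"INDF", "PSSD"}) else "unexpected"
    print(form, sorted(tags), status)
```

The check groups by surface form rather than detecting the determiner prefix, so it does not need to guess whether a word-initial sequence is really the determiner.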