Proto-Semitic is the hypothetical proto-language of the Semitic languages.



The earliest attestations of a Semitic language are in Akkadian, dating to around the 23rd century BC (see Sargon of Akkad), and in Eblaite, although earlier evidence of Akkadian comes from personal names in Sumerian texts. Researchers in Egypt also claim to have discovered Canaanite snake spells that "date from between 3000 and 2400 B.C."[1]


Migration from Arabia into the Fertile Crescent has been a constant pattern of human movement in the Middle East since antiquity. As such, the Arabian Peninsula has long been accepted as the original Semitic Urheimat by a majority of scholars.[2][3][4][5] Older theories positing Mesopotamia as the Semitic homeland were severely undermined by the identification in the late 19th century of the non-Semitic Sumerian culture in Mesopotamia, which is now generally believed to have predated the Semitic culture there by many centuries. A mainstream view nowadays maintains that the first wave of Semitic speakers infiltrated Mesopotamia in the first half of the third millennium BC. A second, Amorite wave is generally believed to have followed around 2000 BC. This Amorite wave was responsible for the emergence of the Old Babylonian Empire and of such urban centers in the west as Ugarit. An Aramean wave of migration towards the Fertile Crescent followed in the second half of the second millennium BC. The emergence of the Israelite nation in Canaan would have occurred around this time, although the origin of the Israelites remains a matter of debate. The Arab waves of migration toward the Fertile Crescent started in the first millennium BC and culminated in the 7th century CE with the great Islamic expansion, which by far surpassed all previous expansions, reaching a maximum extent from southern France to the borders of China.

The presence of a non-Semitic culture predating the Canaanites in Canaan has not been proven by archeology. However, a traditional account transmitted by many Greek historians and accepted unanimously in pre-modern times points to a Phoenician (Canaanite) origin in Mesopotamia, where the Phoenicians had reportedly arrived from the Arabian shores of the Persian Gulf. Although many attempts have been made to discredit this entire story, it remains accepted by the highest living authority on the subject of Phoenicia.[6]

Given that Proto-Semitic would have been an Afroasiatic language, some believe that its first prehistoric speakers came from Ethiopia, which would have been the Proto-Semitic homeland.[7] Most scholars, however, believe that South Semitic speakers crossed the Yemen gap to Africa before the 8th century BC (see Dʿmt). This is also supported by the presence in Proto-Semitic of nouns that seemingly make an African origin for the language impossible: ice, oak, horse and camel. The camel[8] and horse[9] did not arrive in Africa until nearly two thousand years after Semitic languages were being written in Mesopotamia.

Other, more recent work suggests Syria/Mesopotamia as the homeland of Proto-Semitic, on the basis of the flora and fauna named in its reconstructed vocabulary, which include oak, pistachio and almond trees and the horse.[citation needed] The presence of a word for ice and of four different words for hill also suggests a colder, more mountainous area than Arabia.[citation needed] Eblaite, one of the oldest Semitic languages, turned out when deciphered to have almost no non-Afroasiatic nouns in its lexicon, suggesting a very long presence in the Syria area.[citation needed] Bitumen and naphtha were also well known and have root words, and these resources are not found in Africa or Arabia but are common in the northern parts of the Levant. Christopher Ehret argues on this basis that there are two possible homelands for Semitic: Northern Mesopotamia, where Western Semitic broke away from Eastern Semitic, or Syria-Palestine. Ehret states: "Because of the many indications that non-Semitic languages predominated in Mesopotamia and all around its northern and eastern flanks in the pre-state eras—and that Akkadian therefore was likely intrusive to that region—the second solution seems by far the more probable of the two. The Syria-Palestine regions, as the part of Asia nearest and more directly connected to Africa, also make much better sense as the proto-Semitic territory, considering the solely African locations of all the rest of the Afrasan family."[10] A more recent study by Ehret and others, using Bayesian techniques in phylogenetic analysis, identifies a place of origin in the Levant, with Akkadian as the most basal of the Semitic languages.[11]

Recently, Juris Zarins has suggested that a Circum-Arabian Nomadic Pastoral Complex of cultures developed in the period of the 6200 BCE climatic crisis, stretching from Southern Palestine down the Red Sea shoreline and northeastward into Syria and Iraq, and that it spread Proto-Semitic languages through the region.[12] This complex may have developed from the fusion of Harifian and Pre-Pottery Neolithic B cultures in Southern Palestine.

As the Harifian used the Outacha retouch point technique found earlier in the Fayyum, it has been suggested that Proto-Semitic may have come from Egypt across the Sinai.[13] Given that, of all the Afro-Asiatic languages, Ancient Egyptian is the one most closely related to Semitic,[14] this origin is also distinctly possible. However, regarding resemblances among language subgroups, recent "research into the lexicon would seem to suggest a closer relationship between Chadic and ancient Egyptian".[15]

Sound system

The reconstruction of Proto-Semitic was originally based primarily on the Arabic language.[16] Thus, the phonemic inventory of reconstructed Proto-Semitic is very similar to that of Arabic, with only one phoneme fewer in Arabic than in reconstructed Proto-Semitic. As such, Proto-Semitic is generally reconstructed as having the following phonemes (as usually transcribed in Semitology):[17]


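Consonant phonemes
Manner | Labial | Interdental | Dental/alveolar (central) | Dental/alveolar (lateral) | Palatal | Velar | Pharyngeal | Glottal
Nasal | *m [m] | | *n [n] | | | | |
Stop, voiceless | *p [p] | | *t [t] | | | *k [k] | | *’ [ʔ]
Stop, voiced | *b [b] | | *d [d] | | | *g [ɡ] | |
Stop, emphatic | | | *ṭ | | | *q | |
Fricative, voiceless | | *ṯ | *s₁ (š), *s₃ (s) | *s₂ (ś) | | *ḫ [x] | *ḥ [ħ] | *h [h]
Fricative, voiced | | *ḏ | *z | | | *ġ [ɣ] | *ʻ [ʕ] |
Fricative, emphatic | | *ṱ | *ṣ | *ṣ́ | | | |
Trill | | | *r [r] | | | | |
Approximant | | | | *l [l] | *y [j] | *w [w] | |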

The probable phonetic realization of most consonants is straightforward and is indicated in the table with the IPA. Two subsets of consonants, however, call for further comment:


The sounds notated here as "emphatic" occur in nearly all Semitic languages, as well as in most other Afroasiatic languages, and are generally reconstructed as glottalized in Proto-Semitic.[nb 1] Thus, *ṭ for example represents [tʼ]. (See below for the fricatives/affricates.)

In modern Semitic languages, emphatics are variously realized as pharyngealized (Arabic, Aramaic: e.g. [tˤ]), glottalized (Ethiopian Semitic languages, Modern South Arabian languages: e.g. [tʼ]), or unaspirated (Turoyo of Tur-Abdin: e.g. [t˭]);[18] Modern Hebrew and Maltese are exceptions to this general retention, with all emphatics merging into plain consonants.

An emphatic labial occurs in some Semitic languages but it is unclear whether it was a phoneme in Proto-Semitic.

  • Hebrew developed an emphatic /ṗ/ phoneme to represent unaspirated /p/ in Iranian and Greek.[19]
  • Ge'ez is unique among Semitic languages for contrasting all three of /p/, /f/, and /pʼ/. While /p/ and /pʼ/ mostly occur in loanwords (especially Greek), there are many other occurrences where the origin is less clear (e.g. hepʼä 'strike', häppälä 'wash clothes').[20]


PS is reconstructed as containing six "s"-type sounds: voiced *z, two emphatics *ṣ, *ṣ́, and three plain voiceless *s, *ś, *š. The exact nature of the distinctions within each class has always been a perplexing problem.

In the current view, the difference between *s and *š is taken to be that between an affricate and a plain fricative [ts, s]. As the consonants *z and *ṣ are specifically the voiced and emphatic counterparts of *s, they are reconstructed as affricates as well: [dz, tsʼ].

Affricates in PS were proposed long ago, but the idea seems to have met general acceptance only since Alice Faber (1981)[citation needed], displacing the older approach, which treated all four as fricatives: *s, *z, *ṣ as plain alveolar [s, z, sʼ], and *š as postalveolar [ʃ].

The Semitic languages that have survived to the modern day mostly have fricatives for these consonants. Ethiopic languages and Hebrew (in many reading traditions) still have an affricate for *ṣ.[21] Many sources of evidence lend plausibility to further affricates not only in Proto-Semitic but also in ancient Semitic languages:

  • The sign from the Old Akkadian script representing s, z, ṣ was borrowed by other languages (e.g. Hittite) to represent affricates.[22]
  • In Akkadian, underlying ||t, d, ṭ + š|| was realized as ss. This is much more natural if the law was phonetically ||t, d, ṭ + [s]|| → [tts].[22]
  • The Canaanite sound change of *ṯ is also much more natural if *š was [s], than if it was [ʃ].[citation needed]
  • Egyptian transcriptions of Semitic names and loanwords render *z, *s, *ṣ as dz and ts.
  • Aramaic and Syriac had an affricated realization of *ṣ up to some point, as seen in Old Armenian loanwords (e.g. Aram. צרר 'bundle, bunch' → OArm. 'crar' /tsɹaɹ/).[22]
  • Older Semitic borrowings in Armenian also have /tsʰ/ and /dz/ for *s and *z.[21]
  • Other branches of Afro-Asiatic also have affricates corresponding to these consonants, and /*s/ for PS /*š/.[citation needed]
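
The Akkadian rule cited in the list above, ||t, d, ṭ + š|| → ss, can be written out as a small rewrite function. The following is only a minimal sketch and not a full Akkadian phonology: the function name `realize_cluster` and the treatment of morphemes as flat strings are assumptions made for illustration, and the example bīt + šu → bīssu 'his house' is a standard textbook form rather than one taken from the sources cited here.

```python
import unicodedata

def nfc(text: str) -> str:
    """Normalize to NFC so that letters with diacritics are single code points."""
    return unicodedata.normalize("NFC", text)

# Stem-final dental stops that trigger the rule ||t, d, ṭ + š|| -> ss.
DENTAL_STOPS = tuple(nfc(c) for c in ("t", "d", "ṭ"))
SHIN = nfc("š")

def realize_cluster(stem: str, suffix: str) -> str:
    """Join stem and suffix, rewriting a final dental stop + initial š as ss."""
    stem, suffix = nfc(stem), nfc(suffix)
    if stem.endswith(DENTAL_STOPS) and suffix.startswith(SHIN):
        return stem[:-1] + "ss" + suffix[1:]
    # No dental stop before š: the morphemes are simply concatenated.
    return stem + suffix

print(realize_cluster("bīt", "šu"))  # bīssu
print(realize_cluster("bēl", "šu"))  # bēlšu
```

The sketch models only the written outcome ss; the phonetic interpretation [tts] discussed above is not represented.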

Judging by evidence from South Arabian[citation needed], it was determined that *ś, *ṣ́ were likely not sibilants, but lateral obstruents: [ɬ, (t)ɬʼ] (where the emphatic can also be reconstructed as an affricate).

The shift *š→h occurred in most Semitic languages (besides Akkadian, Minaian, Qatabanian) in grammatical and pronominal morphemes, and it is unclear whether the reduction of *š began in a daughter proto-language or in PS itself. Given this, some suggest that a weakened *š may have been a separate phoneme in PS.[23]

Reflexes of Proto-Semitic sounds in daughter languages


Each Proto-Semitic phoneme was reconstructed to explain a certain regular sound correspondence between various Semitic languages. Note that Latin letter values (italicized) for extinct languages are a question of transcription; the exact pronunciation is not recorded.

Most of the attested languages have merged a number of the reconstructed original fricatives, though South Arabic retains all fourteen (and has added a fifteenth from *p → f).

In Aramaic and Hebrew, all non-emphatic stops were softened to fricatives when occurring singly after a vowel, leading to an alternation that was often later phonemicized as a result of the loss of gemination.
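
The spirantization rule described above can be sketched as a function over a transcribed word. This is a rough illustration under stated assumptions: the fricative symbols in `SPIRANT`, the vowel inventory in `VOWELS`, and the input strings are schematic choices made for this example and are not attested forms or values from the sources. A stop counts as geminate here when it is adjacent to an identical letter.

```python
# Hypothetical stop -> fricative mapping; the exact fricative values are an assumption.
SPIRANT = {"b": "v", "g": "ɣ", "d": "ð", "k": "x", "p": "f", "t": "θ"}
# Schematic vowel inventory used only to detect a "post-vocalic" position.
VOWELS = set("aeiouə")

def spirantize(word: str) -> str:
    """Soften non-emphatic stops that occur singly (non-geminate) after a vowel."""
    out = []
    for i, ch in enumerate(word):
        prev = word[i - 1] if i > 0 else ""
        nxt = word[i + 1] if i + 1 < len(word) else ""
        geminate = ch == prev or ch == nxt
        if ch in SPIRANT and prev in VOWELS and not geminate:
            out.append(SPIRANT[ch])
        else:
            out.append(ch)
    return "".join(out)

print(spirantize("katab"))   # kaθav
print(spirantize("kattab"))  # kattav
```

In the second example the geminate tt is left unchanged, which is the contrast that could later become phonemic once gemination was lost.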

In languages exhibiting pharyngealization of emphatics, the original velar emphatic has instead developed into a uvular stop [q].

Proto-Semitic Akkadian Arabic1 Ugaritic Phoenician Hebrew Modern Hebrew Aramaic Ge'ez Modern South Arabian
*b b ب b b 𐤁 b ב /b /v/, /b/ ב /b /b/ /b/
*d d د d d 𐤃 d ד /d /d/ ד /d /d/ /d/
*g g ج ǧ *[ɡʲ]→[d͡ʒ]1 g 𐤂 g ג /g /ɡ/ ג /g /ɡ/ /ɡ/
*p p ف f p 𐤐 p פ /p /f/, /p/ פ /p /f/ /f/
*t t ت t t 𐤕 t ת /t /t/ ת /t /t/ /t/
*k k ك k k 𐤊 k כ /k /χ/, /k/ כ /k /k/ /k/
*ʼ - ء ʼ [ʔ] ʼ 𐤀 ʼ א ʼ /ʔ/, - א ʼ /ʔ/ /ʔ/
*ṭ ط [tˁ] 𐤈 ט /t/ ט /tʼ/ /tˁ/
*ḳ q ق q q q ק q /k/ ק q /kʼ/ /q/
*ḏ z ذ [ð] d 𐤆 z ז z /z/ ז4 4/d /z/ /ð/
*z ز z z ז z /z/
*ṯ š ث [θ] 𐤔 š שׁ š /ʃ/ ש4 4/t /s/ /θ/
*š س s š שׁ š /ʃ/, /h/
*ś ش š [ʃ] שׂ2 ś2 /s/ שׂ4 ś4/s /ɬ/ /ɬ/
*s s س s s 𐤎 s ס s ס s /s/ /s/
*ṱ ظ [ðˁ] ġ ṣ צ /ts/ צ4 4/ /tsʼ/ /θˁ/
*ṣ ص [sˁ] צ /sˁ/
*ṣ́ ض *[ɮˁ]→[dˁ]1 ע ʻ /ɬʼ/ /ɬˁ/
*ġ - غ ġ ġ,ʻ 𐤏 ʻ ע3 ʻ3 /ʔ/, - ק4 ġ4/ʻ /ʕ/ /ɣ/
*ʻ -5 ع ʻ [ʕ] ʻ ע ʻ /ʕ/
*ḫ خ [x] 𐤇 ח /χ/ ח /χ/ /x/
*ḥ -5 ح [ħ] /ħ/ /ħ/
*h - ه h h 𐤄 h ה h /h/, - ה h /h/ /h/
*m m م m m 𐤌 m מ m /m/ מ m /m/ /m/
*n n ن n n 𐤍 n נ n /n/ נ /n/ /n/
*r r ر r r 𐤓 r ר r /ʁ/ ר r /r/ /r/
*l l ل l l 𐤋 l ל l /l/ ל l /l/ /l/
*w w و w w 𐤅 𐤉 /w/ /w/
*y y ي y [j] y 𐤉 y י y /j/ י y /j/ /j/


  1. Arabic pronunciation is that of reconstructed Qur'anic Arabic of the 7th and 8th centuries CE. If the pronunciation of Modern Standard Arabic differs, this is indicated (for example, [ɡʲ]→[d͡ʒ]).
  2. Proto-Semitic *ś appears to have merged with *s in Tiberian Hebrew, but is still distinguished graphically.
  3. Biblical Hebrew as of the 3rd century BCE apparently still distinguished ġ and (based on transcriptions in the Septuagint).
  4. Although early Aramaic (pre-7th century BCE) had only 22 consonants in its alphabet, it apparently distinguished at least 27 of the original 29 Proto-Semitic phonemes, including *ḏ, *ṯ, *ṱ, , *ṣ́, . This conclusion is based on the shifting representation of words etymologically containing these sounds; in early Aramaic writing, they are merged with z, š, , š, q, respectively, but later with d, t, , s, ʻ.[24] (Also note that due to begadkefat spirantization, which occurred after this merger, OAm. t→ṯ and d→ḏ in some positions, so that PS *t,ṯ and *d,ḏ may be realized as either of t,ṯ and d,ḏ respectively.)
  5. These are only distinguished from the zero reflexes of *h, *ʔ by e-coloring adjacent *a, e.g. pS *ˈbaʕal-um 'owner, lord' → Akk. bēlu(m)[25].


Proto-Semitic vowels are in general harder to deduce due to the templatic nature of Semitic languages. The history of vowel changes in the languages makes drawing up a complete table of correspondences impossible, so only the most common reflexes can be given:

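Vowel correspondences in Semitic languages (in proto-Semitic stressed syllables)[26]
Hebrew reflexes are listed for three environments, in this order: /ˈ_| (note 1); /ˈ_Cː (note 2); /ˈ_C|C (note 3). Aramaic reflexes are listed as the usual reflex (note 4), followed by the reflex in /_C|ˈV.
  • *a: Hebrew ā; a; ɛ. Aramaic a; ə. Arabic a. Ge'ez a. Akkadian a, e, ē (note 5).
  • *i: Hebrew ē; e; ɛ, e. Aramaic e, i, WSyr. ɛ; ə. Arabic i. Ge'ez ə. Akkadian i.
  • *u: Hebrew ō; o; o. Aramaic u, o; ə. Arabic u. Ge'ez ə, ʷə (note 6). Akkadian u.
  • *ā: Hebrew ō[nb 2]. Aramaic ā. Arabic ā. Ge'ez ā. Akkadian ā, ē.
  • *ī: Hebrew ī. Aramaic ī. Arabic ī. Ge'ez ī. Akkadian ī.
  • *ū: Hebrew ū. Aramaic ū. Arabic ū. Ge'ez ū. Akkadian ū.
  • *ay|: Hebrew ayi, ay. Aramaic BA, JA ay(i), ē; WSyr. ay/ī & ay/ē. Arabic ay. Ge'ez ay, ē. Akkadian ī.
  • *aw|: Hebrew ō, pausal ˈāwɛ. Aramaic WSyr. aw/ū. Arabic aw. Ge'ez ō. Akkadian ū.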
  1. in a stressed open syllable
  2. in a stressed closed syllable before a geminate
  3. in a stressed closed syllable before a consonant cluster
  4. when the proto-Semitic stressed vowel remained stressed
  5. pS *a,*ā → Akk. e,ē in the neighborhood of pS *ʕ,*ħ and before r.
  6. I.e. pS *g,*k,*ḳ,*χ → Ge'ez gʷ,kʷ,ḳʷ,χʷ / _u
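
Note 6 above amounts to a conditional rule: pS *u yields Ge'ez ə, and a preceding *g, *k, *ḳ or *χ is labialized. A minimal sketch of that rule follows, assuming a single onset consonant and the stressed-syllable reflex given in the table; the function name `geez_reflex_of_u` is hypothetical and chosen only for this illustration.

```python
import unicodedata

def nfc(text: str) -> str:
    """Normalize to NFC so that letters with diacritics are single code points."""
    return unicodedata.normalize("NFC", text)

# Consonants that become labialized before pS *u, as listed in note 6.
LABIALIZING = tuple(nfc(c) for c in ("g", "k", "ḳ", "χ"))

def geez_reflex_of_u(onset: str) -> str:
    """Return the Ge'ez outcome of a pS sequence onset + *u."""
    onset = nfc(onset)
    if onset in LABIALIZING:
        # The rounding of *u survives as labialization on the consonant.
        return onset + "ʷə"
    return onset + "ə"

print(geez_reflex_of_u("k"))  # kʷə
print(geez_reflex_of_u("b"))  # bə
```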

Correspondence of Sounds with other Afroasiatic languages

See table at Proto-Afroasiatic#Consonant correspondences.


Independent Personal Pronouns

English | PS | Akkadian | Arabic (standard) | Arabic (vernacular) | Ge'ez | Hebrew | Aramaic
I | *ʔanāku[nb 3], *ʔaniya | anāku | ʔanā | ʔanā, ʔāniy | ʔana | ʔānoxiy, ʔāniy | ʔanā
Thou (sg., masc.) | *ʔanka → *ʔanta | atta | ʔanta | ʔinta | ʔánta | ʔattāh | ʔantā
Thou (sg., fem.) | *ʔanti | atti | ʔanti | ʔinti | ʔánti | ʔatt | ʔanti
He | *suʔa | | huwa | huwwa | wəʔətu | huwʔ | huwʔ
She | *siʔa | | hiya | hiyya | yəʔəti | hiyʔ | hiyʔ
We | *niyaħnū, *niyaħnā | nīnu | naħnu | niħnā | nəħnā | ʔanaħnuw | náħnā
Ye (dual) | *ʔantunā | | ʔantumā | | | |
They (dual) | *sunā | | humā | | | |
Ye (pl., masc.) | *ʔantunū | attunu | ʔantumu | ʔintū | ʔantəmu | ʔattem | ʔantun
Ye (pl., fem.) | *ʔantinā | attina | ʔantunna | ʔantən, ∅ | ʔantən | ʔatten | ʔanten
They (masc.) | *sunū | sunu | humu | humma | ʔəmuntu | hēm | hinnun
They (fem.) | *sinā | sina | hunna | hən, ∅ | ʔəmāntu | hēn | hinnin

Cardinal numerals

English Proto-Semitic
One *ḥad-, *ʻišt-
Two *ṯin-, *kilʼ-
Three *śalāṯ-[nb 4]
Four *rabaʻ-
Five *ḫamš-
Six *šidṯ-
Seven *šabʻ-
Eight *ṯamān-
Nine *tišʻ-
Ten *ʻaśr-

These are the basic numeral stems without feminine suffixes. Note that in most older Semitic languages, the forms of the numerals from 3 to 10 exhibit gender polarity (also called "chiastic concord" or reverse agreement), i.e. if the counted noun is masculine, the numeral would be feminine and vice versa.
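
The gender polarity just described can be stated as a one-line rule. The sketch below is illustrative only: the function name `numeral_gender` and the gender labels are assumptions, and the function deliberately rejects numerals outside 3 to 10, since the text above makes no claim about them.

```python
def numeral_gender(n: int, noun_gender: str) -> str:
    """Return the gender of the numeral form used with a noun of the given gender."""
    if noun_gender not in ("masculine", "feminine"):
        raise ValueError("noun_gender must be 'masculine' or 'feminine'")
    if not 3 <= n <= 10:
        raise ValueError("this sketch only covers the numerals 3 to 10")
    # Reverse agreement: the numeral takes the gender opposite to the noun's.
    return "feminine" if noun_gender == "masculine" else "masculine"

print(numeral_gender(3, "masculine"))  # feminine
print(numeral_gender(7, "feminine"))   # masculine
```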


  1. ^ This explains why there is no voicing distinction in the emphatic series (which wouldn't be necessary if the emphatics were pharyngealized).
  2. ^ see Canaanite shift
  3. ^ While some believe that *ʔanāku was an innovation in some branches of Semitic utilizing an "intensifying" *-ku, comparison to other Afro-Asiatic 1ps pronouns (e.g. Eg. 3nk, Coptic anak, anok, proto-Berber *ənakkʷ) suggests that this goes back farther. (Dolgopolsky 1999, pp. 10-11.)
  4. ^ This root underwent long-distance assimilation to *ṯalāṯ- in the Central Semitic languages. This parallels the long-distance assimilation of *ś...š→*š...š in proto-Canaanite or proto-North-West-Semitic in the roots *śam?š→*šamš 'sun' and *śur?š→*šurš 'root'. (Dolgopolsky 1999, pp. 61-62.)

  • Kienast, Burkhart. (2001). Historische semitische Sprachwissenschaft.
  • Dolgopolsky, Aron (1999). From Proto-Semitic to Hebrew. Milan: Centro Studi Camito-Semitici di Milano. 
  • Taylor; Francis (1997). The Semitic languages. Cambridge University Press. pp. 572. ISBN 0415057671. 
  • Woodard, Roger (2008). The Ancient Languages of Mesopotamia, Egypt and Aksum. Cambridge University Press. pp. 250. ISBN 0521684978. 

