Analyzing Language on your Finds

cti4sw · Dec 28, 2012

Often enough I see posts from people asking what language something is in and what it translates to. So many languages bear similarities to other languages that it would help if the finder could reasonably analyze what language the find is in rather than guessing. So here is a short guide to assist in language analysis for nonlinguistically-inclined people, from a former US Navy linguist.

Written language in general has three accepted categories: abjads (no vowel representation), abugidas (diacritic vowel representation), and alphabets (vowels written as full letters).

From there, there are several types, but 4 major ones: phonetic (symbol for sound), syllabic (symbol for syllable/confluence of sounds), cuneiform/hieroglyphic (image for sound/word), and logographic (symbol for word). Some may argue that logographic and cuneiform are the same, and for the most part they'd be correct, except that logographic languages generally represent words only and not individual sounds. From syllabic we also have alphasyllabic (another name for an abugida), in which each symbol is a consonant with a vowel mark attached, vs. syllabic, in which each syllable gets its own separate symbol.

I used Google Translate for the sample sentences; the Spanish and French translations were accurate, so I have no reason to believe that the others won't be.

Here is a brief guide on modern languages:

Level 1: Romance Languages
Romance languages are Latin-based languages: Spanish, French, Italian, Portuguese, and Romanian.
They are generally considered to be the easiest languages to learn.
Romance languages are read from left to right.

Spanish words can be easily recognized by use of the tilde over N (ñ). French, Italian, and Romanian do not use tildes; Portuguese uses them only over vowels (ã, õ). In Spanish, "LL" was traditionally treated as its own letter of the alphabet and may open a word; again, not so with the other languages listed here. Spanish is also essentially the only major language that uses inverted punctuation, such as ¡ and ¿.

A sample sentence for reference: "Ayer encontré dos monedas de plata y un anillo de oro detrás de la escuela. ¡Arriba!"

French contracts short words (pronouns, articles, "de") before a vowel with an apostrophe far more often than the other Romance languages do. Examples: "Je t'aime", "hors d'œuvres", "l'école". French frequently uses an X to mark the plural: "château"/"châteaux". French also uses a wider variety of accents: è, é, ê, ë.

The sample sentence in French: "Hier, j'ai trouvé deux pièces d'argent et un anneau d'or dans le parc derrière l'école."

Italian bears more similarities to Spanish than French, but is still distinct due to frequent consonant doubling, especially S and Z. It does sometimes use a contracted form of the word "of": "d'oro" instead of "di oro". Italian also combines "of" with the gendered article: "del" and "della".
The letter combination "uo" is often seen in Italian but rarely in French or Spanish.

The sample sentence in Italian: "Ieri ho trovato due monete d'argento e un anello d'oro al parco dietro la scuola."

Portuguese and French both use the letter ç, and Portuguese letters bear written similarity to other Romance languages, but their pronunciation may differ: a final "M" nasalizes the vowel before it instead of sounding like "M". Portuguese does use the tilde, but over vowels (ã, õ) rather than over N. For the "NY" sound, Portuguese uses the letter combination "NH" and Italian uses "GN".

The sample sentence in Portuguese: "Ontem encontrei duas moedas de prata e um anel de ouro no parque atrás da escola."

I know little of Romanian grammar, but I can see that Romanian uses these letters, where the others do not: ă, ș, ț
Romanian appears to be most similar to Latin in vocabulary and most similar to Italian & French in grammar.

The sample sentence in Romanian: "Ieri am găsit două monede de argint și un inel de aur de la parcul din spatele școlii."
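
For anyone who wants to automate the character checks above, here is a minimal Python sketch. The function name `guess_romance` and the marker sets are my own assumptions for illustration, not an authoritative list. Spanish is keyed on ñ and inverted punctuation as described above; the Portuguese and Romanian sets are characters I believe are distinctive. French and Italian are left out because they have no character that is theirs alone, so a result of "nothing found" does not rule anything out.

```python
import unicodedata

# Hypothetical marker characters (illustrative assumptions, not exhaustive).
ROMANCE_MARKERS = {
    "Spanish": set("ñ¿¡"),
    "Portuguese": set("ãõ"),
    "Romanian": set("ășțşţ"),
}

def guess_romance(text):
    """Return the languages whose marker characters occur in the text."""
    # Normalize so that letter + combining accent becomes one character.
    chars = set(unicodedata.normalize("NFC", text).lower())
    return sorted(lang for lang, marks in ROMANCE_MARKERS.items() if chars & marks)

print(guess_romance("¡Arriba!"))
print(guess_romance("Ieri am găsit două monede"))
```

An empty list just means no marker was seen, which is common on coins and short inscriptions.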

Level 2: Germanic Languages
The Germanic languages include English, German, Dutch, Afrikaans, Norwegian, Swedish, Danish, Icelandic, and Faroese.
Germanic languages are very guttural and difficult to learn.
Germanic languages are read from left to right.

English is pretty obvious; this entire post is in (American) English. True English words do not use any form of written stress or accents. English does not generally combine multiple words to form one, unlike e.g. the German "Kindergarten" ("children" + "garden"). Many modern English nouns are actually "stolen" words from other languages: alcohol, algebra, bodega. In English, adjectives almost always precede the noun and never change form for number or gender: yellow car, smelly feet.

The sample sentence in English: "Yesterday, I found 2 silver coins and a gold ring at the park behind the school."

German is the best-known member of this language category. German puts the most stress on the first syllable of its words. It also has the second-largest individual vowel list (14) behind Swedish (17). German verbs, like those of other Germanic languages, must precede or be preceded by at least one noun as subject. German also capitalizes all nouns. It has a few non-English letters: ä, ö, ü, and ß.

The sample sentence in German: "Gestern fand ich 2 Silbermünzen und einen goldenen Ring im Park hinter der Schule."

Afrikaans traces its origins to Dutch colonization in Africa. It uses a mix of Dutch and tribal vocabulary with Dutch grammar. Dutch bears similarities to some German dialects but not mainstream German, much like Romance language similarities.

The sample sentence in Afrikaans: "Gister het ek het 2 silwer munte en 'n goue ring by die park agter die skool."
The sample sentence in Dutch: "Gisteren vond ik twee zilveren munten en een gouden ring in het park achter de school."

Norwegian, Swedish, and Danish all descend from Old Norse, which was originally written in runes, a phonetic alphabet. You can see the similarities to German grammar, but also how these languages retain their own vocabulary.

The sample sentence in Norwegian: "I går fant jeg to sølvmynter og en gullring i parken bak skolen."
The sample sentence in Swedish: "Igår hittade jag två silvermynt och en guldring i parken bakom skolan."
The sample sentence in Danish: "I går, fandt jeg to sølvmønter og en guldring i parken bag skolen."

Icelandic and Faroese are island languages of Iceland and the Faroe Islands. Again, Germanic grammar coupled with native vocabulary. Iceland & the Faroes were largely settled by Vikings. Unfortunately, Google Translate doesn't have a Faroese option.

The sample sentence in Icelandic: "Í gær fann ég tvö silfur mynt og gullhring í garðinum á bak við skólann."

Gaelic (Irish, in this case) is a Celtic language rather than a Germanic one, but it turns up on finds often enough to include here for comparison:
The sample sentence in Gaelic: "Inné, fuair mé dhá bhonn airgid agus fáinne óir ar an pháirc taobh thiar den scoil."
Gaelic does not trace its roots to the Scandinavian runic languages; its own early script was Ogham. I am no expert on that, though.

Level 3: Cyrillic-Based Languages & Indo-European Languages
Cyrillic is not actually a language class but an alphabet, traditionally credited to St. Cyril & St. Methodius and their disciples in the 9th century. It's a true alphabet, with separate letters for consonants and vowels. Cyrillic-based languages are read from left to right. Russia, Ukraine, Belarus, Bulgaria, Serbia, North Macedonia, Mongolia, and a few northern -istans all use a Cyrillic alphabet; Bosnia uses it alongside the Latin alphabet.
It is very difficult for non-speakers to differentiate between the Cyrillic varieties, so only a Russian sample will be shown.

The sample sentence in Russian: "Вчера я нашел две серебряные монеты и золотые кольца в парке за школой."

Greek is an Indo-European language with its own alphabet, shared by its many dialects. The Greek alphabet is a true alphabet (it does have vowel letters), and Cyrillic was largely derived from it, which is why the two are similar in appearance.

The sample sentence in Greek: "Χθες, βρήκα δύο ασημένια νομίσματα και ένα χρυσό δαχτυλίδι στο πάρκο πίσω από το σχολείο."
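
Telling Cyrillic from Greek (or either from Latin) by eye takes practice, but if you can type or paste the characters, a few lines of Python can do it. This is only a sketch: the function name `dominant_script` is my own invention, and it relies on the fact that official Unicode character names begin with the script name (for example "CYRILLIC CAPITAL LETTER VE"). It identifies the script, not the language, so it cannot tell Russian from Ukrainian.

```python
import unicodedata
from collections import Counter

def dominant_script(text):
    """Return the most common script prefix among the letters in text."""
    counts = Counter()
    for ch in text:
        if ch.isalpha():                      # skip digits, spaces, punctuation
            name = unicodedata.name(ch, "")   # "" if the character has no name
            if name:
                counts[name.split()[0]] += 1  # first word of the name is the script
    return counts.most_common(1)[0][0] if counts else None

print(dominant_script("Вчера я нашел две серебряные монеты"))
print(dominant_script("Χθες, βρήκα δύο ασημένια νομίσματα"))
```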

Level 4: Middle Eastern & Indo-Aryan Languages
The Middle Eastern group here covers Arabic, Hebrew, Aramaic, Amharic, Farsi, Urdu, and Tigrinya. Indo-Aryan languages refer to the languages of India and Pakistan: Punjabi, Hindi, and Sanskrit. Arabic, Hebrew, Aramaic, Farsi, and Urdu are read from right to left; Amharic, Tigrinya, Hindi, and Sanskrit are read from left to right, and Punjabi goes either way depending on the script used. Arabic and Farsi use nearly the same script, with the exception of four additional Farsi letters: P (پ), CH (چ), ZH (ژ), and G (گ). No Level 4 language uses capitalization, and Arabic, Hebrew, and Farsi use the adjective-follows-noun pattern.

The sample sentence in Arabic: ".أمس، وجدت الفضة سنتين وخاتم من الذهب في الحديقة وراء المدرسة"
The sample sentence in Hebrew: ".אתמול, מצאתי שני מטבעי כסף וטבעת זהב בפרק שמאחורי בית הספר"
The sample sentence in Farsi: ".دیروز، من دو سکه نقره و یک انگشتر طلا در پارک پشت مدرسه"
The sample sentence in Hindi: ".कल, मैं दो चांदी के सिक्के और स्कूल के पीछे पार्क में एक सोने की अंगूठी पाया"

Neither Arabic nor Farsi (Persian/Iranian) has a separate print style; both are cursive-only. Each letter has at least 2 and up to 4 forms depending on where in the word it appears (beginning, middle, end, and standalone). Each letter has at least an end form and a standalone form.
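
Reading direction is one of the quickest ways to narrow a find down. As a rough sketch (the function name `reading_direction` is hypothetical), Python's standard `unicodedata.bidirectional` reports each character's Unicode direction class: "R" and "AL" mark right-to-left letters such as Hebrew and Arabic, and "L" marks left-to-right letters. Counting them gives a reasonable guess for a typed-in inscription.

```python
import unicodedata

def reading_direction(text):
    """Guess 'rtl' or 'ltr' from the direction classes of the characters."""
    classes = [unicodedata.bidirectional(ch) for ch in text]
    rtl = sum(1 for c in classes if c in ("R", "AL"))  # Hebrew, Arabic, etc.
    ltr = sum(1 for c in classes if c == "L")          # Latin, Greek, Cyrillic, etc.
    if rtl == 0 and ltr == 0:
        return "unknown"                               # digits or punctuation only
    return "rtl" if rtl > ltr else "ltr"

print(reading_direction("אתמול, מצאתי שני מטבעי כסף"))
print(reading_direction("Yesterday, I found 2 silver coins"))
```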

Level 5: Asian Languages
Asian languages are largely logographic languages. They are by far the most difficult languages to learn. A few, such as Korean and the languages of the southeast (Vietnam, Laos, etc.), use phonetic alphabets but retain their Asian roots. The major Asian languages are Chinese (many dialects), Japanese, Korean, and the Indonesian dialects. Chinese alone contains more than 50,000 individual characters. Japanese syllabic characters (kana) are far simpler than Chinese characters. Most modern Asian text is printed left to right with minimal punctuation. Calligraphic writings are written from the top down, from right to left. I have absolutely no experience in Asian grammar; I can recognize and differentiate, but that's about it. I do happen to know that Japanese mixes Chinese characters (kanji) in with its own kana, so identification depends on which writing system is predominant. Chinese does not, that I'm aware of, use any Japanese characters.

The sample sentence in Traditional Chinese: "昨天，我發現了兩個銀幣和一枚金戒指在學校後面的公園."
The sample sentence in Japanese: "昨日、私は二つの銀貨や学校の背後にある公園で金の指輪を発見した。"
The sample sentence in Korean: "어제, 두 실버 동전과 학교 뒤 공원에서 황금 반지를 발견했다."
The sample sentence in Vietnamese: "Hôm qua, tôi đã tìm thấy hai đồng bạc và một chiếc nhẫn vàng tại công viên phía sau trường."
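
The rule of thumb above (Japanese mixes its own characters with Chinese ones, Chinese does not) can be sketched in Python. The function name `guess_cjk_language` is my own, and the logic is a deliberate simplification: any kana means Japanese, any Hangul means Korean, and Chinese characters alone mean Chinese. A short Japanese inscription written entirely in kanji would be misread as Chinese, so treat the result as a first guess.

```python
import unicodedata

def guess_cjk_language(text):
    """First-guess the language from which East Asian scripts appear."""
    names = [unicodedata.name(ch, "") for ch in text]
    if any(n.startswith(("HIRAGANA", "KATAKANA")) for n in names):
        return "Japanese"   # kana only occur in Japanese
    if any(n.startswith("HANGUL") for n in names):
        return "Korean"     # Hangul syllable blocks
    if any(n.startswith("CJK UNIFIED IDEOGRAPH") for n in names):
        return "Chinese"    # Chinese characters with no kana or Hangul
    return "unknown"

print(guess_cjk_language("昨日、私は二つの銀貨"))
print(guess_cjk_language("어제, 두 실버 동전"))
print(guess_cjk_language("昨天，我發現了兩個銀幣"))
```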

Everywhere Else
Most other countries owe their linguistic roots to whoever last conquered them or whichever culture predominantly immigrates there. When countries are conquered, or experience heavy immigration, their original native language typically merges with the conquering/immigrating people's language, much like Vietnamese, Icelandic, Faroese, and Filipino. Other language groups include Polynesian (Pacific island languages), Creole languages, and Caribbean languages (Haitian, Papiamentu, etc). Island "nations" are frequently subjected to this sort of cultural mixing.

Obsolete Languages

I am no anthropologist, no philologist, no etymologist, etc. so any information contained hereafter is based on what I read online, what I deduce by common logic, and my prior education.

Pictographic Alphabets
Almost all known ancient indigenous aboriginal written language systems were pictographic or logographic. You have but to look from ancient Egyptian to Incan and Aztec to see what I mean. Even the native American tribes, when they had written language systems, initially used pictographs for communication. Simplistic primitive cultures - so-called "cavemen" (Neanderthals, Cro-Magnons, etc) - drew picture-stories on the walls of their homes and sacred areas.

Logographic Systems
Chinese character logography has been around for thousands of years. Much of modern Asian written language is based on Chinese characters.

Syllabic Alphabets
These systems follow the patterns of other syllabic scripts, like the one used for Inuktitut, the language of the Inuit (Eskimos). Inuktitut syllabics were adapted in the 19th century from the syllabics devised for Cree. While the best-known native North American language with its own writing system is Cherokee, which uses a syllabary (some of its symbols resemble Latin letters but stand for whole syllables), native American pictographs are not uncommon.

Hope this helps!

EDIT: There is a huge difference in (possible) translations between those of a native speaker, who will try to translate the idiom into everyday language, and a linguist, who will analyze the word or phrase for its true meaning(s). Native-speaker translations may be correct translations but may not be accurate for the piece (and vice versa).

DocBeav · Dec 29, 2012

Great job man! Hopefully that will point some people in the right direction!

sagittarius98 · Dec 30, 2012

I knew some of this stuff, but it's good to have a nice reference source.

I know Polish, so you can add it as part of your reference guide. It's hard for me to express grammar rules in any language, as it just comes naturally. You can use this however.

http://polish.slavic.pitt.edu/firstyear/nutshell.pdf

The sample sentence: Wczoraj, ja znalazłem dwa srebrne pieniądze i złoty pierścionek w parku z tyłu szkoły.

cti4sw · Jan 2, 2013

Is Polish considered a Slavic (Eastern European) or Germanic language? I wasn't sure, so I didn't include it with any of the Eastern European languages.

Your post prompts me to add an important note that I'd forgotten... "There is a huge difference in (possible) translations between those of a native speaker, who will try to translate the idiom into everyday language, and a linguist, who will analyze the word or phrase for its true meaning(s). Native-speaker translations may be correct translations but may not be accurate for the piece."

Not trying to knock native fluency, just noting an important fact that people often overlook when they "get something translated by [their] close/dear/cherished/best friend/relative." The other oversight that befalls native speakers is the confluence of languages; for instance, 12% of all English vocabulary is of Greek origin. Someone who reads those words would translate them as English, when they may actually be Greek. And so on. Linguists, on the other hand, look at the language differently: they narrow the language down to what it actually is and base the translation on that analysis, which is more correct for identification anyway.

sagittarius98 · Jan 7, 2013

Polish is a West Slavic language (Russian is East Slavic). I translated it word for word, as it made sense that way. In normal speaking, I would omit the "ja" (which means I) part, as it is not necessary. That is also the case in Spanish, when you can say the verb without the subject when it is clear.

cti4sw · Jan 7, 2013

Right; in my Spanish translation I did omit the "yo" because I know it's not necessary. If it's in the other translations, it'd be because I'm not familiar with the language's grammar.

sagittarius98 · Mar 14, 2013

Language index

cti4sw · Mar 18, 2013

Omniglot's a great site if you already have an idea of what you're looking for.
