Korean for Programmers


I was inspired to write this after reading German for Programmers. It's highly likely that I'm biased to see things a certain way because I work with programming languages for a living, but I often find myself drawing comparisons between Korean and programming.

Note that I will not provide romanization. See Lack of Romanization at the end for more information as to why.

Hangeul Basics

Before we can get into sentences, some explanation about Hangeul–the Korean alphabet–is needed. Korean is actually like English in that letters are put together, so you can read any Korean once you learn Hangeul. However, Korean letters join together in syllable blocks.

You may have seen the infamous "Learn how to read Hangeul in 10 minutes" graphic. While Hangeul is indeed very easy to learn, it does have exceptions–quite a few actually. So maybe you can read 독립 (independence), but the actual pronunciation is quite different due to ㄹ following ㄱ. These are called 받침 rules and one just has to memorize 'em.

Grid System

Unlike Chinese or Japanese, Korean syllable blocks take the hangeul letters and make them into grids, in a specific order.

Korean glyphs are always constructed the same way, depending on whether the vowel is vertical (ㅣ ㅏ ㅓ) or horizontal (ㅡ ㅜ ㅗ). The second character is always a vowel. Always. So if you want to say "ah", ㅇ is used as a silent character in the first slot–which makes 아.

Some vowels can also be combined: ㅚ is ㅗ + ㅣ, ㅙ is ㅗ + ㅐ.

Finished blocks like 쀍 (ㅃㅜㅔㄹㄱ) are included in Unicode because it's possible to construct them via grid rules, but they're essentially gibberish. Hundreds, maybe thousands of these exist in the Hangeul Unicode range. Korean (Hangeul) is represented by 5 Unicode blocks in total and I recommend reading about how they get composed together. [TODO]
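To make the composition a bit more concrete, here is a minimal sketch of the arithmetic the Unicode Standard uses for the precomposed Hangul Syllables block (U+AC00 to U+D7A3). The function names are my own, not part of any library, and the slots are passed as plain indices in Unicode order (19 initial consonants, 21 vowels, 27 final consonants plus 0 for "none").

```rust
/// Composes a syllable block from slot indices.
/// lead: 0..19 (ㅇ is 11), vowel: 0..21 (ㅏ is 0), tail: 0..28 (0 = no final consonant).
fn compose_syllable(lead: u32, vowel: u32, tail: u32) -> Option<char> {
    if lead >= 19 || vowel >= 21 || tail >= 28 {
        return None;
    }
    // Each initial consonant owns 21 * 28 = 588 code points, each vowel owns 28.
    char::from_u32(0xAC00 + (lead * 21 + vowel) * 28 + tail)
}

/// Splits a precomposed syllable back into (lead, vowel, tail) indices.
fn decompose_syllable(c: char) -> Option<(u32, u32, u32)> {
    if !('\u{AC00}'..='\u{D7A3}').contains(&c) {
        return None;
    }
    let offset = c as u32 - 0xAC00;
    Some((offset / 588, offset % 588 / 28, offset % 28))
}
```

With these indices, silent ㅇ (11) plus ㅏ (0) and no final consonant gives 아, and every combination of valid indices lands on some code point, which is why the gibberish blocks exist too.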

Sentence Order

English is an SVO (Subject Verb Object) language: "Billy ate an apple". Korean is an SOV language: "Billy apple ate". This reversal makes thinking in Korean quite difficult at first, especially once one gets to more complex sentences.

English: I went to the park because it was sunny out today.
Korean: It was sunny out today so I went to the park.

English: I was so tired I could barely walk.
Korean: To the extent that I could barely walk I was tired.

From the perspective of a native English speaker, I find this utterly fascinating.

Subject Elision, especially -you-

The subject/topic can be omitted a lot of the time in Korean. In fact, including it in every sentence isn't natural at all! Korean is highly contextual, so if you say:

먹었어요 – literally: ate, interpreted as: "I ate"

In a lot of contexts, that would make most sense as (I) ate. However if someone asks you, "Did Charlie eat?", you can still say:

(네,) 먹었어요. – (yes,) ate. "Yes, Charlie ate."

Similarly if you ask someone else a question, it's usually obvious if they are the subject. If you use honorifics it's even more obvious because you can't use honorifics on yourself. When would you directly ask someone "Did I read this book"? (If you want to muse to yourself like "ah, did I already read this book..?", that has a different grammar point)

이 책을 읽었어요? – this book read? "Did you read this book?"
이 책을 읽으셨어요? – this book read{honorific}?


The subject/topic is still included when the subject changes or it's ambiguous.

In this sense I guess one could say Korean leans toward being dynamically typed–but that's a tenuous comparison. Anyway, it's different from English where the subject must always be included (statically typed...?), except in some spoken slang. Even very informal slang like "d'ya eat?" still includes the subject. We're playing a bit fast and loose here with the comparisons.


A word of caution: Using the words for "you" (너, 네가, 당신) is also quite rude unless you are close with someone. Papago and Google Translate often translate "you" to 너 or 네가. Don't use these until you know what you're doing, and certainly not with anyone besides close friends! 당신 has different nuances and should also be avoided. It's really hard for English speakers to let go of "you", but you must.

Sentences & Grammar

Let's start with a simple sample sentence in Korean.

저는 사과를 어제 먹었어요.

저(는) 사과(를) 어제 먹 었 어요 .

  • 저(는) => subject + marker
  • 사과(를) => object + marker
  • 어제 => adverb
  • 먹 => verb stem
  • 었 => past tense modifier
  • 어요 => politeness conjugation

Markers

Korean has a number of postpositional particles that imbue meaning, some of which vary if the last character is a vowel or not. We can easily use a pure function to model this:

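fn get_topic_marker(text: &str) -> &'static str {
  match text.chars().last() {
    // Precomposed syllables start at U+AC00 (가); every 28th one has no final consonant.
    Some(c @ '가'..='힣') if (c as u32 - 0xAC00) % 28 == 0 => "는", // ends in a vowel
    // Ends in a consonant. Non-Hangeul text also lands here, as a simplification.
    _ => "은",
  }
}

// The plural marker is the same after vowels and consonants.
fn get_plural_marker() -> &'static str {
  "들"
}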

Here are a few other (simplified) markers. The text inside the parentheses is used if the last character is a consonant:

  • ~에서 => from
  • ~까지 => until
  • ~(으)로 => several meanings. roughly-speaking it shows how/via what method or material something is carried out, or "toward" a place if used with "to go", etc.
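The same vowel/consonant check generalizes to any two-form marker. Below is a rough sketch, with helper names I made up for illustration, that attaches the object marker (를/을) from the sample sentence and ~(으)로. One detail the simplified list glosses over: nouns ending in ㄹ take 로 rather than 으로, so the sketch treats ㄹ like a vowel for that marker only.

```rust
/// Final-consonant slot of a precomposed syllable: 0 means none, 8 is ㄹ.
fn final_index(c: char) -> Option<u32> {
    if ('\u{AC00}'..='\u{D7A3}').contains(&c) {
        Some((c as u32 - 0xAC00) % 28)
    } else {
        None
    }
}

/// Appends `after_vowel` or `after_consonant` depending on the last syllable.
/// Returns None for empty or non-Hangeul input.
fn attach_marker(noun: &str, after_vowel: &str, after_consonant: &str) -> Option<String> {
    let tail = final_index(noun.chars().last()?)?;
    let marker = if tail == 0 { after_vowel } else { after_consonant };
    Some(format!("{}{}", noun, marker))
}

/// ~(으)로: vowel-final and ㄹ-final nouns take 로, everything else takes 으로.
fn attach_direction_marker(noun: &str) -> Option<String> {
    let tail = final_index(noun.chars().last()?)?;
    let marker = if tail == 0 || tail == 8 { "로" } else { "으로" };
    Some(format!("{}{}", noun, marker))
}
```

Calling attach_marker("사과", "를", "을") yields 사과를, as in the sample sentence, while a consonant-final noun such as 책 gets 을.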

Pipelining Data Transformations

Where the previous sentence started to get interesting was at the end, with the verb, tense, and politeness. That's not all we can transform though. The first two sentences here are "I closed the door" (informal impolite, then informal polite); the third is "My parents closed the door" (formal polite).

나는 그것을 닫 았 어 .

저는 그것을 닫 았 어요 .

우리 부모님은 그것을 닫 으시 었 습니다 . (Note that I'm showing 으시 and 었 separately here to break down the components, but they would get merged to 셨.)

The main verb here is 닫다–to close. All Korean verbs end in 다, so the first thing we need to do is remove 다 and keep the verb stem.

Now we need the honorific level. In the first two sentences I'm talking about myself, so I can't use honorifics. However with "my parents" it's common to use honorifics, which is ~(으)시.

Before we can add a tense, we need to determine what vowel to add, and if it should merge or not. 닫 ends in a consonant, so merging is not possible, but the last vowel is 아 so we append 아.

One way to do past tense is ~ㅆ, which gets merged with the previous vowel. Other tenses can depend on the last character being a vowel or not, like future tense (~ㄹ/을).

Finally, we need to think about our relationship to the audience and append or merge a politeness/speech level. See politeness below for more details.

We can model this as a basic pipeline à la Clojure:

(defn conjugate-verb
  [subject verb speaker audience]
  (->> verb
       (remove-stem)
       (maybe-append-honorific subject)
       (append-or-merge-vowel)
       (append-or-merge-tense)
       (append-or-merge-politeness-level speaker audience)))

Note that there are even more transformations that we can apply depending on the grammar point and the nuance of what one is trying to say.
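For a version that actually runs, here is a deliberately narrow sketch of the same steps. It only covers the past tense of regular stems that end in a consonant (so nothing has to merge), skips the honorific step entirely, and knows nothing about irregular verbs. The type and function names are hypothetical, chosen just for this example.

```rust
/// Speech levels used in the three example sentences.
#[derive(Clone, Copy)]
enum SpeechLevel {
    InformalImpolite, // 닫았어
    InformalPolite,   // 닫았어요
    FormalPolite,     // 닫았습니다
}

/// Splits the last syllable into (vowel index, final-consonant index),
/// using the Unicode layout of precomposed Hangul syllables.
fn last_vowel_and_tail(stem: &str) -> Option<(u32, u32)> {
    let last = stem.chars().last()?;
    if !('\u{AC00}'..='\u{D7A3}').contains(&last) {
        return None;
    }
    let offset = last as u32 - 0xAC00;
    Some((offset % 588 / 28, offset % 28))
}

/// Past tense for regular consonant-final stems.
/// Returns None for vowel-final stems, which would need merging.
fn conjugate_past(verb: &str, level: SpeechLevel) -> Option<String> {
    // 1. Drop the dictionary ending 다 and keep the stem.
    let stem = verb.strip_suffix("다")?;
    let (vowel, tail) = last_vowel_and_tail(stem)?;
    if tail == 0 {
        return None;
    }
    // 2 + 3. Pick 아 or 어 from the last vowel (0 = ㅏ, 8 = ㅗ), merged with ㅆ.
    let past = if vowel == 0 || vowel == 8 { "았" } else { "었" };
    // 4. Append the speech-level ending.
    let ending = match level {
        SpeechLevel::InformalImpolite => "어",
        SpeechLevel::InformalPolite => "어요",
        SpeechLevel::FormalPolite => "습니다",
    };
    Some(format!("{}{}{}", stem, past, ending))
}
```

Feeding it 닫다 with the informal polite level gives 닫았어요, and 먹다 gives 먹었어요, matching the sample sentences. A fuller version would need the merge rules and the honorific ~(으)시 step from the pipeline.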

Adding to the Stack

In linguistics, nominalization or nominalisation is the use of a word which is not a noun (e.g., a verb, an adjective or an adverb) as a noun, ... The term refers, for instance, to the process of producing a noun from another part of speech by adding a derivational affix (e.g., the noun legalization from the verb legalize).

In Korean, one can nominalize entire clauses and use them in other constructs! Korean lets you do this with the ~는 것 principle. 것 means thing, but any noun can be used in place of 것. Based on the tense, verb type, and whether the verb stem ends in a vowel, ~는 has variations like ~ㄴ/은 and ~ㄹ/을. It can also combine with other grammar forms, like ~던, which is 더~ + ~ㄴ/은. Digging into these would be beyond the scope of this post.

Sentence the First

So, let's take 'the girl walked to school'. In English and Korean this is straightforward enough:

여자는 학교로 걸어 갔어요
The girl walked to school

But what if you wanted to talk about that person? You could say "the girl who walked to school". In English, these are known as relative clauses. They can begin with who, which, that, where, etc., following the noun. Korean uses the ~는 것 nominalizer before the noun, which leads to:

학교로 걸어 간 여자! Note that 갔 changed to 간. 갔 is 가다 (to go) + ~ㅆ (past tense), but the past tense nominalization form uses ~ㄴ (것). Instead of 것 (thing), we swapped in another noun: 여자 (woman).
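Attaching ~ㄴ to a stem like 가 to get 간 is a nice place to see the grid system at work: the ㄴ slides into the empty final-consonant slot. The sketch below does that with Unicode arithmetic. It assumes a regular stem with the 다 already removed, uses a function name I invented, and does not handle irregular stems (ㄹ-final stems, for instance, behave differently).

```rust
/// Attaches the past modifier ~ㄴ/은 to a verb stem.
/// Vowel-final stems get ㄴ merged into the last syllable; others get 은 appended.
fn attach_past_modifier(stem: &str) -> Option<String> {
    let last = stem.chars().last()?;
    if !('\u{AC00}'..='\u{D7A3}').contains(&last) {
        return None;
    }
    // Final-consonant slot: 0 means empty, 4 is ㄴ.
    let tail = (last as u32 - 0xAC00) % 28;
    if tail == 0 {
        // Fill the empty slot with ㄴ by moving 4 code points forward.
        let merged = char::from_u32(last as u32 + 4)?;
        let head = &stem[..stem.len() - last.len_utf8()];
        Some(format!("{}{}", head, merged))
    } else {
        Some(format!("{}은", stem))
    }
}
```

So 가 becomes 간, while a consonant-final stem like 먹 becomes 먹은. Whatever noun follows (것, 여자, and so on) is then described by the whole clause.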

Not that one would only say "the girl who walked to school" by itself, but we can now use the entire construct as a noun in other sentences:

저는 학교로 걸어 간 여자를 알았어요
I the girl who walked to school knew

Sentence the Second

We can try a more complex sentence now: "That's the place (that) I thought I went to!". First, we need to break it down into a sentence that can be nominalized: "I thought I went". Here we can use the grammar point ~ㄴ 줄 알다, which means the speaker thought something was true but realized it wasn't–due to a lapse in judgement, etc.

제가 간 줄 알다
I thought (that) I went

You may have noticed that this grammar point itself uses ~ㄴ 것, but with 줄 instead of 것! This 줄 is a bound noun, meaning that it can only be used when described by a ~는 것 clause. Outside of ~ㄴ 줄 알다, 줄 can also be a regular noun meaning line/rope.

그곳은 제가 어디에 간 줄 알았던 곳이야!
That is the place I thought I went to!

Sentence the Third

Can we go even deeper?

[나는] (( 상황이 억울하다고 말하는 불평불만 ) 만 하는 사람은 ) ( 별로 좋아하 지 않는다 ).

[I'm] ( not keen on ) ( people who only ( complain that things are unfair )).

This sentence doesn't really translate 1:1 to English, as is the case with most intermediate/advanced Korean sentences.


Nominalizing with ~는 것 is my favorite aspect of Korean because it's an important grammar point that blew my mind once I learned how it worked. It's quite commonly used–in day to day usage I might say something like the house (that) I used to live (in) – 제가 살았던 집, et cetera.

Language Tidbits

These are some cool traits about Korean, or things related to this post, that don't necessarily have to deal with programming.

Politeness / Formality

The Korean language conjugates differently based on the status of speaker and intended audience. For example, one of the simplest ways to conjugate any verb is to add ~어/아/여 to it. This is based on the last vowel, not the last character.

For example, you may have seen 감사합니다 before ("thank you", formal polite). This is 감사하다 merged with ~ㅂ니다, because the stem 감사하 ends in a vowel. 고마워요 is another way to say thank you (informal polite): 고맙다 + apply irregular ㅂ consonant ending filter + ~아/어/여요.

Impolite doesn't mean "rude" here, by the way.

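// Returns the vowel to attach (아/어/여), based on the stem's last vowel.
// Simplified: expects a dictionary-form verb ending in 다 and ignores irregular verbs.
fn get_vowel_for_verb(verb: &str) -> Option<&'static str> {
  let stem = verb.strip_suffix("다")?;
  // ha = 하
  if stem.ends_with('하') {
    return Some("여");
  }
  let last = stem.chars().last().filter(|c| ('가'..='힣').contains(c))?;
  // Index of the vowel slot in Unicode order: 0 = ㅏ, 4 = ㅓ, 8 = ㅗ, ...
  let vowel_index = (last as u32 - 0xAC00) % 588 / 28;
  match vowel_index {
    0 | 8 => Some("아"),                 // 아, 오
    4 | 13 | 16 | 19 | 20 => Some("어"), // 어, 우, 위, 의, 이
    _ => None,                           // vowels this simplified table doesn't cover
  }
}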

Korean has seven speech levels [TODO]. When learning Korean, the 아/어/여요 and ~ㅂ/습니다 levels are commonly used, in that order. Using 아/어/여 (no 요) to anyone other than close friends (who have agreed to use lowered speech) or young kids is rude. Foreigners get a pass at first but it's still impolite.
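The ~ㅂ/습니다 level follows the same vowel-or-consonant split as the markers: ㅂ merges into a vowel-final stem, and 습니다 is appended otherwise. Here is a small sketch of that rule for the present tense, assuming a regular dictionary-form verb. The function name is mine, and irregular stems are not handled.

```rust
/// Formal polite present tense: ~ㅂ니다 after a vowel, ~습니다 after a consonant.
fn formal_polite(verb: &str) -> Option<String> {
    // Drop the dictionary ending 다 and keep the stem.
    let stem = verb.strip_suffix("다")?;
    let last = stem.chars().last()?;
    if !('\u{AC00}'..='\u{D7A3}').contains(&last) {
        return None;
    }
    // Final-consonant slot: 0 means empty, 17 is ㅂ.
    let tail = (last as u32 - 0xAC00) % 28;
    if tail == 0 {
        // Merge ㅂ into the empty slot, then add 니다.
        let merged = char::from_u32(last as u32 + 17)?;
        let head = &stem[..stem.len() - last.len_utf8()];
        Some(format!("{}{}니다", head, merged))
    } else {
        Some(format!("{}습니다", stem))
    }
}
```

This turns 감사하다 into 감사합니다 and 먹다 into 먹습니다.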

Plain (sometimes known as diary) form is also used, such as in diaries and books/novels.

English lacks this concept, as we use the same conjugation for everyone – "the prisoner ate", "the king ate", and "a God ate". What English does have is different registers [TODO], such as when you text versus when you write an academic paper or a business email. This includes overly polite language like "Might you be interested in eating, sir?", but nevertheless the verb remains the same.

Quoting Statements

Quoting plain statements in Korean is very easy. All you need to do is take the sentence, conjugate the verb into plain form, and append ~고 (말)하다. For verbs, ~ㄴ/는다 is the plain form. For adjectives, it's just ~다, or the base form.

(저는) 먹었어요. (제가) 먹었 다고 말했다 – I ate. I said I ate.

Depending on the type of statement, different particles than 다 are used.

  • declarative => ~다고
  • inquisitive => ~냐고
  • propositive (let's ...) => ~자고
  • imperative => ~(으)라고
  • declarative with 이다 (to be) as the verb => ~(이)라고

이것을 좋아한다고? – (You said that) you like this?
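The choice of quoting particle is essentially a lookup on the statement type, so it can be sketched as a match. The enum and function names below are hypothetical, the caller is assumed to have conjugated the clause already, and the inquisitive form is simplified to ~냐고 only.

```rust
#[derive(Clone, Copy)]
enum StatementKind {
    Declarative,
    Inquisitive,
    Propositive,
    Imperative,
    DeclarativeWithIda, // 이다 (to be) as the verb
}

/// Quoting particle for each statement type. `after_consonant` only matters
/// for the two forms that have a consonant variant.
fn quoting_particle(kind: StatementKind, after_consonant: bool) -> &'static str {
    match (kind, after_consonant) {
        (StatementKind::Declarative, _) => "다고",
        (StatementKind::Inquisitive, _) => "냐고",
        (StatementKind::Propositive, _) => "자고",
        (StatementKind::Imperative, false) => "라고",
        (StatementKind::Imperative, true) => "으라고",
        (StatementKind::DeclarativeWithIda, false) => "라고",
        (StatementKind::DeclarativeWithIda, true) => "이라고",
    }
}

/// Builds "<clause><particle> 말했다" (said that ...) from an already-conjugated clause.
fn quote_statement(clause: &str, kind: StatementKind, after_consonant: bool) -> String {
    format!("{}{} 말했다", clause, quoting_particle(kind, after_consonant))
}
```

For example, quote_statement("먹었", StatementKind::Declarative, true) produces 먹었다고 말했다, the "I said I ate" sentence from earlier.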

Since this is used so much in speech, the (말)하다 (to say) part is often omitted. Korean learners hear this a lot from natives because Korean pronunciation is tough.

Lack of Romanization

Why does this article lack romanization? Because romanization is bad. English and Korean sounds do not map neatly to one another. The issue here is that Korean learners mentally map {some english sound} => {some hangeul letter} and it hurts their pronunciation skills immensely. For example, you may see this in beginner resources:

d = ㄷ

Except this is wildly wrong because only ㄷ is ㄷ. It is close to d, in the same sense that water is close to salt water. If you're learning Korean, listen to videos that teach the sound, not the most approximate English letter.

I also have seen people write things like anyeonghasaeyo jal jinaeyo? or similar, and it hurts my brain and heart trying to read it.

Furthermore, romanization systems can change over time. 조 used to be romanized as Cho, now it's Jo. So when I read older books that have romanized Korean it forces me to go and learn the older system as well. 조 isn't Jo or Cho anyway... it's somewhere in between.

Contributors

  • Article – Andrew Zah
  • Editing, sentence suggestions – 웁스

References

  • [0] Grid Order

  • Hangul in Unicode - https://en.wikipedia.org/wiki/Korean_language_and_computers#Hangul_in_Unicode

  • https://en.wikipedia.org/wiki/Korean_speech_levels

  • 받침 (Final Consonant) exception rules - http://www.koreanwikiproject.com/wiki/%EB%B0%9B%EC%B9%A8