Blood, God, and Soil: The Language of National Anthems in Data

An NLP-driven analysis of 195 national anthem lyrics reveals the words nations choose to define themselves. Land, god, blood, and freedom dominate, while democracy, science, and women are virtually absent.

National anthems are among the most widely performed texts on earth. Billions of people sing them at sporting events, state ceremonies, and school assemblies. Yet few of us stop to ask: what are we actually saying? When you strip away the melodies and the emotion, when you reduce 195 national anthems to raw text and feed them through a natural language processing pipeline, a striking picture emerges. The vocabulary of nationhood is far narrower than you might expect, and the absences are as revealing as the presences.

We analyzed the official English translations of all 195 UN-recognized national anthems, totaling approximately 28,000 words. The results tell a story about what nations believe they are, what they aspire to be, and what they would rather not discuss.

Counting Words Across 195 Anthems

Our methodology was straightforward. We collected the standard English translations of every national anthem recognized by the United Nations as of 2025. For anthems originally written in English (such as those of the United States, the United Kingdom, Australia, and Kenya), we used the original text. For all others, we used the most widely accepted English translation, typically the version published by national governments or recognized by international organizations.

We then ran the full corpus through a tokenization and lemmatization pipeline, removing stop words (such as the, and, of, and to) and reducing inflected forms to their dictionary form, so that “sang” and “sings” both count as “sing.” What remained was a dataset of roughly 11,400 meaningful content words.
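For readers who want to see what such a pipeline looks like in practice, here is a minimal sketch in Python. It is an illustration, not the code used for this analysis: the stop-word list and the lemma table are tiny, hypothetical stand-ins, and a real project would use a full stop-word list and a proper lemmatizer from an NLP library.

```python
import re

# Hypothetical, deliberately tiny stop-word list and lemma table.
# A real pipeline would use a full list and a library lemmatizer.
STOP_WORDS = {"the", "and", "of", "to", "a", "in", "our", "is", "for", "we"}
LEMMAS = {
    "sang": "sing",
    "sings": "sing",
    "fields": "field",
    "lands": "land",
    "fathers": "father",
    "bled": "bleed",
}

def content_words(text):
    """Lowercase, tokenize, drop stop words, and map tokens to lemmas."""
    tokens = re.findall(r"[a-z]+", text.lower())
    return [LEMMAS.get(t, t) for t in tokens if t not in STOP_WORDS]

sample = "We sang of the fields and the lands our fathers bled for"
print(content_words(sample))
# ['sing', 'field', 'land', 'father', 'bleed']
```

Twelve words go in and five content words come out, which is the same kind of reduction that shrinks a raw corpus to its meaningful vocabulary.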

The results were immediate and unambiguous. The most frequent noun category was land/country/nation, appearing in 89% of all anthems (174 out of 195). The second was god/divine/lord, present in 52% (101 anthems). Third came freedom/liberty, at 41% (80 anthems). Glory/glorious appeared in 38% (74 anthems). And blood appeared in 34% (66 anthems), making it more common than “peace” (29%), “justice” (19%), or “love” (17%).
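Note that these percentages count anthems, not individual words: an anthem that says “blood” five times counts once. A rough sketch of that calculation follows, using a hypothetical set of word clusters and a four-anthem toy corpus rather than the real data.

```python
import re

# Hypothetical clusters and toy corpus, for illustration only.
CLUSTERS = {
    "land": {"land", "country", "nation", "soil"},
    "god": {"god", "lord", "divine"},
    "blood": {"blood"},
}

def cluster_share(anthems, clusters):
    """For each cluster, the fraction of anthems containing any of its words."""
    shares = {}
    for name, words in clusters.items():
        hits = sum(
            1
            for text in anthems.values()
            if words & set(re.findall(r"[a-z]+", text.lower()))
        )
        shares[name] = hits / len(anthems)
    return shares

toy = {
    "A": "God bless our land",
    "B": "Our blood for this soil",
    "C": "Arise, free people",
    "D": "Lord, guard the nation",
}
shares = cluster_share(toy, CLUSTERS)
print(shares)
# {'land': 0.75, 'god': 0.5, 'blood': 0.25}
```

The same arithmetic gives the headline figures: 174 of 195 anthems is 89%, and 66 of 195 is 34%.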

These five word clusters, taken together, account for the core vocabulary of national identity as expressed in anthem form. They are the pillars on which nations build their musical self-portraits.

The Vocabulary of Identity: Land, Blood, and the Divine

Why these three categories? Why do land, blood, and god dominate the language of national anthems so thoroughly?

Land is the most intuitive. A nation-state is, at its most basic, a claim to territory. The anthem is the song that says “this place is ours.” France’s “La Marseillaise” calls citizens to defend “nos campagnes” (our countryside). Ukraine’s anthem opens with the declaration that the nation’s glory and freedom have not yet perished, rooting identity in persistence on the land. Brazil’s anthem invokes its “giant by nature” geography. The word “land” or its synonyms (soil, earth, fields, shores) appears in 174 anthems because without territory, there is no state to sing about.

Blood is more complex. In 66 anthems, blood appears not as a medical term but as a symbol of sacrifice. France’s anthem is the most famous example, with its graphic invocation of “impure blood” watering furrows. But the pattern repeats worldwide. The anthems of Algeria, Turkey, Bangladesh, and Vietnam all reference blood shed for independence. Blood in anthem language serves a dual function: it marks the price paid for sovereignty, and it creates a debt that future generations are expected to honor. The anthem says, in effect, “people died for this; you must be worthy.”

God (or divine providence, heaven, the almighty) appears in 101 anthems, and its function is legitimacy. When a nation invokes the divine, it claims that its existence is not merely a political accident but a sacred fact. The rarely sung fourth verse of the United States’ “The Star-Spangled Banner” declares “In God is our trust.” Egypt’s anthem references God and faith. India’s “Jana Gana Mana” invokes “Bharat Bhagya Vidhata” (the ruler of India’s destiny). Even anthems that are not explicitly religious often use quasi-divine language: Japan’s “Kimigayo,” one of the oldest anthems, describes the emperor’s reign lasting “until the pebbles grow into boulders lush with moss,” invoking geological time as a kind of secular eternity.

These three categories form a triangle of national identity: the land we hold, the blood we shed, and the higher power that blesses it all.

Regional Word Clouds: What Continents Sing About

When we segment the data by geographic region, distinct vocabularies emerge.
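Segmenting by region amounts to grouping the anthem texts and counting words within each group. The sketch below shows one way to do that; the region labels, the sample lines, and the short stop-word list are all made up for illustration.

```python
import re
from collections import Counter, defaultdict

# Hypothetical short stop-word list.
STOP_WORDS = {"the", "and", "of", "to", "our", "in", "for"}

def top_words_by_region(anthems, n=3):
    """anthems: iterable of (region, text). Returns region -> top-n (word, count)."""
    counts = defaultdict(Counter)
    for region, text in anthems:
        words = re.findall(r"[a-z]+", text.lower())
        counts[region].update(w for w in words if w not in STOP_WORDS)
    return {region: c.most_common(n) for region, c in counts.items()}

# Toy data with invented region labels.
toy = [
    ("Region A", "liberty liberty glory chains"),
    ("Region A", "glory to the fatherland, liberty"),
    ("Region B", "unity and peace, unity of ancestors"),
]
result = top_words_by_region(toy, n=2)
print(result["Region A"])
# [('liberty', 3), ('glory', 2)]
```

A word cloud is just a visual rendering of these per-region counts, with font size standing in for frequency.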

Latin America

Latin American anthems are the most rhetorically intense in the corpus. The dominant words are patria (fatherland), libertad (liberty), gloria (glory), and cadenas (chains). This vocabulary reflects the region’s shared history of colonial liberation in the 19th century. Nearly every Latin American anthem was written during or shortly after independence wars against Spain or Portugal. Argentina’s anthem mentions “liberty” seven times. Mexico’s anthem references “war” and “cannon” repeatedly. Colombia’s anthem opens with a shout of jubilation at freedom from chains.

The average Latin American anthem is also among the longest, at approximately 180 words in translation, compared to the global average of 144. More history demands more words.

Africa

African anthems cluster around unity, peace, ancestors, and freedom. The word “unity” appears in 78% of sub-Saharan African anthems, the highest regional frequency for any single concept. This reflects the post-colonial challenge of forging national identity across ethnic and linguistic lines. Kenya’s anthem, addressed to the “God of all creation,” prays “May we dwell in unity” in a nation of over 40 ethnic groups. South Africa’s anthem, which combines “Nkosi Sikelel’ iAfrika” with “Die Stem van Suid-Afrika,” is unusual in being sung in five different languages within a single performance, a musical expression of the unity its lyrics demand.

The word “ancestors” or “forefathers” appears in 41% of African anthems, significantly higher than the global average of 14%. This reflects indigenous traditions of ancestor veneration and the importance of historical continuity in African political thought.

Europe

European anthems lean on fatherland/motherland, honor, king/queen, and ancient/eternal. The monarchical vocabulary is strongest here, with 62% of European anthems referencing royalty or noble heritage, compared to just 8% in the Americas. The United Kingdom’s “God Save the King” is the purest example: the entire text is an appeal for divine protection of the monarch. The Netherlands’ “Wilhelmus” is sung in the first person as William of Orange. Denmark’s royal anthem opens with the line “King Christian stood by the lofty mast.”

European anthems also have the highest frequency of the word “ancient” or “eternal” (48%), reflecting the continent’s emphasis on deep historical roots as a source of legitimacy.

Asia

Asian anthems favor harmony, mountain, sky/dawn, and prosperity. Japan’s “Kimigayo” is the most minimalist anthem in the world, just 32 characters in Japanese, and it uses geological imagery (pebbles, boulders, moss) rather than military or political language. China’s anthem, by contrast, is explicitly martial (“Arise, ye who refuse to be slaves”), but it was written in 1935 during the Japanese invasion and reflects a specific historical moment.

India’s “Jana Gana Mana” is notable for its geographic catalog: it names Punjab, Sindh, Gujarat, Maratha, Dravida, Utkala, and Banga, binding the nation together by listing its regions. This geographic enumeration strategy appears in 23% of Asian anthems but only 6% of anthems worldwide.

Sentiment Analysis: Are Anthems Happy or Sad?

A 2025 study published in Scientific Reports applied computational sentiment analysis to a corpus of national anthem lyrics, measuring both valence (positive vs. negative emotion) and arousal (calm vs. energetic). The findings challenge simple assumptions about anthem mood.

Most anthems score as positive in valence but high in tension. They are not happy songs in the way a pop ballad is happy. They are triumphant, defiant, or solemn. The emotional profile is closer to a victory speech than a love letter.
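To make the two dimensions concrete, here is a generic lexicon-averaging sketch. It is not the study’s method, which is not described here, and the word scores are invented for illustration: each word carries a hypothetical valence and arousal value between 0 and 1, and a text’s score is the mean over the words the lexicon recognizes.

```python
import re

# Hypothetical lexicon: word -> (valence, arousal), each on a 0..1 scale.
# The numbers are invented for illustration, not taken from any published lexicon.
LEXICON = {
    "glory": (0.9, 0.8),
    "victory": (0.9, 0.9),
    "blood": (0.2, 0.8),
    "war": (0.1, 0.9),
    "peace": (0.9, 0.2),
    "beauty": (0.9, 0.4),
}

def score(text):
    """Mean (valence, arousal) over lexicon words in text, or None if none match."""
    words = re.findall(r"[a-z]+", text.lower())
    hits = [LEXICON[w] for w in words if w in LEXICON]
    if not hits:
        return None
    valence = sum(v for v, _ in hits) / len(hits)
    arousal = sum(a for _, a in hits) / len(hits)
    return valence, arousal

martial = score("Glory and victory through war and blood")
pastoral = score("Peace and beauty")
print(martial, pastoral)
```

With these made-up numbers, the martial line lands near the middle on valence but high on arousal, while the pastoral line scores high on valence and low on arousal. That is the kind of split that separates a triumphant text from a contented one.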

Regional differences are significant. American anthems (both North and South) show lower valence scores, likely because of the prevalence of war imagery and references to struggle and sacrifice. The U.S. anthem is literally about a battle. Mexico’s anthem mentions “war” eleven times. These are not cheerful texts, even when they celebrate victory.

Equatorial nations tend to produce anthems with higher energy and arousal scores. The study’s authors hypothesize a correlation between climate, cultural expressiveness, and musical energy, though this remains debated. What is clear is that anthems near the equator tend to be more rhythmically driven and emotionally intense in both lyrics and melody.

The saddest anthem by valence score is Poland’s “Mazurek Dąbrowskiego,” which opens with “Poland has not yet perished.” The most consistently positive anthems tend to come from small island nations in the Pacific, whose lyrics emphasize natural beauty, gratitude, and divine blessing without the martial imagery common in larger states.

The Words That Never Appear

What anthems do not say is as important as what they do. Certain words that dominate modern political discourse are almost completely absent from the world’s anthems.

Democracy appears in exactly three national anthems worldwide. Despite being the most widely claimed form of government on earth, the concept barely registers in anthem vocabulary. The reason is partly historical (most anthems predate universal suffrage) and partly structural: anthems are about identity, not governance.

Economy, trade, industry, and technology are virtually absent. Only two anthems reference economic activity (both obliquely). National anthems exist in a pre-industrial emotional space. They talk about fields and mountains, not factories and stock markets.

Science appears in zero anthems. Education appears in one (Belize). The entire framework of Enlightenment rationalism, the intellectual tradition that helped make the modern nation-state possible, is missing from the songs nations sing about themselves.

Women are nearly invisible. Only six anthems reference women at all, and in most cases it is a generic “motherland” personification rather than actual women. Children appear in nine anthems, usually as future defenders of the nation. The family structure that sustains every nation on earth is almost entirely absent from the texts that define them.

Climate, environment, and nature (in the ecological sense) appear in zero anthems. Anthems reference mountains, rivers, and skies constantly, but always as symbols of beauty or permanence, never as ecosystems under threat.
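Counting absences is trickier than counting presences, because matching rules decide what counts as a mention. A naive substring search would treat “motherland” as a reference to mothers. The sketch below, built on an invented three-anthem corpus and a hypothetical term list, matches whole words only.

```python
import re

def anthems_mentioning(anthems, terms):
    """Names of anthems containing any term as a whole word, not as a substring."""
    pattern = re.compile(
        r"\b(?:" + "|".join(map(re.escape, terms)) + r")\b",
        re.IGNORECASE,
    )
    return sorted(name for name, text in anthems.items() if pattern.search(text))

# Toy corpus, invented for illustration.
toy = {
    "A": "Hail, motherland, land of our fathers",
    "B": "Mothers and daughters, sing",
    "C": "Our fields, our rivers, our skies",
}

print(anthems_mentioning(toy, ["mother", "mothers", "women", "daughters"]))
# ['B']
print(anthems_mentioning(toy, ["science", "democracy"]))
# []
```

Anthem A never registers as mentioning women, because the word boundary check refuses to split “motherland.” An empty list, as in the second call, is what a true absence looks like in the data.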

This vocabulary gap reveals something fundamental: national anthems are not descriptions of how countries actually work. They are mythological texts. They operate in a symbolic register that predates industrialization, women’s suffrage, environmental science, and democratic governance. They are, in a very real sense, pre-modern documents still being performed in the 21st century.

What the Data Reveals About National Mythology

The most important finding from this analysis is not any single word frequency. It is the overall pattern. National anthems, taken as a global corpus, reveal a remarkably consistent mythology of nationhood.

That mythology has three pillars: sacred territory (the land is ours and it is blessed), blood sacrifice (our ancestors died for this and we must honor them), and divine legitimacy (a higher power has ordained our existence). These three ideas appear across cultures, continents, and centuries. They appear in the anthems of democracies and dictatorships, of island nations and continental empires, of countries founded in 1776 and countries founded in 1991.

This consistency suggests that national anthems are not really about individual countries at all. They are about the concept of nationhood itself. Each anthem is a local variation on a universal template: we are a people, this is our land, we have paid for it in blood, and the heavens approve.

The anthems are also performative texts, meaning they do not merely describe reality; they create it. When millions of people stand and sing the same words, they are not reporting on national unity. They are producing it. The anthem is not a mirror reflecting the nation. It is a ritual that constructs the nation, over and over again, every time it is performed.

This is why the vocabulary is so conservative. Innovation is dangerous in a ritual text. The anthem’s power depends on repetition, on the sense that these are the same words our grandparents sang. Introducing modern concepts (democracy, technology, climate) would break the spell. The anthem needs to feel eternal, even when the nation it represents is only a few decades old.

The data confirms what anthropologists and political scientists have long suspected: nations are, at their core, storytelling projects. And the national anthem is the shortest, most widely known version of that story. It is a 90-second myth, sung in unison, that transforms a collection of strangers into a people. The words matter less for their literal meaning than for the act of saying them together. But the words we choose, and the words we leave out, tell us more about what nations truly value than any constitution or policy document ever could.
