U.S. Presidential Speeches

Presidential speeches fulfill essential roles—communicating policies, inspiring citizens, and addressing crises. Whether fostering unity or navigating international relations, they are a powerful tool for leaders to shape public perception, provide direction, and address the nation's pressing concerns.


Have the linguistic style and topics of U.S. presidential speeches changed over time?

Project Goal

We aim to analyze presidential speeches in the United States from George Washington to the present day. This analysis will enable us to explore the evolution of discourse styles. The primary objective of our project is to examine the semantic patterns in U.S. presidential speeches over time.

Data Description

The data were collected from the official website of the Miller Center, an impartial affiliate of the University of Virginia, which offers public access to U.S. presidential speeches spanning from George Washington in 1789 to Joe Biden's 2023 addresses. The speeches encompass various formats, including formal national addresses, press conferences, and informal remarks, totaling 1037 transcripts. The collection provides a comprehensive resource for analyzing the evolution of presidential communication over the years.

1037 Transcripts

45 Presidents

1789-2023

2 Main Periods

Data Sources

The Miller Center gathered the transcripts from a variety of sources:





  • Many Presidential papers and records had been lost, destroyed, sold for profit, or ruined by poor storage conditions. In 1939, President Franklin D. Roosevelt sought a better alternative. (archive.org)

Quality of Sources

In our project, we analyzed transcripts of speeches delivered by 45 presidents throughout history. These transcripts derive from various media, including audio and video recordings, as well as pre-audio records. Notably, the transcripts before Warren Harding are classified as pre-audio, as Harding became the first president to be heard on the radio (History.com Editors, 2020). We acknowledge that our 1037 transcripts vary in quality, because the medium from which a transcript derives may influence how accurately it portrays the speaker's linguistic style.

Transcripts based on audio and video mediums offer a better analysis opportunity as they capture not only the words spoken but also the tone, emphasis, and other vocal cues. These cues provide invaluable insights into the president's speech style.

Before the development of audio recording technology, capturing what people said in speeches relied solely on on-the-spot transcriptions and field notes (Jones, 2021). These pre-audio transcripts heavily depended on the accuracy and interpretation of the persons taking notes, which could introduce errors and omissions of the speaker's intended words and speech style. As a result, these transcripts offered a limited representation of the speeches and lacked the depth that audio or video transcripts provide.

To ensure the reliability and credibility of our data, we sourced the transcripts from the Miller Center, a reputable institution dedicated to the study of the United States presidency. The Miller Center diligently gathers transcripts from reputable and authoritative sources. For recent speeches, ranging from George W. Bush to Joe Biden, the transcripts are generally obtained from the official White House website. Older speeches are often sourced from the relevant presidential library, such as the Ronald Reagan Presidential Library or the Franklin Roosevelt Presidential Library. Additionally, the Public Papers of the Presidents serve as another valuable resource for obtaining transcripts. The Miller Center employs a process of cross-referencing and validating the accuracy and completeness of the transcripts by comparing them to multiple sources and mediums.


References:

  1. History.com Editors (June 11, 2020). Warren G. Harding becomes the first president to be heard on the radio. HISTORY. Retrieved January 13, 2024, from https://www.history.com/this-day-in-history/harding-becomes-first-president-to-be-heard-on-the-radio
  2. Jones, R. H. (2021). Data collection and transcription in discourse analysis: A technological history. The Bloomsbury handbook of discourse analysis, 9-20.

Analysis Tools & Methods

Topic Modeling

Use LDA in Python

Stylo

R package for stylometric analyses

Gephi

Voyant

The First Part

Analyzing the General Trend

Tool: Gephi

THE GOAL


General observation of speech style similarities among presidents.

General analysis of potential correlations between style and topics.

General analysis of factors influencing styles and topics.


Corpus preparation


Combine Datasets


  • Preexisting dataset (speeches from 1789 to 2019, downloaded from Kaggle): presidential_speeches.csv.
  • New dataset (speeches by Donald Trump and Joe Biden, collected through the Miller Center API): Biden_speeches.csv, Trump_speeches.csv.



Group Texts


Instead of analyzing individual speeches, we grouped the speeches by speaker. This approach provides a deeper understanding of the general style and thematic patterns associated with each president.
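The grouping step can be sketched roughly as follows. This is a minimal illustration, not the project's actual script: the field names `president` and `transcript` are hypothetical, since the column names of the datasets are not given here.

```python
from collections import defaultdict

def group_by_president(speeches):
    """Concatenate all transcripts of each president into one document."""
    grouped = defaultdict(list)
    for row in speeches:
        # "president" and "transcript" are assumed field names.
        grouped[row["president"]].append(row["transcript"])
    # One document per president, speeches joined by a single space.
    return {name: " ".join(texts) for name, texts in grouped.items()}

# Toy records for illustration only (not taken from the real corpus).
speeches = [
    {"president": "A", "transcript": "First speech."},
    {"president": "B", "transcript": "Another speech."},
    {"president": "A", "transcript": "Second speech."},
]
documents = group_by_president(speeches)
print(len(documents))  # number of per-president documents
```

Applied to the full dataset, a step like this would yield one document per president.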


Text Cleaning


  • Texts were cleaned by removing unwanted characters, such as line breaks (\n), carriage returns (\r), and multiple consecutive spaces. These elements might disrupt collocations and word co-occurrence patterns.
  • Punctuation was retained as it might influence the speech style.
  • Stop words were removed before topic modelling.
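The cleaning rules above could look roughly like this in Python. It is a sketch under assumptions: the stop word list is a tiny illustrative placeholder, and the tokenization used for topic modelling is not specified in the project.

```python
import re

def clean_text(text):
    """Replace line breaks and carriage returns, collapse repeated spaces.

    Punctuation is deliberately left untouched.
    """
    text = text.replace("\r", " ").replace("\n", " ")
    text = re.sub(r" {2,}", " ", text)
    return text.strip()

# Tiny illustrative stop word list (an assumption, not the list used in the project).
STOP_WORDS = {"the", "of", "and", "to", "in", "a"}

def tokens_for_topic_model(text):
    """Lowercase, tokenize on letters/apostrophes, and drop stop words."""
    words = re.findall(r"[a-z']+", text.lower())
    return [w for w in words if w not in STOP_WORDS]

raw = "Fellow citizens,\r\n  we meet   today.\nThank you!"
print(clean_text(raw))
print(tokens_for_topic_model("The state of the Union"))
```

Keeping the two functions separate mirrors the text: the stylometric analysis works on the cleaned text with punctuation, while only the topic model input has stop words removed.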

Corpus Size


45 documents (each document representing the entire body of speeches delivered by one president).


Prepare Edges Files


Corpus: 45 documents


Stylo

Setting Parameters


We varied the parameters to check the stability of the results and increase the reliability of our findings. In the end, we decided to use the following parameters:


THE FINDINGS

  • 2-gram size: Analyzing pairs of words can capture more nuanced linguistic patterns and dependencies between words compared to a unigram analysis. Also, analyzing pairs of words can detect recurring word combinations and collocations that contribute to the presidents' speech styles.


  • Most Frequent Words (MFW): From a minimum of 100 to a maximum of 1000, in increments of 100.


  • Consensus Tree: This option outputs a statistically justified compromise between the results of a number of cluster analyses run with a variety of MFW and culling parameter values.


  • Consensus strength 0.5: It means that such a linkage between two texts is made if it appears in at least 50% of the cluster analyses.


  • Culling: 20%. Words that appear in at least 20% of the texts in the corpus will be considered in the analysis.


  • Start at freq: 6: The first five pairs of words are not informative (“of the,” “in the,” “to the,” “and the,” “for the”).


  • Classic Delta: Classic Delta is usually a good choice of distance measure for English texts.
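To make the parameters above more concrete, here is a simplified Python sketch of Classic (Burrows's) Delta computed on word bigrams. It is not Stylo's implementation: the tokenization, the way `start_at` and `culling` are applied, and all function names are assumptions made for illustration, and the consensus tree step is not covered.

```python
import re
from collections import Counter
from statistics import mean, stdev

def bigrams(text):
    """Lowercase the text and return its word bigrams as strings."""
    words = re.findall(r"[a-z']+", text.lower())
    return [" ".join(pair) for pair in zip(words, words[1:])]

def classic_delta(texts, mfw=100, start_at=1, culling=0.0):
    """Pairwise Delta for a dict mapping text name -> text (needs >= 2 texts)."""
    names = sorted(texts)
    freqs, corpus = {}, Counter()
    for n in names:
        counts = Counter(bigrams(texts[n]))
        total = sum(counts.values())
        # Relative frequency of each bigram within this text.
        freqs[n] = {g: c / total for g, c in counts.items()}
        corpus.update(counts)
    # Culling: keep bigrams present in at least `culling` (a fraction) of the texts.
    kept = [g for g, _ in corpus.most_common()
            if sum(g in freqs[n] for n in names) >= culling * len(names)]
    # Keep corpus-wide frequency ranks start_at..mfw.
    features = kept[start_at - 1:mfw]
    z = {n: [] for n in names}
    for g in features:
        column = [freqs[n].get(g, 0.0) for n in names]
        sd = stdev(column)
        if sd == 0:
            continue  # a feature with no variation carries no information
        mu = mean(column)
        for n, value in zip(names, column):
            z[n].append((value - mu) / sd)
    # Delta = mean absolute difference of z-scores over the features.
    return {(a, b): mean(abs(x - y) for x, y in zip(z[a], z[b]))
            for i, a in enumerate(names) for b in names[i + 1:]}

# Toy texts for illustration only.
texts = {
    "a": "we the people of the nation",
    "b": "we the people of the nation",
    "c": "the government of the states shall decide",
}
distances = classic_delta(texts)
print(distances[("a", "b")], distances[("a", "c")] > 0)
```

With the settings listed above, a call would look like `classic_delta(texts, mfw=1000, start_at=6, culling=0.2)`, repeated for each MFW value from 100 to 1000. Smaller Delta values indicate more similar styles.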

Analyzing General Trend

Prepare Nodes Files


Nodes file one: basic information of presidents

  • Preexisting dataset: basic-information presidents.csv (downloaded from GitHub)

Nodes file two: topic probability

Topic Modelling


Technique:


  • Topic Modelling (Latent Dirichlet Allocation (LDA))


Library: tomotopy


  • We used the tomotopy library. It is an alternative to Mallet for topic modeling in Python, providing similar algorithms without the need for Java or Mallet installation.


Training Models


  • To determine the best number of topics and iterations, we trained models with the number of topics ranging from 5 to 15 and the number of training iterations ranging from 100 to 500. The log-likelihood was used to evaluate each model's training performance. After careful evaluation, we decided to use the model with 7 topics and 500 iterations.
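A rough sketch of such a parameter search with tomotopy is shown below. The helper names, the step of 100 between iteration counts, and the random seed are assumptions for illustration; only `LDAModel`, `add_doc`, `train`, and `ll_per_word` are tomotopy's own names.

```python
def train_lda(docs, k, iterations, seed=42):
    """Train one LDA model; return (model, log-likelihood per word)."""
    # Imported here so the helpers below can be used without tomotopy installed.
    import tomotopy as tp

    model = tp.LDAModel(k=k, seed=seed)
    for tokens in docs:  # each document is a list of word tokens
        model.add_doc(tokens)
    model.train(iterations)
    return model, model.ll_per_word

def candidate_settings(topic_range=range(5, 16),
                       iteration_steps=range(100, 501, 100)):
    """All (number of topics, iterations) pairs to try; step of 100 is assumed."""
    return [(k, it) for k in topic_range for it in iteration_steps]

def best_setting(scores):
    """scores maps (k, iterations) -> log-likelihood per word; higher is better."""
    return max(scores, key=scores.get)

# Usage sketch (docs = one token list per president):
# scores = {s: train_lda(docs, *s)[1] for s in candidate_settings()}
# k, iterations = best_setting(scores)
print(len(candidate_settings()))  # number of candidate models
```

Log-likelihood tends to keep improving as topics are added, so a score like this is best treated as one input to the choice, alongside how interpretable the resulting topics are.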


THE FINDINGS

Analyzing General Trend

Output: topic probability file

Topic model results: top 30 topic words


THE SETTINGS

Gephi

Layout


  • ForceAtlas2
  • Visual distance = linguistic style distance. The layout positions nodes so that their distances reflect how similar or different the presidents and clusters are in style.

Node Size


  • The size of nodes reflects the degree to which a president's style is connected to others.
  • Larger nodes indicate a stronger influence on other presidents' speech styles.


Controlling Variables


We kept the same layout (style similarity) and the same node size (style influence) while varying the node colors to explore the factors influencing speech style.

THE FINDINGS

Analyzing General Trend

Speech styles change over time and presidents often share similarities with their contemporaries.


The gradual darkening of colors from one end to the other shows how presidents' speech styles evolved chronologically. Darker green represents more recent periods, while lighter green represents earlier periods.


THE FINDINGS

Analyzing General Trend

Several clusters exist within the early periods, suggesting a variety of styles in those periods.

The purple cluster represents modern times (after the 1920s), while other clusters represent earlier periods. Various colors indicate different clusters of linguistic styles. The substantial gap between the purple cluster and others indicates significant style changes during this specific period.



THE FINDINGS

Analyzing General Trend

The various topic clusters align with clusters of linguistic styles as well as with the timeline.

The gradient colors indicate topics, with darker red representing a higher probability of a specific topic, and lighter red indicating a lower probability. This intensity of color illustrates which presidents focus more on these topics.




The Second Part

Analyzing the Stylistic Change

Tool: Voyant

THE GOAL

General analysis of the stylistic change in vocabulary, sentence structure, and tone.


THE SETTINGS

Voyant

Corpus


  • The corpus is split at the Franklin D. Roosevelt era into two periods: before the Franklin D. Roosevelt era (1796-1932) and from the Franklin D. Roosevelt era onward (1932-2023).
  • Pre-Franklin D. Roosevelt Era (1796-1932): This corpus has 514 documents with 2,207,193 total words.
  • Post-Franklin D. Roosevelt Era (1932-2023): This corpus has 461 documents with 1,576,619 total words and 27,872 unique word forms.

Controlling Variables


We conduct text analysis through the input of keywords, combined with the comparison of multiple charts.

THE FINDINGS

In the later period, the linguistic style became more concise, and the diversity decreased.


  • Ratio: The ratio represents word diversity. A higher ratio indicates a richer vocabulary and broader use of terms in the text.


  • Words/Sentence: Words/Sentence indicates the number of words in each sentence, reflecting sentence length. A higher value suggests longer, potentially more complex sentences, while a lower value indicates shorter, more concise structures.
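Both measures can be approximated in a few lines of Python. The sketch below assumes that "Ratio" is the type-token ratio (unique word forms divided by total words) and uses a naive sentence splitter; Voyant's own tokenization and sentence detection may differ.

```python
import re

def type_token_ratio(text):
    """Unique word forms divided by total words (assumed meaning of 'Ratio')."""
    words = re.findall(r"[a-z']+", text.lower())
    return len(set(words)) / len(words) if words else 0.0

def words_per_sentence(text):
    """Average number of words per sentence, splitting on ., ! and ?."""
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    if not sentences:
        return 0.0
    counts = [len(re.findall(r"[A-Za-z']+", s)) for s in sentences]
    return sum(counts) / len(sentences)

sample = "We must act. We must act now!"
print(type_token_ratio(sample))    # 4 unique forms over 7 words
print(words_per_sentence(sample))  # (3 + 4) / 2 sentences
```

A type-token ratio generally falls as a text gets longer, so comparing two corpora of different sizes (here roughly 2.2 million versus 1.6 million words) is more reliable on samples of equal length.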



Analyzing the Stylistic Change


THE FINDINGS

Analyzing the Stylistic Change

The use of speech language shifted from formal towards informal

  • ‘shall’: According to the Cambridge Dictionary, 'shall' is considered a somewhat outdated modal verb in English. The word frequency statistics show that mentions of 'shall' and ‘said’ decrease significantly in later speeches.


  • ‘the said’ & ‘I said’: When the term “said” serves as an adjective, it tends to be used in formal contexts. Presidents after Roosevelt tended to use “said” as a verb, and the frequency of “I said” increased from 30 times in the first corpus



In the later period, the language of the speeches became more colloquial.

‘let’s’ & ‘right now’: Phrases like 'let's' convey informality and suggest collaboration, while 'right now' adds urgency to the conversation. Both are commonly used in casual spoken English, contributing to a friendly and approachable tone.
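Because the two corpora differ in size, raw counts of a word or phrase are easier to compare once normalized. The following sketch counts a phrase per million words; the function name and the toy texts are illustrative assumptions and do not reproduce Voyant's counting.

```python
import re

def per_million(phrase, text):
    """Occurrences of a word or phrase per million words of text."""
    # Treat curly and straight apostrophes alike ("let’s" vs "let's").
    normalize = lambda s: s.lower().replace("’", "'")
    words = re.findall(r"[a-z']+", normalize(text))
    target = normalize(phrase).split()
    n = len(target)
    hits = sum(words[i:i + n] == target for i in range(len(words) - n + 1))
    return 1_000_000 * hits / len(words) if words else 0.0

# Toy snippets for illustration only (not quotations from the corpus).
early = "I shall go. The said treaty shall stand."
late = "Let’s go right now. I said let's work."

for phrase in ("shall", "let's", "right now", "I said"):
    print(phrase, per_million(phrase, early), per_million(phrase, late))
```

Run on the real pre- and post-Roosevelt corpora, rates like these would allow a like-for-like comparison of 'shall', 'I said', 'let's', and 'right now'.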

INTERESTING FINDINGS

Exploring the Corpus: Insights from Voyant


Presidents are increasingly focused on people

‘People’ & ‘Government’


  • When we loaded the entire corpus into Voyant, the most frequent words were ‘people’ and ‘government’.
  • References to people increased while references to government decreased, and overall ‘people’ eventually came to be mentioned more often than ‘government’. Interestingly, the turning point was Franklin Roosevelt.

People

Government

The New Deal

law*

work*

need*

job*

duty*

  • The surge in terms associated with 'jobs,' 'families,' and 'welfare' can be attributed to initiatives reminiscent of the New Deal. Historically, the New Deal prioritized the '3 R's': providing relief for the unemployed and the impoverished, facilitating the recovery of the economy to pre-depression levels, and instituting reforms in the financial system to avert future economic downturns. This transformative era has left a lasting impact on succeeding generations.


“Fire” in Cold War

  • ‘Soviet*’ and ‘Military*’ follow the same trend. This may be related to the fact that the rivalry between the United States and the Soviet Union was one of the focal points of this period, and the main policy of the United States was to use the arms race to contain communism.

Soviet*

Military*

  • When the 'soviet' and 'military' indicators hit a low point, 'Vietnam' reached its peak. This could be attributed to the United States' involvement in the Vietnam War, which damaged its reputation and economy and complicated arms agreements. Upon withdrawal, the U.S. opted to improve relations with both China and the Soviet Union.

Soviet*

Military*

Vietnam*

Conclusion


Linguistic style and topics in US presidential speeches have changed over time, with the 1920s-1930s as a turning point in linguistic style.


In the Post-Franklin D. Roosevelt Era, the linguistic style became more concise and vocabulary diversity decreased. The language of the speeches shifted from formal to informal and became more colloquial.

Historical events are clearly reflected in the presidents' corpus. Big events tend to generate higher frequencies of related words, whether it be the New Deal or the Cold War. These merely scratch the surface of the historical iceberg, and more findings will emerge with additional distant reading.

Limitation

It is important to keep in mind that our research has some limitations. The individuals responsible for transcribing these speeches and their methods can significantly impact the results. Additionally, the use of media during presidential speeches is another influential factor. The notable disparity in average sentence length between the two corpora may stem from the adoption of radio from June 14, 1922 onward. After that date, transcripts of presidential speeches might have been derived from recordings, so transcribers' differing punctuation habits could produce a dramatic change in sentence length. Despite our efforts to gather comprehensive details, we were unable to trace all relevant information. Therefore, our research findings may be biased in certain cases. Nevertheless, we hope that our work serves as inspiration for your further research in this domain.


The team

Mingkai Xu

Data wrangling & Web Design

Arani Aslama

Data wrangling & Web Design

Wenjing Cai

Data wrangling & Voyant analysis

Baidan Chen

Data wrangling & Web Design

Wuhong Xu

Data wrangling & Web Design

Luotong Cheng

Data wrangling, Stylo analysis & Gephi analysis

Xiaoyu Zhou

Data wrangling & Voyant analysis