Photo by History in HD on Unsplash
U.S. Presidential Speeches
Presidential speeches fulfill essential roles: communicating policies, inspiring citizens, and addressing crises. Whether fostering unity or navigating international relations, they are a powerful tool for leaders to shape public perception, provide direction, and address the nation's pressing concerns.
Have the linguistic style and topics of U.S. presidential speeches changed over time?
Data Description
Our data were collected from the official website of the Miller Center, an impartial affiliate of the University of Virginia that offers public access to U.S. presidential speeches from George Washington in 1789 to Joe Biden's addresses in 2023. The collection covers various formats, including formal national addresses, press conferences, and informal remarks, totaling 1037 transcripts. It provides a comprehensive resource for analyzing the evolution of presidential communication over time.
1037 Transcripts | 45 Presidents | 1789-2023 | 2 Main Periods
Data Sources
Miller Center: https://millercenter.org/thepresidency/presidential-speeches
The Miller Center gathered the transcripts from a variety of sources.
Quality of Sources
In our project, we analyzed transcripts of speeches delivered by 45 presidents throughout history. These transcripts derive from various media, including audio and video recordings as well as pre-audio written records. Notably, the transcripts before Warren Harding are classified as pre-audio, as Harding became the first president to be heard on the radio (History.com Editors, 2020). We acknowledge that the quality of our 1037 transcripts varies, because the medium from which each transcript was produced may affect how accurately it captures the speaker's linguistic style.
Transcripts based on audio and video recordings offer better opportunities for analysis, because the recordings capture not only the words spoken but also tone, emphasis, and other vocal cues. These cues provide invaluable insights into a president's speech style.
Before the development of audio recording technology, capturing what people said in speeches relied solely on on-the-spot transcriptions and field notes (Jones, 2021). These pre-audio transcripts heavily depended on the accuracy and interpretation of the persons taking notes, which could introduce errors and omissions of the speaker's intended words and speech style. As a result, these transcripts offered a limited representation of the speeches and lacked the depth that audio or video transcripts provide.
To ensure the reliability and credibility of our data, we sourced the transcripts from the Miller Center, a reputable institution dedicated to the study of the United States presidency. The Miller Center gathers transcripts from authoritative sources. For recent speeches, from George W. Bush to Joe Biden, the transcripts are generally obtained from the official White House website. Older speeches are often sourced from the relevant presidential library, such as the Ronald Reagan Presidential Library or the Franklin Roosevelt Presidential Library. The Public Papers of the Presidents serve as another valuable source of transcripts. The Miller Center cross-references each transcript against multiple sources and media to validate its accuracy and completeness.
References:
Analysis Tools & Methods
Topic Modeling
Latent Dirichlet Allocation (LDA) in Python
Stylo
R package for stylometric analyses
Gephi
Voyant
Photo by Joshua Hoehne on Unsplash
The First Part
Analyzing the General Trend
Tool: Gephi
THE GOAL
General observation of speech style similarities among presidents.
General analysis of potential correlations between style and topics.
General analysis of factors influencing styles and topics.
Corpus preparation
Combine Datasets
Group Texts
Instead of analyzing individual speeches, we grouped the speeches by speaker. This approach gives a clearer picture of the general style and thematic patterns associated with each president.
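A minimal sketch of this grouping step in Python, assuming each transcript has already been loaded as a record with the hypothetical fields `president` and `text` (these names are our illustration, not the Miller Center's data format). All speeches of one president are concatenated in their original order, so each president yields exactly one document.

```python
def group_by_president(records):
    """Concatenate every transcript of each president into one document.

    records: iterable of dicts with the hypothetical keys "president" and "text".
    Returns a dict (insertion-ordered) mapping president -> combined text.
    """
    grouped = {}
    for rec in records:
        name = rec["president"].strip()
        grouped.setdefault(name, []).append(rec["text"].strip())
    # Separate consecutive speeches of the same president with a blank line
    return {name: "\n\n".join(texts) for name, texts in grouped.items()}


if __name__ == "__main__":
    # Placeholder records standing in for the scraped transcripts
    speeches = [
        {"president": "George Washington", "text": "first speech text"},
        {"president": "Joe Biden", "text": "second speech text"},
        {"president": "George Washington", "text": "third speech text"},
    ]
    corpus = group_by_president(speeches)
    print(len(corpus))  # 2: one document per president
```

Applied to the full collection, this reduces 1037 transcripts to 45 documents, one per president.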
Text Cleaning
Corpus Size
45 documents (each document representing the entire body of speeches delivered by one president).
Photo by Adi Goldstein on Unsplash
Prepare Edges Files
Corpus: 45 documents
Stylo
Setting Parameters
We tested different parameter settings to check the stability of the results and increase the reliability of our findings. After comparing the outcomes, we fixed the parameters used for the final analysis.
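Stylo is an R package, and the sketch below does not reproduce it or our chosen parameters. As a rough Python illustration of what a stylometric edges file contains, it computes Burrows's Delta (the mean absolute difference of z-scored most-frequent-word frequencies) between documents and links each document to its nearest neighbor(s). The number of most frequent words (`n_mfw`), the neighbor count `k`, and the edge weight `1 / Delta` are illustrative assumptions; the `Source`, `Target`, `Weight` headers follow Gephi's spreadsheet import convention for edge tables.

```python
import csv
import math
from collections import Counter


def burrows_delta(docs, n_mfw=100):
    """Pairwise Burrows's Delta between tokenized documents.

    docs: dict mapping document name -> list of lowercase tokens.
    Returns a dict mapping (name_a, name_b) -> Delta for every ordered pair.
    """
    names = list(docs)
    rel = {}
    for n in names:
        counts = Counter(docs[n])
        total = sum(counts.values())
        rel[n] = {w: c / total for w, c in counts.items()}
    # Most frequent words across the whole corpus
    corpus_counts = Counter(w for toks in docs.values() for w in toks)
    mfw = [w for w, _ in corpus_counts.most_common(n_mfw)]
    # z-score each word's relative frequency across documents
    z = {n: [] for n in names}
    for w in mfw:
        vals = [rel[n].get(w, 0.0) for n in names]
        mean = sum(vals) / len(vals)
        sd = math.sqrt(sum((v - mean) ** 2 for v in vals) / len(vals))
        for n, v in zip(names, vals):
            z[n].append((v - mean) / sd if sd > 0 else 0.0)
    # Delta = mean absolute difference of the z-scores
    return {
        (a, b): sum(abs(x - y) for x, y in zip(z[a], z[b])) / len(mfw)
        for a in names
        for b in names
        if a != b
    }


def nearest_neighbor_edges(delta, k=1):
    """Link each document to its k stylistically closest documents."""
    edges = []
    for a in sorted({a for a, _ in delta}):
        neighbors = sorted((d, b) for (x, b), d in delta.items() if x == a)
        for d, b in neighbors[:k]:
            # Smaller Delta means more similar, so invert it for the weight
            edges.append((a, b, 1.0 / max(d, 1e-9)))
    return edges


def write_edges_csv(edges, path):
    """Write a Gephi-style edges table with Source, Target, Weight columns."""
    with open(path, "w", newline="", encoding="utf-8") as f:
        writer = csv.writer(f)
        writer.writerow(["Source", "Target", "Weight"])
        writer.writerows(edges)
```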
THE FINDINGS
Analyzing the General Trend
Prepare Nodes Files
Nodes file one: basic information about each president
Nodes file two: topic probabilities
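The two node tables can be merged into a single nodes file for Gephi, which expects an `Id` column and accepts an optional `Label` column plus any attribute columns. A minimal sketch, assuming the basic information is a dict of hypothetical attributes (for example a term start year) and the topic probabilities come from the topic model described in the next section; the `Id` values must match the `Source`/`Target` names used in the edges file.

```python
import csv


def build_node_rows(info, topic_probs):
    """Merge basic president info with topic probabilities into node rows.

    info: dict president -> dict of attributes (hypothetical, e.g. {"start_year": 1789}).
    topic_probs: dict president -> list of topic probabilities (equal lengths).
    """
    n_topics = len(next(iter(topic_probs.values())))
    attr_names = sorted({a for attrs in info.values() for a in attrs})
    header = ["Id", "Label"] + attr_names + [f"Topic_{t}" for t in range(n_topics)]
    rows = [header]
    for name, attrs in info.items():
        # Leave topic cells empty if a president has no topic distribution
        probs = topic_probs.get(name, [""] * n_topics)
        rows.append([name, name] + [attrs.get(a, "") for a in attr_names] + list(probs))
    return rows


def write_csv(rows, path):
    """Write rows to a UTF-8 CSV file that Gephi can import as a nodes table."""
    with open(path, "w", newline="", encoding="utf-8") as f:
        csv.writer(f).writerows(rows)
```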
Topic Modeling
Technique: Latent Dirichlet Allocation (LDA)
Library: tomotopy
Training Models
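We trained our models with tomotopy, whose `LDAModel` class handles training internally. To illustrate what such a model estimates, here is a minimal, self-contained collapsed Gibbs sampler for LDA in plain Python. It is a teaching sketch, not tomotopy's implementation, and the hyperparameters (`k`, `alpha`, `beta`, `iters`) are illustrative assumptions rather than our settings. It returns each document's topic distribution (the basis of a topic probability file) and each topic's word distribution (from which the top 30 words can be read).

```python
import random


def lda_gibbs(docs, k, alpha=0.1, beta=0.01, iters=200, seed=0):
    """Fit LDA to tokenized documents with collapsed Gibbs sampling.

    docs: list of token lists (one per president in this setup).
    Returns (doc_topic, topic_word): doc_topic[d][t] = P(topic t | doc d),
    topic_word[t][w] = P(word w | topic t).
    """
    rng = random.Random(seed)
    vocab = sorted({w for doc in docs for w in doc})
    V = len(vocab)
    n_dt = [[0] * k for _ in docs]                      # doc-topic counts
    n_tw = [dict.fromkeys(vocab, 0) for _ in range(k)]  # topic-word counts
    n_t = [0] * k                                       # tokens per topic
    z = []                                              # topic of every token

    # Random initial topic assignment for every token
    for d, doc in enumerate(docs):
        z.append([])
        for w in doc:
            t = rng.randrange(k)
            z[d].append(t)
            n_dt[d][t] += 1
            n_tw[t][w] += 1
            n_t[t] += 1

    topics = range(k)
    for _ in range(iters):
        for d, doc in enumerate(docs):
            for i, w in enumerate(doc):
                t = z[d][i]
                # Remove the token's current assignment from the counts
                n_dt[d][t] -= 1
                n_tw[t][w] -= 1
                n_t[t] -= 1
                # Full conditional: (n_dt + alpha) * (n_tw + beta) / (n_t + V * beta)
                weights = [
                    (n_dt[d][j] + alpha) * (n_tw[j][w] + beta) / (n_t[j] + V * beta)
                    for j in topics
                ]
                t = rng.choices(topics, weights=weights)[0]
                z[d][i] = t
                n_dt[d][t] += 1
                n_tw[t][w] += 1
                n_t[t] += 1

    doc_topic = [
        [(n_dt[d][j] + alpha) / (len(doc) + k * alpha) for j in topics]
        for d, doc in enumerate(docs)
    ]
    topic_word = [
        {w: (n_tw[j][w] + beta) / (n_t[j] + V * beta) for w in vocab} for j in topics
    ]
    return doc_topic, topic_word


def top_words(topic_word, t, n=30):
    """The n most probable words of topic t."""
    return [w for w, _ in sorted(topic_word[t].items(), key=lambda kv: -kv[1])[:n]]
```

Each row of `doc_topic` sums to 1, so it can be written directly as one president's row in a topic probability file.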
THE FINDINGS
Analyzing the General Trend
Output: topic probability file
Topic model results: top 30 words per topic
THE SETTINGS
Gephi
Layout
Node Size
Controlling Variables
We kept the same layout (reflecting style similarity) and the same node size (reflecting style influence) while varying node colors to examine the factors influencing speech style.
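Gephi's Appearance panel can size nodes by ranking a numeric attribute. Assuming, as our own illustration, that "style influence" is measured as weighted in-degree (how strongly other presidents' texts link to a given president in the edges file), the values could be computed and linearly rescaled to a size range as sketched below. The size range of 10 to 50 is an arbitrary choice.

```python
from collections import defaultdict


def weighted_in_degree(edges):
    """Sum of incoming edge weights per node; edges are (source, target, weight)."""
    deg = defaultdict(float)
    for source, target, weight in edges:
        deg.setdefault(source, 0.0)  # keep nodes that have no incoming edges
        deg[target] += weight
    return dict(deg)


def scale_sizes(values, min_size=10.0, max_size=50.0):
    """Linearly rescale node values to [min_size, max_size]."""
    lo, hi = min(values.values()), max(values.values())
    span = hi - lo or 1.0  # avoid division by zero when all values are equal
    return {n: min_size + (v - lo) / span * (max_size - min_size) for n, v in values.items()}
```

Because the layout and sizes stay fixed, only the color attribute changes between the views that follow.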
THE FINDINGS
Analyzing the General Trend
Speech styles change over time, and presidents often share stylistic similarities with their contemporaries.
Node color encodes time: lighter green represents earlier periods and darker green more recent ones. The gradual darkening from one end of the network to the other shows how presidents' speech styles evolved chronologically.
THE FINDINGS
Analyzing the General Trend
Several clusters exist within the early periods, suggesting greater stylistic variety in those periods.
Each color indicates a different cluster of linguistic styles. The purple cluster represents modern times (after the 1920s), while the other clusters represent earlier periods. The substantial gap between the purple cluster and the others indicates a significant change in style around the 1920s.
THE FINDINGS
Analyzing the General Trend
Topic clusters align with both the clusters of linguistic style and the timeline.
Here the color gradient represents a single topic: darker red indicates a higher probability of that topic and lighter red a lower one, so color intensity shows which presidents focus more on it.
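Gradient coloring of this kind (and the chronological green gradient described above, with a year as the value) amounts to linear interpolation between a light and a dark color. Gephi does this in its Appearance panel; the sketch below reproduces the idea in Python so colors could also be precomputed as a hex attribute. The default endpoint colors are illustrative assumptions, not the exact colors in our figures.

```python
def hex_to_rgb(color):
    """Convert '#rrggbb' to an (r, g, b) tuple of ints."""
    color = color.lstrip("#")
    return tuple(int(color[i:i + 2], 16) for i in (0, 2, 4))


def gradient_color(value, vmin, vmax, light="#fee0d2", dark="#a50f15"):
    """Map value in [vmin, vmax] to a hex color between light and dark.

    Higher values (e.g. a higher topic probability) get darker colors.
    """
    frac = 0.0 if vmax == vmin else (value - vmin) / (vmax - vmin)
    frac = min(max(frac, 0.0), 1.0)  # clamp out-of-range values
    lo, hi = hex_to_rgb(light), hex_to_rgb(dark)
    rgb = (round(a + (b - a) * frac) for a, b in zip(lo, hi))
    return "#" + "".join(f"{c:02x}" for c in rgb)
```

For the chronological view, the same function can be called with a president's start year as `value`, the earliest and latest years as bounds, and green endpoint colors.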
The Second Part
Analyzing the Stylistic Change
Tool: Voyant
THE GOAL
General analysis of stylistic change in vocabulary, sentence structure, and tone.
THE SETTINGS
Voyant
Corpus
Controlling Variables
We analyzed the texts by entering keywords and comparing the results across multiple charts.
THE FINDINGS
Analyzing the Stylistic Change
In the later period, the linguistic style became more concise, and vocabulary diversity decreased.
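Voyant's Summary tool reports measures such as vocabulary density (unique words divided by total words) and average words per sentence, which are the kinds of measures behind this finding. A rough Python approximation is sketched below; the regex tokenizer and the punctuation-based sentence splitter are simplifying assumptions and will not match Voyant's tokenization exactly (abbreviations such as "Mr." will be counted as sentence ends). Because vocabulary density falls as texts get longer, comparisons are fairest between samples of similar size.

```python
import re


def style_summary(text):
    """Approximate vocabulary density and sentence length for one document."""
    # Split on runs of sentence-final punctuation and drop empty fragments
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    # Lowercase word tokens; curly apostrophes are normalized so "let's" stays whole
    words = re.findall(r"[a-z]+(?:'[a-z]+)*", text.lower().replace("\u2019", "'"))
    n_words = len(words)
    return {
        "total_words": n_words,
        "unique_words": len(set(words)),
        "vocabulary_density": len(set(words)) / n_words if n_words else 0.0,
        "avg_words_per_sentence": n_words / len(sentences) if sentences else 0.0,
    }
```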
THE FINDINGS
Analyzing the Stylistic Change
Speech language shifted from formal to informal.
In the later period, the language of the speeches became more colloquial.
‘let’s’ & ‘right now’: Phrases like ‘let’s’ convey informality and suggest collaboration, while ‘right now’ adds urgency. Both are common in casual spoken English and contribute to a friendly, approachable tone.
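To compare phrases such as ‘let’s’ and ‘right now’ across presidents or periods, one option is to count them per document and normalize by document length, similar in spirit to the relative frequencies plotted in Voyant. A minimal sketch, assuming the texts have already been grouped; the per-10,000-words scale is an arbitrary choice for readability.

```python
import re


def tokens(text):
    """Lowercase word tokens; curly apostrophes are normalized to straight ones."""
    return re.findall(r"[a-z]+(?:'[a-z]+)*", text.lower().replace("\u2019", "'"))


def phrase_rate(text, phrase, per=10_000):
    """Occurrences of a (possibly multi-word) phrase per `per` words of text."""
    toks = tokens(text)
    target = tokens(phrase)
    n = len(target)
    if not toks or not n:
        return 0.0
    # Slide a window of the phrase's length over the token list
    hits = sum(1 for i in range(len(toks) - n + 1) if toks[i:i + n] == target)
    return hits / len(toks) * per
```

Normalizing by length matters here because the grouped documents differ greatly in size from one president to another.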
INTERESTING FINDINGS
Exploring the Corpus: Insights from Voyant
Photo by Pixabay on Pexels
Presidents are increasingly focused on people
‘People’ & ‘Government’
People
Government
The New Deal
law*
work*
need*
job*
duty*
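The asterisk in terms such as law*, work*, need*, job*, and duty* is a wildcard: Voyant matches every word form beginning with that stem (law, laws, lawful, and so on). A rough Python equivalent that counts stem matches in one document is sketched below, assuming a simple regex tokenizer. Prefix matching can also catch unrelated words (law* matches "lawn"), so inspecting the matched forms is worthwhile.

```python
import re
from collections import Counter


def stem_counts(text, patterns):
    """Count tokens matching Voyant-style prefix wildcards such as "law*".

    Returns (totals, forms): totals maps pattern -> number of matches,
    forms maps pattern -> Counter of the word forms that matched.
    """
    toks = re.findall(r"[a-z]+(?:'[a-z]+)*", text.lower())
    totals, forms = {}, {}
    for pat in patterns:
        stem = pat.lower().rstrip("*")
        if pat.endswith("*"):
            matched = [t for t in toks if t.startswith(stem)]  # prefix match
        else:
            matched = [t for t in toks if t == stem]  # exact match
        totals[pat] = len(matched)
        forms[pat] = Counter(matched)
    return totals, forms
```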
Photo by Cottonbro Studio on Pexels
“Fire” in the Cold War
Soviet*
Military*
Soviet*
Military*
Vietnam*
Conclusion
Linguistic style and topics in U.S. presidential speeches have changed over time, with the 1920s-1930s marking a turning point in linguistic style.
In the post-Franklin D. Roosevelt era, the linguistic style became more concise and less diverse. Speech language also shifted from formal to informal, becoming more colloquial.
Historical events are clearly reflected in the presidents' corpus: major events, whether the New Deal or the Cold War, coincide with higher frequencies of related words. These findings merely scratch the surface of the historical iceberg, and more may emerge from further distant reading.
Limitation
It is important to keep in mind that our research has some limitations. The individuals who transcribed these speeches, and their methods, can significantly affect the results. The media through which presidential speeches were delivered are another influential factor. The notable disparity in average sentence length between the two corpora may stem from the adoption of radio, beginning on June 14, 1922. From then on, transcripts of presidential speeches may have been derived from recordings, and differences in how transcribers punctuated them could produce a dramatic change in sentence length. Despite our efforts to gather comprehensive details, we could not trace all relevant information, so our findings may be biased in certain cases. Nevertheless, we hope that our work inspires further research in this domain.
The Team
Mingkai Xu
Data wrangling & Web Design
Arani Aslama
Data wrangling & Web Design
Wenjing Cai
Data wrangling & Voyant analysis
Baidan Chen
Data wrangling & Web Design
Wuhong Xu
Data wrangling & Web Design
Luotong Cheng
Data wrangling, Stylo analysis & Gephi analysis
Xiaoyu Zhou
Data wrangling & Voyant analysis