Youth Report
Youth-led media is any effort created, planned, implemented, and reflected upon by young people in the form of media, including websites, newspapers, television shows, and publications. Such platforms connect writers, artists, and photographers in the age range of 13–24 all around the globe and promote and defend a free youth press. Members of these platforms not only have the freedom to express their own opinions on various issues and topics but also represent various communities and let their voices be heard.
Hence, such platforms are a good source of data for understanding and analyzing youth aspirations across the globe. In the remaining sections, we explain our data collection methodology and present the results and insights derived from analyzing various topics.
What does this section talk about?
This section gives insights into how the data was distributed across newspapers and articles. The insights and visualizations show how young people are faring and how their sentiments changed over time (the period 2015–2020).
Why did we choose this topic?
This topic aims to analyze data from a different perspective, i.e., outside social media. That is why we chose to scrape and analyze data that exists outside social media and to present our insights accordingly.
Objectives
- To scrape and process news articles from different sources and prepare them for sentiment analysis and topic modeling, in order to draw useful insights about the sentiment of the youth.
- To conduct sentiment analysis in order to understand youth sentiment better.
- To collect the insights from all of these points and visualize the results in a cogent manner for the audience.
Methodology
Data Collection
To collect articles, we scraped data from various media platforms (ref. Table 1) using a scraper we built with the BeautifulSoup and requests libraries in Python. We scraped a large number of articles published between 1994 and 2020 and merged them into the final dataset used for analysis. We also focused on extracting articles in certain categories, namely:
- Education
- Environment & Climate
- Human Rights
- COVID-19
- Politics
- Health and Leisure
Tools Used:
For scraping data:
- Beautiful Soup
- Requests
- Selenium
For visualizing data:
- Matplotlib
- Seaborn
- Plotly (Python)
- Matplotlib animations
- Tableau
- Python word clouds
For sentiment analysis:
- TextBlob
- Empath analysis
- Region-based analysis
- Knowledge graph
- Network analysis
Data Preprocessing
With all the articles scraped, we next focused on preprocessing them. One of our major challenges during preprocessing was identifying and removing promotional content. To start with, we removed all URLs from the articles. Next, we identified the templates that each platform used for advertisements or for promoting other articles, and used regular expressions to find and remove them. We then passed the articles through a basic preprocessing pipeline to normalize case, stem, lemmatize, and remove special characters and common stopwords. We also identified certain redundant words, such as "journalism", that did not add to the analysis and removed them from our dataset.
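The cleaning steps described above can be sketched roughly as follows. This is a minimal illustration using only Python's standard library, not the project's actual pipeline: the promotional patterns in `PROMO_RES`, the short `STOPWORDS` set, and the `preprocess` function name are all assumptions made for the example, and stemming and lemmatization are left out.

```python
import re

# Matches http(s) links and bare "www." links.
URL_RE = re.compile(r"https?://\S+|www\.\S+")

# Hypothetical promotional templates; real ones would be written
# per platform after inspecting its articles.
PROMO_RES = [
    re.compile(r"read more:.*", re.IGNORECASE),
    re.compile(r"subscribe to our newsletter.*", re.IGNORECASE),
]

# A tiny illustrative stopword list (a real run would use a full one).
STOPWORDS = {"the", "a", "an", "and", "or", "of", "to", "in", "is",
             "are", "for", "on", "with", "that", "this", "it"}

# Domain words that appear everywhere and add nothing to the analysis.
DOMAIN_WORDS = {"journalism"}


def preprocess(text):
    """Return a list of cleaned, lowercased tokens for one article."""
    text = URL_RE.sub(" ", text)           # 1. drop URLs
    for pattern in PROMO_RES:              # 2. drop promotional lines
        text = pattern.sub(" ", text)      #    ('.' stops at a newline)
    text = text.lower()                    # 3. normalize case
    text = re.sub(r"[^a-z\s]", " ", text)  # 4. drop special characters
    return [tok for tok in text.split()    # 5. drop stopwords and
            if tok not in STOPWORDS        #    redundant domain words
            and tok not in DOMAIN_WORDS]
```

For example, `preprocess("Youth journalism matters. Visit https://example.com")` keeps only `youth`, `matters`, and `visit`.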
Additionally, we ran a keyword analysis as a preprocessing step to ensure that everything was ready before starting the analysis. Next, we used Stanford's NER and Python's geopy library to identify the locations associated with the articles. Then, we used LDA and Empath-based analysis for topic modeling and identified the following nine topics:
- Environment (Climate Change)
- Leadership & Politics (Democracy, Leadership)
- Health
- COVID-19
- Education
- Technology
- Human Rights (LGBT, Black Lives Matter, Bullying)
- Terrorism and Violence
- Career and Employment
1. ENVIRONMENT (Author: Mr. Mateus Broilo)
There is no question that the environment is a key topic that concerns the whole of society, from youngsters to adults to elders. However, the youth of today are the future of tomorrow, and for this reason they are the part of society most likely to suffer in the years to come. The environment cannot be seen as a cultural movement, simply because it is not one. It must instead be seen, and dealt with, as a political movement and as an economic trend that most of the time serves the will of powerful corporations.
The word cloud shows some of the most common and meaningful words from the Environment topic analysis. Notice that words such as climate, change, people, and plastic, shown in the figure below, may be correlated with the basic concerns of young people. Not surprisingly, they are the most common words across the more than 380 articles analyzed. Clearly, "climate" and "change" are the two halves of a bigram. The climate is changing, and that is a fact. "People" are part of the problem but can also be the solution, especially the youth. After all, youth aspirations are a heat map of where the world should actually be heading. And just out of curiosity, have you ever found a "plastic" bottle on the beach? See the figures below for more clarity.
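The word and bigram counts that sit behind a word cloud like this one can be approximated with a few lines of standard-library Python. The sketch below assumes the articles have already been tokenized; the function name `top_terms` and the sample tokens are illustrative only, not taken from the project.

```python
from collections import Counter


def top_terms(token_lists, n=5):
    """Count single words and adjacent word pairs across all articles.

    token_lists: one list of tokens per article.
    Returns (top unigrams, top bigrams) as lists of (term, count).
    """
    unigrams = Counter()
    bigrams = Counter()
    for tokens in token_lists:
        unigrams.update(tokens)
        # zip pairs each token with its successor: (t0, t1), (t1, t2), ...
        bigrams.update(zip(tokens, tokens[1:]))
    return unigrams.most_common(n), bigrams.most_common(n)


# Illustrative input, not real data from the study.
articles = [
    ["climate", "change", "people"],
    ["climate", "change", "plastic"],
    ["plastic", "people", "climate"],
]
words, pairs = top_terms(articles)
```

Counting bigrams separately is what lets a pair such as "climate change" surface as one unit instead of two unrelated frequent words.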
One last analysis is to look at lexicons, in other words, to perform a text analysis across lexical categories. The main objective here is to connect the text with a broad range of sentiments beyond positive, negative, and neutral, as shown in Figure 3.6.15. Figure 3.6.16, in turn, shows the most common categories into which the articles fall, and Figure 3.6.17 shows the Empath values associated with the categories that matter most to the environmental movement.
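To make the idea of scoring text across lexical categories concrete, here is a minimal stand-in written in plain Python. It does not call the Empath library and is not the authors' code: the function `category_scores` and the two-category `LEXICON` are hypothetical, and the score is simply the share of tokens that fall into each category.

```python
def category_scores(tokens, lexicon):
    """Return, per category, the fraction of tokens found in its word set.

    tokens:  list of lowercased words from one article.
    lexicon: dict mapping a category name to a set of words.
    """
    total = len(tokens)
    if total == 0:
        return {category: 0.0 for category in lexicon}
    return {
        category: sum(tok in words for tok in tokens) / total
        for category, words in lexicon.items()
    }


# Hypothetical miniature lexicon, for illustration only.
LEXICON = {
    "nature": {"ocean", "forest", "river"},
    "fear": {"threat", "danger"},
}

scores = category_scores(["ocean", "plastic", "threat", "ocean"], LEXICON)
```

Normalizing by article length keeps long and short articles comparable, which matters when category values are later averaged across many articles.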
2. CAREER AND EDUCATION (Authors: Mr. Mario Vasquez Arias, Ms. Adelore Similoluwa Gloria)
Education is one of the chosen categories and is fundamental to this study: if we talk about young people, it cannot be left out. Most young people are at some level of education, be it primary school, high school, or university. Many of them therefore spend so much time in educational settings that these become a second home and directly affect their lives. Because these places are treated as home, they reflect the personalities, concerns, and other feelings that young people have at that stage, so it is important to analyze this aspect.
We can see in the word cloud that the two words that stand out most are "school" and "students", which point precisely to what education represents, so these results are to be expected. The word "time" also stands out; we can infer that it refers to all the time young people spend in school, which is a large part of the day, five days a week, for many years (these words are also visible in the graph). The term "high school" shows that the scraped articles on which the analyses were based focused on a younger population, which is precisely the target population. Another key word is "work", which implies that young people not only study but also work, probably because of economic conditions. Another visible word is "immigrant", an issue that has been very prominent in recent years and from which education is not exempt. The term "home problem" appears in a smaller size but is also important to note, as it reflects that students sometimes bring problems from home to school, affecting their grades and mental health.
Empath Analysis:
The four Empath categories with the highest values are school (which is also the most frequent), reading, social networks, and holidays. These are what young people emphasized most during this period: school, which as already noted is like a second home; the habit of reading, which has been growing in both physical and digital form; social networks, which boomed in society and which practically all young people know and use; and vacations, which young people eagerly await in order to enjoy free time, hobbies, and rest. Another category worth highlighting is technology, which appears at a lower level but is still relevant for young people, given the rapid advance of technology and the proliferation of services and devices available to anyone, especially this young population.
We can observe the average sentiment value for each year: the highest peak is in 2016 and the lowest point is in 2020. The latter could be due to the negative feelings caused by the coronavirus pandemic, such as anxiety, confinement during quarantine, loneliness, and depression, among others.
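A yearly average of this kind can be computed as sketched below. The example assumes each article already has a publication year and a polarity score in the range -1 to 1 (the kind of value a tool such as TextBlob reports); the function name `yearly_mean_sentiment` and the sample records are illustrative and are not the study's data.

```python
from collections import defaultdict
from statistics import mean


def yearly_mean_sentiment(records):
    """Average polarity per year.

    records: iterable of (year, polarity) pairs, one per article.
    Returns a dict {year: mean polarity}, ordered by year.
    """
    by_year = defaultdict(list)
    for year, polarity in records:
        by_year[year].append(polarity)
    return {year: mean(values) for year, values in sorted(by_year.items())}


# Made-up scores, used only to show the shape of the computation.
records = [(2016, 0.5), (2016, 0.25), (2018, 0.125),
           (2020, -0.25), (2020, 0.0)]
means = yearly_mean_sentiment(records)
peak_year = max(means, key=means.get)    # year with the highest mean
trough_year = min(means, key=means.get)  # year with the lowest mean
```

Because years with few articles produce noisy averages, it is worth checking the article count per year before reading much into a single peak or dip.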
3. TERRORISM AND VIOLENCE (Author: Ms. Shanya Sharma)
Sentiment trends over time:
The dip in sentiment in 2019 may be associated with gun violence in Oakland that year. The same can be inferred from the keywords extracted from the 2019 terrorism articles.
Keywords such as "domestic violence" appear for 2020 and may be directly related to COVID-19 and the lockdown.
"Police brutality" is also a frequent keyword in the 2020 data, suggesting that police brutality in enforcing lockdowns (e.g., in India) or surrounding the George Floyd case kept young people in fear.
4. HUMAN RIGHTS (Author: Mr. Opeyemi Fabiyi)
Keywords for certain locations
Let's look at the emotion trends:
- A gradually increasing trend in negative emotions with respect to human rights is concerning.
- Similarly, hate can be seen to be increasing gradually.
- A higher value for positive emotions may indicate that the articles are also hopeful about certain aspects.
- An analysis of the reason for the peak in 2018 showed that most articles written during that time were about how the writers wanted to fight the wrongdoing around them, and they expressed hope.
- Some common causes of concern were:
Racism
Violence
Poverty
Immigration
Homophobia
- Some concerning insights that emerged were:
Youth in India are worried about menstrual hygiene.
Sex trafficking is a cause of concern in developed nations like the US.
SEX TRAFFICKING:
- Almost all articles were from the United States.
- Most articles were written in 2019.
5. POLITICS (Author: Ms. Kriti Rai Saini)
Lexicons associated with positive sentiment.
Lexicons associated with negative sentiment.
Let's analyze how sentiments around politics changed from year to year across different topics.


- A steep increase in disappointment due to the #MeToo movement, the announcement of the US presidential election results, and the Facebook scandal.
- A steep decrease in hate in 2018, possibly due to the royal wedding, and an increase in 2020 due to the #BLM movement.
- A steep increase in the "poor" lexicon in 2020 due to the COVID pandemic.
- An increase in the "death" lexicon in 2020, which can be attributed to the COVID pandemic.
- An increase in anger due to constant discontent with the political situation over the years.
- An increase in violence in 2020 due to the imposed lockdowns and the #BLM movement, and a peak in 2017 due to the #MeToo movement and mass violence in the USA (Texas, Las Vegas).

- An increase in contentment and optimism in 2020, possibly because the pandemic made people realize the importance of little things. A peak in contentment in 2017 due to the royal wedding.
- A peak in the "love" lexicon in 2017 due to the royal wedding.
- A peak in the "strength" lexicon in 2017 due to the Women's March, in which 1 million women stood up for women's rights, and in 2020 due to the BLM movement.
6. COVID-19 (Author: Ms. Monalisa Panda)
In the figure above, we can see that only a few articles cover COVID-19, all of them from 2020.
Mean sentiment across months, before lockdown vs. after lockdown.
Word clouds of all the articles on the topic of COVID-19:
Based on positive sentiments:
These are the words associated with positive sentiment on the topic of COVID-19. The main words detected are:
People, Time: most people got time to spend with their families and relatives.
Based on negative sentiments:
In this word cloud, we can see that misinformation, racism, and discrimination are some of the key negative terms for the COVID-19 topic.
Racism
The peak in negative emotions may be associated with the 2016 US presidential election.
Fear with respect to racism was gradually decreasing but saw a slight rise in 2020.
Region-based positive and negative sentiments on the topic of COVID-19:
Positive sentiments:

Negative sentiments by region:

That concludes the full analysis of all the topics listed at the top. Once again, I would like to thank everyone at Omdena for the wonderful opportunity to work on this project.
To see upcoming projects, go to Omdena.
Thank you!
Monalisa Panda
Translated from: https://medium.com/omdena/understanding-youths-sentiments-c25ccbdb5702