Spotify data analysis
Spotisis /spo-ti-sis/ noun The analysis of one’s Spotify streaming history using Python.
I was reading through a lot of data science guides and project ideas when I came across an article in which the author compared his song choices with his friend’s. I wanted to do something similar, so I set out to analyse my own streaming history and compare it with what the world listens to.
Through this, I aim to find out more about my music preferences and how they differ from the world’s general picks.
I never really put much thought into my music preferences before this project. They always depended on my mood, and when someone asked me what type of music I like, I had no answer, because it varied from one hour to the next.
I’ve split this project into two sections:
Part A is the analysis of my music streaming history.
- Timeline of my streaming history
- Day preference
- Favorite artists
- Favorite songs
- Spirit of the songs
- Diversity
Part B is the comparison of the top 50 songs streamed on my list with the top 50 songs streamed in 2019.
The data
Spotify allows every user to request a download of their streaming history, so Part A depends entirely on that. Spotify also has an amazing Developer Platform through which the public can use the available data for their own projects. Along with my personal data, I used the audio features option, which breaks down a song and gives it a score for a number of different attributes. The attributes are as follows:
Acousticness: A confidence measure from 0.0 to 1.0 of whether the track is acoustic. 1.0 represents high confidence that the track is acoustic.
Danceability: A description of how suitable a track is for dancing based on a combination of musical elements including tempo, rhythm stability, beat strength, and overall regularity. A value of 0.0 is least danceable and 1.0 is most danceable.
Energy: A measure from 0.0 to 1.0 that represents a perceptual measure of intensity and activity. Typically, energetic tracks feel fast, loud, and noisy.
Instrumentalness: Predicts whether a track contains no vocals. “Ooh” and “aah” sounds are treated as instrumental in this context. The closer the instrumentalness value is to 1.0, the greater the likelihood that the track contains no vocal content.
Liveness: Detects the presence of an audience in the recording.
Loudness: The overall loudness of a track in decibels (dB). Loudness is the quality of a sound that is the primary psychological correlate of physical strength (amplitude). Values typically range between -60 and 0 dB.
Speechiness: Detects the presence of spoken words in a track.
Valence: A measure from 0.0 to 1.0 describing the musical positiveness conveyed by a track.
Tempo: The overall estimated tempo of a track in beats per minute (BPM). In musical terminology, tempo is the speed or pace of a given piece and derives directly from the average beat duration.
Mode: Indicates the modality (major or minor) of a track, the type of scale from which its melodic content is derived. Major is represented by 1 and minor by 0.
Key: The estimated overall key of the track.
The dataset was a little messy, so I used Pandas to clean it up according to my need for each section. The entire code can be found on the GitHub link at the end of this article.
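To give a rough idea of what that cleanup can look like, here is a minimal sketch. It assumes the export uses the fields `endTime`, `artistName`, `trackName` and `msPlayed` (the article does not list them), and the three records below are made up for illustration. The helper name `clean_history` is hypothetical and is not taken from the GitHub code.

```python
import pandas as pd

# Made-up records, assumed to mirror the shape of the streaming history export.
records = [
    {"endTime": "2020-02-24 08:15", "artistName": "Lauv",
     "trackName": "Song A", "msPlayed": 210000},
    {"endTime": "2020-02-24 08:16", "artistName": "Lauv",
     "trackName": "Song B", "msPlayed": 4000},
    {"endTime": "2020-02-25 19:30", "artistName": "Maroon 5",
     "trackName": "Memories", "msPlayed": 189000},
]


def clean_history(raw):
    """Return a tidy DataFrame with parsed timestamps and minutes played."""
    df = pd.DataFrame(raw)
    df["endTime"] = pd.to_datetime(df["endTime"])   # strings -> timestamps
    df["minutes"] = df["msPlayed"] / 60_000          # milliseconds -> minutes
    return df.drop_duplicates().sort_values("endTime").reset_index(drop=True)


history = clean_history(records)
print(history[["endTime", "artistName", "minutes"]])
```

With a real export, the list of records would come from reading the downloaded JSON files instead of being typed in by hand.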
For Part B, I used this dataset from Kaggle.
Before we begin, I just want to say something… Don’t come at me for my music choice!
Part A
1. Timeline of my streaming history
I know that I spend a lot of time listening to music, but I didn’t know I spent that much time! The data dates back to late June of 2019 and was highly varied.

On February 24th 2020, I spent a staggering 535 minutes (almost 9 hours) on Spotify, the most in the past year! There’s no definite answer as to why the gap between the highest and lowest daily totals (the lowest was only a few seconds) is so large, but I did sign up for Spotify Premium around that time, so maybe that was the reason? Push the promos harder you guys ;)
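A daily timeline like this can be built by summing the minutes played per calendar date. The sketch below assumes a table with an `endTime` timestamp and a `minutes` column (both names are assumptions). The individual plays are made up and only chosen so that one day adds up to the 535 minutes mentioned above.

```python
import pandas as pd

# Made-up plays; only the 535-minute total for 2020-02-24 comes from the article.
plays = pd.DataFrame({
    "endTime": pd.to_datetime(
        ["2020-02-23 10:00", "2020-02-24 09:00", "2020-02-24 21:00"]
    ),
    "minutes": [12.0, 200.0, 335.0],
})

# Sum the listening time for each calendar date.
daily_minutes = plays.groupby(plays["endTime"].dt.date)["minutes"].sum()

# The date with the largest total.
busiest_day = daily_minutes.idxmax()
print(busiest_day, daily_minutes.max())
```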
2. Day preference
Does the day of the week affect how long I spend listening to music?

I usually listen to music while walking to and from college, so I would have predicted that most of my listening happened on weekdays. But Sunday is my chill day, so it makes sense that it turned out to be the day I spent the most time listening to music.
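One way to check the day-of-week question is to group the listening time by weekday name. This is only a sketch with made-up plays, and the column names `endTime` and `minutes` are assumptions rather than the names used in the original code.

```python
import pandas as pd

# Made-up plays: two Sundays and one Monday.
plays = pd.DataFrame({
    "endTime": pd.to_datetime(
        ["2020-02-23 11:00", "2020-02-24 08:30", "2020-03-01 15:00"]
    ),
    "minutes": [40.0, 25.0, 50.0],
})

WEEK = ["Monday", "Tuesday", "Wednesday", "Thursday",
        "Friday", "Saturday", "Sunday"]

# Total minutes per weekday, in calendar order, with 0 for unseen days.
weekday_minutes = (
    plays.groupby(plays["endTime"].dt.day_name())["minutes"]
    .sum()
    .reindex(WEEK, fill_value=0.0)
)

favourite_day = weekday_minutes.idxmax()
print(weekday_minutes)
```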
3. Favorite artists
Do I have a favorite artist?
According to the data, I actually do. There were two factors I considered: the number of times I played an artist’s songs and the total amount of time I spent listening to them.
When looking through the data, I found that some of the songs were played for only a few seconds, which reduced the accuracy of the results.
The graphs below show the top 15 artists under both categories.


Lauv, Shawn Mendes, One Direction and Justin Bieber held the top 4 positions in both graphs, whereas the other artists changed places.
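The two rankings can be sketched as a single group-by. The article does not say how the very short plays were handled, so the 30-second cutoff below is purely an assumption, as are the column names and the made-up plays.

```python
import pandas as pd

# Made-up plays; artist names are taken from the article, durations are not.
plays = pd.DataFrame({
    "artistName": ["Lauv", "Lauv", "Shawn Mendes", "Lauv", "Shawn Mendes"],
    "msPlayed": [200000, 3000, 180000, 190000, 2000],
})

MIN_MS = 30_000  # assumed cutoff: ignore plays shorter than 30 seconds

kept = plays[plays["msPlayed"] >= MIN_MS]

# Factor 1: number of plays. Factor 2: total listening time.
by_artist = kept.groupby("artistName")["msPlayed"].agg(
    play_count="count", total_ms="sum"
)
by_artist["total_minutes"] = by_artist["total_ms"] / 60_000

top_by_count = by_artist.sort_values("play_count", ascending=False).head(15)
top_by_time = by_artist.sort_values("total_minutes", ascending=False).head(15)
print(top_by_time)
```

Keeping the two rankings separate makes it easy to see which artists hold their position under both measures and which only rank highly under one.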
4. Which songs were played most?
Was it by the same 15 artists?
Yes, it was — Lauv took 5 of the 15 spots!

I realised that some of the top 15 artists (based on the amount of time spent listening to their songs) were on the list because of one or two songs which were repeated multiple times.
For example, Memories by Maroon 5 was the most played song (played for a total of 184 minutes). Compared with the total time spent listening to the group (430 minutes), the difference was 246 minutes. In percentage terms, more than 40% of the time I spent listening to Maroon 5 went to Memories alone.
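The arithmetic behind those figures is short enough to check directly. The two input numbers come from the paragraph above; the variable names are just for illustration.

```python
memories_minutes = 184   # time spent on Memories
maroon5_minutes = 430    # total time spent on Maroon 5

# Time spent on every other Maroon 5 song.
other_songs_minutes = maroon5_minutes - memories_minutes

# Share of the Maroon 5 listening time that went to Memories.
memories_share = memories_minutes / maroon5_minutes * 100

print(other_songs_minutes)          # 246
print(round(memories_share, 1))     # 42.8
```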
It’s a good song. Admit it.
5. Spirit of the song
Do I listen to positive songs?
Using the valence attribute from Spotify’s audio features, I tried to find out the general spirit of the top 50 songs I listen to. The valence scale runs from 0 to 1, with 1 being the most positive.
For the sake of classification:
- low spirit: 0 ≤ valence < 0.5
- neutral: 0.5 ≤ valence < 0.6
- high spirit: 0.6 ≤ valence ≤ 1
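The classification above translates into a small function. The thresholds are the ones listed; the function name `spirit` is hypothetical and the original code may well do this differently.

```python
def spirit(valence: float) -> str:
    """Map a valence score in [0, 1] to one of the three spirit labels."""
    if not 0.0 <= valence <= 1.0:
        raise ValueError(f"valence must be between 0 and 1, got {valence}")
    if valence < 0.5:
        return "low spirit"
    if valence < 0.6:
        return "neutral"
    return "high spirit"


# A valence of 0.18 falls in the lowest band.
print(spirit(0.18))  # low spirit
```

Note how narrow the neutral band is (0.5 to 0.6) compared with the low band (0 to 0.5), which by itself pushes many songs into the low spirit group.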
(I named it as ‘spirit’ because ‘positive’ and ‘negative’ didn’t feel right)
I was pretty unsure about this one and was utterly surprised by the results.

So I listen to more low spirit songs?? That doesn’t make sense!
When I cross-referenced the song names with their valence scores, I realised that this may not have been the most accurate representation. Ed Sheeran’s Photograph had a valence score of 0.18, so it was categorised as ‘low spirit’. Although it’s not a super high spirited song, it’s not that low either!
6. Diversity of songs
How do the audio features of the songs compare to one another?
The spirit of the song made me curious about how the songs varied from one another in terms of their audio features, so I compared my top 3 most played songs. I believe that my song choices are highly diverse.
Those who are familiar with these songs know just how much they vary from one another — they give such different vibes, but I needed the data to prove it.

There is A LOT of difference, most noticeably in the loudness and acousticness attributes.
The next part builds on this diversity.
Part B
Is my music too diverse? How does it fare when compared to the global top 50?

Apart from the mode, everything is different! I prefer less groovy, instrumental-based songs with lower energy levels, while the global hits suggest people lean towards fast-paced, energetic songs that they can dance to.

The difference between my music’s average tempo (beats per minute) and the global average is 4 BPM. According to research, songs at 120 BPM are considered fast-paced. My preference seems to be a little slower, though not by much.
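A comparison like the one in Part B boils down to averaging each audio feature over both sets of songs and looking at the gap. The sketch below uses tiny made-up tables and assumed column names (`danceability`, `energy`, `tempo`), so the Kaggle dataset’s actual column names may differ. The tempo values are chosen only so that the gap matches the 4 BPM mentioned above.

```python
import pandas as pd

FEATURES = ["danceability", "energy", "tempo"]

# Made-up feature values standing in for the two top-50 lists.
mine = pd.DataFrame({
    "danceability": [0.55, 0.65],
    "energy": [0.40, 0.50],
    "tempo": [110.0, 118.0],
})
global_top = pd.DataFrame({
    "danceability": [0.70, 0.80],
    "energy": [0.60, 0.70],
    "tempo": [116.0, 120.0],
})

# One row per feature, with the mean for each list and the gap between them.
comparison = pd.DataFrame({
    "mine": mine[FEATURES].mean(),
    "global": global_top[FEATURES].mean(),
})
comparison["difference"] = comparison["global"] - comparison["mine"]
print(comparison)
```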
Conclusion
This project was a blast to do. I thoroughly enjoyed learning more about my music preferences and comparing them to the global hits. Now that I am backed by the data, I can say that my music is highly diverse and that I do have a favourite artist: Lauv (considering the amount of time I’ve spent listening to his songs, it wouldn’t be justified to say otherwise!).
Following this article, I would like to continue by applying some machine learning knowledge to create a recommender system based on my music preferences.
Feel free to comment and view the entire code on my GitHub!
Big thanks to Vlad Gheorghe for his brilliant explanation (huge savior!)
Translated from: https://medium.com/swlh/analysis-of-my-spotify-streaming-history-57a6088c3d3