What I learned building the StateOfVeganism
By now, we all know that news and media shape our views on the topics we discuss. Of course, this is different from person to person. Some might be influenced a little more than others, but there always is some opinion communicated.
Considering this, I thought it would be really interesting to see the continuous development of mood directed towards a specific topic or person in the media.
For me, Veganism is an interesting topic, especially since it is frequently mentioned in the media. Since the media’s opinion changes the opinion of people, it would be interesting to see what “sentiment” they communicate.
This is what this whole project is about. It collects news that talks about or mentions Veganism, finds out the context in which it was mentioned, and analyses whether it propagates negativity or positivity.
Of course, a huge percentage of the analysed articles should be classified as “Neutral” if the writers do a good job in only communicating information, so we should keep that in mind, too.
I realized that this was an incredible opportunity to pick up a new toolset, especially when I thought about the sheer number of articles published daily. So, I thought about building a scalable architecture — one that is cheap/free in the beginning when there is no traffic and only a few articles, but scales easily and infinitely once the amount of mentions or traffic increases. I heard the cloud calling.
Designing the Architecture
Planning is everything, especially when we want to make sure that the architecture scales right from the beginning.
Starting on paper is a good thing, because it enables you to be extremely rough and quick in iterating.
Your first draft will never be your final one, and if it is, you’ve probably forgotten to question your decisions.
For me, the process of coming up with a suitable and, even more important, reasonable architecture was the key thing I wanted to improve with this project. The different components seemed pretty “easy” to implement and build, but coming up with the right system, the right communication, and a nice, clean data pipeline was the really interesting part.
In the beginning, I had some bottlenecks in my design which, at one point, would’ve brought my whole system to its knees. In that situation, I thought about just adding more “scalable” services like queues to queue the load and take care of it.
When I finally had a design which, I guessed, could handle a ton of load and was dynamically scalable, it was a mess: too many services, a lot of overhead, and an overall “dirty” structure.
When I looked at the architecture a few days later, I realised that there was so much I could optimise with a few changes. I started to remove all the queues and thought about replacing actual virtual machines with FAAS components. After that session, I had a much cleaner and still scalable design.
Think of the structure and technologies, not implementations
That was one of the mistakes I made quite early in the project. I started out by looking at what services IBM’s BlueMix could offer and went on from there. Which ones could I mix together and use in a design that seemed to work with triggers, queues, and so on?
In the end, I could remove a lot of the overhead in terms of services by simply stepping away from it and thinking of the overall structure and technologies I needed, rather than the different implementations.
Broken down into a few distinct steps, the project should:
- Every hour (in the beginning, since there would only be a few articles at the moment -> this could be done every minute or even every second), get the news from some NewsAPI and store it.
- Process each article, analyse the sentiment of it, and store it in a database to query.
- Upon visiting the website, get the selected range data and display bars/articles.
So, what I finally ended up with was a CloudWatch Trigger which triggers a Lambda Function every hour. This Function gets the news data for the last hour from the NewsAPI. It then saves each article as a separate JSON file into an S3 bucket.
This bucket, upon ObjectPut, triggers another Lambda Function. This loads the JSON from S3, creates a “context” for the appearance of the part-word “vegan,” and sends the created context to the AWS Comprehend sentiment analysis. Once the function gets the sentiment information for the current article, it writes it to a DynamoDB table.
This Table is the root for the data displayed in the frontend. It gives the user a few filters with which they can explore the data a little bit more.
If you’re interested in a deeper explanation, jump down to the description of the separate components.
Who’s “The One” Cloud Provider?
Before I knew that I was going with AWS, I tried out two other cloud providers. It’s a very basic and extremely subjective view on which provider to choose, but maybe this will help some other “Cloud-Beginners” choose.
I started out with IBM’s Bluemix Cloud, moved to Google Cloud, and finally ended up using AWS. Here are some of the “reasons” for my choice.
A lot of the points listed here really only tell how good the overall documentation and community is, how many of the issues I encountered had already been raised, and which ones had answers on StackOverflow.
Documentation and Communities are Key
Especially for beginners and people who’ve never worked with cloud technologies, this is definitely the case. The documentation and, even more importantly, the documented and explained examples were simply the best for AWS.
Of course, you don’t have to settle for a single provider. In my case, I could’ve easily used Google’s NLU tools because, in my opinion, they brought the better results. I just wanted to keep my whole system on one platform, and I can still change this later on if I want to.
The starter packs of all providers are actually really nice. You’ll get $300 on Google Cloud, which will enable you to do a lot of stuff. However, it’s also kind of dangerous, since you’ll be charged if you use up the credit and forget to turn off and destroy all the services that are running up costs.
BlueMix only has very limited access to services on their free tier, which is a little bit unfortunate if you want to test out the full suite.
Amazon, for me, was the nicest one, since they also have a free tier which will allow you to use nearly every feature (some only with the smallest instance like EC2.micro).
Like I already mentioned, this is a very flat and subjective opinion on which one to go for… For me, AWS was the easiest and fastest to pick up without investing too much time upfront.
The Components
The whole project can basically be split into three main components that need work.
The Article Collection, which consists of the hourly cron job, the lambda function which calls the NewsAPI, and the S3 bucket that stores all the articles.
The Data Enrichment part which loads the article from S3, creates the context and analyses it using Comprehend, and the DynamoDB that stores the enriched data for later use in the frontend.
And the Frontend which gets displayed when the users request the webpage. This component consists of a graphical user interface, a scalable server service which serves the webpage, and, again, the DynamoDB.
Article Collection
The first and probably easiest part of the whole project was collecting all the articles and news that contain the keyword “vegan”. Luckily, there are a ton of APIs that provide such a service.
One of them is NewsAPI.org.
With their API, it’s extremely easy and understandable. They have different endpoints. One of them is called “everything” which, as the name suggests, just returns all the articles that contain a given keyword.
Using Node.js here, it looks something like this:
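A minimal sketch of such a request, assuming the official `newsapi` npm client and an API key provided via an environment variable (both of these are assumptions, not necessarily the original setup):

```js
const NewsAPI = require('newsapi');

// Assumption: the API key lives in an environment variable
const newsapi = new NewsAPI(process.env.NEWS_API_KEY);

// The "everything" endpoint returns all articles containing the keyword
newsapi.v2
  .everything({
    q: '+vegan',    // the word must appear in the article
    language: 'en',
    pageSize: 100,  // how many articles per request
  })
  .then((response) => {
    console.log(`Got ${response.articles.length} articles`);
  });
```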
The + sign in front of the query String “vegan” simply means that the word must appear.
The pageSize defines how many articles per request will be returned. You definitely want to keep an eye on that. If, for example, your system has extremely limited memory, it makes sense to do more requests (use the provided cursor) in order to not crash the instance with responses that are too big.
The response from NewsAPI.org looks like this. If you’re interested in seeing more examples, head over to their website where they have a lot of examples displayed.
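It has roughly this shape (the values below are placeholders, not a real record):

```json
{
  "status": "ok",
  "totalResults": 42,
  "articles": [
    {
      "source": { "id": null, "name": "Example News" },
      "author": "Jane Doe",
      "title": "Some article title",
      "description": "A short description of the article...",
      "url": "https://example.com/some-article",
      "urlToImage": "https://example.com/some-image.jpg",
      "publishedAt": "2018-10-01T12:00:00Z",
      "content": "The first part of the article content..."
    }
  ]
}
```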
As you can see, those article records only give a very basic view of the article itself. Terms like vegan, which appear in some context inside the article without being the main topic of it, are not represented in the title or description. Therefore, we need the Data Enrichment component, which we’ll cover a little bit later. However, this is exactly the type of JSON data that is stored in the S3 bucket, ready for further processing.
Trying an API locally and actually using it in the cloud are really similar. Of course, there are some catches where you don’t want to paste your API key into the actual code but rather use environment variables, but that’s about it.
AWS has a very neat GUI for their Lambda setup. It really helps you understand the structure of your component and visualise which services and elements are connected to it.
In the case of the first component, we have the CloudWatch Hourly Trigger on the “Input”-side and the Logging with CloudWatch and the S3 Bucket as a storage system on the “Output”-side.
So, after putting everything together, importing the Node.js SDK for AWS, and testing out the whole script locally, I finally deployed it as a Lambda Function.
The final script is actually pretty short and understandable:
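A sketch of such a function, with placeholder bucket and environment variable names, could look like this:

```js
const AWS = require('aws-sdk');
const NewsAPI = require('newsapi');

const s3 = new AWS.S3();
const newsapi = new NewsAPI(process.env.NEWS_API_KEY);

exports.handler = async () => {
  // Fetch everything that mentioned "vegan" during the last hour
  const from = new Date(Date.now() - 60 * 60 * 1000).toISOString();
  const { articles } = await newsapi.v2.everything({
    q: '+vegan',
    from,
    language: 'en',
    pageSize: 100,
  });

  // Store every article as a separate JSON file in the S3 bucket.
  // Special characters in the key are replaced with dashes (see below).
  await Promise.all(
    articles.map((article) => {
      const key = `${article.publishedAt}-${article.title}`
        .replace(/[^a-zA-Z0-9]/g, '-');

      return s3
        .putObject({
          Bucket: process.env.ARTICLE_BUCKET, // placeholder name
          Key: `${key}.json`,
          Body: JSON.stringify(article),
          ContentType: 'application/json',
        })
        .promise();
    })
  );

  return `Stored ${articles.length} articles`;
};
```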
The GUI has some nice testing features with which you can simply trigger your Function by hand.
But nothing worked…
After a few seconds of googling, I found the term “Policies”. I’d heard of them before, but never read up on them or tried to really understand them.
Basically, they describe what service/user/group is allowed to do what. This was the missing piece: I had to allow my Lambda function to write something to S3. (I won’t go into detail about it here, but if you want to skip to policies, feel free to head to the end of the article.)
A policy in AWS is a simple JSON-Style configuration which, in the case of my article collection function, looked like this:
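It was roughly of this form (the bucket name here is a placeholder):

```json
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Action": [
        "logs:CreateLogGroup",
        "logs:CreateLogStream",
        "logs:PutLogEvents"
      ],
      "Resource": "arn:aws:logs:*:*:*"
    },
    {
      "Effect": "Allow",
      "Action": "s3:*",
      "Resource": [
        "arn:aws:s3:::state-of-veganism-articles",
        "arn:aws:s3:::state-of-veganism-articles/*"
      ]
    }
  ]
}
```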
This is the config that describes the previously mentioned “Output”-Side of the function. In the statements, we can see that it gets access to different methods of the logging tools and S3.
The weird part about the assigned resource for the S3 bucket is that, if not stated otherwise in the options of your S3 bucket, you have to both provide the root and “everything below” as two separate resources.
The example given above allows the Lambda Function to do anything with the S3 bucket, but this is not how you should set up your system! Your components should only be allowed to do what they are designated to.
Once this was entered, I could finally see the records getting put into my S3 bucket.
Special Characters are evil…
When I tried to get the data back from the S3 bucket I encountered some problems. It just wouldn’t give me the JSON file for the key that was created. I had a hard time finding out what was wrong until at one point, I realised that, by default, AWS enables logging for your services.
This was gold!
When I looked into the logs, the problem jumped out at me right away: it seemed like the key value that gets sent by the S3 trigger is URL-encoded. However, this problem was absolutely invisible when just looking at the S3 key names, where everything was displayed correctly.
The solution to this problem was pretty easy. I just replaced every special character with a dash which won’t be replaced by some encoded value.
So, always make sure to not risk putting some special characters in keys. It might save you a ton of debugging and effort.
Data Enrichment
Since we now have all the articles as single records in our S3 bucket, we can think about enrichment. We have to combine some steps in order to fulfill our pipeline which, just to think back, was the following:
- Get record from S3 bucket.
- Build a context from the actual article in combination with the title and description.
- Analyse the created context and enrich the record with the result.
- Write the enriched article-record to our DynamoDB table.
One of the really awesome things about Promises in JavaScript is that you can model pipelines exactly the way you would describe them in text. If we compare the code with the explanation of what steps will be taken, we can see the similarity.
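A rough sketch of such a handler, with placeholder helper names standing in for the actual implementations, makes the parallel obvious:

```js
exports.handler = async (event) => {
  // The helpers below are illustrative placeholders,
  // not the project's actual functions.
  const key = event.Records[0].s3.object.key;

  return getArticleFromS3(key)   // 1. Get the record from the S3 bucket
    .then(buildVeganContext)     // 2. Build the context around "vegan"
    .then(analyseSentiment)      // 3. Enrich it via Comprehend
    .then(writeToDynamoDB);      // 4. Write the result to the DynamoDB table
};
```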
If you take a closer look at the first line of the code above, you can see the export handler. This line is always predefined in the Lambda Functions in order to know which method to call. This means that your own code belongs in the curly braces of the async block.
For the Data Enrichment part, we need some more services. We want to be able to send data to and get data from Comprehend’s sentiment analysis, write our final record to DynamoDB, and also have logging.
Have you noticed the S3 Service on the “Output”-side? This is why I always put the Output in quotes, even though we only want to read data here. It’s displayed on the right hand side. I basically just list all the services our function interacts with.
The policy looks comparable to that of the article collection component. It just has some more resources and rules which define the relation between Lambda and the other services.
Even though Google Cloud, in my opinion, has the “better” NLU components, I just love the simplicity and unified API of AWS’ services. If you’ve used one of them, you think you know them all. For example, here’s how to get a record from S3 and how the sentiment detection works in Node.js:
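A sketch of those two calls with the Node.js SDK (the bucket name is a placeholder):

```js
const AWS = require('aws-sdk');

const s3 = new AWS.S3();
const comprehend = new AWS.Comprehend();

// Load a stored article from the bucket
const getArticle = async (key) => {
  const data = await s3
    .getObject({ Bucket: process.env.ARTICLE_BUCKET, Key: key })
    .promise();
  return JSON.parse(data.Body.toString('utf-8'));
};

// Let Comprehend analyse the sentiment of the created context.
// The result contains "Sentiment" (POSITIVE, NEGATIVE, NEUTRAL or MIXED)
// and a "SentimentScore" with the individual confidence values.
const getSentiment = (context) =>
  comprehend
    .detectSentiment({ Text: context, LanguageCode: 'en' })
    .promise();
```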
Probably one of the most interesting tasks of the Data Enrichment Component was the creation of the “context” of the word vegan in the article.
Just as a reminder — we need this context, since a lot of articles only mention the word “Vegan” without having “Veganism” as a topic.
So, how do we extract parts from a text? I went for Regular Expressions. They are incredibly nice to use, and you can use playgrounds like Regex101 to play around and find the right regex for your use case.
The challenge was to come up with a regex that could find sentences that contained the word “vegan”. Somehow it was harder than I expected to make it generalise for whole text passages that also had line breaks and so on in them.
The final regex looks like this:
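As an approximation (not necessarily the exact pattern used in the project), a regex along these lines grabs the sentence-like chunk around the keyword:

```js
// Match everything between sentence delimiters, as long as it contains
// "vegan" (case-insensitive, so part-words like "Veganism" match too)
const veganSentenceRegex = /[^.!?]*vegan[^.!?]*[.!?]/gi;

const text = 'Some intro. This recipe is vegan and tasty! More text.';
console.log(text.match(veganSentenceRegex));
// -> [' This recipe is vegan and tasty!']
```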
The problem was that for long texts, this was not working due to timeout problems. The solution in this case was pretty “straightforward”… I simply crawled the text and split it by line breaks, which made it way easier to process for the RegEx module.
In the end, the whole context “creation” was a mixture of splitting the text, filtering for passages that contained the word vegan, extracting the matching sentence from that passage, and joining it back together so that it could be used in the sentiment analysis.
The title and description might also play a role, so I added those to the context if they contained the word “vegan”.
Once all the code for the different steps was in place, I thought I could start building the frontend. But something wasn’t right. Some of the records just did not appear in my DynamoDB table…
Empty Strings in DynamoDB are also evil
When checking back with the status of my already running system, I realised that some of the articles would not be converted to a DynamoDB table entry at all.
After checking out the logs, I found this Exception which absolutely confused me…
To be honest, this was a really weird behaviour since, as stated in the discussion, the semantics and usage of an empty String are absolutely different than that of a Null value.
However, since I couldn’t change anything about the design of the DynamoDB, I had to find a solution to avoid getting the empty String error.
In my case, it was really easy. I just iterated through the whole JSON object and checked whether there was an empty String or not. If there was, I just replaced the value with null. That’s it: it works like a charm and does not cause any problems. (I needed to check if it has a value in the frontend, though, since getting the length of a null value throws an error).
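A small sketch of that idea:

```js
// Walk the (possibly nested) record and turn empty strings into null
// before writing it to DynamoDB.
const replaceEmptyStrings = (obj) => {
  Object.keys(obj).forEach((key) => {
    if (obj[key] === '') {
      obj[key] = null;
    } else if (typeof obj[key] === 'object' && obj[key] !== null) {
      replaceEmptyStrings(obj[key]);
    }
  });
  return obj;
};
```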
Frontend
The last part was to actually create a frontend and deploy it so people could visit the page and see the StateOfVeganism.
Of course, I was thinking about whether I should use one of those fancy frontend frameworks like Angular, React or Vue.js… But, well, I went for absolutely old school, plain HTML, CSS and JavaScript.
The idea I had for the frontend was extremely minimalistic. Basically it was just a bar that was divided into three sections: Positive, Neutral and Negative. When clicking on either one of those, it would display some titles and links to articles that were classified with this sentiment.
In the end, that was exactly what it turned out to be. You can check out the page here. I thought about making it live at stateOfVeganism.com, but we’ll see…
Make sure to note the funny third article of the articles that have been classified as “Negative” ;)
Deploying the frontend on one of AWS’ services was something else I had to think about. I definitely wanted to take a service that already incorporated elastic scaling, so I had to decide between Elastic Container Service or Elastic Beanstalk (actual EC2 instances).
In the end, I went for Beanstalk, since I really liked the straightforward approach and the incredibly easy deployment. You can basically compare it to Heroku in the way you set it up.
Side note: I had some problems with my auto scaling group not being allowed to deploy EC2 instances, because I use the free tier on AWS. But after a few emails with AWS support, everything worked right out of the box.
I just deployed a Node.js Express Server Application that serves my frontend on each path.
This setup, by default, provides the index.html which resides in the “public” folder, which is exactly what I wanted.
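A minimal sketch of such a server:

```js
const express = require('express');
const path = require('path');

const app = express();

// Serve everything inside "public"; index.html is the default document
app.use(express.static(path.join(__dirname, 'public')));

// Elastic Beanstalk passes the port via the PORT environment variable
const port = process.env.PORT || 3000;
app.listen(port, () => console.log(`Frontend listening on port ${port}`));
```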
Of course this is the most basic setup. For most applications, it’s not the recommended way, since you somehow have to provide the credentials in order to access the DynamoDB table. It would be better to do some server-side rendering and store the credentials in environment variables so that nobody can access them.
Playing it cool and deploying the AWS keys in the front end
This is something you should never do. However, since I restricted the access of those credentials to only the scan method of the DynamoDB table, you can get the chance to dig deeper into my data if you’re interested.
I also restricted the number of requests that can be done, so that the credentials will “stop working” once the free monthly limit has been surpassed, just to make sure.
But feel free to look at the data and play around a little bit if you’re interested. Just make sure to not overdo it, since the API will stop providing the data to the frontend at some point.
Policies, Policies?… Policies!
When I started working with cloud technologies, I realised that there has to be a way to allow/restrict access to the single components and create relations. This is where policies come into place. They also help you do access management by giving you the tools you need to give specific users and groups permissions. At one point, you’ll probably struggle with this topic, so it makes sense to read up on it a little bit.
There are basically two types of policies in AWS. Both are simple JSON style configuration files. However, one of them is assigned to the resource itself, for example S3, and the other one gets assigned to roles, users, or groups.
The table below shows some very rough statements about which policy you might want to choose for your task.
So, what is the actual difference? This might become clearer when we compare examples of both policy types.
One of them is the IAM-Policy (or Identity-Based Policy). The other is the Resource-(Based)-Policy.
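Roughly (with a placeholder account ID, bucket name, and actions), the IAM-Policy could look like the first document below and the Resource-Policy like the second:

```json
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Action": ["s3:GetObject", "s3:PutObject"],
      "Resource": "arn:aws:s3:::examplebucket/*"
    }
  ]
}
```

```json
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Principal": {
        "AWS": [
          "arn:aws:iam::123456789012:user/Alice",
          "arn:aws:iam::123456789012:root"
        ]
      },
      "Action": ["s3:GetObject", "s3:PutObject"],
      "Resource": "arn:aws:s3:::examplebucket/*"
    }
  ]
}
```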
If we start to compare them line by line, we can’t see any difference until we reach the first statement which defines some rules related to some service. In this case, it’s S3.
In the Resource-Policy, we see an attribute that is called “Principal” which is missing in the IAM-Policy. In the context of a Resource-Policy, this describes the entities that are “assigned” to this rule. In the example given above, this would be the users, Alice and root.
On the other hand, to achieve the exact same result with IAM-Policies, we would have to assign the policy on the left to our existing users, Alice and root.
Depending on your use case, it might make sense to use one or the other. It’s also a question of your “style”, or the conventions of your workplace.
What’s next?
StateOfVeganism is live already. However, this does not mean that there is nothing to improve. One thing I definitely have to work on is, for example, that recipes from Pinterest are not classified as “Positive” but rather “Neutral”. But the basic functionality is working as expected. The data pipeline works nicely, and if anything should go wrong, I will have nice logging with CloudWatch already enabled.
It’s been great to really think through and build such a system. Questioning my decisions was very helpful in optimising the whole architecture.
The next time you’re thinking about building a side project, think about building it with one of the cloud providers. It might be a bigger time investment in the beginning, but learning how to use and build systems with an infrastructure like AWS really helps you to grow as a developer.
I’d love to hear about your projects and what you build. Reach out and tell me about them.
Thank you for reading. Be sure to follow me on YouTube and to star StateOfVeganism on GitHub.
Don’t forget to hit the clap button and follow me on Twitter, GitHub, Youtube, and Facebook to follow me on my journey.
I’m always looking for new opportunities. So please, feel free to contact me. I’d love to get in touch with you.
Also, I’m currently planning to do a half year internship in Singapore starting in March 2019. I’d like to meet as many of you as possible. If you live in Singapore, please reach out. Would love to have a chat over coffee or lunch.
Translated from: https://www.freecodecamp.org/news/how-to-build-a-fully-scalable-architecture-with-aws-5c4e8612565e/