Kilimanjaro Hemingway
I’ve been using the Hemingway App to try to improve my posts. At the same time I’ve been trying to find ideas for small projects. I came up with the idea of integrating a Hemingway style editor into a markdown editor. So I needed to find out how Hemingway worked!
Getting the Logic
I had no idea how the app worked when I first started. It could have sent the text to a server to calculate the complexity of the writing, but I expected it to be calculated client side.
Opening developer tools in Chrome ( Control + Shift + I or F12 on Windows/Linux, Command + Option + I on Mac) and navigating to Sources provided the answers. There, I found the file I was looking for: hemingway3-web.js.
This code is in a minified form, which is a pain to read and understand. To solve this, I copied the file into VS Code and formatted the document (Control + Shift + I in VS Code on Linux; Shift + Alt + F on Windows, Shift + Option + F on Mac). This changes a 3-line file into a 4859-line file with everything formatted nicely.
Exploring the Code
I started to look through the file for anything that I could make sense of. The start of the file contained immediately invoked function expressions. I had little idea of what was happening.
!function(e) {
    function t(r) {
        if (n[r])
            return n[r].exports;
        var o = n[r] = {
            exports: {},
            id: r,
            loaded: !1
        };
...
This continued for about 200 lines before I decided that I was probably reading the code to make the page run (React?). I started skimming through the rest of the code until I found something I could understand. (I missed quite a lot that I would later find through finding function calls and looking at the function definition).
The first bit of code I understood was all the way at line 3496!
getTokens: function(e) {
    var t = this.getAdverbs(e),
        n = this.getQualifiers(e),
        r = this.getPassiveVoices(e),
        o = this.getComplexWords(e);
    return [].concat(t, n, r, o).sort(function(e, t) {
        return e.startIndex - t.startIndex
    })
}
And amazingly, all these functions were defined right below. Now I knew how the app defined adverbs, qualifiers, passive voice, and complex words. Some of them are very simple. The app checks each word against lists of qualifiers, complex words, and passive voice phrases. this.getAdverbs filters words based on whether they end in ‘ly’ and then checks whether each one is in the list of non-adverb words ending in ‘ly’.
The next bit of useful code was the implementation of highlighting words or sentences. In this code there is a line:
e.highlight.hardSentences += h
‘hardSentences’ was something I could understand, something with meaning. I then searched the file for hardSentences and got 13 matches. This led to a line that calculated the readability stats:
n.stats.readability === i.default.readability.hard && (e.hardSentences += 1),
n.stats.readability === i.default.readability.veryHard && (e.veryHardSentences += 1)
Now I knew that there was a readability parameter in both stats and i.default. Searching the file, I got 40 matches. One of those matches was a getReadabilityStyle function, where they grade your writing.
There are three levels: normal, hard and very hard.
t = e.words;
n = e.readingLevel;
return t < 14
    ? i.default.readability.normal
    : n >= 10 && n < 14
        ? i.default.readability.hard
        : n >= 14
            ? i.default.readability.veryHard
            : i.default.readability.normal;
A sentence of fewer than 14 words is always “normal”. A sentence of 14 words or more is “hard” if its reading level is at least 10 and below 14, “very hard” if its reading level is 14 or above, and “normal” otherwise.
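To check my reading of that nested ternary, here is a minimal sketch of the same decision written as plain if statements. The READABILITY labels are my own stand-ins for i.default.readability, whose actual values I did not look up.

```javascript
// Hypothetical labels standing in for i.default.readability in the minified source.
const READABILITY = { normal: "normal", hard: "hard", veryHard: "veryHard" };

// Same decision as the nested ternary: short sentences are always normal,
// otherwise the reading level decides.
function getReadabilityStyle(words, readingLevel) {
  if (words < 14) return READABILITY.normal;
  if (readingLevel >= 10 && readingLevel < 14) return READABILITY.hard;
  if (readingLevel >= 14) return READABILITY.veryHard;
  return READABILITY.normal;
}

console.log(getReadabilityStyle(10, 20)); // normal (too short to be flagged)
console.log(getReadabilityStyle(20, 12)); // hard
console.log(getReadabilityStyle(20, 15)); // veryHard
console.log(getReadabilityStyle(20, 6));  // normal (long but easy)
```

The word count acts as a gate: a 10-word sentence is never highlighted, however high its reading level.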
Now to find how to calculate the reading level.
I spent a while here trying to find any notion of how to calculate the reading level. I found it 4 lines above the getReadabilityStyle function.
// e = letters in paragraph
// t = words in paragraph
// n = sentences in paragraph
getReadingLevel: function(e, t, n) {
    if (0 === t || 0 === n) return 0;
    var r = Math.round(4.71 * (e / t) + 0.5 * (t / n) - 21.43);
    return r <= 0 ? 0 : r;
}
That means your score is 4.71 * average word length (in letters) + 0.5 * average sentence length (in words) - 21.43, rounded to the nearest whole number and never less than 0. That’s it. That is how Hemingway grades each of your sentences.
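A quick worked example helps here. The sketch below is the same formula with readable parameter names (the names are mine, not Hemingway’s). As far as I can tell these are the same coefficients as the Automated Readability Index, though I have not confirmed that this is where Hemingway took them from.

```javascript
// Same arithmetic as the minified getReadingLevel, with descriptive names.
function getReadingLevel(letters, words, sentences) {
  if (words === 0 || sentences === 0) return 0;
  const level = Math.round(
    4.71 * (letters / words) + 0.5 * (words / sentences) - 21.43
  );
  return level <= 0 ? 0 : level;
}

// One sentence of 20 words averaging 5 letters each:
// 4.71 * 5 + 0.5 * 20 - 21.43 = 12.12, which rounds to 12.
console.log(getReadingLevel(100, 20, 1)); // 12

// One sentence of 10 words averaging 4 letters each:
// 4.71 * 4 + 0.5 * 10 - 21.43 = 2.41, which rounds to 2.
console.log(getReadingLevel(40, 10, 1)); // 2

// Very short words push the raw score below zero, so it is clamped to 0.
console.log(getReadingLevel(12, 4, 1)); // 0
```

Word length carries most of the weight: each extra letter per word adds 4.71 to the level, while each extra word per sentence adds only 0.5.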
Other Interesting Things I Found
- The highlight commentary (information about your writing on the right hand side) is a big switch statement. Ternary statements are used to change the response based on how well you’ve written.
- The grading goes up to 16 before it’s classed as “Post-Graduate” level.
What I’m going to do with this
I am planning to make a basic website and apply what I’ve learned from deconstructing the Hemingway app. Nothing fancy, more as an exercise for implementing some logic. I’ve built a Markdown previewer before, so I might also try to create a writing application with the highlighting and scoring system.
Creating My Own Hemingway App
Having figured out how the Hemingway app works, I then decided to implement what I had learnt to make a much simplified version.
I wanted to make sure that I was keeping it basic, focusing on the logic more than the styling. I chose to go with a simple text entry box.
Challenges
1. How to assure performance. Rescanning the whole document on every key press could be very computationally expensive. This could result in UX blocking which is obviously not what we want.
2. How to split up the text into paragraphs, sentences and words for highlighting.
Possible Solutions
1. For performance, either of these:
- Only rescan the paragraphs that change. Do this by counting the number of paragraphs and comparing that to the document before the change. Use this to find the paragraph that has changed or the new paragraph and only scan that one.
- Have a button to scan the document. This massively reduces the calls of the scanning function.
2. Use what I learnt from Hemingway: every paragraph is a <p> and any sentences or words that need highlighting are wrapped in an internal <span> with the necessary class.
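I ended up going with the scan button, but the “only rescan what changed” idea could be sketched roughly like this. It is a simplification that compares paragraphs position by position rather than counting them, and changedParagraphs is a hypothetical helper name of my own.

```javascript
// Hypothetical helper: returns the indexes of paragraphs in the current text
// that differ from the paragraph at the same position in the previous text.
function changedParagraphs(previousText, currentText) {
  const before = previousText.split("\n");
  const after = currentText.split("\n");
  const changed = [];
  for (let i = 0; i < after.length; i++) {
    // A paragraph that is new (before[i] is undefined) also counts as changed.
    if (after[i] !== before[i]) changed.push(i);
  }
  return changed;
}

console.log(changedParagraphs("one\ntwo\nthree", "one\nTWO\nthree")); // [ 1 ]
console.log(changedParagraphs("one\ntwo", "one\ntwo\nthree"));        // [ 2 ]
```

The weakness of comparing by position is that inserting a paragraph in the middle shifts everything after it, so all the later paragraphs would be reported as changed and rescanned.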
Building the App
Recently I’ve read a lot of articles about building a Minimum Viable Product (MVP) so I decided that I would run this little project the same way. This meant keeping everything simple. I decided to go with an input box, a button to scan and an output area.
This was all very easy to set up in my index.html file.
<link rel="stylesheet" href="index.css">
<title>Fake Hemingway</title>
<div>
    <h1>Fake Hemingway</h1>
    <textarea name="" id="text-area" rows="10"></textarea>
    <button onclick="format()">Test Me</button>
    <div id="output"></div>
</div>
<script src="index.js"></script>
Now for the interesting part: getting the JavaScript working.
The first thing to do was to render the text from the text box into the output area. This involves finding the input text and setting the output’s inner html to that text.
function format() {
    let inputArea = document.getElementById("text-area");
    let text = inputArea.value;
    let outputArea = document.getElementById("output");
    outputArea.innerHTML = text;
}
Next is getting the text split into paragraphs. This is accomplished by splitting the text by "\n" and putting each of these into a <p> tag. To do this we can map over the array of paragraphs, putting them in between <p> tags. Using template strings makes doing this very easy.
let paragraphs = text.split("\n");
let inParagraphs = paragraphs.map(paragraph => `<p>${paragraph}</p>`);
outputArea.innerHTML = inParagraphs.join(" ");
Whilst I was working through that, I was becoming annoyed at having to copy and paste the test text into the text box. To solve this, I implemented an Immediately Invoked Function Expression (IIFE) to populate the text box when the web page renders.
(function start() {
    let inputArea = document.getElementById("text-area");
    let text = `The app highlights lengthy, …. compose something new.`;
    inputArea.value = text;
})();
Now the text box was pre-populated with the test text whenever you load or refresh the web page. Much simpler.
Highlighting
Now that I was rendering the text well and I was testing on a consistent text, I had to work on the highlighting. The first type of highlighting I decided to tackle was the hard and very hard sentence highlighting.
The first stage of this is to loop over every paragraph and split them into an array of sentences. I did this using a `split()` function, splitting on every full stop with a space after it.
let sentences = paragraph.split(". ");
From Hemingway I knew that I needed to calculate the number of words and level of each of the sentences. The level of the sentence is dependent on the average length of words and the average words per sentence. Here is how I calculated the number of words and the number of letters in each sentence.
let words = sentence.split(" ").length;
let letters = sentence.split(" ").join("").length;
Using these numbers, I could use the equation that I found in the Hemingway app.
// Assuming sentences is a count of sentences here (1 when scoring a single sentence)
let level = Math.round(4.71 * (letters / words) + 0.5 * words / sentences - 21.43);
With the level and number of words for each of the sentences, I could set their difficulty level.
if (words < 14) {
    return sentence;
} else if (level >= 10 && level < 14) {
    return `<span class="hardSentence">${sentence}</span>`;
} else if (level >= 14) {
    return `<span class="veryHardSentence">${sentence}</span>`;
} else {
    return sentence;
}
This code says that if a sentence has 14 or more words and a level of 10 to 13 then it’s hard; if it has 14 or more words and a level of 14 or up then it’s very hard. I used template strings again but included a class in the span tags. This is how I’m going to define the highlighting.
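Put together, the steps above could look something like the sketch below. The function names highlightSentence and calculateLevel are only for illustration, and I’m assuming each sentence is scored on its own, so the sentence count passed to the formula is 1.

```javascript
// Reading level formula found in the Hemingway source.
function calculateLevel(letters, words, sentences) {
  if (words === 0 || sentences === 0) return 0;
  const level = Math.round(
    4.71 * (letters / words) + 0.5 * (words / sentences) - 21.43
  );
  return level <= 0 ? 0 : level;
}

// Wraps a sentence in a span if it is hard or very hard, otherwise returns it as is.
function highlightSentence(sentence) {
  const words = sentence.split(" ").length;
  const letters = sentence.split(" ").join("").length;
  const level = calculateLevel(letters, words, 1); // assumption: one sentence at a time

  if (words < 14) return sentence;
  if (level >= 10 && level < 14) {
    return `<span class="hardSentence">${sentence}</span>`;
  }
  if (level >= 14) {
    return `<span class="veryHardSentence">${sentence}</span>`;
  }
  return sentence;
}

// 20 five-letter words give a level of 12, so this is wrapped as a hard sentence.
const hard = Array(20).fill("abcde").join(" ");
console.log(highlightSentence(hard).startsWith('<span class="hardSentence">')); // true

// Fewer than 14 words is never highlighted.
console.log(highlightSentence("Short one")); // Short one
```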
The CSS file is really simple; it just has each of the classes (adverb, passive, hardSentence) and sets their background colour. I took the exact colours from the Hemingway app.
Once the sentences have been returned, I join them all together to make each of the paragraphs.
At this point, I realised that there were a few problems in my code.
- There were no full stops. When I split the paragraphs into sentences, I had removed all of the full stops.
- The numbers of letters in the sentence included the commas, dashes, colons and semi-colons.
My first solution was very primitive but it worked. I used split(‘symbol’) and join(‘’) to remove the punctuation and then appended ‘.’ onto the end. Whilst it worked, I searched for a better solution. Although I don’t have much experience using regex, I knew that it would be the best solution. After some Googling I found a much more elegant solution.
let cleanSentence = sent.replace(/[^a-z0-9. ]/gi, "") + ".";
With this done, I had a partially working product.
The next thing I decided to tackle was the adverbs. To find an adverb, Hemingway just finds words that end in ‘ly’ and then checks that it isn’t on a list of non-adverb ‘ly’ words. It would be bad if ‘apply’ or ‘Italy’ were tagged as adverbs.
To find these words, I took the sentences and split them into an array of words. I mapped over this array and used an if statement.
if (word.match(/ly$/) && !lyWords[word]) {
    return `<span class="adverb">${word}</span>`;
} else {
    return word;
}
Whilst this worked most of the time, I found a few exceptions. If a word was followed by a punctuation mark then it didn’t match ending with ‘ly’. For example, “The crocodile glided elegantly; its prey unaware” would have the word ‘elegantly;’ in the array. To solve this I reused the .replace(/[^a-z0-9. ]/gi, "") functionality to clean each of the words.
Another exception was if the word was capitalised, which was easily solved by calling toLowerCase() on the string.
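With both fixes in place, the adverb check could be sketched as below. The lyWords list here is a tiny hypothetical stand-in for the real exception list, and I’ve used a slightly stricter regex than the one above (it also strips full stops and spaces) so that a word like ‘quickly.’ still matches.

```javascript
// Hypothetical, deliberately tiny exception list; the real list is much longer.
const lyWords = { apply: 1, italy: 1, family: 1, reply: 1 };

function highlightAdverbs(sentence) {
  return sentence
    .split(" ")
    .map((word) => {
      // Strip punctuation and lowercase before testing, but keep the
      // original word (with its punctuation and capitals) in the output.
      const clean = word.replace(/[^a-z0-9]/gi, "").toLowerCase();
      if (/ly$/.test(clean) && !lyWords[clean]) {
        return `<span class="adverb">${word}</span>`;
      }
      return word;
    })
    .join(" ");
}

console.log(highlightAdverbs("The crocodile glided elegantly; its prey unaware"));
// The crocodile glided <span class="adverb">elegantly;</span> its prey unaware

console.log(highlightAdverbs("Quickly apply in Italy"));
// <span class="adverb">Quickly</span> apply in Italy
```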
Now I had a result that worked with adverbs and highlighting individual words. I then implemented a very similar method for complex and qualifying words. That was when I realised that I was no longer just looking for individual words, I was looking for phrases. I had to change my approach from checking if each word was in the list to seeing if the sentence contained each of the phrases.
To do this I used the .indexOf() function on the sentences. If there was an index of the word or phrase, I inserted an opening span tag at that index and then the closing span tag after the key length.
let qualifiers = getQualifyingWords();
let wordList = Object.keys(qualifiers);
wordList.forEach(key => {
    let index = sentence.toLowerCase().indexOf(key);
    if (index >= 0) {
        sentence =
            sentence.slice(0, index) +
            '<span class="qualifier">' +
            sentence.slice(index, index + key.length) +
            "</span>" +
            sentence.slice(index + key.length);
    }
});
With that working, it’s starting to look more and more like the Hemingway editor.
The last piece of the highlighting puzzle to implement was the passive voice. Hemingway used a 30 line function to find all of the passive phrases. I chose to use most of the logic that Hemingway implemented, but order the process differently. They looked to find any words that were in a list (is, are, was, were, be, been, being) and then checked whether the next word ended in ‘ed’.
I looped through each of the words in a sentence and checked if they ended in ‘ed’. For every ‘ed’ word I found, I checked whether the previous word was in the list of pre-words. This seemed much simpler, but may be less performant.
With that working I had an app that highlighted everything I wanted. This is my MVP.
Then I hit a problem
As I was writing this post I realised that there were two huge bugs in my code.
// from getQualifier and getComplex
let index = sentence.toLowerCase().indexOf(key);
// from getPassive
let index = words.indexOf(match);
These will only ever find the first instance of the key or match. Here is an example of the results this code will produce.
‘Perhaps’ and ‘been marked’ should have been highlighted twice each but they aren’t.
To fix the bug in getQualifier and getComplex, I decided to use recursion. I created a findAndSpan function which uses .indexOf() to find the first instance of the word or phrase. It splits the sentence into 3 parts: before the phrase, the phrase, after the phrase. The recursion works by passing the ‘after the phrase’ string back into the function. This will continue until there are no more instances of the phrase, where the string will just be passed back.
function findAndSpan(sentence, key, type) {
    // Find the first occurrence of the key, ignoring case
    let index = sentence.toLowerCase().indexOf(key);
    if (index >= 0) {
        sentence =
            sentence.slice(0, index) +
            `<span class="${type}">` +
            sentence.slice(index, index + key.length) +
            "</span>" +
            // Recurse on the remainder to catch later occurrences
            findAndSpan(sentence.slice(index + key.length), key, type);
    }
    return sentence;
}
Something very similar had to be done for the passive voice. The recursion was in an almost identical pattern, passing the leftover array items instead of the leftover string. The result of the recursion call was spread into an array that was then returned. Now the app can deal with repeated adverbs, qualifiers, complex phrases and passive voice uses.
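For the passive voice, the array version of that recursion could be sketched roughly as follows. The names spanPassive and cleanWord are illustrative, the pre-word list is the one mentioned earlier, and the check is the simplified “pre-word followed by a word ending in ‘ed’” rule.

```javascript
const preWords = ["is", "are", "was", "were", "be", "been", "being"];

// Strip punctuation and lowercase so "Marked," still ends in "ed".
const cleanWord = (word) => word.replace(/[^a-z0-9]/gi, "").toLowerCase();

// Takes an array of words and returns a new array in which each
// "pre-word + ...ed" pair has been merged into one highlighted item.
function spanPassive(words) {
  const index = words.findIndex(
    (word, i) =>
      i > 0 &&
      /ed$/.test(cleanWord(word)) &&
      preWords.includes(cleanWord(words[i - 1]))
  );
  if (index < 0) return words; // nothing left to highlight

  return [
    ...words.slice(0, index - 1),
    `<span class="passive">${words[index - 1]} ${words[index]}</span>`,
    // Recurse on the leftover words and spread the result into this array.
    ...spanPassive(words.slice(index + 1)),
  ];
}

const sentence = "It has been marked and has been marked again";
console.log(spanPassive(sentence.split(" ")).join(" "));
// It has <span class="passive">been marked</span> and has <span class="passive">been marked</span> again
```

Because it only looks for an ‘ed’ ending, irregular forms such as ‘was written’ are not picked up by this sketch.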
Statistics Counter
The last thing that I wanted to get working was the nice line of boxes telling you how many adverbs or complex words you’d used.
To store the data I created an object with keys for each of the parameters I wanted to count. I started by having this variable as a global variable but knew I would have to change that later.
Now I had to populate the values. This was done by incrementing the value every time it was found.
data.sentences += sentences.length
or
data.adverbs += 1
The values needed to be reset every time the scan was run to make sure that values didn’t continuously increase.
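One way to get the reset for free, and to drop the global variable, is to build a fresh counter object at the start of every scan. This is only a sketch: createStats and scan are hypothetical names, the keys are my guess at what is worth counting, and the adverb check is simplified (no exception list).

```javascript
// Returns a new object with every counter at zero.
function createStats() {
  return {
    sentences: 0,
    adverbs: 0,
    passive: 0,
    complex: 0,
    hardSentence: 0,
    veryHardSentence: 0,
  };
}

function scan(text) {
  const data = createStats(); // fresh counters, so nothing carries over between scans
  for (const paragraph of text.split("\n")) {
    const sentences = paragraph.split(". ").filter((s) => s.trim() !== "");
    data.sentences += sentences.length;
    for (const sentence of sentences) {
      for (const word of sentence.split(" ")) {
        // Simplified adverb check: any word ending in "ly" counts.
        if (/ly$/i.test(word.replace(/[^a-z0-9]/gi, ""))) data.adverbs += 1;
      }
    }
  }
  return data;
}

const sample = "He ran quickly. She slept.\nIt ended badly.";
console.log(scan(sample).sentences); // 3
console.log(scan(sample).adverbs);   // 2 (the second scan does not add to the first)
```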
With the values I needed, I had to get them rendering on the screen. I altered the structure of the html file so that the input box and output area were in a div on the left, leaving a right div for the counters. These counters are empty divs with an appropriate id and class as well as a ‘counter’ class.
<div id="adverb" class="adverb counter"></div>
<div id="passive" class="passive counter"></div>
<div id="complex" class="complex counter"></div>
<div id="hardSentence" class="hardSentence counter"></div>
<div id="veryHardSentence" class="veryHardSentence counter"></div>
With these divs, I used document.querySelector to set the inner html for each of the counters using the data that had been collected. With a little bit of styling of the ‘counter’ class, the web app was complete. Try it out here or look at my code here.
Translated from: https://www.freecodecamp.org/news/https-medium-com-samwcoding-deconstructing-the-hemingway-app-8098e22d878d/