What Is Natural Language Processing, and How Does It Work?

Talking to a chat bot on a smartphone. (Image credit: NicoElNino/Shutterstock.com)

Natural language processing enables computers to turn what we’re saying into commands that they can execute. Find out the basics of how it works, and how it’s being used to improve our lives.

What Is Natural Language Processing?

Whether it’s Alexa, Siri, Google Assistant, Bixby, or Cortana, everyone with a smartphone or smart speaker has a voice-activated assistant nowadays. Every year, these voice assistants seem to get better at recognizing and executing the things we tell them to do. But have you ever wondered how these assistants process the things we’re saying? They manage to do this thanks to Natural Language Processing, or NLP.

Historically, most software has only been able to respond to a fixed set of specific commands. A file will open because you clicked Open, or a spreadsheet will compute a formula based on certain symbols and formula names. A program communicates using the programming language that it was coded in, and will thus produce an output when it is given input that it recognizes. In this context, words are like a set of different mechanical levers that always provide the desired output.

This is in contrast to human languages, which are complex and unstructured, and whose meaning depends on sentence structure, tone, accent, timing, punctuation, and context. Natural Language Processing is a branch of artificial intelligence that attempts to bridge the gap between what a machine recognizes as input and human language, so that when we speak or type naturally, the machine produces an output in line with what we said.

This is done by taking vast amounts of data points to derive meaning from the various elements of the human language, on top of the meanings of the actual words. This process is closely tied with the concept known as machine learning, which enables computers to learn more as they obtain more points of data. That is the reason why most of the natural language processing machines we interact with frequently seem to get better over time.

To illustrate the concept, let’s have a look at two of the top-level techniques used in NLP to process language and information.

Tokenization

Tokenization means splitting up speech into words or sentences. Each piece of text is a token, and these tokens are what show up when your speech is processed. It sounds simple, but in practice, it’s a tricky process.

Let’s say that you are using speech-to-text software, such as the Google Keyboard, to dictate a message to a friend. You want to say, “Meet me at the park.” When your phone takes that recording and processes it through Google’s speech-to-text algorithm, Google must then split what you just said into tokens. These tokens would be “meet,” “me,” “at,” “the,” and “park.”

People leave pauses of different lengths between words, and some languages have very little in the way of an audible pause between words. As a result, the tokenization process varies drastically between languages and dialects.
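
For text that has already been transcribed, the splitting step can be sketched in a few lines of Python. This is only a minimal illustration that assumes English text with words separated by spaces and punctuation; the function names `tokenize_words` and `tokenize_sentences` are made up for this example, and it does not attempt the much harder job of finding word boundaries in audio.

```python
import re


def tokenize_words(text):
    """Split text into lowercase word tokens, dropping punctuation."""
    # Keep runs of letters, digits, and apostrophes; everything else separates tokens.
    return re.findall(r"[a-z0-9']+", text.lower())


def tokenize_sentences(text):
    """Split text into sentences at '.', '!' or '?' followed by whitespace."""
    parts = re.split(r"(?<=[.!?])\s+", text.strip())
    return [part for part in parts if part]


print(tokenize_words("Meet me at the park."))
# ['meet', 'me', 'at', 'the', 'park']
print(tokenize_sentences("Meet me at the park. Bring snacks!"))
# ['Meet me at the park.', 'Bring snacks!']
```

A rule this simple would break down for languages that do not put spaces between words, which is one reason real tokenizers are language-specific.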

Stemming and Lemmatization

Stemming and lemmatization both strip additions or variations from a word to reduce it to a root form that the machine can recognize. This makes the interpretation of speech consistent across different words that all mean essentially the same thing, which in turn makes NLP processing faster.

Stemming is a crude, fast process that removes affixes, which are additions attached before or after the root of a word. This turns the word into its simplest base form by simply removing letters. For example:

  • “Walking” turns into “walk”
  • “Faster” turns into “fast”
  • “Severity” turns into “sever”

As you can see, stemming may have the adverse effect of changing the meaning of a word entirely. “Severity” and “sever” do not mean the same thing, but the suffix “ity” was removed in the process of stemming.
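
A toy suffix stripper makes the trade-off concrete. The sketch below is an assumption-laden illustration, not the Porter algorithm or any library's stemmer: the `SUFFIXES` list and the `crude_stem` function are hypothetical, and they handle suffixes only.

```python
# Hypothetical suffix list for illustration; real stemmers use far more rules.
SUFFIXES = ("ing", "ity", "er", "ed", "s")


def crude_stem(word, min_stem=3):
    """Strip the first matching suffix, keeping at least min_stem letters."""
    word = word.lower()
    for suffix in SUFFIXES:
        if word.endswith(suffix) and len(word) - len(suffix) >= min_stem:
            return word[: -len(suffix)]
    return word


for example in ("Walking", "Faster", "Severity"):
    print(example, "->", crude_stem(example))
# Walking -> walk
# Faster -> fast
# Severity -> sever
```

Because the function only looks at letters, it has no way of knowing that “sever” is a different word from “severity.”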

On the other hand, lemmatization is a more sophisticated process that reduces a word to its base form, known as the lemma. This takes into consideration the context of the word and how it’s used in a sentence. It also involves looking up the term in a database of words and their respective lemmas. For example:

  • “Are” turns into “be”
  • “Operation” turns into “operate”
  • “Severity” turns into “severe”

In this example, lemmatization managed to turn the term “severity” into “severe,” which is its lemma form and root word.
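
The lookup idea can be sketched with a small table keyed by a word and its part of speech. Everything here is an assumption made for illustration: the `LEMMA_TABLE` contents, the part-of-speech labels, and the `lemmatize` function are hypothetical, and the two “meeting” entries are added only to show how context can change the result. A real lemmatizer relies on a much larger dictionary and on a component that works out the part of speech from the sentence.

```python
# Hypothetical miniature lemma database: (word, part of speech) -> lemma.
LEMMA_TABLE = {
    ("are", "VERB"): "be",
    ("operation", "NOUN"): "operate",
    ("severity", "NOUN"): "severe",
    # Illustrative additions: the same spelling maps to different lemmas.
    ("meeting", "VERB"): "meet",
    ("meeting", "NOUN"): "meeting",
}


def lemmatize(word, pos):
    """Look up the lemma for a word in a given role; fall back to the word itself."""
    key = (word.lower(), pos)
    return LEMMA_TABLE.get(key, word.lower())


print(lemmatize("Severity", "NOUN"))  # severe
print(lemmatize("meeting", "VERB"))   # meet
print(lemmatize("meeting", "NOUN"))   # meeting
```

The dictionary lookup is what lets this approach return “severe” where letter-stripping alone produced “sever.”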

NLP Use Cases and the Future

The previous examples only begin to scratch the surface of what Natural Language Processing is. It encompasses a wide range of practices and usage scenarios, many of which we use in our daily lives. These are a few examples of where NLP is currently in use:

  • Predictive Text: When you type a message on your smartphone, it automatically suggests words that fit into the sentence or that you’ve used before.

  • Machine Translation: Widely used consumer translation services, such as Google Translate, incorporate a high-level form of NLP to process language and translate it.

  • Chatbots: NLP is the foundation for intelligent chatbots, especially in customer service, where they can assist customers and process their requests before they reach a real person.
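
To give a feel for the predictive text item, here is a deliberately simple sketch that suggests the next word by counting which words have followed the current one in earlier messages. It is only an assumption about how such a feature could work in principle, not a description of any real keyboard; the function names and the sample message history are invented for this example.

```python
from collections import Counter, defaultdict


def build_bigram_model(sentences):
    """Count, for each word, which words have followed it."""
    model = defaultdict(Counter)
    for sentence in sentences:
        words = sentence.lower().split()
        for current, following in zip(words, words[1:]):
            model[current][following] += 1
    return model


def suggest(model, word, k=3):
    """Return up to k words that most often followed the given word."""
    followers = model.get(word.lower(), Counter())
    return [candidate for candidate, _ in followers.most_common(k)]


# Hypothetical message history.
history = ["meet me at the park", "meet me at noon", "see you at the park"]
model = build_bigram_model(history)
print(suggest(model, "at"))    # ['the', 'noon']
print(suggest(model, "meet"))  # ['me']
```

The more messages the model sees, the better its counts reflect how you actually write, which mirrors the way these features seem to improve with use.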

There’s more to come. NLP uses are currently being developed and deployed in fields such as news media, medical technology, workplace management, and finance. There’s a chance we may be able to have a full-fledged sophisticated conversation with a robot in the future.

If you’re interested in learning more about NLP, there are a lot of fantastic resources on the Towards Data Science blog or from the Stanford Natural Language Processing Group that you can check out.

Translated from: https://www.howtogeek.com/665702/what-is-natural-language-processing-and-how-does-it-work/
