衡量試卷難度信度_我們可以通過數字來衡量語言難度嗎?

衡量試卷難度信度

Without a doubt, the world is “growing smaller” in terms of our access to people and content from other countries and cultures. Even the COVID-19 pandemic, which has curtailed international travel, has led to increasing virtual interaction via the internet. Yet the barriers to fluent and proficient inter-language communications remain formidable.

毫無疑問,就我們接觸其他國家和文化的人們和內容而言,世界正在“變得越來越小”。 即使是減少國際旅行的COVID-19大流行,也導致通過互聯網增加虛擬互動。 然而,流暢和熟練的跨語言交流的障礙仍然巨大。

在線翻譯與語言學習 (Online Translation Versus Language Learning)

The quality of machine translation has improved dramatically in recent years, thanks to the introduction of Artificial Intelligence methods such as neural networks to the task. The AI-driven optimization of translating has trickled down rapidly to consumer apps like Google Translate and Microsoft Translator, which simplify the usage of machine translators and improve the ability to convey meaning across linguistic frontiers.

近年來,由于將諸如神經網絡之類的人工智能方法引入任務,機器翻譯的質量得到了顯著改善。 人工智能驅動的翻譯優化已Swift滲透到Google Translate和Microsoft Translator等消費類應用程序,這些應用程序簡化了機器翻譯的使用,并提高了跨語言邊界傳達含義的能力。

There’s a huge difference between translating a language via software and learning a new language. For most adults, learning a new language is hard. But some people love linguistic challenges: for them, the hardest languages to learn may be the most enjoyable to conquer. The neuroplasticity of young brains, of course, makes new language acquisition a relative snap for children. But few adults have it so easy.

通過軟件翻譯語言和學習新語言之間存在巨大差異。 對于大多數成年人來說,學習新語言非常困難。 但是有些人喜歡語言方面的挑戰:對他們而言,最難學習的語言可能是最難克服的語言 。 當然,年輕大腦的神經可塑性使新語言習得成為兒童的一個相對習慣。 但是很少有成年人這么容易。

在線語言學習及其挑戰 (Online Language Learning and Its Challenges)

Online language learning, now a $582 billion/year industry according to the ICEF, has made more convenient and easier the learning of a new language for millions. English language learning accounts for most of this total. While the popularity of English may not be surprising — it is the most spoken language and the main language of business worldwide — proficient English speakers are branching out to additional languages at a rapid pace.

根據ICEF的統計 ,在線語言學習現在的年產值為 5820億美元,它使數百萬的新語言學習更加便捷。 英語學習占總數的大部分。 盡管英語的流行并不令人驚訝-它是全球最常用的語言和主要業務語言-但是精通英語的人正在Swift向其他語言擴展。

Rosetta Stone, a leading provider of language courses, reports that Spanish topped the list of languages that British people were most eager to take on in 2018, with 23.1% of its UK learners learning the language last year. Four other European languages — French, English, Italian and German — rounded off the top five. Perhaps surprisingly, Mandarin Chinese, the most popular native language, with more than a million, was not in the next tier.

領先的語言課程提供商Rosetta Stone報告說,西班牙語是英國人最渴望在2018年采用的語言,其英國學習者中有23.1%的人去年學習了該語言。 排名前五位的還有其他四種歐洲語言-法語,英語,意大利語和德語。 也許令人驚訝的是,擁有超過一百萬種語言的最受歡迎的母語-普通話不在下一級。

No doubt the perception of that language’s difficulty played a role in its relatively low popularity ranking. Mandarin Chinese, poses major hardships for a non-Chinese speaker. And yet more than a 1.1 billion speaking, read, write and understand it fluently. So is it really hard? Or is it just unfamiliar to an English speaker? The question raises a major challenge: isn’t the perception of difficulty a totally relative matter, differing to some degree for each language learner, depending on background and education.

毫無疑問,對這種語言的困難的認識在其相對較低的流行度排名中起作用。 中文普通話給不講中文的人帶來很大的困難。 流利的說,讀,寫和理解能力超過11億。 那真的很難嗎? 還是只是不熟悉說英語的人? 這個問題提出了一個重大挑戰:對困難的理解不是一個完全相對的問題嗎,取決于背景和教育程度,每種語言學習者在一定程度上有所不同。

The challenge that faces a data scientist, of course, is how can language difficulty be measured. If we wish to split hairs, there is a distinction between the difficulty of learning a language and its inherent difficulty of usage. But for purposes of this article, we will focus on the task of evaluating a way to measure a language’s degree of difficulty, if we can borrow a term from the language of gymnastics and other competitive sports.

當然,數據科學家面臨的挑戰是如何衡量語言難度。 如果我們希望分開頭發,則在學習語言的難度和其固有的使用難度之間會有區別。 但是出于本文的目的,如果我們可以從體操和其他競技體育的語言中借用一個術語,則我們將專注于評估一種衡量語言難度的方法的任務。

方法A:向外交部咨詢 (Approach A: Ask the Foreign Service)

Nearly a decade ago, Voxy posted an infographic (shown below), sourced from the Foreign Service Institute, which breaks language difficulty for native English speakers into three neat categories: easy, medium, and hard. The basis for comparison was how long — in terms of calendar weeks and learning hours, attaining “proficiency” would be required for different languages. The site did qualify its findings by noting that difficulty depended on language complexity, how close it was to the learner’s own language (in this case, English), how many learning hours per week, and the language resources available. It appears from the chart that the basic assumption of 25 hours of learning per week.

大約十年前,Voxy發布了一個圖表(如下所示),該圖表來自外交事務學院,該指南將以英語為母語的人的語言難度分為三個簡單的類別:簡單,中等和困難。 比較的基礎是多長時間-就日歷周和學習時間而言,不同語言需要達到“熟練”水平。 該站點通過指出難度取決于語言的復雜性,與學習者自己的語言的距離(在這種情況下為英語),每周學習多少小時以及可用的語言資源,來驗證其發現。 從圖表中可以看出,每周學習25個小時的基本假設。

  • Easy (22–23 weeks, 575–600 class hours): The Romance Languages (Spanish, Portuguese, French, Italian, and Romanian) all fell in this group, along with Dutch, Afrikaans, Norwegian and Swedish

    輕松學習 (22-23周,575-600學時):浪漫語言(西班牙語,葡萄牙語,法語,意大利語和羅馬尼亞語)以及荷蘭語,南非荷蘭語,挪威語和瑞典語都屬于這一類

  • Medium (44 weeks, 1110 class hours): Russian, Polish, Serbian, Finnish, Thai and Vietnamese, Greek, Hebrew, and Hindi.

    中級 (44周,1110學時):俄語,波蘭語,塞爾維亞語,芬蘭語,泰語和越南語,希臘語,希伯來語和北印度語。

  • Hard (88 weeks, 2220 class hours): Chinese, Japanese, Korean, Arabic

    辛苦 (88周,2220課時):中文,日語,韓語,阿拉伯語

While Voxy clearly intends the chart to be a teaching tool or subject of discussion, it’s not hard to pick apart weaknesses in its analytical method. First, who is to set the bar of “proficiency”? And how to measure the quality of instruction? How to account for factor-like second-language knowledge? For a data scientist, the results would appear disappointingly arbitrary.

盡管Voxy明確希望該圖表成為教學工具或討論的主題,但不難發現其分析方法中的缺點。 首先,誰來設定“熟練”標準? 以及如何衡量教學質量? 如何解釋類??似因子的第二語言知識? 對于數據科學家來說,結果似乎是令人失望的任意。

Image for post
Voxy on Voxy攝, What Are The Hardest Languages To Learn?最難學習的語言是什么?

方法B:評分語言學習難度:多種語言的方法 (Approach B: Scoring Language Learning Difficulty: A Polyglot’s Approach)

A more intriguing approach to the problem, at least from a data science perspective, is offered by linguist Michael Campbell at Glossika. In a detailed blog post aptly titled “Language Difficulty,” he devised a scoring system for answering, numerically, the precise questions which intrigue us:

至少從數據科學的角度來看,語言學家Michael Campbell在Glossika上提供了一種更有趣的方法。 在一個恰當的標題為“語言難度”的詳細博客文章中,他設計了一種評分系統,以數字方式回答引起我們注意的精確問題:

  1. Is there an objective method for measuring language difficulty?

    是否有客觀的方法來衡量語言難度?
  2. What are the most difficult languages in the world?

    世界上最困難的語言是什么?

Distinguishing Campbell’s approach is its relativistic data-based approach. Language difficulty is based on the relative similarity between any two languages according to various criteria of linguistic complexity. Perhaps counter-intuitively, this approach actually makes an objective assessment of language learning difficulty possible, because it is based on numerical criteria that can be objectively assessed. Among the criteria he offers are:

區分坎貝爾的方法是其相對論的基于數據的方法。 語言難度是根據語言復雜性的各種標準,基于任何兩種語言之間的相對相似性。 也許與直覺相反,該方法實際上使對語言學習難度的客觀評估成為可能,因為它基于可以客觀評估的數字標準。 他提供的標準包括:

詞匯習得 (Vocabulary Acquisition)

This he considered with respect to how close the language is to the learner’s language.

他考慮到語言與學習者語言之間的接近程度。

Languages are divided into families, branches, and sub-branches. For example, English belongs to the Indo-European Proto-language, to which languages like Russian, Armenia, and Greek all belong. By contrast, Arabic, Chinese, and Japanese would be in a different family. Within the Indo-European grouping, that branch, English is a Germanic-Romance language, therefore closer to languages like German and French. In terms of similarity, English is closest in any way to German, despite grammatical differences. Similarly, Portuguese, Spanish and Italian would belong to the same sub-branch, making language-learning easier. Campbell assigns high importance to this criterion, with language-learning difficulty reflected in exponentially higher numbers. Same sub-branch branch: 0 points. Different sub-branch: 1 point. Different branches: 10 points. Different family: 100 points.

語言分為家庭,分支和分支。 例如,英語屬于印歐語系的原始語言,俄語,亞美尼亞和希臘語等語言均屬于該語言。 相比之下,阿拉伯文,中文和日文將屬于另一個家庭。 在該分支的印歐語組中,英語是日耳曼語-羅曼斯語,因此更接近德語和法語。 就相似性而言,盡管在語法上有所不同,但英語在任何方面都與德語最接近。 同樣,葡萄牙語,西班牙語和意大利語將屬于同一分支機構,從而使語言學習更加容易。 坎貝爾(Campbell)對該標準給予了高度重視,語言學習的困難程度以指數級的高反映出來。 同一個分支分支:0分。 不同的支行:1分。 不同的分支機構:10分。 不同的家庭:100分。

流利的語法和語法 (Syntax and Grammar for Fluency)

Campbell, a linguist by profession. broke down into a list of factors, such as

坎貝爾,專業語言學家。 分為一系列因素,例如

  • Language type

    語言類型
  • Subject-Verb-Object order

    主語-賓語-賓語順序
  • Adjective-Noun order

    形容詞-名詞順序
  • Genitive (possessor) — Noun order

    屬格(賓語)—名詞順序
  • Determiner-Noun order

    確定者名詞順序
  • Relative (clause) — Noun order

    相對(從句)-名詞順序
  • Noun Declension

    名詞變格
  • Tenses

    時態
  • Conjugation

    共軛
  • Adposition

    定位

For each of these criteria, Campbell assigns 1 point plus or minus if there is a difference between languages. The results of his calculation are rendered in a matrix:

對于這些標準中的每一個,如果語言之間存在差異,則Campbell會為其分配正負1點。 他的計算結果呈現在一個矩陣中:

Image for post
Matrix矩陣 derived from 來自 The Glossika BlogThe Glossika Blog

By comparing rows in this matrix, he can assign a score to the syntactical and grammatical differences between two languages and thus the difficulty of learning from a given language. The difficulty score for a German speaker learning French would be 6 points, a Japanese speaker learning Spanish 13 points, and a Chinese speaker learning Polish a whopping 34 points.

通過比較此矩陣中的行,他可以為兩種語言之間的句法和語法差異分配分數,從而為從給定語言學習的難度分配分數。 如果說德語的人說法語,那么他的難度得分將是6分;如果說日語的人說西班牙語,那么他的難度得分將是13分;如果說波蘭語的話,中國人的難度得分將會高達34分。

音韻流利 (Phonology for Fluency)

Campbell’s calculations account for the difference in total phonemes (written sounds) and allophones (the sounds people say), considering 12 points of articulation and the number of vowels and intonations.

坎貝爾的計算考慮了12個發音點以及元音和語調的數量,從而說明了總音素(書面聲音)和同音素(人們說的聲音)之間的差異。

Image for post
Matrix 矩陣derived from 來自 The Glossika BlogThe Glossika Blog

According to this matrix, comparing rows enables you to calculate language difficulty as related to these phonological criteria. The difficulty score for a German speaker learning French would be 1 point, a Japanese speaker learning Spanish 11 points, and a Chinese speaker learning Polish a whopping 15 points.

根據此矩陣,比較行使您能夠計算與這些語音標準相關的語言難度。 如果說德語的人會說法語,那么他的難度系數將是1分;如果說日語的人說西班牙語,那么難度系數將是11分;而如果說波蘭語的話,漢語學習者的難度得分將達到15分。

Data scientists will note that the scores assigned for various parameters are arbitrary and subjective, but there is merit in the attempt to break down degrees of difficulty into component factors.

數據科學家將注意到,為各種參數分配的分數是任意的和主觀的,但嘗試將難度分為要素也有好處。

For example, for an English speaker, the following are the score assignments according to language family:

例如,對于說英語的人,以下是根據語言族的分數分配:

Image for post
Matrix矩陣 derived from 來自 The Glossika BlogThe Glossika Blog

It is hard to reconcile a 0 score in German (So einfach ist das?) with a score of 5 in French or Spanish. And is Georgian really 10 times harder to acquire vocabulary than Polish? So the specific enumeration is certainly open to fine-turning, though the method is intriguing — if a bit rough around the edges.

很難用德語( So einfach ist das? )的0分數與法語或西班牙語的5分數進行協調。 格魯吉亞語的詞匯獲取真的比波蘭語難10倍嗎? 因此,盡管該方法很有趣,但是具體的枚舉當然可以進行微調-如果邊緣有些粗糙。

最后的推算:烏比赫有什么獨特之處? (The Final Reckoning: What’s Unique About Ubykh?)

His 2016 article concluded with a list of some of the most difficult languages. He mentioned, in this connection, the Romany language of European gypsies, which are not even written down, and Sentinelese, the language of the Pacific island where wannabe visitors are killed on arrival, polysynthetic languages like Greenlandic, and Ubykh, with no less than 84 consonants. Honorable mention goes to Bella Coola, a language is only written down by linguists to record the grammar.

他在2016年的文章中總結了一些最困難的語言。 在這方面,他提到了甚至沒有寫下來的歐洲吉普賽人的羅曼語和太平洋島嶼上的塞納蒂萊斯(Stinetineles)語言,那里是想要來訪的游客被殺死的語言,包括格陵蘭語和烏比克語等多合成語言,其中不少于84個輔音 值得一提的是貝拉·庫拉(Bella Coola),該語言僅由語言學家寫下才能記錄語法。

Two years later, Campbell wrote a follow-up piece applying his scoring system and setting it against the FSI rankings.

兩年后,坎貝爾撰寫了一篇后續文章,運用了他的計分系統并將其與FSI排名進行比較。

Image for post
Matrix矩陣 derived from 來自 The Glossika BlogThe Glossika Blog

Non-linguists may be nonplussed by the dismissive way the author chalks up Thai, Vietnamese, Turkish and Finnish as “easy” — except, he hastens to say, for their utterly unfamiliar vocabularies. He confesses surprise that, per his ranking system, Korean beats out Taiwanese in difficulty. But he credits Ubykh, an extinct Circassian language, as leaving even Korean in the dust.

筆者將泰國,越南語,土耳其語和芬蘭語歸為“輕松”,這是不屑一顧的方式,這使非語言學家不為所動。但他不得不說,因為他們完全不熟悉這些詞匯。 他承認,按照他的排名系統,韓國人在臺灣方面的困難勝過臺灣人。 但他認為,已故的切爾克斯語烏比赫語,甚至使朝鮮人也陷入了塵土。

Here you can learn Ubykh numbers and listen to a tale of futility that should appeal to every data scientist — in any language.

在這里,您可以學習Ubykh數字并聆聽一個徒勞無益的故事,該故事應該以任何一種語言吸引每位數據科學家。

翻譯自: https://towardsdatascience.com/can-we-measure-language-difficulty-by-the-numbers-3d591396934c

衡量試卷難度信度

本文來自互聯網用戶投稿,該文觀點僅代表作者本人,不代表本站立場。本站僅提供信息存儲空間服務,不擁有所有權,不承擔相關法律責任。
如若轉載,請注明出處:http://www.pswp.cn/news/390692.shtml
繁體地址,請注明出處:http://hk.pswp.cn/news/390692.shtml
英文地址,請注明出處:http://en.pswp.cn/news/390692.shtml

如若內容造成侵權/違法違規/事實不符,請聯系多彩編程網進行投訴反饋email:809451989@qq.com,一經查實,立即刪除!

相關文章

Linux 題目總結

守護進程的工作就是打開一個端口,并且等待(Listen)進入連接。 如果客戶端發起一個連接請求,守護進程就創建(Fork)一個子進程響應這個連接,而主進程繼續監聽其他的服務請求。 xinetd能夠同時監聽…

《精通Spring4.X企業應用開發實戰》讀后感第二章

一、配置Maven\tomcat https://www.cnblogs.com/Miracle-Maker/articles/6476687.html https://www.cnblogs.com/Knowledge-has-no-limit/p/7240585.html 二、創建數據庫表 DROP DATABASE IF EXISTS sampledb; CREATE DATABASE sampledb DEFAULT CHARACTER SET utf8; USE sampl…

換了電腦如何使用hexo繼續寫博客

前言 我們知道,使用 Githubhexo 搭建一個個人博客確實需要花不少時間的,我們搭好博客后使用的挺好,但是如果我們有一天電腦突然壞了,或者換了系統,那么我們怎么使用 hexo 再發布文章到個人博客呢? 如果我們…

leetcode 525. 連續數組

給定一個二進制數組 nums , 找到含有相同數量的 0 和 1 的最長連續子數組,并返回該子數組的長度。 示例 1: 輸入: nums [0,1] 輸出: 2 說明: [0, 1] 是具有相同數量 0 和 1 的最長連續子數組。 示例 2: 輸入: nums [0,1,0] 輸出: 2 說明: [0, 1] (或 [1, 0]) 是…

實踐作業2:黑盒測試實踐(小組作業)每日任務記錄1

會議時間:2017年11月24日20:00 – 20:30 會議地點:在線討論 主 持 人:王晨懿 參會人員:王晨懿、余晨晨、鄭錦波、楊瀟、侯歡、汪元 記 錄 人:楊瀟 會議議題:軟件測試課程作業-黑盒測試實踐的啟動計劃 會議內…

視圖可視化 后臺_如何在單視圖中可視化復雜的多層主題

視圖可視化 后臺Sometimes a dataset can tell many stories. Trying to show them all in a single visualization is great, but can be too much of a good thing. How do you avoid information overload without oversimplification?有時數據集可以講述許多故事。 試圖在…

iam身份驗證以及訪問控制_如何將受限訪問IAM用戶添加到EKS群集

iam身份驗證以及訪問控制介紹 (Introduction) Elastic Kubernetes Service (EKS) is the fully managed Kubernetes service from AWS. It is deeply integrated with many AWS services, such as AWS Identity and Access Management (IAM) (for authentication to the cluste…

一步一步構建自己的管理系統①

2019獨角獸企業重金招聘Python工程師標準>>> 系統肯定要先選一個基礎框架。 還算比較熟悉Spring. 就選Spring boot postgres mybatis. 前端用Angular. 開始搭開發環境,開在window上整的。 到時候再放到服務器上。 自己也去整了個小服務器,…

面向對象面向過程

1、面向語句: 直接寫原生的sql語句,但是這樣代碼不容易維護。改一個方法會導致整個項目都要改動, 2、面向過程 定義一些函數,用的時候就調用不用就不調用。但是這也有解決不了的問題,如果要維護需要改動代碼&#xff0…

python邊玩邊學_邊聽邊學數據科學

python邊玩邊學Podcasts are a fun way to learn new stuff about the topics you like. Podcast hosts have to find a way to explain complex ideas in simple terms because no one would understand them otherwise 🙂 In this article I present a few episod…

react css多個變量_如何使用CSS變量和React上下文創建主題引擎

react css多個變量CSS variables are really cool. You can use them for a lot of things, like applying themes in your application with ease. CSS變量真的很棒。 您可以將它們用于很多事情,例如輕松地在應用程序中應用主題。 In this tutorial Ill show you …

vue 自定義 移動端篩選條件

1.創建組件 components/FilterBar/FilterBar.vue <template><div class"filterbar" :style"{top: top px}"><div class"container"><div class"row"><divclass"col":class"{selected: ind…

PSP

姓名&#xff1a;袁亞琴 日期&#xff1a;11月27日 教師&#xff1a;王建民 課程&#xff1a;PSP 項目計劃日志&#xff1a; PSP Planning . Estimate Development . Analysis . Design Spec . Design Review . …

如何在Windows中打開和使用命令提示符

入門 (Getting started) Windows, MacOS and Linux have command line interfaces. Windows’ default command line is the command prompt. The command prompt allows users to use their computer without pointing and clicking with a mouse. Windows&#xff0c;MacOS和…

ACM-ICPC北京賽區2017網絡同步賽H

http://hihocoder.com/contest/icpcbeijing2017/problem/8 預處理暴力枚舉修改的點 #include <bits/stdc.h> using namespace std; const int maxn 159; const int inf 0x3f3f3f3f; int a[maxn][maxn]; int colsum[maxn][maxn]; int rowsum[maxn][maxn]; int dp[maxn];…

PPPOE撥號上網流程及密碼竊取具體實現

樓主學生黨一枚&#xff0c;最近研究netkeeper有些許心得。 關于netkeeper是調用windows的rasdial來進行上網的東西&#xff0c;網上已經有一大堆&#xff0c;我就不贅述了。 本文主要講解rasdial的部分核心過程&#xff0c;以及我們可以利用它來干些什么。 netkeeper中rasdial…

leetcode 160. 相交鏈表(雙指針)

給你兩個單鏈表的頭節點 headA 和 headB &#xff0c;請你找出并返回兩個單鏈表相交的起始節點。如果兩個鏈表沒有交點&#xff0c;返回 null 。 圖示兩個鏈表在節點 c1 開始相交&#xff1a; 題目數據 保證 整個鏈式結構中不存在環。 注意&#xff0c;函數返回結果后&#…

android開發入門_Android開發入門

android開發入門Android is an open source, Linux-based mobile operating system. Android was developed by the Open Handset Alliance, which was lead by Google and featured contributions from many other companies.Android是基于Linux的開放源代碼移動操作系統。 An…

新購阿里云服務器ECS創建之后無法ssh連接的問題處理

作者&#xff1a;13 GitHub&#xff1a;https://github.com/ZHENFENG13 版權聲明&#xff1a;本文為原創文章&#xff0c;未經允許不得轉載。 問題描述 由于原服務器將要到期&#xff0c;因此趁著阿里云搞促銷活動重新購買了一臺ECS服務器&#xff0c;但是在初始化并啟動后卻無…

數據下發非標準用戶權限測試

與同事一起溝通了下MDM的Oracle權限部分: create user cx default tablespace cwbaseoe73 identified by Test6530 grant select,update,delete,insert on lcoe739999.lsbzdw to cx grant create table to cx alter user cx quota unlimited on cwbaseoe73 grant create sessio…