Admit Predictor

Data Science in the Real World

Students are often anxious and unsure about their chances of admission to graduate school. This blog aims to help students shortlist universities that match their profiles using an ML model. The predicted output gives them a fair idea of their admission chances at a particular university.

Technical Review: Pooja Gramopadhye / ABCOM Team
Copy Editor: Anushka Devasthale
Level: Intermediate
Banner Image Source: Internet
Disclaimer: The purpose of this tutorial is to demonstrate the use of a linear regression model on a multi-feature dataset; it should not be used as is for predicting admissions.

Are you applying for a Master’s degree program and wondering about your chances of admission to your dream university? What GRE score, TOEFL score, or CGPA do you need to get admitted to a university of your choice? Learn to apply linear regression to develop an ML model that answers these questions.

Many students who aspire to pursue a Master’s degree at a good university turn to well-known coaching institutes and let them take care of everything: preparing for exams, drafting the SOP (statement of purpose) and LOR (letter of recommendation), training for visa interviews, and searching for the right universities. A few of you may prefer to do all these things on your own. In that case, searching for the right university is a daunting task. We look for universities that fit our profile on so-called “university hunt” websites, which collect data about universities around the world. These websites have a section known as “University Predictor,” which is usually a paid section where you fill in your information to get a prediction. I present how to build your own University Admit Predictor, which estimates your chances of getting admitted to the desired university. You can also use this model before taking your exams to find out what scores you need to gain admission to your dream university, and set your study targets accordingly.

By the end of this tutorial, you will be able to build and train a linear regression model to predict the chance of admission to a particular university.

Creating Project

Create a new Google Colab project and rename it to Admit Prediction. If you are new to Colab, then check out this short tutorial.

Import the following libraries in your Colab project:

# import statements
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LinearRegression

We will use pandas for data handling, pyplot from matplotlib for charting, and sklearn for preparing the datasets and for its predefined machine learning models.

The dataset is taken from a Kaggle competition. Use the read_csv function of pandas to read the data file into your Colab project environment.

# loading the data from csv file saved at the url
data = pd.read_csv("https://raw.githubusercontent.com/abcom-mltutorials/Admit-Prediction/master/Admission_Predict_Ver1.1.csv")

Examine the data by printing the first few records:

data.head()

This command gives the following output:

As you can see, each row contains fields such as the GRE, TOEFL, SOP, LOR, and CGPA scores and the Research activity of a student, along with the university rating. The last column, Chance of Admit, indicates the chance (a probability value) of admission to a school of the given rating. You can check how many such records the dataset provides by reading the shape attribute:

# checking the number of rows and columns
data.shape

This gives the output (500, 9). Thus, we have records for 500 students. We will now pre-process the data and make it ready for model training.

Data Pre-processing

We need to ensure that the data does not contain any null values. We do this by calling the isna method on the data frame and then taking the sum of the values in each column:

# checking null items
print(data.isna().sum())

This gives the following output:

Serial No.           0
GRE Score            0
TOEFL Score          0
University Rating    0
SOP                  0
LOR                  0
CGPA                 0
Research             0
Chance of Admit      0
dtype: int64

As all sums are zero, none of the columns have null values. From the above list of columns, you can easily see that Serial No. is of no significance for model training. We will drop this column from the dataset:

data = data.drop(["Serial No."], axis = 1)
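
The admissions file happens to have no missing values, so the null check above only confirms that nothing needs fixing. As a minimal sketch of what the same two steps look like when gaps do exist, the snippet below uses a small made-up frame (the values and the choice to drop incomplete rows are assumptions for illustration, not part of the original project):

```python
import numpy as np
import pandas as pd

# Hypothetical toy frame; the values are invented for illustration only.
toy = pd.DataFrame({
    "Serial No.": [1, 2, 3, 4],
    "GRE Score": [320, np.nan, 310, 300],
    "CGPA": [8.9, 8.1, np.nan, 7.5],
})

# Count missing values per column, as isna().sum() does in the tutorial.
null_counts = toy.isna().sum()
print(null_counts)

# One possible way to handle gaps: drop incomplete rows,
# then drop the identifier column that carries no information.
cleaned = toy.dropna().drop(columns=["Serial No."])
print(cleaned.shape)
```

Dropping rows is only one option; filling gaps with a column mean or median is another, and which one fits depends on how much data is missing.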

Next, we will prepare our data for building the model.

Preparing Data

We will first extract the features and the target into two objects, X and y. You create the features X by slicing the first seven columns of the data frame by position with iloc:

X = data.iloc[:,:7]

You can get information on the extracted data by calling its info method, X.info(). This is the output:

As you can see, it contains all our desired features. You now extract target data using the following slicing:

y = data.iloc[:,7:]

Print the information on y:

y.info()

It shows the following output:

We will now split the data into training and testing datasets by calling the train_test_split function of sklearn.

X_train,X_test,Y_train,Y_test = train_test_split(X, y, 
                                                 random_state = 10, 
                                                 shuffle = True, 
                                                 test_size = 0.2)

I have split the dataset in the ratio 80:20. We use X_train and Y_train for training and X_test and Y_test for testing. The data is shuffled before it is split, so both sets are random samples of the records. The random_state parameter sets the seed for shuffling, which ensures reproducible outputs across multiple runs.
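
To see the effect of random_state in isolation, here is a small sketch on stand-in arrays (X_demo and y_demo are hypothetical names with invented contents, not the admissions data). Calling train_test_split twice with the same seed yields the same split:

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Hypothetical stand-in data: 10 samples with 2 features each.
X_demo = np.arange(20).reshape(10, 2)
y_demo = np.arange(10)

# Two calls with the same seed produce identical splits.
a_train, a_test, _, _ = train_test_split(X_demo, y_demo, test_size=0.2, random_state=10)
b_train, b_test, _, _ = train_test_split(X_demo, y_demo, test_size=0.2, random_state=10)

print(a_train.shape, a_test.shape)       # 80% and 20% of the 10 samples
print(np.array_equal(a_test, b_test))    # same seed, same test rows
```

Leaving random_state unset would give a different split on each run, which makes results harder to compare between runs.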

We will now visualize the training data to decide which model to use.

Visualizing Data

We will create charts of each of our features versus the Chance of Admit. This will give us an idea of how the admission probability depends on the feature value. For example, how does the GRE score affect the admission probability? We can answer such questions by doing some charting. We first plot the GRE Score feature against the admission probability, using matplotlib. The following code produces the desired plot.

# Visualize the effect of GRE Score on chance of getting an admit  
plt.scatter(X_train["GRE Score"],Y_train, color = "red")
plt.xlabel("GRE Score")
plt.ylabel("Chance of Admission")
plt.legend(["GRE Score"])
plt.show()

The output is shown below:

You can see that a higher GRE score increases the chances of admission, and the relationship between the two is almost linear.

Now, try plotting a similar graph to see the relation between Chance of Admission and CGPA:

# Visualize the effect of CGPA on chance of getting an admit.
plt.scatter(X_train["CGPA"],Y_train, color = "green")
plt.xlabel("CGPA")
plt.ylabel("Chance of Admission")
plt.legend(["CGPA"])
plt.show()

You should get the following graph after successfully running the code:

Like the first graph, we can see that a higher CGPA has a higher chance of admission, and the relationship is once again roughly linear.

Likewise, try other features and you will see a roughly linear relationship between each of those features and the admission probability.

Lastly, let us plot the university rating versus the chance of admission.

# Visualize the effect of University Rating on chance of getting an admit.
plt.scatter(X_train["University Rating"],Y_train, color = "blue")
plt.xlabel("University Rating")
plt.ylabel("Chance of Admission")
plt.legend(["University Rating"])
plt.show()

In this chart, the points are concentrated in five vertical bars, one per university rating. Judging by the density of dots, most records in the dataset fall under university ratings 2, 3, and 4. Fewer records have a rating of 1, and there are also fewer for schools rated 5, probably due to their high selection criteria.
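
The linear trends read off the scatter plots can also be checked numerically. The sketch below is not part of the original project: it builds a small synthetic data frame (all numbers are invented; only the column names are borrowed from the dataset) and computes the Pearson correlation of each feature with the target.

```python
import numpy as np
import pandas as pd

# Synthetic stand-in for the admissions data; the numbers are invented.
rng = np.random.default_rng(0)
gre = rng.uniform(290, 340, size=200)
cgpa = rng.uniform(6.5, 10.0, size=200)
noise = rng.normal(0, 0.03, size=200)
chance = 0.3 + 0.004 * (gre - 290) + 0.1 * (cgpa - 6.5) + noise

demo = pd.DataFrame({"GRE Score": gre, "CGPA": cgpa, "Chance of Admit": chance})

# Pearson correlation of every feature with the target column.
corr_with_target = demo.corr()["Chance of Admit"].drop("Chance of Admit")
print(corr_with_target.round(2))
```

A value near 1 indicates a strong positive linear relationship. On the real data, calling corr on the data frame in the same way would give the corresponding figures for all seven features.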

We will now build our model.

Model Building/Training

From the data visualization, we conclude that the relationship between the features and the chance of admission is approximately linear. So, we can try a linear regression model for fitting this dataset.

Our model for this project is a pre-defined estimator from the sklearn library, which is open-source and contains a large collection of well-tested machine learning models. We will use LinearRegression from this collection. The object is named classifier in the code below, although linear regression is strictly a regression model rather than a classifier.

classifier = LinearRegression()

We call the fit method on the classifier to train it. Note its two arguments: the training features and the training targets.

classifier.fit(X_train,Y_train)
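
What fit learns is one weight per feature plus a constant term. As a minimal sketch on made-up data (the names X_demo, y_demo, and demo_model are hypothetical), the example below generates a target from known weights and shows that LinearRegression recovers them through its coef_ and intercept_ attributes:

```python
import numpy as np
from sklearn.linear_model import LinearRegression

# Made-up data: two features and a noise-free linear target
# y = 2.0 * x0 + 0.5 * x1 + 1.0
rng = np.random.default_rng(1)
X_demo = rng.uniform(0, 10, size=(100, 2))
y_demo = 2.0 * X_demo[:, 0] + 0.5 * X_demo[:, 1] + 1.0

demo_model = LinearRegression()
demo_model.fit(X_demo, y_demo)

print(demo_model.coef_)       # one learned weight per feature
print(demo_model.intercept_)  # learned constant term
```

On the admissions data, inspecting classifier.coef_ in the same way shows how much each feature contributes to the predicted chance, keeping in mind that the features are on different scales.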

The classifier is now ready for testing.

Testing

To test the classifier, we use the test data generated in the earlier stage. We call the predict method on the created object and pass the X_test array of the test data, as shown in the following command:

prediction_of_Y = classifier.predict(X_test)

Because Y_train is a one-column data frame, this generates a two-dimensional array of shape (n_samples, 1) that holds one prediction for each row of X_test. Round the predictions and examine the first six entries of this array by using the following commands:

prediction_of_Y = np.round(prediction_of_Y, decimals = 3)
prediction_of_Y[:6]

The output is:

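
The shape of the array returned by predict follows the shape of the target used in fit. The sketch below, on a tiny invented dataset with hypothetical names, contrasts a one-column DataFrame target with a Series target and shows how ravel flattens the former:

```python
import numpy as np
import pandas as pd
from sklearn.linear_model import LinearRegression

X_demo = pd.DataFrame({"x": [1.0, 2.0, 3.0, 4.0]})
y_frame = pd.DataFrame({"y": [2.0, 4.0, 6.0, 8.0]})  # target as a one-column DataFrame
y_series = y_frame["y"]                               # same target as a Series

pred_2d = LinearRegression().fit(X_demo, y_frame).predict(X_demo)
pred_1d = LinearRegression().fit(X_demo, y_series).predict(X_demo)

print(pred_2d.shape)   # two-dimensional, one column
print(pred_1d.shape)   # one-dimensional

# ravel() turns the (n, 1) array into a flat array of length n.
flat = pred_2d.ravel()
print(flat.shape)
```

This is why indexing a single prediction in this tutorial returns a one-element array rather than a plain number.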

If you want to compare the predicted value to the actual value, add the predicted value to Y_test and print its contents on screen:

# ravel() flattens the (n, 1) prediction array into a plain column
Y_test["Predicted chance of Admit"] = prediction_of_Y.ravel()
print(Y_test)

The output is as follows:

As you can see, both the actual and predicted values almost match. We will now drop the added column for further visualizations.

Y_test = Y_test.drop(["Predicted chance of Admit"], axis = 1)

But just comparing values on our own is not enough to be sure about the accuracy. We need to verify the accuracy of the prediction.

Visualizing the Predictions

Before verifying the accuracy of the model, we will visualize and compare the actual chance of admission with the predicted chance of admission. This is important because most linear regression examples predict the result from only one parameter, so the plot is a single line fitted through the data points. In this tutorial, we use multiple parameters, and the graph is more complex. So, I show each parameter’s relationship to the prediction individually and explain the graphs to make it more evident.

An important thing to note before we plot any graphs is that we draw two plots in a single graph. The first plots a particular parameter against the actual value of Chance of Admit from the testing dataset. The data points of this plot are either red or blue. The second plots the same parameter against the predicted value of Chance of Admit. The data points of this plot are either purple or orange.

Let’s plot the first set of graphs for the parameter GRE Score. Use the following code to plot the graphs:

# Visualize the difference in graph for same parameter "GRE Score" for actual chance & prediction chance. 
plt.scatter(X_test["GRE Score"],Y_test, color = "red")
plt.scatter(X_test["GRE Score"], prediction_of_Y, color='purple')
plt.xlabel("GRE Score")
plt.ylabel("Chance of Admission")
plt.legend(["Actual chance for GRE Score","Predicted chance for GRE Score"])
plt.show()

Notice that the code contains two calls to the scatter function, one for each set of values.

The output is as follows:

Remember that we are plotting the graph from the testing dataset, which contains fewer values than the training dataset. Hence, the density of data points in the graph is lower than in the visualizations of the training dataset. The plot shows, for the same GRE Score values, how the predicted chance differs from the actual chance.

The model’s outliers are the red dots at the bottom of the graph, because they have no corresponding purple dots near them. How did I infer this from the graph? Allowing an error margin of 5%, a red dot counts as a correctly predicted data point only if a purple dot, representing its predicted value, lies very close to it. So, the isolated red dots are outliers for the model, and the isolated purple dots are values the model predicted poorly.
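
The 5% margin can also be applied in code instead of by eye. The following is a rough sketch with invented actual and predicted values (the arrays and the variable names are hypothetical); it interprets the margin as an absolute difference of 0.05 in the chance of admission, which is an assumption about what the margin means:

```python
import numpy as np

# Hypothetical actual and predicted chances; the values are invented.
actual = np.array([0.90, 0.72, 0.45, 0.80, 0.34])
predicted = np.array([0.88, 0.70, 0.58, 0.79, 0.50])

margin = 0.05
abs_error = np.abs(actual - predicted)
within_margin = abs_error <= margin

print(within_margin)          # True where the prediction is inside the margin
print(within_margin.mean())   # share of points inside the margin
```

With the real arrays, the rows where within_margin is False are the isolated dots described above.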

This is how you visualize the predictions when you build a linear regression model with multiple parameters. The above logic applies to most of the parameters in the model.

Let’s plot another set of graphs for the parameter SOP. Use the following code to plot the graphs:

plt.scatter(X_test["SOP"],Y_test, color = "blue")
plt.scatter(X_test["SOP"], prediction_of_Y, color='orange')
plt.xlabel("SOP")
plt.ylabel("Chance of Admission")
plt.legend(["Actual chance for SOP","Predicted chance for SOP"])
plt.show()

The output is as follows:

Let me explain how to interpret the graph and relate it to real-world scenarios.

Consider SOP with rating 1.5: The actual chance of admission (blue dots) is near 60%, and predicted chance (orange dots) is near 50%.

Consider SOP with rating 2.5: The actual chance of admission is lower than the predicted chance.

And this continues for higher SOP ratings as well. Hence, in these graphs the model shows a lower chance of admission than the actual one for low values of SOP and a higher chance than the actual one for higher values of SOP, which is plausible, as SOP is an important factor in getting admission.

Note that these observations are based on the graphs I produced with the parameter values given in this tutorial. If you change the values of the shuffle and random_state parameters, all the graphs will change as well. You may discover other patterns if you study your newly produced graphs, and I encourage you to experiment with the code.

Now, we will verify the accuracy of our prediction.

Verifying Accuracy

To evaluate the model, call the score method on the classifier object. For a regression model such as LinearRegression, score returns the coefficient of determination (R^2) on the given data, not a classification accuracy:

# R^2 of the model on the test set
print('R^2 score: {:.2f}'.format(classifier.score(X_test, Y_test)))

The output is: R^2 score: 0.80

An R^2 of 0.80 means the model explains about 80% of the variance in Chance of Admit on the test set, which is a reasonably good fit for a simple linear model. You can now try the model on real values to estimate the chance of admission to a desired university, keeping in mind that the result is only an estimate. Let us run inference on values specified by the user.
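
For reference, R^2 is defined as 1 - SS_res / SS_tot, where SS_res is the sum of squared differences between actual and predicted values and SS_tot is the sum of squared differences between actual values and their mean. The sketch below checks this on synthetic data (all values invented, names hypothetical) and also reports the mean absolute error, which is easier to read in the units of the target:

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_absolute_error, r2_score

# Synthetic regression data; the weights and noise level are invented.
rng = np.random.default_rng(2)
X_demo = rng.uniform(0, 1, size=(200, 3))
y_demo = X_demo @ np.array([0.5, 0.3, 0.1]) + rng.normal(0, 0.05, size=200)

# Fit on the first 160 rows and evaluate on the remaining 40.
demo_model = LinearRegression().fit(X_demo[:160], y_demo[:160])
y_true = y_demo[160:]
y_pred = demo_model.predict(X_demo[160:])

# R^2 computed by hand from its definition.
ss_res = np.sum((y_true - y_pred) ** 2)
ss_tot = np.sum((y_true - y_true.mean()) ** 2)
manual_r2 = 1 - ss_res / ss_tot

model_r2 = demo_model.score(X_demo[160:], y_true)
mae = mean_absolute_error(y_true, y_pred)
print(manual_r2, model_r2, r2_score(y_true, y_pred))
print(mae)
```

The three R^2 values agree, which confirms that score on a regressor is the same quantity as r2_score.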

Inference on Unseen Data

Let’s assume that I have a GRE score of 332, a TOEFL score of 107, an SOP and LOR of 4.5 and 4.0 respectively, and a CGPA of 9.34, but I have not done any research. Let’s see what my chances of getting an admit in a 5.0 rated university are. Use the following code to add these parameter values to a copy of the testing dataset:

# DataFrame.append was removed in pandas 2.0; pd.concat gives the same result
my_data = pd.concat([X_test, pd.DataFrame([[332, 107, 5, 4.5, 4.0, 9.34, 0]], columns = X_test.columns)], ignore_index = True)

Check the added row by printing its value:

print(my_data[-1:])

Remember that the testing dataset already contains rows, so our record is added as the last row. The following image shows the output of the above code:

Now use the following code to get the chance of admission for the given data:

my_chance = classifier.predict(my_data) 
my_chance[-1]

The output is as follows: array([0.8595167])

According to our model’s inference, I have an 85.95% chance of getting the admission.

Similarly, you can check admission chances for more than one record as well. Use the following code to add all the parameter values for a bunch of records in the testing dataset:

list_of_records = [pd.Series([309, 90, 4, 4, 3.5, 7.14, 0], index = X_test.columns),
                   pd.Series([300, 99, 3, 3.5, 3.5, 8.09, 0], index = X_test.columns),
                   pd.Series([304, 108, 4, 4, 3.5, 7.91, 0], index = X_test.columns),
                   pd.Series([295, 113, 5, 4.5, 4, 8.76, 1], index = X_test.columns)]
user_defined = pd.concat([X_test, pd.DataFrame(list_of_records)], ignore_index = True)  # DataFrame.append was removed in pandas 2.0
print(user_defined[-4:])

We use the Series data structure of pandas and append all the series to a copy of our testing dataset. The print call shows the four added records; pass user_defined to classifier.predict to get their predictions. The following image displays the output of the above code:

Note that the first added record gets the same index as the single record in the previous example. With the 80:20 split of 500 rows used here, X_test holds 100 rows (indices 0 to 99), so that index is 100. This is because adding rows to a data frame in this way creates a copy of the original data frame and makes the changes in that copy, leaving the original data frame intact.
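
The copy behaviour is easy to confirm on a small example. The sketch below uses a three-row frame with invented values (the names original, new_row, and combined are hypothetical) and pd.concat, which returns a new data frame:

```python
import pandas as pd

# Small invented frame standing in for X_test.
original = pd.DataFrame({"GRE Score": [320, 310, 300], "CGPA": [9.0, 8.5, 8.0]})
new_row = pd.DataFrame([[332, 9.34]], columns=original.columns)

# pd.concat builds a new frame; ignore_index renumbers the rows from 0.
combined = pd.concat([original, new_row], ignore_index=True)

print(len(original))        # the original frame is unchanged
print(combined.index[-1])   # the new row's index equals len(original)
```

Because the original frame never changes, every new record added this way starts from the same index.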

From the above results, I infer that CGPA and Research are more important factors than the GRE score for getting an admit. Try experimenting with the record values and check the impact on the chance of admission. Maybe you will arrive at a different conclusion of your own, or perhaps you will prove me wrong.

Finally, if you just want to do the inference on a single record without adding it to the test dataset, you would use the following code:

#Checking chances of single record without appending to previous record
single_record_values = {"GRE Score" : [327], "TOEFL Score" : [95], "University Rating" : [4.0], "SOP": [3.5], "LOR" : [4.0], "CGPA": [7.96], "Research": [1]}
single_rec_df = pd.DataFrame(single_record_values, columns = ["GRE Score",  "TOEFL Score",  "University Rating",  "SOP",  "LOR",   "CGPA",  "Research"])
print(single_rec_df)
single_chance = classifier.predict(single_rec_df)
single_chance

This is the output:

Add more values to the list of each parameter in the dictionary to get the chances for multiple records without appending them to X_test.
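
The multi-record case can be wrapped in a small helper. The sketch below is an illustration only: predict_chances is a hypothetical function, the feature list is shortened to two columns, and the training values are invented. It also clips the result to the range 0 to 1, since a linear model can return values outside that range for strong profiles; whether to clip is a design choice, not something the original tutorial does.

```python
import numpy as np
import pandas as pd
from sklearn.linear_model import LinearRegression

FEATURES = ["GRE Score", "CGPA"]  # shortened, hypothetical feature list

# Tiny invented training set, just enough to fit a model.
train = pd.DataFrame({"GRE Score": [300, 310, 320, 330], "CGPA": [7.0, 8.0, 9.0, 9.5]})
target = pd.Series([0.45, 0.60, 0.80, 0.92], name="Chance of Admit")
demo_model = LinearRegression().fit(train[FEATURES], target)

def predict_chances(model, records):
    """Predict for a dict of equal-length lists and clip the result to [0, 1]."""
    frame = pd.DataFrame(records, columns=FEATURES)
    return np.clip(model.predict(frame), 0.0, 1.0)

# Two records at once: a mid-range profile and a very strong one.
chances = predict_chances(demo_model, {"GRE Score": [305, 340], "CGPA": [7.5, 10.0]})
print(chances)
```

Passing the column list explicitly keeps the feature order identical to the order used in training, which predict relies on.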

Summary

In this tutorial, you learned how to develop a linear regression model to create an admission predictor. The first step was selecting an appropriate dataset with all the data needed to build the model. The second step was cleaning the data: checking for null values, dropping the unwanted column, and selecting the appropriate fields for model development. You then used the train_test_split function to split the data into training and testing sets, with 80% of the data for training and the rest for testing. You saw how to visualize the training data using graphs from the matplotlib library. To build the model, you used the LinearRegression model provided in the sklearn library. You then tested the model, saw how to visualize the results of a linear regression model with multiple parameters, and measured how well the model fits with its score on the test set, which was good. Finally, you saw how to enter user-defined records and predict the chance of admission. This is a simple model, and the same problem can be solved with many other algorithms, each of which has its pros and cons. Try using some other algorithm for solving this problem.

Source: Download the project source from our Repository.

Originally published at http://education.abcom.com on August 17, 2020.

Translated from: https://medium.com/swlh/admit-predictor-97f29d4f0373
