計算機中的浮點數 - 為什么十進制的 0.1 在計算機中是一個無限循環小數

flyfish

用 float 或 double 來存儲小數時不是精確值

浮點數在計算機中是以二進制形式存儲的，通常使用 IEEE 754 標準。浮點數由三個部分組成：符號位、指數位和尾數位。
先看一個例子

#include <iostream>
#include <iomanip>using namespace std;int main()
{cout << "Hello World!" << endl;double x = 1.0 / 10.0;double y = 1.0 - 0.9;double z = 1.0 + 0.1;// 設置輸出精度cout << fixed << setprecision(17);// 觀察 x、y、z 的結果cout << "x = " << x << endl;cout << "y = " << y << endl;cout << "z = " << z << endl;return 0;
}

Hello World!
x = 0.10000000000000001
y = 0.09999999999999998
z = 1.10000000000000009

浮點數比較

由于浮點數運算可能產生微小的誤差，在比較浮點數時，應避免直接使用 ==。可以定義一個非常小的數（稱為 epsilon）來進行比較。

#include <cmath>
#include <iostream>bool isEqual(double a, double b, double epsilon = 1e-10) {return std::fabs(a - b) < epsilon;
}int main() {double a = 0.1 * 3;double b = 0.3;if (isEqual(a, b)) {std::cout << "a and b are equal." << std::endl;} else {std::cout << "a and b are not equal." << std::endl;}return 0;
}

a and b are equal.

float 和 double 類型的 0.1 并不相等，因為它們在二進制中的表示不完全相同

#include <iostream>
#include <iomanip>int main() {float a = 0.1f;double b = 0.1;std::cout << std::setprecision(20);std::cout << "float a = 0.1f: " << a << std::endl;std::cout << "double b = 0.1: " << b << std::endl;if (a == b) {std::cout << "a and b are equal." << std::endl;} else {std::cout << "a and b are not equal." << std::endl;}return 0;
}

float a = 0.1f: 0.10000000149011611938
double b = 0.1: 0.10000000000000000555
a and b are not equal.

float 和 double 的精度差異

#include <iostream>
#include <iomanip>
#include <string>
#include <sstream>int main() {float floatNum = 1.0f / 7.0f;double doubleNum = 1.0 / 7.0;// 設置輸出精度std::cout << std::fixed << std::setprecision(64);// 輸出 float 和 double 的值std::cout << "float:  " << floatNum << std::endl;std::cout << "double: " << doubleNum << std::endl;return 0;
}

float:  0.1428571492433547973632812500000000000000000000000000000000000000
double: 0.1428571428571428492126926812488818541169166564941406250000000000

將循環小數轉換為分數

#include <iostream>
#include <string>
#include <sstream>
#include <cmath>
#include <iomanip>// 定義一個結構來表示分數
struct Fraction {long long numerator;long long denominator;
};// 最大公約數
long long gcd(long long a, long long b) {return b == 0 ? a : gcd(b, a % b);
}// 將小數部分轉換為分數
Fraction repeatingDecimalToFraction(const std::string& decimal) {size_t pos = decimal.find('(');std::string nonRepeatingPart = decimal.substr(0, pos);std::string repeatingPart = decimal.substr(pos + 1, decimal.size() - pos - 2);// 非循環部分和循環部分長度int n = nonRepeatingPart.size() - 2; // 減去 "0." 的長度int m = repeatingPart.size();// 非循環部分的小數double nonRepeatingDecimal = std::stod(nonRepeatingPart);// 構造非循環部分的分數long long nonRepeatingNumerator = static_cast<long long>(nonRepeatingDecimal * std::pow(10, n));long long nonRepeatingDenominator = std::pow(10, n);// 構造循環部分的分數long long repeatingNumerator = std::stoll(repeatingPart);long long repeatingDenominator = std::pow(10, m) - 1;// 將循環部分的分數移動到正確的位置repeatingNumerator += nonRepeatingNumerator * repeatingDenominator;repeatingDenominator *= nonRepeatingDenominator;// 簡化分數long long divisor = gcd(repeatingNumerator, repeatingDenominator);repeatingNumerator /= divisor;repeatingDenominator /= divisor;return {repeatingNumerator, repeatingDenominator};
}int main() {std::string decimal = "0.285714(285714)";Fraction fraction = repeatingDecimalToFraction(decimal);std::cout << "Fraction: " << fraction.numerator << "/" << fraction.denominator << std::endl;return 0;
}

Fraction: 2/7

查看浮點數的IEEE 754表示

IEEE 754表示：這是浮點數在計算機內存中的存儲格式，包含了符號、指數和尾數。用于浮點數計算和存儲。

#include <iostream>
#include <bitset>
#include <iomanip>void printFloatBinary(float number) {// 將 float 類型重新解釋為 uint32_t 類型uint32_t binary = *reinterpret_cast<uint32_t*>(&number);std::bitset<32> bits(binary);std::cout << "Float: " << number << std::endl;std::cout << "Binary: " << bits << std::endl;
}void printDoubleBinary(double number) {// 將 double 類型重新解釋為 uint64_t 類型uint64_t binary = *reinterpret_cast<uint64_t*>(&number);std::bitset<64> bits(binary);std::cout << "Double: " << number << std::endl;std::cout << "Binary: " << bits << std::endl;
}int main() {float floatNum = 0.1f;double doubleNum = 0.1;printFloatBinary(floatNum);printDoubleBinary(doubleNum);return 0;
}

Float: 0.1
Binary: 00111101110011001100110011001101
Double: 0.1
Binary: 0011111110111001100110011001100110011001100110011001100110011010

符號位：第 1 位
指數位：
對于 float（32 位）：第 2 到第 9 位（共 8 位）
對于 double（64 位）：第 2 到第 12 位（共 11 位）
尾數位：
對于 float（32 位）：第 10 到第 32 位（共 23 位）
對于 double（64 位）：第 13 到第 64 位（共 52 位）

手工將0.1轉換為二進制

轉換整數部分：0（已經是零）

0.1 × 2 = 0.2 (整數部分：0)
0.2 × 2 = 0.4 (整數部分：0)
0.4 × 2 = 0.8 (整數部分：0)
0.8 × 2 = 1.6 (整數部分：1)
0.6 × 2 = 1.2 (整數部分：1)
0.2 × 2 = 0.4 (整數部分：0)
0.4 × 2 = 0.8 (整數部分：0)
0.8 × 2 = 1.6 (整數部分：1)
0.6 × 2 = 1.2 (整數部分：1)
0.2 × 2 = 0.4 (整數部分：0)

合并整數部分

將上述每一步的整數部分合并起來：

$0.1_{10} = 0.0001100110011001100110011001100 \ldots_2$

最終得到的二進制表示是一個無限循環小數：
$0.1_{10} = 0.(0001100110011001100110011001100 \ldots)_2$
其中，上面的橫線表示循環節： $0001\overline{1001}$ 。

IEEE 754表示與32位二進制表示的關系

小數二進制表示

我們前面計算的0.1的小數二進制表示（0.0001100110011001100110011001100…）是直接將小數部分轉換為二進制的結果，這是一個無限循環的小數。

IEEE 754 二進制浮點數表示

而“00111101110011001100110011001101”是0.1在計算機中存儲時的IEEE 754標準的32位單精度浮點數表示。IEEE 754標準規定了浮點數的存儲格式，包括符號位、指數位和尾數（或稱為有效數字位）。

IEEE 754 單精度浮點數表示解釋

IEEE 754單精度浮點數使用32位來表示一個浮點數，其中：

1位用于符號位
8位用于指數位
23位用于尾數位
以0.1 為例

符號位：0 表示正數。
將0.1轉化為二進制：0.0001100110011001100110011001100110011001100110011001100…（無限循環）
規格化二進制：將其表示為 1.xxxxxx × 2^(-4) 的形式，所以 0.1 = 1.10011001100110011001101 × 2^(-4)
指數：由于偏移量為127，所以儲存的指數為 -4 + 127 = 123（即二進制的01111011）
尾數：取1后面的23位：10011001100110011001101
合并這些部分后得到IEEE 754表示：
$0∣01111011∣10011001100110011001101$

這就對應我們之前看到的32位二進制：
$00111101110011001100110011001101$

數據類型	大小	指數位	尾數位	偏移量
binary16	16 位	5 位	10 位	15
binary32	32 位	8 位	23 位	127
binary64	64 位	11 位	52 位	1023
binary128	128 位	15 位	112 位	16383

ratio來處理有理數

#include <iostream>
#include <ratio>int main() {// 定義分數類型using MyRatio = std::ratio<1, 3>;// 獲取分子和分母constexpr int numerator = MyRatio::num;constexpr int denominator = MyRatio::den;std::cout << "Fraction: " << numerator << "/" << denominator << std::endl;return 0;
}

Fraction: 1/3

自定義類實現用分數精確表達浮點數

#include <iostream>
#include <numeric> // for std::gcd
#include <iomanip>
class Fraction {
public:Fraction(long long numerator, long long denominator) : numerator(numerator), denominator(denominator) {reduce();}// 加法運算Fraction operator+(const Fraction& other) const {long long new_numerator = numerator * other.denominator + other.numerator * denominator;long long new_denominator = denominator * other.denominator;return Fraction(new_numerator, new_denominator);}// 減法運算Fraction operator-(const Fraction& other) const {long long new_numerator = numerator * other.denominator - other.numerator * denominator;long long new_denominator = denominator * other.denominator;return Fraction(new_numerator, new_denominator);}// 乘法運算Fraction operator*(const Fraction& other) const {return Fraction(numerator * other.numerator, denominator * other.denominator);}// 除法運算Fraction operator/(const Fraction& other) const {return Fraction(numerator * other.denominator, denominator * other.numerator);}// 輸出friend std::ostream& operator<<(std::ostream& os, const Fraction& fraction) {os << fraction.numerator << "/" << fraction.denominator;return os;}private:long long numerator;long long denominator;// 約分void reduce() {long long gcd_value = std::gcd(numerator, denominator);numerator /= gcd_value;denominator /= gcd_value;if (denominator < 0) {numerator = -numerator;denominator = -denominator;}}
};int main() {Fraction frac1(1, 10); // 0.1Fraction frac2(1, 3);  // 1/3std::cout << "Fraction 1: " << frac1 << std::endl;std::cout << "Fraction 2: " << frac2 << std::endl;Fraction sum = frac1 + frac2;Fraction diff = frac1 - frac2;Fraction prod = frac1 * frac2;Fraction quot = frac1 / frac2;std::cout << "Sum: " << sum << std::endl;std::cout << "Difference: " << diff << std::endl;std::cout << "Product: " << prod << std::endl;std::cout << "Quotient: " << quot << std::endl;return 0;
}

Fraction 1: 1/10
Fraction 2: 1/3
Sum: 13/30
Difference: -7/30
Product: 1/30
Quotient: 3/10

64位的存儲空間，雖然范圍很大，但如果分子和分母的值超出這個范圍，仍然會發生溢出。
對于非常大的數，gcd 函數的計算可能會變得非常慢，因為它需要計算兩個大數的最大公約數。
如果要處理極其巨大的數，即使它們沒有溢出，內存消耗也是一個問題。
在實踐中可以先測試下 Boost Multiprecision 這樣的庫。