The Inner Secrets of Different Data Types: Programming Internals (2)

Q: How does a char end up being treated as an int?

A: Let's look at when a char variable is actually handled as an int.

#include <stdio.h>

int main()
{
    char ch;
    ch = 'a';
    printf("%c\n", ch);
    return 0;
}

The assembly code is as follows:

hello`main:
    0x100000f60 <+0>:  pushq  %rbp
    0x100000f61 <+1>:  movq   %rsp, %rbp
    0x100000f64 <+4>:  subq   $0x10, %rsp
    0x100000f68 <+8>:  leaq   0x43(%rip), %rdi          ; "%c\n"
    0x100000f6f <+15>: movl   $0x0, -0x4(%rbp)
->  0x100000f76 <+22>: movb   $0x61, -0x5(%rbp)
    0x100000f7a <+26>: movsbl -0x5(%rbp), %esi
    0x100000f7e <+30>: movb   $0x0, %al
    0x100000f80 <+32>: callq  0x100000f92               ; symbol stub for: printf

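movb $0x61, -0x5(%rbp) stores the character 'a' in the variable ch. movb copies a single byte, so at this point ch is clearly not being handled as an int.

Further down, movsbl -0x5(%rbp), %esi sign-extends that byte into the 32-bit register esi. In other words, ch is promoted to int when it is passed to printf, whose variable arguments undergo the default argument promotions.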

Here is another example:

#include <stdio.h>

int main()
{
    char ch;
    int i = 0xF;
    ch = 'a';
    i = i + ch;
    printf("%d\n", i);
    return 0;
}

hello`main:
    0x100000f40 <+0>:  pushq  %rbp
    0x100000f41 <+1>:  movq   %rsp, %rbp
    0x100000f44 <+4>:  subq   $0x10, %rsp
    0x100000f48 <+8>:  leaq   0x57(%rip), %rdi          ; "%d\n"
    0x100000f4f <+15>: movl   $0x0, -0x4(%rbp)
    0x100000f56 <+22>: movl   $0xf, -0xc(%rbp)
    0x100000f5d <+29>: movb   $0x61, -0x5(%rbp)
->  0x100000f61 <+33>: movl   -0xc(%rbp), %eax
    0x100000f64 <+36>: movsbl -0x5(%rbp), %ecx
    0x100000f68 <+40>: addl   %ecx, %eax
    0x100000f6a <+42>: movl   %eax, -0xc(%rbp)
    0x100000f6d <+45>: movl   -0xc(%rbp), %esi
    0x100000f70 <+48>: movb   $0x0, %al
    0x100000f72 <+50>: callq  0x100000f84               ; symbol stub for: printf

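When i = i + ch executes:

movsbl -0x5(%rbp), %ecx

addl   %ecx, %eax

ch is sign-extended into the 32-bit register ecx before the 32-bit add, which shows that the char has been promoted to int.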

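As the two examples show, the promotion to int shows up when a char takes part in arithmetic or is passed as a variadic argument (as with printf). A plain store such as ch = 'a' involves no arithmetic, so there is nothing to promote and the compiler emits a single-byte move.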


Q: What is the real difference between unsigned int and int?

A:

#include <stdio.h>

int main()
{
    int i;
    unsigned int j;
    i = 1;
    j = 2;
    return 0;
}

hello`main:
    0x100000f90 <+0>:  pushq  %rbp
    0x100000f91 <+1>:  movq   %rsp, %rbp
    0x100000f94 <+4>:  xorl   %eax, %eax
    0x100000f96 <+6>:  movl   $0x0, -0x4(%rbp)
->  0x100000f9d <+13>: movl   $0x1, -0x8(%rbp)
    0x100000fa4 <+20>: movl   $0x2, -0xc(%rbp)
    0x100000fab <+27>: popq   %rbp
    0x100000fac <+28>: retq

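The two movl instructions are essentially identical. As far as the CPU is concerned, int i and unsigned int j are both just 4 bytes of storage at different addresses; the store itself carries no notion of signedness.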

Next, let's compare an int with an unsigned int:

#include <stdio.h>

int main()
{
    int i;
    unsigned int j;
    i = 1;
    j = -1;
    printf("%d\n", i > j);
    return 0;
}

    0x100000f56 <+22>: movl   $0x1, -0x8(%rbp)
    0x100000f5d <+29>: movl   $0xffffffff, -0xc(%rbp)   ; imm = 0xFFFFFFFF
    0x100000f64 <+36>: movl   -0x8(%rbp), %eax
    0x100000f67 <+39>: cmpl   -0xc(%rbp), %eax
->  0x100000f6a <+42>: seta   %cl
    0x100000f6d <+45>: andb   $0x1, %cl
    0x100000f70 <+48>: movzbl %cl, %esi
    0x100000f73 <+51>: movb   $0x0, %al
    0x100000f75 <+53>: callq  0x100000f88               ; symbol stub for: printf

With the breakpoint stopped at the position marked above:

(lldb) register read rflags
rflags = 0x0000000000000213

In the RFLAGS register, bit 0 gives CF = 1 and bit 6 gives ZF = 0.

seta %cl therefore stores 0 in the cl register: cl is set to 1 only when CF = 0 and ZF = 0.

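Note: seta tests the result of an unsigned comparison (the signed counterpart is setg). This confirms that when an int and an unsigned int are used together in an operation, the int is converted to unsigned int.

So the final output of printf is 0.

Some readers may object to this rule, but it follows from C's usual arithmetic conversions: operands are converted toward the type with the larger range, and when a signed and an unsigned type have the same rank, the signed operand is converted to unsigned. Keep in mind that a negative value does not survive this conversion unchanged: -1 becomes 0xFFFFFFFF.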

Q: What is 1 shifted left by 32 bits?

A:

#include <stdio.h>

int main()
{
    int i = 1;
    int j;
    j = i << 32;
    printf("%d\n", j);
    return 0;
}

    0x100000f5f <+15>: movl   $0x20, %ecx
    0x100000f64 <+20>: movl   $0x0, -0x4(%rbp)
    0x100000f6b <+27>: movl   $0x1, -0x8(%rbp)
    0x100000f72 <+34>: movl   -0x8(%rbp), %eax
    0x100000f75 <+37>: shll   %cl, %eax
    0x100000f77 <+39>: movl   %eax, -0xc(%rbp)
->  0x100000f7a <+42>: movl   -0xc(%rbp), %esi
    0x100000f7d <+45>: movb   $0x0, %al
    0x100000f7f <+47>: callq  0x100000f92               ; symbol stub for: printf

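Here shll shifts left by %cl, which holds 0x20, i.e. 32 bits. Some books say that when the shift count exceeds the bit width of the operand, the count is reduced modulo the bit width, and they attribute this to the compiler. That is not the case: it is the behavior of the instruction set. As far as the C language is concerned, shifting by a count greater than or equal to the operand's width is undefined behavior, so the result seen here is specific to this unoptimized x86-64 build.

The following is the original text from the Intel instruction set manual: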

Shifts the bits in the first operand (destination operand) to the left or right 
by the number of bits specified in the second operand (count operand). Bits shifted 
beyond the destination operand boundary are first shifted into the CF flag, then 
discarded. At the end of the shift operation, the CF flag contains the last bit
shifted out of the destination operand.
The destination operand can be a register or a memory location. The count operand 
can be an immediate value or the CL register. The count is masked to 5 bits 
(or 6 bits if in 64-bit mode and REX.W is used). The count range is limited to 0 to 31
(or 63 if 64-bit mode and REX.W is used). A special opcode encoding is provided for a count of 1.
