Table of Contents
- 1. Preface
- 2. API Analysis
- 3. Environment-Patching Analysis
- 4. Pure-Algorithm Reconstruction
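[🏠 Author homepage]: 吳秋霖
[💼 About the author]: Specializes in web scraping and reverse engineering of JS encryption. Recognized Python-community author, CSDN blog expert, Alibaba Cloud blog expert, and Huawei Cloud expert, with a long-standing commitment to research and development in Python and web scraping.
[🌟 Author's recommendations]: Readers interested in web scraping and JS reverse engineering can follow the columns 《爬蟲JS逆向實戰》 and 《深耕爬蟲領域》.
The author will keep publishing material on the techniques used, learned, and encountered along the way, including but not limited to CAPTCHA bypass, app scraping and JS reverse engineering, RPA automation, distributed crawlers, and general Python topics.
Author's statement: This article is for learning, discussion, and reference only. Any commercial or illegal use is strictly prohibited, and the author bears no responsibility for any consequences of such use. In case of infringement, please contact the author to have the article removed.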
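1. Preface
Search requests on this site must carry a Token. The Token is produced by a request to the site's own sec endpoint (an invisible verification step), and the parameters of that request include a signature that has to be reproduced. The source is obfuscated, and there is some chance of triggering a secondary verification (GeeTest). Overall it is fairly simple. A reader who asked me about it earlier hit exceptions while patching the environment, and the cause was anti-debugging: you only need to deal with the formatting check and the infinite loop that exhausts memory.
Site analyzed: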
aHR0cHM6Ly9tLmFwcC5taS5jb20v
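2. API Analysis
Search for any keyword and you can see a Token among the submitted request parameters. Its value is returned in the response of the preceding request. It is valid only once, so it cannot be hard-coded. As shown below (the msg in the sample response means "abnormal request"):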
{"msg":"非正常請求","code":403001,"data":null,"logId":"MO-29s4w-elibom-3c-noitcudorp-noitargetni-bew-erotsppa_0825121058059_33aa"}
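The request that generates the Token has two dynamic parameters (s and d) that need to be handled. Follow the call stack into m.js, an obfuscated JS file, find where the request is sent, and step through from there to reach the place where the parameters are finally generated, as shown below:
3. Environment-Patching Analysis
The obfuscated JS implements several common anti-debugging measures, including but not limited to environment detection, a Function.prototype.toString check, and detection of automation tools. The obfuscation itself combines control-flow flattening with string encryption (all strings live in the _0x3fb6 array) and restores them dynamically at runtime, as shown below: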
// Function.prototype.toString check
var _0x4ef304 = function() {
    var _0x5ca3e4 = new RegExp('\x5c\x77\x2b\x20\x2a\x5c\x28\x5c\x29\x20\x2a\x7b\x5c\x77\x2b\x20\x2a\x5b\x27\x7c\x22\x5d\x2e\x2b\x5b\x27\x7c\x22\x5d\x3b\x3f\x20\x2a\x7d');
    return !_0x5ca3e4['\x74\x65\x73\x74'](_0x20e69d['\x74\x6f\x53\x74\x72\x69\x6e\x67']());
};

// Browser fingerprint check
function _0x836b91() {
    /Android ((\d).\d+)/['test'](navigator['userAgent']);
    return parseInt(RegExp['$2']) < 6;
}

// WebGL check
var _0x5c8ed2 = document.createElement('canvas');
var _0x510957 = _0x5c8ed2.getContext('webgl') || _0x5c8ed2.getContext('experimental-webgl');

// Infinite recursion (causes a crash)
function _0x3fa0e2(_0x16d6fa) {
    if (_0x16d6fa['indexOf']('\x69' === -1)) {
        _0x3f7dc2(_0x16d6fa);
    }
}

// Memory consumption
var _0x2e4c9a = [];
for (var i = 0; i < 1000000; i++) {
    _0x2e4c9a.push(Math.random());
}
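To make the formatting check less opaque, the hex-escaped pattern can be decoded and tried against a compact and a beautified function body. This is a minimal sketch: the two sample function sources and the helper name `looks_unformatted` are made up for illustration, and which function the site actually inspects (`_0x20e69d`) is not visible in the excerpt.

```python
import re

# The pattern exactly as written in the JS string literal. Python decodes the
# \xNN escapes the same way JavaScript does, so this is the real regex source.
PATTERN = "\x5c\x77\x2b\x20\x2a\x5c\x28\x5c\x29\x20\x2a\x7b\x5c\x77\x2b\x20\x2a\x5b\x27\x7c\x22\x5d\x2e\x2b\x5b\x27\x7c\x22\x5d\x3b\x3f\x20\x2a\x7d"


def looks_unformatted(function_source: str) -> bool:
    """Return True when the source still has the compact, single-line shape."""
    return re.search(PATTERN, function_source) is not None


# Hypothetical function bodies, only to exercise the pattern.
compact = "function () {return 'dev';}"
beautified = "function () {\n    return 'dev';\n}"

print(PATTERN)                        # the decoded regular expression
print(looks_unformatted(compact))     # True
print(looks_unformatted(beautified))  # False
```

The JS check returns the negation of this test, so it reports a problem once the code has been reformatted. That is consistent with the advice in this article to neutralize the check before running a beautified copy of m.js.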
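As the figure above shows, _0x27edce is the entry-point encryption function. It takes two arguments: the structured env data and a fixed string, search, which can be omitted, as shown in the figure below:
If you choose environment patching and do not want to analyze the obfuscated encryption logic, just take the whole m.js source. The author used jsdom for a quick implementation here (you can also patch the environment by hand or use another framework). The environment header looks like this: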
const { JSDOM } = require("jsdom");

const baseUrl = "https://m.app.mi.com/";
const dom = new JSDOM("", {
    url: baseUrl,
    referrer: baseUrl,
    userAgent: "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/138.0.0.0 Safari/537.36",
    runScripts: "dangerously"
});

// Expose the jsdom objects as globals so m.js can find them
window = dom.window;
document = window.document;

// Stub out the 2D canvas context with no-op methods
window.HTMLCanvasElement.prototype.getContext = function() {
    return {
        fillRect: function() {},
        clearRect: function() {},
        getImageData: function(x, y, w, h) {
            return {data: new Uint8ClampedArray(w * h * 4)};
        },
        putImageData: function() {},
        createImageData: function() {return [];},
        setTransform: function() {},
        drawImage: function() {},
        save: function() {},
        fillText: function() {},
        restore: function() {},
        beginPath: function() {},
        moveTo: function() {},
        lineTo: function() {},
        closePath: function() {},
        stroke: function() {},
        translate: function() {},
        scale: function() {},
        rotate: function() {},
        arc: function() {},
        fill: function() {},
        measureText: function() {return { width: 0 };},
        transform: function() {},
        rect: function() {},
        clip: function() {},
    };
};

window.HTMLCanvasElement.prototype.toDataURL = function() {
    return "";
};

navigator = {
    appVersion: "5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/138.0.0.0 Safari/537.36",
    platform: 'MacIntel',
    appCodeName: 'Mozilla',
    appName: 'Netscape',
    language: 'en-US',
    product: 'Gecko',
    vendorSub: '',
    vendor: 'Google Inc.',
    userAgent: 'Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/138.0.0.0 Safari/537.36'
}
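Then export _0x27edce to the global scope through window and call it from there. Two small details need handling: the infinite recursion among the checks above, which ends in a memory overflow, and the formatting check. Comment them out or modify them, as shown below:
4. Pure-Algorithm Reconstruction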
function _0x27edce(_0xd7d75d, _0x264211) {var _0x4874ab = function(_0xd7d75d) {for (var _0x264211 = ['A', 'B', 'C', 'D', 'E', 'F', 'G', 'H', 'I', 'J', 'K', 'L', 'M', 'N', 'O', 'P', 'Q', 'R', 'S', 'T', 'U', 'V', 'W', 'X', 'Y', 'Z', 'a', 'b', 'c', 'd', 'e', 'f', 'g', 'h', 'i', 'j', 'k', 'l', 'm', 'n', 'o', 'p', 'q', 'r', 's', 't', 'u', 'v', 'w', 'x', 'y', 'z', '1', '2', '3', '4', '5', '6', '7', '8', '9', '0', '!', '@', '#', '$', '%', '^', '&', '*', '(', ')', '-', '=', '_', '+', '~', '`', '{', '}', '[', ']', '|', ':', '<', '>', '?', '/', '.'], _0x4874ab = [], _0xb6b2f = 0x0; _0xb6b2f < _0xd7d75d; _0xb6b2f += 0x1)_0x4874ab[_0x5ebc('0x17')](_0x264211[parseInt(0x59 * Math['random'](), 0xa)]);return _0x4874ab[_0x5ebc('0x19')]('');}(0x10), _0xb6b2f = _0x25aa39[_0x5ebc('0x261')][_0x5ebc('0x262')][_0x5ebc('0x263')](_0x5ebc('0x264')), _0xd88633 = _0x25aa39[_0x5ebc('0x265')]['pkcs7'][_0x5ebc('0x266')](_0x25aa39[_0x5ebc('0x261')][_0x5ebc('0x262')][_0x5ebc('0x263')](JSON['stringify'](_0xd7d75d))), _0xd88633 = new _0x25aa39[(_0x5ebc('0x267'))][(_0x5ebc('0x12f'))](_0x25aa39['utils']['utf8'][_0x5ebc('0x263')](_0x4874ab),_0xb6b2f)[_0x5ebc('0x12b')](_0xd88633), _0xd88633 = _0x5aeeb2['encode'](_0x3969ee[_0x5ebc('0x268')](_0x25aa39['utils'][_0x5ebc('0x11d')][_0x5ebc('0x269')](_0xd88633))), _0x4874ab = _0x3250e2[_0x5ebc('0x12b')](_0x5aeeb2[_0x5ebc('0x117')](_0x4874ab), _0x3250e2[_0x5ebc('0x26a')](_0x5ebc('0x26b'))), _0xd7d75d = _0x5aeeb2[_0x5ebc('0x117')](JSON[_0x5ebc('0x26c')](_0xd7d75d)), _0x264211 = (_0x264211 = _0x264211 + _0xd7d75d,_0x379e77[_0x5ebc('0x143')](_0x264211));return Object(_0x1c50fb['i'])() ? {'s': _0x264211,'d': _0xd7d75d} : {'s': _0x4874ab,'d': _0xd88633};
}
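Start from the core obfuscated code above to reconstruct the encryption flow. _0xd7d75d is the original env object (one large blob) and _0x264211 is an optional parameter. _0x4874ab draws a 16-character random string from a character table, and this string is the AES key. Inside that inner function, _0x264211 is reused as the key's character set. The implementation is as follows: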
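import random

def generate_aes_key():
    # Character table copied from the obfuscated JS
    charset = list("ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz1234567890!@#$%^&*()-=_+~`{}[]|:<>?/.")
    # 16 random characters, returned as bytes for use as the AES key
    return "".join(random.choice(charset) for _ in range(16)).encode("utf-8")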
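As a cross-check against the JS, the sketch below mirrors the original index expression `parseInt(0x59 * Math['random'](), 0xa)` and confirms that the constant 0x59 matches the size of the character table. The names `CHARSET` and `generate_aes_key_like_js` are introduced here for illustration only.

```python
import random

# Same character table as in the obfuscated JS, as one string.
CHARSET = "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz1234567890!@#$%^&*()-=_+~`{}[]|:<>?/."


def generate_aes_key_like_js(length: int = 16) -> bytes:
    """Pick `length` characters using the same 0x59 multiplier as the JS."""
    # 0x59 * random() lies in [0, 89), so the truncated index is 0..88.
    chars = [CHARSET[int(0x59 * random.random())] for _ in range(length)]
    return "".join(chars).encode("utf-8")


key = generate_aes_key_like_js()
print(len(CHARSET) == 0x59, len(key))
```

Every character in the table is ASCII, so 16 characters give exactly 16 bytes, which is the key size AES-128 expects.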
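We now have the AES key and the structured env data. Next, look at how _0xd88633, which becomes parameter d, is produced. The obfuscated JS has an obvious marker here, pkcs7, which leads to the following code: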
var _0x25aa39 = {'AES': _0x4cca13,'ModeOfOperation': {'cbc': _0x35f959},'utils': {'hex': _0x264211,'utf8': _0x51dd15},'padding': {'pkcs7': {'pad': function(_0xd7d75d) {var _0x264211 = 0x10 - (_0xd7d75d = _0x300d04(_0xd7d75d, !0x0))['length'] % 0x10, _0x4874ab = _0x2d1977(_0xd7d75d[_0x5ebc('0x51')] + _0x264211);_0x294129(_0xd7d75d, _0x4874ab);for (var _0xb6b2f = _0xd7d75d[_0x5ebc('0x51')]; _0xb6b2f < _0x4874ab[_0x5ebc('0x51')]; _0xb6b2f++)_0x4874ab[_0xb6b2f] = _0x264211;return _0x4874ab;}}}
}
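The `pad` function above is standard PKCS#7 padding with a 16-byte block. A minimal Python sketch of the same arithmetic follows; the function name `pkcs7_pad` is chosen here and is not part of the original code.

```python
def pkcs7_pad(data: bytes, block_size: int = 16) -> bytes:
    """Append N bytes of value N so the length becomes a multiple of block_size."""
    # Same arithmetic as the JS: 0x10 - length % 0x10, so an already aligned
    # input still receives one full block of padding.
    pad_len = block_size - len(data) % block_size
    return data + bytes([pad_len]) * pad_len


print(pkcs7_pad(b"abc").hex())
print(len(pkcs7_pad(b"a" * 16)))
```

This is the behavior that `Crypto.Util.Padding.pad(data, AES.block_size)` provides by default in PyCryptodome, which is why the Python port can rely on the library call rather than a hand-written version.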
_0x5ebc('0x264') is the IV for the AES encryption and _0x5ebc('0x12f') is the AES mode in use. _0x5aeeb2 is then called, and it is defined as follows:
var _0x5aeeb2 = {'base64': _0x5ebc('0x119'),'encode': function(_0xd7d75d) {if (!_0xd7d75d)return !0x1;for (var _0x264211, _0x4874ab, _0xb6b2f, _0xd88633, _0x49094b, _0x3aca4c, _0x253b33 = '', _0x2f77ed = 0x0; _0xb6b2f = (_0x3aca4c = _0xd7d75d[_0x5ebc('0xec')](_0x2f77ed++)) >> 0x2,_0xd88633 = (0x3 & _0x3aca4c) << 0x4 | (_0x264211 = _0xd7d75d[_0x5ebc('0xec')](_0x2f77ed++)) >> 0x4,_0x49094b = (0xf & _0x264211) << 0x2 | (_0x4874ab = _0xd7d75d[_0x5ebc('0xec')](_0x2f77ed++)) >> 0x6,_0x3aca4c = 0x3f & _0x4874ab,isNaN(_0x264211) ? _0x49094b = _0x3aca4c = 0x40 : isNaN(_0x4874ab) && (_0x3aca4c = 0x40),_0x253b33 += this[_0x5ebc('0x11a')][_0x5ebc('0x8b')](_0xb6b2f) + this[_0x5ebc('0x11a')][_0x5ebc('0x8b')](_0xd88633) + this[_0x5ebc('0x11a')][_0x5ebc('0x8b')](_0x49094b) + this[_0x5ebc('0x11a')][_0x5ebc('0x8b')](_0x3aca4c),_0x2f77ed < _0xd7d75d[_0x5ebc('0x51')]; );return _0x253b33;},'decode': function(_0xd7d75d) {if (!_0xd7d75d)return !0x1;_0xd7d75d = _0xd7d75d[_0x5ebc('0x43')](/[^A-Za-z0-9\+\/\=]/g, '');for (var _0x264211, _0x4874ab, _0xb6b2f, _0xd88633, _0x49094b = '', _0x3aca4c = 0x0; _0x264211 = this[_0x5ebc('0x11a')][_0x5ebc('0x1b')](_0xd7d75d[_0x5ebc('0x8b')](_0x3aca4c++)),_0x4874ab = this['base64'][_0x5ebc('0x1b')](_0xd7d75d[_0x5ebc('0x8b')](_0x3aca4c++)),_0xb6b2f = this[_0x5ebc('0x11a')][_0x5ebc('0x1b')](_0xd7d75d[_0x5ebc('0x8b')](_0x3aca4c++)),_0xd88633 = this['base64'][_0x5ebc('0x1b')](_0xd7d75d[_0x5ebc('0x8b')](_0x3aca4c++)),_0x49094b += String[_0x5ebc('0x11b')](_0x264211 << 0x2 | _0x4874ab >> 0x4),0x40 != _0xb6b2f && (_0x49094b += String[_0x5ebc('0x11b')]((0xf & _0x4874ab) << 0x4 | _0xb6b2f >> 0x2)),0x40 != _0xd88633 && (_0x49094b += String['fromCharCode']((0x3 & _0xb6b2f) << 0x6 | _0xd88633)),_0x3aca4c < _0xd7d75d[_0x5ebc('0x51')]; );return _0x49094b;}
}
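The `encode` loop above has the shape of an ordinary Base64 encoder. The sketch below ports it to Python and compares it with the standard library, under one loudly stated assumption: the lookup table behind `_0x5ebc('0x119')` is taken to be the standard alphabet with `=` at index 64, which the excerpt does not show. The names `ALPHABET` and `b64_encode_like_js` are hypothetical.

```python
import base64

# ASSUMPTION: the table returned by _0x5ebc('0x119') is the standard Base64
# alphabet followed by "=" at index 64. It is not visible in the excerpt.
ALPHABET = "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/="


def b64_encode_like_js(data: bytes) -> str:
    """Port of the obfuscated encode loop: 3 input bytes -> 4 output symbols."""
    out = []
    for i in range(0, len(data), 3):
        chunk = data[i:i + 3]
        b0 = chunk[0]
        b1 = chunk[1] if len(chunk) > 1 else None  # JS sees NaN here
        b2 = chunk[2] if len(chunk) > 2 else None
        e0 = b0 >> 2
        e1 = (b0 & 0x3) << 4 | (b1 or 0) >> 4
        e2 = ((b1 or 0) & 0xF) << 2 | (b2 or 0) >> 6
        e3 = (b2 or 0) & 0x3F
        if b1 is None:      # isNaN(second byte): two padding symbols
            e2 = e3 = 64
        elif b2 is None:    # isNaN(third byte): one padding symbol
            e3 = 64
        out.append(ALPHABET[e0] + ALPHABET[e1] + ALPHABET[e2] + ALPHABET[e3])
    return "".join(out)


sample = b"\x00\xffhello"
print(b64_encode_like_js(sample) == base64.b64encode(sample).decode())
```

If the assumption about the alphabet holds, `base64.b64encode` is a drop-in replacement, which matches what the Python reconstruction in this article uses.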
From the analysis above, parameter d is built by first JSON-serializing the env_data object, as shown below:
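One practical point when porting this step: `JSON.stringify` emits compact JSON and leaves non-ASCII characters unescaped, while Python's `json.dumps` defaults do neither. The sketch below shows the difference; the fields in `env_data` are hypothetical placeholders, not the site's real env structure.

```python
import json

# Hypothetical env fields, for illustration only.
env_data = {"lang": "zh-CN", "title": "應用商店", "screen": [1920, 1080]}

# Python defaults: ", " and ": " separators, non-ASCII escaped as \uXXXX.
default_form = json.dumps(env_data)

# Closer to what JSON.stringify produces in the browser.
js_like_form = json.dumps(env_data, separators=(",", ":"), ensure_ascii=False)

print(default_form)
print(js_like_form)
```

Because the server decrypts and parses the payload, either form may decode to the same object. Matching the browser output byte for byte is simply the safer choice when reproducing a signed request.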
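An AES key is then generated, and the serialized data is encrypted and encoded using the CBC mode, IV, and other details obtained while debugging. The reconstructed algorithm is as follows: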
import json
import base64
from Crypto.Util.Padding import pad
from Crypto.Cipher import AES, PKCS1_v1_5

def aes_cbc_encrypt_fixed_iv(key: bytes, data: bytes) -> bytes:
    # Fixed IV recovered while debugging the obfuscated JS
    iv = b"0102030405060708"
    cipher = AES.new(key, AES.MODE_CBC, iv)
    return cipher.encrypt(pad(data, AES.block_size))

def sign(env_data: dict) -> str:
    # Compact JSON, matching JSON.stringify on the JS side
    json_data = json.dumps(env_data, separators=(',', ':'), ensure_ascii=False).encode('utf-8')
    # Random 16-character key
    aes_key = generate_aes_key()
    encrypted_data = aes_cbc_encrypt_fixed_iv(aes_key, json_data)
    # Base64 form of the key; this is the value that is RSA-encrypted for parameter s
    aes_key_b64 = base64.b64encode(aes_key).decode()
    d = base64.b64encode(encrypted_data).decode()
    return d
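Next, look at how parameter s is generated. _0x4874ab here calls getPublicKey on the public key stored at _0x5ebc('0x26b') and then performs an RSA encryption with it. The corresponding markers are also visible in the initial large string array, as shown below: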
_0x3250e2 = {'getPublicKey': function(_0xd7d75d) {return !(_0xd7d75d[_0x5ebc('0x51')] < 0x32) && (_0x5ebc('0x11e') == _0xd7d75d['substr'](0x0, 0x1a) && ('-----END\x20PUBLIC\x20KEY-----' == (_0xd7d75d = _0xd7d75d[_0x5ebc('0x115')](0x1a))[_0x5ebc('0x115')](_0xd7d75d[_0x5ebc('0x51')] - 0x18) && (_0xd7d75d = _0xd7d75d['substr'](0x0, _0xd7d75d['length'] - 0x18),!(_0xd7d75d = new _0x56ab29(_0x5aeeb2['decode'](_0xd7d75d)))[_0x5ebc('0x42')] && (_0x5ebc('0x11f') === (_0xd7d75d = _0xd7d75d[_0x5ebc('0x11')])[0x0][0x0][0x0] && new _0x8f65e0(_0xd7d75d[0x0][0x1][0x0][0x0],_0xd7d75d[0x0][0x1][0x0][0x1])))));},'encrypt': function(_0xd7d75d, _0x264211) {if (!_0x264211)return !0x1;var _0x4874ab = _0x264211[_0x5ebc('0x116')][_0x5ebc('0x10f')]() + 0x7 >> 0x3;if (!(_0xd7d75d = this[_0x5ebc('0x120')](_0xd7d75d, _0x4874ab)))return !0x1;if (!(_0xd7d75d = _0xd7d75d[_0x5ebc('0x121')](_0x264211[_0x5ebc('0x118')], _0x264211[_0x5ebc('0x116')])))return !0x1;for (_0xd7d75d = _0xd7d75d[_0x5ebc('0x54')](0x10); _0xd7d75d[_0x5ebc('0x51')] < 0x2 * _0x4874ab; )_0xd7d75d = '0'[_0x5ebc('0x18')](_0xd7d75d);return _0x5aeeb2[_0x5ebc('0x117')](_0x3969ee['decode'](_0xd7d75d));},'pkcs1pad2': function(_0xd7d75d, _0x264211) {if (_0x264211 < _0xd7d75d[_0x5ebc('0x51')] + 0xb)return null;for (var _0x4874ab = [], _0xb6b2f = _0xd7d75d[_0x5ebc('0x51')] - 0x1; 0x0 <= _0xb6b2f && 0x0 < _0x264211; )_0x4874ab[--_0x264211] = _0xd7d75d[_0x5ebc('0xec')](_0xb6b2f--);for (_0x4874ab[--_0x264211] = 0x0; 0x2 < _0x264211; )_0x4874ab[--_0x264211] = Math['floor'](0xfe * Math[_0x5ebc('0x3d')]()) + 0x1;return _0x4874ab[--_0x264211] = 0x2,_0x4874ab[--_0x264211] = 0x0,new _0x20635b(_0x4874ab);}
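The `pkcs1pad2` and `encrypt` functions above follow the usual RSA PKCS#1 v1.5 encryption layout: `00 02`, non-zero random filler, `00`, then the message. The stdlib-only sketch below mirrors that structure. The function names are chosen here, the final hex-to-Base64 step assumes `_0x3969ee['decode']` converts a hex string to raw bytes, and in practice PyCryptodome's `PKCS1_v1_5` cipher does the same job.

```python
import base64
import secrets


def pkcs1_pad2(message: bytes, k: int) -> bytes:
    """Build the k-byte block 00 02 <non-zero random bytes> 00 <message>."""
    if k < len(message) + 11:
        raise ValueError("message too long for this modulus")
    filler_len = k - len(message) - 3
    # The JS uses floor(0xfe * random()) + 1, i.e. filler bytes in 1..254.
    filler = bytes(secrets.randbelow(254) + 1 for _ in range(filler_len))
    return b"\x00\x02" + filler + b"\x00" + message


def rsa_encrypt_b64(message: bytes, n: int, e: int) -> str:
    """Pad, apply m^e mod n, left-pad to the modulus length, Base64-encode."""
    k = (n.bit_length() + 7) >> 3          # modulus length in bytes
    m = int.from_bytes(pkcs1_pad2(message, k), "big")
    c = pow(m, e, n)
    # ASSUMPTION: the JS zero-fills the hex string to 2*k digits and converts
    # it to raw bytes before Base64; to_bytes(k) has the same effect.
    return base64.b64encode(c.to_bytes(k, "big")).decode()
```

Because the filler is random, encrypting the same key twice gives different values of s, so outputs cannot be compared byte for byte with values captured from the browser.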
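Putting this together, parameter s is the AES key encrypted with RSA and then encoded. Server-side verification therefore Base64-decodes the value of s, decrypts it with the RSA private key to recover the AES key, and uses that 16-byte key to decrypt the business data carried in parameter d, which is how the request is validated. The flow diagram and implementation of the pure-algorithm encryption are as follows: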
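from Crypto.PublicKey import RSA

def rsa_encrypt_pkcs1_v1_5(data: bytes, public_key_pem: str) -> bytes:
    rsa_key = RSA.import_key(public_key_pem)
    cipher = PKCS1_v1_5.new(rsa_key)
    return cipher.encrypt(data)

# public_key_pem must hold the PEM public key recovered from _0x5ebc('0x26b') in m.js (not reproduced here).
# The key must be the same one that was used to encrypt parameter d.
aes_key = generate_aes_key()
aes_key_b64 = base64.b64encode(aes_key).decode()
# RSA-encrypt the Base64 form of the key, then Base64-encode the ciphertext
s = base64.b64encode(rsa_encrypt_pkcs1_v1_5(aes_key_b64.encode(), public_key_pem)).decode()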
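Whether you patch the environment or go the pure-algorithm route, one detail deserves attention: when building env_data, every timestamp field must be generated dynamically at request time. If the encrypted parameters are wrong, the request fails signature verification and you get a response like the following (msg means "parameter error"):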
{'msg': '參數錯誤', 'code': 400, 'data': {'message': 'invalid data', 'status': 463}}
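The timestamp requirement can be handled with a small helper that copies a captured env template and overwrites its time fields on every call. This is only a sketch: the key names in the usage example are hypothetical, the real env may nest its timestamps, and millisecond precision is an assumption.

```python
import copy
import time
from typing import Iterable


def refresh_timestamps(template: dict, timestamp_keys: Iterable[str]) -> dict:
    """Return a copy of `template` with the given top-level keys set to 'now'."""
    env_data = copy.deepcopy(template)   # leave the captured template untouched
    now_ms = int(time.time() * 1000)     # ASSUMPTION: millisecond timestamps
    for key in timestamp_keys:
        env_data[key] = now_ms
    return env_data


# Hypothetical field names, for illustration only.
captured = {"ua": "Mozilla/5.0", "loadTime": 0, "clickTime": 0}
fresh = refresh_timestamps(captured, ["loadTime", "clickTime"])
print(fresh["loadTime"] > 0, captured["loadTime"])
```

Calling the helper immediately before signing keeps the values close to the request time instead of reusing whatever was captured during debugging.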
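As mentioned at the start, there is a chance of triggering a secondary GeeTest behavioral verification (a slider). Anyone interested can analyze that as well. The trigger rate is very low, and it appeared only once during debugging, as shown below: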