[2020] Building an x86 Anti-Virus Sandbox from Scratch, with Source Code

huoji | CPU emulation, sandbox | 2022-09-07

A few months ago I read an article by a veteran in this field that struck a chord, so I decided to build a sandbox from scratch. Partly this is to improve my own skills, and partly it is because there is very little material on this topic anywhere in the world. I am writing it down here for those who come after.

![](https://key08.com/usr/uploads/2022/09/629269012.png)

I will cover this sandbox over several posts, since space is limited. It is a lot of work: it took me several months, on and off, before it produced any usable results.

### Emulation results

The trace shows the program calling the GetModuleHandleW and GetModuleHandleA APIs, calling printf, and calling system with the argument pause:

![](https://key08.com/usr/uploads/2022/09/930475382.png)

The actual source code:

![](https://key08.com/usr/uploads/2022/09/2736705438.png)

Features to implement (✔ = done, * = in progress):

> 1. System environment emulation ✔
> 2. API emulation ✔
> 3. x86 (32-bit) and x64 support ✔ <- this post covers up to here
> 4. Multithreading emulation *
> 5. Exception handling emulation *

Some planned features:

> 6. VM handler analysis
> 7. Lifting to LLVM IR
> 8. Production-grade unpacking <- VM handler analysis & control-flow flattening & lifting to IR code & compiler optimization. This is my ultimate goal for this project.
> 9. Automated malware analysis with machine learning

### Implementation

To build the sandbox we use UE (unicorn-engine) as the CPU emulation engine. (Here we are luckier than our predecessors, who had to write a CPU emulator by hand.)

Let's lay out what needs to be done:

> 1. Read the file, apply relocations in memory, handle the differences between 32-bit and 64-bit relocations, and resolve the ApiSetMap so the import table can be fixed up
> 2. Initialize the system environment: TEB, PEB, GS, FS, RSP, RIP, ESP, EIP and so on
> 3. Write the memory image into the virtual machine
> 4. Take over the API functions: manual redirection plus logging

Reading the file:

```cpp
process.m_buffer = peconv::load_pe_module((const char*)parms_file_path.c_str(), size_of_pe, false, false);
if (process.m_buffer == NULL)
    __debugbreak();
```

Handling the export table:

```cpp
std::vector<moudle_export> pe_action::get_export(PVOID params_image_base) {
    std::vector<moudle_export> result;
    // Export table
    DWORD uExportSize = 0;
    if (RtlImageDirectoryEntryToData == NULL)
        RtlImageDirectoryEntryToData = (RtlImageDirectoryEntryToDataFn)GetProcAddress(LoadLibraryA("ntdll.dll"), "RtlImageDirectoryEntryToData");
    PIMAGE_EXPORT_DIRECTORY pImageExportDirectory = (PIMAGE_EXPORT_DIRECTORY)RtlImageDirectoryEntryToData((PVOID)params_image_base, TRUE, IMAGE_DIRECTORY_ENTRY_EXPORT, &uExportSize);
    if (pImageExportDirectory) {
        DWORD dwNumberOfNames = (DWORD)(pImageExportDirectory->NumberOfNames);
        DWORD* pAddressOfFunction = (DWORD*)((PUCHAR)params_image_base + pImageExportDirectory->AddressOfFunctions);
        DWORD* pAddressOfNames = (DWORD*)((PUCHAR)params_image_base + pImageExportDirectory->AddressOfNames);
        WORD* pAddressOfNameOrdinals = (WORD*)((PUCHAR)params_image_base + pImageExportDirectory->AddressOfNameOrdinals);
        for (size_t i = 0; i < dwNumberOfNames; i++) {
            char* strFunction = (char*)((PUCHAR)params_image_base + pAddressOfNames[i]);
            // Forwarded exports are not handled here
            DWORD functionRva = pAddressOfFunction[pAddressOfNameOrdinals[i]];
            //DWORD base = (DWORD)params_image_base + functionRva;
            moudle_export export_data = { 0 };
            memcpy(export_data.name, (char*)strFunction, strlen(strFunction));
            export_data.function_address = functionRva; // stored as an RVA, not an absolute address
            result.push_back(export_data);
        }
    }
    return result;
}
```

Handling the delay-load import table:

```cpp
bool peconv::load_delayed_imports(BYTE* modulePtr, ULONGLONG moduleBase, t_function_resolver* func_resolver)
{
    const bool is_64bit = peconv::is64bit(modulePtr);
    const size_t module_size = peconv::get_image_size(modulePtr);
    default_func_resolver default_res;
    if (!func_resolver) {
        func_resolver = (t_function_resolver*)&default_res;
    }
    size_t table_size = 0;
    IMAGE_DELAYLOAD_DESCRIPTOR* first_desc = get_delayed_imps(modulePtr, module_size, table_size);
    if (!first_desc) {
        return false;
    }
#ifdef _DEBUG
    std::cout << "OK, table_size = " << table_size << std::endl;
#endif
    size_t max_count = table_size / sizeof(IMAGE_DELAYLOAD_DESCRIPTOR);
    for (size_t i = 0; i < max_count; i++) {
        IMAGE_DELAYLOAD_DESCRIPTOR* desc = &first_desc[i];
        if (!validate_ptr(modulePtr, module_size, desc, sizeof(IMAGE_DELAYLOAD_DESCRIPTOR))) break;
        if (desc->DllNameRVA == 0) {
            break;
        }
        // The field may hold a VA instead of an RVA; normalize it to an RVA
        ULONGLONG dll_name_rva = desc->DllNameRVA;
        if (dll_name_rva > moduleBase) {
            dll_name_rva -= moduleBase;
        }
        char* dll_name = (char*)((ULONGLONG)modulePtr + dll_name_rva);
        if (!validate_ptr(modulePtr, module_size, dll_name, sizeof(char))) continue;
#ifdef _DEBUG
        std::cout << dll_name << std::endl;
#endif
        // NOTE: both branches return, so only the first descriptor is processed.
        // The template arguments were lost in the published listing; the ones
        // below follow libpeconv's convention and are an assumption.
        if (is_64bit) {
            return parse_delayed_desc<ULONGLONG, IMAGE_THUNK_DATA64>(modulePtr, module_size, moduleBase, dll_name, IMAGE_ORDINAL_FLAG64, desc, func_resolver);
        }
        else {
            return parse_delayed_desc<DWORD, IMAGE_THUNK_DATA32>(modulePtr, module_size, moduleBase, dll_name, IMAGE_ORDINAL_FLAG32, desc, func_resolver);
        }
    }
    return true;
}
```

Processing the import table:
```cpp
bool peconv::process_import_table(IN BYTE* modulePtr, IN SIZE_T moduleSize, IN ImportThunksCallback *callback)
{
    if (moduleSize == 0) { // if not given, try to fetch
        moduleSize = peconv::get_image_size((const BYTE*)modulePtr);
    }
    if (moduleSize == 0) return false;

    IMAGE_DATA_DIRECTORY *importsDir = get_directory_entry((BYTE*)modulePtr, IMAGE_DIRECTORY_ENTRY_IMPORT);
    if (!importsDir) {
        return true; // no import table
    }
    const DWORD impAddr = importsDir->VirtualAddress;
    IMAGE_IMPORT_DESCRIPTOR *first_desc = (IMAGE_IMPORT_DESCRIPTOR*)(impAddr + (ULONG_PTR)modulePtr);
    if (!peconv::validate_ptr(modulePtr, moduleSize, first_desc, sizeof(IMAGE_IMPORT_DESCRIPTOR))) {
        return false;
    }
    return process_dlls(modulePtr, moduleSize, first_desc, callback);
}
```

Check whether the image is 64-bit, because that case needs different handling:

```cpp
WORD peconv::get_nt_hdr_architecture(IN const BYTE *pe_buffer)
{
    void *ptr = get_nt_hdrs(pe_buffer);
    if (!ptr) return 0;

    // Magic sits at the same offset in the 32-bit and 64-bit optional headers
    IMAGE_NT_HEADERS32 *inh = static_cast<IMAGE_NT_HEADERS32*>(ptr);
    if (IsBadReadPtr(inh, sizeof(IMAGE_NT_HEADERS32))) {
        return 0;
    }
    return inh->OptionalHeader.Magic;
}

bool peconv::is64bit(IN const BYTE *pe_buffer)
{
    WORD arch = get_nt_hdr_architecture(pe_buffer);
    if (arch == IMAGE_NT_OPTIONAL_HDR64_MAGIC) {
        return true;
    }
    return false;
}
```

### Setting up the virtual machine environment

![](https://key08.com/usr/uploads/2022/09/2067221194.png)

Relocate the module using IMAGE_DIRECTORY_ENTRY_BASERELOC:

```cpp
IMAGE_DATA_DIRECTORY* relocDir = peconv::get_directory_entry((const BYTE*)modulePtr, IMAGE_DIRECTORY_ENTRY_BASERELOC);
if (relocDir == NULL) {
    std::cout << "[!] WARNING: no relocation table found!\n";
    return false;
}
if (!validate_ptr(modulePtr, moduleSize, relocDir, sizeof(IMAGE_DATA_DIRECTORY))) {
    std::cerr << "[!] Invalid relocDir pointer\n";
    return false;
}
DWORD maxSize = relocDir->Size;
DWORD relocAddr = relocDir->VirtualAddress;
bool is64b = is64bit((BYTE*)modulePtr);

IMAGE_BASE_RELOCATION* reloc = NULL;
DWORD parsedSize = 0;
DWORD validBlocks = 0;
// Walk the relocation blocks; each block covers one page
while (parsedSize < maxSize) {
    reloc = (IMAGE_BASE_RELOCATION*)(relocAddr + parsedSize + (ULONG_PTR)modulePtr);
    if (!validate_ptr(modulePtr, moduleSize, reloc, sizeof(IMAGE_BASE_RELOCATION))) {
        std::cerr << "[-] Invalid address of relocations\n";
        return false;
    }
    if (reloc->SizeOfBlock == 0) {
        break;
    }
    // The block header is two DWORDs; the rest is an array of WORD entries
    size_t entriesNum = (reloc->SizeOfBlock - 2 * sizeof(DWORD)) / sizeof(WORD);
    DWORD page = reloc->VirtualAddress;

    BASE_RELOCATION_ENTRY* block = (BASE_RELOCATION_ENTRY*)((ULONG_PTR)reloc + sizeof(DWORD) + sizeof(DWORD));
    if (!validate_ptr(modulePtr, moduleSize, block, sizeof(BASE_RELOCATION_ENTRY))) {
        std::cerr << "[-] Invalid address of relocations block\n";
        return false;
    }
    if (!is_empty_reloc_block(block, entriesNum, page, modulePtr, moduleSize)) {
        if (process_reloc_block(block, entriesNum, page, modulePtr, moduleSize, is64b, callback)) {
            validBlocks++;
        }
        else {
            // the block was malformed
            return false;
        }
    }
    parsedSize += reloc->SizeOfBlock;
}
return (validBlocks != 0);
```

### Dealing with the ApiSetMap

![](https://key08.com/usr/uploads/2022/09/2437183747.png)

The api-ms-win-xxx entries are not real functions, only redirections, and we want the imports to point directly at the real functions. Fortunately the PEB, reached through GS on x64, holds the matching structure, API_SET_NAMESPACE_ARRAY_10:

```cpp
std::string get_dll_name_from_api_set_map(const std::string& api_set)
{
    std::wstring wapi_set(api_set.begin(), api_set.end());

    typedef LONG(__stdcall* fnRtlGetVersion)(PRTL_OSVERSIONINFOW lpVersionInformation);
    fnRtlGetVersion pRtlGetVersion = (fnRtlGetVersion)GetProcAddress(LoadLibraryA("ntdll.dll"), "RtlGetVersion");

    RTL_OSVERSIONINFOEXW verInfo = { 0 };
    verInfo.dwOSVersionInfoSize = sizeof(verInfo);
    pRtlGetVersion((PRTL_OSVERSIONINFOW)&verInfo);

    ULONG ver_short = (verInfo.dwMajorVersion << 8) | (verInfo.dwMinorVersion << 4) | verInfo.wServicePackMajor;

    if (ver_short >= WINVER_10)
    {
        // The cast target types were lost in the published listing. ULONG_PTR, USHORT
        // and PWCHAR follow from usage; the PAPI_SET_*_10 pointer names are assumed.
        auto apiSetMap = (API_SET_NAMESPACE_ARRAY_10*)((X64PEB*)__readgsqword(0x60))->ApiSetMap;
        auto apiSetMapAsNumber = reinterpret_cast<ULONG_PTR>(apiSetMap);
        auto nsEntry = reinterpret_cast<PAPI_SET_NAMESPACE_ENTRY_10>((apiSetMap->Start + apiSetMapAsNumber));
        for (ULONG i = 0; i < apiSetMap->Count; i++)
        {
            UNICODE_STRING nameString, valueString;
            nameString.MaximumLength = static_cast<USHORT>(nsEntry->NameLength);
            nameString.Length = static_cast<USHORT>(nsEntry->NameLength);
            nameString.Buffer = reinterpret_cast<PWCHAR>(apiSetMapAsNumber + nsEntry->NameOffset);

            std::wstring name = std::wstring(nameString.Buffer, nameString.Length / sizeof(WCHAR)) + L".dll";
            if (_wcsicmp(wapi_set.c_str(), name.c_str()) == 0)
            {
                auto valueEntry = reinterpret_cast<PAPI_SET_VALUE_ENTRY_10>(apiSetMapAsNumber + nsEntry->ValueOffset);
                if (nsEntry->ValueCount == 0)
                    return "";
                valueString.Buffer = reinterpret_cast<PWCHAR>(apiSetMapAsNumber + valueEntry->ValueOffset);
                valueString.MaximumLength = static_cast<USHORT>(valueEntry->ValueLength);
                valueString.Length = static_cast<USHORT>(valueEntry->ValueLength);
                auto value = std::wstring(valueString.Buffer, valueString.Length / sizeof(WCHAR));
                //note: there might be more than one value, but we will just return the first one..
                return std::string(value.begin(), value.end());
            }
            nsEntry++;
        }
    }
    else
    {
        // Systems older than Windows 10 are not supported
        __debugbreak();
    }
    return "";
}
```

```cpp
DWORD call_via_rva = static_cast<DWORD>((ULONG_PTR)call_via - (ULONG_PTR)this->modulePtr);
//std::cout << "via RVA: " << std::hex << call_via_rva << " : ";
LPSTR func_name = NULL;
if ((desc->u1.Ordinal & ordinal_flag) == 0) {
    // Import by name
    PIMAGE_IMPORT_BY_NAME by_name = (PIMAGE_IMPORT_BY_NAME)((ULONGLONG)modulePtr + desc->u1.AddressOfData);
    func_name = reinterpret_cast<LPSTR>(by_name->Name);
    //std::cout << "name: " << func_name << " dll:" << lib_name << std::endl;
    std::string fuck_up_api_ms = lib_name;
    // Replace an api-ms-* API set name with the real DLL behind it
    if (fuck_up_api_ms.find("api-ms-") != std::string::npos) {
        fuck_up_api_ms = get_dll_name_from_api_set_map(fuck_up_api_ms);
        if (fuck_up_api_ms.size() <= 1)
            __debugbreak();
    }
    moudle_import import_data = { 0 };
    memcpy(import_data.name, (char*)func_name, strlen(func_name));
    memcpy(import_data.dll_name, (char*)fuck_up_api_ms.c_str(), fuck_up_api_ms.size());
    import_data.function_address = call_via_rva;
    import_data.is_delayed_import = false;
    nameToAddr.push_back(import_data);
}
return true;
```

### Initializing the virtual environment

![](https://key08.com/usr/uploads/2022/09/1494357735.png)

![](https://key08.com/usr/uploads/2022/09/1181928186.png)

The figure above explains why the code in the figure below sets rsp = stack_end - 128:

![](https://key08.com/usr/uploads/2022/09/1946764319.png)

Installing the trace hooks:

![](https://key08.com/usr/uploads/2022/09/894127217.png)

Starting the emulation:

![](https://key08.com/usr/uploads/2022/09/4094404661.png)

Note that I have not set up the 32-bit environment yet (the FS structures, for example). It is not needed for now and will be added later.

### Tracing the code flow

![](https://key08.com/usr/uploads/2022/09/1009094249.png)

Check whether execution has jumped to code that we want to handle ourselves:

![](https://key08.com/usr/uploads/2022/09/2850065562.png)

Note that retn 4 pops the return address and then does add esp, 4, so esp moves up by 4 + 4 bytes (32-bit argument passing). x64 passes the first four arguments in rcx, rdx, r8 and r9 and the rest on the stack, above the return address and the 32-byte shadow space. 32-bit code passes everything on the stack, at esp + 4, esp + 8, esp + 12, esp + 16 and so on.

Once execution reaches an API that we emulate, we reproduce its behavior and the system environment by hand:

![](https://key08.com/usr/uploads/2022/09/1345829124.png)

This is of course a big job, and I have not finished all of it.

We then have to set RIP/EIP to jump back to the caller and restore the stack so that it does not get corrupted:

![](https://key08.com/usr/uploads/2022/09/2232617079.png)

Finally we get the current result, where the execution flow can be observed clearly.

Stack activity:

![](https://key08.com/usr/uploads/2022/09/2007834092.png)

Function calls made by the program:

![](https://key08.com/usr/uploads/2022/09/2560891796.png)

```cpp
KERNEL32.dll.GetModuleHandleA(0000000053898F60) caller: 0000000053898F60 params: 0{char}ntdll.dll
KERNEL32.dll.GetModuleHandleW(0000000053899710) caller: 0000000053899710 params: 0{wchar}duck
ucrtba.system(000000005404C250) caller: 000000005404C250 params: 0{char}pause
```

From this we can tell that the program called GetModuleHandleA to get a handle to ntdll.dll, called GetModuleHandleW to get a handle to duck.dll, and finally called system("pause").

This is only a small step. Next we need to keep refining it, including fully emulating the FS environment and more.

> Postscript: It does not matter if you cannot follow this yet; you just have not reached that level. Batch scripts once seemed hard to me, then Easy Language (易语言), then SQL, then PHP, then C, and now all of them seem fairly easy. Learning is a process. Do not be afraid of difficulty: things are hard only because you are not at that level yet, so bookmark this and come back to it later. As long as you do not give up halfway or settle into comfort, you will get there. Learning never ends. In the words of a hacker with more than 30 years in security whom many young people have never heard of:

```cpp
"The passion for learning must not change with the seasons"
```

This article was written by huoji and is licensed under Creative Commons Attribution 3.0. It may be freely reposted and quoted, provided the author is credited and the source is cited.
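As a rough illustration of the argument-passing and stack-restore notes in the code-flow tracing section, the sketch below computes where each argument sits when emulation stops on the first instruction of an API, and what the stack pointer should be after the emulated API "returns". It is a minimal sketch under stated assumptions, not the sandbox's actual code: `ArgLocation`, `locate_arg` and `sp_after_return` are hypothetical names, and the layout assumes the standard Windows conventions (stdcall/cdecl on 32-bit, the Microsoft x64 convention on 64-bit).

```cpp
#include <cstddef>
#include <cstdint>

// Hypothetical helpers for illustration; none of these names come from the sandbox source.

// Where the n-th (0-based) argument lives at API entry, i.e. while the
// return address is still on top of the stack.
struct ArgLocation {
    bool in_register;       // true: use reg_index, false: use stack_offset
    int reg_index;          // 0..3 -> rcx, rdx, r8, r9 (x64 only), -1 otherwise
    uint64_t stack_offset;  // byte offset from esp/rsp (stack arguments only)
};

ArgLocation locate_arg(bool is_64bit, size_t index) {
    if (is_64bit) {
        if (index < 4) return {true, static_cast<int>(index), 0};
        // Skip the return address (8 bytes) and the shadow space (4 * 8 bytes)
        return {false, -1, 8 + 32 + (index - 4) * 8};
    }
    // 32-bit: skip the return address (4 bytes), then one 4-byte slot per argument
    return {false, -1, 4 + index * 4};
}

// Stack pointer after an emulated return.
// callee_cleanup_bytes is the N in "retn N": the argument bytes for stdcall,
// 0 for cdecl. It is ignored on x64, where the caller owns the stack.
uint64_t sp_after_return(bool is_64bit, uint64_t sp_at_entry, uint64_t callee_cleanup_bytes) {
    const uint64_t ret_addr_size = is_64bit ? 8 : 4;
    return sp_at_entry + ret_addr_size + (is_64bit ? 0 : callee_cleanup_bytes);
}
```

In an emulator hook, the register arguments and stack slots would then be read through the engine's own accessors (for Unicorn, `uc_reg_read` and `uc_mem_read`). After the API has been emulated, the instruction pointer is set to the saved return address and the stack pointer to the value from `sp_after_return`.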
Could you share a link to that veteran's article?