python, system security, tools, assembly, frontline development

[2021] Detecting file similarity with the law of cosines (cosine similarity) & virus sample gene detection

The law of cosines has the following applications:

1. Similarity calculation
2. Information push (recommendation)

In network security it is used mainly for sample gene detection, also called sample similarity calculation. For two vectors A and B with n components each, the formula is:

$$\cos\theta = \frac{A \cdot B}{\|A\|\,\|B\|} = \frac{\sum_{i=1}^{n} A_i B_i}{\sqrt{\sum_{i=1}^{n} A_i^2}\,\sqrt{\sum_{i=1}^{n} B_i^2}}$$

Note that it is named after the law of cosines because it computes an angle of a triangle, and the law also holds in N dimensions.

For sample similarity detection, take two 风灵月影 (FLiNG) trainers as the example; both belong to the same family.

# Coding

Read both files with the pefile library and walk the code section of each one byte by byte. Vector A holds how often each byte value occurs in the source file, and vector B holds the same counts for the target file:

```python
import math

import pefile


def count_code_bytes(image, base_of_code, sizeof_code):
    # Histogram of byte values (0..255) over the code section
    counts = [0] * 256
    for value in bytearray(image[base_of_code:base_of_code + sizeof_code]):
        counts[value] += 1
    return counts


def get_peinfo_by_cos(pSource, pTarget):
    source = pefile.PE(pSource)
    target = pefile.PE(pTarget)
    # get_pe_info belongs to the full code and is not shown in this excerpt
    source_map, source_sizeof_code, source_base_of_code = get_pe_info(source)
    target_map, target_sizeof_code, target_base_of_code = get_pe_info(target)

    # Index i of both vectors always refers to byte value i, so they line up
    str1_vector = count_code_bytes(source_map, source_base_of_code, source_sizeof_code)
    str2_vector = count_code_bytes(target_map, target_base_of_code, target_sizeof_code)

    str1_mod = math.sqrt(sum(x * x for x in str1_vector))
    str2_mod = math.sqrt(sum(x * x for x in str2_vector))
    if str1_mod == 0 or str2_mod == 0:
        # An empty code section has no direction to compare
        return 0.0
    vector_multi = sum(x * y for x, y in zip(str1_vector, str2_vector))
    # Compute the cosine value
    cos = float(vector_multi) / (str1_mod * str2_mod)
    return cos
```

Of the test samples, two are similar, two are not similar, and two belong to a malware family. Let's try it out.

Simple, crude, and effective.

The complete code is in the full post.

Read more 2021-08-19 huoji 1 comment
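The cosine computation from the post above can be tried without any PE parsing. The following is a minimal sketch, assuming the inputs are plain byte strings rather than code sections extracted with pefile; the helper names `byte_histogram`, `cosine_similarity`, and `byte_similarity` are made up for illustration and are not part of the author's full code.

```python
import math
from collections import Counter


def byte_histogram(data):
    """Return a 256-entry list: how many times each byte value occurs in data."""
    counts = Counter(bytearray(data))
    return [counts.get(value, 0) for value in range(256)]


def cosine_similarity(vec_a, vec_b):
    """Cosine of the angle between two equal-length numeric vectors."""
    if len(vec_a) != len(vec_b):
        raise ValueError("vectors must have the same length")
    dot = sum(a * b for a, b in zip(vec_a, vec_b))
    norm_a = math.sqrt(sum(a * a for a in vec_a))
    norm_b = math.sqrt(sum(b * b for b in vec_b))
    if norm_a == 0 or norm_b == 0:
        # A zero vector has no direction, so treat it as "not similar"
        return 0.0
    return dot / (norm_a * norm_b)


def byte_similarity(data_a, data_b):
    """Cosine similarity of the byte-frequency vectors of two byte strings."""
    return cosine_similarity(byte_histogram(data_a), byte_histogram(data_b))


if __name__ == "__main__":
    print(byte_similarity(b"push ebp; mov ebp, esp", b"push ebp; mov esp, ebp"))
    print(byte_similarity(b"aaaa", b"bbbb"))
```

Because only byte frequencies are compared, two inputs that contain the same bytes in a different order score as identical. That keeps the check cheap, but a high score should be read as a hint of a shared family, not as proof.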
python, tools

[2021] Batch-comparing the MD5 of the files in two folders with Python

The scripts found online are not very reliable, so here is one I wrote myself. It is meant for quickly checking whether something like a backdoor has been slipped into a copy of a directory.

```python
import hashlib
import os

g_origin_path = "[directory]"  # known-good copy
g_target_path = "[directory]"  # copy to check


def get_file_md5(filepath):
    md5obj = hashlib.md5()
    with open(filepath, 'rb') as f:
        md5obj.update(f.read())
    return md5obj.hexdigest().upper()


for root, dirs, files in os.walk(g_origin_path):
    for file in files:
        origin_file_path = os.path.join(root, file)
        # Same relative path, but under the target directory
        rel_path = os.path.relpath(origin_file_path, g_origin_path)
        target_file_path = os.path.join(g_target_path, rel_path)
        if not os.path.exists(target_file_path):
            print("extra file (not found in target): {} ".format(target_file_path))
            continue
        origin_file_md5 = get_file_md5(origin_file_path)
        target_file_md5 = get_file_md5(target_file_path)
        # print(origin_file_md5, target_file_md5)
        if origin_file_md5 != target_file_md5:
            print("md5 differs, path: {} src: {} target: {}".format(
                target_file_path, origin_file_md5, target_file_md5))
```

Read more 2021-08-02 huoji 0 comments
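The script in the post above walks only the original directory, so a file that exists only in the target copy would go unreported. The following is a rough sketch of a two-way comparison, assuming both trees fit comfortably on local disk; `md5_of_file`, `md5_by_relative_path`, and `compare_dirs` are hypothetical helper names, not taken from the post.

```python
import hashlib
import os


def md5_of_file(path, chunk_size=1 << 20):
    """Hash a file in chunks so large files are not loaded into memory at once."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest().upper()


def md5_by_relative_path(base_dir):
    """Map each file's path relative to base_dir to its MD5 digest."""
    result = {}
    for root, _dirs, files in os.walk(base_dir):
        for name in files:
            full_path = os.path.join(root, name)
            result[os.path.relpath(full_path, base_dir)] = md5_of_file(full_path)
    return result


def compare_dirs(origin_dir, target_dir):
    """Return (only_in_origin, only_in_target, changed) as sorted lists of relative paths."""
    origin = md5_by_relative_path(origin_dir)
    target = md5_by_relative_path(target_dir)
    only_in_origin = sorted(set(origin) - set(target))
    only_in_target = sorted(set(target) - set(origin))
    changed = sorted(
        path for path in set(origin) & set(target) if origin[path] != target[path]
    )
    return only_in_origin, only_in_target, changed
```

When hunting for a backdoor, the `only_in_target` list is the one to read first, since a planted file would have no counterpart in the original tree. MD5 is enough to spot accidental or careless changes, but it is not collision resistant, so swapping in `hashlib.sha256` is a reasonable choice when the target copy comes from an untrusted source.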