Posts in category python - 白帽Wiki

2023-07-19 huoji 3 comments

python, Binary Security

[2022] Spoon-Fed Shellcode Writing Tutorial (4): Writing the Server

Read more

2022-09-08 huoji 1 comment

python, Binary Security, C/C++, Assembly

This content has been deleted

Read more

2022-04-05 huoji 0 comments

python, Tools, Binary Security

[2022] An NLP-Based Threat Detection Engine

Read more

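2022-03-17 huoji 1 comment

python, System Security, Tools, Assembly, Frontline Development

[2021] Detecting File Similarity with the Cosine Theorem & Malware Sample Gene Detection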

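The cosine theorem (cosine similarity) has the following use cases:
1. Similarity computation
2. Information push (recommendation)

In network security its main use is sample gene detection, also called sample similarity computation. The formula looks like this:
![](https://key08.com/usr/uploads/2021/08/2243488000.png)
Note that it is called the cosine theorem because it computes an angle of a triangle, and the same result also holds in N dimensions:
![](https://key08.com/usr/uploads/2021/08/3451125394.png)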
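
For readers who cannot load the images, the general textbook definition of cosine similarity between two N-dimensional vectors is written out here. It is the standard form, not a transcription of the images above, and the symbols $A$, $B$ and $\theta$ are names chosen for illustration.

$$
\cos\theta = \frac{A \cdot B}{\lVert A \rVert \, \lVert B \rVert}
= \frac{\sum_{i=1}^{N} A_i B_i}{\sqrt{\sum_{i=1}^{N} A_i^{2}} \; \sqrt{\sum_{i=1}^{N} B_i^{2}}}
$$

A value close to 1 means the two vectors point in almost the same direction. For vectors of non-negative counts, such as byte frequencies, the result always lies between 0 and 1.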

Sample similarity detection, using two 风灵月影 samples as an example; both belong to the same family:
![](https://key08.com/usr/uploads/2021/08/947620823.png)

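# Implementation
Read both files with the pefile library, then compare their byte codes one by one; parameter A is the bytes that are the same and parameter B is the bytes that differ: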
```python
import math
from collections import Counter

import pefile


def count_bytes(image, base_of_code, sizeof_code):
    # Count how often each byte value occurs in the code section
    counts = Counter()
    for offset in range(base_of_code, base_of_code + sizeof_code):
        counts[image[offset:offset + 1]] += 1
    return counts


def get_peinfo_by_cos(pSource, pTarget):
    source = pefile.PE(pSource)
    target = pefile.PE(pTarget)
    # get_pe_info is not shown in this excerpt
    source_map, source_sizeof_code, source_base_of_code = get_pe_info(source)
    target_map, target_sizeof_code, target_base_of_code = get_pe_info(target)
    a1_dict = count_bytes(source_map, source_base_of_code, source_sizeof_code)
    a2_dict = count_bytes(target_map, target_base_of_code, target_sizeof_code)

    # Build both vectors over the same byte values so the components line up
    keys = sorted(set(a1_dict) | set(a2_dict))
    str1_vector = [a1_dict[key] for key in keys]
    str2_vector = [a2_dict[key] for key in keys]

    # Vector lengths (Euclidean norms)
    str1_mod = math.sqrt(sum(x * x for x in str1_vector))
    str2_mod = math.sqrt(sum(x * x for x in str2_vector))
    if str1_mod == 0 or str2_mod == 0:
        return 0.0
    vector_multi = sum(x * y for x, y in zip(str1_vector, str2_vector))

    # Compute the cosine
    cos = float(vector_multi) / (str1_mod * str2_mod)
    return cos

```
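
To see the calculation on its own, without any PE parsing, here is a minimal sketch that works on plain byte strings. The names `byte_histogram` and `cosine_similarity` and the three sample byte strings are made up for illustration; they are not from the full code.

```python
import math
from collections import Counter


def byte_histogram(data: bytes) -> list:
    """Return a 256-element list: how many times each byte value occurs."""
    counts = Counter(data)
    return [counts.get(value, 0) for value in range(256)]


def cosine_similarity(vec_a, vec_b) -> float:
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(vec_a, vec_b))
    norm_a = math.sqrt(sum(a * a for a in vec_a))
    norm_b = math.sqrt(sum(b * b for b in vec_b))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # convention chosen here for empty input
    return dot / (norm_a * norm_b)


# Hypothetical byte strings standing in for three code sections
section_a = bytes([0x55, 0x8B, 0xEC, 0x90, 0x90, 0xC3])
section_b = bytes([0x55, 0x8B, 0xEC, 0x90, 0xC3, 0xC3])
section_c = bytes([0x00, 0x01, 0x02, 0x03, 0x04, 0x05])

sim_ab = cosine_similarity(byte_histogram(section_a), byte_histogram(section_b))
sim_ac = cosine_similarity(byte_histogram(section_a), byte_histogram(section_c))
print(sim_ab, sim_ac)
```

Using a fixed 256-slot histogram means component `i` always refers to byte value `i` in both vectors, so the dot product compares like with like. `section_a` and `section_b` share most of their bytes and score high, while `section_c` shares none with `section_a` and scores 0.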
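In the test below, two of the samples are similar, two are dissimilar, and two come from a malware family.
Let's try it:
![](https://key08.com/usr/uploads/2021/08/1402924204.png)
Simple, crude, and effective.
Full code:

Read more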

2021-08-19 huoji 1 comment

python, Frontline Development

[2021] TensorFlow 2.0 from Scratch (3): LSTM

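Suppose we are given the following requirement:
I give you a series of behaviors
A B C D E F
and, knowing A B C D E, you have to predict F.
This is exactly the kind of scenario where an LSTM fits. I won't explain LSTM itself here, since there are plenty of introductions online. Straight to the practical part:

First, we need to convert the data A B C D E F into numeric IDs: 0 1 2 3 4 5
Next, normalize the values (here with `StandardScaler`) and split them into training and test data: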
```python
import pandas as pd
from sklearn.preprocessing import StandardScaler

train_path = './result_list.csv'
data_frame = pd.read_csv(train_path)
data_frame['activity'] = data_frame['activity'].astype('float32')

# StandardScaler expects a 2-D array, so reshape the column to (n, 1),
# then flatten the result back to 1-D before storing it
scaler = StandardScaler()
data_frame['activity'] = scaler.fit_transform(
    data_frame['activity'].values.reshape(-1, 1)).ravel()

# First 75% for training, the remaining 25% for testing
train_size = int(len(data_frame['activity']) * 0.75)
trainlist = data_frame['activity'][:train_size]
testlist = data_frame['activity'][train_size:]
```
The values read from the CSV should look like:
0
1
2
3
4
5
....
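
The encoding and scaling steps can be sketched in plain Python as follows. This is only an illustration under stated assumptions: the `behaviors` list, the `label_to_id` mapping and the `decode` helper are hypothetical, and the scaling mirrors what `StandardScaler` does by default (subtract the mean, divide by the population standard deviation).

```python
import statistics

# Hypothetical behavior sequence; real data would come from result_list.csv
behaviors = ["A", "B", "C", "D", "E", "F"]

# Map each distinct behavior to an integer ID in order of first appearance
label_to_id = {}
for name in behaviors:
    label_to_id.setdefault(name, len(label_to_id))
id_to_label = {idx: name for name, idx in label_to_id.items()}

ids = [label_to_id[name] for name in behaviors]

# Standardize: subtract the mean, divide by the population standard deviation
mean = statistics.mean(ids)
std = statistics.pstdev(ids)
scaled = [(value - mean) / std for value in ids]


def decode(scaled_value: float) -> str:
    """Undo the scaling, round to the nearest ID, and look up its label."""
    nearest_id = round(scaled_value * std + mean)
    return id_to_label[nearest_id]


print(ids)
print(decode(scaled[5]))
```

Keeping `mean`, `std` and `id_to_label` around matters because the model predicts a scaled number, and these are what turn that number back into a behavior name.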

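Then build sliding windows of the form 0 1 2 3 4 (X), 5 (Y):
```python
look_back = 64
trainX, trainY = create_dataset(trainlist, look_back, None)
testX, testY = create_dataset(testlist, look_back, train_size)
```
Note that the create_dataset code you find online is outdated, and most of it throws errors if you copy it as-is. Just use mine: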
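```python
import numpy as np


def create_dataset(dataset, look_back, start_index):
    # dataset is a pandas Series: slices are positional, but dataset[n] looks up
    # by index label, and the test split's labels start at start_index
    dataX, dataY = [], []
    for i in range(len(dataset) - look_back - 1):
        a = dataset[i:(i + look_back)]
        dataX.append(a)
        if start_index is not None:
            dataY.append(dataset[start_index + i + look_back])
        else:
            dataY.append(dataset[i + look_back])

    return np.array(dataX), np.array(dataY)
```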
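
To make the windowing easier to picture, here is a small sketch on a plain Python list. `make_windows` is a hypothetical helper, not the function above: it takes no `start_index`, and unlike `create_dataset` it also emits the last possible window.

```python
def make_windows(sequence, look_back):
    """Split a sequence into (window, next value) training pairs."""
    windows, targets = [], []
    for start in range(len(sequence) - look_back):
        windows.append(list(sequence[start:start + look_back]))
        targets.append(sequence[start + look_back])
    return windows, targets


X, y = make_windows([0, 1, 2, 3, 4, 5, 6], look_back=5)
for window, target in zip(X, y):
    print(window, "->", target)
```

Each window of `look_back` values becomes one input row, and the value right after the window becomes its label.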
Remember to reshape the inputs into the `(samples, timesteps, features)` shape that the LSTM layer expects:
```python
trainX = trainX.reshape(trainX.shape[0], trainX.shape[1], 1)
testX = testX.reshape(testX.shape[0], testX.shape[1], 1)
```
After that, just train:
```python
from tensorflow import keras

model = keras.Sequential()
model.add(keras.layers.LSTM(128, input_shape=(look_back, 1), return_sequences=True))
model.add(keras.layers.LSTM(256))
model.add(keras.layers.Dense(1))
model.compile(optimizer=keras.optimizers.Adam(), loss='mae', metrics=['MeanSquaredError'])
model.fit(trainX, trainY, epochs=26, batch_size=128)
model.save('./model_lstm.h5')
```
Test:
![](https://key08.com/usr/uploads/2021/08/1454514772.png)

阅读全文

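2021-08-14 huoji 0 comments

python, Tools

[2021] Batch-Comparing the MD5 of Files in Two Folders with Python

Most of what you find online is not very reliable, so here is one I wrote myself. I use it to quickly check whether a backdoor or something similar has been slipped in.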
```python
import hashlib
import os

g_origin_path = "[directory]"
g_target_path = "[directory]"


def get_file_md5(filepath):
    # Hash the whole file and return the digest as upper-case hex
    with open(filepath, 'rb') as f:
        md5obj = hashlib.md5()
        md5obj.update(f.read())
    return md5obj.hexdigest().upper()


for root, dirs, files in os.walk(g_origin_path):
    for file in files:
        origin_file_path = os.path.join(root, file)
        # Map the file onto the same relative path under the target directory
        relative_path = os.path.relpath(origin_file_path, g_origin_path)
        target_file_path = os.path.join(g_target_path, relative_path)
        if not os.path.exists(target_file_path):
            print("Extra file (missing from target): {} ".format(target_file_path))
            continue
        origin_file_md5 = get_file_md5(origin_file_path)
        target_file_md5 = get_file_md5(target_file_path)
        # print(origin_file_md5, target_file_md5)
        if origin_file_md5 != target_file_md5:
            print("MD5 mismatch, path: {} src: {} target: {}".format(
                target_file_path, origin_file_md5, target_file_md5))

```
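
The script above only walks the origin folder, so a file that exists only in the target folder goes unreported. As a hedged sketch of one way to cover both directions, the functions below hash each tree into a dictionary and diff the two. The names `file_md5`, `hash_tree` and `compare_trees` are made up for this example, and the chunked read is an assumption added so large files are not loaded into memory at once.

```python
import hashlib
import os


def file_md5(path, chunk_size=1 << 20):
    """MD5 of a file, read in chunks so large files are not loaded whole."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest().upper()


def hash_tree(root):
    """Map each file's path relative to root onto its MD5."""
    result = {}
    for current, _dirs, files in os.walk(root):
        for name in files:
            full = os.path.join(current, name)
            result[os.path.relpath(full, root)] = file_md5(full)
    return result


def compare_trees(origin, target):
    """Return (only_in_origin, only_in_target, changed) as sorted lists."""
    a, b = hash_tree(origin), hash_tree(target)
    only_a = sorted(set(a) - set(b))
    only_b = sorted(set(b) - set(a))
    changed = sorted(p for p in set(a) & set(b) if a[p] != b[p])
    return only_a, only_b, changed
```

Calling `compare_trees(origin_dir, target_dir)` with two real directory paths returns three lists of relative paths. For backdoor hunting the second list is often the interesting one, since a dropped file shows up there rather than as a changed hash.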

Read more

2021-08-02 huoji 0 comments
