[2023] Getting started quickly with Meta's LLaMA model
huoji · AI, Machine Learning, Artificial Intelligence · 2023-03-13

First, make sure you have enough hardware. Here is a rough guide:

### Hardware requirements

7B model: a single RTX 3090 is enough; the output is poor and not very useful
13B model: two RTX 3090s; still poor, not very useful
30B model: four RTX 3090s is the bare minimum, and inference is extremely slow; use four A100s if you can
65B model: haven't run it

This post uses the 30B model as the example. First clone the project: https://github.com/facebookresearch/llama, then find the 30B weights online yourself.

### Environment check

Rent a GPU server, log in, and check your CUDA version (screenshot omitted) — mine is 11.6. Some rented machines are quite old and don't ship with torchrun, so type `torchrun` to make sure it exists. If it doesn't, go to the PyTorch website and install a torch build that matches your CUDA version (I got burned here). Whatever you do, do NOT upgrade CUDA: upgrading CUDA can easily break the Linux install, and then your only option is to rent another machine.

Once torchrun is confirmed working, run the following to check that PyTorch can actually see the GPU:

```python
python
>>> import torch
>>> print(torch.cuda.is_available())
True
```

It must print True.

### Download

The weights are huge. I keep them on Baidu Netdisk, and there is no decent Baidu Netdisk client for Linux, so I used the bypy Python library. Install it directly:

```
python -m pip install bypy
```

Then run bypy info to authorize it against your Baidu Netdisk account. Inside Baidu Netdisk, move the model into 我的应用数据/bypy (My App Data/bypy). On the Linux box, bypy list shows the remote directory listing and bypy download pulls everything down. Then it's a long wait...

### Run

Here is a web-serving script:

```python
# Copyright (c) Meta Platforms, Inc. and affiliates.
# This software may be used and distributed according to the terms of the GNU General Public License version 3.

import os
import sys
import argparse
import time
import json
from pathlib import Path
from typing import List, Tuple

import torch
import torch.distributed as dist
from pydantic import BaseModel
from fastapi import FastAPI
import uvicorn
from fairscale.nn.model_parallel.initialize import initialize_model_parallel

from llama import ModelArgs, Transformer, Tokenizer, LLaMA

parser = argparse.ArgumentParser()
parser.add_argument('--ckpt_dir', type=str, required=True)
parser.add_argument('--tokenizer_path', type=str, required=True)
parser.add_argument('--max_seq_len', type=int, default=700)
parser.add_argument('--max_batch_size', type=int, default=16)

app = FastAPI()


def setup_model_parallel() -> Tuple[int, int]:
    local_rank = int(os.environ.get("LOCAL_RANK", -1))
    world_size = int(os.environ.get("WORLD_SIZE", -1))

    dist.init_process_group("nccl")
    initialize_model_parallel(world_size)
    torch.cuda.set_device(local_rank)

    # seed must be the same in all processes
    torch.manual_seed(1)
    return local_rank, world_size


def load(
    ckpt_dir: str,
    tokenizer_path: str,
    local_rank: int,
    world_size: int,
    max_seq_len: int,
    max_batch_size: int,
) -> LLaMA:
    start_time = time.time()
    checkpoints = sorted(Path(ckpt_dir).glob("*.pth"))
    assert world_size == len(
        checkpoints
    ), f"Loading a checkpoint for MP={len(checkpoints)} but world size is {world_size}"
    ckpt_path = checkpoints[local_rank]
    print("Loading")
    checkpoint = torch.load(ckpt_path, map_location="cpu")
    with open(Path(ckpt_dir) / "params.json", "r") as f:
        params = json.loads(f.read())

    model_args: ModelArgs = ModelArgs(
        max_seq_len=max_seq_len, max_batch_size=max_batch_size, **params
    )
    tokenizer = Tokenizer(model_path=tokenizer_path)
    model_args.vocab_size = tokenizer.n_words
    torch.set_default_tensor_type(torch.cuda.HalfTensor)
    model = Transformer(model_args)
    torch.set_default_tensor_type(torch.FloatTensor)
    model.load_state_dict(checkpoint, strict=False)

    generator = LLaMA(model, tokenizer)
    print(f"Loaded in {time.time() - start_time:.2f} seconds")
    return generator


def init_generator(
    ckpt_dir: str,
    tokenizer_path: str,
    max_seq_len: int = 700,
    max_batch_size: int = 16,
):
    local_rank, world_size = setup_model_parallel()
    if local_rank > 0:
        sys.stdout = open(os.devnull, "w")

    generator = load(
        ckpt_dir, tokenizer_path, local_rank, world_size, max_seq_len, max_batch_size
    )
    return generator


if __name__ == "__main__":
    args = parser.parse_args()
    generator = init_generator(
        args.ckpt_dir,
        args.tokenizer_path,
        args.max_seq_len,
        args.max_batch_size,
    )

    class Config(BaseModel):
        prompts: List[str] = ["Have a nice day"]
        max_gen_len: int = 700
        temperature: float = 0.8
        top_p: float = 0.95

    if dist.get_rank() == 0:
        @app.post("/llama/")
        def generate(config: Config):
            if len(config.prompts) > args.max_batch_size:
                return {'error': 'too many prompts.'}
            # rank 0 broadcasts the request parameters to the other model-parallel ranks
            dist.broadcast_object_list(
                [config.prompts, config.max_gen_len, config.temperature, config.top_p]
            )
            results = generator.generate(
                config.prompts,
                max_gen_len=config.max_gen_len,
                temperature=config.temperature,
                top_p=config.top_p,
            )
            return {"responses": results}

        uvicorn.run(app, host="127.0.0.1", port=8042)
    else:
        # non-zero ranks wait for the broadcast and run the same generate() call
        while True:
            config = [None] * 4
            try:
                dist.broadcast_object_list(config)
                generator.generate(
                    config[0], max_gen_len=config[1], temperature=config[2], top_p=config[3]
                )
            except:
                pass
```

Put it in the repo directory, then run:

```bash
torchrun --nproc_per_node 4 XXXXX.py --ckpt_dir /root/Public/model/ --tokenizer_path /root/Public/model/tokenizer.model
```

and it should come up and run happily.
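Once it is up, a quick way to sanity-check the endpoint is to POST a request at it from another shell. A minimal sketch: the host, port, route, and field names come from the serving script above; the prompt text and generation parameters are just arbitrary examples.

```bash
# Minimal smoke test for the FastAPI endpoint started above.
# Host/port/route match the defaults hard-coded in the serving script (127.0.0.1:8042, /llama/);
# the prompt is an arbitrary example, and the JSON fields mirror the Config model.
curl -s -X POST http://127.0.0.1:8042/llama/ \
  -H 'Content-Type: application/json' \
  -d '{"prompts": ["The capital of France is"], "max_gen_len": 64, "temperature": 0.8, "top_p": 0.95}'
```

If everything is wired up correctly, you should get back a JSON object with a `responses` list containing one completion per prompt.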
For chatting with it I wrote a simple Python script:

```python
import requests

# Assumed endpoint: adjust to wherever your server (or ngrok tunnel) is actually exposed
server_url = "http://127.0.0.1:8042/llama/"

def send_to_ai(text):
    text = "" + text
    # send a POST request with a JSON body, e.g. text: hello!
    response = requests.post(server_url,
                             json={"prompts": [text]},
                             proxies={'https': '127.0.0.1:7890', 'http': '127.0.0.1:7890'},
                             headers={'Content-Type': 'application/json',
                                      "ngrok-skip-browser-warning": '1'})
    if 'error' in response.json():
        print(response.json()['error'])
    return response.json()['responses'][0]
```

(screenshot omitted) Not looking very smart.

Let's try a custom role, fed with host logs:

```python
def ai_check_edr_log(log):
    # + "\nlog_level:"
    prompt = '''
You are a host security detection system,check EDR logs contain malicious actions:
1) {hidden} level: malicious
2) {hidden} level: clean\n
'''
    log = prompt + "3) " + log + "\nlevel:"
    return send_to_ai(log)

...
print(ai_check_edr_log(detect_log))
```

(screenshot omitted) It looks like this partially works, but probably because the model has had no reinforcement-learning fine-tuning, it goes on to generate a pile of junk text afterwards, and no amount of role-play prompting gets rid of it (for example, it will append a bunch of junk code).

I tried it on a webshell; the AI gets completely overexcited the moment it sees PHP: (screenshot omitted)

With the role specified again: (screenshot omitted)

Telling it to be more precise: (screenshot omitted)

Apart from the junk output, it is all right.

### Summary

It basically works, but it needs reinforcement learning before it can become a genuinely strong AI. A reinforcement-learning project for LLaMA: https://github.com/juncongmoo/chatllama — but I don't have the compute, so I simply cannot run it.

This article was written by huoji and is licensed under Creative Commons Attribution 3.0; you may freely repost and quote it as long as you credit the author and link to the original.