
Fine-Tuning DeepSeek R1 (Reasoning Model)

Lisa Kudrow

Mar 01, 2025, 09:08 AM

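DeepSeek's groundbreaking AI models challenge OpenAI's dominance. These advanced reasoning models are free to use, which democratizes access to powerful AI. Our video tutorial shows how to fine-tune DeepSeek.

This tutorial fine-tunes the DeepSeek-R1-Distill-Llama-8B model on the Hugging Face medical chain-of-thought dataset. This distilled model is derived from Llama 3.1 8B and offers reasoning capabilities comparable to the original DeepSeek-R1. New to LLMs and fine-tuning? Consider our Introduction to LLMs in Python course.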


Image by author

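Introducing the DeepSeek R1 Models

DeepSeek AI has open-sourced DeepSeek-R1 and DeepSeek-R1-Zero, which rival OpenAI's o1 on reasoning tasks (math, coding, logic). Explore our comprehensive DeepSeek R1 guide for details.

DeepSeek-R1-Zero

This pioneering model is trained with large-scale reinforcement learning (RL), bypassing the initial supervised fine-tuning (SFT) step. It achieves independent chain-of-thought (CoT) reasoning, but it also brings challenges such as repetitive reasoning and poor readability.

DeepSeek-R1

DeepSeek-R1 addresses the limitations of DeepSeek-R1-Zero by incorporating cold-start data before RL. This multi-stage training achieves state-of-the-art performance, matching OpenAI-o1 while improving output clarity.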

DeepSeek Distillation

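DeepSeek also offers distilled models that balance power and efficiency. These smaller models (1.5B to 70B parameters) retain strong reasoning, with DeepSeek-R1-Distill-Qwen-32B outperforming OpenAI-o1-mini on benchmarks. This highlights the effectiveness of the distillation process.

Source: deepseek-ai/DeepSeek-R1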

Learn more about DeepSeek-R1's features, development, distilled models, access, pricing, and comparison with OpenAI o1 in our blog post: "DeepSeek-R1: Features, o1 Comparison, Distilled Models & More".

Fine-Tuning DeepSeek R1: A Practical Guide

Follow these steps to fine-tune your DeepSeek R1 model:

1. Setup

We use Kaggle's free GPU access. Create a Kaggle notebook and add your Hugging Face and Weights & Biases tokens as secrets. Then install the unsloth Python package for faster, more memory-efficient fine-tuning:

<code>%%capture
!pip install unsloth
!pip install --force-reinstall --no-cache-dir --no-deps git+https://github.com/unslothai/unsloth.git</code>

Authenticate with the Hugging Face CLI and Weights & Biases (wandb):

<code>from huggingface_hub import login
from kaggle_secrets import UserSecretsClient
user_secrets = UserSecretsClient()

hf_token = user_secrets.get_secret("HUGGINGFACE_TOKEN")
login(hf_token)

import wandb

wb_token = user_secrets.get_secret("wandb")

wandb.login(key=wb_token)
run = wandb.init(
    project='Fine-tune-DeepSeek-R1-Distill-Llama-8B on Medical COT Dataset', 
    job_type="training", 
    anonymous="allow"
)</code>

2. Loading the model and tokenizer

Load the Unsloth version of DeepSeek-R1-Distill-Llama-8B with 4-bit quantization for optimized performance:

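<code>from unsloth import FastLanguageModel

max_seq_length = 2048  # maximum sequence length used for loading and training
dtype = None  # None lets Unsloth pick the dtype automatically
load_in_4bit = True  # load the weights with 4-bit quantization


model, tokenizer = FastLanguageModel.from_pretrained(
    model_name = "unsloth/DeepSeek-R1-Distill-Llama-8B",
    max_seq_length = max_seq_length,
    dtype = dtype,
    load_in_4bit = load_in_4bit,
    token = hf_token, 
)</code>

3. Inference before fine-tuning

Define a prompt style with placeholders for the question and the response. It guides the model's step-by-step reasoning and is used to test the model on a sample medical question: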

<code>prompt_style = """Below is an instruction that describes a task, paired with an input that provides further context. 
Write a response that appropriately completes the request. 
Before answering, think carefully about the question and create a step-by-step chain of thoughts to ensure a logical and accurate response.

### Instruction:
You are a medical expert with advanced knowledge in clinical reasoning, diagnostics, and treatment planning. 
Please answer the following medical question. 

### Question:
{}

### Response:
<think>{}"""</code>
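As a minimal sketch of how the two placeholders are filled, the snippet below uses a shortened, hypothetical template (`demo_prompt_style`) rather than the full prompt, together with a made-up question. It relies only on Python's `str.format`: the first `{}` receives the question, and the second is left empty so that generation starts right after the `<think>` tag.

```python
# Shortened stand-in for the tutorial's prompt_style (hypothetical, for illustration only).
demo_prompt_style = """### Question:
{}

### Response:
<think>{}"""

# Made-up question used only to show the substitution.
demo_question = "What are common causes of iron-deficiency anemia?"

# First placeholder: the question. Second placeholder: empty at inference time,
# so the model continues writing its reasoning after the <think> tag.
demo_prompt = demo_prompt_style.format(demo_question, "")

print(demo_prompt)
```

Leaving the second placeholder empty is what distinguishes the inference prompt from a training prompt, where the reasoning and answer are already filled in.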

Run the sample question through the model to observe its reasoning before fine-tuning and to identify areas that fine-tuning should improve:

<code>question = "A 61-year-old woman with a long history of involuntary urine loss during activities like coughing or sneezing but no leakage at night undergoes a gynecological exam and Q-tip test. Based on these findings, what would cystometry most likely reveal about her residual volume and detrusor contractions?"


FastLanguageModel.for_inference(model) 
inputs = tokenizer([prompt_style.format(question, "")], return_tensors="pt").to("cuda")

outputs = model.generate(
    input_ids=inputs.input_ids,
    attention_mask=inputs.attention_mask,
    max_new_tokens=1200,
    use_cache=True,
)
response = tokenizer.batch_decode(outputs)
print(response[0].split("### Response:")[1])</code>
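The final `print` keeps only the text that follows the `### Response:` marker. Here is a small sketch of that step on a hand-written string; `decoded` is a made-up stand-in for `response[0]`, not real model output, and splitting at `</think>` is an optional extra that the tutorial itself does not perform.

```python
# Made-up decoded text standing in for response[0]; not real model output.
decoded = (
    "### Question:\nExample question?\n\n"
    "### Response:\n<think>Example reasoning.</think>\nExample answer."
)

# Everything after the marker is the model's reasoning plus its answer.
generated = decoded.split("### Response:")[1]

# Optionally separate the reasoning from the final answer at the closing tag.
reasoning, _, answer = generated.partition("</think>")
reasoning = reasoning.replace("<think>", "").strip()
answer = answer.strip()

print(reasoning)
print(answer)
```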

4. Loading and processing the dataset

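Modify the prompt style to include a third placeholder for the complex chain of thought. A formatting function will fill the placeholders for each dataset example: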

<code>train_prompt_style = """Below is an instruction that describes a task, paired with an input that provides further context. 
Write a response that appropriately completes the request. 
Before answering, think carefully about the question and create a step-by-step chain of thoughts to ensure a logical and accurate response.

### Instruction:
You are a medical expert with advanced knowledge in clinical reasoning, diagnostics, and treatment planning. 
Please answer the following medical question. 

### Question:
{}

### Response:
<think>
{}
</think>
{}"""</code>

To load and process the dataset, first create a function that formats each example and appends the EOS token:

<code>EOS_TOKEN = tokenizer.eos_token  # Must add EOS_TOKEN


def formatting_prompts_func(examples):
    inputs = examples["Question"]
    cots = examples["Complex_CoT"]
    outputs = examples["Response"]
    texts = []
    for input, cot, output in zip(inputs, cots, outputs):
        text = train_prompt_style.format(input, cot, output) + EOS_TOKEN
        texts.append(text)
    return {
        "text": texts,
    }</code>
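To see what the batched formatting function produces, here is a dry run that needs no model or dataset download. The shortened template, the `"</s>"` end-of-sequence string, and the two toy rows are assumptions for illustration; in the tutorial the token comes from `tokenizer.eos_token` and the rows come from the dataset.

```python
# Hypothetical stand-ins: a shortened template and a placeholder EOS string.
toy_template = "### Question:\n{}\n\n### Response:\n<think>\n{}\n</think>\n{}"
TOY_EOS = "</s>"


def toy_formatting_func(examples):
    """Mirror of formatting_prompts_func: one formatted text per row of the batch."""
    texts = []
    for question, cot, response in zip(
        examples["Question"], examples["Complex_CoT"], examples["Response"]
    ):
        texts.append(toy_template.format(question, cot, response) + TOY_EOS)
    return {"text": texts}


# With batched=True, the mapped function receives a dict of column name -> list of values.
toy_batch = {
    "Question": ["Q1?", "Q2?"],
    "Complex_CoT": ["Reasoning 1.", "Reasoning 2."],
    "Response": ["Answer 1.", "Answer 2."],
}

formatted = toy_formatting_func(toy_batch)
print(formatted["text"][0])
```

Each row becomes a single training string that ends with the EOS token, which is the step the comment in the tutorial code flags as required.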

Load the first 500 English samples of the dataset and apply the formatting function:

<code>from datasets import load_dataset
dataset = load_dataset("FreedomIntelligence/medical-o1-reasoning-SFT","en", split = "train[0:500]",trust_remote_code=True)
dataset = dataset.map(formatting_prompts_func, batched = True,)
dataset["text"][0]</code>

5. Setting up the model

Configure the model with LoRA before setting up the trainer:

<code>model = FastLanguageModel.get_peft_model(
    model,
    r=16,  
    target_modules=[
        "q_proj",
        "k_proj",
        "v_proj",
        "o_proj",
        "gate_proj",
        "up_proj",
        "down_proj",
    ],
    lora_alpha=16,
    lora_dropout=0,  
    bias="none",  
    use_gradient_checkpointing="unsloth",  # True or "unsloth" for very long context
    random_state=3407,
    use_rslora=False,  
    loftq_config=None,
)</code>
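As a rough illustration of why LoRA keeps training cheap, the sketch below counts the adapter weights implied by `r=16` on the seven target modules. The layer dimensions are assumptions taken from the published Llama 3.1 8B configuration (hidden size 4096, MLP size 14336, 32 layers, key/value projection width 1024); they are not stated in this article.

```python
# Assumed Llama 3.1 8B dimensions (not stated in this tutorial).
HIDDEN = 4096
KV_DIM = 1024
MLP = 14336
LAYERS = 32
R = 16  # LoRA rank, matching r=16 in the configuration

# (input features, output features) of each targeted linear layer.
target_shapes = {
    "q_proj": (HIDDEN, HIDDEN),
    "k_proj": (HIDDEN, KV_DIM),
    "v_proj": (HIDDEN, KV_DIM),
    "o_proj": (HIDDEN, HIDDEN),
    "gate_proj": (HIDDEN, MLP),
    "up_proj": (HIDDEN, MLP),
    "down_proj": (MLP, HIDDEN),
}


def lora_params(in_features, out_features, rank):
    """LoRA adds A (rank x in) and B (out x rank) next to each frozen weight."""
    return rank * (in_features + out_features)


per_layer = sum(lora_params(i, o, R) for i, o in target_shapes.values())
total = per_layer * LAYERS

print(f"{total:,} trainable adapter parameters")  # 41,943,040
```

Under these assumptions the adapters amount to about half a percent of the roughly 8 billion base weights, which stay frozen during training.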

6. Model training

Set up the trainer with the model, tokenizer, dataset, and training arguments:

<code>from trl import SFTTrainer
from transformers import TrainingArguments
from unsloth import is_bfloat16_supported

trainer = SFTTrainer(
    model=model,
    tokenizer=tokenizer,
    train_dataset=dataset,
    dataset_text_field="text",
    max_seq_length=max_seq_length,
    dataset_num_proc=2,
    args=TrainingArguments(
        per_device_train_batch_size=2,
        gradient_accumulation_steps=4,
        # Use num_train_epochs = 1, warmup_ratio for full training runs!
        warmup_steps=5,
        max_steps=60,
        learning_rate=2e-4,
        fp16=not is_bfloat16_supported(),
        bf16=is_bfloat16_supported(),
        logging_steps=10,
        optim="adamw_8bit",
        weight_decay=0.01,
        lr_scheduler_type="linear",
        seed=3407,
        output_dir="outputs",  # assumed value: the original argument was truncated to "output_"
    ),
)</code>
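The training arguments imply a short run. The sketch below works out the effective batch size and reproduces a linear warmup-then-decay schedule by hand. The `linear_lr` helper is hypothetical and written only for illustration, assuming a single GPU and the usual linear-with-warmup shape (a ramp from zero to the peak rate, then a straight-line decay to zero).

```python
PER_DEVICE_BATCH = 2
GRAD_ACCUM = 4
MAX_STEPS = 60
WARMUP_STEPS = 5
PEAK_LR = 2e-4
DATASET_ROWS = 500  # from split="train[0:500]"

# One optimizer step consumes batch_size * accumulation_steps examples (single GPU assumed).
effective_batch = PER_DEVICE_BATCH * GRAD_ACCUM
examples_seen = effective_batch * MAX_STEPS


def linear_lr(step):
    """Hypothetical helper: learning rate at a given optimizer step."""
    if step < WARMUP_STEPS:
        return PEAK_LR * step / WARMUP_STEPS
    remaining = max(0, MAX_STEPS - step)
    return PEAK_LR * remaining / (MAX_STEPS - WARMUP_STEPS)


print(effective_batch, examples_seen, examples_seen / DATASET_ROWS)
for step in (0, 5, 30, 60):
    print(step, linear_lr(step))
```

With these settings the 60 steps cover 480 of the 500 loaded examples, so the run is just under one pass over the subset.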

Train the model:

<code>trainer_stats = trainer.train()</code>

(The images of the training progress from the original tutorial are omitted here.)

7. Inference after fine-tuning

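Compare results by querying the fine-tuned model with the same question as before. Observe the improvement in reasoning and in the conciseness of the response.

(The fine-tuned model's output from the original tutorial is omitted here.)

8. Saving and pushing the model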

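Save the model locally, then push it to the Hugging Face Hub:

<code>new_model_local = "DeepSeek-R1-Medical-COT"
# Save the fine-tuned model and tokenizer to a local directory
model.save_pretrained(new_model_local) 
tokenizer.save_pretrained(new_model_local)

# Save a merged 16-bit copy of the model
model.save_pretrained_merged(new_model_local, tokenizer, save_method = "merged_16bit",)

new_model_online = "kingabzpro/DeepSeek-R1-Medical-COT"
# Push the model and tokenizer to the Hugging Face Hub
model.push_to_hub(new_model_online)
tokenizer.push_to_hub(new_model_online)

# Push the merged 16-bit copy as well
model.push_to_hub_merged(new_model_online, tokenizer, save_method = "merged_16bit")</code>

(The images showing the successful save and push are omitted here.)

9. Deployment and conclusion

The tutorial closes by suggesting deployment options: serving the model with BentoML, or converting it to GGUF format for local use. It emphasizes the importance of open-source LLMs and notes OpenAI's countermoves, o3 and Operator AI.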
