午夜影院免费看,天天操操操操操操,亚洲国产成人久久综合一

在過去的幾年里，世代人工智能吸引了公眾的注意力和想象力。從給定的自然語言提示，這些生成模型能夠生成人類質(zhì)量的結(jié)果，從清晰表達(dá)的兒童故事到產(chǎn)品原型可視化。

大型語言模型（ LLM ）是這場(chǎng)革命的中心。 LLM 是一種通用的語言理解器，它將人類知識(shí)編纂成法典，可以很容易地應(yīng)用于許多自然語言和編程語言理解任務(wù)，開箱即用。其中包括摘要、翻譯、問題回答以及代碼注釋和完成。

單個(gè)基礎(chǔ)語言模型完成許多任務(wù)的能力開辟了一個(gè)全新的人工智能軟件范式，其中單個(gè)基礎(chǔ)模型可以用于滿足公司所有部門的多個(gè)下游語言任務(wù)。這簡化并降低了人工智能軟件開發(fā)、部署和維護(hù)的成本。

創(chuàng)建自定義大型語言模型簡介

盡管 LLM 強(qiáng)大且前景光明，但通過針對(duì)特定用例的零樣本或少量快照學(xué)習(xí)，與 LLM 現(xiàn)成的性能仍存在差距。特別是，零樣本學(xué)習(xí)性能往往很低且不可靠。另一方面，很少有鏡頭學(xué)習(xí)依賴于找到最佳的離散提示，這是一個(gè)不平凡的過程。

如 GPT Understands, Too 中所述，用于解決下游問題的提示模板的微小變化可能會(huì)對(duì)最終精度產(chǎn)生重大影響。此外，由于提示更大，少鏡頭推理的成本也更高。

已經(jīng)提出了參數(shù)有效的微調(diào)技術(shù)來解決這個(gè)問題。即時(shí)學(xué)習(xí)就是這樣一種技術(shù)，它將虛擬提示令牌附加到請(qǐng)求中。這些虛擬令牌是可學(xué)習(xí)的參數(shù)，可以使用標(biāo)準(zhǔn)優(yōu)化方法進(jìn)行優(yōu)化，而 LLM 參數(shù)是凍結(jié)的。

本文介紹了使用 NVIDIA NeMo 定制 LLM 的過程，這是一個(gè)用于訓(xùn)練、定制和部署基礎(chǔ)模型的通用框架。

什么是 NVIDIA NeMo ？

NVIDIA NeMo 是用于訓(xùn)練、定制和部署大型基礎(chǔ)模型的通用框架。 NeMo 利用各種并行技術(shù)來加速訓(xùn)練和推理，可以部署在用戶首選云、本地和邊緣系統(tǒng)上的多節(jié)點(diǎn)、多 GPU 系統(tǒng)上。要了解更多信息，請(qǐng)參閱 NVIDIA AI Platform Delivers Big Gains for Large Language Models 和 Accelerated Inference for Large Transformer Models Using NVIDIA Triton Inference Server 。

NeMo 生態(tài)系統(tǒng)由以下主要組成部分組成：

NVIDIA NeMo service ：通過 NVIDIA 管理的服務(wù)，為 LLM 的產(chǎn)品化提供快速途徑。開發(fā)人員可以利用 LLM 功能快速輕松地開發(fā)企業(yè)人工智能應(yīng)用程序，而無需擔(dān)心底層基礎(chǔ)設(shè)施。您還可以通過云 API 或網(wǎng)絡(luò)游樂場(chǎng)界面體驗(yàn)最大的語言模型之一 Megatron 530B 。目前處于早期訪問狀態(tài)。

NVIDIA NeMo framework ：一個(gè)端到端的容器化框架，允許開發(fā)人員高效地訓(xùn)練和部署具有數(shù)十億和數(shù)萬億參數(shù)的語言模型，在數(shù)千 GPU 秒內(nèi)提供高訓(xùn)練效率。 NeMo 框架容器目前位于 open beta 中，可通過 NGC 獲得。

NVIDIA/NeMo ：為研究語音人工智能和 NLP （包括 LLM ）的研究人員構(gòu)建的開源對(duì)話式人工智能工具包。可通過 GitHub 獲得。

NeMo 模型： NVIDIA 最近開放了源代碼的預(yù)訓(xùn)練 NeMo 框架模型，從 1.3B GPT-3 、 5B GPT-3 和 3B mT5 model 等小型模型到 20B GPT-3 等大型模型。

NVIDIA/FasterTransformer ：一個(gè)開源工具包，用于通過 GitHub 進(jìn)行 LLM 的高性能推理。要了解有關(guān)如何使用 Faster transformer 部署公共 NeMo 框架模型的更多信息，請(qǐng)參閱 Deploying a 1.3B GPT-3 Model with NVIDIA NeMo Megatron 。

這篇文章解釋了如何使用 NeMo 框架容器通過即時(shí)學(xué)習(xí)技術(shù)自定義公共 NeMo 模型。

使用 NeMo 快速學(xué)習(xí)

Prompt learning 統(tǒng)稱為兩參數(shù)高效微調(diào)技術(shù)，如下所述。有關(guān)更多信息，請(qǐng)參閱 Adapting P-Tuning to Solve Non-English Downstream Tasks 。

在提示調(diào)諧中，軟提示嵌入被初始化為 2D 矩陣。每個(gè)任務(wù)都有自己的 2D 嵌入矩陣。任務(wù)在訓(xùn)練或推理過程中不共享任何參數(shù)。所有 LLM 參數(shù)都被凍結(jié)，并且在訓(xùn)練期間僅更新每個(gè)任務(wù)的嵌入?yún)?shù)。 NeMo 提示調(diào)諧實(shí)現(xiàn)基于 The Power of Scale for Parameter-Efficient Prompt Tuning 。

在 p 調(diào)諧中， LSTM 模型或“提示編碼器”用于預(yù)測(cè)虛擬令牌嵌入。 LSTM 參數(shù)在 p 調(diào)諧開始時(shí)被隨機(jī)初始化。所有 LLM 參數(shù)都被凍結(jié)，并且在每個(gè)訓(xùn)練步驟僅更新 LSTM 權(quán)重。 LSTM 參數(shù)在同時(shí) p 調(diào)諧的所有任務(wù)之間共享，但 LSTM 模型為每個(gè)任務(wù)輸出唯一的虛擬令牌嵌入。 NeMo p 調(diào)諧實(shí)現(xiàn)基于 GPT Understands, Too 。

本例的即時(shí)學(xué)習(xí)使用 NeMo 生態(tài)系統(tǒng)的兩個(gè)開源組件： NeMo OSS 工具包和公共 NeMo 模型。

GitHub 上的 NeMo Multitask Prompt and P-Tuning 教程詳細(xì)介紹了在小型 GPT-3 345M 參數(shù)模型上進(jìn)行提示學(xué)習(xí)的過程。本教程演示了即時(shí)學(xué)習(xí)的端到端過程：下載和預(yù)處理數(shù)據(jù)、下載模型、訓(xùn)練即時(shí)學(xué)習(xí)模型，以及在三個(gè)不同的應(yīng)用程序上進(jìn)行推理。

下面的部分首先瀏覽筆記本，同時(shí)總結(jié)主要概念。然后，這個(gè)筆記本將被擴(kuò)展到對(duì)更大的 NeMo 模型進(jìn)行即時(shí)學(xué)習(xí)。

先決條件

您可以通過 NeMo Docker 容器體驗(yàn) NeMo 。這為 NeMo 的實(shí)驗(yàn)提供了一個(gè)自給自足和可再生的環(huán)境。 NeMo Multitask Prompt and P-Tuning 教程使用 NeMo 22.09 容器進(jìn)行了測(cè)試，但您可以嘗試相同容器的后續(xù)版本。使用以下腳本下載并運(yùn)行此容器：

docker run  -u $(id -u ${USER}):$(id -g ${USER}) --rm -it --net=host nvcr.io/nvidia/nemo:22.09 bash

然后從容器交互式 bash 環(huán)境中啟動(dòng) Jupyter 實(shí)驗(yàn)室：

cd /workspace
jupyter lab --ip 0.0.0.0 --allow-root --port=8888

在 Jupyter 實(shí)驗(yàn)室，您可以在/ workspace / NeMo / tutorial / nlp / Multitask _ Pompt _ and _ PTuning.ipynb 下找到 NeMo 示例，包括上述筆記本。

此外，您需要一個(gè) GPU 來處理較小的 5B 和 1.3B GPT-3 模型，需要四個(gè) NVIDIA Ampere architecture 或 NVIDIA Hopper architecture GPU 用于處理 20B 模型，因?yàn)樗哂兴膫€(gè)張量平行度（ TP ）。

數(shù)據(jù)準(zhǔn)備

筆記本將引導(dǎo)您完成三種不同應(yīng)用程序的數(shù)據(jù)收集和預(yù)處理過程： Financial PhraseBank dataset 用于情緒分析任務(wù)， SQuAD dataset 用于問答任務(wù)， Assistant Benchmarking dataset 用于意圖和時(shí)段分類任務(wù)。

數(shù)據(jù)集應(yīng)為. jsonl 格式，其中包含一組 JSON 對(duì)象。每個(gè) JSON 對(duì)象必須包括字段任務(wù)名稱，這是數(shù)據(jù)示例所對(duì)應(yīng)任務(wù)的字符串標(biāo)識(shí)符。每個(gè) JSON 對(duì)象還應(yīng)包括一個(gè)或多個(gè)字段，這些字段對(duì)應(yīng)于離散文本提示的不同部分。示例見圖 1 。

圖 1 。 NeMo 即時(shí)學(xué)習(xí)的數(shù)據(jù)集格式

提示模板

在形成提示時(shí)，您應(yīng)該確定并遵守一個(gè)模式。這種模式被稱為 prompt template ，并根據(jù)使用情況而變化。情緒分析的示例如下所示。

{
        "taskname": "sentiment",
        "prompt_template": "<|VIRTUAL_PROMPT_0|> {sentence} sentiment:{label}",
        "total_virtual_tokens": 10,
        "virtual_token_splits": [10],
        "truncate_field": None,
        "answer_only_loss": True,
        "answer_field": "label",
    }

提示包含開頭的所有 10 個(gè)虛擬標(biāo)記，然后是要分類的目標(biāo)句子。接下來是一個(gè)文本標(biāo)記（“sentiment:”），最后是用于訓(xùn)練的句子的標(biāo)簽。訓(xùn)練數(shù)據(jù) JSON 對(duì)象中的相應(yīng)字段將映射到此提示模板，以形成完整的訓(xùn)練示例。 NeMo 支持修剪特定字段以滿足模型令牌長度限制（使用 HuggingFace GPT-2 令牌化器的 NeMo 公共模型通常為 2048 個(gè)令牌）。

訓(xùn)練

默認(rèn)的 NeMo 提示調(diào)優(yōu)配置在 yaml 文件中提供，可通過 GitHub 上的 NVIDIA/NeMo 獲得。筆記本加載這個(gè) yaml 文件，然后覆蓋訓(xùn)練選項(xiàng)以適應(yīng) 345M GPT 模型。 NeMo p 調(diào)諧使得能夠同時(shí)學(xué)習(xí)多個(gè)任務(wù)。 NeMo 利用 PyTorch Lightning 接口，因此只需調(diào)用trainer.fit(model)語句即可完成訓(xùn)練。

推論

最后，一旦經(jīng)過訓(xùn)練，模型就可以通過調(diào)用model.generate(inputs=test_examples)語句來用于對(duì)新樣本的推理（省略“answer_field”）。

快速學(xué)習(xí)大型模型

筆記本電腦中演示的 345M GPT-3 模型過程可以應(yīng)用于更大的公共 NeMo GPT-3 型號(hào)，最多 1.3B GPT-3 和 5B GPT-3 。這種尺寸的型號(hào)只需要一個(gè)足夠內(nèi)存容量的 GPU ，例如 NVIDIA V100 、 NVIDIA A100 和 NVIDIA H100 。下載模型后，替換模型名稱；特別是在以下單元格中：

# Download the model from NGC
gpt_file_name = "megatron_gpt_345m.nemo"
!wget --content-disposition https://api.ngc.nvidia.com/v2/models/nvidia/nemo/megatron_gpt_345m/versions/1/files/megatron_gpt_345m.nemo -

不要從 NGC 下載 345M GPT 模型，而是按照 HuggingFace 上的說明下載 1.3B GPT-3 或 5B GPT-3 模型，然后將gpt_file_name變量指向。 NeMo 模型文件。

請(qǐng)注意，對(duì)于 5B 型號(hào)，有兩種變體，一種是 TP 度為 1 （ nemo_gpt5B_fp16_tp1.nemo ），另一種是 TP = 2 （ nemo_gpt5B_fp16_tp2.nemo, nemo_gpt5B_bf16_tp2.nemo ) ）。筆記本電腦只能支持 TP = 1 變體。在其他一切不變的情況下，您可以端到端執(zhí)行同一筆記本電腦。

多 – GPU 即時(shí)學(xué)習(xí)

由于 Jupyter 筆記本環(huán)境的限制，即時(shí)學(xué)習(xí)筆記本僅支持單次 – GPU 訓(xùn)練。針對(duì)更大的模型利用多 GPU 訓(xùn)練，具有更高程度的 TP （例如 20B GPT-3 為 4 ， 5B GPT-3 為其他變體為 2 ）需要使用不同的 NeMo prompt learning script 。此腳本受 config文件在這里可以找到許多參數(shù)的默認(rèn)值。

模型

本節(jié)演示了在作為提示學(xué)習(xí)筆記本一部分下載并預(yù)處理的輔助數(shù)據(jù)集上使用多個(gè) GPU 對(duì)大型模型進(jìn)行提示學(xué)習(xí)的過程。

您可以下載 TP = 2 的 5B GPT 型號(hào)（ nemo_gpt5B_fp16_tp2.nemo) ）或 TP = 4 的 20B GPT-3 型號(hào)。請(qǐng)注意，這些模型存儲(chǔ)在中。 NeMo 壓縮存檔。要大幅加快模型加載速度，請(qǐng)?zhí)崆敖鈮嚎s模型，并在 NeMo 配置中使用此解壓縮的文件夾。使用以下腳本：

tar -xvf nemo_gpt5B_fp16_tp2.nemo -C nemo_gpt5B_fp16_tp2.nemo.extracted

然后使用nemo_gpt5B_fp16_tp2.nemo.extracted NeMo 中提取的目錄nemo_gpt5B_fp16_tp2.nemo.extracted。

配置

適用于輔助數(shù)據(jù)集（意圖和插槽檢測(cè)應(yīng)用程序）的配置文件如下所示：

name: megatron_virtual_prompt_gpt

trainer:
  devices: 2
  accelerator: gpu
  num_nodes: 1
  precision: 16
  logger: False # logger provided by exp_manager
  enable_checkpointing: False
  replace_sampler_ddp: False
  max_epochs: 25 # min 25 recommended
  max_steps: -1 # consumed_samples = global_step * micro_batch_size * data_parallel_size * accumulate_grad_batches
  log_every_n_steps: 10 # frequency with which training steps are logged 
  val_check_interval: 1.0 # If is an int n > 1, will run val every n training steps, if a float 0.0 - 1.0 will run val every epoch fraction, e.g. 0.25 will run val every quarter epoch
  gradient_clip_val: 1.0
  resume_from_checkpoint: null # The path to a checkpoint file to continue the training, restores the whole state including the epoch, step, LR schedulers, apex, etc.
  benchmark: False


exp_manager:
  explicit_log_dir: null
  exp_dir: null
  name: ${name}
  create_wandb_logger: False
  wandb_logger_kwargs:
    project: null
    name: null
  resume_if_exists: True
  resume_ignore_no_checkpoint: True
  create_checkpoint_callback: True
  checkpoint_callback_params:
    monitor: val_loss
    save_top_k: 2
    mode: min
    save_nemo_on_train_end: False # Should be false, correct prompt learning model file is saved at model.nemo_path set below, 
    filename: 'megatron_gpt_prompt_tune--{val_loss:.3f}-{step}'
    model_parallel_size: ${model.tensor_model_parallel_size}
    save_best_model: True

model:
  seed: 1234
  nemo_path: ${name}.nemo # .nemo filename/absolute path to where the virtual prompt model parameters will be saved
  virtual_prompt_style: 'p-tuning' # one of 'prompt-tuning', 'p-tuning', or 'inference'
  tensor_model_parallel_size: 1 # intra-layer model parallelism
  pipeline_model_parallel_size: 1 # inter-layer model parallelism
  global_batch_size: 8
  micro_batch_size: 4

  restore_path: null # Path to an existing p-tuned/prompt tuned .nemo model you wish to add new tasks to or run inference with
  language_model_path: ??? # Path to the GPT language model .nemo file, always required
  save_nemo_on_validation_end: True # Saves an inference ready .nemo file every time a checkpoint is saved during training. 
  existing_tasks: [] # List of tasks the model has already been p-tuned/prompt-tuned for, needed when a restore path is given
  new_tasks: ['intent_and_slot'] # List of new tasknames to be prompt-tuned
  


  ## Sequence Parallelism
  # Makes tensor parallelism more memory efficient for LLMs (20B+) by parallelizing layer norms and dropout sequentially
  # See Reducing Activation Recomputation in Large Transformer Models: https://arxiv.org/abs/2205.05198 for more details.
  sequence_parallel: False

  ## Activation Checkpoint 
  activations_checkpoint_granularity: null # 'selective' or 'full' 
  activations_checkpoint_method: null # 'uniform', 'block', not used with 'selective'
  # 'uniform' divides the total number of transformer layers and checkpoints the input activation
  # of each chunk at the specified granularity
  # 'block' checkpoints the specified number of layers per pipeline stage at the specified granularity
  activations_checkpoint_num_layers: null # not used with 'selective'

  task_templates: # Add more/replace tasks as needed, these are just examples
  - taskname: "intent_and_slot"    
    prompt_template: "<|VIRTUAL_PROMPT_0|>Predict intent and slot: {utterance} nLabel:{label}"
    total_virtual_tokens: 10
    virtual_token_splits: [10]
    truncate_field: null
    answer_only_loss: False
    "answer_field": "label"


  prompt_tuning: # Prompt tunin specific params
    new_prompt_init_methods: ['text'] # List of 'text' or 'random', should correspond to tasks listed in new tasks
    new_prompt_init_text: ['some init text goes here'] # some init text if init method is text, or None if init method is random

  p_tuning: # P-tuning specific params
    encoder_type: "tpmlp" # ['tpmlp', 'lstm', 'biglstm', 'mlp'] 
    dropout: 0.0
    num_layers: 2  # number of layers for MLP or LSTM layers. Note, it has no effect for tpmlp currently as it always assumes it is two layers.
    encoder_hidden: 2048 # encoder hidden for biglstm and tpmlp
    init_std: 0.023  # init std for tpmlp layers

  data:
    train_ds: ???
    validation_ds: ???
    add_eos: True
    shuffle: True
    num_workers: 8
    pin_memory: True
    train_cache_data_path: null  # the path to the train cache data 
    validation_cache_data_path: null  # the path to the validation cache data 
    test_cache_data_path: null  # the path to the test cache data 
    load_cache: False  # whether to load from the cache data


  optim:
    name: fused_adam
    lr: 1e-4
    weight_decay: 0.01 
    betas: 
    - 0.9
    - 0.98
    sched:
      name: CosineAnnealing
      warmup_steps: 50
      min_lr: 0.0 # min_lr must be 0.0 for prompt learning when pipeline parallel > 1
      constant_steps: 0 # Constant steps should also be 0 when min_lr=0
      monitor: val_loss
      reduce_on_plateau: false

得益于 yaml 文本格式和注釋，大多數(shù)超參數(shù)都是不言自明的。使用 Jupyter 實(shí)驗(yàn)室界面，創(chuàng)建一個(gè)包含此內(nèi)容的文件，并將其保存在/workspace/nemo/examples/nlp/language_modeling/conf/megatron_gpt_prompt_learning_intent_n_slot.yaml下。

config文件中最重要的是如下所示的提示模板：

 prompt_template: "<|VIRTUAL_PROMPT_0|>Predict intent and slot: {utterance} nLabel:{label}"
    total_virtual_tokens: 10
    virtual_token_splits: [10]
    truncate_field: null

這里， 10 個(gè)虛擬提示令牌與一些永久文本標(biāo)記一起使用。

訓(xùn)練

要開始培訓(xùn)，請(qǐng)?jiān)?Jupyter 實(shí)驗(yàn)室界面中打開一個(gè)終端窗口（文件→ 新建→ 終端）。然后發(fā)出 bash 命令：

python /workspace/nemo/examples/nlp/language_modeling/megatron_gpt_prompt_learning.py 
    	--config-name=megatron_gpt_prompt_learning_intent_n_slot.yaml 
    	trainer.devices=2 
    	trainer.num_nodes=1 
    	trainer.max_epochs=25 
    	trainer.precision=bf16 
    	model.language_model_path=/workspace/nemo/tutorials/nlp/nemo-megatron-gpt-5B/nemo_gpt5B_fp16_tp2.nemo.extracted 
    	model.nemo_path=/workspace/nemo/examples/nlp/language_modeling/intent_n_slot.nemo 
    	model.tensor_model_parallel_size=2 
    	model.pipeline_model_parallel_size=1 
    	model.global_batch_size=16 
    	model.micro_batch_size=1 
    	model.optim.lr=1e-4 
    	model.data.train_ds=[/workspace/nemo/tutorials/nlp/data/assistant/assistant_train.jsonl] 
    	model.data.validation_ds=[/workspace/nemo/tutorials/nlp/data/assistant/assistant_val.jsonl]

請(qǐng)注意以下內(nèi)容：

對(duì)于 5B GPT 模型（ nemo_gpt5B_fp16_tp2.nemo) ），model.tensor_model_parallel_size應(yīng)設(shè)置為 2 ，對(duì)于 20B GPT-3 模型，應(yīng)設(shè)置為 4

trainer.devices應(yīng)設(shè)置為 TP 值的倍數(shù)。如果 5B 模型為 4 ，則將有兩個(gè)數(shù)據(jù)并行工作者，每個(gè)工作者有兩個(gè) GPU

model.language_model_path應(yīng)設(shè)置為模型提取目錄的絕對(duì)路徑

model.data.train_ds、model.data.validation_ds應(yīng)設(shè)置為列車位置和驗(yàn)證數(shù)據(jù)

推論

最后，經(jīng)過訓(xùn)練后，使用以下腳本在 NeMo 中進(jìn)行推理：

python /workspace/nemo/examples/nlp/language_modeling/megatron_gpt_prompt_learning_eval.py 
            virtual_prompt_model_file=/workspace/nemo/examples/nlp/language_modeling/intent_n_slot.nemo 
            gpt_model_file=/workspace/nemo/tutorials/nlp/nemo-megatron-gpt-5B/nemo_gpt5B_fp16_tp2.nemo.extracted  
            inference.greedy=True 
            inference.add_BOS=False 
            inference.tokens_to_generate=128 
            trainer.devices=2 
            trainer.num_nodes=1 
            tensor_model_parallel_size=2 
            pipeline_model_parallel_size=1 
            data_paths=["/workspace/nemo/tutorials/nlp/data/assistant/assistant_test.jsonl"] 
            pred_file_path="test-results.txt"

請(qǐng)注意以下內(nèi)容：

對(duì)于 5B GPT 模型（ nemo_gpt5B_fp16_tp2.nemo) ），model.tensor_model_parallel_size應(yīng)設(shè)置為 2 ，對(duì)于 20B GPT-3 模型，應(yīng)設(shè)置為 4

trainer.devices應(yīng)設(shè)置為等于 TP 值（如上）

pred_file_path是記錄測(cè)試結(jié)果的文件，每個(gè)測(cè)試樣本一行

聲明：本文內(nèi)容及配圖由入駐作者撰寫或者入駐合作網(wǎng)站授權(quán)轉(zhuǎn)載。文章觀點(diǎn)僅代表作者本人，不代表電子發(fā)燒友網(wǎng)立場(chǎng)。文章及其配圖僅供工程師學(xué)習(xí)之用，如有內(nèi)容侵權(quán)或者其他違規(guī)問題，請(qǐng)聯(lián)系本站處理。舉報(bào)投訴