Question about a bottleneck when running GPUs in parallel


I'm writing with a question about GPUs for deep learning.


  • Server model: DL380a Gen11
  • OS : Ubuntu 22.04
  • Python 3.11.9


The driver was at version 535.129.03 / CUDA Version: 12.2. After updating with the CUDA Toolkit:

  • NVIDIA Driver Version: 555.42.02
  • CUDA Version: 12.5

  • H100 80GB × 2
  • Llama-3-8B processing time: 2.4 s

  • H100 80GB × 1
  • Llama-3-8B processing time: 0.5 s

The test code is below:


```python
from transformers import AutoTokenizer, AutoModelForCausalLM, TextIteratorStreamer, TextStreamer
import torch
from threading import Thread
import gradio as gr
import time
#import accelerate_speedup

torch.manual_seed(42)

model_id = "meta-llama/Meta-Llama-3-70B-Instruct"
#model_id = "meta-llama/Meta-Llama-3-8B-Instruct"

tokenizer = AutoTokenizer.from_pretrained(model_id)

#max_memory_mapping = {0: "80GB", 1: "80GB"}
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,
    device_map="auto",  # let accelerate distribute the layers over the visible GPUs
    #device_map="balanced_low_0",
    trust_remote_code=True,
    attn_implementation="flash_attention_2",
    low_cpu_mem_usage=True,  # trailing comma so the next line can be uncommented
    #max_memory=max_memory_mapping,
).eval()

# Stop generation at either the generic EOS token or Llama 3's end-of-turn token.
terminators = [
    tokenizer.eos_token_id,
    tokenizer.convert_tokens_to_ids("<|eot_id|>"),
]


### base inference
def chat(question):
    messages = [
        #{"role": "system", "content": "You are AI chatbot. You are honest, do not harm others, and help users."},
        {"role": "system", "content": "Please try to provide useful, helpful answers."},
        {"role": "user", "content": question},
    ]

    # Build the Llama 3 chat prompt as token IDs and move them to model.device.
    input_ids = tokenizer.apply_chat_template(
        messages,
        add_generation_prompt=True,
        return_tensors="pt"
    ).to(model.device, non_blocking=True)

    outputs = model.generate(
        input_ids,
        max_new_tokens=1024,
        eos_token_id=terminators,
        do_sample=True,
        temperature=0.05,
        top_p=0.95,
    )
    # Keep only the newly generated tokens (drop the prompt).
    response = outputs[0][input_ids.shape[-1]:]
    #print(tokenizer.decode(response, skip_special_tokens=True))
    return tokenizer.decode(response, skip_special_tokens=True)


# Time 100 sequential requests and report the average latency.
response_times = []
for _ in range(100):
    start_time = time.time()
    #tmp = chat('hello.')
    tmp = chat('hello!')
    #tmp = chat('Testing. Please answer in 10,000 characters.')
    end_time = time.time()
    print((end_time - start_time))
    response_times.append(end_time - start_time)

print(f"Average Response Time: {sum(response_times) / len(response_times):.2f} seconds")
```
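The timing loop above uses `time.time()` and averages in the very first call, which also pays one-off start-up costs (for example CUDA initialization). A minimal, stdlib-only harness is sketched below under those assumptions: it runs a few untimed warm-up calls, uses `time.perf_counter()`, and reports the median alongside the mean. The `benchmark` function and its `sync` hook are hypothetical names, not part of transformers; passing `torch.cuda.synchronize` as `sync` is one way to make sure queued GPU work is counted.

```python
import statistics
import time
from typing import Callable, Optional


def benchmark(fn: Callable[[], object], runs: int = 100, warmup: int = 3,
              sync: Optional[Callable[[], None]] = None) -> dict:
    """Time fn() `runs` times after `warmup` untimed calls.

    `sync` is an optional hook (for example torch.cuda.synchronize) called
    around each measurement so pending GPU work is included in the timing.
    """
    for _ in range(warmup):
        fn()  # untimed: the first calls absorb one-off start-up costs
    samples = []
    for _ in range(runs):
        if sync is not None:
            sync()
        start = time.perf_counter()  # monotonic, high-resolution clock
        fn()
        if sync is not None:
            sync()
        samples.append(time.perf_counter() - start)
    return {
        "mean": statistics.mean(samples),
        "median": statistics.median(samples),
        "min": min(samples),
        "max": max(samples),
    }


if __name__ == "__main__":
    # Stand-in workload; in the real test this would be lambda: chat("hello!").
    stats = benchmark(lambda: sum(range(10_000)), runs=20, warmup=2)
    print(f"median {stats['median'] * 1e3:.3f} ms, mean {stats['mean'] * 1e3:.3f} ms")
```

Because the script samples (`do_sample=True`), reply lengths can differ between calls, so dividing each time by the number of generated tokens may give a fairer comparison between the 1-GPU and 2-GPU runs.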

First, take a look at the DL380 Gen 11 system diagram.
That's an unreasonable delay.
It doesn't look like a memory problem.
How are the GPUs attached over PCIe?
PCIe 5.0 x16 supports up to 64 GB/s... For now, that looks like the bottleneck.
Can't each card be given two PCIe 5.0 x16 links? That would get you up to 128 GB/s.
For now, the only alternative seems to be installing NVLink.
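For reference, the 64 GB/s figure is the raw one-direction rate of a PCIe 5.0 x16 link: 32 GT/s per lane × 16 lanes ÷ 8 bits. The sketch below also applies the 128b/130b line encoding, which brings it to about 63 GB/s per direction, and adds a bandwidth-only transfer-time estimate. The function names are hypothetical, and the 4096-wide bf16 hidden state is an assumed example of the kind of activation that crosses the link when a model's layers are split between two GPUs; real copies also pay latency and synchronization costs that this lower bound ignores.

```python
# Theoretical PCIe link bandwidth per direction (ignores packet/protocol overhead).
GT_PER_LANE = {3: 8.0, 4: 16.0, 5: 32.0}  # giga-transfers/s per lane, by generation


def pcie_bandwidth_gbps(gen: int, lanes: int) -> float:
    """Return one-direction bandwidth in GB/s (10**9 bytes/s) for PCIe 3.0 to 5.0."""
    encoding = 128 / 130  # 128b/130b line encoding used by PCIe 3.0 through 5.0
    bits_per_s = GT_PER_LANE[gen] * 1e9 * lanes * encoding
    return bits_per_s / 8 / 1e9


def transfer_time_us(num_bytes: int, gbps: float) -> float:
    """Bandwidth-only lower bound on copy time in microseconds (no latency, no sync)."""
    return num_bytes / (gbps * 1e9) * 1e6


x16 = pcie_bandwidth_gbps(5, 16)
print(f"PCIe 5.0 x16: {x16:.1f} GB/s per direction")  # → PCIe 5.0 x16: 63.0 GB/s per direction

# Assumed example: one token's hidden state, hidden size 4096, 2 bytes per value (bf16).
hidden_state_bytes = 4096 * 2
print(f"{hidden_state_bytes} B -> {transfer_time_us(hidden_state_bytes, x16):.2f} us")
```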


#max_memory_mapping = {0: "80GB", 1: "80GB"}: try lowering this a bit,
to something like {0: "30GB", 1: "30GB"}, so it suits the PCIe bandwidth.
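The `max_memory` argument caps how much of each device the loader may use when it builds the device map. The sketch below is a deliberately simplified, hypothetical model of that effect (an in-order fill of layers under per-device caps); it is not accelerate's actual placement algorithm, which also reserves headroom and, for settings such as `"auto"` or `"balanced"`, can spread layers evenly across GPUs. The layer sizes are made-up numbers.

```python
from typing import Dict, List


def place_layers(layer_gb: List[float], max_memory_gb: Dict[int, float]) -> Dict[int, List[int]]:
    """Greedy, in-order placement of layers onto devices under per-device caps.

    Hypothetical, simplified model: fill device 0 until the next layer would
    exceed its cap, then move to device 1, and so on. Layers that fit on no
    device go under the key -1 (this sketch's label for "offloaded").
    """
    placement: Dict[int, List[int]] = {dev: [] for dev in max_memory_gb}
    placement[-1] = []
    devices = sorted(max_memory_gb)
    dev_idx, used = 0, 0.0
    for i, size in enumerate(layer_gb):
        # Advance to the next device while the current one cannot take this layer.
        while dev_idx < len(devices) and used + size > max_memory_gb[devices[dev_idx]]:
            dev_idx, used = dev_idx + 1, 0.0
        if dev_idx == len(devices):
            placement[-1].append(i)  # no GPU capacity left
        else:
            placement[devices[dev_idx]].append(i)
            used += size
    return placement


layers = [0.5] * 32  # made-up: 32 layers of 0.5 GB each (about 16 GB total)
for caps in ({0: 30, 1: 30}, {0: 8, 1: 8}):
    p = place_layers(layers, caps)
    print(caps, {dev: len(idx) for dev, idx in p.items()})
```

In this toy model, about 16 GB of layers (roughly the size of an 8B-parameter model in bf16) still land entirely on GPU 0 under 30 GB caps, while a model of roughly 140 GB, such as a 70B model in bf16, would not fit in 2 × 30 GB and would spill into the offload bucket.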
epowergate 06-04
There's no reason for the program to be slower, but there's no reason for it to be faster either.
The code runs sequentially, so it can't get any faster.
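The point about sequential execution can be made concrete with simple arithmetic. Assuming the model's layers are split across the two GPUs and one request is generated at a time, each decoding step runs the first GPU's layers, copies the hidden state over, then runs the second GPU's layers, so the stages cannot overlap for a single sequence. The function name and the per-stage timings below are hypothetical and only illustrate the shape of the argument, not measurements.

```python
from typing import Sequence


def per_token_latency_ms(stage_ms: Sequence[float], hop_ms: float = 0.0) -> float:
    """Latency of one decoding step when the layers run as sequential stages.

    For a single sequence each stage waits for the previous one, so stage
    times add up, and every boundary between stages adds one hop (the
    device-to-device copy plus synchronization).
    """
    return sum(stage_ms) + hop_ms * (len(stage_ms) - 1)


# Hypothetical numbers: one decoding step that needs 20 ms of compute in total.
one_gpu = per_token_latency_ms([20.0])               # all layers on one GPU
two_gpus = per_token_latency_ms([10.0, 10.0], 0.5)   # half the layers per GPU
print(one_gpu, two_gpus)  # → 20.0 20.5
```

Splitting the layers halves the work per GPU but not the time per step; overlap only appears when several requests or micro-batches are in flight (pipelining), or with tensor parallelism, which splits each layer's matrix multiplications across GPUs instead.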
박문형 06-04
Have you asked HPE technical support?
ikaros7 06-04
I haven't looked at the code, but for a model like Llama 3 8B that fits entirely on one GPU, wouldn't it be faster to just run it on a single GPU?
The bus communication overhead should be considerable.
Multiple GPUs are usually used for training, when a single GPU doesn't have enough memory.
By the way, the H100 really is fast. Haha
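The suggestion to use a single GPU whenever the model fits can be sketched as a small decision helper. The helper name and the 0.8 headroom factor (leaving room for the KV cache and activations) are assumptions; `{"": 0}` is the transformers/accelerate `device_map` form that places the whole model on GPU 0. Parameter counts are approximate: about 8 billion for Llama-3-8B and about 70 billion for Llama-3-70B, at 2 bytes per parameter in bf16.

```python
from typing import Dict, Union


def choose_device_map(num_params: float, bytes_per_param: int, gpu_mem_gb: float,
                      headroom: float = 0.8) -> Union[Dict[str, int], str]:
    """Pin the whole model to GPU 0 if its weights fit, else let the loader split it.

    `headroom` is an assumed fraction of GPU memory allowed for weights; the
    rest is left for the KV cache and activations during generation.
    """
    weights_gb = num_params * bytes_per_param / 1e9
    if weights_gb <= gpu_mem_gb * headroom:
        return {"": 0}  # every module on cuda:0, so no cross-GPU traffic
    return "auto"       # too big for one GPU: spread layers over the GPUs


print(choose_device_map(8.0e9, 2, 80))   # ~8B params in bf16 → {'': 0}
print(choose_device_map(70.0e9, 2, 80))  # ~70B params in bf16 → auto
```

The result can be passed as `device_map=` to `AutoModelForCausalLM.from_pretrained`; setting `CUDA_VISIBLE_DEVICES=0` before launching the script is another way to keep the process on one GPU.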

