chenyu
5ac1fa933f
apply the same fix_bf16 in llama and coder (#3789)
* apply the same fix_bf16 in llama and coder
did not realize the same logic was in llama too.
really fix #2775
* flag for native SUPPORT_BF16 cast
2024-03-17 21:25:24 -04:00
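A minimal sketch of the pattern these fix_bf16 commits converge on, assuming a hypothetical supports_bf16 flag (the real flag and helper live in the llama/coder loading code, and the exact call chain may differ): a backend with a native bf16 cast converts directly, while the rest widen the raw bits, since bfloat16 is just the top 16 bits of a float32.

```python
from tinygrad import Tensor, dtypes

def fix_bf16_sketch(t: Tensor, supports_bf16: bool) -> Tensor:
    assert t.dtype == dtypes.bfloat16
    if supports_bf16:
        # hypothetical flag: a backend with a native bf16 cast converts directly
        return t.cast(dtypes.float32)
    # otherwise reinterpret the 16 raw bits, widen them to 32 bits, and
    # shift into the top half of a float32 (bf16 is a truncated float32)
    return (t.bitcast(dtypes.uint16).cast(dtypes.uint32) * (1 << 16)).bitcast(dtypes.float32)
```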
chenyu
639bd5dbfc
move bf16 cast hack to Tensor.llvm_bf16_cast (#3788)
2024-03-17 18:51:22 -04:00
chenyu
9255332d9e
use llvm as bridge to fix_bf16 loading (#3774)
This is how the bf16 load is tested in test_bf16_disk_write_read now, and it should fix #2775.
I tested that it fixes loading coder with the PYTHON backend.
Will separate this special bf16 load from regular bf16 support.
2024-03-16 15:22:19 -04:00
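A hedged sketch of the bridge idea (the shipped helper is Tensor.llvm_bf16_cast from the entry above; its actual implementation may differ): route the tensor through the LLVM backend, which understands bf16 natively, cast it there, then move the widened result back to the original device.

```python
from tinygrad import Tensor, dtypes

def bridge_bf16_cast(t: Tensor, dtype=dtypes.float32) -> Tensor:
    # a backend without bf16 support can't interpret the raw disk bytes;
    # LLVM can, so cast there and ship the float32 result back
    assert t.dtype == dtypes.bfloat16
    return t.to("LLVM").cast(dtype).to(t.device)
```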
chenyu
f88506e630
move gpt2/llama sampling inside the model call (#3013)
* move gpt2/llama sampling inside the model call
* argmax uses one more kernel
2024-01-04 17:01:50 -05:00
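A minimal sketch of what "sampling inside the model call" buys, with made-up logits; sample here is a stand-in for logic that now lives inside the gpt2/llama call, not the actual code. Greedy decode becomes one extra argmax kernel on device instead of copying the full logits back to the host.

```python
from tinygrad import Tensor

def sample(logits: Tensor, temperature: float = 0.0) -> Tensor:
    # runs on device as part of the model call; the argmax is the
    # "one more kernel" from the commit message
    if temperature == 0.0:
        return logits.argmax(-1)
    return (logits / temperature).softmax().multinomial()

logits = Tensor([[0.1, 2.0, -1.0]])  # toy vocabulary of 3 tokens
print(sample(logits).item())         # greedy pick -> 1
```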
George Hotz
64dded27f0
pad ops broke coder (#2881)
* pad ops broke coder
* that contiguous fixes it
* Update lazy.py
2023-12-20 17:03:41 -08:00
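A hedged sketch of the workaround pattern named in the commit bullets (the actual fix landed in lazy.py, and the real coder graph is far larger): forcing a contiguous buffer after a pad realizes it, so downstream kernels read plain memory instead of fusing through the padded view.

```python
from tinygrad import Tensor

x = Tensor.arange(6).reshape(2, 3)
padded = x.pad(((0, 0), (1, 1)))  # lazy pad; fusing through this is what broke
y = padded.contiguous() * 2       # contiguous materializes the padded buffer first
print(y.numpy())
```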
George Hotz
0fd44259cd
bf16 fix + cleanups from mixtral (#2698)
* bf16 fix + cleanups from mixtral
* generic bf16 cast
2023-12-10 16:31:52 -08:00
George Hotz
9d7ead84e1
hotfix: no need for model cache in examples/coder.py
2023-12-05 16:27:36 -08:00
George Hotz
7170a9a057
coder.py can write and run code (#2439)
* wip mistral
* coder
* touchups
* cleanups
* mistral cleanups
* clean up cache create
* download the weights, fix tests
* fix llama loading
* global fixup
* clean up all
* move llama model
* cleanups
* Revert "cleanups"
This reverts commit a71c5d59eb86290634a258704d8bab2378b8d63d.
* fine, leave it
2023-11-25 12:27:54 -08:00