23 Commits

Author SHA1 Message Date
George Hotz e79a11b99c hotfix: revert llama change 2024-04-10 20:13:15 -07:00
George Hotz 2e6c39b0b2
Do less realizes (#4141)
* less realize

* corealize jit inputs

* prints

* print before we run
2024-04-10 19:50:50 -07:00
chenyu f8dc82a8a7
use single tensor for llama kv cache (#4108)
similar to optimization in gpt2
2024-04-08 00:38:32 -04:00
chenyu 92c0675ccf
setitem initial support (#4093)
* wip setitem

it's an eager assign to output shapetracker view

* cleanups and tests

* more cleanups
2024-04-07 20:35:22 -04:00
George Hotz 4c4d3cb3e3
restrict assignment to base (#3809)
* restrict assignment to base

* add some restrictions there

* more restrictions
2024-03-18 15:33:06 -07:00
chenyu 5ac1fa933f
apply the same fix_bf16 in llama and coder (#3789)
* apply the same fix_bf16 in llama and coder

did not realize the same logic was in llama too.
really fix #2775

* flag for native SUPPORT_BF16 cast
2024-03-17 21:25:24 -04:00
George Hotz 641f347232
simple LoadOps.ASSIGN (#3745)
* simple LoadOps.ASSIGN

* skip that test

* don't assign in onnx ops gemm

* track cache usage

* recreate the lazybuffer to avoid the cache

* fix contigs

* skip that test

* lol

* better letters
2024-03-14 20:44:34 -07:00
George Hotz a72b1b6d65
sharding for llama (#3151)
* shard llama

* sharding works

* simpler

* simpler

* consume option

* disable that test

* save a line

---------

Co-authored-by: George Hotz <george@tinygrad.org>
2024-01-16 19:28:00 -08:00
Yixiang Gao 13e872b53f
add multigpu support for llama attention (#3064)
* add llama attention test for multigpu

* test fails

* kv cache trying to shrink on sharded axis

* mask None works for scale dot product

* kv cache seems to be working but scale dot product breaks

* scaled dot product works, but the last linear layer failed

* running into the reshape case where it could be wrong for multigpu

* making sure it was the reshape

* adding contiguous doesn't solve

* need to shard more properly

* remove reshape test

* minor adjustment to scale dot product attention test

* weights are sharded wrong

* continue fix new weight sharding

* clean up

* fix attention when start_pos is 0

* remove print

* add TODOs for the best multigpu interface
2024-01-11 16:31:02 -08:00
chenyu c9371f0d31
hotfix llama conversation mode (#3031)
without contiguous on keys and values, it runs but the update is incorrect
2024-01-06 16:57:07 -05:00
chenyu f88506e630
move gpt2/llama sampling inside the model call (#3013)
* move gpt2/llama sampling inside the model call

* argmax uses one more kernel
2024-01-04 17:01:50 -05:00
chenyu ad4472e6e8
cleanup llama apply_rotary_emb and other helpers (#2950)
* cleanup llama apply_rotary_emb and other helpers

used ellipsis and other higher-level tensor functions.
disabled the half @ half -> half tensor core as it fails uop dtype checks

* keep hip 8x8->8 wmma
2023-12-29 11:39:15 -05:00
chenyu 61e255d197
use max for gpt2 and llama (#2949)
not using argmax yet because there's a multinomial outside of function.
2023-12-28 23:26:00 -05:00
chenyu 1fb815e77e
hotfix fix coder. RMSNorm cannot have float16 input (#2932)
* hotfix fix coder. RMSNorm cannot have float16 input

* update real world test due to new kernels

* more type casts
2023-12-25 02:28:11 -05:00
chenyu b55b55d56e
use at least int32 and uint32 for sum output (#2926)
* use at least int32 and uint32 for sum output

* use the correct type for acc

* fix opencl

* llvm mulacc
2023-12-24 01:14:54 -05:00
George Hotz 64dded27f0
pad ops broke coder (#2881)
* pad ops broke coder

* that contiguous fixes it

* Update lazy.py
2023-12-20 17:03:41 -08:00
George Hotz 1765849937
new lazy, benchmark (#2878)
* lazy rewrite, try 2

* min fix tests

* pass contig test

* put broken pads back

* move that to realize

* no contig child fixes array packing

* so wrong

* now that's correct

* base children

* fix bind issues

* disable to_image_idx

* fix tests

* that failure shouldn't break other tests

* more fixes

* fix torch

* skip failing tests in CI

* 1e-7

* half is broken

* 1e-6 margin of error
2023-12-20 14:33:21 -08:00
chenyu c0f76ed4ea
transformer kvcache and mask have same dtype as input (#2771)
* transformer kvcache and mask have same dtype as input

* don't use `=0` in cstyle ternary where

* (bool)

* where float16 test
2023-12-14 22:41:51 -05:00
George Hotz b3982187d1
Mixtral Example (#2691)
* mixtral

* simpler

* global counters

* simpler

* weights arg
2023-12-10 17:18:31 -08:00
chenyu 539b00a645
move llama getenv("JIT") from models to examples (#2671)
Transformer class has a jit param so we should use that in the caller
2023-12-07 12:43:22 -05:00
chenyu 6ba6349c97 JIT=0 llama.py should not jit (#2609) 2023-12-04 20:21:07 -05:00
Davi Silva ddeec24fa8
Cleanup & fix llama.py (#2524)
* docs, cleanup crap

* comma AI

* fix 70B

* this is why lexical scope exists
2023-11-30 16:00:17 -05:00
George Hotz 7170a9a057
coder.py can write and run code (#2439)
* wip mistral

* coder

* touchups

* cleanups

* mistral cleanups

* clean up cache create

* download the weights, fix tests

* fix llama loading

* global fixup

* clean up all

* move llama model

* cleanups

* Revert "cleanups"

This reverts commit a71c5d59eb86290634a258704d8bab2378b8d63d.

* fine, leave it
2023-11-25 12:27:54 -08:00