Tobias Fischer
f9e32f2bb2
clip device fix ( #6924 )
2024-10-07 00:47:32 +08:00
chenyu
01a2d7316d
dtype=float in bert log_softmax for loss and accuracy ( #6916 )
2024-10-06 11:15:56 -04:00
George Hotz
8ca506ee37
remove the magic methods for moving between devices [pr] ( #6881 )
...
* remove the magic methods for moving between devices [pr]
* remove unneeded clang
2024-10-04 20:27:52 +08:00
George Hotz
f4ec39fe58
switch symbolic from old to uops, final PR ( #6872 )
...
* switch symbolic from old to uops, final PR
* two wrong answers
* not needed resolves
* symbolic ops passes
* symbolic ops passes
* progress
* tests pass (almost)
* fix last test
* fix some tests
* global binding and unbinding
* Revert "global binding and unbinding"
This reverts commit 9456725630316487509980af20c6d2981de00bec.
* that test works now
* vars on uop doesn't recurse
* fix fuzzer
* update
* fix type
* fix gpt, it's UOp now
* ssimplify symbolics
2024-10-04 16:42:27 +08:00
chenyu
c3c93f332a
symbolic bool raise ValueError when not sure [pr] ( #6853 )
2024-10-02 09:10:58 -04:00
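The entry above makes a symbolic bool raise ValueError when the comparison cannot be decided, rather than silently picking an answer. Below is a standalone illustrative sketch of that behavior; the SymBool class and its lo/hi bounds are hypothetical and not tinygrad's actual symbolic/UOp types.

```python
# Illustrative sketch only (not tinygrad's symbolic classes): a comparison whose
# truth value depends on an unbound variable refuses to collapse to a plain bool.
class SymBool:
  def __init__(self, lo, hi):
    # lo/hi bound the truth value the expression can take
    self.lo, self.hi = lo, hi
  def __bool__(self):
    if self.lo == self.hi: return self.lo        # provably True or provably False
    raise ValueError("symbolic bool is not decidable; resolve it explicitly")

assert bool(SymBool(True, True)) is True
try:
  bool(SymBool(False, True))   # e.g. "x < 5" where x ranges over [0, 10)
except ValueError as e:
  print("raised as expected:", e)
```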
Tobias Fischer
33f7599158
Compute FID Score ( #6802 )
...
* compute fid score code
* cleaner s1 and m1 loading
2024-10-01 19:47:58 -04:00
chenyu
396c96357b
update mlperf bert scripts ( #6755 )
...
removed DISABLE_DROPOUT=1.
updated BS to 54, which works on tinyboxes with dropout enabled.
used bert's sparse_categorical_crossentropy, which takes a Tensor ignore_index, in the accuracy method
2024-09-25 23:55:05 -04:00
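The body above computes accuracy with an ignore_index so masked or padded positions do not count. A minimal numpy sketch of that masking idea, independent of tinygrad's actual sparse_categorical_crossentropy API; masked_accuracy and IGNORE_INDEX are illustrative names.

```python
import numpy as np

IGNORE_INDEX = -1  # illustrative value; positions with this label don't count

def masked_accuracy(logits: np.ndarray, labels: np.ndarray) -> float:
  # logits: (N, num_classes); labels: (N,), with IGNORE_INDEX marking padded slots
  preds = logits.argmax(axis=-1)
  valid = labels != IGNORE_INDEX
  return float((preds[valid] == labels[valid]).mean())

logits = np.array([[2.0, 0.1], [0.2, 1.5], [3.0, 0.0]])
labels = np.array([0, 1, IGNORE_INDEX])
print(masked_accuracy(logits, labels))  # 1.0: the ignored position is excluded
```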
samm393
19c11792fd
Flux.1 ( #6334 )
...
* initial commit
* whitespace
* get rid of torch import
* indentation
* less hardcoding
* add flux.1-dev
* jit
* no double
* t5 tidy up
* validation image
* reuse sdxl autoencoder
* typing changes
* empty lines
* remove unneeded comments
---------
Co-authored-by: George Hotz <72895+geohot@users.noreply.github.com>
2024-09-24 10:08:04 +08:00
Tobias Fischer
c1bbd15bd9
Sharded SDXL Inference ( #6328 )
...
* initial sharding fixes
* sigma device fix
* emptyline space fix
---------
Co-authored-by: chenyu <chenyu@fastmail.com>
2024-09-21 01:26:43 -04:00
George Hotz
8f6d0485e7
hotfix: resnet to obj.device
2024-09-06 13:06:02 +08:00
George Hotz
9d72119a0c
minor resnet cleanups ( #6382 )
...
* minor resnet cleanups
* that should have been long
* jit
* meh
2024-09-06 12:50:21 +08:00
Tobias Fischer
3517aa89d9
sdxl batched inference fixes ( #6293 )
2024-08-28 07:44:58 -04:00
Tobias Fischer
211bfb6d8a
fixed batched clip computation ( #6292 )
2024-08-26 20:48:15 -04:00
Tobias Fischer
331b0f5477
new clip gather ( #6277 )
2024-08-25 19:27:24 -04:00
chenyu
e6c7c3e499
update pylint path to check indent/space for all ( #6022 )
...
also fixed many errors. it was not checking nested dirs. exclude autogen for now.
can we use ruff for this?
2024-08-10 14:41:09 -04:00
wozeparrot
d269bc95fa
faster tinychat ( #5993 )
2024-08-08 19:16:26 -07:00
George Hotz
bf8ec23b00
hotfix: contiguous on precompute_freqs_cis
2024-08-07 14:40:56 -07:00
David Hou
9a485f36e4
shard kvcache ( #5830 )
2024-07-30 20:29:54 -07:00
George Hotz
4e89d45513
hotfix: put contiguous back in llama
2024-07-30 18:43:48 -07:00
George Hotz
21c5e8e1b7
extreme llama speed, 57.34 tok/s ( #5827 )
...
* extreme llama speed
* mergable
2024-07-30 18:32:09 -07:00
Tobias Fischer
72da3fe7e6
added clip vision model ( #5595 )
...
Co-authored-by: chenyu <chenyu@fastmail.com>
2024-07-19 18:35:51 -04:00
Tobias Fischer
85d4ca7caa
FID Inception Model ( #5516 )
...
* added model impl
* minor cleanups
* extracted weights loading into from_pretrained
* reorganized model for better weight loading
* removed lru cache for state dict loading
2024-07-16 23:12:03 -04:00
wozeparrot
fa873df9c1
bring tinychat more in line with tinyos' version ( #5358 )
2024-07-10 13:13:52 -07:00
Tobias Fischer
0c3a35e5c2
Stable Diffusion v2 Inference ( #5283 )
...
* model implementation
* clip fix, more qol options
2024-07-03 22:47:10 -04:00
chenyu
b2c3a28a5e
nn.RMSNorm ( #5272 )
...
the norm itself is not significant enough to warrant a Tensor method, but we would want Tensor.normalize
2024-07-02 21:39:01 -04:00
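For reference, the RMSNorm computation is just a root-mean-square normalization followed by a learned per-feature scale. A minimal numpy sketch of that math, not tinygrad's nn.RMSNorm implementation:

```python
import numpy as np

class RMSNorm:
  # Minimal sketch of the math: y = x / sqrt(mean(x**2) + eps) * weight
  def __init__(self, dim: int, eps: float = 1e-6):
    self.eps, self.weight = eps, np.ones(dim, dtype=np.float32)
  def __call__(self, x: np.ndarray) -> np.ndarray:
    rms = np.sqrt((x * x).mean(axis=-1, keepdims=True) + self.eps)
    return (x / rms) * self.weight

x = np.random.randn(2, 8).astype(np.float32)
print(RMSNorm(8)(x).shape)  # (2, 8)
```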
Tobias Fischer
8c9c1cf62f
Pulled CLIP and UNet into Separate Files ( #5253 )
...
* pulled clip and unet into separate files
* reference cleanup, lru cache fix
* better pool indexing
2024-07-01 22:33:01 -04:00
George Hotz
14980f79dd
hotfix: unbreak llama
2024-06-30 15:27:54 -07:00
George Hotz
3df47bc21e
OpenELM + repeat_interleave ( #5234 )
...
* start writing openelm
* progress...hit bug
* repeat_interleave support
* gqa
* add rotary embedding
* spp
* i think it runs correctly
* broken
* output is good now
* cleanups
* no io_uring on android
2024-06-30 15:18:39 -07:00
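The repeat_interleave mentioned above is what grouped-query attention (gqa) needs: the few key/value heads are repeated so their count matches the query heads. A small numpy sketch of that expansion; repeat_kv and the shapes are illustrative, not OpenELM's actual code.

```python
import numpy as np

def repeat_kv(x: np.ndarray, n_rep: int) -> np.ndarray:
  # x: (batch, n_kv_heads, seq, head_dim) -> (batch, n_kv_heads * n_rep, seq, head_dim)
  # np.repeat along the head axis mirrors what repeat_interleave does here.
  return np.repeat(x, n_rep, axis=1)

k = np.zeros((1, 4, 16, 64))      # 4 kv heads
print(repeat_kv(k, 8).shape)      # (1, 32, 16, 64): matches 32 query heads
```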
reddyn12
f1c7944c44
Fix batchnorm shapes for resnet.load_pretrained ( #5167 )
...
* Fix batchnorm shapes
* make it general reshape
2024-06-26 18:44:10 -04:00
chenyu
e468601226
update llama attention casting ( #5096 )
...
* update llama attention casting
updated scaled_dot_product_attention middle cast and removed hard-coded half in llama attention.
* fix that
2024-06-22 10:57:17 -04:00
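The "middle cast" referred to above is the usual trick of running the softmax reduction inside attention in float32 even when q/k/v are half, then casting back. A hedged numpy sketch of that idea; illustrative only, not the llama code in the repo.

```python
import numpy as np

def attention(q, k, v):
  # q, k, v: (seq, head_dim) in float16; do the softmax reduction in float32.
  scores = (q @ k.T / np.sqrt(q.shape[-1])).astype(np.float32)
  scores = scores - scores.max(axis=-1, keepdims=True)       # stable softmax
  probs = np.exp(scores)
  probs = probs / probs.sum(axis=-1, keepdims=True)
  return probs.astype(q.dtype) @ v                           # back to float16

q = k = v = np.random.randn(8, 64).astype(np.float16)
print(attention(q, k, v).dtype)  # float16
```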
chenyu
8bd6cb9511
update llama model RMSNorm casting ( #5095 )
...
following the original implementation, cast back to the input dtype before multiplying by weight. slightly faster
https://github.com/meta-llama/llama/blob/main/llama/model.py
2024-06-21 23:02:04 -04:00
chenyu
e2c5054bdd
update resnet.load_from_pretrained ( #5040 )
2024-06-18 16:29:22 -04:00
chenyu
67e8df4969
remove numpy from dtype ( #4969 )
...
replaced all dtype.np with _to_np_dtype defined in tensor.py.
after this, the only numpy usages are (1) Tensor(np.ndarray), (2) construct .numpy() output, (3) numpy random buffer
2024-06-14 15:38:45 -04:00
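As a rough illustration of what a dtype-to-numpy mapping helper looks like; the dictionary and function below are hypothetical and keyed by strings, whereas tinygrad's real _to_np_dtype in tensor.py works on its own dtype objects.

```python
import numpy as np

# Illustrative mapping only; not tinygrad's actual _to_np_dtype.
_NAME_TO_NP = {"float32": np.float32, "float16": np.float16, "int32": np.int32,
               "int64": np.int64, "uint8": np.uint8, "bool": np.bool_}

def to_np_dtype(name: str):
  return _NAME_TO_NP.get(name)

print(to_np_dtype("float16"))  # <class 'numpy.float16'>
```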
Elias Wahl
d2e3c391e8
Residual in MLM loss + Change default steps ( #4935 )
...
* Residual in mlm loss
* Reduce default steps to 160K * 24
* oops
* comment
2024-06-12 16:09:18 -04:00
Elias Wahl
04e237328b
Refactor to class style ( #4804 )
2024-06-04 14:08:31 -07:00
chenyu
31358cbea5
change Tensor.stack to method ( #4719 )
2024-05-24 17:04:19 -04:00
chenyu
ae861325ce
update llama sample for mac 32 input buffer limit ( #4662 )
...
set default sampling params to function call to 0, and top k in llama3 to 25.
2024-05-20 17:23:39 -04:00
wozeparrot
b144d4b460
new llama3 example ( #4576 )
2024-05-19 22:42:23 -07:00
chenyu
a65c8de735
move .half() llama freq_cis to the end of sin and cos ( #4587 )
...
otherwise arange has inf if either dim or context length exceeds half.max
2024-05-14 15:00:18 -04:00
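The reasoning above comes down to float16's maximum value of 65504: building rotary-embedding frequencies with arange and outer products directly in half overflows to inf once dim or the context length gets large, so the math stays in float32 and only the final sin/cos are cast. A numpy sketch of that ordering; the function name and defaults are illustrative.

```python
import numpy as np

def precompute_freqs_sin_cos(dim: int, end: int, theta: float = 10000.0):
  # All intermediate math in float32; only the final sin/cos are cast to half.
  freqs = 1.0 / (theta ** (np.arange(0, dim, 2, dtype=np.float32) / dim))
  t = np.arange(end, dtype=np.float32)
  angles = np.outer(t, freqs)                     # (end, dim//2)
  return np.sin(angles).astype(np.float16), np.cos(angles).astype(np.float16)

# Casting too early is the failure mode: 70000 already overflows float16.
print(np.float16(70000.0))  # inf
sin, cos = precompute_freqs_sin_cos(128, 8192)
print(sin.dtype, np.isinf(sin).any())  # float16 False
```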
wozeparrot
d2c347fc74
faster gather for bert ( #4526 )
2024-05-10 22:28:48 -07:00
Elias Wahl
27613dd881
MLPerf BERT: Main training loop ( #4288 )
...
* BERT language modeling head + trunc normal initializers
* add train loop + helpers
* shuffle in dataloaders + slight changes in main loop
* beam change
* Minor changes
* random.shuffle
* HParam update
* Use deque for dataloader
* wandb bert project name
* half fixes
* BENCHMARK + remove epoch
* cast + print()
---------
Co-authored-by: chenyu <chenyu@fastmail.com>
2024-04-29 14:35:27 -04:00
Elias Wahl
2ecd61e3e2
monkey patching ( #4214 )
2024-04-18 19:20:52 -04:00
George Hotz
e79a11b99c
hotfix: revert llama change
2024-04-10 20:13:15 -07:00
George Hotz
2e6c39b0b2
Do less realizes ( #4141 )
...
* less realize
* corealize jit inputs
* prints
* print before we run
2024-04-10 19:50:50 -07:00
chenyu
f8dc82a8a7
use single tensor for llama kv cache ( #4108 )
...
similar to optimization in gpt2
2024-04-08 00:38:32 -04:00
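The single-tensor kv-cache idea: preallocate one buffer for keys and values up to the maximum context length and write each step into a slice, instead of concatenating a new tensor per token. A numpy sketch under assumed shapes; update_cache and the constants are illustrative, not the llama example's code.

```python
import numpy as np

MAX_CTX, N_KV_HEADS, HEAD_DIM = 4096, 8, 64

# One preallocated buffer holding both k and v (dim 0), written in place.
cache = np.zeros((2, 1, MAX_CTX, N_KV_HEADS, HEAD_DIM), dtype=np.float16)

def update_cache(start_pos: int, k: np.ndarray, v: np.ndarray) -> np.ndarray:
  seq = k.shape[1]
  cache[0, :, start_pos:start_pos + seq] = k
  cache[1, :, start_pos:start_pos + seq] = v
  # Attention then reads the filled prefix: positions [0, start_pos + seq).
  return cache[:, :, :start_pos + seq]

k = v = np.zeros((1, 1, N_KV_HEADS, HEAD_DIM), dtype=np.float16)
print(update_cache(10, k, v).shape)  # (2, 1, 11, 8, 64)
```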
chenyu
92c0675ccf
setitem initial support ( #4093 )
...
* wip setitem
it's an eager assign to output shapetracker view
* cleanups and tests
* more cleanups
2024-04-07 20:35:22 -04:00
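The "eager assign to output shapetracker view" above is conceptually the familiar assign-through-a-view behavior. A tiny numpy illustration of those semantics, not the tinygrad implementation:

```python
import numpy as np

t = np.zeros((4, 4), dtype=np.float32)
view = t[1:3, 1:3]                 # a view into the same buffer, not a copy
view[:] = 7.0                      # assigning through the view mutates the parent
t[0, :2] = np.array([1.0, 2.0])    # setitem with a slice and an array value
print(t)
```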
David Hou
4b95350c41
fp16 resnet (without expand backwards sum in float, doesn't work) ( #3816 )
...
* fp16 resnet
* cast running mean and var back to default float
* extra cast
* check symbolic no overflow
* add linearizer failure
* loss scaler after grad contig
* oops
* i think this works
* don't loss scale fp32
* remove overflow test case
* remove symbolic bounds check
* loss scaler should be float
* temporarily disable padto cuz bug
shruggie
* make running stats in batchnorm float32?
* calculate lars stuff in fp32?
* oops
* remove most changes
* move loss scaler out of optimizer
* no more FP16 var
* oops
---------
Co-authored-by: chenyu <chenyu@fastmail.com>
2024-03-28 01:25:37 -04:00
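The loss-scaler bullets above follow the standard mixed-precision recipe: scale the loss before backprop so small fp16 gradients don't flush to zero, unscale the gradients in float32 before the optimizer step, and skip scaling entirely for fp32 runs. A hedged sketch of that bookkeeping; LOSS_SCALE and the function names are illustrative.

```python
import numpy as np

LOSS_SCALE = 2.0 ** 10   # illustrative constant scale; real setups often adapt it

def scale_loss(loss: np.ndarray) -> np.ndarray:
  # Only scale when the backward pass runs in half precision.
  return loss * LOSS_SCALE if loss.dtype == np.float16 else loss

def unscale_grad(grad: np.ndarray) -> np.ndarray:
  # Unscale in float32 so the division itself doesn't lose precision.
  return grad.astype(np.float32) / LOSS_SCALE

loss = np.array([0.0312], dtype=np.float16)
g = np.array([1e-4], dtype=np.float16) * LOSS_SCALE   # a scaled fp16 gradient
print(scale_loss(loss), unscale_grad(g))              # scaled loss, ~[1e-4] grad
```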
George Hotz
150ea2eb76
create engine folder and move code ( #3948 )
...
* retry
* older tf
* that
2024-03-26 20:38:03 -07:00
George Hotz
4c4d3cb3e3
restrict assignment to base ( #3809 )
...
* restrict assignment to base
* add some restrictions there
* more restrictions
2024-03-18 15:33:06 -07:00
chenyu
5ac1fa933f
apply the same fix_bf16 in llama and coder ( #3789 )
...
* apply the same fix_bf16 in llama and coder
did not realize the same logic was in llama too.
really fix #2775
* flag for native SUPPORT_BF16 cast
2024-03-17 21:25:24 -04:00
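The fix_bf16 mentioned above deals with loading bfloat16 weights on backends without a native bf16 cast; the standard bit-level workaround is to widen each 16-bit pattern into the top half of a float32. A numpy sketch of that trick, illustrative and not necessarily what fix_bf16 or the SUPPORT_BF16 path actually does.

```python
import numpy as np

def bf16_bits_to_float32(raw: np.ndarray) -> np.ndarray:
  # raw: uint16 array holding bfloat16 bit patterns.
  # bfloat16 is the top 16 bits of a float32, so shift left and reinterpret.
  return (raw.astype(np.uint32) << 16).view(np.float32)

bits = np.array([0x3F80, 0xC000, 0x4049], dtype=np.uint16)  # 1.0, -2.0, ~3.14
print(bf16_bits_to_float32(bits))
```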