Commit Graph

109 Commits

George Hotz 7e73c7b3cc hotfix: bump stable diffusion val distance 2024-09-26 11:15:29 +08:00
wozeparrot c100f3d406
default threefry (#6116) 2024-09-25 17:45:13 +08:00
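Threefry is a counter-based PRNG, so making it the default means random tensors are a pure function of seed and counter rather than of device-local RNG state. A minimal sketch of the reproducibility this buys, using tinygrad's public `Tensor` API (the cross-device determinism guarantee is an assumption here, not something the commit states):

```python
from tinygrad import Tensor

# With a counter-based RNG like threefry, reseeding replays the exact
# same stream of random numbers, which the validation commits below rely on.
Tensor.manual_seed(0)
a = Tensor.rand(2, 2).numpy()

Tensor.manual_seed(0)
b = Tensor.rand(2, 2).numpy()

assert (a == b).all()  # identical draws from the same seed
```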
chenyu c9a9631818
no UnaryOps.NEG in generated UOp patterns (#6209)
* no UnaryOps.NEG in generated UOp patterns

removed the patterns `x * (-1) -> -x` and `x != True`

* those are fine to drop because boolean NEG became CMPNE against True

* fix sd validation L2 norm
2024-08-21 11:08:22 -04:00
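Both removed patterns depended on a dedicated NEG op. A toy check (not tinygrad's actual pattern matcher) of the identities that make them redundant:

```python
# Toy illustration, not tinygrad code: boolean negation is expressible
# as a CMPNE against True, and float negation as multiplication by -1,
# so neither rewrite pattern is needed once NEG is lowered that way.
for x in (True, False):
    assert (not x) == (x != True)

x = 3.5
assert x * (-1) == -x
```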
Tobias Fischer 8c9c1cf62f
Pulled CLIP and UNet into Separate Files (#5253)
* pulled clip and unet into separate files

* reference cleanup, lru cache fix

* better pool indexing
2024-07-01 22:33:01 -04:00
chenyu b9122ecdaf
revert stable diffusion validation with threefry (#5248)
* Revert "use threefry in stable diffusion benchmark (#4988)"

This reverts commit 44dfa37c70.

* sdxl and validation fix

* relax threshold
2024-07-01 14:43:47 -04:00
chenyu 88763eb9ff
fix stable_diffusion with fp16 (#5239) 2024-06-30 12:59:31 -04:00
Tobias Fischer 4688f97d48
Add SDXL Inference to Examples (#5206)
* added sdxl inference code

* fixed trailing whitespace

* use original impl code, removed unneeded numpy calls
2024-06-28 07:42:28 -04:00
chenyu 0ba093dea0
hotfix: only validate stable diffusion when using threefry (#5166) 2024-06-26 16:50:38 -04:00
chenyu e4a5870b36
validate stable_diffusion output (#5163)
changed default steps, forgot to update validation
2024-06-26 16:42:21 -04:00
chenyu e356807696
tinytqdm.set_description and tinytrange (#5101) 2024-06-22 14:45:06 -04:00
chenyu 44dfa37c70
use threefry in stable diffusion benchmark (#4988)
also updated default steps to 10, making it easier to tell the image is following the prompt.
2024-06-15 20:25:29 -04:00
chenyu fd249422f5
minor cleanup example stable_diffusion (#4753) 2024-05-28 00:05:37 -04:00
chenyu 30fc1ad415
remove TODO: remove explicit dtypes after broadcast fix in stable_diffusion (#4241)
this is done
2024-04-21 00:31:24 -04:00
chenyu c71627fee6
move GlobalCounter to helpers (#4002)
break circular import between ops and buffer
2024-03-30 00:30:30 -04:00
George Hotz 150ea2eb76
create engine folder and move code (#3948)
* retry

* older tf

* that
2024-03-26 20:38:03 -07:00
George Hotz 3527c5a9d2
add Tensor.replace (#3738)
* add Tensor.replace

* fix dtypes in that test

* should be replace

* and mixtral
2024-03-14 13:34:14 -07:00
rnxyfvls 490c5a3ec3
examples/stable_diffusion: support model checkpoints without alphas_cumprod key (#3681)
* examples/stable_diffusion: support model checkpoints without alphas_cumprod key

(which is most models on civitai)

* fix indent

---------

Co-authored-by: a <a@a.aa>
2024-03-11 16:05:52 -04:00
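`alphas_cumprod` is derivable rather than essential: in DDPM-style samplers it is just the cumulative product of `1 - beta` over the noise schedule, so checkpoints that omit it can have it rebuilt at load time. A hedged sketch using the beta-schedule constants commonly published for SD v1 (the exact constants are an assumption, not values read from this repo):

```python
import numpy as np

# Rebuild alphas_cumprod from a "scaled linear" beta schedule
# (linear in sqrt(beta), then squared back), as used by SD v1.
def make_alphas_cumprod(beta_start=0.00085, beta_end=0.0120, n_steps=1000):
    betas = np.linspace(beta_start ** 0.5, beta_end ** 0.5, n_steps) ** 2
    alphas = 1.0 - betas
    return np.cumprod(alphas)

alphas_cumprod = make_alphas_cumprod()  # shape (1000,)
```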
George Hotz 41efaa848c
move graph.py and jit.py into features (#3376)
* move graph.py into features

* move jit into features

* fix quickstart
2024-02-12 17:34:34 +01:00
George Hotz a280cfe169
move dtypes to dtype.py (#2964)
* move dtypes to dtype.py

* fix urllib
2024-01-01 14:58:48 -08:00
George Hotz c81ce9643d
move globalcounters to ops (#2960)
* move globalcounters to ops

* missed a few

* sick of that failing
2024-01-01 14:21:02 -08:00
chenyu 7dc3352877
increase stable diffusion validation threshold 1e-4 -> 3e-4 (#2897)
saw a flaky CI failure with 1.1e-4, and 3e-4 is a good number
2023-12-21 11:45:25 -05:00
chenyu a044125c39
validate stable diffusion for seed 0 (#2773)
* validate stable diffusion for seed 0

the closest false positive I can get is the same setup with one less step: dist = 0.0036.
the same setup with fp16 has dist = 5e-6,
so setting the validation threshold to 1e-4 should be good

* run with --seed 0
2023-12-15 00:07:09 -05:00
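The validation described here boils down to comparing the seed-0 output against a known-good reference and failing when the distance crosses a threshold chosen between the true-positive distance (5e-6) and the closest false positive (0.0036). A hypothetical sketch of such a gate (names and the exact distance metric are assumptions, not the repo's code):

```python
import numpy as np

# Hypothetical sketch of the seed-0 validation: anything between
# ~5e-6 (a correct fp16 run) and ~3.6e-3 (the nearest wrong output)
# separates good from bad; 1e-4 sits comfortably in that gap.
def validate(output: np.ndarray, reference: np.ndarray, threshold: float = 1e-4):
    dist = np.linalg.norm(output.astype(np.float64) - reference.astype(np.float64))
    assert dist < threshold, f"validation failed: dist={dist}"
```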
chenyu 9afa8009c1
hotfix: explicitly set arange dtype to float (#2772) 2023-12-14 23:14:38 -05:00
George Hotz 9e07824542
move device to device.py (#2466)
* move device to device.py

* pylint test --disable R,C,W,E --enable E0611

* fix tests
2023-11-27 11:34:37 -08:00
George Hotz 095e2ced61
add name support to fetch (#2407)
* add name support

* use fetch in gpt2

* remove requests from main lib, networkx also optional

* umm, keep that assert

* updates to fetch

* i love the walrus so much

* stop bundling mnist with tinygrad

* err, https

* download cache names

* add DOWNLOAD_CACHE_VERSION

* need env.

* ugh, wrong path

* replace get_child
2023-11-23 14:16:17 -08:00
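`fetch` centralizes downloads into a versioned on-disk cache; the new name support pins the cached filename instead of deriving it from the URL. A hedged usage sketch (the helper is `tinygrad.helpers.fetch`; the URL is a placeholder and the exact keyword set is an assumption):

```python
from tinygrad.helpers import fetch

# Downloads on the first call, then serves the cached file on later
# calls. `name` fixes the filename inside the download cache.
path = fetch("https://example.com/sd-v1-4.ckpt", name="sd-v1-4.ckpt")
print(path)  # a pathlib.Path into the local download cache
```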
chenyu 5ef8d682e3
clean up attentions in stable diffusion (#2275) 2023-11-11 14:25:36 -05:00
Ahmed Harmouche 265304e7fd
Stable diffusion WebGPU port (#1370)
* WIP: Stable diffusion WebGPU port

* Load whole model: split safetensor to avoid Chrome allocation limit

* Gitignore .DS_Store, remove debug print

* Clip tokenizer in JS

* WIP: Compile model in parts (text model, diffusor, get_x_prev_and_pred_x0, decoder), and recreate forward logic in JS

* e2e stable diffusion flow

* Create initial random latent tensor in JS

* SD working e2e

* Log if some weights were not loaded properly

* Remove latent_tensor.npy used for debugging

* Cleanup, remove useless logs

* Improve UI

* Add progress bar

* Remove .npy files used for debugging

* Add clip tokenizer as external dependency

* Remove alphas_cumprod.js and load it from safetensors

* Refactor

* Simplify a lot

* Dedup base when limiting elementwise merge (webgpu)

* Add return type to safe_load_metadata

* Do not allow run when webgpu is not supported

* Add progress bar, refactor, fix special names

* Add option to choose from local vs huggingface weights

* lowercase tinygrad :)

* fp16 model dl, decompression client side

* Cache f16 model in browser, better progress

* Cache miss recovery

---------

Co-authored-by: George Hotz <72895+geohot@users.noreply.github.com>
2023-11-03 18:29:16 -07:00
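The client-side splitting leans on safetensors being self-describing: the file starts with an 8-byte little-endian length followed by a JSON header mapping tensor names to dtypes, shapes, and byte offsets, so a loader can fetch the payload in chunks that stay under Chrome's per-allocation limit. A Python sketch of reading that header (the port itself does the equivalent in JS):

```python
import json, struct

# safetensors layout: u64 little-endian header length, then a JSON
# header with per-tensor {dtype, shape, data_offsets}. The offsets are
# what let the WebGPU port split the model into browser-sized chunks.
def safetensors_metadata(path: str) -> dict:
    with open(path, "rb") as f:
        (header_len,) = struct.unpack("<Q", f.read(8))
        return json.loads(f.read(header_len))
```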
George Hotz 6dc8eb5bfd
universal disk cache (#2130)
* caching infra for tinygrad

* non-str key

* fix linter

* no shelve in beam search

* beam search caching

* check tensor cores with beam too

* pretty print

* LATEBEAM in stable diffusion
2023-10-22 10:56:57 -07:00
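The point of the disk cache is that expensive results (notably beam-search outcomes) survive process restarts. A self-contained illustration of the idea as a sqlite-backed memoization decorator (a sketch of the technique, not tinygrad's actual implementation):

```python
import functools, hashlib, pickle, sqlite3

_db = sqlite3.connect("cache.db")
_db.execute("CREATE TABLE IF NOT EXISTS cache (key TEXT PRIMARY KEY, val BLOB)")

def diskcache(fn):
    """Memoize fn's results on disk, keyed by a hash of its arguments."""
    @functools.wraps(fn)
    def wrapper(*args):
        key = hashlib.sha256(pickle.dumps((fn.__name__, args))).hexdigest()
        row = _db.execute("SELECT val FROM cache WHERE key=?", (key,)).fetchone()
        if row is not None:
            return pickle.loads(row[0])  # hit: skip the expensive call
        val = fn(*args)
        _db.execute("INSERT INTO cache VALUES (?, ?)", (key, pickle.dumps(val)))
        _db.commit()
        return val
    return wrapper
```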
Ahmed Harmouche 0d3410d93f
Stable diffusion: Make guidance modifiable (#2077) 2023-10-15 14:36:43 -07:00
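"Guidance" here is the classifier-free guidance scale: the UNet is run on both an unconditional and a conditional prompt, and the two noise predictions are blended. The standard formula, as a sketch (function and argument names are illustrative):

```python
import numpy as np

# Classifier-free guidance: higher `guidance` pushes the denoising
# trajectory harder toward the prompt, at the cost of diversity.
def guided_eps(eps_uncond: np.ndarray, eps_cond: np.ndarray, guidance: float = 7.5) -> np.ndarray:
    return eps_uncond + guidance * (eps_cond - eps_uncond)
```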
Ahmed Harmouche e27fedfc7b
Fix stable diffusion output error on WebGPU (#2032)
* Fix stable diffusion on WebGPU

* Remove hack, numpy cast only on webgpu

* No-copy numpy cast
2023-10-10 06:40:51 -07:00
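The "no-copy" part maps onto numpy's `copy=False` cast semantics, which hand back the original buffer when no conversion is actually required:

```python
import numpy as np

x = np.arange(4, dtype=np.float32)
# Same dtype: astype(copy=False) returns the original array, no copy.
assert x.astype(np.float32, copy=False) is x
# Different dtype: a new array is materialized only when truly needed.
y = x.astype(np.uint8, copy=False)
assert y is not x
```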
George Hotz adab724caa
schedule2, keep the tests working with small changes (#1932)
* lazy cleanups

* ast functions take in LazyOps

* op instead of self.op

* _base for mops

* fix contiguous

* start schedule

* test_schedule

* fix openpilot

* more tests

* bugfix and test skip

* work

* make sure things get freed

* fix zerosized tensors

* fix failing test

* fix ceil and friends

* fix openpilot

* disable training

* disable test collectives
2023-09-28 09:14:43 -07:00
Dat D. Nguyen ae9529e678
chore: remove redundant noise in stable diffusion example (#1910) 2023-09-24 21:33:45 +08:00
segf00lt 9e8c1dbf34
patch to remove hack from stable_diffusion.py (#1814)
* patch to remove hack from stable_diffusion.py

* sorry linter

* realize after assign?

* float16 broken in llvmlite, use float64 for now

* int32

* idiot forgot to change test array dtype
2023-09-08 09:26:50 -07:00
George Hotz 722823dee1 stable diffusion: force fp16 free 2023-09-06 15:11:05 -07:00
Francis Lam 0379b64ac4
add seed option to stable_diffusion (#1784)
useful for testing correctness of model runs
2023-09-05 19:45:15 -07:00
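A fixed seed is what makes the validation commits above meaningful: the same seed yields the same latent noise and hence the same image. A minimal sketch of wiring such a flag (the `--seed` name follows the commit; the surrounding scaffolding is illustrative):

```python
import argparse
from tinygrad import Tensor

parser = argparse.ArgumentParser()
parser.add_argument("--seed", type=int, default=None, help="fix the RNG for reproducible runs")
args = parser.parse_args()

if args.seed is not None:
    Tensor.manual_seed(args.seed)  # same seed -> same latents -> same image
```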
Karan Handa a8aa13dc91
[ready] Replacing os with pathlib (#1708)
* replace os.path with pathlib

* safe convert dirnames to pathlib

* replace all os.path.join

* fix cuda error

* change main chunk

* Reviewer fixes

* fix vgg

* Fixed everything

* Final fixes

* ensure consistency

* Change all parent.parent... to parents
2023-08-30 10:41:08 -07:00
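The migration replaces string-based path juggling with pathlib's composable `Path` objects. A before/after sketch of the typical rewrite (filenames are illustrative):

```python
import os
from pathlib import Path

# before: nested os.path calls
weights = os.path.join(os.path.dirname(__file__), "weights", "sd-v1-4.ckpt")

# after: `/` composes segments, and .parent (or .parents[n] for chains
# of os.path.dirname) replaces the nesting
weights = Path(__file__).parent / "weights" / "sd-v1-4.ckpt"
```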
Umut Zengin 1682e9a38a
Fix: Stable Diffusion index (#1713) 2023-08-30 00:21:10 -04:00
George Hotz aa7c98722b
sd timing (#1706) 2023-08-28 20:22:57 -07:00
nimlgen 1c0449e190
add cache collector (#1595)
* init cache collector

* add test_cache_collector.py

* switch GlobalCounters.cache to CacheCollector

* init jit models test

* jitted SD

* add debug msg to print loaded bufs count

* moved cache collector to jit

* clearer SD

* no double device import
2023-08-28 19:59:55 -07:00
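"jitted SD" means wrapping the repeated denoising step in tinygrad's `TinyJit`, so the kernels captured on the first calls are replayed on every later iteration without Python or scheduling overhead. A hedged sketch with a stand-in layer (the import path for `TinyJit` has moved between tinygrad versions, so treat it as an assumption):

```python
from tinygrad import Tensor, TinyJit, nn

layer = nn.Linear(16, 16)  # stand-in for the UNet step

@TinyJit
def denoise_step(x: Tensor) -> Tensor:
    return layer(x).realize()

# First calls warm up and capture the kernels; subsequent calls replay
# them directly with the new inputs.
for _ in range(5):
    out = denoise_step(Tensor.rand(4, 16))
```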
George Hotz 718ced296c
move state to nn/state (#1619) 2023-08-22 07:36:24 -07:00
George Hotz b9feb1b743 fp16 support in stable diffusion 2023-08-20 05:37:21 +00:00
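fp16 support is essentially casting the loaded weights to half precision to halve memory (the "force fp16 free" hotfix above then ensures the fp32 originals are released). A hedged sketch with a stand-in model, using the `Tensor.replace` that #3738 above introduces:

```python
from tinygrad import dtypes, nn
from tinygrad.nn.state import get_state_dict

model = nn.Linear(8, 8)  # stand-in for the SD UNet

# Cast every parameter to float16 in place; realize() forces the cast
# so the fp32 buffers can actually be freed.
for name, p in get_state_dict(model).items():
    p.replace(p.cast(dtypes.float16)).realize()
```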
George Hotz 47f18f4d60
[New] SD: Refactor AttnBlock, CrossAttention, CLIPAttention to share code (#1516) (#1518)
* Refactor AttnBlock, CrossAttention, CLIPAttention to share code

* Reshape and transpose in loop

* Bugfix on attention mask

Co-authored-by: Jacky Lee <39754370+jla524@users.noreply.github.com>
2023-08-10 15:04:18 -07:00
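The three classes can share code because each reduces to the same scaled dot-product attention; they differ only in how `q`, `k`, `v` are produced and how heads are reshaped around it. The shared core, as a generic sketch (not the repo's exact helper):

```python
from tinygrad import Tensor

# softmax(q @ k^T / sqrt(d)) @ v -- the computation AttnBlock,
# CrossAttention, and CLIPAttention all wrap.
def attention(q: Tensor, k: Tensor, v: Tensor) -> Tensor:
    d = q.shape[-1]
    return ((q @ k.transpose(-2, -1)) / d ** 0.5).softmax(-1) @ v

q, k, v = [Tensor.rand(1, 8, 64) for _ in range(3)]
print(attention(q, k, v).shape)  # (1, 8, 64)
```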
George Hotz c82bd59b85
Revert "SD: Refactor AttnBlock, CrossAttention, CLIPAttention to share code (#1513)" (#1515)
This reverts commit 85e02311a2.
2023-08-10 09:08:51 -07:00
Jacky Lee 85e02311a2
SD: Refactor AttnBlock, CrossAttention, CLIPAttention to share code (#1513)
* Refactor AttnBlock, CrossAttention, CLIPAttention to share code

* Reshape and transpose in loop
2023-08-10 08:52:33 -07:00
George Hotz d78fb8f4ed
add stable diffusion and llama (#1471)
* add stable diffusion and llama

* pretty in CI

* was CI not true

* that

* CI=true, wtf

* pythonpath

* debug=1

* oops, wrong place

* uops test broken for wgpu

* wgpu tests flaky
2023-08-06 21:31:51 -07:00
Felix 97a6029cf7
Corrected a few misspelled words (#1435) 2023-08-04 16:51:08 -07:00
George Hotz f27df835a6
delete dead stuff (#1382)
* delete bpe from repo

* remove yolo examples

* Revert "remove yolo examples"

This reverts commit cd1f49d4662a5565726ae1fa7bf3f6a3e3985965.

* no windows
2023-07-31 11:17:49 -07:00
George Hotz bfbb8d3d0f
fix ones, BS=2 stable diffusion, caching optimizer (#1312)
* fix ones, BS=2 stable diffusion

* caching optimizer

* print search time

* minor bug fix
2023-07-21 09:55:49 -07:00
George Hotz f45013f0a3 stable diffusion: remove realizes we don't need 2023-07-20 19:53:07 -07:00
George Hotz b58dd015e3 stable diffusion: remove import numpy as np 2023-07-20 19:35:44 -07:00