Commit Graph

6388 Commits

qazal 7451812bbf
delete AST_REWRITE ctx var (#6995) 2024-10-11 11:33:16 +03:00
qazal 7988547df2
start changes from big graph (#6993)
* start changes from big graph [pr]

* space

* still capture ctx
2024-10-11 11:13:46 +03:00
George Hotz e7a0ffe46a
break out linearization [pr] (#6994) 2024-10-11 15:27:33 +08:00
George Hotz f319530191
don't track simplify [pr] (#6992) 2024-10-11 15:03:03 +08:00
George Hotz e441794c4b
remove custom op support, we waste time maintaining this (#6991)
* remove custom op support, we waste time maintaining this

* customop is over
2024-10-11 14:31:09 +08:00
George Hotz c08521e823
minor cleanups from toonygrad (#6990) 2024-10-11 14:19:10 +08:00
George Hotz f50d0e0ee0
cloud device [pr] (#6964)
* first try at cloud device [pr]

* real separation

* we're free

* clang works

* unhappy with timeout

* better timeouts and free

* unrelated

* use http verbs + add test

* lines + better test

* fix DELETE

* shorter cloud

* split key

* fix sending renderer

* PTXRenderer serialization

* add sessions

* http.client

* minor timeout bump

* fix keep-alive

* inc server timeout

* real fix timeout

* that one too
2024-10-11 12:24:06 +08:00
Bhavya Gada 23c09f4b4c
add support for padding='same' in nn.conv (#6975)
* add support for padding='same' in nn.conv

* express concisely

* simplify loop

* test same padding with dilation and conv1d

* fix bad indentation

* make loop one liner
2024-10-11 11:39:07 +08:00
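For 'same' padding, the convention is to pad so that at stride 1 the output spatial size equals the input size. A minimal sketch of the per-side padding arithmetic — a hypothetical helper for illustration, not tinygrad's actual implementation:

```python
def same_padding(kernel_size: int, dilation: int = 1) -> tuple[int, int]:
    # effective kernel extent once dilation is applied
    eff = dilation * (kernel_size - 1) + 1
    # total padding needed so output size == input size at stride 1
    total = eff - 1
    # split as evenly as possible; the extra unit goes on the right/bottom
    return total // 2, total - total // 2
```

For example, a 3-wide kernel needs (1, 1), and with dilation=2 its effective extent is 5, so it needs (2, 2); an even kernel size splits unevenly, e.g. 4 gives (1, 2).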
qazal 54dcea235d
viz auto recenter on out of view graph [pr] (#6986) 2024-10-11 02:40:06 +03:00
nimlgen 159ee04489
include qcom in view_supported_devices (#6985)
* include qcom in view_supported_devices

* ignore images
2024-10-11 01:10:51 +03:00
nimlgen f9d454aed5
correct kernargs alignment (#6984) 2024-10-11 00:06:28 +03:00
qazal 2b17279d4e
viz don't default open the browser [pr] (#6983)
* viz don't default open the browser [pr]

* move st

* scale down
2024-10-10 22:12:18 +03:00
qazal 4f60252210
reduce scheduler process replay overhead [pr] (#6981) 2024-10-10 20:03:38 +03:00
Friedrich Carl Eichenroth 859d6d0407
Fix mypy examples/beautiful_*.py (#6978)
* fix mypy examples/beautiful_*.py

* backwards

* add test

* Revert "add test"

This reverts commit 4d88845ba3f24d83621da0abf55096553abda7fa.

---------

Co-authored-by: chenyu <chenyu@fastmail.com>
2024-10-10 11:34:29 -04:00
qazal 4ef5310039
track viz context even if rewrite errors [pr] (#6976) 2024-10-10 18:33:15 +03:00
chenyu 592e5f1df2
skip test_viz test_no_dedup_different_opts (#6979) 2024-10-10 11:10:24 -04:00
chenyu e3dc10f8f6
improve fold_unrolled_divs (#6977)
addressed #6935
the first few terms in fold_unrolled_divs might have been folded already, so the check should first try adding those terms back. there is a case where all but one term is folded, which is no longer an add chain, so it is just added as a failed test case for now
2024-10-10 10:52:05 -04:00
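The rewrite this targets rests on Hermite's identity: an unrolled division expands into the add chain `(x+0)//n + (x+1)//n + ... + (x+n-1)//n`, which folds back to plain `x`. A quick numeric check of that identity (my reading of the fold, not the tinygrad rewrite code itself):

```python
def unrolled_div_sum(x: int, n: int) -> int:
    # the add chain an unrolled division produces
    return sum((x + i) // n for i in range(n))

# by Hermite's identity, this sum equals x for any integer x and n >= 1
```

If the first few terms were already simplified away, the chain no longer matches this shape directly, which is why the check needs to add those terms back first.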
qazal 3481468702
bring viz to core (#6970)
* move viz to core

* pathfix

* move test_viz to core

* cleanup test_viz diff

* use contextvars
2024-10-10 16:56:26 +03:00
nimlgen fad575ec76
qcom tiny cleanups (#6973) 2024-10-10 12:26:41 +03:00
qazal 3724a66716
move test_viz to test/, prereq for tinygrad/viz [pr] (#6972) 2024-10-10 11:40:46 +03:00
Kinvert 960c495755
added beautiful fashion mnist and example (#6961)
* added beautiful fashion mnist and example

* fixing whitespace

* refactor Fashion MNIST to fewer lines

* fix newline to reduce diff

* Update beautiful_mnist.py

* Update beautiful_mnist.py

---------

Co-authored-by: George Hotz <72895+geohot@users.noreply.github.com>
2024-10-10 12:01:07 +08:00
chenyu b5546912e2
10% more TRAIN_STEPS for bert (#6971)
got two very close runs; adding more steps as a buffer
2024-10-09 19:21:43 -04:00
nimlgen f90d8493cc
add HCQDEV_WAIT_TIMEOUT_MS (#6968) 2024-10-09 19:50:00 +03:00
chenyu 35cf48659b
limit beam param for bert on green (#6966)
seems to mitigate the crash
2024-10-09 11:48:18 -04:00
mesozoic-egg 0e8bcda07e
get readable error from wait_check (#6965)
Co-authored-by: Mesozoic Egg <mesozoic.egg@proton.me>
2024-10-09 17:28:58 +03:00
qazal 20d3c2d113
unify UOps.SHAPETRACKER and UOps.SWIZZLE with UOps.VIEW (#6955)
* add UOps.VIEW

* update hardcoded asts

* update sops.gz
2024-10-09 02:00:17 +08:00
nimlgen 137ad5519f
amd fix cwsr for gfx11 (#6950)
* amd cwsr

* ()
2024-10-08 17:44:29 +03:00
nimlgen 0d526e251e
nv sync on gpu before local update (#6954) 2024-10-08 17:43:58 +03:00
qazal 2800520dd5
even smaller process_replay.py [pr] (#6941)
* even smaller process_replay.py [pr]

* delete those tests

* dedup asts
2024-10-08 20:43:22 +08:00
qazal 851f39653a
rename to BUFFER_VIEW + MetaOps cleanup (#6953) 2024-10-08 20:09:22 +08:00
chenyu 1ff2c98f8a
fix logfile name for bert red (#6952) 2024-10-08 05:37:52 -04:00
czhu 08bfa8632b
embedding shape (#6930) 2024-10-08 14:42:20 +08:00
vladov 20a9683403
Make self.fd Optional. (#6855)
* Make self.fd Optional.

* Fix io_uring when missing fd.

* Compress io_uring fast path code.
2024-10-08 13:25:34 +08:00
chenyu a78c96273a
update bert epoch logging (#6940)
* update bert epoch logging

epoch for bert is simply the number of examples seen (which is used for the RCP check)

* update total steps too

* more changes
2024-10-08 00:34:06 -04:00
George Hotz 0498e846a5
break out metaops (#6948) 2024-10-08 12:08:54 +08:00
nimlgen 42609300ff
hcq no timeline signals in init (#6944) 2024-10-07 23:36:19 +03:00
qazal 0ecc417dd2
prep for viz move to core [pr] (#6938)
* prep for viz move to core [pr]

* polish
2024-10-07 23:24:04 +08:00
chenyu e4c0743188
failed example for logcumsumexp (#6936)
need cummax for numerical stability
2024-10-07 10:55:45 -04:00
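The instability here: naive `log(cumsum(exp(x)))` overflows as soon as any `x` is large. The standard fix tracks a running maximum so `exp` only ever sees non-positive arguments. A pure-Python sketch of that running log-add-exp (illustrative only — not the tinygrad kernel):

```python
import math

def logcumsumexp(xs: list[float]) -> list[float]:
    # out[i] = log(exp(x0) + ... + exp(xi)), computed without
    # exponentiating large values directly
    out, acc = [], -math.inf
    for x in xs:
        m = max(acc, x)  # running max keeps exp() arguments <= 0
        acc = m + math.log(math.exp(acc - m) + math.exp(x - m))
        out.append(acc)
    return out
```

With inputs around 1000 the naive form returns inf, while this stays finite: `logcumsumexp([1000.0, 1000.0])[1]` is `1000 + log(2)`.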
chenyu 102dfe5510
back to 2**10 for bert loss scaler (#6934)
got 2 NaNs with this; reverting back to 2**10
2024-10-07 10:17:21 -04:00
qazal 9250452da4
no codegen import in ops [pr] (#6888)
* no codegen import in ops [pr]

* @track_rewrites

* all functions need this

* polish
2024-10-07 20:54:21 +08:00
George Hotz f7f94cd62f
bitcast cleanup [pr] (#6933) 2024-10-07 19:16:16 +08:00
chenyu 0cf815a93a
bert use BS=66 and update hparams (#6932)
with the dropout memory improvement, we can fit BS=66 now. also reverts to the hparams in #5891
2024-10-07 05:08:27 -04:00
ignaciosica 32ac24c45c
Generic wmma rendering for cuda, ptx [run_process_replay] (#6838)
* generic wmma rendering for cuda, ptx

- also adds wmma generic shape ops_python support

* hotfix: fixed values in ops_python

* hotfix: more fixed values

* hotfix: revert changes in ops_python

* refactor wmma rendering

* hotfix: get n_args directly

* hotfix: use n_args[0] for a

* hotfix: simplify

* hotfix: add args_slices

* hotfix: rename args back to operands

* hotfix: fix spacing

* hotfix: rename upc to sz

* hotfix: rename args to operands in assembly

* hotfix: space

* hotfix: add comment for literal 4

* hotfix: rename some variables and change for clarity
2024-10-07 16:36:36 +08:00
qazal b82023c97e
process replay cleanup to generic _pmap [pr] (#6929)
* process replay cleanup to generic _pmap [pr]

* delete `COMPARE_SCHEDULE`
2024-10-07 13:57:05 +08:00
qazal 16312b4c59
rip out old scheduler process replay stuff, diff pure UOps [pr] (#6927) 2024-10-07 13:20:35 +08:00
chenyu 999e3780e9
dropout contiguous after >= p (#6892)
make it a bool buffer
2024-10-06 19:40:42 -04:00
wozeparrot 9eb6eef441
seed in tensor (#6869) 2024-10-06 14:46:58 -04:00
Tobias Fischer f9e32f2bb2
clip device fix (#6924) 2024-10-07 00:47:32 +08:00
chenyu 01a2d7316d
dtype=float in bert log_softmax for loss and accuracy (#6916) 2024-10-06 11:15:56 -04:00
jeffzh4ng 19a7e41113
implement logcumsumexp (#6921)
* implement logcumsumexp

* change axis=None to axis=0
2024-10-06 10:45:36 -04:00