chenyu
fbaab30fe3
add timing to fuzz_linearizer ( #7056 )
...
and applied smaller FUZZ_MAX_SIZE. this is getting quite slow in CI
2024-10-14 11:57:41 -04:00
chenyu
0d2462cbdf
use more resolve in View merge add [pr] ( #7055 )
2024-10-14 11:31:13 -04:00
qazal
8428244c30
gates are always bool [pr] ( #7054 )
2024-10-14 17:55:08 +03:00
qazal
7a28d50320
small st_fixup changes [pr] ( #7053 )
2024-10-14 16:53:10 +03:00
qazal
0ef186d4be
scheduler internal api cleanups [pr] ( #7052 )
...
* delete external_benchmark_ast.py [pr]
* cleanup 2
* random
2024-10-14 15:56:10 +03:00
qazal
bc95b7e422
actually use UOps.CONTIGUOUS ( #7049 )
2024-10-14 15:11:23 +03:00
George Hotz
f85c9ba00a
rewrite max to use cmplt + where ( #7037 )
2024-10-14 20:00:51 +08:00
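The rewrite in #7037 expresses max through a compare-and-select, the same shape the kernel IR's cmplt and where ops take. A minimal sketch of the identity in plain Python (not tinygrad's actual rewrite rule):

```python
def where(cond, a, b):
    # select: a if cond else b (elementwise in the real kernel IR)
    return a if cond else b

def max_via_cmplt(a, b):
    # max(a, b) == where(a < b, b, a)
    return where(a < b, b, a)

assert max_via_cmplt(2, 5) == 5
assert max_via_cmplt(5, 2) == 5
assert max_via_cmplt(-1.5, -1.5) == -1.5
```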
qazal
88ce6ec69a
ASSIGN is always (target, val) ( #7048 )
2024-10-14 14:47:52 +03:00
qazal
0f71bc10cd
small changes from the lazy_pm branch [pr] ( #7047 )
2024-10-14 12:21:21 +03:00
qazal
3e795f2e52
verify_ast changes from lazy_pm [pr] ( #7045 )
2024-10-14 12:08:18 +03:00
George Hotz
b20b22a738
hotfix: add test_tiny, because many times it's what you want
2024-10-14 16:32:33 +08:00
George Hotz
c4db927c7b
touchup lowerer [pr] ( #7043 )
2024-10-14 16:13:28 +08:00
Louis Novy
2ac5aec66b
Fix exponential complexity in _is_padding_okay [pr] ( #7008 )
...
* preliminary test
* missed Optional
* don't check for cache during recursion
* match style from st_fixup... may be marginally faster?
* pathological test case: strongly connected DAG
* move to test_schedule as this isn't really a fusion
* oops this shouldn't be edited
* Revert "oops this shouldn't be edited"
This reverts commit 487cb027dc5120542755446d1595ec7b76c207e8.
* Revert "move to test_schedule as this isn't really a fusion"
This reverts commit 48d8c550ce84453e6fc0306e1c6c448fe1286f79.
* move to test_schedule as this isn't really a fusion
* ok no more merge error funny business
2024-10-14 02:34:47 +03:00
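The exponential blowup fixed here is the classic pitfall of recursing over a DAG without memoization: when nodes are shared, a naive walk revisits each subgraph once per path, and a chain of diamonds makes that count double per level. A generic sketch of the pattern, assuming a hypothetical `Node` with `ok`/`srcs` fields (the real `_is_padding_okay` details differ):

```python
from dataclasses import dataclass, field

@dataclass
class Node:
    ok: bool = True
    srcs: list = field(default_factory=list)

def is_ok(node, cache=None):
    # memoize on node identity so each shared subgraph is checked once;
    # without the cache, a diamond chain of depth n costs O(2^n) calls
    if cache is None: cache = {}
    if id(node) in cache: return cache[id(node)]
    result = node.ok and all(is_ok(src, cache) for src in node.srcs)
    cache[id(node)] = result
    return result

# pathological case from the commit: a strongly connected stack of
# diamonds -- 30 levels, intractable without memoization
leaf = Node()
for _ in range(30):
    leaf = Node(srcs=[leaf, leaf])
assert is_ok(leaf) is True
```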
chenyu
bd8ecf7fd6
remove NumNode ( #7035 )
2024-10-13 16:42:19 -04:00
chenyu
c4c806a210
generate new kernel dataset ( #7034 )
...
* generate new kernel dataset
pre req to remove NumNode
```
extra/optimization/generate_dataset.sh
gzip -k /tmp/sops
mv /tmp/sops.gz extra/datasets/
```
* fix var range in fuzz_linearizer
2024-10-13 16:19:41 -04:00
chenyu
1a27417262
remove arbitrary multiplication case ( #7033 )
...
adds the wrongly simplified kernel in test_linearizer_failures
#7019
2024-10-13 15:06:05 -04:00
chenyu
13575f080a
remove bitcast backward in function.py ( #7031 )
...
bitcast cannot backward
2024-10-13 10:08:27 -04:00
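"bitcast cannot backward" because bitcast reinterprets raw bytes as another dtype rather than converting values, and that map has no useful derivative. A stdlib illustration of the float32-to-int32 case (not tinygrad code):

```python
import struct

def bitcast_f32_to_i32(x: float) -> int:
    # reinterpret the 4 bytes of a float32 as an int32, little-endian
    return struct.unpack("<i", struct.pack("<f", x))[0]

assert bitcast_f32_to_i32(1.0) == 0x3f800000  # IEEE-754 bit pattern of 1.0
# the value -> bits map is discontinuous (nearby floats yield unrelated
# integers), so there is no meaningful gradient to backpropagate
```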
Harsh Natuskar
ace834ef7b
docs update ( #7027 )
2024-10-13 19:39:06 +08:00
qazal
13846930cd
hotfix: extract_dataset.py ( #7029 )
2024-10-13 11:18:23 +03:00
nimlgen
942a17109a
qcom use QCOMBuffer for all allocated buffers ( #7023 )
...
* qcom use QCOMBuffer for all allocated buffers
* checks
2024-10-12 23:44:36 +03:00
chenyu
04d9b46d51
derivative of softmax is independent of max ( #7009 )
...
* derivative of softmax is independent of max

* update test
2024-10-12 15:59:23 -04:00
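This change rests on softmax's shift invariance: subtracting any constant (here the max, used only for numerical stability) leaves the output, and hence the gradient, unchanged. A quick numeric check in plain Python:

```python
import math

def softmax(xs):
    exps = [math.exp(x) for x in xs]
    s = sum(exps)
    return [e / s for e in exps]

xs = [1.0, 3.0, 2.0]
shifted = [x - max(xs) for x in xs]
# softmax(x) == softmax(x - c) for any constant c, so the backward pass
# can ignore the max subtraction entirely
assert all(abs(a - b) < 1e-12 for a, b in zip(softmax(xs), softmax(shifted)))
```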
chenyu
cae1c41755
test case of softmax backward kernel count ( #7022 )
2024-10-12 15:46:32 -04:00
George Hotz
5ce224ceb3
handle arbitrary multiplication case ( #7019 )
...
* handle arbitrary multiplication case
* remove count restriction
2024-10-12 23:16:27 +08:00
chenyu
23faeacb23
remove outdated comments ( #7018 )
2024-10-12 10:51:07 -04:00
George Hotz
85a45164fb
remove pyint [pr] ( #7016 )
...
* remove pyint
* bump time on tp [pr]
* dont truncate in const fold
* remove dead code
* Revert "dont truncate in const fold"
This reverts commit 29c81db0f7880848b001c2728aa555a1ef17e7d3.
* remove define_var
2024-10-12 22:36:24 +08:00
George Hotz
38d45dfba5
hotfix: no rng in test/external/external_benchmark_schedule.py
2024-10-12 22:03:04 +08:00
chenyu
ed1ed9e4ff
bert use BS=72 ( #7015 )
...
memory 131 -> 138
green tflops 201 -> 209
red tflops 160 -> 169
2024-10-12 09:41:56 -04:00
George Hotz
cba4b9a058
clean up ops file [pr] ( #7013 )
2024-10-12 19:53:52 +08:00
qazal
746a1f8c86
prep uoping diff for big graph [pr] ( #7014 )
2024-10-12 14:09:32 +03:00
ignaciosica
334f499e6a
consistent render of recip in cuda with CStyleLanguage ( #6980 )
2024-10-12 18:56:47 +08:00
George Hotz
a71bb09ec3
remove symbolic file [pr] ( #7012 )
2024-10-12 18:44:44 +08:00
George Hotz
16271189ea
hotfix: don't spend lines on a (broken) favicon
2024-10-12 18:21:10 +08:00
George Hotz
b737ee5bac
move to_indexed_uops to uops ( #7011 )
...
* move to_indexed_uops to uops
* UOp.range
2024-10-12 18:20:57 +08:00
George Hotz
5ae2de9845
UOp.variable ( #7010 )
...
* UOp.variable [pr]
* fix tests
* clean
* improve name rendering
* last bug
2024-10-12 18:20:44 +08:00
Bhavya Gada
f79e05cac0
add types in all nn/init.py classes ( #7002 )
...
* add types in batchnorm class
* fix lint error in batchnorm types
* add types to conv1d function
* add types to convtranspose1d func and conv2d, convtranspose2d classes
* add types to all remaining classes
* change conv1d padding type to also accept str
* less is more; only keep non-obvious types
* mkdocs need types
2024-10-12 14:42:14 +08:00
ignaciosica
2bb6b95e9f
refactor _make_hip_code_for_op into pm rules ( #7001 )
2024-10-12 12:46:22 +08:00
George Hotz
5c9f76e274
hotfix: openpilot compile3 compare to i==1
2024-10-12 09:44:24 +08:00
chenyu
36056e0760
update mlperf systems and copy 4.1 to 5.0 ( #7004 )
2024-10-11 16:20:34 -04:00
Markiian Novosad
8831c691e2
Add slice parameter type checking to disallow Tensor usage for slices ( #6967 )
...
* add support for single el tensors for slices
* rm trailing spaces
* cleanup long lines
* remove tensor in slice support, add comprehensive err msg
* cleanup getitem, add slice type check
* Edit err message
2024-10-11 16:20:21 -04:00
Francis Lam
b0dd407cdd
ops_cuda: add optional dynamic smem parameter ( #6956 )
...
* ops_cuda: add optional dynamic smem parameter
This is required to enable larger than 48kb shared memory usage on
a per-kernel basis.
* move setting max dynamic smem size to init
2024-10-11 21:51:06 +03:00
chenyu
0e42662f2a
log seed at the right place for bert ( #7000 )
2024-10-11 10:39:40 -04:00
nimlgen
5496a36536
update red mlperf bert readme ( #6969 )
2024-10-11 13:08:06 +03:00
nimlgen
feb0bcb58b
qcom bench bind to perf cluster ( #6996 )
2024-10-11 12:21:52 +03:00
qazal
7451812bbf
delete AST_REWRITE ctx var ( #6995 )
2024-10-11 11:33:16 +03:00
qazal
7988547df2
start changes from big graph ( #6993 )
...
* start changes from big graph [pr]
* space
* still capture ctx
2024-10-11 11:13:46 +03:00
George Hotz
e7a0ffe46a
break out linearization [pr] ( #6994 )
2024-10-11 15:27:33 +08:00
George Hotz
f319530191
don't track simplify [pr] ( #6992 )
2024-10-11 15:03:03 +08:00
George Hotz
e441794c4b
remove custom op support, we waste time maintaining this ( #6991 )
...
* remove custom op support, we waste time maintaining this
* customop is over
2024-10-11 14:31:09 +08:00
George Hotz
c08521e823
minor cleanups from toonygrad ( #6990 )
2024-10-11 14:19:10 +08:00
George Hotz
f50d0e0ee0
cloud device [pr] ( #6964 )
...
* first try at cloud device [pr]
* real separation
* we're free
* clang works
* unhappy with timeout
* better timeouts and free
* unrelated
* use http verbs + add test
* lines + better test
* fix DELETE
* shorter cloud
* split key
* fix sending renderer
* PTXRenderer serialization
* add sessions
* http.client
* minor timeout bump
* fix keep-alive
* inc server timeout
* real fix timeout
* that one too
2024-10-11 12:24:06 +08:00