Commit Graph

6388 Commits

Author SHA1 Message Date
hikettei 0f0c3934b1
refactor: improved the consistency of the frexp in transcendental (#7060)
* clarify the intention of bias

* Improved the consistency of m2

* int16

---------

Co-authored-by: chenyu <chenyu@fastmail.com>
2024-10-15 10:18:38 -04:00
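The entry above mentions the bias used by `frexp`. tinygrad's transcendental code works on its own dtypes and is not reproduced here. Below is a plain-Python sketch of what the exponent bias does in a bit-level `frexp`. It assumes IEEE 754 float64 and finite, normal, nonzero inputs only. The stored exponent field is offset by 1023. Removing that bias, plus one more so the mantissa moves from [1, 2) into [0.5, 1), recovers the exponent that `math.frexp` returns. The names `frexp_bits` and the constants are hypothetical.

```python
import math
import struct

# IEEE 754 binary64 layout: 1 sign bit, 11 exponent bits, 52 mantissa bits.
F64_BIAS = 1023
F64_MANTISSA_BITS = 52
F64_EXP_MASK = 0x7FF


def frexp_bits(x: float) -> tuple[float, int]:
    """Return (m, e) with x == m * 2**e and 0.5 <= |m| < 1.

    Assumes x is a finite, normal, nonzero float64 (no zeros, subnormals, inf, or NaN).
    """
    bits = struct.unpack("<Q", struct.pack("<d", x))[0]
    biased_exp = (bits >> F64_MANTISSA_BITS) & F64_EXP_MASK
    # Removing the bias gives the exponent for a mantissa in [1, 2);
    # the extra +1 rescales it so the mantissa lands in [0.5, 1), like math.frexp.
    e = biased_exp - F64_BIAS + 1
    # Keep the sign and mantissa bits, and store a biased exponent of (bias - 1), i.e. 2**-1.
    m_bits = (bits & ~(F64_EXP_MASK << F64_MANTISSA_BITS)) | ((F64_BIAS - 1) << F64_MANTISSA_BITS)
    m = struct.unpack("<d", struct.pack("<Q", m_bits))[0]
    return m, e


print(frexp_bits(8.0), math.frexp(8.0))  # (0.5, 4) (0.5, 4)
```

Zeros, subnormals, infinities, and NaN need separate handling in any real implementation.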
chenyu d12c87dc8e
use ubuntu-22.04 in CI (#7068)
ubuntu-latest now points to 24.04, which may be the cause of the CI issue.
2024-10-15 09:44:59 -04:00
nimlgen 586ff4c910
nv record uvm mappings (#7059)
* nv record uvm mappings

* linteeer

* smth

* ooops
2024-10-15 00:12:49 +03:00
chenyu 2008bac6bf
use validhack logic to rewrite buffer idx (#6740)
* use validhack logic to rewrite buffer idx

saved a whopping one mod in the conv backward kernel...

* cleanup more
2024-10-14 16:47:31 -04:00
qazal 968a79b56c
lint viz with eslint (#6988)
* lint viz

* green

* move config

* space

* meh, later
2024-10-14 22:40:56 +03:00
chenyu a99e42cf2f
clean up test_uop_symbolic.py (#7058)
enable more tests and remove dead tests
2024-10-14 15:35:58 -04:00
nimlgen 8094340221
nv print info about faults (#7057)
* nv print info about faults

* unrelated changes

* nv_gpu.GT200_DEBUGGER in mockgpu

* regen with correct version

* spacing
2024-10-14 21:49:38 +03:00
chenyu fbaab30fe3
add timing to fuzz_linearizer (#7056)
Also applies a smaller FUZZ_MAX_SIZE, since fuzz_linearizer is getting quite slow in CI.
2024-10-14 11:57:41 -04:00
chenyu 0d2462cbdf
use more resolve in View merge add [pr] (#7055) 2024-10-14 11:31:13 -04:00
qazal 8428244c30
gates are always bool [pr] (#7054) 2024-10-14 17:55:08 +03:00
qazal 7a28d50320
small st_fixup changes [pr] (#7053) 2024-10-14 16:53:10 +03:00
qazal 0ef186d4be
scheduler internal api cleanups [pr] (#7052)
* delete external_benchmark_ast.py [pr]

* cleanup 2

* random
2024-10-14 15:56:10 +03:00
qazal bc95b7e422
actually use UOps.CONTIGUOUS (#7049) 2024-10-14 15:11:23 +03:00
George Hotz f85c9ba00a
rewrite max to use cmplt + where (#7037) 2024-10-14 20:00:51 +08:00
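The identity behind the entry above is `max(a, b) == where(a < b, b, a)`: if `a` is smaller the result is `b`, otherwise it is `a`. Below is a minimal plain-Python sketch. `cmplt` and `where` are hypothetical stand-ins for the CMPLT and WHERE ops named in the commit title, not tinygrad's API.

```python
def cmplt(a: float, b: float) -> bool:
    # Stand-in for a CMPLT (strict less-than) op.
    return a < b


def where(cond: bool, if_true: float, if_false: float) -> float:
    # Stand-in for a WHERE (ternary select) op.
    return if_true if cond else if_false


def max_via_cmplt_where(a: float, b: float) -> float:
    # If a < b the answer is b; otherwise (a >= b) it is a.
    return where(cmplt(a, b), b, a)


def elementwise_max(xs: list[float], ys: list[float]) -> list[float]:
    # Applied per element, as an elementwise op would be.
    return [max_via_cmplt_where(x, y) for x, y in zip(xs, ys)]


print(elementwise_max([1.0, 5.0, -2.0], [3.0, 4.0, -2.0]))  # [3.0, 5.0, -2.0]
```

One consequence of this formulation is worth checking in any real backend. A NaN in `b` is dropped, because `a < nan` is false and `a` is returned. A NaN in `a` is kept.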
qazal 88ce6ec69a
ASSIGN is always (target, val) (#7048) 2024-10-14 14:47:52 +03:00
qazal 0f71bc10cd
small changes from the lazy_pm branch [pr] (#7047) 2024-10-14 12:21:21 +03:00
qazal 3e795f2e52
verify_ast changes from lazy_pm [pr] (#7045) 2024-10-14 12:08:18 +03:00
George Hotz b20b22a738 hotfix: add test_tiny, because many times it's what you want 2024-10-14 16:32:33 +08:00
George Hotz c4db927c7b
touchup lowerer [pr] (#7043) 2024-10-14 16:13:28 +08:00
Louis Novy 2ac5aec66b
Fix exponential complexity in _is_padding_okay [pr] (#7008)
* preliminary test

* missed Optional

* don't check for cache during recursion

* match style from st_fixup... may be marginally faster?

* pathological test case: strongly connected DAG

* move to test_schedule as this isn't really a fusion

* oops this shouldn't be edited

* Revert "oops this shouldn't be edited"

This reverts commit 487cb027dc5120542755446d1595ec7b76c207e8.

* Revert "move to test_schedule as this isn't really a fusion"

This reverts commit 48d8c550ce84453e6fc0306e1c6c448fe1286f79.

* move to test_schedule as this isn't really a fusion

* ok no more merge error funny business
2024-10-14 02:34:47 +03:00
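The entry above fixes exponential complexity in a recursive check over a DAG. A recursive check that revisits shared nodes can take time exponential in depth when nodes are reachable by many paths. The usual remedy is to cache one result per node. The following is a generic sketch of that effect, not tinygrad's `_is_padding_okay`, and the actual fix's caching details may differ. The graph shape, the always-true predicate, and the function names are hypothetical.

```python
def make_doubling_graph(depth: int) -> dict[int, list[int]]:
    # Node 0 is a leaf; node k uses node k-1 twice, like repeating x = x + x.
    # There are 2**depth distinct paths from the root (node `depth`) to the leaf.
    return {k: ([k - 1, k - 1] if k > 0 else []) for k in range(depth + 1)}


def visits_naive(graph: dict[int, list[int]], root: int) -> int:
    visits = 0

    def check(node: int) -> bool:
        nonlocal visits
        visits += 1
        # Placeholder predicate: a real check would inspect the node here.
        return all(check(src) for src in graph[node])

    check(root)
    return visits


def visits_memoized(graph: dict[int, list[int]], root: int) -> int:
    visits = 0
    cache: dict[int, bool] = {}

    def check(node: int) -> bool:
        nonlocal visits
        if node in cache:
            return cache[node]
        visits += 1
        cache[node] = all(check(src) for src in graph[node])
        return cache[node]

    check(root)
    return visits


g = make_doubling_graph(10)
print(visits_naive(g, 10), visits_memoized(g, 10))  # 2047 11
```

The naive walk visits 2**(depth+1) - 1 nodes. The memoized one visits each of the depth + 1 distinct nodes once.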
chenyu bd8ecf7fd6
remove NumNode (#7035) 2024-10-13 16:42:19 -04:00
chenyu c4c806a210
generate new kernel dataset (#7034)
* generate new kernel dataset

pre req to remove NumNode
```
extra/optimization/generate_dataset.sh
gzip -k /tmp/sops
mv /tmp/sops.gz extra/datasets/
```

* fix var range in fuzz_linearizer
2024-10-13 16:19:41 -04:00
chenyu 1a27417262
remove arbitrary multiplication case (#7033)
adds the wrongly simplified kernel to test_linearizer_failures
#7019
2024-10-13 15:06:05 -04:00
chenyu 13575f080a
remove bitcast backward in function.py (#7031)
bitcast has no backward pass (it is not differentiable)
2024-10-13 10:08:27 -04:00
Harsh Natuskar ace834ef7b
=docs update (#7027) 2024-10-13 19:39:06 +08:00
qazal 13846930cd
hotfix: extract_dataset.py (#7029) 2024-10-13 11:18:23 +03:00
nimlgen 942a17109a
qcom use QCOMBuffer for all allocated buffers (#7023)
* qcom use QCOMBuffer for all allocated buffers

* checks
2024-10-12 23:44:36 +03:00
chenyu 04d9b46d51
derivative of softmax is independent of max (#7009)
* derivative of softmax is independent of max

* update test
2024-10-12 15:59:23 -04:00
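The reasoning behind the entry above can be checked numerically. For any constant `c`, `exp(x_i - c) / sum_j exp(x_j - c) = softmax(x)_i`, because the factor `exp(-c)` cancels. Subtracting `max(x)` for numerical stability therefore contributes nothing to the gradient. The backward pass only needs `s = softmax(x)`: `dL/dx_k = s_k * (g_k - sum_i g_i * s_i)`, where `g` is the upstream gradient. Below is a minimal plain-Python sketch that compares this formula against central finite differences which recompute the max at every perturbed point. The function names are hypothetical, and this is not tinygrad's backward code.

```python
import math


def softmax(xs: list[float], shift: float = 0.0) -> list[float]:
    # Subtracting a constant shift before exp does not change the result,
    # because the factor exp(-shift) cancels between numerator and denominator.
    exps = [math.exp(x - shift) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def softmax_backward(xs: list[float], upstream: list[float]) -> list[float]:
    # For s = softmax(x) and upstream gradient g: dL/dx_k = s_k * (g_k - sum_i g_i * s_i).
    # Only s is needed; no gradient flows through the max used for stability.
    s = softmax(xs, shift=max(xs))
    dot = sum(g * si for g, si in zip(upstream, s))
    return [sk * (gk - dot) for sk, gk in zip(s, upstream)]


def numeric_backward(xs: list[float], upstream: list[float], eps: float = 1e-6) -> list[float]:
    # Central differences of L(x) = sum_i g_i * softmax(x)_i, recomputing max(x) at every
    # perturbed point, so any dependence on the max would show up here.
    def loss(v: list[float]) -> float:
        return sum(g * s for g, s in zip(upstream, softmax(v, shift=max(v))))

    grads = []
    for k in range(len(xs)):
        plus, minus = list(xs), list(xs)
        plus[k] += eps
        minus[k] -= eps
        grads.append((loss(plus) - loss(minus)) / (2 * eps))
    return grads


xs, g = [1.0, 2.0, 3.0], [0.1, -0.2, 0.3]
print(softmax_backward(xs, g))
print(numeric_backward(xs, g))
```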
chenyu cae1c41755
test case of softmax backward kernel count (#7022) 2024-10-12 15:46:32 -04:00
George Hotz 5ce224ceb3
handle arbitrary multiplication case (#7019)
* handle arbitrary multiplication case

* remove count restriction
2024-10-12 23:16:27 +08:00
chenyu 23faeacb23
remove outdated comments (#7018) 2024-10-12 10:51:07 -04:00
George Hotz 85a45164fb
remove pyint [pr] (#7016)
* remove pyint

* bump time on tp [pr]

* dont truncate in const fold

* remove dead code

* Revert "dont truncate in const fold"

This reverts commit 29c81db0f7880848b001c2728aa555a1ef17e7d3.

* remove define_var
2024-10-12 22:36:24 +08:00
George Hotz 38d45dfba5 hotfix: no rng in test/external/external_benchmark_schedule.py 2024-10-12 22:03:04 +08:00
chenyu ed1ed9e4ff
bert use BS=72 (#7015)
memory 131 -> 138
green tflops 201 -> 209
red tflops 160 -> 169
2024-10-12 09:41:56 -04:00
George Hotz cba4b9a058
clean up ops file [pr] (#7013) 2024-10-12 19:53:52 +08:00
qazal 746a1f8c86
prep uoping diff for big graph [pr] (#7014) 2024-10-12 14:09:32 +03:00
ignaciosica 334f499e6a
consistent render of recip in cuda with CStyleLanguage (#6980) 2024-10-12 18:56:47 +08:00
George Hotz a71bb09ec3
remove symbolic file [pr] (#7012) 2024-10-12 18:44:44 +08:00
George Hotz 16271189ea hotfix: don't spend lines on a (broken) favicon 2024-10-12 18:21:10 +08:00
George Hotz b737ee5bac
move to_indexed_uops to uops (#7011)
* move to_indexed_uops to uops

* UOp.range
2024-10-12 18:20:57 +08:00
George Hotz 5ae2de9845
UOp.variable (#7010)
* UOp.variable [pr]

* fix tests

* clean

* improve name rendering

* last bug
2024-10-12 18:20:44 +08:00
Bhavya Gada f79e05cac0
add types in all nn/__init__.py classes (#7002)
* add types in batchnorm class

* fix lint error in batchnorm types

* add types to conv1d function

* add types to convtranspose1d func and conv2d, convtranspose2d classes

* add types to all remaining classes

* change conv1d padding type to also accept str

* less is more; only keep non-obvious types

* mkdocs need types
2024-10-12 14:42:14 +08:00
ignaciosica 2bb6b95e9f
refactor _make_hip_code_for_op into pm rules (#7001) 2024-10-12 12:46:22 +08:00
George Hotz 5c9f76e274 hotfix: openpilot compile3 compare to i==1 2024-10-12 09:44:24 +08:00
chenyu 36056e0760
update mlperf systems and copy 4.1 to 5.0 (#7004) 2024-10-11 16:20:34 -04:00
Markiian Novosad 8831c691e2
Add slice parameter type checking to disallow Tensor usage for slices (#6967)
* add support for single el tensors for slices

* rm trailing spaces

* cleanup long lines

* remove tensor in slice support, add comprehensive err msg

* cleanup getitem, add slice type check

* Edit err message
2024-10-11 16:20:21 -04:00
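The entry above replaces Tensor-in-slice support with a type check and an explicit error message. Below is a minimal sketch of that kind of check on a hypothetical `ToyTensor` class. It is not tinygrad's `Tensor.__getitem__`, whose actual rules and message may differ.

```python
class ToyTensor:
    """Hypothetical 1-D container; not tinygrad's Tensor."""

    def __init__(self, data: list[float]):
        self.data = list(data)

    def __getitem__(self, idx):
        if isinstance(idx, slice):
            # Reject anything other than int or None in start/stop/step.
            for name in ("start", "stop", "step"):
                part = getattr(idx, name)
                if part is not None and not isinstance(part, int):
                    raise TypeError(
                        f"slice {name} must be an int or None, got {type(part).__name__}"
                    )
            return ToyTensor(self.data[idx])
        if isinstance(idx, int):
            return self.data[idx]
        raise TypeError(f"unsupported index type {type(idx).__name__}")


t = ToyTensor([1.0, 2.0, 3.0, 4.0])
print(t[1:3].data)  # [2.0, 3.0]
try:
    t[ToyTensor([1.0]):3]
except TypeError as err:
    print(err)  # slice start must be an int or None, got ToyTensor
```

Checking each slice component up front fails fast with a message that names the offending part. Without the check, the non-int value would surface as a less specific error deeper in the indexing code.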
Francis Lam b0dd407cdd
ops_cuda: add optional dynamic smem parameter (#6956)
* ops_cuda: add optional dynamic smem parameter

This is required to enable using more than 48 KB of shared memory on
a per-kernel basis.

* move setting max dynamic smem size to init
2024-10-11 21:51:06 +03:00
chenyu 0e42662f2a
log seed at the right place for bert (#7000) 2024-10-11 10:39:40 -04:00
nimlgen 5496a36536
update red mlperf bert readme (#6969) 2024-10-11 13:08:06 +03:00
nimlgen feb0bcb58b
qcom bench bind to perf cluster (#6996) 2024-10-11 12:21:52 +03:00