5458 Commits

Author SHA1 Message Date
nimlgen 590b9ebb34
hcq copy queue is optional (#5909)
* hcq copy queue is optional

* one more

* this
2024-08-05 14:03:25 +03:00
George Hotz 159ac06b5b
remove unused reduce rules + improve unparented (#5908)
* remove unused reduce rules [run_process_replay]

* this work

* those tests are meaningless now
2024-08-04 18:18:27 -07:00
George Hotz d7387d31bf
remove useless reduce cases [run_process_replay] (#5907)
* remove useless reduce cases [run_process_replay]

* do_reduce cleanup

* more cleanups + no longer supported tests

* Revert "more cleanups + no longer supported tests"

This reverts commit e9f2f6ba7061f8697a308aacdc3442fa922a77f5.

* no longer supported tests

* switch ReduceOps.SUM -> BinaryOps.ADD
2024-08-04 17:11:08 -07:00
wozeparrot 94917521ee
fix: sqlite on pypy (#5906)
2024-08-04 16:40:59 -07:00
George Hotz be8958e26b
use CONTRACT before REDUCE (#5903)
* use CONTRACT before REDUCE [run_process_replay]

* support half expand

* EXPAND GEP
2024-08-04 16:17:33 -07:00
wozeparrot f33950f454
tracemeta fixups (#5904)
2024-08-04 16:15:06 -07:00
chenyu adba5efc64
enable llama 2 70B in tinybox green CI (#5905)
runnable with MAX_CONTEXT=256
2024-08-04 18:48:46 -04:00
chenyu 4a65010de8
remove CUDACPU flag in tests [run_process_replay] (#5902)
no longer used
2024-08-04 16:06:38 -04:00
chenyu 996ff0c135
pow(2) -> square in RMSNorm [run_process_replay] (#5901)
reads nicer in metadata
2024-08-04 14:21:31 -04:00
qazal aad9234e52
test fused precompute_freqs_cis (#5900)
* test_precompute_freqs_cis

* tiny for ci
2024-08-04 21:01:05 +03:00
chenyu c67e9887f7
support using str to specify dtype (#5897)
* support using str to specify dtype

in Tensor creation and args into `cast` and `bitcast`, and acc_dtype

* more tests
2024-08-04 12:56:28 -04:00
nimlgen 4f9221e8dd
remove useless _ensure_shared_time_base (#5899)
2024-08-04 17:01:54 +03:00
qazal 4c5ef2cc4f
setitem with arange fusion 1 (#5898)
2024-08-04 16:09:21 +03:00
chenyu 59315ffc78
minor cleanup to UOp mod folding [run_process_replay] (#5895)
some walrus
2024-08-03 21:38:44 -04:00
nimlgen dad8e72ee9
hcq graph refactor (#5887)
* cleanup

* prof

* cleaner

* comments

* more types
2024-08-03 23:35:33 +03:00
chenyu da61dea1b2
simple failed UOp sub symbolic test case (#5894)
2024-08-03 14:27:23 -04:00
Elias Wahl 937bf5fe12
better hparam (#5891)
2024-08-03 12:38:53 -04:00
qazal 37cc87ea75
save lines in the scheduler [run_process_replay] (#5890)
2024-08-03 14:20:11 +03:00
qazal 56ef9e453e
pad reduceops to the max of each dimension (#5889)
* early verify

* pad reduceops to the max of each dim

* remove the function
2024-08-03 14:03:30 +03:00
qazal 65fa86901a
indexing fusion 2 (#5888)
* arange fusion

* kernels that fuse

* tests
2024-08-03 13:13:39 +03:00
qazal af59b2eea9
tests from the indexing fusion branch (#5886)
2024-08-03 11:56:48 +03:00
chenyu a77eab89ca
UOp mod folding cleanup (#5885)
move patterns around and update comments
2024-08-02 22:56:32 -04:00
chenyu d5de44340e
UOp add mod folding (#5862)
* UOp add mod folding

* that passes now
2024-08-02 18:31:46 -04:00
George Hotz 714d00f325
hotfix: median > mean for sampling clock jitter
2024-08-02 22:07:58 +00:00
George Hotz 7348c40d9d
sampling time sync (8700 lines) (#5843)
* sampling time sync

* jitter matrix

* comment

* pass mypy

* line count
2024-08-02 14:44:35 -07:00
chenyu 41bbd3f4c1
update UOp mod reduction patterns (#5883)
prepare generic mod folding, also some test changes from mod folding pr
2024-08-02 17:43:40 -04:00
wozeparrot acadccf344
comma benchmark (#5518)
2024-08-02 14:36:54 -07:00
nimlgen b4709d294a
hotfix: hcq profiler use mid point for deps flow (#5882)
* hcq profiler use mid point for deps

* fixes

* mypy
2024-08-02 23:53:10 +03:00
Elias Wahl 4a114756f6
New BERT dataloader (#5881)
* One file == One topic

* update test

* new dataloader

* update train script

* get index is faster
2024-08-02 15:12:23 -04:00
nimlgen 2777784b91
add dependency viewer to hcq profiler (#5874)
* hcq profiler support deps

* clean up

* cleaner

* cleanup

* revert this

* linter

* mypy

* add test

* sync is strange, need to take the end

* linter + test
2024-08-02 22:07:01 +03:00
George Hotz 23e8c39288
get program fields in __post_init__ [run_process_replay] (#5878)
* get program fields in __post_init__ [run_process_replay]

* remove print
2024-08-02 09:57:12 -07:00
qazal 8611fa6c99
apply opts.extra_matcher in process replay [run_process_replay] (#5877)
2024-08-02 18:07:58 +03:00
qazal 2a791f7924
fuzz uops is simpler with List[UOp] [run_process_replay] (#5875)
* remove from fuzz_uops

* update fuzz_uops.py

* add to realize.py
2024-08-02 17:28:15 +03:00
George Hotz 3995f1ddf1
move ops lds estimate to Program [run_process_replay] (#5872)
2024-08-01 19:12:07 -07:00
George Hotz 877e0b4ba0
define global only has the index [run_process_replay] (#5869)
* define global only has the index [run_process_replay]

* fix that linearizer test

* fix ptx

* stupid ptx fix
2024-08-01 19:01:15 -07:00
chenyu f27f949a5d
Revert "revert some UOp IDIV bound (#5863)" (#5871)
This reverts commit 0c8d202348.
2024-08-01 21:38:31 -04:00
chenyu df138bc558
Revert "revert a mod pattern (#5864)" (#5870)
This reverts commit 5c8de2d044.
2024-08-01 20:44:26 -04:00
chenyu 1b0314d9ef
Revert "remove one more UOp mod pattern (#5865)" (#5868)
This reverts commit b03b8e18c2.
2024-08-01 20:28:35 -04:00
George Hotz d73bc85ba9
UOpGraph not in renderer or Program [run_process_replay] (#5867)
* UOpGraph not in renderer or Program [run_process_replay]

* fix some tests

* fix ptx
2024-08-01 16:20:30 -07:00
chenyu b392b8edc3
increase atol and rtol test_gemm_fp16 (#5866)
* increase atol and rtol test_gemm_fp16

made it pass with NOOPT which has larger accumulated error

* revert that
2024-08-01 19:09:58 -04:00
chenyu b03b8e18c2
remove one more UOp mod pattern (#5865)
fixed UOP_IS_SYMBOLIC=1 test_failure_40
2024-08-01 18:29:04 -04:00
chenyu 5c8de2d044
revert a mod pattern (#5864)
fixed UOP_IS_SYMBOLIC=1 linearizer failure 47
2024-08-01 17:24:26 -04:00
nimlgen 34168a64e3
optimize nv profiler (#5856)
* nv profiler fix

* cleanup hcq a bit

* fixes

* fix

* typo

* all signals put timestamp

* a bit cleaner

* merge fields

* type

* import

* tiny fix
2024-08-01 23:57:45 +03:00
George Hotz 2d3c7e4d4e
some TestPickleJIT tests (#5860)
* some TestPickleJIT tests

* hotfix: print which opencl device we are using
2024-08-01 12:39:59 -07:00
George Hotz e347f10d33
hotfix: print which opencl device we are using
2024-08-01 12:39:46 -07:00
chenyu 0c8d202348
revert some UOp IDIV bound (#5863)
* revert some UOp IDIV bound

breaks conv with UOP_IS_SYMBOLIC, added some conv tests in CI

* those are correct

* skip slow ones
2024-08-01 15:09:06 -04:00
George Hotz 53fcac9e80
hotfix: increase time on flaky NV test
2024-08-01 10:20:07 -07:00
qazal cedf459843
infra for multi view reduce_info [run_process_replay] (#5861)
2024-08-01 19:46:55 +03:00
qazal 26d0265d66
test schedule of LazyBuffers [run_process_replay] (#5859)
2024-08-01 19:06:29 +03:00
George Hotz 0e34d83777
hotfix: don't include the old input_rawbuffers in all_resources
2024-08-01 09:00:11 -07:00