Commit Graph

4422 Commits

Author SHA1 Message Date
chenyu 7afca52796
replace pow in LAMB by tracking b1**t and b2**t per step (#4582)
* replace pow in LAMB by tracking b1**t and b2**t per step

* remove t, add [self.b1_t, self.b2_t] to return

* adam has one less kernel
2024-05-14 13:08:22 -04:00
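The trick in the commit above: b1**t and b2**t each grow by a constant factor per step, so a running product kept in optimizer state replaces a pow kernel with a single multiply. A minimal sketch of the idea, reusing the b1_t/b2_t names from the commit message; this is not tinygrad's actual LAMB code:

```python
class BiasCorrection:
    """Tracks b1**t and b2**t as running products instead of calling pow."""
    def __init__(self, b1=0.9, b2=0.999):
        self.b1, self.b2 = b1, b2
        self.b1_t, self.b2_t = 1.0, 1.0   # b1**0 and b2**0

    def step(self):
        # one multiply per step replaces a pow(b, t) kernel
        self.b1_t *= self.b1
        self.b2_t *= self.b2
        return 1 - self.b1_t, 1 - self.b2_t  # Adam/LAMB bias-correction terms
```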
nimlgen 9b02aef45a
remove rhip (#4579)
* remove rhip

* remove hip runner
2024-05-14 17:58:19 +03:00
Szymon Ożóg 5eb81ff764
Fix speed compare script (#4581)
* Fix speed compare script

* Update speed_compare_cuda_ptx.py

* Update speed_compare_cuda_ptx.py

* Remove unused function
2024-05-14 17:47:03 +03:00
nimlgen 2131556c2c
amd mockgpu (#4535)
* start mock amd gpu

* virt files

* cleaner

* init ci

* small fixes

* linter

* better?

* ugh

* linter

* fix

* disable some

* run shorter

* fixes

* add hcq test

* fix

* fix cmd revert
2024-05-14 14:28:04 +03:00
geohotstan 089eeec271
setitem in-place operator tests (#4577)
* tests and error

* rename to in-place

* add a note

* more comments

* more comments

* disable folded advanced setitem tests for now
2024-05-14 01:28:02 -04:00
chenyu 0fa57b8ce9
raise error if setitem tensors have requires_grad (#4575)
* raise error if setitem tensors have requires_grad

working on supporting this; this first step properly raises an error

* NotImplementedError
2024-05-13 18:56:47 -04:00
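A hedged sketch of the behavior added above, assuming tinygrad's Tensor API of this period (the exact setup is illustrative):

```python
from tinygrad import Tensor

t = Tensor.zeros(4, 4).contiguous()
v = Tensor.ones(4, requires_grad=True)
t[0] = v  # setitem with a requires_grad source now raises NotImplementedError
```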
Filip Brzek f7d08bd454
feat: add acc_dtype to einsum (#4571) 2024-05-13 14:02:07 -04:00
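The new acc_dtype lets the einsum reduction accumulate at a different precision than its inputs; a hedged usage sketch (assumes the top-level Tensor/dtypes imports):

```python
from tinygrad import Tensor, dtypes

a = Tensor.rand(8, 16, dtype=dtypes.float16)
b = Tensor.rand(16, 4, dtype=dtypes.float16)
# accumulate the contraction in float32 to limit half-precision error
c = Tensor.einsum("ij,jk->ik", a, b, acc_dtype=dtypes.float32)
```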
Szymon Ożóg d97d5a7689
Optimize PTX gated loads index calculation (#4304)
* WIP but working

* Cleanup

* Remove float4 pred and alt

* Cleanup

* this is somehow slowing it down

* Simplify

* add define var to ignore when optimizing gates

* Update assembly.py

* Test for optimizing gated loads

* Cleanup

* Fix NEG needed before if

* Remove unused parameters

* Update assembly.py

* Fix for cachable gone

---------

Co-authored-by: oz <oz@oz-MS-7B86.NAT.gliwice.vectranet.pl>
Co-authored-by: chenyu <chenyu@fastmail.com>
2024-05-13 10:14:01 -07:00
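For context on the PR above: a gated load reads memory only when its gate predicate holds and otherwise yields an alt value; the title suggests the optimization avoids doing the index arithmetic when the gate is false. One plausible reading of the intent, sketched in Python rather than PTX:

```python
def gated_load(mem, gate, compute_idx, alt):
    # Before: compute_idx() ran unconditionally, even for gated-off lanes.
    # After: the index calculation sits under the gate, mirroring PTX
    # predicated execution (@p ld.global ...).
    return mem[compute_idx()] if gate else alt
```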
qazal c67b70ca67
small scheduler refactor (#4569)
* outputs

* consistent

* more style

* doesn't need a tuple
2024-05-13 10:47:39 +03:00
qazal 77aa8659f5
use assign_targets in LazyOp creation (#4568)
* start

* correct error

* this is possible

* document it
2024-05-13 10:24:35 +03:00
qazal b0fa97e176
assert error detail in test_assign (#4567)
* use regex assert

* that shouldn't raise
2024-05-13 09:56:05 +03:00
chenyu 25ec40ca93
cleanup dtype of tensor creation from list (#4566) 2024-05-13 02:47:41 -04:00
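A hedged sketch of the inference rules being cleaned up (the exact defaults depend on the build's dtypes.default_int/default_float):

```python
from tinygrad import Tensor

print(Tensor([1, 2, 3]).dtype)       # all ints -> default int dtype
print(Tensor([1.0, 2, 3]).dtype)     # any float promotes the list to default float
print(Tensor([True, False]).dtype)   # bools stay dtypes.bool
```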
qazal 4e1135a0bc
assign buffer read/write tests (#4565)
* simple tests

* more tests
2024-05-13 09:43:36 +03:00
George Hotz b660f60125
all uops are now cachable (#4564)
* all uops are now cachable

* cachable is gone
2024-05-12 22:34:35 -07:00
George Hotz 02327b8adf
simple stuff from new_uops branch (#4563) 2024-05-12 22:18:05 -07:00
ziereis f53a23d21e
Test for optim assertion (#4558)
* add test for assertion

* whitespace

* restore state

---------

Co-authored-by: Thomas Ziereis <thomas.ziereis@web.de>
2024-05-12 14:21:28 -07:00
wozeparrot d7670f8141
quantized llama multilazybuffer fix (#4557) 2024-05-12 14:19:21 -07:00
ziereis bcee4743ce
fix error message (#4556)
* fix error message

* typo

* add suggestion to fix error

---------

Co-authored-by: Thomas Ziereis <thomas.ziereis@web.de>
2024-05-12 12:35:51 -07:00
chenyu 01a0c1a948
slightly faster nf4 llama (#4542) 2024-05-12 14:24:42 -04:00
qazal 4c232dc0ae
refactor LoadOps scheduling (#4553)
* refactor

* op -> lop
2024-05-12 12:59:24 +03:00
qazal 3da152f0fe
scheduler docs 2 (#4551)
* docs

* delete cleanups
2024-05-12 12:15:39 +03:00
wozeparrot e07c7668b3
nf4 llama (#4540) 2024-05-11 22:22:34 -07:00
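NF4 (the 4-bit "normal float" format from QLoRA) stores each weight as a 4-bit index into a 16-entry codebook plus a per-block scale, so dequantization is a table lookup times the scale. A sketch with a placeholder codebook; the real NF4 levels are quantiles of a normal distribution, and this is not tinygrad's implementation:

```python
import numpy as np

CODE = np.linspace(-1.0, 1.0, 16).astype(np.float32)  # placeholder, not real NF4 levels

def nf4_dequant(packed: np.ndarray, scales: np.ndarray, block: int = 64) -> np.ndarray:
    # packed: uint8, two 4-bit indices per byte; scales: one float32 per block
    idx = np.stack([packed >> 4, packed & 0xF], axis=-1).reshape(-1)
    return (CODE[idx].reshape(-1, block) * scales[:, None]).reshape(-1)
```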
George Hotz 7a26bdac65
move scheduleitem to schedule.py (#4541)
* move scheduleitem to schedule.py

* don't need that type checking anymore
2024-05-11 21:13:04 -07:00
George Hotz 508e8a6666
add cpu objdump to LLVM/CLANG (#4537) 2024-05-11 14:28:44 -07:00
chenyu bed70b130c
mlperf bert getenv-able EVAL_STEP_FREQ (#4534) 2024-05-11 14:36:56 -04:00
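"getenv-able" here means the knob is read from the environment via tinygrad's getenv helper; a hedged sketch (the default value below is an assumption, not the script's actual one):

```python
from tinygrad.helpers import getenv

EVAL_STEP_FREQ = getenv("EVAL_STEP_FREQ", 1000)  # e.g. EVAL_STEP_FREQ=500 python bert.py
```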
George Hotz 328b083e66 lil profiling script 2024-05-11 11:02:44 -07:00
chenyu da10cf0be1
extra/threefry.py for mem usage (#4533)
for now it needs 8N memory to generate N random numbers
2024-05-11 13:46:44 -04:00
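Threefry is a counter-based RNG: output i is a pure function of (key, counter i), which is what makes it parallel and reproducible, and why intermediate buffers (the 8N above) appear. A toy counter-based generator to illustrate the shape of the computation; the mixing below is made up and is not threefry:

```python
import numpy as np

def counter_rng(key: int, n: int) -> np.ndarray:
    ctr = np.arange(n, dtype=np.uint64)                    # one counter per output
    x = ctr * np.uint64(0x9E3779B97F4A7C15) ^ np.uint64(key)
    x ^= x >> np.uint64(33)
    x *= np.uint64(0xFF51AFD7ED558CCD)                     # toy avalanche mix
    x ^= x >> np.uint64(33)
    return (x >> np.uint64(40)).astype(np.float64) / (1 << 24)  # uniform [0, 1)
```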
chenyu 8a0fb3d765
delete old extra/autopad.py (#4532) 2024-05-11 13:06:10 -04:00
chenyu 04a4980a51
touchup bert script (#4531)
small adjustments: remove a duplicated training setting and stop the script once the target is hit
2024-05-11 13:02:02 -04:00
qazal 4871476a1e
move copy kernel out of schedule ordering (#4530)
* delete from sorting

* move the logic
2024-05-11 14:44:44 +03:00
qazal 2fb564c125
multi reduce linearizer tests start (#4529)
* test_end_local

* test_early_end_local

* todos

* mean+std

* skip no locals
2024-05-11 14:06:40 +03:00
qazal 3cba22920f
test_linearizer_correctness (#4458)
* test helper

* uops asserts

* cleanup args

* nits
2024-05-11 13:02:08 +03:00
qazal b3d9fd48d0
infra for testing linearizer correctness (#4528)
* refactor outbufs

* delete helper
2024-05-11 12:10:33 +03:00
George Hotz 2f970a4fc2
all realize 2 (#4527)
* all realize 2

* tests fixup

* fix more tests

* fix openpilot

* fix tests

* unneeded
2024-05-10 22:43:09 -07:00
wozeparrot d2c347fc74
faster gather for bert (#4526) 2024-05-10 22:28:48 -07:00
George Hotz 922e6e056a hotfix: fix docs 2024-05-10 21:51:35 -07:00
George Hotz 347a3acb37
add renderer class (#4524)
* add renderer class

* tests pass

* fix pylint

* fix tensor cores
2024-05-10 21:40:02 -07:00
chenyu b00b6b16f0
fix TRAIN_BEAM and Tensor.training for mlperf bert (#4525)
also hard-coded the bert model config instead of looking it up from a file
2024-05-11 00:18:36 -04:00
chenyu 7fab8c9e17
add symbolic mean test cases in test_symbolic_ops and test_symbolic_jit (#4523)
* add symbolic mean test cases in test_symbolic_ops and test_symbolic_jit

2d symbolic mean in jit does not quite work; the order of the variable inputs is not deterministic?

* skip
2024-05-10 23:19:55 -04:00
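A hedged sketch of what these tests exercise, assuming the Variable API from tinygrad.shape.symbolic as of this era: a tensor reshaped to a bound symbolic dimension, whose mean divides by the variable rather than a constant:

```python
from tinygrad import Tensor
from tinygrad.shape.symbolic import Variable  # module path is an assumption for this era

i = 5
vi = Variable("i", 1, 10).bind(i)        # symbolic dim bound to a concrete value
a = Tensor.rand(3, i).reshape(3, vi)     # shape (3, i) with i symbolic
print(a.mean(axis=1).numpy())            # the reduce divides by symbolic i
```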
George Hotz 827058f030
update tests get_runner (#4522) 2024-05-10 20:09:22 -07:00
George Hotz a0448ff595
use copy kernel in schedule (#4520)
* use copy kernel in schedule

* imports
2024-05-10 15:30:33 -07:00
chenyu b15e2309bd
verbose error message in getitem (#4519)
* verbose error message in getitem

still hard to understand, but at least it prints what it's trying to expand

* sure

* :
2024-05-10 17:25:41 -04:00
George Hotz d438d5698d
bring buffer back to device (#4517) 2024-05-10 11:22:31 -07:00
qazal a2b707a3eb
scheduler comments 1 (#4515) 2024-05-10 20:44:28 +03:00
George Hotz 4eef1ee9bf
move renderer into options (#4514)
* move renderer into options

* fix tests

* renders are functions
2024-05-10 10:01:51 -07:00
George Hotz 7c630a9a53 hotfix: fix llama spacing + fix hcq 2024-05-10 15:10:13 +00:00
George Hotz 58e7256ce9
restore hcq graph (#4513)
* Reapply "hcq graph (#4380)" (#4512)

This reverts commit 06c1e7498e.

* bring back hcq graph
2024-05-10 07:45:05 -07:00
George Hotz 06c1e7498e
Revert "hcq graph (#4380)" (#4512)
This reverts commit 84a2e2b8c1.
2024-05-10 07:18:09 -07:00
nimlgen 84a2e2b8c1
hcq graph (#4380)
* start hcq graph

* hack-fix sync on amd

* nv

* fix nv

* multigraph

* fixes

* temp fix for graph

* this is not needed

* fix

* cleaner

* linter

* fix none

* faster cuda copy

* faster amd copy

* temp nv fixes

* alloc on gpu

* exp: faster amd

* Revert "exp: faster amd"

This reverts commit 2e4cfd1f7d8a33634c50fb5655cff1b40269d28c.

* revert, unrelated

* not in this pr

* linter
2024-05-10 07:15:12 -07:00
qazal 2b7ab60584
dfs fusion (#4491)
* use continue

* simplify

* flip

* track r

* derive forced_realize

* scheduler needs comments
2024-05-10 17:00:48 +03:00