nimlgen
eb9689336e
nv mockgpu ( #4600 )
* mockgpu nv
* works
* comment that out
* fix merge
* setup gpuocelot
* install packages
* don't run all of them
* passes
* fix ci
* almost
* should pass
* linter
* linter 2
* try this?
* ugh, not supported
* ci
* remove ticket from description
* better descs
2024-05-15 23:46:08 +03:00
chenyu
3c11ca452e
skip CLANG test casts between double and half for now ( #4609 )
started breaking after the GitHub CI image update
2024-05-15 16:17:06 -04:00
George Hotz
5ba611787d
move image into tensor.py. delete features ( #4603 )
* move image into tensor.py
* change setup.py
* openpilot tests need pythonpath now
2024-05-15 10:50:25 -07:00
qazal
cd4d7e18c7
_recurse_lb small cleanup ( #4601 )
* minor cleanups
* comments
* extend env in replay
2024-05-15 19:10:42 +03:00
Ahmed Harmouche
662bca8134
Split UnaryOps.CAST into CAST and BITCAST ( #4487 )
* Separate cast and bitcast
* Fix lint
* No more arg[0]
* Revert "No more arg[0]"
This reverts commit dee6911335513f092fe2cbb9684e8a9d26aad964.
* CAST/BITCAST arg is the dtype only, no more tuple
* No image bitcast, regenerate dataset
* Small fixes
2024-05-15 11:43:31 -04:00
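The split formalizes a distinction that is easy to illustrate outside tinygrad (a numpy sketch, not the UnaryOps code): CAST converts the value to the target dtype, while BITCAST keeps the raw bits and only reinterprets them.

```python
import numpy as np

x = np.array([1.0], dtype=np.float32)
print(x.astype(np.int32))  # cast: value-preserving conversion   -> [1]
print(x.view(np.int32))    # bitcast: bit-level reinterpretation -> [1065353216] (0x3f800000)
```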
qazal
a4a23c40a0
test masked assign views ( #4599 )
* possible masked
* not contiguous mask
2024-05-15 15:06:48 +03:00
George Hotz
ff64bcab69
move graph/search to engine ( #4596 )
2024-05-14 23:12:59 -07:00
George Hotz
afa9753d39
ruff cleanup ( #4594 )
* check editor config
* no editorconfig, it doesn't work
* ruff cleanups
2024-05-14 21:16:14 -07:00
George Hotz
fd02ab1e8b
move disassemblers and openpilot ( #4592 )
* move disassemblers and openpilot
* delete junk
* put that in pre-commit
* fixup readme
2024-05-14 19:30:02 -07:00
chenyu
2b0ee74bb6
lshift and rshift ( #4591 )
2024-05-14 19:16:31 -04:00
qazal
355e1c135c
pad fusion tests ( #4570 )
* what breaks
* Revert "what breaks"
This reverts commit e79f679283c853cbadf09bf41fd18bb9601a83ee.
* simplest case
* one unsafe op
* expand+pad, shrink+pad
* safe case
* refactor
2024-05-14 20:34:46 +03:00
chenyu
7afca52796
replace pow in LAMB by tracking b1**t and b2**t per step ( #4582 )
* replace pow in LAMB by tracking b1**t and b2**t per step
* remove t, add [self.b1_t, self.b2_t] to return
* adam has one less kernel
2024-05-14 13:08:22 -04:00
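The point of tracking b1**t and b2**t is to replace a pow on every step with a single multiply. A minimal sketch of that bookkeeping (generic Python with hypothetical names, not the tinygrad LAMB code):

```python
class BetaPowers:
    """Keep running b1**t and b2**t so no pow is needed each optimizer step."""
    def __init__(self, b1=0.9, b2=0.999):
        self.b1, self.b2 = b1, b2
        self.b1_t, self.b2_t = 1.0, 1.0       # equal to b1**t and b2**t after t steps

    def step(self):
        self.b1_t *= self.b1                  # b1**(t+1) from b1**t, one multiply
        self.b2_t *= self.b2
        return 1 - self.b1_t, 1 - self.b2_t   # the usual bias-correction terms
```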
nimlgen
9b02aef45a
remove rhip ( #4579 )
* remove rhip
* remove hip runner
2024-05-14 17:58:19 +03:00
Szymon Ożóg
5eb81ff764
Fix speed compare script ( #4581 )
* Fix speed compare script
* Update speed_compare_cuda_ptx.py
* Update speed_compare_cuda_ptx.py
* Remove unused function
2024-05-14 17:47:03 +03:00
nimlgen
2131556c2c
amd mockgpu ( #4535 )
* start mock amd gpu
* virt files
* cleaner
* init ci
* small fixes
* linter
* better?
* ugh
* linter
* fix
* disable some
* run shorter
* fixes
* add hcq test
* fix
* fix cmd revert
2024-05-14 14:28:04 +03:00
geohotstan
089eeec271
setitem in-place operator tests ( #4577 )
* tests and error
* rename to in-place
* add a note
* more comments
* more comments
* disable folded advanced setitem tests for now
2024-05-14 01:28:02 -04:00
chenyu
0fa57b8ce9
raise error if setitem tensors have requires_grad ( #4575 )
* raise error if setitem tensors have requires_grad
working on supporting this; this first makes it properly raise an error
* NotImplementedError
2024-05-13 18:56:47 -04:00
Filip Brzek
f7d08bd454
feat: add acc_dtype to einsum ( #4571 )
2024-05-13 14:02:07 -04:00
Szymon Ożóg
d97d5a7689
Optimize PTX gated loads index calculation ( #4304 )
* WIP but working
* Cleanup
* Remove float4 pred and alt
* Cleanup
* this is somehow slowing it down
* Simplify
* add define var to ignore when optimizing gates
* Update assembly.py
* Test for optimizing gated loads
* Cleanup
* Fix NEG needed before if
* Remove unused parameters
* Update assembly.py
* Fix for cachable gone
---------
Co-authored-by: oz <oz@oz-MS-7B86.NAT.gliwice.vectranet.pl>
Co-authored-by: chenyu <chenyu@fastmail.com>
2024-05-13 10:14:01 -07:00
qazal
77aa8659f5
use assign_targets in LazyOp creation ( #4568 )
* start
* correct error
* this is possible
* document it
2024-05-13 10:24:35 +03:00
qazal
b0fa97e176
assert error detail in test_assign ( #4567 )
* use regex assert
* that shouldn't raise
2024-05-13 09:56:05 +03:00
qazal
4e1135a0bc
assign buffer read/write tests ( #4565 )
* simple tests
* more tests
2024-05-13 09:43:36 +03:00
George Hotz
b660f60125
all uops are now cachable ( #4564 )
* all uops are now cachable
* cachable is gone
2024-05-12 22:34:35 -07:00
George Hotz
02327b8adf
simple stuff from new_uops branch ( #4563 )
2024-05-12 22:18:05 -07:00
ziereis
f53a23d21e
Test for optim assertion ( #4558 )
* add test for assertion
* whitespace
* restore state
---------
Co-authored-by: Thomas Ziereis <thomas.ziereis@web.de>
2024-05-12 14:21:28 -07:00
wozeparrot
d7670f8141
quantized llama multilazybuffer fix ( #4557 )
2024-05-12 14:19:21 -07:00
George Hotz
7a26bdac65
move scheduleitem to schedule.py ( #4541 )
* move scheduleitem to schedule.py
* don't need that type checking anymore
2024-05-11 21:13:04 -07:00
George Hotz
508e8a6666
add cpu objdump to LLVM/CLANG ( #4537 )
2024-05-11 14:28:44 -07:00
George Hotz
328b083e66
lil profiling script
2024-05-11 11:02:44 -07:00
qazal
2fb564c125
multi reduce linearizer tests start ( #4529 )
* test_end_local
* test_early_end_local
* todos
* mean+std
* skip no locals
2024-05-11 14:06:40 +03:00
qazal
3cba22920f
test_linearizer_correctness ( #4458 )
* test helper
* uops asserts
* cleanup args
* nits
2024-05-11 13:02:08 +03:00
qazal
b3d9fd48d0
infra for testing linearizer correctness ( #4528 )
* refactor outbufs
* delete helper
2024-05-11 12:10:33 +03:00
George Hotz
2f970a4fc2
all realize 2 ( #4527 )
* all realize 2
* tests fixup
* fix more tests
* fix openpilot
* fix tests
* unneeded
2024-05-10 22:43:09 -07:00
George Hotz
347a3acb37
add renderer class ( #4524 )
* add renderer class
* tests pass
* fix pylint
* fix tensor cores
2024-05-10 21:40:02 -07:00
chenyu
b00b6b16f0
fix TRAIN_BEAM and Tensor.training for mlperf bert ( #4525 )
...
also hard-coded the bert model config instead of looking it up from a file
2024-05-11 00:18:36 -04:00
chenyu
7fab8c9e17
add symbolic mean test cases in test_symbolic_ops and test_symbolic_jit ( #4523 )
* add symbolic mean test cases in test_symbolic_ops and test_symbolic_jit
2d symbolic mean in jit does not quite work; the order of the variable inputs is not deterministic?
* skip
2024-05-10 23:19:55 -04:00
George Hotz
827058f030
update tests get_runner ( #4522 )
2024-05-10 20:09:22 -07:00
George Hotz
d438d5698d
bring buffer back to device ( #4517 )
2024-05-10 11:22:31 -07:00
George Hotz
4eef1ee9bf
move renderer into options ( #4514 )
* move renderer into options
* fix tests
* renders are functions
2024-05-10 10:01:51 -07:00
George Hotz
89e119bc58
move Allocator to buffer.py ( #4502 )
* move Allocator to buffer.py
* move those to realize
* memory file
* cleanup
2024-05-09 19:45:56 -07:00
George Hotz
1e843d495e
cleaning up search with Program ( #4500 )
* cleaning up search
* fix tests
* test fix
* minor compiler cleanup
2024-05-09 19:01:53 -07:00
chenyu
d3dc332c2e
Tensor.logsumexp ( #4442 )
the subtract-max part should be shared with safe softmax
cleaner
2024-05-09 20:49:06 -04:00
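The subtract-max note refers to the standard numerically stable form logsumexp(x) = max(x) + log(sum(exp(x - max(x)))), the same trick safe softmax relies on. A small numpy sketch of the identity (not the Tensor implementation):

```python
import numpy as np

def logsumexp(x, axis=-1):
    m = x.max(axis=axis, keepdims=True)           # subtract the max first...
    return np.squeeze(m, axis) + np.log(np.exp(x - m).sum(axis=axis))
    # ...so exp never overflows, even for large inputs

print(logsumexp(np.array([1000.0, 1000.0])))      # ~1000.693, no overflow
```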
George Hotz
c9e84ed0da
refactor to Program class ( #4476 )
* refactor to Program class
* switch to Program
* fix tests
* smaller diff
* self.p
* more tests
* fix metal test
* tests
* fix openpilot
* move that to linearizer
* p.launchdims
2024-05-09 17:29:07 -07:00
nimlgen
a2e2ba380c
nv tune shmem size ( #4495 )
* nv tune shmem size
* compare them
* linter
* linter2
2024-05-10 00:35:01 +03:00
nimlgen
e14d5b6fd7
nv fix oob qmd ptr ( #4478 )
* nv fix oob qmd ptr
* test kernargs no oob
2024-05-08 23:11:04 +03:00
chenyu
36a1f38049
lazy folding: mul -1 is neg, and neg neg is noop ( #4472 )
2024-05-08 01:52:22 -04:00
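Both rewrites in the title are simple algebraic identities: x * -1 -> -x, and -(-x) -> x. A hypothetical sketch of the folding rule on a toy expression tuple (not the actual lazy.py code):

```python
def fold_neg(expr):
    op, *src = expr
    if op == "MUL" and src[1] == ("CONST", -1):                      # x * -1 -> -x
        return ("NEG", src[0])
    if op == "NEG" and isinstance(src[0], tuple) and src[0][0] == "NEG":  # -(-x) -> x
        return src[0][1]
    return expr
```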
chenyu
c508eb7425
revert the removal of CAST_BEFORE_VIEW ( #4471 )
this brings most of the memory gain for resnet back.
2024-05-08 00:14:29 -04:00
chenyu
7eb035e7c5
stronger test case for half mean overflow ( #4470 )
2024-05-07 22:40:09 -04:00
chenyu
ca7300c783
fix half mean and its backward ( #4469 )
* fix half mean and its backward
cast to sum_acc_type, sum, div, then cast back
* mean dtype tests
2024-05-07 21:46:41 -04:00
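The failure mode and the fix described above ("cast to sum_acc_type, sum, div, then cast back") can be reproduced with numpy (an illustration, not the Tensor.mean code): a float16 sum overflows past ~65504, so the sum is accumulated in a wider dtype and only the final result is cast back to half.

```python
import numpy as np

x = np.full(20000, 10.0, dtype=np.float16)
naive = x.sum() / x.size                                           # float16 accumulation overflows -> inf
fixed = (x.astype(np.float32).sum() / x.size).astype(np.float16)   # cast, sum, div, cast back -> 10.0
print(naive, fixed)
```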
Francis Lam
7da1b41f38
fuzz_linearizer: add FUZZ_REQUIRE_TC option to require TC in opts ( #4468 )
useful for checking late opts after TC such as GROUP, etc.
2024-05-07 17:14:21 -04:00