1783 Commits

Author SHA1 Message Date
nimlgen eb9689336e
nv mockgpu (#4600)
* mockgpu nv

* works

* comment that out

* fix merge

* setup gpuocelot

* install packages

* not run all of them

* passes

* fix ci

* almost

* should pass

* linter

* linter 2

* try this?

* ugn, not supported

* ci

* remove ticket from description

* better descs
2024-05-15 23:46:08 +03:00
chenyu 3c11ca452e
skip CLANG test casts between double and half for now (#4609)
started breaking after GitHub CI image update
2024-05-15 16:17:06 -04:00
George Hotz 5ba611787d
move image into tensor.py. delete features (#4603)
* move image into tensor.py

* change setup.py

* openpilot tests need pythonpath now
2024-05-15 10:50:25 -07:00
qazal cd4d7e18c7
_recurse_lb small cleanup (#4601)
* minor cleanups

* comments

* extend env in replay
2024-05-15 19:10:42 +03:00
Ahmed Harmouche 662bca8134
Split UnaryOps.CAST into CAST and BITCAST (#4487)
* Separate cast and bitcast

* Fix lint

* No more arg[0]

* Revert "No more arg[0]"

This reverts commit dee6911335513f092fe2cbb9684e8a9d26aad964.

* CAST/BITCAST arg is the dtype only, no more tuple

* No image bitcast, regenerate dataset

* Small fixes
2024-05-15 11:43:31 -04:00
qazal a4a23c40a0
test masked assign views (#4599)
* possible masked

* not contiguous mask
2024-05-15 15:06:48 +03:00
George Hotz ff64bcab69
move graph/search to engine (#4596)
2024-05-14 23:12:59 -07:00
George Hotz afa9753d39
ruff cleanup (#4594)
* check editor config

* no editorconfig, it doesn't work

* ruff cleanups
2024-05-14 21:16:14 -07:00
George Hotz fd02ab1e8b
move disassemblers and openpilot (#4592)
* move disassemblers and openpilot

* delete junk

* put that in pre-commit

* fixup readme
2024-05-14 19:30:02 -07:00
chenyu 2b0ee74bb6
lshift and rshift (#4591)
2024-05-14 19:16:31 -04:00
qazal 355e1c135c
pad fusion tests (#4570)
* what breaks

* Revert "what breaks"

This reverts commit e79f679283c853cbadf09bf41fd18bb9601a83ee.

* simplest case

* one unsafe op

* expand+pad, shrink+pad

* safe case

* refactor
2024-05-14 20:34:46 +03:00
chenyu 7afca52796
replace pow in LAMB by tracking b1**t and b2**t per step (#4582)
* replace pow in LAMB by tracking b1**t and b2**t per step

* remove t, add [self.b1_t, self.b2_t] to return

* adam has one less kernel
2024-05-14 13:08:22 -04:00
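Note on #4582: the idea of tracking b1**t and b2**t per step, instead of calling pow with a step counter, can be sketched as below. This is a minimal illustration and not the repository's code; the class name RunningPowers and its step method are hypothetical, and plain Python floats stand in for tensors.

```python
import math

class RunningPowers:
    """Hypothetical helper: keeps b1**t and b2**t as running products."""

    def __init__(self, b1, b2):
        self.b1, self.b2 = b1, b2
        # b**0 == 1, so both running products start at 1.0
        self.b1_t, self.b2_t = 1.0, 1.0

    def step(self):
        # One multiply per step replaces pow(b, t); no step counter t is stored.
        self.b1_t *= self.b1
        self.b2_t *= self.b2
        return self.b1_t, self.b2_t

powers = RunningPowers(b1=0.9, b2=0.999)
for t in range(1, 6):
    b1_t, b2_t = powers.step()
    # The running products agree with the explicit powers up to rounding.
    assert math.isclose(b1_t, 0.9 ** t)
    assert math.isclose(b2_t, 0.999 ** t)
```

The running products are the only state carried between steps, which is presumably why the commit adds them to the optimizer's returned state and drops t.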
nimlgen 9b02aef45a
remove rhip (#4579)
* remove rhip

* remove hip runner
2024-05-14 17:58:19 +03:00
Szymon Ożóg 5eb81ff764
Fix speed compare script (#4581)
* Fix speed compare script

* Update speed_compare_cuda_ptx.py

* Update speed_compare_cuda_ptx.py

* Remove unused function
2024-05-14 17:47:03 +03:00
nimlgen 2131556c2c
amd mockgpu (#4535)
* start mock amd gpu

* virt files

* cleaner

* init ci

* small fixes

* linter

* better?

* ugh

* linter

* fix

* disable some

* run shorter

* fixes

* add hcq test

* fix

* fix cmd revert
2024-05-14 14:28:04 +03:00
geohotstan 089eeec271
setitem in-place operator tests (#4577)
* tests and error

* rename to in-place

* add a note

* more comments

* more comments

* disable folded advanced setitem tests for now
2024-05-14 01:28:02 -04:00
chenyu 0fa57b8ce9
raise error if setitem tensors have requires_grad (#4575)
* raise error if setitem tensors have requires_grad

working on supporting this, first properly raises error

* NotImplementedError
2024-05-13 18:56:47 -04:00
Filip Brzek f7d08bd454
feat: add acc_dtype to einsum (#4571)
2024-05-13 14:02:07 -04:00
Szymon Ożóg d97d5a7689
Optimize PTX gated loads index calculation (#4304)
* WIP but working

* Cleanup

* Remove float4 pred and alt

* Cleanup

* this is somehow slowing it down

* Simplify

* add define var to ignore when optimizing gates

* Update assembly.py

* Test for optimizing gated loads

* Cleanup

* Fix NEG needed before if

* Remove unused parameters

* Update assembly.py

* Fix for cachable gone

---------

Co-authored-by: oz <oz@oz-MS-7B86.NAT.gliwice.vectranet.pl>
Co-authored-by: chenyu <chenyu@fastmail.com>
2024-05-13 10:14:01 -07:00
qazal 77aa8659f5
use assign_targets in LazyOp creation (#4568)
* start

* correct error

* this is possible

* document it
2024-05-13 10:24:35 +03:00
qazal b0fa97e176
assert error detail in test_assign (#4567)
* use regex assert

* that shouldn't raise
2024-05-13 09:56:05 +03:00
qazal 4e1135a0bc
assign buffer read/write tests (#4565)
* simple tests

* more tests
2024-05-13 09:43:36 +03:00
George Hotz b660f60125
all uops are now cachable (#4564)
* all uops are now cachable

* cachable is gone
2024-05-12 22:34:35 -07:00
George Hotz 02327b8adf
simple stuff from new_uops branch (#4563)
2024-05-12 22:18:05 -07:00
ziereis f53a23d21e
Test for optim assertion (#4558)
* add test for assertion

* whitespace

* restore state

---------

Co-authored-by: Thomas Ziereis <thomas.ziereis@web.de>
2024-05-12 14:21:28 -07:00
wozeparrot d7670f8141
quantized llama multilazybuffer fix (#4557)
2024-05-12 14:19:21 -07:00
George Hotz 7a26bdac65
move scheduleitem to schedule.py (#4541)
* move scheduleitem to schedule.py

* don't need that type checking anymore
2024-05-11 21:13:04 -07:00
George Hotz 508e8a6666
add cpu objdump to LLVM/CLANG (#4537)
2024-05-11 14:28:44 -07:00
George Hotz 328b083e66
lil profiling script
2024-05-11 11:02:44 -07:00
qazal 2fb564c125
multi reduce linearizer tests start (#4529)
* test_end_local

* test_early_end_local

* todos

* mean+std

* skip no locals
2024-05-11 14:06:40 +03:00
qazal 3cba22920f
test_linearizer_correctness (#4458)
* test helper

* uops asserts

* cleanup args

* nits
2024-05-11 13:02:08 +03:00
qazal b3d9fd48d0
infra for testing linearizer correctness (#4528)
* refactor outbufs

* delete helper
2024-05-11 12:10:33 +03:00
George Hotz 2f970a4fc2
all realize 2 (#4527)
* all realize 2

* tests fixup

* fix more tests

* fix openpilot

* fix tests

* unneeded
2024-05-10 22:43:09 -07:00
George Hotz 347a3acb37
add renderer class (#4524)
* add renderer class

* tests pass

* fix pylint

* fix tensor cores
2024-05-10 21:40:02 -07:00
chenyu b00b6b16f0
fix TRAIN_BEAM and Tensor.training for mlperf bert (#4525)
also hard coded bert model config instead of looking up a file
2024-05-11 00:18:36 -04:00
chenyu 7fab8c9e17
add symbolic mean test cases in test_symbolic_ops and test_symbolic_jit (#4523)
* add symbolic mean test cases in test_symbolic_ops and test_symbolic_jit

2d symbolic mean in jit does not quite work; order of the variable inputs is not deterministic?

* skip
2024-05-10 23:19:55 -04:00
George Hotz 827058f030
update tests get_runner (#4522)
2024-05-10 20:09:22 -07:00
George Hotz d438d5698d
bring buffer back to device (#4517)
2024-05-10 11:22:31 -07:00
George Hotz 4eef1ee9bf
move renderer into options (#4514)
* move renderer into options

* fix tests

* renders are functions
2024-05-10 10:01:51 -07:00
George Hotz 89e119bc58
move Allocator to buffer.py (#4502)
* move Allocator to buffer.py

* move those to realize

* memory file

* cleanup
2024-05-09 19:45:56 -07:00
George Hotz 1e843d495e
cleaning up search with Program (#4500)
* cleaning up search

* fix tests

* test fix

* minor compiler cleanup
2024-05-09 19:01:53 -07:00
chenyu d3dc332c2e
Tensor.logsumexp (#4442)
the subtract max part should share with safe softmax

cleaner
2024-05-09 20:49:06 -04:00
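Note on #4442: the remark that the subtract-max part should be shared with safe softmax can be illustrated as follows. This is a sketch on plain Python lists, not the Tensor implementation; shift_by_max, logsumexp and softmax here are hypothetical stand-ins.

```python
import math

def shift_by_max(xs):
    """Shared step: subtract the maximum so every exponent is <= 0."""
    m = max(xs)
    return m, [x - m for x in xs]

def logsumexp(xs):
    # log(sum(exp(x))) == m + log(sum(exp(x - m))), which avoids overflow in exp
    m, shifted = shift_by_max(xs)
    return m + math.log(sum(math.exp(s) for s in shifted))

def softmax(xs):
    # Same shift; softmax is invariant to subtracting a constant from all inputs.
    _, shifted = shift_by_max(xs)
    exps = [math.exp(s) for s in shifted]
    total = sum(exps)
    return [e / total for e in exps]

# math.exp(1000.0) alone would raise OverflowError; the shifted forms do not.
big = [1000.0, 1000.0]
assert math.isclose(logsumexp(big), 1000.0 + math.log(2))
assert math.isclose(sum(softmax(big)), 1.0)
```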
George Hotz c9e84ed0da
refactor to Program class (#4476)
* refactor to Program class

* switch to Program

* fix tests

* smaller diff

* self.p

* more tests

* fix metal test

* tests

* fix openpilot

* move that to linearizer

* p.launchdims
2024-05-09 17:29:07 -07:00
nimlgen a2e2ba380c
nv tune shmem size (#4495)
* nv tune shmem size

* compare them

* linter

* linter2
2024-05-10 00:35:01 +03:00
nimlgen e14d5b6fd7
nv fix oob qmd ptr (#4478)
* nv fix oob qmd ptr

* test kernargs no oob
2024-05-08 23:11:04 +03:00
chenyu 36a1f38049
lazy folding: mul -1 is neg, and neg neg is noop (#4472)
2024-05-08 01:52:22 -04:00
chenyu c508eb7425
revert the removal of CAST_BEFORE_VIEW (#4471)
this brings most of the memory gain for resnet back.
2024-05-08 00:14:29 -04:00
chenyu 7eb035e7c5
stronger test case for half mean overflow (#4470)
2024-05-07 22:40:09 -04:00
chenyu ca7300c783
fix half mean and its backward (#4469)
* fix half mean and its backward

cast to sum_acc_type, sum, div, then cast back

* mean dtype tests
2024-05-07 21:46:41 -04:00
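Note on #4469: the recipe "cast to sum_acc_type, sum, div, then cast back" can be illustrated without any tensor library. The sketch below emulates half precision with the standard struct module and uses a Python float as the assumed wider accumulation type; to_half, naive_half_mean and widened_mean are hypothetical names, not functions from the repository.

```python
import math
import struct

def to_half(x):
    """Round a Python float to IEEE half precision; overflow becomes +/-inf."""
    try:
        return struct.unpack("<e", struct.pack("<e", x))[0]
    except (OverflowError, struct.error):
        return math.copysign(math.inf, x)

def naive_half_mean(xs):
    # Accumulate in half precision: the running sum overflows past ~65504.
    acc = 0.0
    for x in xs:
        acc = to_half(acc + x)
    return to_half(acc / len(xs))

def widened_mean(xs):
    # Cast up, sum, divide, then cast the result back to half.
    total = sum(float(x) for x in xs)
    return to_half(total / len(xs))

values = [to_half(1000.0)] * 100  # true sum is 100000, above the half maximum
assert math.isinf(naive_half_mean(values))
assert widened_mean(values) == 1000.0
```

The mean itself is well within half range; only the intermediate sum is not, which is why the accumulation type matters.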
Francis Lam 7da1b41f38
fuzz_linearizer: add FUZZ_REQUIRE_TC option to require TC in opts (#4468)
useful for checking late opts after TC such as GROUP, etc.
2024-05-07 17:14:21 -04:00