4151 Commits

Author SHA1 Message Date
chenyu bf3583f9b2
use Buffer.ensure_allocated in search _ensure_buffer_alloc (#4132) 2024-04-10 13:11:50 -04:00
George Hotz a35375df85
run_schedule is so simple now (#4130) 2024-04-10 09:49:30 -07:00
George Hotz 86bd2eb500 hotfix: update copy_from_fd for new DiskBuffer 2024-04-10 15:41:06 +00:00
George Hotz ee457a4b20
no more underlying diskbuffer, that's just the device (#4129) 2024-04-10 08:32:25 -07:00
geohotstan fe88591890
update onnx to 1.16.0 (#4127)
* update

* pass tests and skip tests
2024-04-10 11:19:13 -04:00
chenyu 6bbbeb93ac
skip a few clang tests that took > 30 seconds in CI (#4126)
* skip slow CLANG test test_train_cifar

* skip those too

* and that

* only CI

* one more
2024-04-10 02:00:34 -04:00
George Hotz 08ddeb5685
create schedule has global vars (#4125)
* abstractions3 is currently wishful thinking

* create_schedule_with_vars
2024-04-09 21:42:16 -07:00
George Hotz 216eb235e5 hotfix: cast mnist to float 2024-04-09 19:30:03 -07:00
George Hotz fea774f669
spend 5 lines to bring mnist into the repo (#4122) 2024-04-09 19:24:57 -07:00
qazal 42edae8935
pickle schedules (#4114)
* pickle schedules

* Update test_pickle.py

* Update test_pickle.py

---------

Co-authored-by: George Hotz <72895+geohot@users.noreply.github.com>
2024-04-09 13:47:25 -07:00
Felix Kuehling 38ae4194a6
Fixes for ops_kfd (#4105)
* kfd_ops: Fix GPU node discovery on NUMA systems

Ignore potentially multiple CPU NUMA nodes and any GPU nodes that are
not accessible because of device cgroups.

Signed-off-by: Felix Kuehling <felix.kuehling@amd.com>

* kfd_ops: Format the GFX arch target name correctly

The target version in sysfs properties is a decimal representation with
two digits per component.

The format for LLVM GFX target names is a bit quirky for historical
reasons. It uses one digit for the minor version and stepping. When it
ran out of decimal digits for the stepping on gfx90X it started using
hexadecimal there. But the major version is still decimal and went
double digit in GFX10.

Make sure to parse and format it accordingly for all supported GPUs.

Signed-off-by: Felix Kuehling <felix.kuehling@amd.com>

---------

Signed-off-by: Felix Kuehling <felix.kuehling@amd.com>
2024-04-09 13:21:21 -07:00
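
The GFX target naming rule described in the commit above can be sketched as follows. This is a minimal illustration, not the code from the commit: the function name `format_gfx_target` and the sample inputs are hypothetical, and the packing (major * 10000 + minor * 100 + stepping) is an assumption drawn from the "two digits per component" description.

```python
def format_gfx_target(target_version: int) -> str:
    """Turn a packed decimal version (two digits per component) into an LLVM-style name.

    Hypothetical helper for illustration; the packing layout is an assumption.
    """
    major = target_version // 10000        # stays decimal, may be two digits (GFX10 and later)
    minor = (target_version // 100) % 100  # one digit; hex formatting equals decimal below 10
    stepping = target_version % 100        # one hex digit, so a stepping of 10 becomes "a"
    return f"gfx{major:d}{minor:x}{stepping:x}"


if __name__ == "__main__":
    # Illustrative packed values only.
    for packed in (90000, 90010, 100300, 110000):
        print(packed, format_gfx_target(packed))
```

The hexadecimal stepping is what distinguishes a name such as `gfx90a` from a naive decimal rendering, which would yield the ambiguous `gfx9010`.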
George Hotz 10dbf90b2c hotfix: test speed 2024-04-09 13:20:39 -07:00
George Hotz ae849d12d7
numpy device + pickle it (#4120) 2024-04-09 13:19:30 -07:00
chenyu 1ef9c50fd7
Update ssa input order and annotate types in cstyle and assembly (#4117)
variable prefix is never optional (removed the default "t") and UOp can be optional (added the default None).
2024-04-09 13:10:29 -04:00
geohotstan 15f2f39658
conceptually simpler fancy index (#3335)
* init

* add failed case

* fix: temp comment out MULACC cast

* is this right?

* add test case

* oops, forgot to get rid of temp test

* WOOOOOO TOOK OUT 2 TRANSPOSES IN GATHER YAY

* cleaner

* comment cleanup

* update docs

* resolve conflict

* oops

* SUPA FAST

* comment out a test

* del some print statements

* use new broadcast stuff

* more clean up

* move try except

* skip fancy indexing for python backend test_ops
2024-04-09 11:18:04 -04:00
David González Martínez 980124a605
add lerp operation to tensor (#4102)
* feat: add lerp operation to tensor

* fix

* style: fit in one line:

* tests: test backward for lerp
2024-04-08 17:03:27 -07:00
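
For readers unfamiliar with the operation, linear interpolation is conventionally defined as `start + weight * (end - start)`. The sketch below shows that formula on plain Python floats. It assumes the conventional definition and is not the tensor implementation added in the commit above; the function name and signature are illustrative.

```python
def lerp(start: float, end: float, weight: float) -> float:
    """Linearly interpolate between start and end (scalar illustration only)."""
    # weight = 0 returns start, weight = 1 returns end, values in between blend the two.
    return start + weight * (end - start)


if __name__ == "__main__":
    print(lerp(0.0, 10.0, 0.25))
```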
Francis Lam 46850a0269
search: add a BEAM_COMPARE env to optionally not compare to hc/tc (#4107)
* search: add a BEAM_COMPARE env to optionally not compare to hc/tc

setting BEAM_COMPARE=0 will prevent additional memory allocation
needed to do the timing tests assuming the BEAM result is in
the diskcache.

* change to optionally use Buffer.allocate
2024-04-08 18:54:01 -04:00
qazal c390828f61
refactor outbufs (#4112) 2024-04-08 14:54:10 -07:00
andresgit 7fd12aba85
graph remove input buffer references (#4100)
Co-authored-by: chenyu <chenyu@fastmail.com>
2024-04-08 16:49:16 -04:00
chenyu 078d841479
add SPLIT_REDUCEOP to disable reduce split (#4115)
verify with `SPLIT_REDUCEOP=0 BIG=2 MPS=1 python3 -m pytest -rA test/test_speed_v_torch.py -k sum`. 10X slower on mac
2024-04-08 16:31:08 -04:00
qazal eea42d864f
account for all outputs (#4113) 2024-04-08 10:04:19 -07:00
chenyu dbd39ab78a
setitem support setting python const (#4111) 2024-04-08 11:37:50 -04:00
chenyu f8dc82a8a7
use single tensor for llama kv cache (#4108)
similar to optimization in gpt2
2024-04-08 00:38:32 -04:00
chenyu 92c0675ccf
setitem initial support (#4093)
* wip setitem

it's an eager assign to output shapetracker view

* cleanups and tests

* more cleanups
2024-04-07 20:35:22 -04:00
geohotstan 183708b3fd
broadcast expand to match torch (#4085)
* initial version

* heh gimme grrrreen

* version 2

* clean ups

* some test confusion

* fix onnx

* rename to _broadcast_tensors

* improved errors and test

* fixed?

* some test fixup

* version 3 lol

* comments

* cleaner

* add failure test for expand to 0 test

* 1 more assertRaises test

* make err msg better

* also rewrite the expand onnx op? :s
2024-04-07 16:23:13 -04:00
uuuvn 2b81d9b334
Fix broken test (#4104) 2024-04-07 12:02:12 -04:00
chenyu 9a95d87366
metal CI run llama with 4 shards (#4103)
this can catch multi tensor issue on mac.
2024-04-07 11:04:08 -04:00
George Hotz 444d2a7487 hotfix: fix SDMA read_pointer_address in KFD 2024-04-07 13:13:15 +00:00
uuuvn bb7567b365
Fix metal (#4101) 2024-04-07 05:21:19 -07:00
chenyu bdbcac67f1
assign jit test case with other tensor as input (#4098)
hmm it works
2024-04-06 14:41:14 -04:00
George Hotz e4a1858471
revert command queue (#4097) 2024-04-06 08:58:18 -07:00
George Hotz 97c402d69e
use imagenet spawn (#4096) 2024-04-06 08:34:10 -07:00
George Hotz fffd9b05f5
mock mnist data for imagenet trainer (#4095)
* mock mnist data for imagenet

* move print and test

* needed to reshape
2024-04-06 08:08:40 -07:00
George Hotz 8739d33fe9
kfd: disable copy_from_fd while debugging (#4091)
* kfd: disable copy_from_fd while debugging

* increase timeout to a minute
2024-04-05 18:02:58 -07:00
George Hotz 93824e59eb
support MOCKDATA=1 for resnet (#4090)
* mockdata for resnet

* fix eval, revert hsa
2024-04-05 17:19:18 -07:00
George Hotz 164329a8ea
address kfd feedback (#4087)
* address kfd feedback

* signals cleanup

* signals cleanup

* handle 2 doorbell pages correctly

* signal reset cleanup

* signals cleanup

* more GTT

* cleanups

* minor cleanups
2024-04-05 15:24:41 -07:00
geohotstan dafa42e864
clean up (#4081)
Co-authored-by: chenyu <chenyu@fastmail.com>
2024-04-05 11:57:44 -04:00
Akshit Talwar 750ecf8fef
replace slice by pad/shrink in _pool (#4082) 2024-04-05 11:47:22 -04:00
George Hotz a337922c44
more work on kfd (#4079)
* more work on kfd

* fix multitensor test on kfd

* stuff
2024-04-05 08:36:36 -07:00
chenyu e7ff5102cf
failed test in test_pattern_matcher (#4080)
something about the PTX rewrite is incorrect: it produces duplicated rewritten uops
2024-04-05 02:53:50 -04:00
chenyu a023a1ed87
update github action to actions/cache@v4 (#4077)
get rid of warning `Node.js 16 actions are deprecated. Please update the following actions to use Node.js 20: actions/cache@v3.`
2024-04-04 22:24:26 -04:00
George Hotz 28ec6c67be hotfix: hlb_cifar KFD works 2024-04-05 02:19:14 +00:00
chenyu 1de9778949
import Buffer and BufferOption from tinygrad.buffer (#4076) 2024-04-04 22:12:23 -04:00
chenyu 9e0ebf8979
remove dtype from FlopCounter (#4075)
the annoying part of removing all of FlopCounter is that, for devices that do not support local, the matmul index alu is huge.
we can remove the dtype first.

sneak in updating `ruff` command to `ruff check`
2024-04-04 21:23:28 -04:00
George Hotz 3de855ea50
don't use SVM memory in KFD (#4072)
* don't use SVM memory in KFD

* copy from fd

* cleanups

* transfer

* hacks

* ops_hsa

* tighter API
2024-04-04 17:33:21 -07:00
chenyu 5e6e6c9a67
use ConstType in various const function type hint (#4074) 2024-04-04 20:32:07 -04:00
chenyu c1cffed1df
add LazyOp.dtype (#4073)
an inferred cached_property.
removed all cases that use get_lazyop_info just to get the dtype of an op.
prereq to remove InterpretedFlopCounter
2024-04-04 17:38:19 -04:00
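
The idea of an inferred `cached_property` for a dtype can be sketched with the standard library alone. The `Node` class, its op names, and the inference rule below are all hypothetical and stand in for the real `LazyOp`. The sketch only shows that the dtype is derived from the sources on first access and then reused.

```python
from functools import cached_property


class Node:
    """Hypothetical expression node for illustration; not tinygrad's LazyOp."""

    def __init__(self, op, src=(), leaf_dtype=None):
        self.op = op
        self.src = tuple(src)
        self.leaf_dtype = leaf_dtype  # only meaningful for leaves (no sources)
        self.infer_calls = 0          # counts how often inference actually runs

    @cached_property
    def dtype(self):
        self.infer_calls += 1
        if not self.src:
            return self.leaf_dtype    # leaves carry their dtype explicitly
        if self.op == "CMPLT":
            return "bool"             # assumed rule: comparisons yield bool
        return self.src[0].dtype      # assumed rule: otherwise inherit from the first source


if __name__ == "__main__":
    a = Node("LOAD", leaf_dtype="float32")
    b = Node("ADD", (a, a))
    print(b.dtype, b.dtype, b.infer_calls)
```

Because `cached_property` stores the result on the instance after the first access, repeated lookups do not re-walk the sources.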
chenyu f836d6a03f
is_unrealized_unpadded_const -> is_unrealized_unmasked_const (#4071)
realized #3580 was doing the same thing. unmasked is more accurate
2024-04-04 14:25:17 -04:00
Szymon Ożóg 82b7b9655f
test for dtype set (#4069) 2024-04-04 11:24:33 -04:00
geohotstan 1a1dd1c1a7
add and enable tests for indexing const folding (#4068)
* enable test in test_indexing

* added tests

* rename stuff

* del a test case cuz it's loadops.copy
2024-04-04 10:46:28 -04:00