Commit Graph

1208 Commits

Author SHA1 Message Date
George Hotz c81ce9643d
move globalcounters to ops (#2960)
* move globalcounters to ops

* missed a few

* sick of that failing
2024-01-01 14:21:02 -08:00
chenyu 8291986959
Variable.sum -> Node.sum, Variable.ands -> Node.ands (#2961) 2024-01-01 16:21:28 -05:00
chenyu 3d720b5761
move expand_idx, iter_idxs and expand_node from symbolic to linearizer (#2959) 2024-01-01 14:41:21 -05:00
George Hotz 56f44bd10e
move the compiler cache to be global (#2957)
* move the compiler cache to be global

* remove non robust test

* remove dead code
2024-01-01 10:59:56 -08:00
George Hotz 063f465604
simpler webgpu (#2956)
* simpler webgpu

* skip that test
2024-01-01 10:28:59 -08:00
chenyu 50f2e31d26
cleanup float4 grouping in global_load and global_store (#2942)
* cleanup float4 grouping in global_load and global_store

* fix test decorator
2023-12-27 14:10:04 -05:00
chenyu 54629b56d2
minor cleanup in kernel and linearizer (#2937)
* minor cleanup in kernel and linearizer

fewer long lines, consistent spacing, and colocated variables

* no deadline in hypothesis test
2023-12-26 12:05:32 -05:00
chenyu 820f2e054e
fix PADTO optimization (#2935)
the correct condition is that PADTO cannot be applied to the reduce axis, not that the reduce is Reduce.MAX.
even for Reduce.SUM it's possible that the reduce axis had a div before it, so a padded 0 becomes inf and summing over it is incorrect.
2023-12-25 22:52:49 -05:00
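
Note: a minimal numpy sketch (illustration only, not the tinygrad kernel code) of the hazard this commit describes; padding the reduce axis with zeros is only safe when the pad value stays neutral through every op applied before the reduce:

```python
import numpy as np

# Padding the reduce axis with zeros is only correct when the pad value stays
# neutral for the reduction. With a div applied after the pad, the padded
# zeros become inf and poison the sum.
x = np.array([2.0, 4.0])            # real data along the reduce axis
padded = np.pad(x, (0, 2))          # reduce axis padded to length 4 with zeros

with np.errstate(divide="ignore"):
    correct = (1.0 / x).sum()       # 0.5 + 0.25 = 0.75
    broken = (1.0 / padded).sum()   # 1/0 -> inf, so the whole sum is inf

print(correct, broken)              # 0.75 inf
```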
qazal dca5e4fe74
tensor == tensor should be bool (#2916)
* return bool

* add tests to the type spec

* fix multinomial

* fix tril

* fix round

* fix NegativeLogLikelihoodLoss

* rm debug

* webgpu

* more webgpu

* bitwise or for adding two bools

* onnx ops don't need to cast anymore

* Revert "bitwise or for adding two bools"

This reverts commit b413babffa4d93c5cc94a252cb7086b9a899a437.

* workaround for metal neg

* just the tests in the type spec
2023-12-25 12:38:47 -05:00
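
Note: a minimal sketch of the behavior this PR enforces; the tinygrad import path and the .numpy() call are assumptions based on current conventions and may differ by version:

```python
from tinygrad import Tensor, dtypes  # import path assumed; may differ by version

a, b = Tensor([1, 2, 3]), Tensor([1, 0, 3])
mask = (a == b)
# after this change a comparison returns a bool tensor, not the inputs' dtype
assert mask.dtype == dtypes.bool
print(mask.numpy())  # [ True False  True]
```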
chenyu 8a8aed23d2
test dtypes of return values of cumsum, argmax/min, multinomial (#2933)
* test dtypes of return values of cumsum, argmax/min, multinomial

cumsum behaves like sum, and functions that return an index return in dtypes.default_int

* because webgpu is different
2023-12-25 11:33:17 -05:00
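
Note: a sketch of the dtype rules this test pins down, taken from the commit message; the tinygrad import path and exact method signatures are assumptions and may differ by version:

```python
from tinygrad import Tensor, dtypes  # import path assumed; may differ by version

x = Tensor([1, 2, 3], dtype=dtypes.int8)
# cumsum promotes its output dtype the same way sum does
assert x.cumsum().dtype == x.sum().dtype
# index-returning functions (argmax/argmin) come back in the default int dtype
assert x.argmax().dtype == dtypes.default_int
```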
chenyu 1fb815e77e
hotfix fix coder. RMSNorm cannot have float16 input (#2932)
* hotfix fix coder. RMSNorm cannot have float16 input

* update real world test due to new kernels

* more type casts
2023-12-25 02:28:11 -05:00
Will 016aebcd84
Fixed Tensor.randint() not accepting tuple shapes (#2923)
* ww/Fixed Tensor.randint() to accept shape tuples ()

* ww/Wrote a test to cover this typo

* ww/Updated Tensor random objects to optionally take (,) or *() to be more consistent

* ww/no lint no worries

* ww/Made peace with linter

* ww/Added new line, can't reduce line size without reducing readability

* ww/reverted to using .mul
2023-12-24 20:32:26 -05:00
Isalia20 8de1fc2539
Einsum space fix (#2927)
* space removal in formula and a single test to cover it

* space in torch einsum as well

* replacing spaces in the formula variable to support stripping all the spaces
2023-12-24 01:23:27 -05:00
chenyu b55b55d56e
use at least int32 and uint32 for sum output (#2926)
* use at least int32 and uint32 for sum output

* use the correct type for acc

* fix opencl

* llvm mulacc
2023-12-24 01:14:54 -05:00
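
Note: a small numpy illustration (not tinygrad code) of why the sum accumulator needs at least 32-bit integers:

```python
import numpy as np

# 100 int8 values of 100 sum to 10000, which does not fit in int8.
x = np.full(100, 100, dtype=np.int8)
print(x.sum(dtype=np.int8))    # wraps around to 16 instead of 10000
print(x.sum(dtype=np.int32))   # 10000: the accumulator needs at least int32
```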
chenyu 089703a390
cleanup test_dtype_alu (#2919)
wrapped long lines and lowered atol for METAL.sin to 2, since the difference of any two sin values is bounded by 2
2023-12-22 17:29:31 -05:00
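
Note: the bound referenced in this commit, spelled out as a quick plain-Python check; sin stays in [-1, 1], so any two sin values differ by at most 2, which makes atol=2 the loosest tolerance that always holds:

```python
import math

# |sin(a) - sin(b)| <= |sin(a)| + |sin(b)| <= 2 for all real a, b
assert all(abs(math.sin(a) - math.sin(b)) <= 2.0
           for a in range(-50, 50) for b in range(-50, 50))
```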
chenyu 50927defad
s/lazydata.realized/lazydata.base.realized/g (#2914)
* s/lazydata.realized/lazydata.base.realized/g

* not that
2023-12-22 14:45:13 -05:00
chenyu 2783e1b50d
bugfix Tensor.item when it's unbased (#2913)
it's possible for a numel-1 tensor's lazydata to be unbased, in which case it should use lazydata.base.realized
2023-12-22 13:50:06 -05:00
Oleg Rybalko c3133adb8c
Disk shm refactor (#2912)
* better support for platform dependent flags

* osx test support

* removed unused import and made line length <150

* changed osx ci shm

* lstrip in case SharedMemory._name is passed
2023-12-22 09:23:37 -08:00
chenyu 3855432265
don't use numpy to create Tensor(None) (#2909)
* don't use numpy to create Tensor(None)

empty suffices

* parentheses
2023-12-22 01:07:44 -05:00
chenyu 50cfb1fb3a
update onnx model links (#2908)
updated in https://github.com/onnx/models/pull/644
2023-12-22 00:19:41 -05:00
chenyu 1bbeb3fe2f
remove the different rtol / atol for openpilot CUDA in benchmark (#2907)
not sure what the issue was but seems to be fixed on master
2023-12-21 22:23:39 -05:00
chenyu a543d8bea8
fuzz default dtypes for some test_dtype tests (#2906)
* fuzz default dtypes for some test_dtype tests

* ocd

* setUp and tearDown
2023-12-21 22:00:21 -05:00
George Hotz 5cac6338a4
apply the multitensor optimizations in lazy.py (#2901)
* apply the multitensor optimizations in lazy.py

* less lines

* hack for webgpu

* save a line
2023-12-21 13:55:49 -08:00
chenyu 5bf43c9634
reenable one onnx test failed due to dtype (#2902) 2023-12-21 15:50:02 -05:00
George Hotz 193109a88c hotfix: compare on ids 2023-12-20 23:47:50 -08:00
George Hotz f6c7833f9f
fast compare for lazyop (#2893) 2023-12-20 23:32:27 -08:00
George Hotz 41b2a25be6
Fix exponential behavior in lazyops (#2890)
* add cache to ast_parse and lazyop builder

* add caches
2023-12-20 22:06:50 -08:00
George Hotz 8c4a0f8e15
Fix int child count (#2882)
* pad ops broke coder

* that contiguous fixes it

* Update lazy.py

* recursive add

* fix all

* revert that

* todo test
2023-12-20 21:06:27 -08:00
George Hotz 7da2325dc7
get_lazyops() -> lazyops (#2884)
* get_lazyops() -> lazyops

* don't compare empty mem
2023-12-20 18:04:49 -08:00
George Hotz e1861ab65e
remove realize from optimizer (#2880)
* remove realize from optimizer

* one still needed

* opt realize
2023-12-20 16:42:41 -08:00
George Hotz 1765849937
new lazy, benchmark (#2878)
* lazy rewrite, try 2

* min fix tests

* pass contig test

* put broken pads back

* move that to realize

* no contig child fixes array packing

* so wrong

* now that's correct

* base children

* fix bind issues

* disable to_image_idx

* fix tests

* that failure shouldn't break other tests

* more fixes

* fix torch

* skip failing tests in CI

* 1e-7

* half is broken

* 1e-6 margin of error
2023-12-20 14:33:21 -08:00
Peter Cawley dae8976889
Fix reshape merging with masks (#2877) 2023-12-20 14:00:58 -08:00
George Hotz 8fe24038d8
Revert "mulacc fusion cleanup (#2871)" (#2876)
This reverts commit 863c5b26ed.
2023-12-20 13:26:25 -08:00
qazal 863c5b26ed
mulacc fusion cleanup (#2871)
* add mulacc fusion tests

* cleanup the implementation

* fix indent in the test utility

* less verbose
2023-12-20 15:39:54 -05:00
chenyu e13b4964d7
remove the all_int(shape) check in Tensor._loadop (#2874)
* remove the all_int(shape) check in Tensor._loadop

we can support jittable symbolic shape random with custom rand now, and we can formalize it in the test after threefry is ready

* MOCKHIP false positive
2023-12-20 15:04:50 -05:00
qazal 5f07ef455e
update dtypes (#2872) 2023-12-20 15:04:02 -05:00
George Hotz ca59054463
fix shapetracker math (#2861)
* proper test

* all st math good now

* fix real_strides bug
2023-12-19 22:17:34 -08:00
chenyu 5a739e8c20
update one skipped pad_reshape test that was fine (#2860)
* update one skipped pad_reshape test that was fine

had a typo

* this one passed
2023-12-19 23:25:52 -05:00
chenyu ad233d557f
disable reshape merging with masks (#2858)
the fuzzer found a bug, and the implementation is not complete
2023-12-19 19:06:16 -05:00
Oleg Rybalko 42a038c83f
More readable torch_load ext check (#2853)
* more readable extension check

* enable tarfile test

* detach tensor if requires grad in torch
2023-12-19 14:53:15 -05:00
chenyu 172a88e719
skip slow test_indexing on METAL (#2852)
LLVM still runs it and is a lot faster; would be curious to know why.
also reworded some error messages and removed the regex check
2023-12-19 12:00:54 -05:00
geohotstan fec8e9060c
Add simple fancy indexing exceptions (#2706)
* fancy indexing raise error

* updated error message

* improved error check

* oops

* fixed onnx

* oops typo

* merge

* add full_flatten

* try

* merged and updated some tests

* more cleaning

* done

* temp fix onnx

* try

* add todo in onnx_test

* reword

* gah
2023-12-19 11:23:51 -05:00
George Hotz 90fb09b55c remove unused _device_extra_args 2023-12-18 22:14:58 -08:00
George Hotz b2192b5400
minor improvements (#2845) 2023-12-18 22:09:08 -08:00
George Hotz d086325b1b hotfix: failing tests 2023-12-18 21:12:42 -08:00
George Hotz 07df14aa0e
HIP cleanups (#2843)
* move everything to code_for_op to reason about it

* loop the loopable parts

* its not that unreadable

* these are loopable too

* nitpick

* tests p1 - replace these with the actual compiler running alu ops tests

* tests p2: compile test_dtype_alu in HIP!

+add to CI

* nobody liked test_renderer

* revert test_dtypes change

* isolated mockhip tests

* dont need the WHERE hack after #2782

+ruff

* bf16 is broken in HIP

job failed in: https://github.com/tinygrad/tinygrad/actions/runs/7232101987/job/19705951290?pr=2778#step:8:73

* picking this back up

* add compile tests for unary ops and binary ops

* MOD is only in ints

* CMPLT won't work after the dtypes PR is merged because it will always be bool

* test all combinations

* Update cstyle.py

* don't use vload

* no getenv

* set seed

---------

Co-authored-by: qazal <qazal.software@gmail.com>
Co-authored-by: qazal <77887910+Qazalin@users.noreply.github.com>
2023-12-18 21:09:32 -08:00
George Hotz b6d71b131e hotfix: push broken tests 2023-12-18 21:08:42 -08:00
George Hotz 80f53245e8
shapetracker add and invert (#2828)
* invert (broken)

* decent invert

* shapetracker invert works

* plus is meh, invert is good

* support invert mask

* a few more invert tests

* shapetracker math invert test
2023-12-18 16:03:27 -08:00
chenyu 73cadfbb3c
Remove pytest markers (#2831)
* remove pytest marker

* fix some, skip some

* tweak

* fix

* skip slow

* skip more
2023-12-18 18:53:28 -05:00
chenyu 264fe9c93f
clean up test_dtype.py (#2827)
make is_dtype_supported a pure function and clean up long lines
2023-12-18 16:06:09 -05:00