Commit Graph

3572 Commits

chenyu 3c4c8c9e16
bump db version to 11 (#3398)
followup after disabling fast math on metal.
2024-02-14 10:13:18 -05:00
qazal 27f4de2ce4
delete half_prekernel (#3388)
* generic rendering of half and bf16

hotfix

* fix uops + regression test

* fix the test for metal's half4

* uop.uop fixup

* mypy with --strict-equality, fix ops_gpu
2024-02-14 15:40:48 +01:00
chenyu 078a2603d5
set metal fast math default to 0 (disabled) (#3370)
* set metal fast math default to 0 (disabled)

It's a correctness fix because we use inf and nan. Let's see how slow it is

* skip failed onnx tests

* tmp DISABLE_COMPILER_CACHE=1 in metal benchmark

* Revert "tmp DISABLE_COMPILER_CACHE=1 in metal benchmark"

This reverts commit 22267df38099acbf949aefdb6a5911ebc3a31984.
2024-02-14 11:42:33 +01:00
Francis Lam 668324d92b
wmma: protect TC locals from modification and use only LOCAL (#3379)
also remove unnecessary upcast_dim from tensor_core and calculate
it from the dimensions and thread sizes
2024-02-13 10:19:35 +01:00
Francis Lam f1ad01fd91
test_linearizer_failures: add new linearizer compile failure on METAL (#3380) 2024-02-12 20:28:34 -05:00
George Hotz ce1f9f5556 hotfix: new linearizer docs 2024-02-12 18:56:30 +01:00
George Hotz 2e60012bcf
move create schedule and delete old API (#3377)
* move create schedule and delete old API

* fix test multitensor
2024-02-12 18:10:45 +01:00
George Hotz 41efaa848c
move graph.py and jit.py into features (#3376)
* move graph.py into features

* move jit into features

* fix quickstart
2024-02-12 17:34:34 +01:00
George Hotz 0f6cde243d
import from wino_cleanup (#3374) 2024-02-12 16:26:50 +01:00
George Hotz f47e297d4e refactor: END -> ENDLOOP 2024-02-12 15:46:18 +01:00
George Hotz 29d68ae637
uops endif (#3372)
* use is instead of ==

* add endif
2024-02-12 15:43:37 +01:00
George Hotz 1d45f3899d
use is instead of == (#3371) 2024-02-12 15:35:55 +01:00
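The rationale for this change can be sketched (a hedged illustration, not the actual diff): tinygrad's uop opcodes are enum members, and enum members are singletons, so identity comparison is explicit, cheap, and immune to any overloaded `__eq__`. The `UOps` names below are illustrative stand-ins.

```python
from enum import Enum, auto

class UOps(Enum):  # stand-in enum; member names here are illustrative
    LOOP = auto()
    ENDLOOP = auto()

op = UOps.LOOP
# enum members are singletons, so `is` is safe and unambiguous
assert op is UOps.LOOP
assert op is not UOps.ENDLOOP
```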
David Hou 323393b650
verbose apply_matrix (#3333)
* verbose apply_matrix

* types

* not so verbose

* small comment change

* fix typo

---------

Co-authored-by: George Hotz <72895+geohot@users.noreply.github.com>
2024-02-12 12:06:12 +01:00
Jyotirmaya Mahanta d55f99e881
patch merge_views (#3311) 2024-02-12 11:53:55 +01:00
Jyotirmaya Mahanta b6a2600c86
fix merging condition in merge_dims (#3363)
* fix merging condition in merge_dims

* add tests

* set contiguous after mask is canonicalized

* minor fix
2024-02-12 11:50:26 +01:00
qazal c8fd66a131
Run RDNA3 tensor core tests in CI (#3367)
* add test_linearizer

* skip test_padto_matmul
2024-02-11 19:54:06 -05:00
chenyu f798b60338
add METAL_FAST_MATH env var to disable metal fast math (#3369)
* env var METAL_FAST_MATH to disable fastmath for metal

use this to test the impact of fast math. might need to disable the compiler cache with DISABLE_COMPILER_CACHE

* failed onnx test with fast math

METAL_FAST_MATH=0 DISABLE_COMPILER_CACHE=1 NOOPT=1 python -m pytest -n=auto test/external/external_test_onnx_backend.py -k test_MaxPool3d_stride_padding_cpu
2024-02-11 04:26:09 -05:00
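The env-var toggle pattern behind flags like METAL_FAST_MATH can be sketched as follows (an approximation of a tinygrad-style `getenv` helper, not the exact implementation):

```python
import os

def getenv(key: str, default: int = 0) -> int:
    # read an integer flag from the environment, falling back to a default
    return int(os.environ.get(key, default))

os.environ["METAL_FAST_MATH"] = "0"
assert getenv("METAL_FAST_MATH", 1) == 0   # explicitly disabled
assert getenv("SOME_UNSET_FLAG", 1) == 1   # default applies when unset
```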
chenyu 1156a27619
cleanup atol in test_ops (#3368)
removed explicitly set values that are the same as the default 1e-6, or higher but can be lowered to the default.
2024-02-10 19:44:44 -05:00
Yoshinori Sano 98c732cf9d
fix metal compile error in extra/gemm (#3365) 2024-02-10 12:54:41 +01:00
George Hotz d1fb1e0ba4
full sync to fix HIP memory leak (#3364) 2024-02-10 11:50:27 +01:00
Francis Lam ddb22a60c8
linearizer: fix up edge case bugs in UNROLL opt (#3362)
Fully UNROLLing the first_reduce should not change the number of
local_dims.

Fully UNROLLing a GROUP dim should reduce the number of
group_for_reduces by one.

Also changed group_for_reduces to be a count as the axis number
isn't used anywhere (they are always the first reduce dims).
2024-02-10 11:49:25 +01:00
George Hotz dc82ef6660 hotfix: swap HIP/CUDA bringup order to prevent delay on tinybox 2024-02-09 18:41:25 +01:00
andresgit 28ba1c5406
fix Tensor.randint ignoring kwargs (#3350)
* fix Tensor.randint ignoring kwargs

* randint kwargs fix
2024-02-09 17:12:16 +01:00
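The bug class fixed here can be sketched in isolation (names are illustrative, not tinygrad's actual implementation): a convenience wrapper that accepts `**kwargs` but forgets to forward them, so arguments like `dtype` are silently dropped.

```python
# hypothetical backing constructor standing in for the real tensor factory
def make_tensor(shape, low, high, dtype="int32"):
    return {"shape": shape, "low": low, "high": high, "dtype": dtype}

def randint_buggy(*shape, low=0, high=10, **kwargs):
    return make_tensor(shape, low, high)             # kwargs silently dropped

def randint_fixed(*shape, low=0, high=10, **kwargs):
    return make_tensor(shape, low, high, **kwargs)   # kwargs forwarded

assert randint_buggy(2, 3, dtype="int64")["dtype"] == "int32"  # bug: ignored
assert randint_fixed(2, 3, dtype="int64")["dtype"] == "int64"
```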
Francis Lam ce21fdfb67
ops_python: add HIP tensor core mock and refactor METAL (#3354)
* ops_python: add HIP tensor core mock and refactor METAL

* Add tests to CI

* add DEBUG=2 to full tests

---------

Co-authored-by: George Hotz <72895+geohot@users.noreply.github.com>
2024-02-09 12:46:06 +01:00
George Hotz b385234961
oops, change to 3.12 (#3357) 2024-02-09 12:21:06 +01:00
George Hotz 7726eef464
ops_python: add image support (#3356)
* ops_python: add image support

* uops tests in their own CI

* fix ci
2024-02-09 12:02:06 +01:00
George Hotz 5f93061f67
ops_python: gated load support (#3355)
* start uop emu

* tiny_add passes

* more ops

* emulate the whole warp

* test_gemm passes

* metal gemm test pass

* works on big gemm

* works on big gemm

* more tests pass

* touch ups

* fix mypy

* cleanups

* exp2 mypy

* arch is where it belongs

* actually emulate tensor cores

* fix test

* new style

* add gated load support to PYTHON

* out of bounds error message

* cleaner
2024-02-09 11:16:25 +01:00
chenyu c151131d1b
update onnx tests that no longer fail on CI (#3353)
was debugging fast math and it turned out these pass on CI now; more likely a bug in CI
2024-02-08 21:19:00 -05:00
chenyu 7c1c6efee5
exclude half with PYTHON in test_dtype.is_dtype_supported (#3351)
half memoryview is only available in 3.12+. the rest of the test_dtype (bounty) failures seem to be legit issues in ops_python.
2024-02-08 20:10:25 -05:00
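The 3.12 limitation can be checked directly: `struct` has handled the IEEE 754 half (`'e'`) format since Python 3.6, but `memoryview.cast("e")` was only added in CPython 3.12.

```python
import struct
import sys

buf = struct.pack("<e", 1.5)                 # pack a float16 (works on 3.6+)
assert struct.unpack("<e", buf)[0] == 1.5

if sys.version_info >= (3, 12):
    # casting a memoryview to half only became legal in CPython 3.12
    assert memoryview(buf).cast("e")[0] == 1.5
```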
George Hotz c32ea95d7d
Python uop emulator (#3327)
* start uop emu

* tiny_add passes

* more ops

* emulate the whole warp

* test_gemm passes

* metal gemm test pass

* works on big gemm

* works on big gemm

* more tests pass

* touch ups

* fix mypy

* cleanups

* exp2 mypy

* arch is where it belongs

* actually emulate tensor cores

* fix test

* new style
2024-02-08 19:24:55 +01:00
Mason Mahaffey 3ebf7a3e38
reflect changes to shapetracker in doc printouts (#3349) 2024-02-08 16:20:30 +01:00
Francis Lam 2266152b28
linearizer: added FUZZ_BEAM to fuzz_linearizer and additional tests (#3340)
Fixed test_tensor_core_opts to test all the TCs.

Added commented out failing tests in test_color_shapes_with_local.
2024-02-08 16:12:58 +01:00
chenyu b110c4a7b8
explicitly set input low and high in test_ops (#3347)
easier to set `(low, high)` than to figure out a,b for `(x+a)*b`. this PR kept the same input ranges
2024-02-08 04:11:45 -05:00
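The change can be sketched with a toy input generator (illustrative, not the test_ops helper): sampling directly in a target range is clearer than post-hoc shifting and scaling via `(x+a)*b`.

```python
import random

def uniform(low, high, n=5, seed=0):
    # sample n values directly in [low, high) instead of transforming afterwards
    rng = random.Random(seed)
    return [low + (high - low) * rng.random() for _ in range(n)]

xs = uniform(-2.0, 2.0)
assert len(xs) == 5
assert all(-2.0 <= x < 2.0 for x in xs)
```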
chenyu d8ad9e5660
verify eval acc for hlb_cifar training (#3344)
set to 93% to reduce flakiness for now
2024-02-07 19:19:59 -05:00
chenyu 0d2dacb549
test intermediate tensors created by function have same device as input (#3338)
run on TORCH since it's the fastest one on CI.
caught a bug in multinomial, and update the behavior of fancy index and gather to move the indices Tensor to same device as self.
2024-02-07 09:24:36 -05:00
chenyu 1732f1ba83
fix import and long lines in view (#3337) 2024-02-07 06:50:21 -05:00
chenyu 02636ff62d
re-enable test_reduce_0d_default int test case in test_dtype (#3336) 2024-02-07 05:30:14 -05:00
chenyu ca66be6a70
add failed Tensor.pow test cases (#3334)
tried refactoring pow and found some bugs
2024-02-07 04:28:24 -05:00
chenyu ea74856d99
remove some noqa: E501 in tensor (#3332)
left ones in conv2d and wino, no E501 elsewhere in tensor.
three functions need general readability improvement: getitem and gather, conv2d and wino, and pow
2024-02-07 00:03:05 -05:00
David Hou 6478ee5c75
PoC UnaryOps before expand (#3319)
* PoC cast before expand

* maybe also do unaryops before expand?

* undo unaryops change
2024-02-06 19:05:13 -08:00
chenyu d9ef8e25b3
fix Tensor.var with 0 in reduce dim. (#3324)
fix when correction is too big. it seems to only matter when the input size is 0 though.
torch can output -inf in var when the correction is too big, which does not make sense.
2024-02-05 20:59:13 -05:00
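The edge case can be sketched in plain Python (illustrative, not tinygrad's implementation): with Bessel-style correction the divisor is `n - correction`, which hits zero or goes negative when the correction is too big, so the guarded version returns nan instead of dividing.

```python
import math

def var(xs, correction=1):
    # naive variance with Bessel-style correction; divisor is n - correction
    n = len(xs)
    mean = sum(xs) / n
    sq = sum((x - mean) ** 2 for x in xs)
    denom = n - correction
    return sq / denom if denom > 0 else math.nan  # guard a non-positive divisor

assert var([1.0, 2.0, 3.0]) == 1.0   # standard sample variance
assert math.isnan(var([1.0]))        # divisor would be 0
```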
Obada Khalili ee25f73283
Fix Tensor.mean to compute the mean correctly when 0-length axes are selected (#3318)
* fix Tensor.mean to compute the mean correctly when 0-length axes are selected

* add a regression test

* rename sum variable to sum_t to avoid conflict with the built-in function

* refactor Tensor.mean to have fewer lines
2024-02-05 01:40:37 -05:00
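The underlying problem can be sketched in one line (illustrative, not tinygrad's Tensor.mean): a naive mean divides the sum by the element count, which is 0 when a 0-length axis is selected.

```python
import math

def mean(xs):
    # a naive mean divides by len(xs), which is 0 for an empty selection
    return sum(xs) / len(xs) if xs else math.nan

assert mean([2.0, 4.0]) == 3.0
assert math.isnan(mean([]))   # empty selection no longer divides by zero
```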
terafo 3752e97c8f
Fix: Always cast ONNX Slice op arguments into ints (#3317)
* fix: ensure that axes and steps are always ints

* Cast everything in tinygrad

---------

Co-authored-by: terafo <terafo@protonmail.com>
2024-02-04 18:40:48 -05:00
chenyu 97275101e9
fix safetensor load uint32 and uint64 (#3315)
the correct keys are U32 and U64.
2024-02-04 10:46:27 -05:00
Yoshinori Sano edb74897b2
support safe load bf16 (#3310)
* support safe load bf16

* fix lint error E501

* add test for loading safetensors

* key should be BOOL

* fix lint
2024-02-04 10:08:39 -05:00
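Both of these safetensors fixes come down to the dtype strings in the file header. A minimal sketch of reading one, following the published safetensors layout (an 8-byte little-endian length followed by a JSON header whose dtype keys include "U32", "U64", "BF16", and "BOOL"):

```python
import io
import json
import struct

def read_header(f):
    # safetensors layout: u64 little-endian header length, then JSON metadata
    (n,) = struct.unpack("<Q", f.read(8))
    return json.loads(f.read(n))

meta = {"x": {"dtype": "U32", "shape": [2], "data_offsets": [0, 8]}}
blob = json.dumps(meta).encode()
f = io.BytesIO(struct.pack("<Q", len(blob)) + blob + b"\x00" * 8)
assert read_header(f)["x"]["dtype"] == "U32"
```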
chenyu ca7973f61c
clean up einsum_mulacc (#3312)
* clean up einsum_mulacc

* push get_strides

* stride

* get_strides for ndim
2024-02-04 06:21:19 -05:00
chenyu d459956966
move TestGetContraction to test_helpers (#3313)
also cleaned long lines in test_shapetracker and enabled the line length check
2024-02-04 06:05:01 -05:00
Obada Khalili b4ea0e18e3
Fix dot product on buffers with zero strides (#3303)
* skip MULACC opt if all src buffers of the mul op are const buffers

* add noqa directive for long test

* unskip MULACC opt

* ensure that a_axes at least includes summation axes in order to perform np.einsum correctly

* add regression test for mulacc op

* compute a_slices using a_axes

* refactor  helper of  function to retrieve axes and slices for nonzero strides as well as summation axes

* include a regression test that uses  and  to test the behaviour indirectly
2024-02-04 05:15:06 -05:00
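The zero-stride situation behind this fix can be sketched without numpy (illustrative, not the actual code): a constant buffer broadcast along an axis is a view with stride 0, so every read returns the same underlying element, and reductions over it must still include the summation axis.

```python
def strided_read(buf, shape, strides):
    # read a 1-D view: stride 0 repeats the same underlying element
    return [buf[i * strides[0]] for i in range(shape[0])]

const_view = strided_read([3.0], (4,), (0,))   # one const element seen 4 times
assert const_view == [3.0, 3.0, 3.0, 3.0]
# a dot product against it must still sum over the broadcast axis
assert sum(a * b for a, b in zip(const_view, [0.0, 1.0, 2.0, 3.0])) == 18.0
```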
chenyu 30a3288c4a
touchup canonicalize empty mask (#3308)
empty list -> None. also added env SEED for fuzz_shapetracker_math
2024-02-03 21:05:10 -05:00
Jyotirmaya Mahanta f5e0d9673c
canonicalize empty masks (#3292) 2024-02-03 20:27:57 -05:00