128 Commits

Author SHA1 Message Date
chenyu 04f2327ca3
fix abs of diff of uint (#4411) 2024-05-15 18:39:11 -04:00
nimlgen eb9689336e
nv mockgpu (#4600)
* mockgpu nv

* works

* comment that out

* fix merge

* setup gpuocelot

* install packages

* not run all of them

* passes

* fix ci

* almost

* should pass

* linter

* linter 2

* try this?

* ugh, not supported

* ci

* remove ticket from description

* better descs
2024-05-15 23:46:08 +03:00
chenyu 3c11ca452e
skip CLANG test casts between double and half for now (#4609)
started breaking after GitHub CI image update
2024-05-15 16:17:06 -04:00
chenyu 7eb035e7c5
stronger test case for half mean overflow (#4470) 2024-05-07 22:40:09 -04:00
chenyu ca7300c783
fix half mean and its backward (#4469)
* fix half mean and its backward

cast to sum_acc_type, sum, div, then cast back

* mean dtype tests
2024-05-07 21:46:41 -04:00
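
The "cast to sum_acc_type, sum, div, then cast back" order described in #4469 can be illustrated with a minimal standard-library sketch. This is not tinygrad code: `to_half`, `naive_half_mean`, and `widened_half_mean` are hypothetical helpers that emulate float16 rounding with `struct`, and a Python float stands in for the wider accumulator type.

```python
import struct

def to_half(x: float) -> float:
    """Round x to the nearest float16 value; finite values that are too large become inf."""
    try:
        return struct.unpack("<e", struct.pack("<e", x))[0]
    except OverflowError:
        # struct refuses to pack finite values beyond the float16 range
        return float("inf") if x > 0 else float("-inf")

def naive_half_mean(values):
    """Accumulate in float16: the running sum can overflow before the division."""
    acc = 0.0
    for v in values:
        acc = to_half(acc + to_half(v))
    return to_half(acc / len(values))

def widened_half_mean(values):
    """Cast up, sum, divide, then cast back to float16 once at the end."""
    acc = 0.0  # a Python float (float64) plays the role of the accumulator dtype
    for v in values:
        acc += to_half(v)
    return to_half(acc / len(values))

data = [60000.0, 60000.0]  # each value fits in float16, their sum does not
print(naive_half_mean(data))    # inf
print(widened_half_mean(data))  # 60000.0
```

The mean itself is representable in float16; only the intermediate sum is not, which is why the cast back has to happen after the division.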
qazal 35dfbc6354
rand_for_dtype helper (#4459) 2024-05-07 00:03:42 +03:00
chenyu 826cccd54d
fix mean underflow for half tensor (#4377)
* fix mean underflow for half tensor

divide only the reduce factor. added unit test and non-nan assertion in resnet training. also added a failed test case for symbolic shape var

* skip for python backend
2024-05-01 13:38:57 -04:00
chenyu 077ea6926c
remove downcast_half in sum (#4376)
breaks boolean mean and other stuff
2024-05-01 11:46:44 -04:00
chenyu 93abcd3113
fix function.py sum backward without downcast_half (#4353)
without downcast_half, sum output dtype can be different from input dtype. cast back to input dtype in function.py
2024-04-29 17:53:02 -04:00
chenyu c1d8d425eb
fix mean of half tensor if sum is greater than half.max (#4327)
sum of half does acc in float32 already, add an arg to not downcast to half and use that in mean
2024-04-28 18:04:54 -04:00
qazal 23445db2b9
no skipped tests in RHIP (#4337)
* delete skip

* delete split skip

* remu dev

* compiler fails here

* Revert "remu dev"

This reverts commit 28b933d4eb54c9a3fb4c39f584122f501c791d27.
2024-04-28 12:23:05 -04:00
chenyu 63eb0a68af
fix return dtype of gather (#4159) 2024-04-12 16:25:12 -04:00
chenyu d9c5a2b1bb
fix return dtype of getitem Tensor indexing (#4158)
the use of sum can auto-upcast the result. fixed by using the data dtype as the acc_dtype
2024-04-12 15:55:02 -04:00
chenyu 380f27d629
move sum acc_dtype into lazy so it applies to backward (#4149)
* move sum acc_dtype into lazy so it applies to backward

* unit test
2024-04-11 14:43:56 -04:00
chenyu 7bc560ec49
remove outdated bf16 comments in test_dtype (#3987) 2024-03-29 00:56:18 -04:00
uuuvn 8a40d7d423
Shape changing bitcast and assert bitcast in disk (#3973)
* Shape changing bitcast

* only support it on disk

* basic test

* more tests

* RuntimeError instead of assert

* create unique temp files

* move tests that use disk to test_disk_tensor

* linter

* remove assert on error messages

* that's RuntimeError now

---------

Co-authored-by: George Hotz <72895+geohot@users.noreply.github.com>
2024-03-28 21:49:10 -07:00
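
As a rough illustration of what a shape-changing bitcast means (#3973), the sketch below reinterprets the same bytes under a different element size, so the element count changes. It uses only `struct`; the `bitcast` helper and its format-code arguments are hypothetical and are not tinygrad's API. Raising `RuntimeError` on a size mismatch is an assumption loosely modelled on the "RuntimeError instead of assert" item in the commit body.

```python
import struct

def bitcast(values, src_fmt: str, dst_fmt: str):
    """Reinterpret the raw little-endian bytes of `values` as another element type."""
    raw = struct.pack(f"<{len(values)}{src_fmt}", *values)
    dst_size = struct.calcsize(f"<{dst_fmt}")
    if len(raw) % dst_size != 0:
        # the byte count must divide evenly into the new element size
        raise RuntimeError(f"cannot bitcast {len(raw)} bytes to items of {dst_size} bytes")
    return list(struct.unpack(f"<{len(raw) // dst_size}{dst_fmt}", raw))

# two float32 values (8 bytes) become eight uint8 values
print(bitcast([1.0, 2.0], "f", "B"))  # [0, 0, 128, 63, 0, 0, 0, 64]
# one float32 value (4 bytes) becomes two uint16 values
print(bitcast([1.0], "f", "H"))       # [0, 16256]
```

No value is converted here: the bits stay the same and only their interpretation (and therefore the length) changes.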
chenyu 793ab0512e
use ctypes to truncate float64 and float32 in uops (#3986)
this fixed the softmax.argmax bug for ops_python as the float is truncated to float32
2024-03-28 23:56:50 -04:00
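
The idea behind #3986 can be sketched with the standard `ctypes` module: a Python float is a 64-bit double, and round-tripping it through `ctypes.c_float` rounds it to float32 precision. The helper names below are hypothetical and do not come from the tinygrad source.

```python
import ctypes

def truncate_fp32(x: float) -> float:
    """Round a Python float (float64) to float32 precision."""
    return ctypes.c_float(x).value

def truncate_fp64(x: float) -> float:
    """Round-trip through a C double; a Python float is already float64."""
    return ctypes.c_double(x).value

print(truncate_fp32(0.5))         # 0.5, exactly representable in float32
print(truncate_fp32(0.1) == 0.1)  # False, float32 keeps fewer mantissa bits
print(truncate_fp32(16777217.0))  # 16777216.0, 2**24 + 1 does not fit in float32
print(truncate_fp64(0.1) == 0.1)  # True
```

Without this kind of truncation, a pure-Python backend would carry extra precision that a real float32 device does not have, and results such as an argmax over nearly equal values could differ.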
chenyu 4ecd5789ab
#include <tgmath.h> in ops_clang (#3927)
* different clang sqrt/log2/exp2/sin function based on dtype

fixed softmax_argmax issue in #3552 for clang.

* tgmath.h

* revert those
2024-03-25 17:48:57 -04:00
chenyu 83f39a8ceb
env var to change default float (#3902)
* env var to change default float to fp16 or bf16

looking for standard names for these. we have FLOAT16 that does something to IMAGE and HALF to convert weights.

working on default bf16 too.
```
RuntimeError: compile failed: <null>(6): error: identifier "__bf16" is undefined
    __bf16 cast0 = (nv_bfloat16)(val0);
```

remove that in cifar

* DEFAULT_FLOAT

* default of default

* unit test

* don't check default

* tests work on linux
2024-03-24 20:33:57 -04:00
chenyu 2c69888654
include negative float in test_dtype (#3884)
* include negative float in test_dtype

* that is ub

* too annoying

* pack can overflow
2024-03-24 02:39:15 -04:00
chenyu 2d3ce53348
touchup test_dtype.test_gradient_dtype (#3887)
add back bad merge from #3613 and add float.double and float.bfloat16 to test
2024-03-22 20:56:45 -04:00
David Hou fc11808a79
initialize Tensor grad same type as self (#3613)
* initialize Tensor grad same type as self

* also test different default float

* check dtype + try/finally

* don't test_gradient_dtype if f16 is not supported

* fix bad merge

---------

Co-authored-by: chenyu <chenyu@fastmail.com>
2024-03-22 20:33:18 -04:00
chenyu c5467e5bd6
diverse test value in test_dtype DATA based on dtype (#3864)
* diverse test value in test_dtype DATA based on dtype

* eh fix typo

* that too?

* PTX does not support i8 and s8

* skip that

* unused line

* put the hack back

* remove that
2024-03-22 14:22:06 -04:00
chenyu d17900bc45
use int32 instead of default_int in simplify_phi_loops (#3828)
* use int32 instead of default_int in simplify_phi_loops

indices are in int32 now and are separated from buffer dtype. fix #3823

* return early if not supported

* it's not that

* why is it failing for RHIP
2024-03-19 17:49:58 -04:00
chenyu 99cbc24390
use dtypes.int32 as return dtype for functions that return indices (#3827)
behavior matches jax. It's fine to have a tensor greater than max int8 size even if we set default int to int8
2024-03-19 17:06:57 -04:00
chenyu fa1921ec7d
move test_dtype tests to test dtype and output value (#3826) 2024-03-19 16:31:27 -04:00
chenyu 639bd5dbfc
move bf16 cast hack to Tensor.llvm_bf16_cast (#3788) 2024-03-17 18:51:22 -04:00
chenyu a2d3cf64a5
move is_dtype_supported to test.helpers (#3762)
* move is_dtype_supported to test.helpers

updated all places that check if float16 is supported

* fix tests
2024-03-15 14:33:26 -04:00
chenyu d3a6319630
bf16 tests in test_dtype.py (#3749)
With bf16 creation and bf16 to numpy, we can test bf16 in test_dtype.
Only supported on HIP for now, as it needs bf16 buffer support. Also, the rtol is slightly larger
2024-03-15 00:17:11 -04:00
chenyu 75d4344cda
UOps.BITCAST (#3747)
* UOps.BITCAST

implicitly fixed no const folding for bitcast

* python backend

* ptx

* consistent llvm
2024-03-14 21:00:35 -04:00
chenyu 11c61ae044
Revert "fix const bitcast should not be constant folded (#3743)" (#3744)
This reverts commit 38ba277ac8.
2024-03-14 19:24:05 -04:00
chenyu 38ba277ac8
fix const bitcast should not be constant folded (#3743)
* fix const bitcast should not be constant folded

* fixed const bf16 creation

* LLVM still broken
2024-03-14 19:13:52 -04:00
chenyu 4d6ec41adb
failed test cases for bf16 Tensor.full (#3729)
fixable with float const then cast to bf16. cast folding with bitcast is incorrectly skipped
2024-03-13 20:46:45 -04:00
chenyu 6793db169b
bfloat16 tensor creation from list and numpy (#3724) 2024-03-13 18:44:05 -04:00
George Hotz 69ca7f7bf9
changes for teenygrad (#3665)
* changes for teenygrad

* upd

* simpler test
2024-03-09 15:30:34 -08:00
Zaffer 1853ec9a02
add tests for bfloat16 on HIP (#3638)
* Fix bug in login functionality

* Remove HSA backend test and add bfloat16 dtype tests that run in CI

* Skip tests on HIPCPU

* skip tests causing segfault on LLVM backend

* Exclude bfloat16 tests causing segfaults in LLVM backend

* move bf16 cast tests to only test on HIP
2024-03-07 10:45:36 -08:00
qazal abc5f3a6a0
hip bf16 hotfix (#3630)
* hip bf16

* remu dev mac

* Revert "remu dev mac"

This reverts commit 465069a0dc3c7f2045f3348b312a1dcbf1587acd.

* skip disk tests in CI

* bring float8 back
2024-03-06 11:42:30 -08:00
chenyu bc2a13a5f7
test case to show clang and python doing math in double (#3628) 2024-03-06 13:49:03 -05:00
chenyu 3275260c98
Revert "test: add failing bfloat16 test case for metal backend (#3481)" (#3618)
This reverts commit 1e12a2ae80.
2024-03-05 09:08:42 -05:00
Skosh 1e12a2ae80
test: add failing bfloat16 test case for metal backend (#3481)
* test: add failing bfloat16 test case for metal backend

* test: move bfloat 16 test to dtypes test
2024-03-05 08:44:54 -05:00
qazal a29cd6d464
run f64 increased precision tests on remu (#3509)
* run the test in CI

* temp: use the pre-release

* Revert "temp: use the pre-release"

This reverts commit 28e8571421aa66e54594c3eb3efce43130557dc8.
2024-02-26 18:01:07 -05:00
chenyu b154089884
float64 function support for HIP (#3492)
* float64 function support for HIP

* not CI
2024-02-24 09:46:20 -05:00
chenyu 35aff8b0c2
properly exclude PYTHON backend and support of half (#3491)
should be able to run in CI with python 3.12
2024-02-24 09:22:06 -05:00
Patrick Tsai 9dd64b1f5f
Fix python cast uint/int overflow (#3448)
* Fix numpy uint/int overflow

* lol

* Works

* Update

* Move overflow test to float64/float32

* One line

* Update

* One more

---------

Co-authored-by: Patrick Tsai <patosai@users.noreply.github.com>
2024-02-20 09:20:43 +01:00
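
The commit above (#3448) does not spell out the exact overflow rule it implements. As a point of reference, one common convention for casting an out-of-range integer to a fixed-width integer type is two's-complement wraparound. The `wrap_int` helper below is a hypothetical sketch of that convention, not the code from the PR.

```python
def wrap_int(value: int, bits: int, signed: bool) -> int:
    """Wrap an arbitrary Python int into a fixed-width integer range."""
    wrapped = value & ((1 << bits) - 1)  # keep only the low `bits` bits
    if signed and wrapped >= 1 << (bits - 1):
        wrapped -= 1 << bits             # reinterpret the top bit as the sign
    return wrapped

print(wrap_int(256, 8, signed=False))  # 0
print(wrap_int(-1, 8, signed=False))   # 255
print(wrap_int(128, 8, signed=True))   # -128
print(wrap_int(255, 8, signed=True))   # -1
```

Python integers are unbounded, so a Python-based backend has to apply a rule like this explicitly to match fixed-width hardware types.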
zku 2d702ca073
If feasible, do not truncate float64 down to float32 in cstyle renderer (#3420)
* do not truncate float64 precision

* use l suffix to try to avoid overload confusion

* long line, ruff bloats the function otherwise

* fmt

* remove long double suffix (l), it's sufficient to have the float32 (f) suffix to avoid function overload ambiguity; add test showcasing rtol=1e-12 precision increase, the test fails without the renderer changes

* use more reasonable test values, same as test_int_to_float_unary_func

* disable test for CUDACPU, does not support half and segfaults on some operations per dtypes_alu test

* disable test for HIP, renderer does not support f64 precision

* do not use noqa E501, break up condition
2024-02-16 10:08:59 +01:00
xarkes 28a8b72024
Remove Interpreted device & remaining CPU/TORCH ref (#3423)
* Remove Interpreted device & remaining CPU/TORCH ref

* Oops

* supports_device was useful

* Fix doc wording

---------

Co-authored-by: chenyu <chenyu@fastmail.com>
2024-02-16 00:30:21 -05:00
geohotstan 5eb4c902f6
correct division dtype casting (#3405)
* Happy New Year

* fix: exclude floordiv onnx tests

* fix: less weird if statements in div

* Good luck in the Year of the Dragon

* fix: tempfix onnx div

* fix: use reference impl for div
2024-02-15 19:34:40 -05:00
qazal 27f4de2ce4
delete half_prekernel (#3388)
* generic rendering of half and bf16

hotfix

* fix uops + regression test

* fix the test for metal's half4

* uop.uop fixup

* mypy with --strict-equality, fix ops_gpu
2024-02-14 15:40:48 +01:00
chenyu 7c1c6efee5
exclude half with PYTHON in test_dtype.is_dtype_supported (#3351)
half memoryview only in 3.12+. rest of the test_dtype (bounty) seems to be legit issue in ops_python.
2024-02-08 20:10:25 -05:00
chenyu 02636ff62d
re-enable test_reduce_0d_default int test case in test_dtype (#3336) 2024-02-07 05:30:14 -05:00