Commit Graph

155 Commits

Author SHA1 Message Date
George Hotz 85a45164fb
remove pyint [pr] (#7016)
* remove pyint

* bump time on tp [pr]

* dont truncate in const fold

* remove dead code

* Revert "dont truncate in const fold"

This reverts commit 29c81db0f7880848b001c2728aa555a1ef17e7d3.

* remove define_var
2024-10-12 22:36:24 +08:00
chenyu 75d9dcf000
support dtype in softmax and log_softmax (#6914)
matches torch. for mixed precision training, we would want to use float for softmax
2024-10-06 07:18:15 -04:00
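A hedged usage sketch of the new argument, assuming it mirrors torch's semantics (the input is cast to `dtype` before the op):
```
from tinygrad import Tensor, dtypes

x = Tensor.randn(4, 8, dtype=dtypes.half)
# run the softmax math in float32 for mixed-precision stability
probs = x.softmax(dtype=dtypes.float32)
```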
wozeparrot 2b899164c6
no numpy (#6751) 2024-09-26 16:40:18 +08:00
George Hotz cb22ef379a
truncate consts early (#6741)
* truncate consts early

* ptx still fails

* Update dtype.py
2024-09-25 16:49:51 +08:00
George Hotz 1b4d1823b7
add pyint to DTYPES_DICT [run_process_replay] (#6477)
* add pyint to DTYPES_DICT [run_process_replay]

* also fix uop alu bug

* exclude pyint there too

* ne ne

* force explicit dtype
2024-09-11 17:31:59 +08:00
chenyu 002303c145
fix output of truncate_fp16 (#6381)
make sure the non-inf path returns the truncated value
2024-09-05 22:55:43 -04:00
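A minimal sketch of the intended semantics, using python's `struct` half-precision format (names and mechanism are illustrative, not the committed code):
```
import struct

def truncate_fp16(x: float) -> float:
  # in-range values round-trip through IEEE half (the "non-inf path" fixed here);
  # out-of-range values become +/-inf
  try: return struct.unpack("e", struct.pack("e", x))[0]
  except OverflowError: return float("inf") if x > 0 else float("-inf")

truncate_fp16(1/3)  # 0.333251953125, the nearest half value
truncate_fp16(1e5)  # inf
```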
chenyu 590c0922b6
Tensor.prod (#6250)
* Tensor.prod

a new reduce op!

* onnx ReduceProd
2024-08-23 10:06:32 -04:00
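A hedged usage example, assuming `prod` takes the usual reduce-op arguments like `sum`:
```
from tinygrad import Tensor

t = Tensor([[1.0, 2.0], [3.0, 4.0]])
print(t.prod().item())         # 24.0
print(t.prod(axis=1).numpy())  # [ 2. 12.]
```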
wozeparrot 0c5189de25
threefry half (#6154) 2024-08-18 15:23:12 -07:00
samm393 2dc586ffe5
Shape change bitcast for more dtypes (#6047)
* bitcast & tests

* use to_dtype

* put disk tensor tests back

* tests

* bitmask

* no bitmask

---------

Co-authored-by: George Hotz <72895+geohot@users.noreply.github.com>
2024-08-14 10:03:34 -07:00
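A hedged example of a shape-changing bitcast (exact dtype/backend support follows this PR):
```
from tinygrad import Tensor, dtypes

a = Tensor([1, 2], dtype=dtypes.uint32)
b = a.bitcast(dtypes.uint16)  # itemsize halves, so shape doubles: (2,) -> (4,)
print(b.numpy())              # [1 0 2 0] on a little-endian machine
```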
chenyu 4a65010de8
remove CUDACPU flag in tests [run_process_replay] (#5902)
no longer used
2024-08-04 16:06:38 -04:00
chenyu c67e9887f7
support using str to specify dtype (#5897)
* support using str to specify dtype

in Tensor creation, in args to `cast` and `bitcast`, and in acc_dtype

* more tests
2024-08-04 12:56:28 -04:00
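A hedged usage sketch, assuming the strings are the usual dtype names:
```
from tinygrad import Tensor, dtypes

t = Tensor([1, 2, 3], dtype="float32")  # same as dtype=dtypes.float32
u = t.cast("half")                      # str works in cast/bitcast too
assert u.dtype == dtypes.half
```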
samm393 2c94316bd2
ull literal support and test (#5789)
* ull literal support and test

* missing .numpy()
2024-07-29 11:50:49 -04:00
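A hedged guess at the kind of case this covers: a uint64 constant too large for a signed 64-bit literal, which needs a ULL suffix in rendered C code:
```
from tinygrad import Tensor, dtypes

t = Tensor([2**63], dtype=dtypes.uint64)
print(t.numpy())  # [9223372036854775808]
```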
chenyu 600a39771d
fix Tensor.arange if (stop-start) and step have different signs (#5775) 2024-07-28 14:34:10 -04:00
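A hedged example of the fixed behavior, matching numpy/torch:
```
from tinygrad import Tensor

# (stop - start) and step disagree in sign -> empty tensor
print(Tensor.arange(5, 0, 1).numpy())   # []
print(Tensor.arange(0, 5, -1).numpy())  # []
```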
kormann 2c4add6844
pretty print lazy op per default (#5505)
* pretty lop

* min diff

* walrus

* fix

* min diff

* simplify

* pretty helper function

* ws

* pretty uop upat

* tests

* stricter tests

* test passes

* ws

* stronger upat test

* delete print_tree

* min diff

* stricter exp test

* fix merge

* stronger uops eval test

* +readable and deep upat test

* +readable and deep upat test

* sort inv fix

* fix

* revert allowed_len
2024-07-18 09:34:08 -07:00
chenyu f8a47608cc
test dtype.min and dtype.max (#5479)
compared with np.iinfo for integer dtype
2024-07-14 15:31:37 -04:00
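A hedged sketch of the comparison being tested, assuming the `dtypes.min`/`dtypes.max` helpers:
```
import numpy as np
from tinygrad import dtypes

assert dtypes.min(dtypes.int8) == np.iinfo(np.int8).min      # -128
assert dtypes.max(dtypes.uint16) == np.iinfo(np.uint16).max  # 65535
```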
chenyu ca021229e4
fix attention to always return in the same dtype as input (#5100)
the intermediate cast to default_float does not work as intended when the default is float32 and qkv are in half
2024-06-22 10:34:57 -04:00
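A hedged sketch of the fixed invariant:
```
from tinygrad import Tensor, dtypes

q, k, v = [Tensor.randn(1, 4, 8, dtype=dtypes.half) for _ in range(3)]
out = q.scaled_dot_product_attention(k, v)
assert out.dtype == dtypes.half  # output dtype matches the input dtype
```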
chenyu cc2be9064f
fix out of bound python list into numpy array (#5043)
numpy 2.0 does not allow oob python const and recommends writing as `np.array(value).astype(dtype)`
2024-06-18 18:05:21 -04:00
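The numpy 2.0 behavior in question, for reference:
```
import numpy as np

# np.array([255], dtype=np.int8) -> OverflowError under numpy 2.0
print(np.array([255]).astype(np.int8))  # [-1]: the recommended spelling wraps instead
```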
chenyu acaf9a490d
RECIP(-0.0) should be -inf (#5024)
* RECIP(-0.0) should be -inf

added test_dtype_alu for PYTHON backend

* catch that

* fix those two
2024-06-17 22:26:58 -04:00
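A hedged check of the fixed IEEE 754 behavior (the sign of zero must propagate):
```
from tinygrad import Tensor

print((1.0 / Tensor([-0.0])).numpy())  # [-inf], not inf
```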
chenyu 03b367c014
handle float16 overflow in PYTHON (#5022)
* handle float16 overflow in PYTHON

use `truncate` when constructing a tensor from a list to make sure all values are packable (might be slow, but should be correct). add truncate_fp16 to cast overflowed values to inf/-inf.

* all valid fmt supports truncate
2024-06-17 21:12:52 -04:00
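A hedged example of the construction path described above:
```
from tinygrad import Tensor, dtypes

# 70000 is not representable in float16 (max ~65504); with truncate applied
# at construction it packs as inf instead of raising
print(Tensor([70000.0], dtype=dtypes.half).numpy())  # [inf]
```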
chenyu 4296507021
Tensor.sum returns in acc_dtype if specified (#5012)
* Tensor.sum returns in acc_dtype if specified

* skip PYTHON for now

* revert that

* relax that
2024-06-17 16:35:52 -04:00
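A hedged usage sketch:
```
from tinygrad import Tensor, dtypes

x = Tensor.ones(256, dtype=dtypes.half)
s = x.sum(acc_dtype=dtypes.float32)
assert s.dtype == dtypes.float32  # result stays in acc_dtype, no downcast to half
```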
chenyu 2b07847f2b
matmul returns in acc_dtype if specified (#4994)
more flexible to not automatically downcast; this can fix bert mixed precision training
2024-06-16 12:56:15 -04:00
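A hedged sketch of the same idea for matmul, as used in mixed precision training:
```
from tinygrad import Tensor, dtypes

a = Tensor.randn(4, 8, dtype=dtypes.half)
b = Tensor.randn(8, 4, dtype=dtypes.half)
c = a.matmul(b, acc_dtype=dtypes.float32)
assert c.dtype == dtypes.float32  # accumulated and returned in float32
```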
chenyu 67e8df4969
remove numpy from dtype (#4969)
replaced all dtype.np with _to_np_dtype defined in tensor.py.

after this, the only numpy usages are (1) Tensor(np.ndarray), (2) construct .numpy() output, (3) numpy random buffer
2024-06-14 15:38:45 -04:00
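A minimal sketch of what such a helper can look like (illustrative, not necessarily the committed code):
```
import numpy as np

def _to_np_dtype(dtype):
  # map a tinygrad dtype to numpy via its struct format char
  return np.dtype(dtype.fmt).type if dtype.fmt is not None else None
```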
chenyu 287d3c3b84
support list, tuple input in dtypes.from_py (#4945)
* support list, tuple input in dtypes.from_py

and used it to infer dtype from python list and tuple in Tensor constructor.

* fix tests
2024-06-13 13:38:06 -04:00
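A hedged sketch of the inference, assuming the widest element type wins:
```
from tinygrad import dtypes

dtypes.from_py(True)      # dtypes.bool
dtypes.from_py([1, 2])    # default int
dtypes.from_py([1, 2.0])  # default float: float dominates int
```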
qazal 637f482588
configure derandomizing CI tests (#4793) 2024-05-31 17:06:58 +03:00
Szymon Ożóg de5c69c4c9
Unify test_dtype naming conventions (#4730) 2024-05-25 10:12:40 -04:00
chenyu 47aba47f64
update Torch.gather api (#4692)
* update Torch.gather api

gather(self, dim, index) to match torch

* fix that
2024-05-22 21:54:06 -04:00
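A hedged example of the torch-style API:
```
from tinygrad import Tensor

t = Tensor([[1, 2], [3, 4]])
idx = Tensor([[0, 0], [1, 0]])
# out[i][j] = t[i][idx[i][j]] for dim=1
print(t.gather(1, idx).numpy())  # [[1 1], [4 3]]
```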
chenyu 286b4dbdf2
compile raise CompileError and skip only RuntimeError in multiprocess beam (#4646)
* compile raise CompileError and skip only RuntimeError in multiprocess beam

renderer error with multiprocess should not be skipped by beam

* use `==` for dtype to dtype comparison

* that needs to be is

* typo
2024-05-19 00:25:25 -04:00
chenyu 04f2327ca3
fix abs of diff of uint (#4411) 2024-05-15 18:39:11 -04:00
nimlgen eb9689336e
nv mockgpu (#4600)
* mockgpu nv

* works

* comment that out

* fix merge

* setup gpuocelot

* install packages

* not run all of them

* passes

* fix ci

* almost

* should pass

* linter

* linter 2

* try this?

* ugh, not supported

* ci

* remove ticket from description

* better descs
2024-05-15 23:46:08 +03:00
chenyu 3c11ca452e
skip CLANG test casts between double and half for now (#4609)
started breaking after github CI image update
2024-05-15 16:17:06 -04:00
chenyu 7eb035e7c5
stronger test case for half mean overflow (#4470) 2024-05-07 22:40:09 -04:00
chenyu ca7300c783
fix half mean and its backward (#4469)
* fix half mean and its backward

cast to sum_acc_type, sum, div, then cast back

* mean dtype tests
2024-05-07 21:46:41 -04:00
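A hedged sketch of the approach described above:
```
from tinygrad import Tensor, dtypes

def mean_via_acc(x: Tensor) -> Tensor:
  acc = x.cast(dtypes.float32)                    # cast to the sum acc type
  return (acc.sum() / acc.numel()).cast(x.dtype)  # sum, div, then cast back
```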
qazal 35dfbc6354
rand_for_dtype helper (#4459) 2024-05-07 00:03:42 +03:00
chenyu 826cccd54d
fix mean underflow for half tensor (#4377)
* fix mean underflow for half tensor

divide by only the reduce factor. added unit test and non-nan assertion in resnet training. also added a failing test case for symbolic shape var

* skip for python backend
2024-05-01 13:38:57 -04:00
chenyu 077ea6926c
remove downcast_half in sum (#4376)
breaks boolean mean and other stuff
2024-05-01 11:46:44 -04:00
chenyu 93abcd3113
fix function.py sum backward without downcast_half (#4353)
without downcast_half, sum output dtype can be different from input dtype. cast back to input dtype in function.py
2024-04-29 17:53:02 -04:00
chenyu c1d8d425eb
fix mean of half tensor if sum is greater than half.max (#4327)
sum of half already accumulates in float32; add an arg to not downcast to half and use that in mean
2024-04-28 18:04:54 -04:00
qazal 23445db2b9
no skipped tests in RHIP (#4337)
* delete skip

* delete split skip

* remu dev

* compiler fails here

* Revert "remu dev"

This reverts commit 28b933d4eb54c9a3fb4c39f584122f501c791d27.
2024-04-28 12:23:05 -04:00
chenyu 63eb0a68af
fix return dtype of gather (#4159) 2024-04-12 16:25:12 -04:00
chenyu d9c5a2b1bb
fix return dtype of getitem Tensor indexing (#4158)
the use of sum can auto-upcast the result. fixed by using the data dtype as the acc_dtype
2024-04-12 15:55:02 -04:00
chenyu 380f27d629
move sum acc_dtype into lazy so it applies to backward (#4149)
* move sum acc_dtype into lazy so it applies to backward

* unit test
2024-04-11 14:43:56 -04:00
chenyu 7bc560ec49
remove outdated bf16 comments in test_dtype (#3987) 2024-03-29 00:56:18 -04:00
uuuvn 8a40d7d423
Shape changing bitcast and assert bitcast in disk (#3973)
* Shape changing bitcast

* only support it on disk

* basic test

* more tests

* RuntimeError instead of assert

* create unique temp files

* move tests that use disk to test_disk_tensor

* linter

* remove assert on error messages

* that's RuntimeError now

---------

Co-authored-by: George Hotz <72895+geohot@users.noreply.github.com>
2024-03-28 21:49:10 -07:00
chenyu 793ab0512e
use ctypes to truncate float64 and float32 in uops (#3986)
this fixed the softmax.argmax bug for ops_python as the float is truncated to float32
2024-03-28 23:56:50 -04:00
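The ctypes round-trip named in the title, sketched (function name is illustrative):
```
import ctypes

def truncate_f32(x: float) -> float:
  # collapse python's float64 math to float32 by round-tripping through a C float
  return ctypes.c_float(x).value

truncate_f32(1/3)  # 0.3333333432674408
```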
chenyu 4ecd5789ab
#include <tgmath.h> in ops_clang (#3927)
* different clang sqrt/log2/exp2/sin function based on dtype

fixed softmax_argmax issue in #3552 for clang.

* tgmath.h

* revert those
2024-03-25 17:48:57 -04:00
chenyu 83f39a8ceb
env var to change default float (#3902)
* env var to change default float to fp16 or bf16

looking for standard names for these. we have FLOAT16 that does something to IMAGE and HALF to convert weights.

working on default bf16 too.
```
RuntimeError: compile failed: <null>(6): error: identifier "__bf16" is undefined
    __bf16 cast0 = (nv_bfloat16)(val0);
```

remove that in cifar

* DEFAULT_FLOAT

* default of default

* unit test

* don't check default

* tests work on linux
2024-03-24 20:33:57 -04:00
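A hedged usage sketch, assuming the env var takes dtype names like HALF or BFLOAT16:
```
# shell: DEFAULT_FLOAT=HALF python train.py
from tinygrad import Tensor, dtypes

print(Tensor([1.0]).dtype)  # dtypes.half when DEFAULT_FLOAT=HALF is set
```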
chenyu 2c69888654
include negative float in test_dtype (#3884)
* include negative float in test_dtype

* that is ub

* too annoying

* pack can overflow
2024-03-24 02:39:15 -04:00
chenyu 2d3ce53348
touchup test_dtype.test_gradient_dtype (#3887)
add back bad merge from #3613 and add float.double and float.bfloat16 to test
2024-03-22 20:56:45 -04:00
David Hou fc11808a79
initialize Tensor grad same type as self (#3613)
* initialize Tensor grad same type as self

* also test different default float

* check dtype + try/finally

* don't test_gradient_dtype if f16 is not supported

* fix bad merge

---------

Co-authored-by: chenyu <chenyu@fastmail.com>
2024-03-22 20:33:18 -04:00
chenyu c5467e5bd6
diverse test value in test_dtype DATA based on dtype (#3864)
* diverse test value in test_dtype DATA based on dtype

* eh fix typo

* that too?

* PTX does not support i8 and s8

* skip that

* unused line

* put the hack back

* remove that
2024-03-22 14:22:06 -04:00