Commit Graph

41 Commits

wozeparrot 9eb6eef441
seed in tensor (#6869) 2024-10-06 14:46:58 -04:00
wozeparrot c100f3d406
default threefry (#6116) 2024-09-25 17:45:13 +08:00
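
Threefry (here Threefry-2x32) is a counter-based PRNG: the output bits are a pure function of a key (the seed) and a counter (the element index), which makes results reproducible and, as the PR notes below mention ("rand works in jit"), JIT-friendly. For intuition only, a minimal pure-Python reference of the Threefry-2x32 block using the published constants; tinygrad's tensor-level implementation is shaped differently:

```python
def rotl32(x: int, r: int) -> int:
  # 32-bit rotate left
  return ((x << r) | (x >> (32 - r))) & 0xFFFFFFFF

def threefry2x32(k0: int, k1: int, c0: int, c1: int) -> tuple[int, int]:
  # mixes a 64-bit counter (c0, c1) under a 64-bit key (k0, k1), 20 rounds
  M = 0xFFFFFFFF
  R = [13, 15, 26, 6, 17, 29, 16, 24]          # rotation schedule
  ks = [k0, k1, 0x1BD11BDA ^ k0 ^ k1]          # key schedule with parity word
  x0, x1 = (c0 + ks[0]) & M, (c1 + ks[1]) & M  # initial key injection
  for i in range(20):
    x0 = (x0 + x1) & M
    x1 = rotl32(x1, R[i % 8]) ^ x0
    if i % 4 == 3:                             # re-inject key every 4 rounds
      g = i // 4 + 1
      x0 = (x0 + ks[g % 3]) & M
      x1 = (x1 + ks[(g + 1) % 3] + g) & M
  return x0, x1

# same key and same counter always give the same bits
assert threefry2x32(1, 2, 0, 0) == threefry2x32(1, 2, 0, 0)
```
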
wozeparrot 2be0b26a1f
rand only supports single device (#6682) 2024-09-24 16:07:44 +08:00
wozeparrot 46e360fdc0
check bfloat16 range with threefry (#6660) 2024-09-23 10:48:44 +08:00
chenyu 2de174677a
threefry touchup [run_process_replay] (#6169)
also, why is test_gc testing that _rng_counter is allocated??
2024-08-18 23:01:24 -04:00
wozeparrot 0c5189de25
threefry half (#6154) 2024-08-18 15:23:12 -07:00
chenyu dc942bf1f6
jit sampling function in test_randomness.test_multinomial (#5034)
* jit sampling function in test_randomness.test_multinomial

`THREEFRY=1 python3 -m pytest test/test_randomness.py::TestRandomness::test_multinomial --durations 1` 7 sec -> 1.2 sec

* skip that
2024-06-18 14:21:05 -04:00
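
The speedup quoted above comes from wrapping the repeated sampling step in the JIT so its kernels are captured once and replayed. A minimal sketch of the pattern, assuming a recent tinygrad that exports TinyJit from the top-level package (not the test's actual code):

```python
from tinygrad import Tensor, TinyJit

@TinyJit
def sample(weights: Tensor) -> Tensor:
  # kernels are captured on the first calls, then replayed on every call after
  return weights.multinomial(num_samples=1, replacement=True).realize()

weights = Tensor([[0.1, 0.2, 0.7]])
for _ in range(10):
  print(sample(weights).item())
```
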
qazal 637f482588
configure derandomizing CI tests (#4793) 2024-05-31 17:06:58 +03:00
chenyu 53b9081aab
check arg types of Tensor.randint (#4751)
raise TypeError if low, high, dtype are not ints
2024-05-27 20:24:10 -04:00
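
A hedged sketch of the guard this commit describes, with assumed names rather than the literal diff:

```python
from tinygrad import Tensor, dtypes

def randint_checked(*shape, low=0, high=10, dtype=dtypes.int32):
  # raise TypeError when low/high/dtype are not integer-typed
  if not isinstance(low, int) or not isinstance(high, int):
    raise TypeError(f"low and high must be ints, got {type(low)} and {type(high)}")
  if not dtypes.is_int(dtype):
    raise TypeError(f"dtype must be an int dtype, got {dtype}")
  return Tensor.uniform(*shape, low=low, high=high, dtype=dtype)
```
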
nimlgen 9b02aef45a
remove rhip (#4579)
* remove rhip

* remove hip runner
2024-05-14 17:58:19 +03:00
nimlgen 2131556c2c
amd mockgpu (#4535)
* start mock amd gpu

* virt files

* cleaner

* init ci

* small fixes

* linter

* better?

* ugh

* linter

* fix

* disable some

* run shorter

* fixes

* add hcq test

* fix

* fix cmd revert
2024-05-14 14:28:04 +03:00
qazal 23445db2b9
no skipped tests in RHIP (#4337)
* delete skip

* delete split skip

* remu dev

* compiler fails here

* Revert "remu dev"

This reverts commit 28b933d4eb54c9a3fb4c39f584122f501c791d27.
2024-04-28 12:23:05 -04:00
chenyu b43e470f80
always use f32 for rand source of randn (#3998)
* always use f32 for source of randn

fixed bfloat16 randn to not have inf.
don't really care about float64. threefry is float32 based too

* HSA is broken
2024-03-29 17:04:34 -04:00
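
The fix in sketch form (not the actual tinygrad source): draw the uniform source in float32, run Box-Muller there, and cast to the requested dtype only at the end, so a bfloat16 output never sees a source value rounded up to 1.0:

```python
import math
from tinygrad import Tensor, dtypes

def randn_via_f32(*shape, dtype=dtypes.bfloat16) -> Tensor:
  src = Tensor.rand(2, *shape, dtype=dtypes.float32)  # uniform [0, 1) in f32
  # Box-Muller: (1 - u) lies in (0, 1], so the log stays finite
  out = ((1 - src[0]).log() * -2).sqrt() * (src[1] * 2 * math.pi).cos()
  return out.cast(dtype)
```
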
chenyu 6b6461122e
test case Tensor.randn should be finite (#3994)
* test case Tensor.randn should be finite

there's a hack to fix float16, need a generic solution that works with bf16 and threefry

* skip not supported

* bfloat16 local is wrong

* skip RHIP
2024-03-29 14:51:02 -04:00
wozeparrot a0ab755317
threefry again (#3785)
* feat: initial xor

* feat: initial threefry

* feat: remove custom random

* fix: really need to install precommit

* feat: lmao forgot that this is rotate not a shift

* clean: put that there

* feat: numpy xor

* feat: quick test for xor

* feat: llvm xor

* feat: slightly working xor in torch

* feat: rand works in jit

* clean: save a line

* feat: match jax

* feat: maybe test against jax

* feat: requires_grad

* fix: fix test_symbolic_ops

* feat: lower alpha

* feat: just pad

* fix: maybe fix training tests?

* fix: fix some llvm stuff

* feat: cursed realize on the way out

* feat: testing jax

* fix: why is the jax install process not simple

* fix: maybe passing test

* fix: symbolic workarounds

* clean: still need that precommit

* fix: aaaa

* fix: more test fixes

* fix: quick fix for wgsl

* feat: need to set requires_grad on the final tensor

* feat: one more tensor

* feat: don't take forever

* feat: seeing y ci is brok

* feat: can't allocate 64GiB lmao

* fix: fix this

* feat: hope this doesn't break smth before i go to bed

* feat: don't destroy ram

* feat: int

* feat: remove jax

* feat: properish workaround?

* feat: skip slow webgpu tests

* feat: no longer fails

* feat: use dtypes

* feat: real number

* fix: torch

* fix: don't test against reference for torch

* feat: to device

* feat: fix advanced indexing

* feat: correct casting

* feat: even rng_counter

* feat: match master

* feat: this was actually bad

* fix: maybe?

* feat: store

* feat: remove realizes

* feat: somehow this is important

* feat: somehow this is also important

* feat: save a line

* fix: don't need that anymore

* feat: restore this

* fix: linter

* feat: remove realizes

* fix: realized is in base now

* fix: add back cast

* fix: bump deadline

* fix: bump deadline

* fix: bump deadline

* fix: bump deadline

* fix: bump deadline

* fix: :(

* fix: :(

* fix: not being dumb

* feat: try changing less tests

* feat: shouldn't have to change that

* feat: contiguous bumps it by one

* fix: hmm

* fix: numpy memory moment

* fix: cl_khr_fp16

* fix: torch has different tensor count

* fix: missing contiguous

* hmm: hmm

* fix: some fixes

* fix: typing

* feat: dont do that

* feat: typing fixes

* feat: why is this realize required?

* feat: ngl kinda odd typing

* feat: oh

* feat: remove realizes

* feat: why is this realize required?

* fix: hacky patch for cudacpu

* fix: without this realize pytest crashes?????

* fix: shorter line

* fix: cudacpu fixes

* fix: cudacpu fixes

* feat: real buffer

* feat: don't search when searching lmao

* fix: can't use contiguous things

* fix: no more 100GB arrays

* fix: revert

* fix: skip 7 and 10

* feat: working ish beam

* feat: minimize changes

* feat: seed 0 stable diffusion example changed

* fix: different on ci

* fix: no beam

* feat: make threefry optional

* fix: check value

* fix: unused import

* feat: threefry default

* fix: 5d

* feat: allow non upcast div

* fix: 5d better

* fix: 5d better

* fix: save all dtype

* feat: proper error

* feat: lazyop key

* fix: check float

* feat: try removing this realize now

* feat: disable threefry for uops hip tensor cores

* feat: don't need that

* feat: only check upcast

* fix: disable threefry for some metal tests

* feat: disable for metal tensor uops as well

* feat: disable for most uops

* fix: disable threefry for new uops tests

* feat: multitensor

* fix: typing

* feat: threefry default off

* feat: skip threefry half rand

* feat: restore old

* fix: bad git

* clean: ruff

* feat: bfloat16 fix

* fix: :|

* feat: restore old

---------

Co-authored-by: chenyu <chenyu@fastmail.com>
2024-03-18 16:47:07 -04:00
George Hotz 311cf2b7d3
Revert "threefry_2x32 (#2601)" (#3784)
This reverts commit db3de54bc4.
2024-03-17 10:27:20 -07:00
wozeparrot db3de54bc4
threefry_2x32 (#2601)
Co-authored-by: George Hotz <72895+geohot@users.noreply.github.com>
2024-03-17 10:19:33 -07:00
chenyu 8ea53951c1
bfloat16 Tensor.rand (#3764)
* Tensor.rand for bfloat16

for numpy based random, generate one for float then cast for bfloat16.

close #3653

* remove realize
2024-03-15 15:05:13 -04:00
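
At the call site, the float-then-cast path described above amounts to (hedged sketch):

```python
from tinygrad import Tensor, dtypes

# sample in float32, then cast the result to bfloat16
t = Tensor.rand(4, 4, dtype=dtypes.float32).cast(dtypes.bfloat16)
```
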
Skosh e8c350fdac
fix: make Tensor.rand produce correct values for float16 (#3654)
* fix: make Tensor.rand produce correct values for float16

Due to precision loss when casting to float16, the data distribution created by custom_random isn't in the open interval ]0, 1[ as intended, but in the half-open interval ]0, 1], which causes Tensor.randn to incorrectly generate values of infinity.

The solution uses a scaling value to make sure the values stay under 1 when using half precision.

Closes #3611

* update implementation to truncate to closest f16 value to 1

* chore: fix whitespace

* test larger distribution

---------

Co-authored-by: chenyu <chenyu@fastmail.com>
2024-03-10 18:48:00 -04:00
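
Why ]0, 1] breaks randn, in miniature: a uniform value just below 1 rounds up to exactly 1.0 in half precision, and the Box-Muller log term then diverges. Illustrative numpy, not the tinygrad code:

```python
import numpy as np

u = np.float16(0.99999)        # nearest representable float16 is exactly 1.0
print(u == 1.0)                # True
print(np.sqrt(-2 * np.log(1.0 - np.float32(u))))  # log(0) -> -inf, sqrt -> inf
```
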
andresgit 28ba1c5406
fix Tensor.randint ignoring kwargs (#3350)
* fix Tensor.randint ignoring kwargs

* randint kwargs fix
2024-02-09 17:12:16 +01:00
chenyu ae112c9dbe
fix some long lines in tests (#3006)
* fix some long lines in tests

* better
2024-01-03 23:53:33 -05:00
George Hotz a280cfe169
move dtypes to dtype.py (#2964)
* move dtypes to dtype.py

* fix urllib
2024-01-01 14:58:48 -08:00
Will 016aebcd84
Fixed Tensor.randint() not accepting tuple shapes (#2923)
* ww/Fixed Tensor.randint() to accept shape tuples

* ww/Wrote a test to cover this typo

* ww/Updated Tensor random objects to optionally take (,) or *() to be more consistent

* ww/no lint no worries

* ww/Made peace with linter

* ww/Added new line; can't reduce line size without reducing readability

* ww/reverted to using .mul
2023-12-24 20:32:26 -05:00
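
The two spellings this commit makes equivalent, as a usage sketch:

```python
from tinygrad import Tensor

a = Tensor.randint(2, 3, low=0, high=10)    # splatted shape
b = Tensor.randint((2, 3), low=0, high=10)  # tuple shape, fixed here
assert a.shape == b.shape == (2, 3)
```
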
chenyu 9c32474a1f
Revert "Revert "Tensor.randint is Tensor.uniform with dtypes.int32 (#2801)" (#2802)" (#2814)
This reverts commit fa84998244.
2023-12-17 12:14:17 -05:00
chenyu fa84998244
Revert "Tensor.randint is Tensor.uniform with dtypes.int32 (#2801)" (#2802)
This reverts commit 86c2f267d4.
2023-12-16 15:53:28 -05:00
chenyu 86c2f267d4
Tensor.randint is Tensor.uniform with dtypes.int32 (#2801) 2023-12-16 15:14:50 -05:00
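
In spirit, the whole change (hedged, not the literal diff):

```python
from tinygrad import Tensor, dtypes

# Tensor.randint(*shape, low, high) becomes Tensor.uniform with dtypes.int32
t = Tensor.uniform(2, 3, low=0, high=10, dtype=dtypes.int32)
```
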
George Hotz 6d6eb9302d
ruff checks the max line length is 150 (#2734)
* ruff checks the max line length is 150

* fix tensor.py

* a lot more

* done
2023-12-12 17:34:47 -08:00
chenyu 1ac958a058
update pytest marks and CI test filters (#2587)
* remove pytest marks

* test more stuff

* fine revert some

* add that mark back

* skip that

* hmm LLVM does not work on ubuntu

* too slow on CUDA CI

* dup test
2023-12-03 15:20:44 -05:00
chenyu 03968622a2
Pretty multinomial (#2365)
* pretty multinomial

p, cdf_normalized -> weight, cdf
symmetric unsqueeze / squeeze
check num_sample > 0

TODO: how do we want to handle 0/0 in general?

* no 0-dim input

* single sum
2023-11-19 15:10:10 -05:00
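
For intuition about the weight/cdf naming above, a hedged sketch of CDF-based multinomial sampling (not the tinygrad implementation):

```python
from tinygrad import Tensor, dtypes

def multinomial_sketch(weight: Tensor, num_samples: int) -> Tensor:
  cdf = weight.cumsum(1)
  cdf = cdf / cdf[:, -1:]                 # normalize: last column becomes 1
  u = Tensor.rand(weight.shape[0], num_samples, 1)
  # sampled index = how many cdf entries sit at or below the uniform draw
  return (u >= cdf.unsqueeze(1)).cast(dtypes.int32).sum(-1)

idx = multinomial_sketch(Tensor([[1.0, 2.0, 7.0]]), 5)   # shape (1, 5)
```
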
Marcello Fuschi b8d460d203
Add Tensor.multinomial (#2295)
* add Tensor.multinomial only with replacement

* add support for 2D input in Tensor.multinomial

* fix multinomial output shape

* allow passing replacement=False to Tensor.multinomial when num_samples=1

* improve tests for Tensor.multinomial

* fix edge case in Tensor.multinomial

* Tensor.multinomial no more staticmethod
2023-11-15 11:38:39 -08:00
chenyu 9215bccb41
Tensor.uniform set default to standard uniform (#2158)
* Tensor.uniform set default to standard uniform

* clean up test to reuse function
2023-10-27 16:15:30 -04:00
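
What the new default means at the call site (sketch; the old default range is spelled out explicitly):

```python
from tinygrad import Tensor

a = Tensor.uniform(4)                  # standard uniform over [0, 1)
b = Tensor.uniform(4, low=-1, high=1)  # the previous default, now explicit
```
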
Jordan Wright 25be7f745d
Tensor.uniform with dtype=int bug fix (#1593) 2023-08-26 01:59:53 -04:00
Yixiang Gao 8d6662a741
.cpu().numpy() -> .numpy() (#1594)
* .cpu().numpy() -> .numpy()

* restore ops_torch

* restore test_speed_v_torch
2023-08-21 09:53:29 -07:00
JaSpa99 5ab12059da
rng hlops: add normal and kaiming_normal (#1378)
* add normal and kaiming_normal

* make sure its float

* add tests
2023-07-31 10:37:02 -07:00
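
A hedged sketch of kaiming_normal as conventionally defined, assuming a weight layout of (out_features, in_features, ...) and the leaky_relu gain:

```python
import math
from tinygrad import Tensor

def kaiming_normal_sketch(*shape, a: float = 0.01) -> Tensor:
  gain = math.sqrt(2.0 / (1 + a ** 2))   # gain for leaky_relu with slope a
  fan_in = math.prod(shape[1:])
  return Tensor.normal(*shape, mean=0.0, std=gain / math.sqrt(fan_in))
```
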
cheeetoo a0965ee198
CI < 5 minutes (#1252)
* models matrix

* fix typo and install gpu deps

* install llvm deps if needed

* fix

* testops with cuda

* remove pip cache since not work

* cuda env

* install cuda deps

* maybe it will work now

* i can't read

* all tests in matrix

* trim down more

* opencl stuff in matrix

* opencl pip cache

* test split

* change cuda test exclusion

* test

* fix cuda maybe

* add models

* add more n=auto

* third thing

* fix bug

* cache pip more

* change name

* update tests

* try again cause why not

* balance

* try again...

* try apt cache for cuda

* try on gpu:

* try cuda again

* update packages step

* replace libz-dev with zlib1g-dev

* only cache cuda

* why error

* fix gpuocelot bug

* apt cache err

* apt cache to slow?

* opt and image in single runner

* add a couple n=autos

* remove test matrix

* try cuda apt cache again

* libz-dev -> zlib1g-dev

* remove -s since not supported by xdist

* the cache takes too long and doesn't work

* combine webgpu and metal tests

* combine imagenet to c and cpu tests

* torch tests with linters

* torch back by itself

* small windows clang test with torch tests

* fix a goofy windows bug

* im dumb

* bro

* clang with linters

* fix pylint error

* linter not work on windows

* try with clang again

* clang and imagenet?

* install deps

* fix

* fix quote

* clang by itself (windows too slow)

* env vars for imagenet

* cache pip for metal and webgpu tests

* try torch with metal and webgpu

* doesn't work, too long

* remove -v

* try -n=logical

* don't use logical

* revert accidental thing

* remove some prints unless CI

* fix print unless CI

* ignore speed tests for slow tests

* clang windows in matrix (ubuntu being tested in imagenet->c test)

* try manual pip cache

* fix windows pip cache path

* all manual pip cache

* fix pip cache dir for macos

* print_ci function in helpers

* CI as variable, no print_ci

* missed one

* cuda tests with docker image

* remove setup-python action for cuda

* python->python3?

* remove -s -v

* try fix pip cache

* maybe fix

* try to fix pip cache

* is this the path?

* maybe cache pip

* try again

* create wheels dir

* ?

* cuda pip deps in dockerfile

* disable pip cache for clang

* image from ghcr instead of docker hub

* why is clang like this

* fast deps

* try use different caches

* remove the fast thing

* try with lighter image

* remove setup python for cuda

* small docker and cuda fast deps

* ignore a few more tests

* cool docker thing (maybe)

* oops

* quotes

* fix docker command

* fix bug

* ignore train efficientnet test

* remove dockerfile (docker stuff takes too long)

* remove docker stuff and normal cuda

* oops

* ignore the tests for cuda

* does this work

* ignore test_train on slow backends

* add space

* llvm ignore same tests as cuda

* nvm

* ignore lr scheduler tests

* get some stats

* fix ignore bug

* remove extra '

* remove and

* ignore test for llvm

* change ignored tests and duration on all backends

* fix

* and -> or

* ignore some more cuda tests

* finally?

* does this fix it

* remove durations=0

* add some more tests to llvm

* make last pytest more readable

* fix

* don't train efficientnet on cpu

* try w/out pip cache

* pip cache seems to be generally better

* pytest file markers

* try apt fast for cuda

* use quick install for apt-fast

* apt-fast not worth

* apt-get to apt

* fix typo

* suppress warnings

* register markers

* disable debug on fuzz tests

* change marker names

* apt update and apt install in one command

* update marker names in test.yml

* webgpu pytest marker
2023-07-23 13:00:56 -07:00
George Hotz 791530045d
Refactor LoadOps (#910)
* test

* work

* upd test

* loadops

* cleanups

* real ones

* remove LazyNumpyArray

* fix assign test

* remove range

* np.require

* llama uses arange kernels

* no caching consts

* fix enet

* torch load support

* tests cleanup

* fix shufflenet

* fix image

* fix torch_load test
2023-06-03 09:40:43 -07:00
George Hotz 4d28d55683 add nn layer tests 2023-06-01 21:34:24 -07:00
George Hotz 8a928ed2f3
nn init matches torch (#901) 2023-06-01 21:24:11 -07:00
Rabia Eda Yılmaz 3075988468
Added kaiming_uniform initialization for Conv2d and Linear layers (#756)
* added kaiming_uniform init for conv2d and linear layers

* fix: set getattr

* up

* fix: set getattr

* fix comments

* better does not mean it is good

* more nonlinearities

* added test

checks the distribution of default relu option

* prettier

* fix kernel size

* edit distribution of returned tensor

* complete tests and fix fan_mode

* added higher dim test

* prettier test

* fix silly blank

* just leaky_relu mode

* default fan in and leaky relu

* update params

* fix test

* shorter

* generalize Tensor.uniform and adjust kaiming init

- added low and high parameters to Tensor.uniform function, so it can have a specific range (default is 0 to 1)
- adjusted return line of kaiming_uniform

* range from -1 to 1

* delete comment

* adjusted test_uniform

* fixed

* delete comment
2023-05-29 15:09:55 -07:00
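
The uniform counterpart this commit describes, sketched on top of the generalized Tensor.uniform(low, high):

```python
import math
from tinygrad import Tensor

def kaiming_uniform_sketch(*shape, a: float = 0.01) -> Tensor:
  gain = math.sqrt(2.0 / (1 + a ** 2))
  bound = math.sqrt(3.0) * gain / math.sqrt(math.prod(shape[1:]))  # fan_in
  return Tensor.uniform(*shape, low=-bound, high=bound)
```
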
Jacky Lee b80cf9220c
Statistics test: check if distributions match torch (#769)
* Check if tensor values match torch

* Clean up randomness tests and remove dependency

* Remove kaiming uniform test
2023-05-07 21:43:23 -07:00
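
The spirit of these distribution checks, as a hedged sketch (assumes torch and scipy are installed; the tests later dropped the scipy dependency):

```python
import torch
from scipy import stats
from tinygrad import Tensor

x = Tensor.randn(10000).numpy()
y = torch.randn(10000).numpy()
# two-sample Kolmogorov-Smirnov: only fail on strong evidence of mismatch
assert stats.ks_2samp(x, y).pvalue > 0.01
```
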
Jacky Lee 5e41d5857c
Add tests for randomness (#621)
* Add tests for random creation functions

* It worked on my machine!

* Rename to helper_same_distribution

* Remove extra line

* Add tests for equal distribution

* Test without scipy

* Do a different test for randn
2023-03-01 15:39:20 -08:00