* torch and numpy don't share ops anymore
* that should be filtered out elsewhere
* still const
* graph + enet example cleanup
* hmm, we do still need it because of symbolic
* add name support
* use fetch in gpt2
* remove requests from main lib, networkx also optional
* umm, keep that assert
* updates to fetch
* i love the walrus so much
* stop bundling mnist with tinygrad
* err, https
* download cache names
* add DOWNLOAD_CACHE_VERSION
* need env.
* ugh, wrong path
* replace get_child
* update HIP language
* vectorized render_cast with special treatment for hip only
* test coverage for all cases
---------
Co-authored-by: George Hotz <72895+geohot@users.noreply.github.com>
* remove force_wait
* refactor
* get rid of stupid ASTRunner
* fix del in diskbuffer
* BufferOps.FROM_UNDERLYING
* put offset in the rawbuffer
* fix bugs
* use exec
* autopad shapetracker for BEAM
* OptOps.PADTO
* skip that test for now
* correct padding reduce axis
* just 32
* avoid more than double the FLOPs
* cleanups
* test case
* no support for triton and llvm yet
* typos
* symbolic shape would not work
* cannot PADTO with MAX kernel
* advance db version
* no breaking change - don't advance db version
* is triton just python?
* Revert "is triton just python?"
This reverts commit 17e776c25587615e33a3634c2fb0bb8591ce65d4.
* Revert "Revert "is triton just python?""
This reverts commit 6c434c01e1c4b0ea0431ec18632cd859fb3cf260.
* support llvm
* is it really passing in CI only?
* update tests
* oh triton test passed
* simpler
* revert that, with a test
* check if st are the same
* Revert "check if st are the same"
This reverts commit d2a5eac110a5da1af82a2728c883779ef69c3cad.
* update the db version
* rebase artifact
* replace all _dtypen with dtype.vec(n)
fix: print works
* conceptul refactor of cstyle render_load logic
* linearizer GEP is explicit that its dtype is the scalar version of localtype
* vectorized global_store and load don't need a conditional
* pretty multinomial
p, cdf_normalized -> weight, cdf
symmetric unsqueeze / squeeze
check num_sample > 0
TODO: how do we want to handle 0/0 in general?
* no 0-dim input
* single sum
* beautiful mnist
* beautiful mnist example
* from tinygrad import Tensor
* more beautiful
* the jit is super core tinygrad
* globalcounters reset on jit run
* symlinks and exclude
* beautiful_cartpole
* evaluate is it's own function
* no symlinks
* more beautiful
* jit reset for double speed
* type hinting for JIT
* beautiful_mnist gets 98%
* beautiful_mnist < 4s with BEAM=2
* better cartpole
* use actor critic
* zero_grad got lost
* delete double relu
* stable cartpole with PPO
* beautiful_cartpole is more beautiful
* REPLAY_BUFFER
* beautiful stuff typechecks
* None support in shape
* hp tuning
* add back as_strided, move rebuilt mops to extra
* negative stride for ops_cpu
* Revert "negative stride for ops_cpu"
This reverts commit a13b6815ac31478d31ae71c26f4d4e4d274bf155.
* skip that
* style
* force rebuild of ocelot
* SzymonOzog gpuocelot
* delete that
* downgrade that
* non parallel
* force rebuild
* use llvm
* nauto
* less mem maybe
* print test
* helper_test_exception skip CUDACPU
* helper_test_exception
* shippable
* assert adequate memory has been freed
* cleaned up runtime error message
* improved metal buffer alloc error catching and reporting
* decreased lines and altered messages
* removed unnecessary _get_cur_free_space() call
* improved assert message
* added allocate massive buffer test
* added test_lru_allocator_metal_max_buffer_length
* split into two asserts and removed walrus assignment from assert expression
* update assert message and use byte data type for clarity
* add Tensor.multinomial only with replacement
* add support for 2D input in Tensor.multinomial
* fix multinomial output shape
* allow passing replacement=False to Tensor.multinomial when num_samples=1
* improve tests for Tensor.multinomial
* fix edge case in Tensor.multinomial
* Tensor.multinomial no more staticmethod
* zero in shape start
* no assert for that
* if output size is 0, return without exec
* tweak
* strides
* reduce over non-zero
* shrink and expand
* fix import
* test_elementwise where
* cannot reshape from size 0 to size 1
* compiled backend reduce over 0
* zeros for numpy
* reduce over 0 and keepdim resulted in 1
* reduce empty set default values
* compare with same input
* pad test case
* cat test case
* torch does not support that?