* update logo
* update svg
* put svg in file
* Revert "put svg in file"
This reverts commit 735528047ac439c8be164b1f782dd44c279f3e8a.
* better
* move a tag
* remove extra
* remove HIP in core tinygrad
CI test uses device RHIP and HSA compiler (LinearizerOpt), so it's fine to remove HIP from tc.
Also updated README and the EMULATE tc test flag
* EMULATE_CUDA
* global -> group
* allow None for local_size in custom function
* lil local
* comment on shape
* fix cuda
* smart local cast
* better local heuristic
* fix ptx and clean up work_dim
* fix metal
* fix ops test
* fix openpilot jit
* no more optlocal
* might fix metal tests
* try metal now
* see generated metal code
* test free removal. REVERT THIS
* mergable
* Make GPU the default device
* Compile EfficientNet with CPU
* don't print device
* use METAL and CUDA if possible
* Revert some changes to workflow
* Fix import error when checking device availability
* device lookup is now optional
* hopefully fix linter and tests
* fix workflow
* Skip device if not available
* don't change default if CPU=1
* simplify device selection
* Default to CPU if no GPU
* don't print device name...
* No need to change default in llama
* run github workflow
* Fix logic to select default
* pass if an error occurs
* use separate function for try except
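Taken together, the device-selection commits above settle on a simple policy: prefer METAL or CUDA when the backend imports cleanly, skip any device whose availability check errors, honor an explicit CPU=1, and fall back to CPU when no GPU is present. A minimal sketch of that policy, assuming hypothetical helper and module names (the real tinygrad internals are not shown in this log):

```python
import os

def _device_available(name: str) -> bool:
    # "use separate function for try except" / "pass if an error occurs":
    # a device counts as available only if its backend imports without error.
    try:
        __import__(f"backends.{name.lower()}")  # stand-in for the real backend import
        return True
    except Exception:
        return False

def pick_default_device() -> str:
    # "don't change default if CPU=1": an explicit CPU request always wins.
    if os.environ.get("CPU") == "1":
        return "CPU"
    # "use METAL and CUDA if possible" / "Skip device if not available".
    for name in ("METAL", "CUDA", "GPU"):
        if _device_available(name):
            return name
    # "Default to CPU if no GPU".
    return "CPU"
```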
* Less, LessOrEqual, Greater, GreaterOrEqual, Equal
* lint fix
* using built in functions
* overriding __eq__ breaks things
* backwards pass for less - forward only tests
* one other spot
* removing backwards for comparison ops to match pytorch
* raise runtime error
* more tests for comparison ops
* fixed the lineup
* added number upcast tests
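The comparison-op commits converge on PyTorch's semantics: Less, LessOrEqual, Greater, GreaterOrEqual, and Equal are forward-only, and requesting their gradient raises a runtime error; Equal is also kept off `__eq__`, since overriding `__eq__` silently breaks hashing and identity checks. A hedged illustration of that contract (the class below is a stand-in, not tinygrad's actual Function):

```python
import numpy as np

class Less:
    """Forward-only elementwise comparison, illustrative only."""
    @staticmethod
    def forward(x: np.ndarray, y: np.ndarray) -> np.ndarray:
        # boolean result cast back to the input dtype, so bool types
        # don't creep into later ops
        return (x < y).astype(x.dtype)

    @staticmethod
    def backward(grad_output: np.ndarray) -> np.ndarray:
        # "removing backwards for comparison ops to match pytorch" /
        # "raise runtime error": comparisons are not differentiable.
        raise RuntimeError("backward is not supported for comparison ops")
```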
* conv2d is an hlop
* shorter conv
* KOPT=-1
* alt imp
* MULACC
* smarter mulacc
* pop conv
* 7x7 -> 5x5
* didn't fix, that's not going to work
* this is faster and matches old behavior
* oh, non lazy just won't work with mulacc
* mulacc in torch
* bool types were creeping in
* optimizer is actually better with hlop conv
* fix pushing permutes issue
* refactor einsum_mulacc
* fix up readme
* update readme
* _image_conv2d
* fix bias addition location
* pushing permutes gets back to 200 kernels
* conv cleanup
* disable hlop conv
* don't hide that in helpers
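Several of the conv commits above (MULACC, "smarter mulacc", "refactor einsum_mulacc", "mulacc in torch") revolve around one idea: express convolution as a broadcasted multiply followed by a sum reduction, so the backend can fuse the pair into a single multiply-accumulate instead of needing a dedicated conv kernel. A rough NumPy sketch of that decomposition, with simplified shapes (stride 1, no padding, no groups), not tinygrad's actual implementation:

```python
import numpy as np

def conv2d_mulacc(x: np.ndarray, w: np.ndarray) -> np.ndarray:
    """conv2d as multiply-then-reduce, the pattern MULACC fuses.
    x: (N, Cin, H, W), w: (Cout, Cin, KH, KW); stride 1, no padding."""
    KH, KW = w.shape[2:]
    # gather sliding windows: (N, Cin, OH, OW, KH, KW)
    windows = np.lib.stride_tricks.sliding_window_view(x, (KH, KW), axis=(2, 3))
    # einsum does the broadcasted multiply and the (Cin, KH, KW) reduction
    # in one step -- the multiply-accumulate a lazy backend can fuse
    return np.einsum("nihwkl,oikl->nohw", windows, w)
```

On a lazy backend the intermediate product never has to be materialized, which is presumably why "non lazy just won't work with mulacc": an eager implementation would allocate the full (N, Cin, OH, OW, KH, KW) product before reducing it.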