* Fixes + improved test coverage for helpers.py
- added exception handling in `proc`, if an exception was thrown, the thread would hang
- made `_early_exec_process` catch any Exception, before if an exception was thrown before the process was started, it would hand the thread
* Made `_early_exec_process` catch any Exception
Otherwise, if an exception was thrown before the process was started, it would hang the thread. For example a type error for an argument passed to `subprocess.check_output`
* Fixed `from tinygrad.helpers import Timing` import
oops, for some reason my IDE cleaned that import from extra/helpers.
* Fixed import in llama.py
Another one that I skipped by accident, mybad
* Extracted a class for tests of early exec
* Normalize line endings, windows uses /r/n
* Made `cross_process` not a daemon
* fold expands that precede a reduce if the reduction is on the same axis as the expansion
* add deterministic test for SIMPLIFY_SUM_RESHAPE_EXPAND_SUM optimization
* add a test case to make sure we don't fold reduce-expand-reduce on different axes
* fixed division by zero for fast operations
* made et closer to 0
* replace POW llop with SQRT
* updated mlops to swap SQRT and POW llops
* updated hlops to swap POW and SQRT
* added sqrt llop to cpu runtime
* added sqrt llop to cstyle codegen
* added POW llop to llvm ir codegen
* added SQRT llop to torch runtime
* moved pow from mlops to hlops
* found a better way to do reverse pow
* fixed indentation
* added SQRT llop to triton
* update docs to match new llops
* removed POW operator from assembly codegen
* added sqrt and rsqrt to pow hlop
* rewrote pow function in tensor.py
* Adjust tolerance
* Adjust for adamw
* Reduce for Adam too
* removed accidental leftover code
* removed all of accidental code
* added rsqrt test
* removed pow from mlops again
it was added back when resolving merge conflicts
---------
Co-authored-by: Jacky Lee <jla524@sfu.ca>
* fix syntax issues in imagenet_download.py
* use cloudpickle in cross_process to make it work in Python 3.9+
* add cross_process test
* prevent unpickling on every function call
* add cloudpickle to setup.py
* add support for args/kwargs
* Use generators in any(..) instead of lists for better best-case
* Use generators in all(...) instead of lists
* enable R1729 in .pylintrc
* revert import sorting
---------
Co-authored-by: Anselm Coogan <anselm@scandit.com>
* perf: remove extra function, include in cached getitem
* perf: only calculate hash once per node
---------
Co-authored-by: Roelof van Dijk <roelof.van.dijk@vitestro.com>
* new upcast works
* float4 try
* fix unaligned float4
* disallow unaligned access
* upcast dim
* maybe good now
* fix gpu half
* vstore_half4
* fix deep image bugs
* improve symbolic to fix issues
* fix symbolic
* cl test
* this maybe
* gcd of 1 is 1
* real fix for old python
* improve fuzzer
* realize hotspots
* no str check
* minor changes
* make this an assert
* faster and more readable
* nicer self.buffers
* tests for weak op + LAZYCACHE=0