mirror of https://github.com/commaai/tinygrad.git
c704a77ca0
dtypes alu test

* those types don't exist in torch
* floats
* more tests
* disable those
* a couple of unary tests
* skip float16 tests in CI for GPU
* fix LLVM bool add: True+True = 1+1 = 2, which truncates to False in native LLVM (see the sketch below)
* remove hardcoded float for LLVM ALU fns
* less sensitive atol for fp32: 1e-10 is flaky and sometimes failed even if you revert the merge commit; for non-fp32 math, nothing has changed in our kernels
* return on overflows
* fix CUDA exp2
* compute results of op regardless of bounds in a python backend
* skip fp16 in GPU and CUDACPU
* fuzz a smaller range in the float_midcast_int32 test: I sampled this and we overflow ~70% of the time; because numpy behaves differently on different devices for overflows, and Metal seems to do the same, I'm opting to eliminate the non-determinism here
* remove CUDA exp2 overload; it's already there now

Co-authored-by: George Hotz <geohot@gmail.com>
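The LLVM bool-add and overflow points above are easiest to see in isolation. Below is a minimal sketch in plain NumPy/Python; the `i1_add` helper is hypothetical, written only to mimic a 1-bit add, and none of this is tinygrad code:

```python
import numpy as np

# 1) bool addition: NumPy widens bools before adding, so True + True stays truthy ...
print(np.add(True, True))  # True

# ... but an add performed directly on a 1-bit integer (LLVM i1) wraps mod 2,
# so 1 + 1 = 2 truncates to 0, i.e. True + True becomes False.
def i1_add(a: bool, b: bool) -> bool:
    # hypothetical helper emulating a native i1 add, not tinygrad's actual fix
    return bool((int(a) + int(b)) & 1)

print(i1_add(True, True))  # False: the truncation bug described above

# 2) float -> int32 casts that exceed the int32 range are not portable:
# the result differs across platforms and backends, which is why the test
# fuzzes a smaller range instead of relying on overflow behavior.
x = np.array([3e9], dtype=np.float32)  # larger than int32 max (~2.1e9)
print(x.astype(np.int32))              # platform-dependent value, may warn
```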
external
extra
imported
models
unit
web
Dockerfile
__init__.py
helpers.py
test_assign.py
test_conv.py
test_conv_shapetracker.py
test_copy_speed.py
test_custom_function.py
test_dtype.py
test_dtype_alu.py
test_gc.py
test_hip_rdna3.py
test_image_dtype.py
test_jit.py
test_kernel_cache.py
test_lazybuffer.py
test_lazyop.py
test_linearizer.py
test_linearizer_failures.py
test_net_speed.py
test_nn.py
test_ops.py
test_optim.py
test_randomness.py
test_sample.py
test_schedule.py
test_search.py
test_specific_conv.py
test_speed_v_torch.py
test_symbolic_jit.py
test_symbolic_ops.py
test_symbolic_shapetracker.py
test_tensor.py
test_to_numpy.py
test_uops.py
test_winograd.py
test_zero_copy.py