mirror of https://github.com/commaai/tinygrad.git
38fb1e14a2
* fixed xmx demo
* i think i'm invoking the DPAS but it's slow
* compiler build arg to stop register spilling, indicated where to fix flop counter
* don't mind this
* do NOT mind me
* do not mind me
* do not view
* i will add bf16 later
* in process of figuring out tc fields
* we figured out the fields!!!
* added check for cl device vendor, added seperate IntelRenderer
* remove tc thread_local_aliases
* cleaning debris before draft pr
* edits for linter
* deduping and checking device extensions
* i will find more line reductions in other places
* before merge upstream
* double grf size in compiler to fix register spilling (bandaid), device checking changes
* tc python emulation
* fixed emulation
* tests for emulated intel tensor core
* TC=0, 1 working on upstream, fixed perf
* test
* debris
* check for specialized cl device when we canonicalize device
* bf16 support, tc=3 test added
* address tests
* revert half2 loads on intel tc, cleanup
* linter
* fold_expanded revert
* lint, whitespace fix
* cuda bf16 (only one with bf16) is skipped in test tensor cores, so i will skip for intel bf16 too
* make line shorter, no need for noqa E501
* removed device intel
* fix python emulation

---------

Co-authored-by: George Hotz <72895+geohot@users.noreply.github.com>

||
---|---|---|
external | ||
imported | ||
models | ||
testextra | ||
unit | ||
Dockerfile | ||
__init__.py | ||
helpers.py | ||
test_arange.py | ||
test_assign.py | ||
test_compile_failures.py | ||
test_const_folding.py | ||
test_conv.py | ||
test_conv_shapetracker.py | ||
test_copy_speed.py | ||
test_custom_function.py | ||
test_device_speed.py | ||
test_dtype.py | ||
test_dtype_alu.py | ||
test_fusion_op.py | ||
test_fuzz_shape_ops.py | ||
test_gc.py | ||
test_graph.py | ||
test_hcq.py | ||
test_image_dtype.py | ||
test_jit.py | ||
test_kernel_cache.py | ||
test_lazybuffer.py | ||
test_lazyop.py | ||
test_linearizer.py | ||
test_linearizer_dumb.py | ||
test_linearizer_failures.py | ||
test_linearizer_overflows.py | ||
test_masked_st.py | ||
test_method_cache.py | ||
test_multitensor.py | ||
test_net_speed.py | ||
test_nn.py | ||
test_ocl.py | ||
test_ops.py | ||
test_optim.py | ||
test_pattern_matcher.py | ||
test_pickle.py | ||
test_profiler.py | ||
test_randomness.py | ||
test_renderer_failures.py | ||
test_sample.py | ||
test_schedule.py | ||
test_search.py | ||
test_setitem.py | ||
test_specific_conv.py | ||
test_speed_v_torch.py | ||
test_subbuffer.py | ||
test_symbolic_jit.py | ||
test_symbolic_ops.py | ||
test_symbolic_shapetracker.py | ||
test_tensor.py | ||
test_tensor_data.py | ||
test_tensor_variable.py | ||
test_to_numpy.py | ||
test_transcendental.py | ||
test_uop_graph.py | ||
test_uops.py | ||
test_uops_stats.py | ||
test_verify_lazyop.py | ||
test_winograd.py | ||
test_zero_copy.py |