tinygrad

Commit Graph

Author	SHA1	Message	Date
nimlgen	81a4a9623c	add qcom dsp runtime (#6112 ) * calling qualcomm dsp from python * include so files * add include file * adsprpc.py * running with adsprpc * work * 32-bit support in elf * compilation works * ion * msm_ion * working DSP backend * getting 500 MFLOPS on matmul * beam works with timing * move to autogen * disasm * progress * simple tests pass * qcom_dsp * more dsp autogen * progress * some progress * works w/o lib * checkpoint * no lib * ugh, better * cleaner, but with lib. test good, but with the hack * remove autogens * small * push * simpler * revert this * run_3 * simpler * android * handle * run it * why? * run2 * to gen * cc * cleaner * elf * part of autogen * comemnt * no lib * autohen * linter * bug reproducer * cleaner * this repro is almost empty and doesn't work!!!! * with this test_ops passes, no crashes anymore * cleaner * linter * renames * shorter * remoev contextlib * ugh * myoy * cleaner * cleaner * remove import * conn * import * revert this * remove heavy .so * shorter alloc * not tue anymore --------- Co-authored-by: Comma Device <device@comma.ai> Co-authored-by: George Hotz <geohot@gmail.com> Co-authored-by: George Hotz <george@comma.ai>	2024-09-13 21:01:33 +03:00
Francis Lam	c91b7b1739	test: add fuzz_matmul and better debugging for simple_matmul (#4199 ) also show unoptimized shape in verify_kernel	2024-04-16 23:40:31 -04:00
Francis Lam	7c5729a3bd	wmma: refactor to remove wmma_func and create TC funcs as needed (#3945 ) * wmma: refactor to remove wmma_func and create TC funcs as needed * test_linearizer: disable bf16 CUDA during emulation testing * cstyle: clean up creation of CUDA vec dtypes * extra/gemm: add option to accumulate to bfloat16 * cleanups * benchmark: add CUDA bfloat16 matmul * more cleanups	2024-03-27 16:43:09 -04:00
Francis Lam	a26090d404	search: change to use "spawn" and limit the number of tasks per child (#3862 ) also clean up some examples to use __main__ and not initialize resources outside of main	2024-03-21 21:23:36 -07:00
Francis Lam	ddbdb52f77	wmma: enable METAL half tensor cores and clean up cstyle (#3095 ) * wmma: enable METAL half tensor cores and clean up cstyle * revert simple_matmul rand changes and break line in tensor * added metal fp16->fp32 tensor core	2024-01-12 16:25:28 -05:00
chenyu	1d730b8853	remove ACCUM_FP32 in simple_matmul.py (#3045 ) * remove ACCUM_FP32 in simple_matmul.py accumate for half inputs is always in float * move test llama compile speed to metal	2024-01-08 17:37:57 -05:00
George Hotz	a280cfe169	move dtypes to dtype.py (#2964 ) * move dtypes to dtype.py * fix urllib	2024-01-01 14:58:48 -08:00
George Hotz	b5fd160b39	hotfix: increase rtol on simple_matmul	2023-12-11 10:10:29 -08:00
Francis Lam	dece9958f8	wmma: clean up to make WMMA arg order consistent (#2014 ) also add cache defeat to extra/gemm/simple_matmul.py	2023-10-07 17:45:40 -07:00
Francis Lam	f445e056ed	wmma: add test and tensor core shape (#1925 )	2023-09-28 18:04:28 -07:00
George Hotz	e464442adf	WMMA for 7900XTX (#1563 ) * go * hip no LRU * work * works * 16 TFLOPS * 29 TFLOPS * 30 TFLOPS * never mind, it's 60 TFLOPS * fix metal WMMA * put hip alloc back	2023-08-19 09:07:23 -07:00
George Hotz	90fff82c8a	Rdna (#776 ) * assembler maybe * custom asm * rdna3 on quiet * trigger crashes * fixed notes * non-fatal rdna2 crash * Crash4 * improve rdna sniffer * comments * improve sniffer * asm * 131 TFLOPS RDNA3 * opt simple matmul * todos	2023-05-16 05:33:57 -07:00

12 Commits