* hopeful impl for Tensor.einsum
* satisfy mypy by having less typing. :(
* a few simple tests
* even more tests
* permute tests
* xfails for improper usage
* fix LLVM test fail
* use argfix
* more helpful error message on shape mismatch
The current yolov3 example is broken with the current implementation of fetch in the helpers. I was tempted to fix the helpers instead, but that could just as well have broken other examples.
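A minimal sketch of the kind of call the einsum work above is tested with, assuming a Tensor.einsum(formula, *tensors) entry point and checking against numpy (names and tolerances are illustrative, not the exact test code):

```python
import numpy as np
from tinygrad.tensor import Tensor

# matrix multiply expressed as an einsum, compared against numpy's reference
a = np.random.rand(3, 4).astype(np.float32)
b = np.random.rand(4, 5).astype(np.float32)
out = Tensor.einsum("ij,jk->ik", Tensor(a), Tensor(b)).numpy()
np.testing.assert_allclose(out, np.einsum("ij,jk->ik", a, b), atol=1e-5)
```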
* add bf16 test support
this model takes me almost a minute to download though:
https://huggingface.co/TinyPixel/Llama-2-7B-bf16-sharded/resolve/main/pytorch_model-00001-of-00014.bin?download=true: 100%|█████████████████████████████| 981M/981M [00:40<00:00, 24.2MB/s]
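For context on what bf16 support involves: a bfloat16 value is just the top 16 bits of a float32, so widening is a shift plus a reinterpret. A rough numpy illustration (not the code in this change):

```python
import numpy as np

def bf16_to_fp32(raw: np.ndarray) -> np.ndarray:
    # bf16 keeps the sign, exponent, and top 7 mantissa bits of fp32,
    # so shifting the 16-bit pattern into the high half recovers a float32
    assert raw.dtype == np.uint16
    return (raw.astype(np.uint32) << 16).view(np.float32)

bits = np.array([0x3F80, 0x4000, 0xC040], dtype=np.uint16)  # 1.0, 2.0, -3.0
print(bf16_to_fp32(bits))  # [ 1.  2. -3.]
```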
* ensure we first load if it is bitcast to avoid taking the address of an rvalue
* tiny bf16 in the cloud
skip GPU
* should skip torch
lint
* Revert "ensure we first load if it is bitcast to avoid taking the address of an rvalue"
This reverts commit b86a28ab84bc1173764b2d480218e8de41a32390.
* break the kernel
* skip LLVM and GPU in CI
* skip CUDA
* universal test cast
* disable div
* midcast fixup
* add 64-bit types
* hack maximum
* use Metal precise::sin instead of default
This is because the default sin function uses single-precision math: https://developer.apple.com/metal/Metal-Shading-Language-Specification.pdf#page=164
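For a feel of the error scale involved: at single precision the argument itself only carries about 7 significant digits, so even a perfect sin of the rounded argument drifts from a float64 reference, and a fast single-precision sin adds its own error on top. An illustrative numpy check (not the CI test):

```python
import numpy as np

x = 12345.678
# drift caused by rounding the argument to float32 alone
drift = abs(np.sin(np.float64(np.float32(x))) - np.sin(x))
print(drift)  # on the order of 1e-4 for arguments of this size
```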
* LLVM code_for_op support for var_dtype
* comment out maximum for now with a TODO explaining it
* Revert "hack maximum"
This reverts commit d170048c5fc029eab41f8472dd53f44c448370a1.
* make the comment more specific
* slightly more forgiving
* ok does this fail in all backends?
* weird, it's only Metal CI
* add graph
* skip sin of nan for CUDACPU
This is only happening in the CUDACPU runtime and not CUDA itself. https://github.com/tinygrad/tinygrad/actions/runs/7128973726/job/19412000385#step:16:36
* METAL and CUDACPU behave differently in overflows with numpy running on CI
* that skip is wrong
* skip fp16 tests on LLVM similar to test_dtype
original commit that skipped LLVM in CI: 1826ff6b89
* remove all of sin from CUDACPU
* limit range of values in CUDACPU and METAL CI
* Revert "use Metal precise::sin instead of default"
This reverts commit d960094d4a22fe69a9b6cb23ff7cd88e86a3c675.
* change atol and rtol for Metal sin
* METAL CI is more imprecise
* cleanup
---------
Co-authored-by: George Hotz <geohot@gmail.com>
* enable test_index and test_advancedindex with pretty diff
* removed contig
* created set_ helper function
* comment change
* del empty line
---------
Co-authored-by: chenyu <chenyu@fastmail.com>
* jit graph split
* update
* that's fine, not all buffers are there now
* use logarithmic tho, seems good
* no keep it simple
* add test
* simplify
* split graph when jit item cannot be graphed
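The splitting idea, very roughly (hypothetical helper names; not tinygrad's actual JIT code): consecutive items that can be captured in a graph get batched, and anything that can't runs as its own piece.

```python
# illustrative sketch only: batch consecutive graphable JIT items and keep
# non-graphable items as standalone pieces executed outside the graph
def split_for_graph(jit_items, is_graphable):
    pieces, batch = [], []
    for item in jit_items:
        if is_graphable(item):
            batch.append(item)
        else:
            if batch:
                pieces.append(batch)
                batch = []
            pieces.append([item])
    if batch:
        pieces.append(batch)
    return pieces
```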
* dtypes alu test
* those types don't exist in torch
* floats
* more tests
* disable those
* a couple unary tests
* skip float16 tests in CI for GPU
* fix LLVM bool add True+True=1+1=2 which truncates to False in native LLVM
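The behaviour being matched, illustrated with numpy: bool addition saturates to True, while a native 1-bit LLVM add wraps 1+1 back around to 0, i.e. False.

```python
import numpy as np

print(np.array(True) + np.array(True))  # True: numpy saturates bool addition
print((1 + 1) % 2)                      # 0: what a 1-bit integer add wraps to
```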
* remove hardcoded float for LLVM ALU fns
* less sensitive atol for fp32: 1e-10 is flaky and sometimes failed even with the merge commit for non-fp32 math reverted; nothing has changed in our kernels for fp32.
* return on overflows
* fix CUDA exp2
* compute results of op regardless of bounds in a python backend
* skip fp16 in GPU and CUDACPU
* fuzz a smaller range in the float_midcast_int32 test
I sampled this and we overflow ~70% of the time.
Because numpy behaves differently on different devices for overflows, and Metal seems to do the same, I'm opting to eliminate the non-determinism here.
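A small numpy illustration of the non-determinism being avoided (the exact value you get back is platform-dependent, which is the point):

```python
import numpy as np

x = np.float32(3e9)          # well above int32 max (2147483647)
print(x.astype(np.int32))    # overflowing cast: result differs across platforms
```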
* remove CUDA exp2 overload it's already there now
---------
Co-authored-by: George Hotz <geohot@gmail.com>
* bitcast renderers
* fast llama load
* make it one kernel
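Roughly what "one kernel" means here, sketched with the Tensor API (method and dtype names are assumptions, not the exact code in this change): the bf16 widening from the earlier numpy example can be expressed on-device as a cast, a multiply by 2^16, and a bitcast, so it fuses into a single kernel instead of a numpy round-trip.

```python
from tinygrad.tensor import Tensor
from tinygrad.helpers import dtypes

# illustrative only: widen a buffer of raw bf16 bit patterns (as uint16)
# to float32 entirely on-device
def bf16_bits_to_fp32(raw_u16: Tensor) -> Tensor:
    return (raw_u16.cast(dtypes.uint32) * (1 << 16)).bitcast(dtypes.float32)
```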
* regression testing p1: re-enable test_dtype for all backends
fix GPU
* regression testing p2: fuzz all possible cases against numpy
remove hardcoded tests since the fuzzer covers them
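The shape of the fuzzing, roughly (dtype list, import paths, and helpers are assumptions about the current layout, not the exact test): cast small samples through every dtype pair and require agreement with numpy's cast.

```python
import itertools
import numpy as np
from tinygrad.tensor import Tensor
from tinygrad.helpers import dtypes

dts = [dtypes.uint8, dtypes.int8, dtypes.int32, dtypes.float32]
for src, dst in itertools.product(dts, dts):
    a = np.array([0, 1, 2, 3], dtype=src.np)
    out = Tensor(a).cast(dst).numpy()
    np.testing.assert_equal(out, a.astype(dst.np))
```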
* define ushort
* fix indent, probably need flake8 back for CI to catch
---------
Co-authored-by: George Hotz <72895+geohot@users.noreply.github.com>