The current yolov3 example is broken with the current implementation of fetch in the helpers. I was tempted to fix the helpers instead, but that could just as well have broken other examples.
* add bf16 test support
this model takes me almost a minute to download though:
https://huggingface.co/TinyPixel/Llama-2-7B-bf16-sharded/resolve/main/pytorch_model-00001-of-00014.bin?download=true: 100%|█████████████████████████████| 981M/981M [00:40<00:00, 24.2MB/s]
* ensure we first load if it is bitcast to avoid taking the address of an rvalue
* tiny bf16 in the cloud
skip GPU
* should skip torch
lint
* Revert "ensure we first load if it is bitcast to avoid taking the address of an rvalue"
This reverts commit b86a28ab84bc1173764b2d480218e8de41a32390.
* break the kernel
* skip LLVM and GPU in CI
* skip CUDA
* universal test cast
* disable div
* midcast fixup
* add 64-bit types
* hack maximum
* use Metal precise::sin instead of default
This is because the default sin function uses single-precision math: https://developer.apple.com/metal/Metal-Shading-Language-Specification.pdf#page=164
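The precision gap behind this change can be reproduced with numpy alone (a standalone sketch, not Metal or tinygrad code): evaluating sin in float32 only matches a float64 reference to float32 precision, which is why the sin tolerances get loosened a few commits later in this series.

```python
import numpy as np

# Evaluate sin at single precision and compare against a double-precision
# reference over a moderate range.
xs = np.linspace(-10, 10, 10001)
f32 = np.sin(xs.astype(np.float32)).astype(np.float64)
f64 = np.sin(xs)
err = np.max(np.abs(f32 - f64))

# The error sits at float32 scale (~1e-7), far above float64 eps, so test
# tolerances for a single-precision sin must budget for it.
assert 1e-12 < err < 1e-5
```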
* LLVM code_for_op support for var_dtype
* comment out maximum for now with a TODO explaining it
* Revert "hack maximum"
This reverts commit d170048c5fc029eab41f8472dd53f44c448370a1.
* make the comment more specific
* slightly more forgiving
* ok does this fail in all backends?
* weird, it's only Metal CI
* add graph
* skip sin of nan for CUDACPU
This is only happening in the CUDACPU runtime and not CUDA itself. https://github.com/tinygrad/tinygrad/actions/runs/7128973726/job/19412000385#step:16:36
* METAL and CUDACPU behave differently in overflows with numpy running on CI
* that skip is wrong
* skip fp16 tests on LLVM similar to test_dtype
original commit that skipped LLVM in CI: 1826ff6b89
* remove all of sin from CUDACPU
* limit range of values in CUDACPU and METAL CI
* Revert "use Metal precise::sin instead of default"
This reverts commit d960094d4a22fe69a9b6cb23ff7cd88e86a3c675.
* change atol and rtol for Metal sin
* METAL CI is more imprecise
* cleanup
---------
Co-authored-by: George Hotz <geohot@gmail.com>
* enable test_index and test_advancedindex with pretty diff
* removed contig
* created set_ helper function
* comment change
* del empty line
---------
Co-authored-by: chenyu <chenyu@fastmail.com>
* jit graph split
* update
* that's fine, not all buffers are there now
* use logarithmic though, seems good
* no keep it simple
* add test
* simplify
* split graph when jit item cannot be graphed
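The splitting described above can be sketched as a simple partition: consecutive graphable items are batched, and anything that cannot be graphed gets its own run. This is an illustrative helper with hypothetical names, not tinygrad's actual jit code.

```python
from typing import Callable, List

def split_graphable(items: List, graphable: Callable[[object], bool]) -> List[List]:
    """Partition a schedule into runs: consecutive graphable items are batched
    together; each non-graphable item becomes its own single-item run."""
    runs: List[List] = []
    batch: List = []
    for it in items:
        if graphable(it):
            batch.append(it)
        else:
            if batch:
                runs.append(batch)
                batch = []
            runs.append([it])
    if batch:
        runs.append(batch)
    return runs

# example: letters stand in for graphable items, digits for non-graphable ones
print(split_graphable(list("ab1cd"), str.isalpha))  # [['a', 'b'], ['1'], ['c', 'd']]
```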
* dtypes alu test
* those types don't exist in torch
* floats
* more tests
* disable those
* a couple unary tests
* skip float16 tests in CI for GPU
* fix LLVM bool add: True+True = 1+1 = 2, which truncates to False in native LLVM
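The bug can be sketched in plain Python: in a 1-bit integer, 1 + 1 wraps to 0, so the sum has to be computed in a wider type and compared against zero before narrowing back to bool. Both functions below are illustrations of the behavior, not the actual LLVM backend code.

```python
def bool_add_i1(a: bool, b: bool) -> bool:
    # buggy: a native 1-bit add wraps, so 1 + 1 = 2 truncates to 0 (False)
    return bool((int(a) + int(b)) & 1)

def bool_add_widened(a: bool, b: bool) -> bool:
    # fixed: add in a wider type, then test against zero
    return (int(a) + int(b)) != 0

print(bool_add_i1(True, True))       # False -- the truncation bug
print(bool_add_widened(True, True))  # True
```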
* remove hardcoded float for LLVM ALU fns
* less sensitive atol for fp32; 1e-10 is flaky and sometimes failed even if you revert the merge commit for non-fp32 math; nothing has changed in our kernels for fp32.
* return on overflows
* fix CUDA exp2
* compute results of op regardless of bounds in a python backend
* skip fp16 in GPU and CUDACPU
* fuzz a smaller range in the float_midcast_int32 test
I sampled this and we overflow ~70% of the time.
Because numpy behaves differently on different devices for overflows, and Metal seems to do the same, I'm opting to eliminate the non-determinism here.
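Why limiting the range helps: a float-to-int32 cast is only well defined when the value fits in int32; out-of-range casts are undefined behavior in C, so METAL, CUDACPU, and numpy on CI can all disagree. A small numpy sketch of the in-range casts the restricted fuzz range sticks to:

```python
import numpy as np

# In-range float32 -> int32 casts are deterministic everywhere:
# truncation toward zero, exactly as a C cast.
assert np.float32(1000.5).astype(np.int32) == 1000
assert np.float32(-7.9).astype(np.int32) == -7

# Out-of-range casts (e.g. np.float32(1e10) to int32) are undefined behavior
# in C and give backend-dependent results, so the test avoids sampling them.
```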
* remove CUDA exp2 overload it's already there now
---------
Co-authored-by: George Hotz <geohot@gmail.com>
* bitcast renderers
* fast llama load
* make it one kernel
* regression testing p1: re-enable test_dtype for all backends
fix GPU
* regression testing p2: fuzz all possible cases against numpy
remove hardcoded tests since the fuzzer covers them
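A fuzzer of this shape can be sketched generically: sample random in-range inputs, run the op under test, and compare against a float64 numpy reference. This is an illustrative harness with hypothetical names; the real test drives tinygrad ops rather than a plain callable.

```python
import numpy as np

def fuzz_against_numpy(op, np_op, dtype, n=1000, seed=0):
    """Compare an elementwise binary op against a float64 numpy reference on
    random in-range inputs; returns the max absolute error."""
    rng = np.random.default_rng(seed)
    xs = rng.uniform(-10, 10, n).astype(dtype)
    ys = rng.uniform(-10, 10, n).astype(dtype)
    got = op(xs, ys).astype(np.float64)
    want = np_op(xs.astype(np.float64), ys.astype(np.float64))
    return float(np.max(np.abs(got - want)))

# sanity check: float32 add matches the float64 reference to float32 precision
assert fuzz_against_numpy(np.add, np.add, np.float32) < 1e-5
```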
* define ushort
* fix indent, probably need flake8 back for CI to catch
---------
Co-authored-by: George Hotz <72895+geohot@users.noreply.github.com>
* add some helpers
* I think it should all work..
* fixed get_set_tensor
* done
* del import
* bye bye typing
* style
* remove empty lines lol
* deleted dtype arg
* del trailing space
* new getitem
* go
* add temporary simple tests
* better
* comments
* WOW that took a while
* save 1 line lol
* work
* still need to add comprehensive tests, but i think getitem looks nice :D
* GIMME GREEN CI CHECKMARK PLS
* try..
* k idk
* added tests for errors
* fixed small hack
* added tests
* almost good
* try no contig?
* yay no more contig + comments and spacing
* finishing touches (comments)
* revert regex unittests lol
* add suggested change
* oops I fell asleep yesterday
* handle reshape of contiguous subparts with explicit mask
* remove the add/remove ones logic in reshape
* accommodate ones in accumulate logic
* make multiply commutative
* fix linting
* make mypy happy
* add test for commutative mul
* merge dimensions in shape_strides for 1 range masks
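Merging adjacent dimensions when their strides line up is a standard shape/stride trick; a simplified standalone sketch of the idea behind `to_shape_strides` (later `_merge_dims`), ignoring masks and using a hypothetical signature:

```python
def merge_dims(shape, strides):
    """Merge adjacent (size, stride) pairs when the outer dimension steps
    exactly over the inner one (outer_stride == inner_size * inner_stride).
    Size-1 dimensions fold into their neighbor."""
    out = [(shape[0], strides[0])]
    for size, stride in zip(shape[1:], strides[1:]):
        psize, pstride = out[-1]
        if size == 1:
            continue  # ones never add a real axis
        if psize == 1 or pstride == size * stride:
            out[-1] = (psize * size, stride)  # mergeable: collapse into one dim
        else:
            out.append((size, stride))
    return out

# a contiguous (2, 3, 4) tensor collapses to a single dimension of 24
print(merge_dims((2, 3, 4), (12, 4, 1)))  # [(24, 1)]
```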
* add offsets for merging
* fix linting
* add back explicit 1 reshapes
* fix mypy errors
* fix accumulate by including state
* include non-zero stride dimension in acc
* small cleanup
* more compact to_shape_strides
* more logical cleanup
* compress more
* compress reshape mask
* adding some comments
* small bug fix
* improve test coverage
* remove explicit add remove ones
* small bug in test
* enable test_reshape_splitting_combining
* small fix
* 10 lines less to_shape_strides
* shorten reshape mask
* some more cleanup
* more cleanup
* introduce some symbols for compactness
* more symbols
* even cleaner
* use fewer symbols; it had become less readable
* remove merge_views from view.reshape
* change to_shape_strides to _merge_dims
* improve readability
* fix corner case
* cleanup
* better handling of 1 <= Variable('i',1,10) & new_dim = Variable('i',1,10)
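The simplification in question is decidable from variable bounds alone: `1 <= Variable('i', 1, 10)` always holds because the lower bound is already 1. A minimal toy sketch of bound-based comparison folding (not tinygrad's actual symbolic module; note Python routes `1 <= i` through `i.__ge__(1)`):

```python
class Variable:
    """Toy symbolic integer with inclusive bounds [vmin, vmax]."""
    def __init__(self, name, vmin, vmax):
        self.name, self.vmin, self.vmax = name, vmin, vmax

    def __ge__(self, c):
        # fold the comparison when the bounds decide it outright;
        # otherwise stay symbolic (represented here by None)
        if self.vmin >= c: return True
        if self.vmax < c: return False
        return None

i = Variable('i', 1, 10)
print(1 <= i)   # True: the lower bound already guarantees it
print(i >= 11)  # False: above the upper bound
print(i >= 5)   # None: depends on the runtime value
```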
* rewrite _reshape_mask for readability
* fix white space
* add comment
* nice shorthands for readability
* add proof in docs
* small nit
---------
Co-authored-by: chenyu <chenyu@fastmail.com>