tinygrad

Commit Graph

Author	SHA1	Message	Date
George Hotz	94599c0637	fixup ast in kernel to be MetaOps.SINK [run_process_replay] (#5424 ) * fixup ast in kernel to be MetaOps.SINK [run_process_replay] * fix tests * fix more tests	2024-07-12 14:01:03 -07:00
uuuvn	3cb94a0a15	Rename tinygrad/runtime/driver to support (#5413 )	2024-07-12 11:06:42 -07:00
wozeparrot	a02b38c0ac	download openimages by running it (#5396 )	2024-07-11 16:06:13 -07:00
wozeparrot	fa873df9c1	bring tinychat more inline with tinyos' version (#5358 )	2024-07-10 13:13:52 -07:00
George Hotz	c13da83f12	tests from lowerer branch (#5339 ) * tests from lowerer branch * Update test_image_dtype.py * Update test_image_dtype.py * Update test_image_dtype.py	2024-07-08 21:23:19 -07:00
nimlgen	51d6f372e4	nv get classes based on device (#5325 ) * nv get classes * support in mockgpu * choose sm based on gpu * fix * fix * fix arch	2024-07-08 18:25:05 +03:00
Tobias Fischer	0c3a35e5c2	Stable Diffusion v2 Inference (#5283 ) * model implementation * clip fix, more qol options	2024-07-03 22:47:10 -04:00
chenyu	b2c3a28a5e	nn.RMSNorm (#5272 ) the norm itself has no significant value to add to Tensor method, but we would want Tensor.normalize	2024-07-02 21:39:01 -04:00
Tobias Fischer	8c9c1cf62f	Pulled CLIP and UNet into Seperate Files (#5253 ) * pulled clip and unet into seperate files * reference cleanup, lru cache fix * better pool indexing	2024-07-01 22:33:01 -04:00
nimlgen	57e89645cd	hcq spec test (#5226 ) * start hcq spec test * more test * fixes * run on amd as well * test amdgpu exec * fix amd * amd mockgpu support sdma timestamp	2024-07-01 17:36:37 +03:00
George Hotz	14980f79dd	hotfix: unbreak llama	2024-06-30 15:27:54 -07:00
George Hotz	3df47bc21e	OpenELM + repeat_interleave (#5234 ) * start writing openelm * progress...hit bug * repeat_interleave support * gqa * add rotary embedding * spp * i think it runs correctly * broken * output is good now * cleanups * no io_uring on android	2024-06-30 15:18:39 -07:00
nimlgen	dd7eef7d71	libc defs to autogen (#5217 ) * libc defs to autogen * amd import libc * linter * better a bit * remove comment, check this * not hardcoded path	2024-06-29 14:37:33 +03:00
qazal	3e56c8422c	remu err handling (#5208 ) * add error handling * use pre release * minor * works	2024-06-28 13:15:18 +03:00
reddyn12	f1c7944c44	Fix batchnorm shapes for resnet.load_pretrained (#5167 ) * Fix batchnorm shapes * make it general reshape	2024-06-26 18:44:10 -04:00
nimlgen	69f116a7e1	nv/amd profiler (#4718 ) * nv/amd profiler * fix * fix * profile copies * profile logger * fixes * more fixes * less lines and fixes * fixes * some linter * back sync, no related change * fix gpu2cpu time def * simpler * linter * linter * docs * add add_event api	2024-06-23 17:10:12 +03:00
chenyu	e356807696	tinytqdm.set_description and tinytrange (#5101 )	2024-06-22 14:45:06 -04:00
chenyu	8080298739	s/tinytqdm/tqdm (#5103 ) except in unit test where tqdm is imported	2024-06-22 14:18:26 -04:00
chenyu	e468601226	update llama attention casting (#5096 ) * update llama attention casting updated scaled_dot_product_attention middle cast and removed hard-coded half in llama attention. * fix that	2024-06-22 10:57:17 -04:00
chenyu	8bd6cb9511	update llama model RMSNorm casting (#5095 ) following the original implementation, cast back to input dtype before multiplying weight. slightly faster https://github.com/meta-llama/llama/blob/main/llama/model.py	2024-06-21 23:02:04 -04:00
chenyu	0c857ae2d6	some onnx_ops cleanups (#5094 )	2024-06-21 22:01:32 -04:00
nimlgen	fb1bf48cfe	io_uring for copies from disk (#5035 ) * exp uring * fixes and old version * nv * cleaner * cmp vs aio * fix * no lib * fix nv * linter * disk_speed_test now runs default * fixes * uring -> io_uring * linter happy * get_temp_buf comment added * tiny nits * put wait back * test runs everywhere * remove consts * remove mmap consts * do not require iouring to run test, they are generic	2024-06-21 11:36:51 +03:00
chenyu	f6d6760f71	don't cast tuple to list before creating Tensor (#5071 ) Tensor constructor supports creating from tuple now	2024-06-20 13:32:56 -04:00
chenyu	e2c5054bdd	update resnet.load_from_pretrained (#5040 )	2024-06-18 16:29:22 -04:00
chenyu	a3ed4176c8	use tinytqdm in active tests and examples (#5038 ) * use tinytqdm in active tests and examples stress test this before 0.9.1 * no set_description	2024-06-18 16:01:19 -04:00
Junjun Dong	c8cd6e725c	Remove BinaryOps.SUB. Replace SUB by ADD and NEG in all tests. Regenerate dataset (#4977 ) * feat: remove BinaryOps.SUB * remove SUB in test_early_end_local * regenerate dataset. remove SUB in test_linearizer_* * reenable overflow tests * simplify tensor.sub function by returning a+(-b) * remove whitespaces --------- Co-authored-by: chenyu <chenyu@fastmail.com>	2024-06-18 09:06:13 -04:00
chenyu	67e8df4969	remove numpy from dtype (#4969 ) replaced all dtype.np with _to_np_dtype defined in tensor.py. after this, the only numpy usages are (1) Tensor(np.ndarray), (2) construct .numpy() output, (3) numpy random buffer	2024-06-14 15:38:45 -04:00
George Hotz	9823752397	make uops.add private (#4950 ) * make uops.add private * modernize all tests	2024-06-14 03:23:25 -07:00
Jhenner Tigreros	dc9e9e4363	Convert BinaryOps.DIV to UnaryOps.RECIP and BinaryOps.IDIV (#4887 ) * Create UnaryOps.RECIP and BinaryOps.IDIV and changing uses of BinaryOps.DIV * Delete unused import * Add cstyle renderer * Fix formatting text * Fix test error due to bad implementation of renderer * Add PTX support * Add RECIP to LLVMIR * Remove BinaryOps.DIV from symbolic test * Change some test and fix C floor division * Change references to DIV for the RECIP or IDIV * Add mimic idiv for symbolic test * Restore floor * Mimic idiv * cast to int * Fix some test and renderer * Remove DIV for render nodes * Resolve issue with div * Add TestRenderer * Fix test * fix error * Fix PAD test * Fix div implementation * Remove DIV * Add upcast to rshift, due to use of MUL and RECIP on DIV * Fix linter * Remove complete BinaryOps.DIV * Fix lint * Fix some test * Revert mul modification * Fix tests * Fix CLANG for uops * Revert IDIV function * Minor fix * modify pattern matching rule to support nan * Fix UNSAFE_PADS_OPS to add UnaryOps.RECIP * Remove const folding for IDIV and fix PTX * Complete remove IDIV from extra * Remove test_div from TestFloatUOps due to test on recip * Fix linearizer * fix * Fix test_22 * Fix llvm * Apply trunc function for llvmlit * use floor instead of trunc * Use correct type * Generate new fuzz db * Fix rshift, do not cast to float to support idiv * Return upcast=false to rshift * Add to unsafepad BinaryOps.IDIV * Remove RECIP override for CUDA * add atol / rtol for the test * Remove cast to int on IDIV * Regenerate sops * delete sops.gz * regenerate * regenerate * regenerate * Reduce margins * pass atol and rtol as parametersg for _test_metrics * regenerated dataset * Regenerate * Remove duplicated * Revert changes on extra * Remove changes extra and NOQA for test * Remove E501 * Remove and change line * Remove E501 * Fix atan2 * Revert import and E501 * Remove E501 * Add hrcp to halp ops * Remove 1 of hrcp * Remove last DIV and add type check on uops for IDIV * Fix new tests * Fix tests and custom function * Regenerate dataset * Regenerate dataset * Revert dataset * Change generate dataset script * Remove line * Change IDIV, type checker validate if x,y and z are int --------- Co-authored-by: George Hotz <72895+geohot@users.noreply.github.com>	2024-06-14 02:43:46 -07:00
George Hotz	e63701fbd4	RDNA3 assembly support (#3637 ) * amazing that i can use comgr for this * compile empty kernel * cleanups * tiny_add compiles * ugh * more work * put that in extra	2024-06-13 09:09:24 +02:00
nimlgen	fd071ba27e	amd mockgpu correct timer resolution (#4942 ) * amd mockgpu correct timer resolution * test it	2024-06-13 10:07:34 +03:00
Elias Wahl	d2e3c391e8	Residual in MLM loss + Change default steps (#4935 ) * Residual in mlm loss * Reduce default steps to 160K * 24 * oops * comment	2024-06-12 16:09:18 -04:00
nimlgen	58cf6eaba9	add missing dir level for amd mockgpu (#4911 )	2024-06-11 18:35:04 +02:00
nimlgen	654a8b9ef7	retire hsa (#4885 ) * retire hsa * EMULATE_AMD	2024-06-09 11:33:03 +03:00
Nik	085c0bbf6b	add mlperf train subset of openimages (#4841 )	2024-06-05 10:10:11 -04:00
Elias Wahl	04e237328b	Refactor to class style (#4804 )	2024-06-04 14:08:31 -07:00
chenyu	3afc914617	CMPEQ -> CMPNE and make it safe to pad (#4818 ) * CMPNE * new dataset	2024-06-03 18:02:15 -04:00
nimlgen	7384ee08a0	amd cleanup sdma (#4796 ) * amd cleanup sdma * faster enqueue for sdma * typo * remove commnted lines * fix overrun check * flushhdp better command	2024-06-01 17:06:44 +03:00
nimlgen	bd2e7c8b31	amd registers from file (#4778 ) * amd registers from file * remove commentes * linetr * no off	2024-05-31 18:48:57 +03:00
chenyu	e614b7c696	docs: showcase remove mnist_gan and add conversation.py (#4757 ) fixed both examples, and i think it's better to show conversation	2024-05-28 11:09:26 -04:00
nimlgen	50e95b8212	nv qmd sync (#4740 ) * qmd sync * better hcq * mockgpu support chain qmd * fix mockgpu & linter	2024-05-27 18:51:30 +03:00
nimlgen	c87b066b66	optimize nv sync (#4729 ) * optimize nv sync * sdma signal without wfi * nv mockgou support * sep change	2024-05-25 23:10:41 +03:00
chenyu	31358cbea5	change Tensor.stack to method (#4719 )	2024-05-24 17:04:19 -04:00
qazal	c170ddceaf	fix commavq benchmark (#4712 ) * fix _slice and assert explicit device * with _slice	2024-05-24 19:40:57 +03:00
chenyu	47aba47f64	update Torch.gather api (#4692 ) * update Torch.gather api gather(self, dim, index) to match torch * fix that	2024-05-22 21:54:06 -04:00
chenyu	792a494eb8	fix various examples (#4691 ) * fix examples that used ax1 and ax2 for transpose * fix that * update those	2024-05-22 20:43:21 -04:00
chenyu	225dcab3be	prepend `_` to broadcast_shape and deepwalk (#4683 ) * prepend `_` to broadcast_shape and deepwalk internal only * that too	2024-05-22 16:39:05 -04:00
chenyu	ae861325ce	update llama sample for mac 32 input buffer limit (#4662 ) set default sampling params to function call to 0, and top k in llama3 to 25.	2024-05-20 17:23:39 -04:00
wozeparrot	b144d4b460	new llama3 example (#4576 )	2024-05-19 22:42:23 -07:00
nimlgen	daf57af3eb	move tc to renderers (#4631 ) * move tc to renderers * missed import * fix typo * fix * fix imports * remove from tests * fix 4607 * nv emulate timestamp * time is int * correct time	2024-05-18 00:36:29 +03:00

711 Commits