tinygrad

Author	SHA1	Message	Date
cloud11665	e8a23d4331	there is a better way to do that! (#950 )	2023-06-06 15:23:30 -07:00
Diogo	3bb38c3518	limit split to 1 due to windows path containing : (#944 )	2023-06-06 10:27:54 -07:00
George Hotz	b78addf2f8	Whisper (#919 ) * no whispering yet * whispering * live whisper * small support	2023-06-03 18:55:14 -07:00
George Hotz	ed1963b899	Fast DiskTensor to other Tensor (#916 ) * make disktensors fast * loading * loader for sd and llama	2023-06-03 12:25:41 -07:00
George Hotz	d58586bb17	safetensors! (#903 ) * safetensors test * safe_save * load back with real safetensors * bugfix in device name. add simple torch_load * it works for llama, but it's slower... * mmap * no intermediate * load mmaped * readinto speed * not ready yet * revert that	2023-06-02 13:41:09 -07:00
George Hotz	8a928ed2f3	nn init matches torch (#901 )	2023-06-01 21:24:11 -07:00
Peter Ross	27845fd3a3	train_efficientnet: only import datasets.imagenet when IMAGENET is set (#899 ) make it work out of the box for new users. the default configuration of train_efficientnet is to use the smaller cifar dataset. import datasets.imagenet tries to open imagenet_class_index.json and will fail, unless user has already downloaded it.	2023-06-01 19:19:52 -07:00
George Hotz	1b42b4e1b8	fix examples/hlb_cifar10.py	2023-06-01 19:03:17 -07:00
wozeparrot	0fc4cf72a2	feat: add train scaffolding (#859 )	2023-05-30 07:10:40 -07:00
Jacky Lee	5d212864b5	Add MLPerf UNet3D model (#775 ) * Add ResNet inference test and cannon * Test with ResNet50 * test_car works with resnet fix * Add KiTS19 dataset * KiTS19: Implement iterate * No batch load for this dataset * Save results on iterate * Implement dice score * Add data prep and eval functions * Resolve shape issue * Conversion works but wrong values * Segfaults when load_from_pretrained is called * Fix segfault and assign properly * Final result generated, though very slow * Store and load final result to save time * Fix typo in finalize * Score computes * More bug fixes, dice score is very low * Working broken code * Assign output values to result * Getting a much higher score now * Fix dataset preprocessing * Mean DICE score of 88.5 * Ugh, typo * Attempt to reimplement model * Rename layers * Tiny model works, kinda * Accuracy? gone * Implement InstanceNorm and match torch * Test instance norm 2d and 3d * Combined input block with downsample block * Tiny model works, support strided convtranspose * Commands to download dataset * Clean up a bit * unet3d_v2 -> unet3d * Remove duplicated code * Oops, put tests back	2023-05-28 20:38:19 -07:00
Sohaib	65d09031f2	add retinanet with resnet backbone (#813 ) * add retinanet with resnet backbone * adds resnext to support loading retinanet pretrained on openimages * object detection post processing with numpy * data is downloaded and converted to coco format with fiftyone * data loading and mAP evaluation with pycocotools * remove fiftyone dep * * eval freq * fix model timing * del jit for last batch * faster accumulate	2023-05-28 20:20:16 -07:00
wozeparrot	67de3aa1de	Add mlperf bert model (#803 ) * feat: add mlperf bert model * feat: switch to nn.Embedding * clean+fix: fix formatting * feat: add simple downloader * feat: metrics * feat: don't actually need exact match * feat: doing a run * feat: set eps on the layernorms * clean+fix: cleaner impl + hopefully fixed * feat: move dataset initialization into iterate * feat: move tokenizer out of iterate * clean+fix: cleaner + working * clean: cleanup * fix: fix metrics * feat: need to use original bert gelu + download vocab * feat: make directory if it doesn't exist yet * feat: jit go brrr	2023-05-27 14:53:32 -07:00
wozeparrot	0dc333cfab	Promote Embedding to `nn` (#798 ) * feat: promote Embedding to nn * fix: fix failing test * feat: add test with jit * feat: rewrite embedding to no longer need stacked for loops * clean+fix: don't know how that happened	2023-05-25 18:39:45 -07:00
George Hotz	a968c4c3a4	Cleanup mlperf (#797 ) * improve factorization * cleanups	2023-05-25 11:36:43 -07:00
wozeparrot	01ae45a43c	Add mlperf RNN-T model (#782 ) * feat: initial rnn-t * feat: working with BS>1 * feat: add lstm test * feat: test passing hidden * clean: cleanup * feat: specify start * feat: way faster lstm & model * fix: default batch size * feat: optimization * fix: fix metrics * fix: fix feature splicing * feat: cleaner stacktime * clean: remove unused import * clean: remove extra prints * fix: fix tests and happy llvm * feat: have the librispeech dataset in its own dir * clean: unused variable * feat: no longer need numpy for the embedding + slightly more memory efficient lstm * fix: forgot to remove something that broke tests * feat: use relative paths * feat: even faster * feat: remove pointless transposes in StackTime * fix: correct forward * feat: switch to soundfile for loading and fix some leaks * feat: add comment about initial dataset setup * feat: jit more things * feat: default batch size back to 1 larger than 1 is broken again :( and even in the reference implementation it gives worse results	2023-05-25 00:41:21 -07:00
George Hotz	e0b2035023	fast imagenet eval, gets 76.14% across the set	2023-05-13 21:18:31 -07:00
George Hotz	b705510d5c	getting 77% on imagenet eval	2023-05-13 07:46:27 -07:00
George Hotz	810f03dafa	conv3d + unet3d (#772 ) * conv3d, needs test * test passes, padding wrong on unet * unet3d * no conv3d on images	2023-05-12 13:54:07 -07:00
George Hotz	46d419060b	start on mlperf models	2023-05-10 16:30:49 -07:00
Jacky Lee	d13629cb26	ResNet: match implementation with Nvidia and PyTorch (#770 ) * Match ResNet implementation with pytorch and nvidia * Reduce number of Epochs	2023-05-10 09:01:22 -07:00
George Hotz	e4db0c820f	hlb_cifar10 init from torch weights	2023-04-18 19:09:13 -07:00
George Hotz	732884653c	osx in hlb_cifar10_torch	2023-04-14 13:12:08 -07:00
George Hotz	584ee6f616	don't graph consts	2023-04-14 03:32:20 -07:00
George Hotz	9a39ebefde	hlb_cifar10_torch gets 80%	2023-04-14 02:47:03 -07:00
Jacky Lee	06ed958abd	Fix train_resnet example (#744 ) * Fix ResNet example * Scientific notation	2023-04-12 13:48:39 +05:30
Jacky Lee	7a45b989a1	Device: make GPU default and METAL/CUDA if possible (#732 ) * Make GPU the default device * Compile EfficientNet with CPU * don't print device * use METAL and CUDA if possible * Revert some changes to workflow * Fix import error when checking device availability * device lookup is now optional * hopefully fix linter and tests * fix workflow * Skip device if not available * don't change default if CPU=1 * simplify device selection * Default to CPU if no GPU * don't print device name... * No need to change default in llama * Make GPU the default device * Compile EfficientNet with CPU * don't print device * use METAL and CUDA if possible * Revert some changes to workflow * Fix import error when checking device availability * device lookup is now optional * hopefully fix linter and tests * fix workflow * Skip device if not available * don't change default if CPU=1 * simplify device selection * Default to CPU if no GPU * don't print device name... * No need to change default in llama * run github workflow * Fix logic to select default * pass if an error occurs * use separate function for try except	2023-04-04 09:41:52 +05:30
Jacky Lee	156640e90d	Permute examples (#731 ) * examples: use permute instead of transpose * Use transpose but change args	2023-03-29 05:07:06 +04:00
George Hotz	b12b60af20	fix binop, other tests failure (#723 ) * fix binop, other tests failure * that was a bad idea * better layernorm * inference kernel count tests * new style reshape pushing * fixup replacement * 199 kernels is okay. fix flops * push reshape through unaryops only * GRAPH=2 draws the phantom ops * found resnet issue * non working test * mul is cheaper than div * OPT inflation * SHUFFLE_PAD_OPS in OPT=2	2023-03-22 18:15:07 -07:00
Fernando Vidal	73bd0b217b	add int64 as supported dtype from numpy (#699 ) * add int64 as supported dtype from numpy Without this, examples/transformer.py didn't run. With this change it runs successfully. * Update helpers.py * Update transformer.py * Update training.py	2023-03-18 17:15:04 -07:00
George Hotz	f5467cfedc	Devicebufferless (#708 ) * runs one metal kernel * conv2d works * ops tests are passing * const folding * all ops work * pre commit always passes * torch works * working still * fix graph test * tests passing * image almost works * image conv works * most images * fix custom * fix assignment * fix compile enet * clean up comments * fix realize return value * include shapetracker in LB repr * copy should make a copy * reenable method cache * fix lna * dtypes in graph * forward only for IMAGE=2 * simple realize * getting close * fixup new api, it's good except the kernel count * back to 197 kernels * tests should pass * go to a real float * no type_on_cpu * fix the docs * put shapetracker back in it's proper place	2023-03-18 14:40:23 -07:00
Kirill	26a3888ab8	Fix llama 13B RAM usage (#710 )	2023-03-18 13:50:09 -07:00
Kirill	0fe5014b1f	Use pathlib (#711 ) * Use pathlib in llama * Use pathlib in stablediffusion	2023-03-18 13:49:21 -07:00
Kirill	0532025b04	Fix llama 13B weights loading (#700 ) * Fix llama 13B weights loading * refactor more * add test * test storage offset * fix spacing * fix strides * llama 13B working? * yolo? * better test for seeks	2023-03-15 08:59:52 -07:00
Ayushman Kumar	e28bd11ff1	Cast Tensor data to float32 (#703 ) * Cast Tensor data to float32 * astype('float32') --> Tensor.randn()	2023-03-14 23:09:41 -07:00
Jacky Lee	5e820818e9	Cast image to float32 (#702 )	2023-03-14 08:13:19 -07:00
George Hotz	fe0e8a306f	jittable llama	2023-03-12 14:15:04 -07:00
George Hotz	15e0b56e39	compile works (#688 ) * compile works * runtimes * line count * fix custom, to tg dtype * meh, that's fine with lazy import	2023-03-12 11:01:25 -07:00
Kirill	af7745073f	Add comments to SD (#686 ) * Add explanation for empty lambdas * Fix my_unpickle if pytorch_lightning is installed * oops	2023-03-12 10:56:49 -07:00
George Hotz	046b3952c3	get_state_dict	2023-03-11 23:46:53 -08:00
George Hotz	803b0aef28	track memory for numpy/torch	2023-03-11 20:39:10 -08:00
George Hotz	61071f881a	fix bug, and add unit test to catch failure	2023-03-11 16:57:25 -08:00
George Hotz	3ec457248c	failing llama test	2023-03-11 16:28:10 -08:00
George Hotz	8aa63847c7	llama: up max tokens to 1000	2023-03-11 13:39:33 -08:00
George Hotz	5ea44cefcc	llama: add lexie personality	2023-03-11 10:23:33 -08:00
George Hotz	c908f911a7	llama defaults to metal on osx	2023-03-11 09:30:13 -08:00
George Hotz	5e1380df6a	profiling llama + cache is_contiguous	2023-03-11 08:23:21 -08:00
George Hotz	f3ac52aee8	Mypyc (#680 ) * building shapetracker * default ENABLE_METHOD_CACHE * symbolic compiles * improve types * tensor compiles * oops, that's a bug * best of both worlds * find legit typing bugs * pad2d can take list or tuple * sub 200ms when compiled	2023-03-11 07:33:30 -08:00
George Hotz	b1206bcb18	third try at torch loading (#677 ) * third try at torch loading * numpy fixed * fix enet compile * load_single_weight supports empty weights * oops, CPU wasn't the default * so many bugs	2023-03-10 19:11:29 -08:00
George Hotz	8bf75a7fdd	fix stable diffusion and CI	2023-03-10 17:48:12 -08:00
George Hotz	4780f9a6df	llama runs (slowly) in master	2023-03-10 17:36:51 -08:00

300 Commits