Commit Graph

484 Commits

Author SHA1 Message Date
George Hotz 30f8132646 reorder ops in ops cpu 2020-12-30 11:00:01 -05:00
George Hotz e5b2803b5d ops in readme 2020-12-30 10:48:55 -05:00
George Hotz 2d44bf7f1a Dot -> Matmul 2020-12-30 10:41:51 -05:00
George Hotz 10fc3ff5b9 cleaner syntax 2020-12-30 10:35:37 -05:00
George Hotz fcfe3dae01 write slice for CPU 2020-12-30 10:32:53 -05:00
George Hotz 47504004fd ane ops 2020-12-29 18:00:53 -05:00
George Hotz 1f5c9618ef refactor in readme and issue #225 2020-12-29 17:30:04 -05:00
George Hotz f9170505b3 if you like your transformers twice as slow, use the GPU 2020-12-29 17:14:23 -05:00
George Hotz 6a6a82e999 support multidot on GPU 2020-12-29 16:56:30 -05:00
George Hotz 27208d729b add GPU max thanks to marcelbischoff 2020-12-29 16:44:14 -05:00
George Hotz 4bbad11afe link to papers 2020-12-29 14:15:46 -05:00
George Hotz 3f8e137b6f extra/transformer 2020-12-29 14:14:00 -05:00
George Hotz c4e7a1ae59 accessors are dumb 2020-12-29 14:10:26 -05:00
George Hotz fb6aaefb9b save 2 lines 2020-12-29 14:02:50 -05:00
George Hotz ea341c84fe logsoftmax good, div bad 2020-12-29 13:59:39 -05:00
George Hotz f18801c7db simple pool. swimming is very easy now 2020-12-29 13:48:50 -05:00
George Hotz 8f9232d59b readmee 2020-12-29 13:40:34 -05:00
George Hotz 837aaacfbf Unpad2D on GPU: 2020-12-29 13:16:14 -05:00
George Hotz 02655c07d5 break maxpool2d on GPU 2020-12-29 13:05:57 -05:00
George Hotz 061e37de39 touchups 2020-12-29 12:41:21 -05:00
George Hotz a2e6562330 fix max op, less lines 2020-12-29 10:47:04 -05:00
Marcel Bischoff dc8fa7999c Transpose on GPU (#221) 2020-12-29 10:40:11 -05:00
* 2serious
* load/save
* fixing GPU
* added DEBUG
* needs BatchNorm or doesn't learn anything
* old file not needed
* added conv biases
* added extra/training.py and checkpoint
* assert in test only
* save
* padding
* num_classes
* checkpoint
* checkpoints for padding
* training was broken
* merge
* rotation augmentation
* more aug
* needs testing
* streamline augment, augment is fast thus bicubic
* tidying up
* transformer eval
* axis=-1
* transpose
* test for permutation using torch.movedims
* another test
* line
George Hotz 36579f66bf max op 2020-12-28 23:54:52 -05:00
George Hotz bcb3ceeca3 set training in functions 2020-12-28 22:45:46 -05:00
George Hotz 51bf164b72 dropout, training 2020-12-28 22:12:23 -05:00
George Hotz 7b8fee038d it works! forgot the sqrt 2020-12-28 16:23:52 -05:00
George Hotz 1faf05ef67 ahh, it's better if i don't train the embedding 2020-12-28 16:07:02 -05:00
George Hotz c3832e1bde hmm, fix layernorm to not be batchnorm and it breaks 2020-12-28 13:06:21 -05:00
George Hotz 2e89e75dcb layernorm fixes transformer instability 2020-12-28 12:58:15 -05:00
George Hotz 628d21f899 doc touchup 2020-12-28 10:45:26 -05:00
George Hotz fafece9db7 avgpool2d is a second class op 2020-12-28 10:41:59 -05:00
George Hotz 593233b668 log and exp are first class ops 2020-12-28 10:00:30 -05:00
Marcel Bischoff ffff98db78 Evaluation in Transformers (#218) 2020-12-28 09:24:51 -05:00
* 2serious
* load/save
* fixing GPU
* added DEBUG
* needs BatchNorm or doesn't learn anything
* old file not needed
* added conv biases
* added extra/training.py and checkpoint
* assert in test only
* save
* padding
* num_classes
* checkpoint
* checkpoints for padding
* training was broken
* merge
* rotation augmentation
* more aug
* needs testing
* streamline augment, augment is fast thus bicubic
* tidying up
* transformer eval
George Hotz 65b07d2f4f fix onehot embed 2020-12-27 18:50:38 -05:00
George Hotz d864e1c71a transformer is training 2020-12-27 18:46:32 -05:00
George Hotz a361ef6861 fixup training loop 2020-12-27 18:35:56 -05:00
George Hotz f15bec6dbc make multidot work on CPU 2020-12-27 17:25:37 -05:00
George Hotz 131e04c90c cpu only decorator 2020-12-27 17:18:55 -05:00
George Hotz 2f1b2c0a3b add transpose, start on transformer 2020-12-27 16:59:12 -05:00
gamwe6 d379502c04 Cleaning (#211) 2020-12-27 09:58:51 -05:00
* Cleaned
* Brought the lines into line
Co-authored-by: gamwe6 <gamwe6@users.noreply.github.com>
George Hotz 8a335f03ad clock speed 32x32 2020-12-22 18:18:52 -05:00
George Hotz aae2e35208 benchmarking 512x512 GEMM 2020-12-22 18:01:36 -05:00
George Hotz bd18e03138 conv from weights works 2020-12-22 17:42:17 -05:00
George Hotz b3cf53e39b more docs 2020-12-22 17:14:38 -05:00
George Hotz 4065eae0fb docs for tensor stride 2020-12-22 17:06:36 -05:00
George Hotz 6fb127d5c7 l2 cache note 2020-12-22 16:48:19 -05:00
George Hotz 78a06a1285 more readme 2020-12-22 16:23:08 -05:00
George Hotz 0ab951f21c better readme 2020-12-22 15:57:33 -05:00
George Hotz 6ca449afd2 sum works 2020-12-22 12:53:20 -05:00
George Hotz ebc7f8305c 3x3 gemm in conv 2020-12-22 12:00:44 -05:00