Marcel Bischoff
e2f833f58f
make max behave on ties like torch (#229)
* checkpoint
* fixing pow
* undo pow
* backward max on GPU and CPU rewrite
* indentation
* changing seed for curiosity
* max replaced equality
* undo seed
* rebase
* fixed tests
* merge error
2020-12-30 18:52:50 -05:00
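A hedged aside on the commit above (not the tinygrad source): matching torch on ties for a full max reduction means the incoming gradient is split evenly across every element that attains the maximum, rather than routed through a single `==` winner. A minimal numpy sketch:

```python
import numpy as np

# Minimal sketch of a max-reduction backward that matches torch on ties:
# the gradient is split evenly among all elements equal to the max.
def max_backward(grad_output, x):
    mask = (x == x.max()).astype(np.float32)   # 1.0 wherever x hits the max
    return grad_output * mask / mask.sum()     # split gradient among ties

x = np.array([1.0, 3.0, 3.0], dtype=np.float32)
print(max_backward(1.0, x))  # -> [0.  0.5 0.5]
```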
George Hotz
30f8132646
reorder ops in ops_cpu
2020-12-30 11:00:01 -05:00
George Hotz
e5b2803b5d
ops in readme
2020-12-30 10:48:55 -05:00
George Hotz
2d44bf7f1a
Dot -> Matmul
2020-12-30 10:41:51 -05:00
George Hotz
10fc3ff5b9
cleaner syntax
2020-12-30 10:35:37 -05:00
George Hotz
fcfe3dae01
write slice for CPU
2020-12-30 10:32:53 -05:00
George Hotz
47504004fd
ANE ops
2020-12-29 18:00:53 -05:00
George Hotz
1f5c9618ef
refactor in readme and issue #225
2020-12-29 17:30:04 -05:00
George Hotz
f9170505b3
if you like your transformers twice as slow, use the GPU
2020-12-29 17:14:23 -05:00
George Hotz
6a6a82e999
support multidot on GPU
2020-12-29 16:56:30 -05:00
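Assuming "multidot" here means the Dot/Matmul op broadcast over leading batch dimensions (the np.matmul semantics), a sketch of the behavior being supported:

```python
import numpy as np

# Assumed "multidot" semantics: a batched matmul, one 3x5 @ 5x2 product
# per entry along the leading batch dimension, as np.matmul does.
a = np.random.randn(4, 3, 5).astype(np.float32)
b = np.random.randn(4, 5, 2).astype(np.float32)
out = a @ b
print(out.shape)  # (4, 3, 2)
```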
George Hotz
27208d729b
add GPU max, thanks to marcelbischoff
2020-12-29 16:44:14 -05:00
George Hotz
4bbad11afe
link to papers
2020-12-29 14:15:46 -05:00
George Hotz
3f8e137b6f
extra/transformer
2020-12-29 14:14:00 -05:00
George Hotz
c4e7a1ae59
accessors are dumb
2020-12-29 14:10:26 -05:00
George Hotz
fb6aaefb9b
save 2 lines
2020-12-29 14:02:50 -05:00
George Hotz
ea341c84fe
logsoftmax good, div bad
2020-12-29 13:59:39 -05:00
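The point of "logsoftmax good, div bad": computing logsoftmax directly avoids the exp, division, and log of `log(softmax(x))` and is far more numerically stable. A sketch of the standard max-shifted form:

```python
import numpy as np

# Standard max-shifted logsoftmax, computed directly rather than as
# log(softmax(x)): no division, and the shift prevents exp overflow.
def logsoftmax(x, axis=-1):
    m = x.max(axis=axis, keepdims=True)
    return (x - m) - np.log(np.exp(x - m).sum(axis=axis, keepdims=True))

print(logsoftmax(np.array([1000.0, 1000.0])))  # -> [-0.6931 -0.6931], no overflow
```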
George Hotz
f18801c7db
simple pool. swimming is very easy now
2020-12-29 13:48:50 -05:00
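One plausible reading of a "simple pool": the reshape trick, which exposes each 2x2 window as its own pair of axes and then reduces over them, so pooling needs no custom kernel. A sketch:

```python
import numpy as np

# Reshape trick for stride-2 pooling: split H and W into (H//2, 2) and
# (W//2, 2), then reduce over the two window axes.
def maxpool2x2(x):                # x: (N, C, H, W) with H, W even
    n, c, h, w = x.shape
    return x.reshape(n, c, h // 2, 2, w // 2, 2).max(axis=(3, 5))

x = np.arange(16, dtype=np.float32).reshape(1, 1, 4, 4)
print(maxpool2x2(x)[0, 0])  # -> [[ 5.  7.] [13. 15.]]
```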
George Hotz
8f9232d59b
readme
2020-12-29 13:40:34 -05:00
George Hotz
837aaacfbf
Unpad2D on GPU
2020-12-29 13:16:14 -05:00
George Hotz
02655c07d5
break maxpool2d on GPU
2020-12-29 13:05:57 -05:00
George Hotz
061e37de39
touchups
2020-12-29 12:41:21 -05:00
George Hotz
a2e6562330
fix max op, fewer lines
2020-12-29 10:47:04 -05:00
Marcel Bischoff
dc8fa7999c
Transpose on GPU (#221)
* 2serious
* load/save
* fixing GPU
* added DEBUG
* needs BatchNorm or doesn't learn anything
* old file not needed
* added conv biases
* added extra/training.py and checkpoint
* assert in test only
* save
* padding
* num_classes
* checkpoint
* checkpoints for padding
* training was broken
* merge
* rotation augmentation
* more aug
* needs testing
* streamline augment: augmentation is fast, so use bicubic
* tidying up
* transformer eval
* axis=-1
* transpose
* test for permutation using torch.movedims
* another test
* line
2020-12-29 10:40:11 -05:00
George Hotz
36579f66bf
max op
2020-12-28 23:54:52 -05:00
George Hotz
bcb3ceeca3
set training in functions
2020-12-28 22:45:46 -05:00
George Hotz
51bf164b72
dropout, training
2020-12-28 22:12:23 -05:00
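A sketch of how "dropout, training" usually fit together, assuming the common inverted-dropout formulation: scale the surviving activations at train time so that eval is a no-op.

```python
import numpy as np

# Inverted dropout gated by a training flag: zero activations with
# probability p and rescale the rest, so eval needs no correction.
def dropout(x, p=0.5, training=True):
    if not training:
        return x
    mask = (np.random.rand(*x.shape) >= p).astype(x.dtype) / (1.0 - p)
    return x * mask
```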
George Hotz
7b8fee038d
it works! forgot the sqrt
2020-12-28 16:23:52 -05:00
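A sketch under the assumption that the forgotten sqrt is the 1/sqrt(d_k) scale in dot-product attention (Vaswani et al. 2017): without it the logits grow with the head size and the softmax saturates, killing the gradients.

```python
import numpy as np

# Scaled dot-product attention; the / np.sqrt(d_k) is the easy-to-forget
# term that keeps the softmax logits in a trainable range.
def attention(q, k, v):
    d_k = q.shape[-1]
    scores = q @ k.swapaxes(-1, -2) / np.sqrt(d_k)   # scaled logits
    w = np.exp(scores - scores.max(axis=-1, keepdims=True))
    return (w / w.sum(axis=-1, keepdims=True)) @ v   # softmax, then weight v
```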
George Hotz
1faf05ef67
ahh, it's better if i don't train the embedding
2020-12-28 16:07:02 -05:00
George Hotz
c3832e1bde
hmm, fix layernorm to not be batchnorm and it breaks
2020-12-28 13:06:21 -05:00
George Hotz
2e89e75dcb
layernorm fixes transformer instability
2020-12-28 12:58:15 -05:00
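A sketch of the distinction the two commits above are circling: LayerNorm takes mean and variance over the feature axis of each sample, while BatchNorm takes them per feature over the batch, which is why confusing the two destabilizes a transformer.

```python
import numpy as np

# LayerNorm: per-sample statistics over the last (feature) axis.
# BatchNorm would instead reduce over the batch axis, per feature.
def layernorm(x, eps=1e-5):
    mu = x.mean(axis=-1, keepdims=True)
    var = ((x - mu) ** 2).mean(axis=-1, keepdims=True)
    return (x - mu) / np.sqrt(var + eps)
```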
George Hotz
628d21f899
doc touchup
2020-12-28 10:45:26 -05:00
George Hotz
fafece9db7
avgpool2d is a second-class op
2020-12-28 10:41:59 -05:00
George Hotz
593233b668
log and exp are first-class ops
2020-12-28 10:00:30 -05:00
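One example of what first-class log and exp buy (a sketch, not the tinygrad op set): derived ops can be composed from them, so only the primitives need hand-written forward and backward kernels.

```python
import numpy as np

# pow composed from the primitives: a ** b == exp(b * log(a)) for a > 0,
# so a framework with first-class log/exp gets pow as a derived op.
def pow_from_logexp(a, b):
    return np.exp(b * np.log(a))

print(pow_from_logexp(np.float32(2.0), np.float32(10.0)))  # ~1024.0
```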
Marcel Bischoff
ffff98db78
Evaluation in Transformers (#218)
* 2serious
* load/save
* fixing GPU
* added DEBUG
* needs BatchNorm or doesn't learn anything
* old file not needed
* added conv biases
* added extra/training.py and checkpoint
* assert in test only
* save
* padding
* num_classes
* checkpoint
* checkpoints for padding
* training was broken
* merge
* rotation augmentation
* more aug
* needs testing
* streamline augment: augmentation is fast, so use bicubic
* tidying up
* transformer eval
2020-12-28 09:24:51 -05:00
George Hotz
65b07d2f4f
fix onehot embed
2020-12-27 18:50:38 -05:00
George Hotz
d864e1c71a
transformer is training
2020-12-27 18:46:32 -05:00
George Hotz
a361ef6861
fixup training loop
2020-12-27 18:35:56 -05:00
George Hotz
f15bec6dbc
make multidot work on CPU
2020-12-27 17:25:37 -05:00
George Hotz
131e04c90c
CPU-only decorator
2020-12-27 17:18:55 -05:00
George Hotz
2f1b2c0a3b
add transpose, start on transformer
2020-12-27 16:59:12 -05:00
gamwe6
d379502c04
Cleaning (#211)
* Cleaned
* Brought the lines into line
Co-authored-by: gamwe6 <gamwe6@users.noreply.github.com>
2020-12-27 09:58:51 -05:00
George Hotz
8a335f03ad
clock speed 32x32
2020-12-22 18:18:52 -05:00
George Hotz
aae2e35208
benchmarking 512x512 GEMM
2020-12-22 18:01:36 -05:00
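The arithmetic behind such a benchmark, as a worked example: an NxN GEMM does roughly 2*N^3 FLOPs (one multiply and one add per inner-product step), so the timing converts directly to FLOPS.

```python
# FLOP count for an NxN GEMM: 2*N**3 (a multiply-add per inner step).
N = 512
flops = 2 * N**3
print(flops)   # 268435456, ~0.27 GFLOP per call
# at, say, 1 ms per call that would be ~268 GFLOPS
```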
George Hotz
bd18e03138
conv from weights works
2020-12-22 17:42:17 -05:00
George Hotz
b3cf53e39b
more docs
2020-12-22 17:14:38 -05:00
George Hotz
4065eae0fb
docs for tensor stride
2020-12-22 17:06:36 -05:00
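A sketch of what tensor strides describe, using numpy for illustration: the byte step per axis over a flat buffer, which is why a transpose can be a pure stride swap with no data movement.

```python
import numpy as np

# Strides are the byte step per axis; transposing swaps them and
# reuses the same buffer, no copy.
x = np.zeros((4, 5), dtype=np.float32)
print(x.strides)    # (20, 4): a row is 5 floats * 4 bytes each
print(x.T.strides)  # (4, 20): the transposed view swaps the strides
```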
George Hotz
6fb127d5c7
l2 cache note
2020-12-22 16:48:19 -05:00
George Hotz
78a06a1285
more readme
2020-12-22 16:23:08 -05:00
George Hotz
0ab951f21c
better readme
2020-12-22 15:57:33 -05:00
George Hotz
6ca449afd2
sum works
2020-12-22 12:53:20 -05:00