George Hotz
2cc1d970c6
updates from the chonker branch
2022-11-07 21:12:08 -08:00
George Hotz
d878065ece
Gemm (#416)
* gemm
* off by factor of 5
* 50 GFLOPS
* works
* 91 GFLOPS
* working at 50 GFLOPS
* works
* iy
* 150 GFLOPS
* 150 GFLOPS
* N=2048 is still fast
* threading soon
* multithread
* pinning
* throttling is sad
* Align matrices to cacheline width (#361)
Co-authored-by: cloud <Cloud11665@gmail.com>
2022-11-06 10:07:28 -08:00
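The alignment bullet above (#361) is the kind of change that moves GFLOPS: when matrix rows start on cacheline boundaries, the inner loops stop splitting loads across lines. A minimal sketch of cacheline-aligned allocation in numpy; the 64-byte line size is an assumption (typical for x86), and `aligned_zeros` is an illustrative helper, not tinygrad code:

```python
import numpy as np

CACHELINE = 64  # bytes; typical x86 cacheline size (an assumption here)

def aligned_zeros(shape, dtype=np.float32, align=CACHELINE):
    """Allocate a zeroed array whose data pointer is aligned to `align` bytes."""
    nbytes = int(np.prod(shape)) * np.dtype(dtype).itemsize
    buf = np.zeros(nbytes + align, dtype=np.uint8)
    offset = (-buf.ctypes.data) % align   # bytes to skip to reach the next boundary
    return buf[offset:offset + nbytes].view(dtype).reshape(shape)

A = aligned_zeros((2048, 2048))
assert A.ctypes.data % CACHELINE == 0
# a 2048-wide float32 row is 8192 bytes, a multiple of 64,
# so every row stays cacheline-aligned, not just the first
```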
George Hotz
6a8fb53304
move ops.py into lazy.py (#402)
* move ops.py into lazy.py
* fix graph and linter
* ugh, didn't add
2022-10-25 13:58:03 -07:00
George Hotz
8e22d5ee67
replace networkx with defaultdict
2022-10-20 19:36:43 -07:00
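Swapping networkx for a defaultdict drops a heavyweight dependency; tracking a DAG of ops needs little more than an adjacency map. A minimal sketch of the pattern, not the actual graph code:

```python
from collections import defaultdict

children = defaultdict(list)  # node -> list of downstream nodes

def add_edge(parent, child):
    children[parent].append(child)

add_edge("load", "mul")
add_edge("mul", "sum")
assert children["load"] == ["mul"]
assert children["relu"] == []  # missing keys just materialize as empty lists
```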
George Hotz
63f9c55156
really dumb bug
2022-10-20 17:07:47 -07:00
George Hotz
1bec4651b3
fix nonstatic weights
2022-10-20 17:04:14 -07:00
George Hotz
bb288e6938
safe_numpy and warning for broken matmul
2022-10-20 15:40:22 -07:00
George Hotz
50c95c7d9a
add assert to catch issue in attention
2022-10-20 15:13:00 -07:00
George Hotz
26c78ccf7d
remove useless buffer
2022-10-20 14:07:28 -07:00
George Hotz
a18c1f3178
zero out the inputs
2022-10-20 13:46:52 -07:00
George Hotz
ace8db29f8
ReduceSum
2022-10-20 12:48:14 -07:00
George Hotz
c400ee0beb
refactoring thneed (#400)
* refactoring thneed
* continue
* minor update
* looks like it's working
* big refactor
* confirm thneed got the right output
* code is there but it's broken
* works now
* always OPTWG, input -> dat
* fix type issue
2022-10-20 12:35:59 -07:00
YassineYousfi
ae0f9b17df
openpilot: new models and onnx ops (#401)
* ngrl stuff
* fngrl
* fix typo in compile script
* workflow dispatch
* new models in tests
* don't need to up this threshold
Co-authored-by: HaraldSchafer <harald.the.engineer@gmail.com>
2022-10-20 11:49:19 -07:00
George Hotz
ff11c4316b
move get_parameters to optim.py
2022-09-25 13:16:58 -04:00
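get_parameters walks a model object and collects its Tensors, so an optimizer can be built as, e.g., Adam(get_parameters(model)). A hedged sketch of that walk; the real version in optim.py may differ in detail:

```python
from tinygrad.tensor import Tensor

def get_parameters(obj):
    """Recursively collect Tensors from an object's attributes and nested lists."""
    if isinstance(obj, Tensor):
        return [obj]
    if isinstance(obj, (list, tuple)):
        return [p for x in obj for p in get_parameters(x)]
    if hasattr(obj, "__dict__"):
        return [p for x in obj.__dict__.values() for p in get_parameters(x)]
    return []
```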
Jacky Lee
2c01a66265
Reshape dataset from fetch_mnist (#390)
2022-09-24 21:16:29 -04:00
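MNIST ships as flat 784-byte rows; #390 reshapes them to images at fetch time so conv models can consume them directly. The idea in two lines, with a stand-in for the fetched array:

```python
import numpy as np

X = np.zeros((60000, 784), dtype=np.uint8)  # stand-in for the raw fetch_mnist() images
X = X.reshape(-1, 28, 28)                   # (N, 28, 28): ready for conv layers
```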
George Hotz
271446e3eb
set requires_grad to None (#387)
* set requires_grad to None
* some things need gradients
* hmm, why was get_parameters filtering
2022-09-21 11:16:02 -04:00
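The point of #387 is a tri-state flag: True means train this tensor, False means never, and None means undecided until something claims it ("some things need gradients"). A sketch of the semantics, not the actual Tensor code:

```python
class Tensor:  # illustrative stub, not tinygrad's Tensor
    def __init__(self, data, requires_grad=None):
        self.data, self.requires_grad = data, requires_grad

def output_requires_grad(*parents):
    # an op's output carries a gradient if any input explicitly opted in
    return any(p.requires_grad is True for p in parents)

x = Tensor([1.0])                      # requires_grad=None: undecided
w = Tensor([2.0], requires_grad=True)  # a trainable parameter
assert output_requires_grad(x, w)
```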
YassineYousfi
2f0f91ba3d
support float16 onnx weights (#384)
2022-09-15 09:12:18 -04:00
YassineYousfi
1a7bdc51f8
support more onnx ops (#376)
* broadcast from right to left
* add another broadcasted add test
* more onnx ops
* use float32 range in clip
2022-09-07 15:15:24 -07:00
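"Broadcast from right to left" in #376 is the numpy/ONNX rule: align trailing dimensions, and a dimension of 1 stretches to match its partner. A small sketch of the shape computation:

```python
def broadcast_shape(a, b):
    """Right-aligned broadcasting: pad the shorter shape with 1s on the left;
    each dim pair must be equal or contain a 1."""
    a = (1,) * (len(b) - len(a)) + tuple(a)
    b = (1,) * (len(a) - len(b)) + tuple(b)
    out = []
    for x, y in zip(a, b):
        if x != y and 1 not in (x, y):
            raise ValueError(f"cannot broadcast {a} and {b}")
        out.append(max(x, y))
    return tuple(out)

assert broadcast_shape((3, 1, 5), (4, 5)) == (3, 4, 5)
```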
George Hotz
0516359af8
fix stupid OPENCL=1 OOM
2022-09-06 14:29:23 -07:00
George Hotz
4dadd95e3c
fix tests hopefully, more stable diffusion
2022-09-03 10:38:31 -07:00
George Hotz
c01a8c5c2d
stable diffusion start
2022-09-03 10:08:42 -07:00
George Hotz
a3fc64a585
fix batchnorm folding in openpilot compile
2022-08-31 13:04:49 -07:00
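Batchnorm folding absorbs the normalization into the preceding conv, so inference runs one op instead of two; getting the per-channel scale wrong is exactly the kind of bug a compile step like openpilot's hits. The standard identity in numpy form (a sketch, not the tinygrad implementation):

```python
import numpy as np

def fold_bn_into_conv(w, b, gamma, beta, mean, var, eps=1e-5):
    """Fold y = gamma*(conv(x)+b - mean)/sqrt(var+eps) + beta into the conv.
    w: (out_ch, in_ch, kh, kw), b and the BN stats: (out_ch,)"""
    scale = gamma / np.sqrt(var + eps)       # per-output-channel scale
    w_f = w * scale.reshape(-1, 1, 1, 1)     # scale each output channel's filter
    b_f = (b - mean) * scale + beta          # fold mean/shift into the bias
    return w_f, b_f
```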
George Hotz
dc7af8c3ac
thneed run float32
2022-08-28 11:03:35 -07:00
George Hotz
b132de677d
tinygrad.nn (#367)
* tinygrad.nn
* flake8
* working on pylint
* more pylint
* more pylint
* pylint passes
* networkx
* mypy can't infer that type
* junk
2022-08-18 07:41:00 -07:00
George Hotz
f76d41812b
prune graph
2022-07-17 15:38:43 -07:00
George Hotz
eda6f071b2
default opt level 2
2022-07-17 14:54:40 -07:00
George Hotz
73b0471b25
join expands
2022-07-17 13:42:05 -07:00
George Hotz
d04b274cd2
noop removal can replace with reshape
2022-07-16 08:32:42 -07:00
George Hotz
2720ef49ca
extra and test and tuple
2022-07-07 10:01:33 -07:00
George Hotz
81b73f97a3
Optimization (#355)
* constant folding into kernels
* that opt worth it?
* fix mypy
* ast one kernel
* save 2 lines in conv kernel
* debug print kernel count
* cl debugging
* early realize inputs
* refactor Device
2022-07-04 08:58:57 -07:00
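"Constant folding into kernels" means evaluating any subtree whose inputs are all compile-time constants, so the kernel never recomputes them at runtime. A toy sketch over a tiny op tree; Node and OPS are illustrative, not tinygrad's ast:

```python
import operator

OPS = {"add": operator.add, "mul": operator.mul}

class Node:
    def __init__(self, op, srcs=(), value=None):
        self.op, self.srcs, self.value = op, tuple(srcs), value

def fold(node):
    """Recursively replace ops whose inputs are all constants with one constant."""
    if node.op == "const":
        return node
    srcs = [fold(s) for s in node.srcs]
    if all(s.op == "const" for s in srcs):
        return Node("const", value=OPS[node.op](*[s.value for s in srcs]))
    return Node(node.op, srcs)

tree = Node("mul", [Node("const", value=3), Node("const", value=4)])
assert fold(tree).value == 12
```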
George Hotz
7276f8d6bf
improve constant folding, detach before moving tensor
2022-07-02 15:29:40 -07:00
George Hotz
8cf1aed0f4
don't track_running_stats, parameters must require_grad
2022-07-02 14:38:45 -07:00
George Hotz
49c954b389
comments
2022-06-26 17:20:25 -07:00
George Hotz
83d50e2687
move to extra.onnx
2022-06-21 19:43:44 -07:00
George Hotz
9b27ba650b
load new torch files
2022-06-07 10:06:48 -07:00
George Hotz
233c71a7ba
support requires_grad
2022-06-06 07:47:31 -07:00
George Hotz
d8d19ed468
wikimedia wasn't returning 200
2022-01-15 19:09:29 -08:00
George Hotz
e28cdfb0cf
clean up resnet
2021-11-30 16:14:54 -05:00
George Hotz
58ed46963e
fix broadcastdot
2021-11-29 18:54:57 -05:00
George Hotz
dca076dbf1
remove dumb nn ops
2021-11-29 18:05:31 -05:00
George Hotz
30eb3afbe1
add bias term to transformer
2021-11-29 12:45:27 -05:00
George Hotz
e2a8961a18
less lines, fix bug
2021-11-17 12:52:17 -08:00
George Hotz
ba28761894
move yolo into examples/yolo
2021-10-30 19:46:00 -07:00
George Hotz
63f50cff45
move back again
2021-10-30 16:13:29 -07:00
Evan Mays
285621aeda
Cherry backprop for conv2d (#281)
* quick math: 0 + x = x.
* gradient w.r.t. x using cherry for conv
* gradient w.r.t. w for conv on cherry, computed via vector dot products
* small optimization
* [cherry] optimize conv backpass for large channel count
* get rid of numpy einsum
2021-10-30 16:12:19 -07:00
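The conv2d backward pass in #281 rests on a standard identity: the gradient w.r.t. the input of a stride-1 "valid" conv is a "full" correlation of the output gradient with the 180°-flipped kernel. A single-channel numpy sketch of dL/dx:

```python
import numpy as np

def conv2d_dx(grad_out, w):
    """Input gradient of a stride-1 'valid' conv2d: pad grad_out by kernel-1
    on each side, then correlate with the flipped kernel.
    grad_out: (H_out, W_out), w: (kh, kw) -- single-channel sketch."""
    kh, kw = w.shape
    g = np.pad(grad_out, ((kh - 1, kh - 1), (kw - 1, kw - 1)))
    w_flip = w[::-1, ::-1]
    H, W = g.shape[0] - kh + 1, g.shape[1] - kw + 1
    dx = np.zeros((H, W))
    for i in range(H):
        for j in range(W):
            dx[i, j] = (g[i:i+kh, j:j+kw] * w_flip).sum()
    return dx

g = np.ones((3, 3)); w = np.arange(4.0).reshape(2, 2)
assert conv2d_dx(g, w).shape == (4, 4)  # matches the forward input size
```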
George Hotz
3d646272d6
move back
2021-10-30 16:12:12 -07:00
George Hotz
ac8afd24fa
refactor accel
2021-10-30 16:10:59 -07:00
Guglielmo Camporese
2b7589db64
Added ResNet-{18, 34, 50, 101, 152} (#271)
* added resnets
* fix minor
* fix minor
* resnet in models
* added resnet test
* added resnet train test
* added linear, conv2d nn tests
* fix minor in extra/training
* resnet in models
* fix minor
* fix tolerance for linear in nn test
* fix eval, which was causing CPU and GPU UT failures
* revert transformer test
* fix minor for CPU test
* improved model get_params for sequential layer
* fix minor for params counting
* commented broken ops tests
* improved train for resnet
2021-06-21 09:37:24 -07:00
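All five ResNets added here stack the same residual idea: a block computes f(x) and adds the input back before the nonlinearity, which keeps gradients flowing through very deep stacks. The core in a few generic lines (numpy, not the PR's model code):

```python
import numpy as np

def relu(x):
    return np.maximum(x, 0)

def basic_block(x, f):
    """Residual block: y = relu(f(x) + x), where f is the conv-bn-relu-conv-bn path."""
    return relu(f(x) + x)

x = np.random.randn(8)
assert basic_block(x, lambda t: 0.1 * t).shape == x.shape  # skip path preserves shape
```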
George Hotz
89798d2f43
some flags
2021-06-19 11:46:31 -07:00
George Hotz
d81eae8288
debug cherry crash
2021-06-19 11:41:20 -07:00