George Hotz
0994705166
contrib more
2022-11-08 19:14:37 -08:00
George Hotz
c0bba9649a
more that
2022-11-08 19:13:11 -08:00
George Hotz
5143da6a9f
contributing
2022-11-08 19:12:12 -08:00
Daniel Davis
4998bf49b3
Basic editorconfig support (#422)
Almost every IDE or text editor supports
[editorconfig](https://editorconfig.org/).
I've set it up to enforce just the 2-space Python indents for now.
2022-11-08 10:34:25 -08:00
marcojob
c3d9c9b24c
Fix issue where batch_invstd not being set (#421)
batch_invstd can be falsely assumed to be set even though it is None,
since hasattr does not return False in this case.
In BatchNorm2D a reshape is then attempted, which causes an exception.
2022-11-08 09:24:53 -08:00
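The batch_invstd fix above (#421) turns on a general Python detail: hasattr reports whether an attribute exists, not whether it holds a usable value. A minimal sketch of that behavior follows. The class `BatchNormSketch` and the variable names are hypothetical stand-ins for illustration, not tinygrad's actual BatchNorm2D code, and the explicit None check is one possible guard rather than necessarily the one the commit uses.

```python
class BatchNormSketch:
  """Hypothetical stand-in for a layer; not tinygrad's BatchNorm2D."""
  def __init__(self):
    # the attribute is declared up front but has not been computed yet
    self.batch_invstd = None

layer = BatchNormSketch()

# hasattr only asks whether the attribute exists, so a None value still passes
exists = hasattr(layer, "batch_invstd")

# an explicit None check separates "declared" from "actually computed"
computed = getattr(layer, "batch_invstd", None) is not None

print(exists, computed)  # prints: True False
```

Code that guards a reshape with the hasattr test alone would go on to operate on None, which matches the exception described in the commit message.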
Liam
8dc28dd733
Create python-publish.yml (#163)
2022-11-08 08:45:01 -08:00
George Hotz
92ed87b0a5
bump version to 0.4.0
2022-11-08 08:44:42 -08:00
George Hotz
9781b4c3af
rename test functions to helper_
2022-11-07 21:27:56 -08:00
George Hotz
9884be2ad5
ugh, that too
2022-11-07 21:21:35 -08:00
George Hotz
537a9eb414
fix termcolor import
2022-11-07 21:19:08 -08:00
George Hotz
2cc1d970c6
updates from the chonker branch
2022-11-07 21:12:08 -08:00
George Hotz
d878065ece
Gemm (#416)
* gemm
* off by factor of 5
* 50 GFLOPS
* works
* 91 gflops
* working at 50G
* works
* iy
* 150 GFLOPS
* 150 GFLOPS
* N=2048 is still fast
* threading soon
* multithread
* pinning
* throttling is sad
* Align matrices to cacheline width (#361)
Co-authored-by: cloud <Cloud11665@gmail.com>
2022-11-06 10:07:28 -08:00
George Hotz
caea34c529
1s are always mergable
2022-11-03 10:50:48 -07:00
George Hotz
c48fc47d01
fix type error
2022-10-31 09:56:56 -07:00
George Hotz
9585b6c0cf
comments and readability in lazy.py
2022-10-30 19:50:48 -07:00
George Hotz
db2da22a04
stop blowing up floats
2022-10-30 16:47:16 -07:00
George Hotz
8afc643bb1
fix bug in ops test, it was cheating somehow
2022-10-30 16:43:24 -07:00
George Hotz
b7a115e5e5
rewrite some strideds into reshapes
2022-10-30 16:31:27 -07:00
George Hotz
8c849e637c
that was in there twice, DEBUG>=4 to see loop opt
2022-10-30 15:31:39 -07:00
George Hotz
cfdf803b52
fix llvm vectorization by adding analysis passes from the target machine
2022-10-30 15:28:36 -07:00
George Hotz
2f602a92ff
separate STRIDED and EXPAND
2022-10-30 13:23:58 -07:00
George Hotz
544cb0a069
oops, remove while(1)
2022-10-29 14:05:13 -07:00
George Hotz
4b6097f81d
more amx notes
2022-10-29 14:04:10 -07:00
George Hotz
fdb43fe553
gemm is 1.7 TFLOPS on a single M1 core
2022-10-29 13:42:33 -07:00
George Hotz
52bfbc31be
vectorization
2022-10-29 12:47:52 -07:00
George Hotz
e473d35f90
llvm doesn't vectorize
2022-10-29 11:59:48 -07:00
George Hotz
86eb06eb76
accurate flop estimation
2022-10-28 19:13:20 -07:00
George Hotz
7909786dbf
one more opt test
2022-10-28 18:37:53 -07:00
George Hotz
dd543fbc7a
MovementOps is unused
2022-10-28 18:26:08 -07:00
George Hotz
71b336503f
no RESHAPEs in the AST
2022-10-28 18:25:30 -07:00
George Hotz
294ab9e2f8
more test opt
2022-10-28 18:04:12 -07:00
George Hotz
f885ceb695
test speed w/o bias
2022-10-28 11:22:15 -07:00
George Hotz
3735e26492
very minor
2022-10-28 09:39:30 -07:00
George Hotz
c0050fab8f
clean up movement_op in cpu and torch
2022-10-28 09:29:12 -07:00
George Hotz
df31dde174
hasattr and DeviceBuffer type fixups
2022-10-28 09:05:45 -07:00
George Hotz
e6b65f8e01
fix graph in openpilot/compile.py
2022-10-28 08:55:34 -07:00
George Hotz
1013540370
fix flake8
2022-10-28 08:52:53 -07:00
George Hotz
804b2dd001
move into graph.py
2022-10-28 08:50:11 -07:00
George Hotz
8517b69bfb
lazy cleanups
2022-10-28 08:43:43 -07:00
George Hotz
d02f8f9bc0
can we lose the lines with E701 still there?
2022-10-28 08:36:03 -07:00
George Hotz
ef62db3186
cleanups, remove E701
2022-10-28 08:28:56 -07:00
George Hotz
b65b70812a
Exec AST (#404)
* working exec ast
* exec_ast is staticmethod
* GenericExecAST
* fold that sometimes
* ExplicitExecAST
* exec_ast for GPU
* gpu working
* get_lazyop_shape
* now gpubuffer is ExplicitExecAST
* dedup
* add a type
* RESHAPE in opencl code
* fix linter
* that too for linter
* cleanups
* remove dead code
* GenericShape is less lines
* add ALLOWED_KERNEL_COUNT to tests
* fix mypy
* that's gotta be recursive
* fix opencl shape processing
* remove unneeded lambda
2022-10-28 08:27:03 -07:00
George Hotz
6a15fd3844
LLVM Backend take 2 (#403)
* take 2 llvm
* get_lazybuffers -> get_buffers
* llvm tests pass
* fix type issues and refactor LLVM
2022-10-26 20:32:31 -07:00
George Hotz
10921a60c4
more imports from llvm branch
2022-10-26 18:02:36 -07:00
George Hotz
463995e64f
relu simpler backward pass
2022-10-26 17:57:32 -07:00
George Hotz
6a8fb53304
move ops.py into lazy.py (#402)
* move ops.py into lazy.py
* fix graph and linter
* ugh, didn't add
2022-10-25 13:58:03 -07:00
George Hotz
8e22d5ee67
replace networkx with defaultdict
2022-10-20 19:36:43 -07:00
George Hotz
3b9b7eda48
remove run_thneed dead code
2022-10-20 17:24:18 -07:00
George Hotz
63f9c55156
really dumb bug
2022-10-20 17:07:47 -07:00
George Hotz
1bec4651b3
fix nonstatic weights
2022-10-20 17:04:14 -07:00