* pm4 kernel launch works
* disable USE_THREAD_DIMENSIONS
* add kernel code
* work on real pm4
* pm4 signal
* same
* gate pm4
* hcq tests pass
* ops passes
* pm4 is closer
* pm4 debug (#4165)
* start debug tests passing
* prg
* smth
* hdp flush
* cleaner 1
* do not need this
* logs not need
* small things
* linter
* remove AQL
* test hcq
* fix tests
* it's subtracting, it shouldn't be -1
* pm4 changes (#4251)
* not need this anymore
* sdma signal with non atomic
---------
Co-authored-by: nimlgen <138685161+nimlgen@users.noreply.github.com>
* kfd driver wip
* cleanups
* kfd almost ready to ring doorbell
* ding dong?
* issues with signals
* something
* works
* ops kfd
* add amd_signal_t
* works...sometimes
* program runs
* _gpu_alloc cleanup
* cleanups
* work
* header + enable profiling (#3959)
* header + enable profiling
* just cleaner
* measure
* only local time domain
* remove old comments
* fix with master
* elf parsing (#3965)
* elf parsing
* fix kernels with private
* not used
* clean up
* clean up 2
* add flags
* kfd sdma (#3970)
* working sdma
* remove driver, shorter
* all commands we might need
* svm
* kfd remove hardcoded values (#4007)
* remove hardcoded values
* match above line
* 7k lines + revert hsa
* update that from origin
* fix sdma reg gen
* not the updated SDMA
* compiler_opts
* don't require kfd_ioctl
* get ioctls from python
* get ioctls from python
* remove build_sdma_command
* merge into 64-bit fields
* shorter
* fix property spelling and off by one
---------
Co-authored-by: nimlgen <138685161+nimlgen@users.noreply.github.com>