gem5
v21.0.1.0
GPUCommandProcessor Class Reference
#include <gpu_command_processor.hh>
Classes | |
class | MQDDmaEvent |
Perform a DMA read of the MQD that corresponds to a hardware queue descriptor (HQD).
class | ReadDispIdOffsetDmaEvent |
Perform a DMA read of the read_dispatch_id_field_base_byte_offset field, which follows directly after the read_dispatch_id (the read pointer) in the amd_hsa_queue_t struct (aka memory queue descriptor (MQD)), to find the base address of the MQD.
Public Types | |
enum | AgentCmd { Nop = 0, Steal = 1 } |
typedef GPUCommandProcessorParams | Params |
Public Types inherited from HSADevice | |
typedef HSADeviceParams | Params |
typedef std::function< void(const uint64_t &)> | HsaSignalCallbackFunction |
Public Types inherited from DmaDevice | |
typedef DmaDeviceParams | Params |
Public Types inherited from PioDevice | |
using | Params = PioDeviceParams |
Public Types inherited from ClockedObject | |
using | Params = ClockedObjectParams |
Parameters of ClockedObject.
Public Types inherited from SimObject | |
typedef SimObjectParams | Params |
Public Member Functions | |
GPUCommandProcessor ()=delete | |
GPUCommandProcessor (const Params &p) | |
void | setShader (Shader *shader) |
Shader * | shader () |
void | submitAgentDispatchPkt (void *raw_pkt, uint32_t queue_id, Addr host_pkt_addr) override |
submitAgentDispatchPkt() is for accepting agent dispatch packets.
void | submitDispatchPkt (void *raw_pkt, uint32_t queue_id, Addr host_pkt_addr) override |
submitDispatchPkt() is the entry point into the CP from the HSAPP and is only meant to be used with AQL kernel dispatch packets.
void | submitVendorPkt (void *raw_pkt, uint32_t queue_id, Addr host_pkt_addr) override |
submitVendorPkt() is for accepting vendor-specific packets from the HSAPP.
void | attachDriver (HSADriver *driver) override |
void | dispatchPkt (HSAQueueEntry *task) |
Once the CP has finished extracting all relevant information about a task and has initialized the ABI state, we send a description of the task to the dispatcher.
void | signalWakeupEvent (uint32_t event_id) |
Tick | write (PacketPtr pkt) override |
Pure virtual function that the device must implement.
Tick | read (PacketPtr pkt) override |
Pure virtual function that the device must implement.
AddrRangeList | getAddrRanges () const override |
Every PIO device is obliged to provide an implementation that returns the address ranges the device responds to.
System * | system () |
void | updateHsaSignal (Addr signal_handle, uint64_t signal_value) override |
uint64_t | functionalReadHsaSignal (Addr signal_handle) override |
Addr | getHsaSignalValueAddr (Addr signal_handle) |
Addr | getHsaSignalMailboxAddr (Addr signal_handle) |
Addr | getHsaSignalEventAddr (Addr signal_handle) |
Public Member Functions inherited from HSADevice | |
HSADevice (const Params &p) | |
HSAPacketProcessor & | hsaPacketProc () |
void | dmaReadVirt (Addr host_addr, unsigned size, DmaCallback *cb, void *data, Tick delay=0) |
void | dmaWriteVirt (Addr host_addr, unsigned size, DmaCallback *cb, void *data, Tick delay=0) |
Public Member Functions inherited from DmaDevice | |
DmaDevice (const Params &p) | |
virtual | ~DmaDevice ()=default |
void | dmaWrite (Addr addr, int size, Event *event, uint8_t *data, uint32_t sid, uint32_t ssid, Tick delay=0) |
void | dmaWrite (Addr addr, int size, Event *event, uint8_t *data, Tick delay=0) |
void | dmaRead (Addr addr, int size, Event *event, uint8_t *data, uint32_t sid, uint32_t ssid, Tick delay=0) |
void | dmaRead (Addr addr, int size, Event *event, uint8_t *data, Tick delay=0) |
bool | dmaPending () const |
void | init () override |
init() is called after all C++ SimObjects have been created and all ports are connected.
unsigned int | cacheBlockSize () const |
Port & | getPort (const std::string &if_name, PortID idx=InvalidPortID) override |
Get a port with a given name and index.
Public Member Functions inherited from PioDevice | |
PioDevice (const Params &p) | |
virtual | ~PioDevice () |
void | init () override |
init() is called after all C++ SimObjects have been created and all ports are connected.
Port & | getPort (const std::string &if_name, PortID idx=InvalidPortID) override |
Get a port with a given name and index.
Public Member Functions inherited from ClockedObject | |
ClockedObject (const ClockedObjectParams &p) | |
void | serialize (CheckpointOut &cp) const override |
Serialize an object.
void | unserialize (CheckpointIn &cp) override |
Unserialize an object.
Public Member Functions inherited from SimObject | |
const Params & | params () const |
SimObject (const Params &p) | |
virtual | ~SimObject () |
virtual const std::string | name () const |
virtual void | loadState (CheckpointIn &cp) |
loadState() is called on each SimObject when restoring from a checkpoint.
virtual void | initState () |
initState() is called on each SimObject when not restoring from a checkpoint.
virtual void | regProbePoints () |
Register probe points for this object.
virtual void | regProbeListeners () |
Register probe listeners for this object.
ProbeManager * | getProbeManager () |
Get the probe manager for this object.
virtual void | startup () |
startup() is the final initialization call before simulation.
DrainState | drain () override |
Provide a default implementation of the drain interface for objects that don't need draining.
virtual void | memWriteback () |
Write back dirty buffers to memory using functional writes.
virtual void | memInvalidate () |
Invalidate the contents of memory buffers.
void | serialize (CheckpointOut &cp) const override |
Serialize an object.
void | unserialize (CheckpointIn &cp) override |
Unserialize an object.
Public Member Functions inherited from EventManager | |
EventQueue * | eventQueue () const |
void | schedule (Event &event, Tick when) |
void | deschedule (Event &event) |
void | reschedule (Event &event, Tick when, bool always=false) |
void | schedule (Event *event, Tick when) |
void | deschedule (Event *event) |
void | reschedule (Event *event, Tick when, bool always=false) |
void | wakeupEventQueue (Tick when=(Tick) -1) |
This function is not needed by the usual gem5 event loop but may be necessary in derived EventQueues which host gem5 on other schedulers.
void | setCurTick (Tick newVal) |
EventManager (EventManager &em) | |
Event manager manages events in the event queue.
EventManager (EventManager *em) | |
EventManager (EventQueue *eq) | |
Public Member Functions inherited from Serializable | |
Serializable () | |
virtual | ~Serializable () |
void | serializeSection (CheckpointOut &cp, const char *name) const |
Serialize an object into a new section.
void | serializeSection (CheckpointOut &cp, const std::string &name) const |
void | unserializeSection (CheckpointIn &cp, const char *name) |
Unserialize a child object.
void | unserializeSection (CheckpointIn &cp, const std::string &name) |
Public Member Functions inherited from Drainable | |
DrainState | drainState () const |
Return the current drain state of an object.
virtual void | notifyFork () |
Notify a child process of a fork.
Public Member Functions inherited from Stats::Group | |
Group (Group *parent, const char *name=nullptr) | |
Construct a new statistics group.
virtual | ~Group () |
virtual void | regStats () |
Callback to set stat parameters.
virtual void | resetStats () |
Callback to reset stats.
virtual void | preDumpStats () |
Callback before stats are dumped.
void | addStat (Stats::Info *info) |
Register a stat with this group.
const std::map< std::string, Group * > & | getStatGroups () const |
Get all child groups associated with this object.
const std::vector< Info * > & | getStats () const |
Get all stats associated with this object.
void | addStatGroup (const char *name, Group *block) |
Add a stat block as a child of this block.
const Info * | resolveStat (std::string name) const |
Resolve a stat by its name within this group.
void | mergeStatGroup (Group *block) |
Merge the contents (stats & children) of a block to this block.
Group ()=delete | |
Group (const Group &)=delete | |
Group & | operator= (const Group &)=delete |
Public Member Functions inherited from Clocked | |
void | updateClockPeriod () |
Update the tick to the current tick.
Tick | clockEdge (Cycles cycles=Cycles(0)) const |
Determine the tick when a cycle begins, by default the current one, but the argument also enables the caller to determine a future cycle.
Cycles | curCycle () const |
Determine the current cycle, corresponding to a tick aligned to a clock edge.
Tick | nextCycle () const |
Based on the clock of the object, determine the start tick of the first cycle that is at least one cycle in the future.
uint64_t | frequency () const |
Tick | clockPeriod () const |
double | voltage () const |
Cycles | ticksToCycles (Tick t) const |
Tick | cyclesToTicks (Cycles c) const |
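The clockEdge(), curCycle(), nextCycle(), ticksToCycles() and cyclesToTicks() helpers listed above convert between ticks and cycles of the object's clock. The following is a minimal stand-alone sketch of that arithmetic. It assumes a fixed clock period and uses plain integers in place of gem5's Tick and Cycles types. `MockClocked` is hypothetical, not gem5's Clocked class, and the round-up in ticksToCycles() is an assumption of the sketch.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical stand-ins for gem5's Tick and Cycles types.
using Tick = std::uint64_t;
using Cycles = std::uint64_t;

// Illustrative model of a clocked object with a fixed clock period;
// not gem5's Clocked class.
struct MockClocked {
    Tick period;   // clock period in ticks (must be non-zero)
    Tick curTick;  // current simulated time in ticks

    // First clock edge at or after curTick, plus `cycles` further periods.
    Tick clockEdge(Cycles cycles = 0) const {
        assert(period > 0);
        Tick aligned = ((curTick + period - 1) / period) * period;
        return aligned + cycles * period;
    }

    // Cycle number of the current (aligned) clock edge.
    Cycles curCycle() const { return clockEdge() / period; }

    // Start tick of the first cycle at least one cycle in the future.
    Tick nextCycle() const { return clockEdge(1); }

    // Convert a tick count to cycles, rounding up to whole cycles (assumed).
    Cycles ticksToCycles(Tick t) const { return (t + period - 1) / period; }

    // Convert a cycle count to ticks.
    Tick cyclesToTicks(Cycles c) const { return c * period; }
};
```

Consider a period of 500 ticks at tick 1200. Here clockEdge() returns 1500, curCycle() returns 3, nextCycle() returns 2000, and ticksToCycles(1001) returns 3.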
Private Member Functions | |
void | initABI (HSAQueueEntry *task) |
The CP is responsible for traversing all HSA-ABI-related data structures from memory and initializing the ABI state.
Private Attributes | |
Shader * | _shader |
GPUDispatcher & | dispatcher |
HSADriver * | driver |
Additional Inherited Members | |
Static Public Member Functions inherited from SimObject | |
static void | serializeAll (CheckpointOut &cp) |
Serialize all SimObjects in the system.
static SimObject * | find (const char *name) |
Find the SimObject with the given name and return a pointer to it.
Static Public Member Functions inherited from Serializable | |
static const std::string & | currentSection () |
Gets the fully-qualified name of the active section.
static void | serializeAll (const std::string &cpt_dir) |
Serializes all the SimObjects.
static void | unserializeGlobals (CheckpointIn &cp) |
Public Attributes inherited from ClockedObject | |
PowerState * | powerState |
Protected Types inherited from HSADevice | |
typedef void(DmaDevice::* | DmaFnPtr) (Addr, int, Event *, uint8_t *, Tick) |
Protected Member Functions inherited from HSADevice | |
void | dmaVirt (DmaFnPtr, Addr host_addr, unsigned size, DmaCallback *cb, void *data, Tick delay=0) |
void | translateOrDie (Addr vaddr, Addr &paddr) |
HSADevices will perform DMA operations on VAs, and because page faults are not currently supported for HSADevices, we must be able to find the pages mapped for the process.
Protected Member Functions inherited from Drainable | |
Drainable () | |
virtual | ~Drainable () |
virtual void | drainResume () |
Resume execution after a successful drain.
void | signalDrainDone () const |
Signal that an object is drained.
Protected Member Functions inherited from Clocked | |
Clocked (ClockDomain &clk_domain) | |
Create a clocked object and set the clock domain based on the parameters.
Clocked (Clocked &)=delete | |
Clocked & | operator= (Clocked &)=delete |
virtual | ~Clocked () |
Virtual destructor due to inheritance.
void | resetClock () const |
Reset the object's clock using the current global tick value.
virtual void | clockPeriodUpdated () |
A hook subclasses can implement so they can do any extra work that's needed when the clock rate is changed.
Protected Attributes inherited from HSADevice | |
HSAPacketProcessor * | hsaPP |
Protected Attributes inherited from DmaDevice | |
DmaPort | dmaPort |
Protected Attributes inherited from PioDevice | |
System * | sys |
PioPort< PioDevice > | pioPort |
The pioPort that handles the requests for us and provides us requests that it sees.
Protected Attributes inherited from SimObject | |
const SimObjectParams & | _params |
Cached copy of the object parameters.
Protected Attributes inherited from EventManager | |
EventQueue * | eventq |
A pointer to this object's event queue.
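The translateOrDie() note says that HSADevices issue DMAs on virtual addresses and, because they have no page-fault support, every page touched must already be mapped. The following is a minimal sketch of that idea, not gem5's HSADevice code. It assumes a 4 KiB page size and uses a flat std::unordered_map in place of the process page table. `MockPageTable`, `mockTranslateOrDie` and `splitIntoPageChunks` are hypothetical. Splitting a request at page boundaries before translating is also an assumption of the sketch. It is motivated by the fact that neighboring virtual pages need not be physically contiguous.

```cpp
#include <cassert>
#include <cstdint>
#include <cstdio>
#include <cstdlib>
#include <unordered_map>
#include <vector>

using Addr = std::uint64_t;

// Assumed page size for this sketch.
constexpr Addr kPageBytes = 4096;

// Hypothetical page table: virtual page number -> physical page number.
using MockPageTable = std::unordered_map<Addr, Addr>;

// Translate a virtual address or abort: with no page-fault path to fall
// back on, an unmapped page is treated as a fatal error.
Addr mockTranslateOrDie(const MockPageTable &pt, Addr vaddr) {
    auto it = pt.find(vaddr / kPageBytes);
    if (it == pt.end()) {
        std::fprintf(stderr, "translateOrDie: unmapped vaddr 0x%llx\n",
                     static_cast<unsigned long long>(vaddr));
        std::abort();
    }
    return it->second * kPageBytes + vaddr % kPageBytes;
}

// One physically contiguous piece of a virtual DMA request.
struct DmaChunk {
    Addr paddr;
    Addr size;
};

// Split [vaddr, vaddr + size) at page boundaries and translate each piece,
// since neighboring virtual pages need not be physically contiguous.
std::vector<DmaChunk> splitIntoPageChunks(const MockPageTable &pt, Addr vaddr, Addr size) {
    assert(vaddr + size >= vaddr && "address range must not wrap around");
    std::vector<DmaChunk> chunks;
    while (size > 0) {
        Addr toPageEnd = kPageBytes - vaddr % kPageBytes;
        Addr len = size < toPageEnd ? size : toPageEnd;
        chunks.push_back({mockTranslateOrDie(pt, vaddr), len});
        vaddr += len;
        size -= len;
    }
    return chunks;
}
```

Suppose virtual pages 0x10 and 0x11 map to physical pages 0x80 and 0x23. A 0x200-byte request starting at 0x10F00 then becomes two chunks: 0x100 bytes at 0x80F00 and 0x100 bytes at 0x23000.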
Definition at line 57 of file gpu_command_processor.hh.
typedef GPUCommandProcessorParams GPUCommandProcessor::Params |
Definition at line 60 of file gpu_command_processor.hh.
enum GPUCommandProcessor::AgentCmd |
Enumerator | |
---|---|
Nop | |
Steal | |
Definition at line 68 of file gpu_command_processor.hh.
GPUCommandProcessor::GPUCommandProcessor | ( | ) |
delete
GPUCommandProcessor::GPUCommandProcessor | ( | const Params & | p | ) |
Definition at line 44 of file gpu_command_processor.cc.
References dispatcher, and GPUDispatcher::setCommandProcessor().
void GPUCommandProcessor::attachDriver | ( | HSADriver * | driver | ) |
override virtual
Reimplemented from HSADevice.
Definition at line 195 of file gpu_command_processor.cc.
void GPUCommandProcessor::dispatchPkt | ( | HSAQueueEntry * | task | ) |
Once the CP has finished extracting all relevant information about a task and has initialized the ABI state, we send a description of the task to the dispatcher.
The dispatcher will create and dispatch WGs to the CUs.
Definition at line 280 of file gpu_command_processor.cc.
References GPUDispatcher::dispatch(), and dispatcher.
Referenced by GPUCommandProcessor::MQDDmaEvent::process().
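The dispatcher turns one kernel launch into work-groups (WGs) for the CUs. As a rough illustration, the WG count can be derived from the AQL packet's grid_size_* and workgroup_size_* fields by ceiling division in each dimension. This assumes each dimension is partitioned independently and that a partial WG covers any remainder at the edge of the grid. `GridDims`, `ceilDiv` and `numWorkGroups` are hypothetical helpers, not part of gem5's GPUDispatcher.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical 3-D extent, mirroring the grid_size_{x,y,z} and
// workgroup_size_{x,y,z} fields of an AQL kernel dispatch packet.
struct GridDims {
    std::uint32_t x, y, z;
};

// Ceiling division: a partial WG covers any remainder of the grid.
std::uint64_t ceilDiv(std::uint64_t n, std::uint64_t d) {
    assert(d > 0);
    return (n + d - 1) / d;
}

// Number of WGs needed to cover the grid: the per-dimension counts multiplied.
std::uint64_t numWorkGroups(const GridDims &grid, const GridDims &wg) {
    return ceilDiv(grid.x, wg.x) * ceilDiv(grid.y, wg.y) * ceilDiv(grid.z, wg.z);
}
```

For example, a 1000×1×1 grid with 256×1×1 WGs needs 4 WGs, and the last one covers only the remaining 232 work-items.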
uint64_t GPUCommandProcessor::functionalReadHsaSignal | ( | Addr | signal_handle | ) |
override virtual
Reimplemented from HSADevice.
Definition at line 151 of file gpu_command_processor.cc.
References getHsaSignalValueAddr(), system(), and System::threads.
Referenced by GPUDispatcher::notifyWgCompl().
AddrRangeList GPUCommandProcessor::getAddrRanges | ( | ) | const |
override virtual
Every PIO device is obliged to provide an implementation that returns the address ranges the device responds to.
Implements PioDevice.
Definition at line 317 of file gpu_command_processor.cc.
Addr GPUCommandProcessor::getHsaSignalEventAddr | ( | Addr | signal_handle | ) |
Definition at line 102 of file gpu_command_processor.hh.
Referenced by updateHsaSignal().
Addr GPUCommandProcessor::getHsaSignalMailboxAddr | ( | Addr | signal_handle | ) |
Definition at line 97 of file gpu_command_processor.hh.
Referenced by updateHsaSignal().
Addr GPUCommandProcessor::getHsaSignalValueAddr | ( | Addr | signal_handle | ) |
Definition at line 92 of file gpu_command_processor.hh.
Referenced by functionalReadHsaSignal(), and updateHsaSignal().
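getHsaSignalValueAddr(), getHsaSignalMailboxAddr() and getHsaSignalEventAddr() each map a signal handle to the address of one field of the runtime's signal object. The sketch below shows that pattern with offsetof. It assumes a simplified, hypothetical `MockAmdSignal` layout. The real amd_signal_t defined by ROCm has more fields, so the offsets here belong to the mock struct only. Rejecting a zero handle follows the HSA convention that handle 0 means "no signal".

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

using Addr = std::uint64_t;

// Hypothetical, simplified signal layout for illustration only.
struct MockAmdSignal {
    std::int64_t kind;
    std::int64_t value;               // value the runtime waits on
    std::uint64_t event_mailbox_ptr;  // mailbox used to raise an event
    std::uint32_t event_id;           // event to report to the driver
    std::uint32_t reserved;
};

// The signal handle is treated as the base address of the signal object;
// each field address is that base plus the field's byte offset.
Addr fieldAddr(Addr handle, std::size_t fieldOffset) {
    assert(handle != 0 && "handle 0 means no signal");
    return handle + fieldOffset;
}

Addr signalValueAddr(Addr handle) {
    return fieldAddr(handle, offsetof(MockAmdSignal, value));
}

Addr signalMailboxAddr(Addr handle) {
    return fieldAddr(handle, offsetof(MockAmdSignal, event_mailbox_ptr));
}

Addr signalEventAddr(Addr handle) {
    return fieldAddr(handle, offsetof(MockAmdSignal, event_id));
}
```

With this mock layout, a handle of 0x1000 yields value, mailbox and event addresses of 0x1008, 0x1010 and 0x1018.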
void GPUCommandProcessor::initABI | ( | HSAQueueEntry * | task | ) |
private
The CP is responsible for traversing all HSA-ABI-related data structures from memory and initializing the ABI state.
Information provided by the MQD, AQL packet, and code object metadata will be used to initialize register file state.
Definition at line 298 of file gpu_command_processor.cc.
References HSADevice::dmaReadVirt(), HSAPacketProcessor::getQueueDesc(), HSAQueueDescriptor::hostReadIndexPtr, HSADevice::hsaPP, and HSAQueueEntry::queueId().
Referenced by submitDispatchPkt().
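Per the ReadDispIdOffsetDmaEvent summary, the MQD base address is recovered in two steps. First, read the 32-bit read_dispatch_id_field_base_byte_offset that follows read_dispatch_id. Then subtract that offset from the address of read_dispatch_id. The sketch below performs this arithmetic in host memory and makes three assumptions:

- It uses a hypothetical `MockHsaQueue` layout; the real amd_hsa_queue_t has many more fields.
- The stored value is the byte offset of read_dispatch_id from the start of the struct.
- The hostReadIndexPtr that initABI() references points at read_dispatch_id.

gem5 performs these reads with DMAs (dmaReadVirt()) rather than memcpy.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>

using Addr = std::uint64_t;

// Hypothetical, simplified queue descriptor.
struct MockHsaQueue {
    std::uint64_t other_fields[4];                         // stand-in for earlier fields
    std::uint64_t read_dispatch_id;                        // the queue read pointer
    std::uint32_t read_dispatch_id_field_base_byte_offset; // offset of read_dispatch_id
    std::uint32_t padding;
};

// The offset field must sit directly after the 64-bit read pointer.
static_assert(offsetof(MockHsaQueue, read_dispatch_id_field_base_byte_offset) ==
                  offsetof(MockHsaQueue, read_dispatch_id) + sizeof(std::uint64_t),
              "offset field must follow read_dispatch_id");

// `mem` models a flat address space; `readIndexPtr` is the address of
// read_dispatch_id. Returns the base address of the queue descriptor.
Addr mqdBaseFromReadIndexPtr(const std::uint8_t *mem, Addr readIndexPtr) {
    std::uint32_t offset = 0;
    // memcpy avoids alignment and aliasing problems when reading raw bytes.
    std::memcpy(&offset, mem + readIndexPtr + sizeof(std::uint64_t), sizeof(offset));
    assert(offset <= readIndexPtr);
    return readIndexPtr - offset;
}
```

Once the base is known, the rest of the MQD can be read from it, which is what the MQDDmaEvent class summary describes.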
Tick GPUCommandProcessor::read | ( | PacketPtr | pkt | ) |
Pure virtual function that the device must implement.
Called when a read command is received by the port.
Parameters
pkt | Packet describing this request |
Implements PioDevice.
Definition at line 84 of file gpu_command_processor.hh.
void GPUCommandProcessor::setShader | ( | Shader * | shader | ) |
Definition at line 324 of file gpu_command_processor.cc.
Shader * GPUCommandProcessor::shader | ( | ) |
Definition at line 330 of file gpu_command_processor.cc.
References _shader.
Referenced by setShader().
void GPUCommandProcessor::signalWakeupEvent | ( | uint32_t | event_id | ) |
Definition at line 286 of file gpu_command_processor.cc.
References driver, and HSADriver::signalWakeupEvent().
Referenced by updateHsaSignal().
void GPUCommandProcessor::submitAgentDispatchPkt | ( | void *raw_pkt, uint32_t queue_id, Addr host_pkt_addr | ) |
override virtual
submitAgentDispatchPkt() is for accepting agent dispatch packets.
These packets will control the dispatch of WGs on the device and inform the host when a specified number of WGs have been executed on the device.
For now it simply finishes the packet.
Reimplemented from HSADevice.
Definition at line 233 of file gpu_command_processor.cc.
References _hsa_agent_dispatch_packet_s::arg, HSAQueueEntry::completionSignal(), dispatcher, HSADevice::dmaWriteVirt(), DPRINTF, HSAPacketProcessor::finishPkt(), HSADevice::hsaPP, GPUDispatcher::hsaTask(), Nop, panic, _hsa_agent_dispatch_packet_s::return_address, and _hsa_agent_dispatch_packet_s::type.
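Agent dispatch packets are told apart by their type field, whose values the AgentCmd enum names (Nop = 0, Steal = 1). The References list for submitAgentDispatchPkt() includes arg, return_address, completionSignal(), GPUDispatcher::hsaTask(), dmaWriteVirt() and panic. This suggests that, beyond finishing the packet, a Steal packet writes a kernel's completion-signal handle to return_address and an unknown type is fatal. The sketch below models only that reading and makes the following assumptions:

- `MockAgentPacket` and `MockAgentHandler` are hypothetical types.
- arg[0] is used as the kernel ID.
- A std::map stands in for memory written by DMA.
- The sketch throws where gem5 would panic.

```cpp
#include <cassert>
#include <cstdint>
#include <map>
#include <stdexcept>

using Addr = std::uint64_t;

// Mirrors the AgentCmd values documented for GPUCommandProcessor.
enum AgentCmd : std::uint16_t { Nop = 0, Steal = 1 };

// Hypothetical subset of an HSA agent dispatch packet.
struct MockAgentPacket {
    std::uint16_t type;     // AgentCmd value
    Addr return_address;    // where a result is written back
    std::uint64_t arg[4];   // arg[0] holds a kernel ID for Steal (assumed)
};

// Hypothetical device state: kernel ID -> completion-signal handle, plus a
// map standing in for memory written through DMA.
struct MockAgentHandler {
    std::map<std::uint64_t, Addr> completionSignals;
    std::map<Addr, std::uint64_t> memory;
    unsigned finishedPackets = 0;

    void handleAgentPacket(const MockAgentPacket &pkt) {
        switch (pkt.type) {
          case Nop:
            break;  // nothing to do
          case Steal:
            // Report the kernel's completion-signal handle back to the host.
            assert(pkt.return_address != 0);
            memory[pkt.return_address] = completionSignals.at(pkt.arg[0]);
            break;
          default:
            // gem5 would panic here; throwing keeps the sketch testable.
            throw std::runtime_error("unknown agent dispatch packet type");
        }
        ++finishedPackets;  // stands in for finishPkt()
    }
};
```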
void GPUCommandProcessor::submitDispatchPkt | ( | void *raw_pkt, uint32_t queue_id, Addr host_pkt_addr | ) |
override virtual
submitDispatchPkt() is the entry point into the CP from the HSAPP and is only meant to be used with AQL kernel dispatch packets.
After the HSAPP receives and extracts an AQL packet, it sends it to the CP, which is responsible for gathering all relevant information about a task, initializing CU state, and sending it to the dispatcher for WG creation and dispatch.
First we need to capture all information from the AQL packet and the code object, then store it in an HSAQueueEntry. Once the packet and code are extracted, we extract information from the queue descriptor that the CP needs to perform state initialization on the CU. Finally we call dispatch() to send the task to the dispatcher. When the task completely finishes, we call finishPkt() on the HSA packet processor in order to remove the packet from the queue and notify the runtime that the task has completed.
We need to read a pointer in the application's address space to pull out the kernel code descriptor.
The kernel_object is a pointer to the machine code, whose entry point is an 'amd_kernel_code_t' type, which is included in the kernel binary and describes various aspects of the kernel. The desired entry is the 'kernel_code_entry_byte_offset' field, which provides the byte offset (positive or negative) from the address of the amd_kernel_code_t to the start of the machine instructions.
BLIT kernels don't have symbol names. BLIT kernels are built-in compute kernels issued by ROCm to handle DMAs for dGPUs when the SDMA hardware engines are unavailable or explicitly disabled. They can also be used for copies that ROCm thinks would be better performed by the shader than by the SDMA engines. They are also sometimes used on APUs to implement asynchronous memcpy operations between two pointers in host memory. I have no idea what BLIT stands for.
Reimplemented from HSADevice.
Definition at line 68 of file gpu_command_processor.cc.
References HSAQueueEntry::codeAddr(), _hsa_dispatch_packet_s::completion_signal, DPRINTF, _hsa_dispatch_packet_s::grid_size_x, _hsa_dispatch_packet_s::grid_size_y, _hsa_dispatch_packet_s::grid_size_z, initABI(), _hsa_dispatch_packet_s::kernarg_address, AMDKernelCode::kernel_code_entry_byte_offset, _hsa_dispatch_packet_s::kernel_object, HSAQueueEntry::numScalarRegs(), HSAQueueEntry::numVectorRegs(), AMDKernelCode::runtime_loader_kernel_symbol, PioDevice::sys, System::threads, _hsa_dispatch_packet_s::workgroup_size_x, _hsa_dispatch_packet_s::workgroup_size_y, and _hsa_dispatch_packet_s::workgroup_size_z.
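The kernel_code_entry_byte_offset description implies that the first instruction lies at kernel_object plus a signed byte offset. The sketch below computes that address. It also shows a fallback name for kernels whose runtime_loader_kernel_symbol is zero, which is the BLIT case described above. Treating a zero symbol address as "no name" is an assumption of the sketch. `MockKernelCode`, `kernelEntryAddr` and `kernelDisplayName` are hypothetical, as is the "Blit kernel" label.

```cpp
#include <cassert>
#include <cstdint>
#include <string>

using Addr = std::uint64_t;

// Hypothetical subset of amd_kernel_code_t.
struct MockKernelCode {
    std::int64_t kernel_code_entry_byte_offset;  // may be positive or negative
    Addr runtime_loader_kernel_symbol;           // 0 when there is no symbol (assumed)
};

// Entry point = address of the amd_kernel_code_t plus the signed offset.
// Converting the offset to an unsigned type and adding relies on
// well-defined modular arithmetic, so a negative offset subtracts.
Addr kernelEntryAddr(Addr kernelObject, const MockKernelCode &akc) {
    assert(kernelObject != 0);
    return kernelObject + static_cast<Addr>(akc.kernel_code_entry_byte_offset);
}

// Use the symbol-derived name when there is one; otherwise label the kernel
// as a BLIT kernel. `lookupName` stands in for reading the name from memory.
std::string kernelDisplayName(const MockKernelCode &akc, const std::string &lookupName) {
    return akc.runtime_loader_kernel_symbol != 0 ? lookupName : std::string("Blit kernel");
}
```

Take a code object at 0x10000. An offset of 256 puts the entry at 0x10100, and an offset of -0x100 puts it at 0xFF00.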
void GPUCommandProcessor::submitVendorPkt | ( | void *raw_pkt, uint32_t queue_id, Addr host_pkt_addr | ) |
override virtual
submitVendorPkt() is for accepting vendor-specific packets from the HSAPP.
Vendor-specific packets may be used by the runtime to send commands to the HSA device that are specific to a particular vendor. The vendor-specific packets should be defined by the vendor in the runtime. TODO: For now we simply tell the HSAPP to finish the packet; however, a future patch will update this method to provide proper handling of any required vendor-specific packets. In the version of ROCm that is currently supported (1.6), the runtime will send packets that direct the CP to invalidate the GPU's caches. We do this automatically on each kernel launch in the CU, so this is safe for now.
Reimplemented from HSADevice.
Definition at line 219 of file gpu_command_processor.cc.
References HSAPacketProcessor::finishPkt(), and HSADevice::hsaPP.
System * GPUCommandProcessor::system | ( | ) |
Definition at line 311 of file gpu_command_processor.cc.
References PioDevice::sys.
Referenced by functionalReadHsaSignal(), and updateHsaSignal().
void GPUCommandProcessor::updateHsaSignal | ( | Addr signal_handle, uint64_t signal_value | ) |
override virtual
Reimplemented from HSADevice.
Definition at line 160 of file gpu_command_processor.cc.
References HSADevice::dmaWriteVirt(), DPRINTF, getHsaSignalEventAddr(), getHsaSignalMailboxAddr(), getHsaSignalValueAddr(), signalWakeupEvent(), system(), and System::threads.
Referenced by GPUDispatcher::notifyWgCompl().
Tick GPUCommandProcessor::write | ( | PacketPtr | pkt | ) |
Pure virtual function that the device must implement.
Called when a write command is received by the port.
Parameters
pkt | Packet describing this request |
Implements PioDevice.
Definition at line 83 of file gpu_command_processor.hh.
Shader* GPUCommandProcessor::_shader |
private
Definition at line 108 of file gpu_command_processor.hh.
Referenced by setShader(), and shader().
GPUDispatcher& GPUCommandProcessor::dispatcher |
private
Definition at line 109 of file gpu_command_processor.hh.
Referenced by dispatchPkt(), GPUCommandProcessor(), and submitAgentDispatchPkt().
HSADriver* GPUCommandProcessor::driver |
private
Definition at line 110 of file gpu_command_processor.hh.
Referenced by attachDriver(), and signalWakeupEvent().