gem5  v20.1.0.0
GPUCommandProcessor Class Reference

#include <gpu_command_processor.hh>

Inheritance diagram for GPUCommandProcessor (diagram not reproduced; classes shown: HSADevice, DmaDevice, PioDevice, ClockedObject, SimObject, Clocked, EventManager, Serializable, Drainable, Stats::Group)

Classes

class  MQDDmaEvent
 Perform a DMA read of the MQD that corresponds to a hardware queue descriptor (HQD). More...
 
class  ReadDispIdOffsetDmaEvent
 Perform a DMA read of the read_dispatch_id_field_base_byte_offset field, which follows directly after the read_dispatch_id (the read pointer) in the amd_hsa_queue_t struct (aka memory queue descriptor (MQD)), to find the base address of the MQD. More...
 

Public Types

typedef GPUCommandProcessorParams Params
 
- Public Types inherited from HSADevice
typedef HSADeviceParams Params
 
- Public Types inherited from DmaDevice
typedef DmaDeviceParams Params
 
- Public Types inherited from PioDevice
typedef PioDeviceParams Params
 
- Public Types inherited from ClockedObject
typedef ClockedObjectParams Params
 Parameters of ClockedObject. More...
 
- Public Types inherited from SimObject
typedef SimObjectParams Params
 

Public Member Functions

 GPUCommandProcessor ()=delete
 
 GPUCommandProcessor (const Params *p)
 
void setShader (Shader *shader)
 
Shader * shader ()
 
void submitDispatchPkt (void *raw_pkt, uint32_t queue_id, Addr host_pkt_addr) override
 submitDispatchPkt() is the entry point into the CP from the HSAPP and is only meant to be used with AQL kernel dispatch packets. More...
 
void submitVendorPkt (void *raw_pkt, uint32_t queue_id, Addr host_pkt_addr) override
 submitVendorPkt() is for accepting vendor-specific packets from the HSAPP. More...
 
void dispatchPkt (HSAQueueEntry *task)
 Once the CP has finished extracting all relevant information about a task and has initialized the ABI state, we send a description of the task to the dispatcher. More...
 
Tick write (PacketPtr pkt) override
 Pure virtual function that the device must implement. More...
 
Tick read (PacketPtr pkt) override
 Pure virtual function that the device must implement. More...
 
AddrRangeList getAddrRanges () const override
 Every PIO device is obliged to provide an implementation that returns the address ranges the device responds to. More...
 
System * system ()
 
- Public Member Functions inherited from HSADevice
 HSADevice (const Params *p)
 
HSAPacketProcessor & hsaPacketProc ()
 
void dmaReadVirt (Addr host_addr, unsigned size, DmaCallback *cb, void *data, Tick delay=0)
 
void dmaWriteVirt (Addr host_addr, unsigned size, DmaCallback *cb, void *data, Tick delay=0)
 
- Public Member Functions inherited from DmaDevice
 DmaDevice (const Params *p)
 
virtual ~DmaDevice ()
 
void dmaWrite (Addr addr, int size, Event *event, uint8_t *data, uint32_t sid, uint32_t ssid, Tick delay=0)
 
void dmaWrite (Addr addr, int size, Event *event, uint8_t *data, Tick delay=0)
 
void dmaRead (Addr addr, int size, Event *event, uint8_t *data, uint32_t sid, uint32_t ssid, Tick delay=0)
 
void dmaRead (Addr addr, int size, Event *event, uint8_t *data, Tick delay=0)
 
bool dmaPending () const
 
void init () override
 init() is called after all C++ SimObjects have been created and all ports are connected. More...
 
unsigned int cacheBlockSize () const
 
Port & getPort (const std::string &if_name, PortID idx=InvalidPortID) override
 Get a port with a given name and index. More...
 
- Public Member Functions inherited from PioDevice
 PioDevice (const Params *p)
 
virtual ~PioDevice ()
 
const Params * params () const
 
void init () override
 init() is called after all C++ SimObjects have been created and all ports are connected. More...
 
Port & getPort (const std::string &if_name, PortID idx=InvalidPortID) override
 Get a port with a given name and index. More...
 
- Public Member Functions inherited from ClockedObject
 ClockedObject (const ClockedObjectParams *p)
 
const Params * params () const
 
void serialize (CheckpointOut &cp) const override
 Serialize an object. More...
 
void unserialize (CheckpointIn &cp) override
 Unserialize an object. More...
 
- Public Member Functions inherited from SimObject
const Params * params () const
 
 SimObject (const Params *_params)
 
virtual ~SimObject ()
 
virtual const std::string name () const
 
virtual void loadState (CheckpointIn &cp)
 loadState() is called on each SimObject when restoring from a checkpoint. More...
 
virtual void initState ()
 initState() is called on each SimObject when not restoring from a checkpoint. More...
 
virtual void regProbePoints ()
 Register probe points for this object. More...
 
virtual void regProbeListeners ()
 Register probe listeners for this object. More...
 
ProbeManager * getProbeManager ()
 Get the probe manager for this object. More...
 
virtual void startup ()
 startup() is the final initialization call before simulation. More...
 
DrainState drain () override
 Provide a default implementation of the drain interface for objects that don't need draining. More...
 
virtual void memWriteback ()
 Write back dirty buffers to memory using functional writes. More...
 
virtual void memInvalidate ()
 Invalidate the contents of memory buffers. More...
 
void serialize (CheckpointOut &cp) const override
 Serialize an object. More...
 
void unserialize (CheckpointIn &cp) override
 Unserialize an object. More...
 
- Public Member Functions inherited from EventManager
EventQueue * eventQueue () const
 
void schedule (Event &event, Tick when)
 
void deschedule (Event &event)
 
void reschedule (Event &event, Tick when, bool always=false)
 
void schedule (Event *event, Tick when)
 
void deschedule (Event *event)
 
void reschedule (Event *event, Tick when, bool always=false)
 
void wakeupEventQueue (Tick when=(Tick) -1)
 This function is not needed by the usual gem5 event loop but may be necessary in derived EventQueues which host gem5 on other schedulers. More...
 
void setCurTick (Tick newVal)
 
 EventManager (EventManager &em)
 Event manager manages events in the event queue. More...
 
 EventManager (EventManager *em)
 
 EventManager (EventQueue *eq)
 
- Public Member Functions inherited from Serializable
 Serializable ()
 
virtual ~Serializable ()
 
void serializeSection (CheckpointOut &cp, const char *name) const
 Serialize an object into a new section. More...
 
void serializeSection (CheckpointOut &cp, const std::string &name) const
 
void unserializeSection (CheckpointIn &cp, const char *name)
 Unserialize a child object. More...
 
void unserializeSection (CheckpointIn &cp, const std::string &name)
 
- Public Member Functions inherited from Drainable
DrainState drainState () const
 Return the current drain state of an object. More...
 
virtual void notifyFork ()
 Notify a child process of a fork. More...
 
- Public Member Functions inherited from Stats::Group
 Group (Group *parent, const char *name=nullptr)
 Construct a new statistics group. More...
 
virtual ~Group ()
 
virtual void regStats ()
 Callback to set stat parameters. More...
 
virtual void resetStats ()
 Callback to reset stats. More...
 
virtual void preDumpStats ()
 Callback before stats are dumped. More...
 
void addStat (Stats::Info *info)
 Register a stat with this group. More...
 
const std::map< std::string, Group * > & getStatGroups () const
 Get all child groups associated with this object. More...
 
const std::vector< Info * > & getStats () const
 Get all stats associated with this object. More...
 
void addStatGroup (const char *name, Group *block)
 Add a stat block as a child of this block. More...
 
const Info * resolveStat (std::string name) const
 Resolve a stat by its name within this group. More...
 
 Group ()=delete
 
 Group (const Group &)=delete
 
Group & operator= (const Group &)=delete
 
- Public Member Functions inherited from Clocked
void updateClockPeriod ()
 Update the tick to the current tick. More...
 
Tick clockEdge (Cycles cycles=Cycles(0)) const
 Determine the tick when a cycle begins, by default the current one, but the argument also enables the caller to determine a future cycle. More...
 
Cycles curCycle () const
 Determine the current cycle, corresponding to a tick aligned to a clock edge. More...
 
Tick nextCycle () const
 Based on the clock of the object, determine the start tick of the first cycle that is at least one cycle in the future. More...
 
uint64_t frequency () const
 
Tick clockPeriod () const
 
double voltage () const
 
Cycles ticksToCycles (Tick t) const
 
Tick cyclesToTicks (Cycles c) const
 

Private Member Functions

void initABI (HSAQueueEntry *task)
 The CP is responsible for traversing all HSA-ABI-related data structures from memory and initializing the ABI state. More...
 

Private Attributes

Shader * _shader
 
GPUDispatcher & dispatcher
 

Additional Inherited Members

- Static Public Member Functions inherited from SimObject
static void serializeAll (CheckpointOut &cp)
 Serialize all SimObjects in the system. More...
 
static SimObject * find (const char *name)
 Find the SimObject with the given name and return a pointer to it. More...
 
- Static Public Member Functions inherited from Serializable
static const std::string & currentSection ()
 Gets the fully-qualified name of the active section. More...
 
static void serializeAll (const std::string &cpt_dir)
 Serializes all the SimObjects. More...
 
static void unserializeGlobals (CheckpointIn &cp)
 
- Public Attributes inherited from ClockedObject
PowerState * powerState
 
- Protected Types inherited from HSADevice
typedef void(DmaDevice::* DmaFnPtr) (Addr, int, Event *, uint8_t *, Tick)
 
- Protected Member Functions inherited from HSADevice
void dmaVirt (DmaFnPtr, Addr host_addr, unsigned size, DmaCallback *cb, void *data, Tick delay=0)
 
void translateOrDie (Addr vaddr, Addr &paddr)
 HSADevices will perform DMA operations on VAs, and because page faults are not currently supported for HSADevices, we must be able to find the pages mapped for the process. More...
 
- Protected Member Functions inherited from Drainable
 Drainable ()
 
virtual ~Drainable ()
 
virtual void drainResume ()
 Resume execution after a successful drain. More...
 
void signalDrainDone () const
 Signal that an object is drained. More...
 
- Protected Member Functions inherited from Clocked
 Clocked (ClockDomain &clk_domain)
 Create a clocked object and set the clock domain based on the parameters. More...
 
 Clocked (Clocked &)=delete
 
Clocked & operator= (Clocked &)=delete
 
virtual ~Clocked ()
 Virtual destructor due to inheritance. More...
 
void resetClock () const
 Reset the object's clock using the current global tick value. More...
 
virtual void clockPeriodUpdated ()
 A hook subclasses can implement so they can do any extra work that's needed when the clock rate is changed. More...
 
- Protected Attributes inherited from HSADevice
HSAPacketProcessor * hsaPP
 
- Protected Attributes inherited from DmaDevice
DmaPort dmaPort
 
- Protected Attributes inherited from PioDevice
System * sys
 
PioPort< PioDevice > pioPort
 The pioPort that handles the requests for us and provides us requests that it sees. More...
 
- Protected Attributes inherited from SimObject
const SimObjectParams * _params
 Cached copy of the object parameters. More...
 
- Protected Attributes inherited from EventManager
EventQueue * eventq
 A pointer to this object's event queue. More...
 

Detailed Description

Definition at line 57 of file gpu_command_processor.hh.

Member Typedef Documentation

◆ Params

typedef GPUCommandProcessorParams GPUCommandProcessor::Params

Definition at line 60 of file gpu_command_processor.hh.

Constructor & Destructor Documentation

◆ GPUCommandProcessor() [1/2]

GPUCommandProcessor::GPUCommandProcessor ( )
delete

◆ GPUCommandProcessor() [2/2]

GPUCommandProcessor::GPUCommandProcessor ( const Params * p)

Definition at line 43 of file gpu_command_processor.cc.

References dispatcher, and GPUDispatcher::setCommandProcessor().

Member Function Documentation

◆ dispatchPkt()

void GPUCommandProcessor::dispatchPkt ( HSAQueueEntry * task)

Once the CP has finished extracting all relevant information about a task and has initialized the ABI state, we send a description of the task to the dispatcher.

The dispatcher will create and dispatch WGs to the CUs.

Definition at line 176 of file gpu_command_processor.cc.

References GPUDispatcher::dispatch(), and dispatcher.

Referenced by GPUCommandProcessor::MQDDmaEvent::process().
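
The hand-off described for dispatchPkt() amounts to forwarding the prepared task to the dispatcher. A minimal sketch follows. The `*Sketch` types are hypothetical stand-ins, and only the call shape (dispatchPkt() calling dispatch() on a dispatcher reference) is taken from this page.

```cpp
#include <cassert>
#include <vector>

// Hypothetical stand-in for HSAQueueEntry; the real class carries far more state.
struct HSAQueueEntrySketch {
    int queueId;
};

// Hypothetical stand-in for GPUDispatcher; it only records dispatched tasks.
struct GPUDispatcherSketch {
    std::vector<HSAQueueEntrySketch *> pending;
    void dispatch(HSAQueueEntrySketch *task) { pending.push_back(task); }
};

class CommandProcessorSketch
{
  public:
    explicit CommandProcessorSketch(GPUDispatcherSketch &d) : dispatcher(d) {}

    // Forward the fully initialized task; WG creation happens in the dispatcher.
    void dispatchPkt(HSAQueueEntrySketch *task) { dispatcher.dispatch(task); }

  private:
    GPUDispatcherSketch &dispatcher;  // held by reference, like the documented member
};
```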

◆ getAddrRanges()

AddrRangeList GPUCommandProcessor::getAddrRanges ( ) const
override virtual

Every PIO device is obliged to provide an implementation that returns the address ranges the device responds to.

Returns
a list of non-overlapping address ranges

Implements PioDevice.

Definition at line 207 of file gpu_command_processor.cc.
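
The "non-overlapping" requirement on the returned list can be checked mechanically. The sketch below uses a hypothetical half-open range type rather than gem5's AddrRange, and says nothing about which ranges this device actually returns.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

using Addr = uint64_t;  // stand-in for gem5's Addr type

// Hypothetical half-open range [start, end); gem5's AddrRange is richer.
struct RangeSketch {
    Addr start;
    Addr end;
};

// Returns true when no two ranges in the list share an address.
bool nonOverlapping(std::vector<RangeSketch> ranges)
{
    std::sort(ranges.begin(), ranges.end(),
              [](const RangeSketch &a, const RangeSketch &b) {
                  return a.start < b.start;
              });
    for (std::size_t i = 1; i < ranges.size(); ++i) {
        if (ranges[i].start < ranges[i - 1].end)
            return false;
    }
    return true;
}
```

An empty list trivially satisfies the check, as do ranges that merely touch at a boundary.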

◆ initABI()

void GPUCommandProcessor::initABI ( HSAQueueEntry * task)
private

The CP is responsible for traversing all HSA-ABI-related data structures from memory and initializing the ABI state.

Information provided by the MQD, AQL packet, and code object metadata will be used to initialize register file state.

Definition at line 188 of file gpu_command_processor.cc.

References HSADevice::dmaReadVirt(), HSAPacketProcessor::getQueueDesc(), HSAQueueDescriptor::hostReadIndexPtr, HSADevice::hsaPP, and HSAQueueEntry::queueId().

Referenced by submitDispatchPkt().
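
The class summary says the field read by ReadDispIdOffsetDmaEvent sits directly after the read pointer and is used to find the MQD base address. The sketch below shows one way that lookup could work on a trimmed-down, hypothetical queue layout. Subtracting the offset from the read pointer's address is an assumption inferred from the field name, and a plain memory read stands in for the DMA.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>

// Hypothetical, trimmed-down layout; the real amd_hsa_queue_t has more fields.
struct QueueSketch {
    uint64_t other_fields[4];
    uint64_t read_dispatch_id;                         // the read pointer
    uint32_t read_dispatch_id_field_base_byte_offset;  // follows directly after
};

// Given the address of read_dispatch_id, recover the base of the descriptor.
// Assumption: the offset field holds the distance from the base to the read pointer.
uintptr_t mqdBaseSketch(uintptr_t read_ptr_addr)
{
    uint32_t offset = 0;
    const void *field =
        reinterpret_cast<const void *>(read_ptr_addr + sizeof(uint64_t));
    std::memcpy(&offset, field, sizeof(offset));  // stands in for the DMA read
    return read_ptr_addr - offset;
}
```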

◆ read()

Tick GPUCommandProcessor::read ( PacketPtr  pkt)
inline override virtual

Pure virtual function that the device must implement.

Called when a read command is received by the port.

Parameters
pkt  Packet describing this request
Returns
number of ticks it took to complete

Implements PioDevice.

Definition at line 75 of file gpu_command_processor.hh.

◆ setShader()

void GPUCommandProcessor::setShader ( Shader * shader)

Definition at line 214 of file gpu_command_processor.cc.

References _shader, and shader().

◆ shader()

Shader * GPUCommandProcessor::shader ( )

Definition at line 220 of file gpu_command_processor.cc.

References _shader.

Referenced by setShader().

◆ submitDispatchPkt()

void GPUCommandProcessor::submitDispatchPkt ( void *  raw_pkt,
uint32_t  queue_id,
Addr  host_pkt_addr 
)
override virtual

submitDispatchPkt() is the entry point into the CP from the HSAPP and is only meant to be used with AQL kernel dispatch packets.

After the HSAPP receives and extracts an AQL packet, it sends it to the CP, which is responsible for gathering all relevant information about a task, initializing CU state, and sending it to the dispatcher for WG creation and dispatch.

First we need to capture all information from the AQL packet and the code object, then store it in an HSAQueueEntry. Once the packet and code are extracted, we extract information from the queue descriptor that the CP needs to perform state initialization on the CU. Finally we call dispatch() to send the task to the dispatcher. When the task completely finishes, we call finishPkt() on the HSA packet processor in order to remove the packet from the queue, and notify the runtime that the task has completed.

We need to read a pointer in the application's address space to pull out the kernel code descriptor.

The kernel_object is a pointer to the machine code, whose entry point is an 'amd_kernel_code_t' type, which is included in the kernel binary, and describes various aspects of the kernel. The desired entry is the 'kernel_code_entry_byte_offset' field, which provides the byte offset (positive or negative) from the address of the amd_kernel_code_t to the start of the machine instructions.
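
The entry-point calculation described above is a signed offset added to the descriptor address. A minimal sketch, assuming a 64-bit signed offset and using a hypothetical one-field stand-in for amd_kernel_code_t:

```cpp
#include <cassert>
#include <cstdint>

using Addr = uint64_t;  // stand-in for gem5's Addr type

// Hypothetical stand-in holding only the field discussed here.
struct KernelCodeSketch {
    int64_t kernel_code_entry_byte_offset;  // may be positive or negative
};

// Address of the first machine instruction, relative to the descriptor address.
Addr entryPointSketch(Addr kernel_object, const KernelCodeSketch &akc)
{
    // Unsigned wraparound makes the addition correct for negative offsets too.
    return kernel_object + static_cast<Addr>(akc.kernel_code_entry_byte_offset);
}
```

For a descriptor at 0x1000, an offset of 256 yields 0x1100 and an offset of -256 yields 0xF00.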

BLIT kernels don't have symbol names. BLIT kernels are built-in compute kernels issued by ROCm to handle DMAs for dGPUs when the SDMA hardware engines are unavailable or explicitly disabled. They can also be used to do copies that ROCm thinks would be better performed by the shader than by the SDMA engines. They are also sometimes used on APUs to implement asynchronous memcopy operations from 2 pointers in host memory. I have no idea what BLIT stands for.

Reimplemented from HSADevice.

Definition at line 67 of file gpu_command_processor.cc.

References HSAQueueEntry::codeAddr(), _hsa_dispatch_packet_s::completion_signal, DPRINTF, _hsa_dispatch_packet_s::grid_size_x, _hsa_dispatch_packet_s::grid_size_y, _hsa_dispatch_packet_s::grid_size_z, initABI(), _hsa_dispatch_packet_s::kernarg_address, AMDKernelCode::kernel_code_entry_byte_offset, _hsa_dispatch_packet_s::kernel_object, HSAQueueEntry::numScalarRegs(), HSAQueueEntry::numVectorRegs(), AMDKernelCode::runtime_loader_kernel_symbol, PioDevice::sys, System::threads, _hsa_dispatch_packet_s::workgroup_size_x, _hsa_dispatch_packet_s::workgroup_size_y, and _hsa_dispatch_packet_s::workgroup_size_z.
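
The ordering implied by the cross-references on this page (submitDispatchPkt() calls initABI(), and dispatchPkt() is reached from MQDDmaEvent::process()) can be sketched as a deferred callback chain. The code below is a simplified model: a callback queue stands in for DMA completion events, the two DMA steps are collapsed into one, and every name ending in `Sketch` is hypothetical.

```cpp
#include <cassert>
#include <functional>
#include <queue>
#include <string>
#include <utility>
#include <vector>

struct FlowSketch {
    std::vector<std::string> log;                    // records the order of steps
    std::queue<std::function<void()>> dmaCallbacks;  // stands in for DMA events

    void submitDispatchPkt(int queue_id)
    {
        log.push_back("capture AQL packet and code object");
        initABI(queue_id);
    }

    void initABI(int queue_id)
    {
        log.push_back("start DMA read of queue descriptor");
        // dispatchPkt() runs only once the (simulated) DMA completes.
        dmaCallbacks.push([this, queue_id] { dispatchPkt(queue_id); });
    }

    void dispatchPkt(int queue_id)
    {
        log.push_back("dispatch task from queue " + std::to_string(queue_id));
    }

    // Deliver all outstanding DMA completions.
    void drainDma()
    {
        while (!dmaCallbacks.empty()) {
            std::function<void()> cb = std::move(dmaCallbacks.front());
            dmaCallbacks.pop();
            cb();
        }
    }
};
```

The point of the sketch is that submitDispatchPkt() returns before the task is dispatched; the dispatch happens later, when the descriptor read completes.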

◆ submitVendorPkt()

void GPUCommandProcessor::submitVendorPkt ( void *  raw_pkt,
uint32_t  queue_id,
Addr  host_pkt_addr 
)
override virtual

submitVendorPkt() is for accepting vendor-specific packets from the HSAPP.

Vendor-specific packets may be used by the runtime to send commands to the HSA device that are specific to a particular vendor. The vendor-specific packets should be defined by the vendor in the runtime. TODO: For now we simply tell the HSAPP to finish the packet; a future patch will update this method to provide the proper handling of any required vendor-specific packets. In the version of ROCm that is currently supported (1.6), the runtime will send packets that direct the CP to invalidate the GPU's caches. We do this automatically on each kernel launch in the CU, so this is safe for now.

Reimplemented from HSADevice.

Definition at line 163 of file gpu_command_processor.cc.

References HSAPacketProcessor::finishPkt(), and HSADevice::hsaPP.
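
The current behavior described for submitVendorPkt() (acknowledge the packet without acting on it) can be sketched as below. The recorder type and the finishPkt() parameter list are assumptions made for illustration; only the fact that finishPkt() is called on the HSA packet processor comes from this page.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

using Addr = uint64_t;  // stand-in for gem5's Addr type

// Hypothetical stand-in for the HSA packet processor; it only records calls.
struct PacketProcessorSketch {
    std::vector<uint32_t> finishedQueues;

    // Assumed signature: the packet and the queue it came from.
    void finishPkt(void *pkt, uint32_t queue_id)
    {
        (void)pkt;
        finishedQueues.push_back(queue_id);
    }
};

struct VendorPathSketch {
    PacketProcessorSketch &hsaPP;

    // No vendor-specific handling yet: mark the packet finished right away.
    void submitVendorPkt(void *raw_pkt, uint32_t queue_id, Addr host_pkt_addr)
    {
        (void)host_pkt_addr;  // unused in this sketch
        hsaPP.finishPkt(raw_pkt, queue_id);
    }
};
```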

◆ system()

System * GPUCommandProcessor::system ( )

Definition at line 201 of file gpu_command_processor.cc.

References PioDevice::sys.

Referenced by GPUDispatcher::notifyWgCompl().

◆ write()

Tick GPUCommandProcessor::write ( PacketPtr  pkt)
inline override virtual

Pure virtual function that the device must implement.

Called when a write command is received by the port.

Parameters
pkt  Packet describing this request
Returns
number of ticks it took to complete

Implements PioDevice.

Definition at line 74 of file gpu_command_processor.hh.

Member Data Documentation

◆ _shader

Shader* GPUCommandProcessor::_shader
private

Definition at line 80 of file gpu_command_processor.hh.

Referenced by setShader(), and shader().

◆ dispatcher

GPUDispatcher& GPUCommandProcessor::dispatcher
private

Definition at line 81 of file gpu_command_processor.hh.

Referenced by dispatchPkt(), and GPUCommandProcessor().


The documentation for this class was generated from the following files:

gpu_command_processor.hh
gpu_command_processor.cc

Generated on Wed Sep 30 2020 14:02:25 for gem5 by doxygen 1.8.17