U.S. patent application number 17/126988 was published by the patent office on 2022-06-23 for compilation framework for hardware configuration generation.
The applicant listed for this patent is Red Hat, Inc. The invention is credited to Ulrich Drepper and Ahmed Sanaullah.
Application Number: 20220197616 (Appl. No. 17/126988)
Filed: December 18, 2020
United States Patent Application 20220197616
Kind Code: A1
Drepper; Ulrich; et al.
June 23, 2022
COMPILATION FRAMEWORK FOR HARDWARE CONFIGURATION GENERATION
Abstract
Systems and methods for supporting a compilation framework for
hardware configuration generation. In one implementation, a
processing device executing a compilation workflow for a target
hardware architecture may receive an input source code associated with a
configuration for the target hardware architecture; identify a
first configuration information associated with at least a portion
of the configuration for the target hardware architecture; perform
a first set of compilation passes of the input source code in view
of the first configuration information; generate, in view of the
first set of compilation passes of the input source code, a first
internal representation component for the target hardware
architecture; and generate, in view of the first internal
representation component, a binary bitstream component for the
target hardware architecture.
Inventors: Drepper; Ulrich (Grasbrunn, DE); Sanaullah; Ahmed (Boston, MA)
Applicant: Red Hat, Inc., Raleigh, NC, US
Appl. No.: 17/126988
Filed: December 18, 2020
International Class: G06F 8/41 20060101 G06F008/41; G06N 20/00 20060101 G06N020/00
Claims
1. A method comprising: receiving, by a processing device executing
a compilation workflow for a target hardware architecture, an input
source code associated with a configuration for the target hardware
architecture; identifying a first configuration information
associated with at least a portion of the configuration for the
target hardware architecture; performing a first set of compilation
passes of the input source code in view of the first configuration
information; generating, in view of the first set of compilation
passes of the input source code, a first internal representation
component for the target hardware architecture; and generating, in
view of the first internal representation component, a binary
bitstream component for the target hardware architecture.
2. The method of claim 1, further comprising: generating, in view
of the first set of compilation passes, a second configuration
information associated with at least an additional portion of the
configuration for the target hardware architecture.
3. The method of claim 2, further comprising: performing a second
set of compilation passes of the first internal representation
component in view of at least one of the first configuration
information or the second configuration information; and
generating, in view of the second set of compilation passes of the
first internal representation component, a second internal
representation component for the target hardware architecture.
4. The method of claim 2, wherein the second configuration
information comprises an algorithmic representation of a
functionality provided by the first internal representation
component, the method further comprising: determining a performance
metric associated with the algorithmic representation of the
functionality in the second configuration information in view of
the target hardware architecture; and responsive to determining
that the performance metric satisfies a threshold, performing a
second set of compilation passes of the first internal
representation component in view of at least one of the first
configuration information or the second configuration
information.
5. The method of claim 2, wherein the second configuration
information comprises an algorithmic representation of a
functionality provided by the first internal representation
component, the method further comprising: determining a performance
metric associated with the algorithmic representation of the
functionality in the second configuration information in view of
the target hardware architecture; and responsive to determining
that the performance metric does not satisfy a threshold, bypassing
performing an additional set of compilation passes of the first
internal representation component.
6. The method of claim 1, wherein the first configuration
information comprises an algorithmic representation of a
functionality associated with the target hardware
configuration.
7. The method of claim 1, wherein the first configuration
information comprises data representing a machine learning model
trained to perform at least one of determining whether to identify
an additional configuration information, or determining a number of
compilation passes in the set of compilation passes.
8. A system comprising: a memory; and a processing device
operatively coupled to the memory, the processing device to:
receive, using a compilation workflow for a target hardware
architecture, an input source code associated with a configuration
for the target hardware architecture; retrieve a first
configuration information generated by a previous execution of the
compilation workflow, wherein the first configuration information
is associated with at least a portion of the configuration for the
target hardware architecture; perform a first set of compilation
passes of the input source code in view of the first configuration
information from the previous execution of the compilation
workflow; generate, in view of the first set of compilation passes
of the input source code, a first internal representation component
for the target hardware architecture; and generate, in view of the
first internal representation component, a binary bitstream
component for the target hardware architecture.
9. The system of claim 8, wherein the configuration information
comprises an algorithmic representation of a functionality provided
by the first internal representation component, the processing
device further to: determine a performance metric associated with
the algorithmic representation of the functionality in the
configuration information in view of the target hardware
architecture; and responsive to determining that the performance
metric satisfies a threshold, perform a second set of compilation
passes of the first internal representation component in view of at
least one of the first configuration information or the second
configuration information.
10. The system of claim 8, wherein the configuration information
comprises an algorithmic representation of a functionality provided
by the first internal representation component, the processing
device further to: determine a performance metric associated with
the algorithmic representation of the functionality in the
configuration information in view of the target hardware
architecture; and responsive to determining that the performance
metric does not satisfy a threshold, bypass performing an
additional set of compilation passes of the first internal
representation component.
11. The system of claim 8, wherein the configuration information
comprises information associated with a previous version of the
input source code.
12. The system of claim 11, wherein the processing device is
further to: determine a set of differences between the input source
code and the previous version of the input source code; determine
whether the set of differences between the input source code and
the previous version of the input source code satisfies a
threshold; and responsive to determining that the set of
differences satisfies the threshold, perform an additional set of
compilation passes of the input source code.
13. The system of claim 11, wherein the processing device is
further to: determine a set of differences between the input source
code and the previous version of the input source code; determine
whether the set of differences between the input source code and
the previous version of the input source code satisfies a
threshold; and responsive to determining that the set of
differences does not satisfy the threshold, bypass performing an additional
set of compilation passes of the first internal representation
component.
14. A non-transitory computer-readable media storing instructions
that, when executed, cause a processing device to: receive an input
source code associated with a configuration for a target hardware
architecture; identify a first configuration information associated
with at least a portion of the configuration for the target
hardware architecture; perform a first set of compilation passes of
the input source code in view of the first configuration
information; generate, in view of the first set of compilation
passes of the input source code, a first internal representation
component for the target hardware architecture; and generate, in
view of the first internal representation component, a binary
bitstream component for the target hardware architecture.
15. The non-transitory computer-readable media of claim 14, wherein
the processing device is further to: generate, in view of the first
set of compilation passes, a second configuration information
associated with at least an additional portion of the configuration
for the target hardware architecture.
16. The non-transitory computer-readable media of claim 15, wherein
the processing device is further to: perform a second set of
compilation passes of the first internal representation component
in view of at least one of the first configuration information or
the second configuration information; and generate, in view of the
second set of compilation passes of the first internal
representation component, a second internal representation
component for the target hardware architecture.
17. The non-transitory computer-readable media of claim 15, wherein
the second configuration information comprises an algorithmic
representation of a functionality provided by the first internal
representation component, and wherein the processing device is
further to: determine a performance metric associated with the
algorithmic representation of the functionality in the second
configuration information in view of the target hardware
architecture; and responsive to determining that the performance
metric satisfies a threshold, perform a second set of compilation
passes of the first internal representation component in view of at
least one of the first configuration information or the second
configuration information.
18. The non-transitory computer-readable media of claim 15, wherein
the second configuration information comprises an algorithmic
representation of a functionality provided by the first internal
representation component, and wherein the processing device is
further to: determine a performance metric associated with the
algorithmic representation of the functionality in the second
configuration information in view of the target hardware
architecture; and responsive to determining that the performance
metric does not satisfy a threshold, bypass performing an
additional set of compilation passes of the first internal
representation component.
19. The non-transitory computer-readable media of claim 14, wherein
the first configuration information comprises an algorithmic
representation of a functionality associated with the target
hardware configuration.
20. The non-transitory computer-readable media of claim 14, wherein
the first configuration information comprises data representing a
machine learning model trained to perform at least one of
determining whether to identify an additional configuration
information, or determining a number of compilation passes in the
set of compilation passes.
Description
TECHNICAL FIELD
[0001] The present disclosure is generally related to computer
systems, and more particularly, to a compilation framework for
hardware configuration generation in computer systems.
BACKGROUND
[0002] A field-programmable gate array (FPGA) is an integrated
circuit designed to be configured by a customer or a designer after
manufacturing. The FPGA configuration is typically specified using
a Hardware Description Language (HDL), similar to that used for an
application-specific integrated circuit (ASIC). FPGA configuration
can be managed using electronic design automation tools that work
together in a design workflow that chip designers can use to design
and analyze entire semiconductor chips.
BRIEF DESCRIPTION OF THE DRAWINGS
[0003] The present disclosure is illustrated by way of example, and
not by way of limitation, and can be more fully understood with
reference to the following detailed description when considered in
connection with the figures in which:
[0004] FIG. 1 depicts a high-level diagram of an example system
architecture operating in accordance with one or more aspects of
the present disclosure.
[0005] FIG. 2 depicts an illustration of a hardware configuration
generation workflow that utilizes configuration data, in accordance
with one or more aspects of the present disclosure.
[0006] FIG. 3 depicts an illustration of a hardware configuration
generation workflow that utilizes configuration data from a
different workflow, in accordance with one or more aspects of the
present disclosure.
[0007] FIG. 4 depicts a block diagram illustrating an example of a
hardware configuration generator component for facilitating a
compilation framework for hardware configuration generation, in
accordance with one or more aspects of the present disclosure.
[0008] FIG. 5 depicts a block diagram illustrating an example of a
hardware configuration generator component for facilitating a
compilation framework for hardware configuration generation using
information generated by previously executed compilation workflows,
in accordance with one or more aspects of the present
disclosure.
[0009] FIG. 6 is a flow diagram of an example method of
facilitating a compilation framework for hardware configuration
generation, in accordance with one or more aspects of the present
disclosure.
[0010] FIG. 7 is a flow diagram of an example method of
facilitating a compilation framework for hardware configuration
generation using information generated by previously executed
compilation workflows, in accordance with one or more aspects of
the present disclosure.
[0011] FIG. 8 depicts a block diagram of an illustrative computer
system operating in accordance with one or more aspects of the
present disclosure.
DETAILED DESCRIPTION
[0012] Described herein are methods and systems for facilitating a
compilation framework for hardware configuration generation.
[0013] In conventional hardware, integrated circuits are
implemented on a chip. With these types of implementations, the
circuits are baked into the chip, and therefore are typically fixed
after manufacturing. With these chips, a user can develop software
that loads onto a chip and executes functions. With an FPGA, the
user programs the hardware circuits, which can involve one or more
complex functionalities that can act as a comprehensive processor.
FPGAs can contain an array of programmable logic blocks, and a
hierarchy of configurable connections that allow the blocks to be
coupled or "wired" together. Logic blocks can be configured to
perform various functions, including, but not limited to, complex
combinational functions, sequential logic, buffers, phase-locked
loops (PLLs) with digital and analog components, or simple logic
gates such as AND gates and XOR gates.
[0014] FPGAs can be valuable computing platforms for a variety of
implementations, including both data center and edge computing
platforms. These types of hardware implementations can provide an
effective blend of high performance and a low power profile coupled
with generalized applicability that cannot be readily achieved with
alternative technologies. In conventional systems, hardware
configuration generation can be performed by starting with high
level language (HLL) code (e.g., C, C++, Python, etc.). This code
can then be translated into a target/vendor specific hardware
description language (HDL) code, a process referred to as High
Level Synthesis (HLS). Alternatively, a developer can start
directly with HDL code. The HDL code is converted into a logical
netlist through a process called Synthesis. A logical netlist can
specify the various FPGA components used to construct the circuit
(e.g., look up tables, flip flops, etc.) and how these components
are connected. The logical netlist can then be converted to a
physical netlist using a process called Place and Route, which
involves mapping components in the logical netlist to physical
resources on the target architecture (e.g., target chip structure),
and using the target's reprogrammable interconnect to couple these
components together. The physical netlist is then converted to a
binary code using a process called bitstream generation, which maps
the physical netlist to the specific bits in the chip's
configuration memory that are to be set in order to create and
initialize the target circuit.
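The conventional four-phase pipeline described above can be sketched as a chain of transformations. This is a minimal illustration only; every function body and data structure here is a stand-in assumption, not an actual tool implementation:

```python
# Illustrative sketch of the conventional FPGA compilation pipeline:
# HLS -> Synthesis -> Place and Route -> bitstream generation.
# All bodies and structures are simplified assumptions.

def high_level_synthesis(hll_code: str) -> str:
    """Translate high-level language (HLL) code into HDL code."""
    return f"// HDL generated from HLL\n{hll_code}"

def synthesis(hdl_code: str) -> dict:
    """Convert HDL into a logical netlist: FPGA components plus connections."""
    return {"components": ["LUT0", "FF0"], "connections": [("LUT0", "FF0")]}

def place_and_route(logical_netlist: dict) -> dict:
    """Map logical components onto physical resources of the target chip."""
    placement = {c: f"site_{i}" for i, c in enumerate(logical_netlist["components"])}
    return {"placement": placement, "routes": logical_netlist["connections"]}

def bitstream_generation(physical_netlist: dict) -> bytes:
    """Encode the physical netlist as configuration-memory bits."""
    return repr(physical_netlist).encode()

def conventional_workflow(hll_code: str) -> bytes:
    hdl = high_level_synthesis(hll_code)
    logical = synthesis(hdl)
    physical = place_and_route(logical)
    return bitstream_generation(physical)

bitstream = conventional_workflow("int add(int a, int b) { return a + b; }")
```

Each function here corresponds to one disjoint phase; the deficiencies discussed next stem from these phases not sharing information with each other.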
[0015] While conventional systems that implement tools in this way
provide the ability for developers to implement circuitry for FPGAs
based on HLL or HDL source, these types of frameworks have many
deficiencies and lag far behind those tools available for software
development. In particular, the three phases mentioned above are
disjointed and do not share information readily with each other,
leading to inefficiency in the compilation process as well as a
lack of portability of both the individual compilation stages
(e.g., tools) and the objects/binaries/internal representations
generated by each stage. Additionally, the amount of reuse of
derived information (e.g., using the same source construct in
multiple stages) is severely limited, leading to increases in
turn-around time for development and optimization iterations as
well as low performance of the resulting binary. Moreover, since
information is not shared readily between the phases, between the
different tools within a phase, or between different executions of
the phases, the ability to control the compilation process as well
as the ability to customize the workflow are both severely
limited.
[0016] Aspects of the present disclosure address the above noted
and other deficiencies by implementing technology to facilitate a
compilation framework for hardware configuration generation. In
various implementations, aspects of the present disclosure bridge
the gap between hardware and software development tools by
configuring a compilation workflow that leverages software
development techniques. The workflow can be configured as a series
of compilation passes that operate using intermediate
representations and data interchange file formats to produce a
final binary bitstream. The compilation process incorporates
features of the three phases (Synthesis, High Level Synthesis, and
Place and Route) in a repeatable, reorderable and modifiable set of
compiler passes. Additionally, configuration information can be
injected into any of the workflow stages in order to direct the
compilation process of that stage to improve the efficiency of that
stage as well as the reliability of the object/internal
representation produced by the stage. This configuration
information can be provided by the output of previous compiler
stages within the same workflow as well as from previous executions
of the same or different workflows.
[0017] In one implementation, a hardware configuration generator
component can receive input source (HLL or HDL) associated with the
configuration of a target hardware architecture (e.g., FPGA
circuitry). The hardware configuration generator component can
identify configuration information associated with a portion of the
target hardware architecture. In various implementations, this
configuration information can include algorithmic representations
of functionality for the FPGA (or other architecture) circuitry as
well as machine learning models that can direct the compilation
passes of a particular compilation stage. The hardware
configuration generator component can then perform a set of
compilation passes of the input source using the injected
configuration information and generate an internal representation
for the workflow stage. Multiple workflow stages can be executed,
each receiving configuration information to direct that stage,
until a final binary bitstream component is generated for the FPGA
circuitry.
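The stage-by-stage flow described above can be sketched as a loop over compilation stages, each consuming injected configuration information and producing an internal representation for the next stage. The stage names, pass functions, and configuration store below are illustrative assumptions, not the actual framework:

```python
# Hedged sketch of the disclosed workflow: a sequence of compilation
# stages, each a set of passes that may consume injected configuration
# information and emit an internal representation. Names are illustrative.

def run_workflow(source: str, stages, config_store: dict) -> bytes:
    ir = source
    for stage_name, passes in stages:
        config = config_store.get(stage_name)  # injected configuration info
        for compile_pass in passes:
            ir = compile_pass(ir, config)
    return ir.encode()  # final binary bitstream component

# Two toy stages, each with a single pass.
def hls_pass(ir, config):
    prefix = config or ""  # configuration directs this pass when present
    return prefix + "IR(" + ir + ")"

def netlist_pass(ir, config):
    return "NETLIST(" + ir + ")"

stages = [("hls", [hls_pass]), ("synthesis", [netlist_pass])]
bitstream = run_workflow("matmul()", stages, {"hls": "opt:"})
```

Because the stages are ordinary entries in a list, they can be repeated, reordered, or removed, which is the property the disclosure emphasizes.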
[0018] In another implementation, a hardware configuration
generator component can receive an input source associated with the
configuration of a target hardware architecture (e.g., FPGA
circuitry). In this implementation, the hardware configuration
generator component can retrieve configuration information from a
previous workflow execution to direct one or more compilation
stages of the current workflow. In particular, this embodiment can
analyze differences between current versions of source, object,
binary, or internal representation components to previous versions
to identify changes, and configure the compilation passes and/or
workflow stages based on the determined differences. Additionally,
the hardware configuration generator component can link the source,
object, binary, or internal representations from a previously
executed workflow to a currently executing workflow, incorporating
design information to the current workflow.
[0019] Aspects of the present disclosure present advantages over
conventional solutions to the issues noted above. First, since the
entire compilation process is structured as repeatable,
reorderable, and modifiable compiler passes, the boundaries for the
compilation process are more fluid. For example, technology
independent optimization passes that are typically performed in the
Synthesis phase can now be performed after some Place and Route
passes which add technology specific information to an intermediate
representation. Additionally, the data and control flow graphs can
be optimized to bypass some unnecessary workflow stages, and can be
directly translated into a logical netlist without generating HDL.
The process of generating HDL is inefficient, since information
that is often needed by later stages of the workflow is typically
lost when creating HDL from HLL code (due to limitations and/or
abstractions of HDL). Thus, by connecting the different phases together
and injecting configuration information into various stages, more
information is preserved for reuse in later stages, which
dramatically improves performance of the overall compilation
process as well as the reliability and performance of the final
generated binary bitstream. Moreover, removing unnecessary workflow
stages can also speed up the process. Additionally, the ability to
reuse configuration information from the current or previous
workflows can reduce turnaround times for the compilation process.
[0020] FIG. 1 depicts an illustrative architecture of elements of a
host computer system 110, in accordance with an embodiment of the
present disclosure. It should be noted that other architectures for
host computer system 110 are possible, and that the implementation
of a computer system utilizing embodiments of the disclosure is
not necessarily limited to the specific architecture depicted by
FIG. 1. Host computer system 110 may comprise one or more
processors communicatively coupled to memory devices and
input/output (I/O) devices. Host computer system 110 runs a host
operating system (OS) 120, which can comprise software that manages
the hardware resources of the computer system and that provides
functions such as inter-process communication, scheduling, virtual
memory management, and so forth.
[0021] Host OS 120 may additionally include a hardware
configuration generator component 128 to facilitate a compilation
framework for hardware configuration generation. In some
implementations, hardware configuration generator component 128 can
receive an input source 135 and execute the compilation stages of
target architecture workflow 140 using the input source 135 to
generate binary bitstream component 160. As described in further
detail below, at various compilation stages of target architecture
workflow 140, hardware configuration generator 128 can receive
configuration data 150 to direct various aspects of the compilation
transformation passes executed during each stage of the workflow
140.
[0022] In some implementations, input source 135 can be source code
that includes instructions associated with the target architecture.
As noted above, in some implementations, the target architecture
can be an FPGA circuitry architecture. In other implementations,
the target architecture can be another hardware architecture (e.g.,
ASIC). In various implementations, the input source 135 can be HLL
source code, HDL source code, or the like. Hardware configuration
generator component 128 can detect the type of source code included
in input source 135 and execute the target architecture workflow
140 at an appropriate starting point compilation stage based on the
type of source code in input source 135. For example, if the input
source 135 is HLL, then hardware configuration generator component
128 can initiate an initial stage of a High Level Synthesis phase
of the workflow 140 to generate technology independent and/or
technology dependent internal representations. In contrast, if the
input source 135 is HDL, then the initial stages of the workflow
may be bypassed to proceed to a stage in a Synthesis phase of the
workflow that generates a logical netlist from the HDL code.
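Choosing the workflow's starting stage from the source type, as described above, amounts to a small dispatch: HLL enters at High Level Synthesis, while HDL skips ahead to Synthesis. The extension-based detection heuristic below is a stand-in assumption:

```python
# Sketch of selecting the starting compilation stage from the input
# source type. The file-extension heuristic is an illustrative assumption.
import os

HLL_EXTENSIONS = {".c", ".cpp", ".py"}   # high-level language sources
HDL_EXTENSIONS = {".v", ".vhd"}          # hardware description language sources

def starting_stage(filename: str) -> str:
    ext = os.path.splitext(filename)[1]
    if ext in HLL_EXTENSIONS:
        return "high_level_synthesis"
    if ext in HDL_EXTENSIONS:
        # Bypass the initial HLS stages; generate the logical netlist directly.
        return "synthesis"
    raise ValueError(f"unknown source type: {filename}")
```

In practice the detection could inspect the file contents rather than the name; only the two entry points matter for the workflow.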
[0023] At each compilation stage of the workflow 140, hardware
configuration generator component 128 can receive additional
configuration information (e.g., configuration data 150) to inform
and/or direct the compilation passes for that stage of the
workflow. In various implementations, configuration data 150 can
include algorithmic representations of functionality performed by
the target hardware architecture, information describing changes in
the source code or internal representation component in relation to
a previous version, or data representing a machine learning model
(e.g., an identifier of a machine learning model and configuration
parameters for the machine learning model) that can regulate the
compilation passes. Notably, in various implementations,
configuration data 150 can be configured in a self-describing
format such that any changes can be made dynamically without
reconfiguring and/or recompiling hardware configuration generator
component 128. Thus, new configuration data 150 can be added to the
process seamlessly without disrupting the execution of the
workflow.
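The self-describing property described above can be illustrated with tagged entries: because each entry names its own kind, a loader can accept new entry kinds without the generator being reconfigured or recompiled. The JSON schema and field names below are illustrative assumptions:

```python
# Sketch of self-describing configuration data: each entry carries a
# "kind" tag so new entries can be added without recompiling the
# generator. The schema is an illustrative assumption.
import json

config_blob = json.dumps([
    {"kind": "algorithmic_representation", "op": "matmul", "impl": "systolic"},
    {"kind": "ml_model", "model_id": "pass-scheduler-v1",
     "parameters": {"max_passes": 8}},
])

def load_config(blob: str):
    entries = json.loads(blob)
    # Dispatch on the self-describing "kind" tag; unknown kinds are
    # ignored, so new configuration data cannot disrupt the workflow.
    known = {"algorithmic_representation", "ml_model"}
    return [e for e in entries if e.get("kind") in known]

entries = load_config(config_blob)
```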
[0024] In some implementations, configuration data 150 can be
generated and stored separately from workflow 140. In other words,
configuration information associated with functionality that is
common across various workflows can be stored in configuration data
150 and used in multiple workflows. For example, logic associated
with a matrix multiplication operation can be stored in
configuration data 150 so that it can be accessed by any
compilation stage within any workflow. Additionally, configuration
data 150 can be generated as output by various stages of a workflow
that can be used by later stages of that workflow.
[0025] In some implementations, hardware configuration generator
component 128 can retrieve configuration data 150 generated by one
workflow to be used to direct the execution of a different workflow
or a subsequent execution of the same workflow. For example, a
subsequent execution of a particular workflow can access
configuration information that describes changes to source and/or
internal representations to control and direct the compiler passes
in a particular stage. For example, if a particular portion of
input source (or a particular portion of an internal representation
component generated by an intermediate workflow stage) includes
a significant amount of changes, then hardware configuration
generator component 128 can determine that additional compiler
passes shall be executed or additional stages of the workflow shall
be executed. Similarly, if the configuration data 150 indicates
minimal differences between the versions, hardware configuration
generator component 128 can reduce the number of compilation passes
and/or bypass subsequent compilation stages if the changes do not
have an effect on the result.
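The diff-driven control just described, comparing the current source against a previous version and deciding whether to perform additional passes or bypass stages, can be sketched as follows. The line-count metric and threshold value are illustrative assumptions:

```python
# Sketch of diff-driven pass scheduling: significant changes trigger
# additional compilation passes; minimal changes allow reuse of earlier
# output. The metric and threshold are illustrative assumptions.
import difflib

def changed_line_count(current: str, previous: str) -> int:
    diff = difflib.unified_diff(previous.splitlines(),
                                current.splitlines(), lineterm="")
    return sum(1 for line in diff
               if line.startswith(("+", "-"))
               and not line.startswith(("+++", "---")))

def should_recompile(current: str, previous: str, threshold: int = 2) -> bool:
    # True  -> perform additional compilation passes / workflow stages.
    # False -> bypass subsequent stages and reuse prior results.
    return changed_line_count(current, previous) >= threshold
```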
[0026] Hardware configuration generator component 128 is described
in further detail below with respect to FIGS. 2-5.
[0027] FIG. 2 depicts an illustration of a hardware configuration
generation workflow that utilizes configuration data, in accordance
with one or more aspects of the present disclosure. As shown in
FIG. 2, hardware configuration generator component 128 is invoked
to execute a workflow 200 that can include multiple compilation
phases to transform input source into a binary bitstream component.
It should be noted that for ease of illustration, workflow 200
depicts two compilation stages (compiler passes 201, 202) that
generate two internal representation components within the workflow
200 (internal representation 210, logical netlist 211), but in
other implementations workflow 200 can include more compilation
stages that produce additional internal representation components
within the workflow to produce a resulting binary bitstream
component for a target hardware architecture (e.g., FPGA
circuitry).
[0028] As shown in FIG. 2, hardware configuration generator
component 128 can receive input source 135 and initiate workflow
200. In an illustrative example, input source 135 can include HLL
source code, which can cause hardware configuration generator
component 128 to begin execution of the workflow 200 from the
beginning of the High Level Synthesis phase to generate an internal
representation (IR) 210 (e.g., a technology independent IR,
technology dependent IR, etc.). As shown, the hardware configuration
generator component can receive configuration data 230 as input to
the first compilation stage (compiler passes 201) to generate the
IR 210. As described above, the configuration data 230 can be a
self-describing data file that can include an algorithmic
representation of a functionality associated with the target
hardware configuration, data representing a machine learning model,
or other information that can be used to direct the execution of
compiler passes 201.
[0029] The execution of the first compilation stage 201 can
generate internal representation 210 which is forwarded to the next
stage of the workflow (e.g., compiler passes 202). As shown,
compiler passes 201 can also generate additional configuration data
(e.g., configuration data 231) that can be used by subsequent
compilation stages within workflow 200. As shown, configuration
data 231 is used as additional input to the next compilation stage
of workflow 200. Although, for simplicity, configuration data 231 is
shown as input to the next stage in the workflow, in other
implementations, configuration data 231 can be used in stages of
workflow 200 that occur later in the workflow. Alternatively,
configuration data 231 may not be used again in workflow 200.
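A stage that emits both an internal representation and additional configuration data for downstream stages, as in the paragraph above, can be sketched as a function with two outputs. The stage name and the emitted hints are hypothetical:

```python
# Sketch of a compilation stage that produces both a new internal
# representation and configuration data for later stages. The emitted
# "hints" (e.g., an unroll factor) are hypothetical examples.

def compiler_passes_stage(ir: str):
    new_ir = "IR2(" + ir + ")"
    # Facts derived during this stage that later stages can reuse.
    emitted_config = {"stage": "compiler_passes_201",
                      "hints": {"unroll_factor": 4}}
    return new_ir, emitted_config

ir2, cfg = compiler_passes_stage("IR1(src)")
```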
[0030] As shown, the second compilation stage (compiler passes 202)
generates logical netlist 211. As noted above, workflow 200 can
include additional stages after the generation of logical netlist
211 to generate a resulting binary bitstream component for the
target hardware architecture. In some implementations, hardware
configuration generator component 128 can initiate workflow 200 at
different starting point stages based on the type of input source
received. For example, input source 135 may be HLL source, in which
case hardware configuration generator 128 can initiate the workflow
at an earlier compilation stage. In contrast, hardware
configuration generator 128 can receive input source 136, and
initiate the workflow at a later stage if appropriate. For example,
if input source 136 is HDL code, then the generation of internal
representation 210 can be bypassed to execute compiler passes 202
for generation of logical netlist 211.
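The starting-point selection described above can be sketched as a small dispatch routine. This is an illustrative sketch only; the stage names and the crude HDL-detection heuristic are assumptions, not the claimed implementation.

```python
# Hypothetical workflow stages, ordered from earliest to latest.
STAGES = ["hll_to_ir", "ir_to_netlist", "netlist_to_bitstream"]

def select_start_stage(input_source: str) -> str:
    """Return the first workflow stage appropriate for the input type.

    HDL input can bypass internal representation generation and start
    at netlist generation; high level language input starts earlier.
    The startswith() check is a deliberately crude Verilog heuristic.
    """
    if input_source.strip().startswith("module"):
        return "ir_to_netlist"
    return "hll_to_ir"
```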
[0031] FIG. 3 depicts an illustration of a hardware configuration
generation workflow that utilizes configuration data from a
different workflow, in accordance with one or more aspects of the
present disclosure. As shown in FIG. 3, hardware configuration
generator component 128 can execute the various compilation stages of
workflows 300-A and 300-B, where configuration data generated by
workflow 300-A can direct the subsequent execution of workflow
300-B. As shown in FIG. 3, hardware configuration generator
component 128 can be invoked to execute workflows 300-A, 300-B that
can each include multiple compilation phases to transform input
source into a binary bitstream component. It should be noted that
for ease of illustration, workflows 300-A, 300-B each depict two
compilation stages (compiler passes 301-A, 302-A, 301-B, 302-B)
that generate corresponding internal representation components
within the workflows (internal representation 310-A, 310-B, logical
netlist 311-A, 311-B), but in other implementations workflows
300-A, 300-B can include more compilation stages that produce
additional internal representation components within the workflow
to produce a resulting binary bitstream component for a target
hardware architecture (e.g., FPGA circuitry).
[0032] As shown in FIG. 3, hardware configuration generator
component 128 receives input source 135-A and executes the
compilation stages of workflow 300-A in a similar manner as that
described above with respect to FIG. 2. As shown, compiler passes
302-A can generate configuration data 330 that can be used to
direct one or more compilation stages of workflow 300-B. In an
illustrative example, configuration data 330 can include
information that can be used to determine the differences between
input source 135-A and input source 135-B. Additionally or
alternatively, configuration data 330 can include information that
can be used to determine the differences between the generated
internal representation 310-A and internal representation 310-B. In
either instance, the configuration data 330 can be used within
various stages of workflow 300-B to direct the applicable
compilation stage (e.g., compiler passes 301-B, 302-B) to reduce
overhead and improve performance.
[0033] For example, if the differences between input source 135-A
and 135-B are significant, then configuration data 330 can direct
hardware configuration generator component 128 to execute additional
compiler passes during compiler passes 301-B. Similarly, if the
differences between the generated internal representation 310-A and
310-B are significant, then configuration data 330 can direct
hardware configuration generator component 128 to execute additional
compiler passes during compiler passes 302-B. In contrast, if in
either case the changes are not significant (e.g., the changes do
not cause a difference in the resulting binary bitstream), then
configuration data 330 can direct hardware configuration generator
component 128 to execute fewer compilation passes or bypass a
compilation stage entirely. Various examples of this determination
are described below with respect to FIG. 5.
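The pass-count adjustment described above can be sketched as a mapping from change significance to a number of compiler passes. The diff ratio, thresholds, and base pass count here are illustrative assumptions.

```python
def passes_for_stage(diff_ratio: float, base_passes: int = 4) -> int:
    """Map the significance of source differences to a pass count.

    diff_ratio is the fraction of changed content between the two
    inputs; the 5% threshold is an illustrative assumption.
    """
    if diff_ratio == 0.0:
        return 0  # no change: bypass the compilation stage entirely
    if diff_ratio < 0.05:
        return max(1, base_passes // 2)  # minor change: fewer passes
    return base_passes + 2  # significant change: additional passes
```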
[0034] FIG. 4 depicts a block diagram illustrating an example of a
hardware configuration generator component 410 for facilitating a
compilation framework for hardware configuration generation. In
some implementations, hardware configuration generator component
410 may correspond to hardware configuration generator component
128 of FIG. 1. As shown in FIG. 4, hardware configuration generator
component 410 may be a component of a computing apparatus 400 that
includes a processing device 405, operatively coupled to a memory
401, to execute hardware configuration generator component 410. In
some implementations,
processing device 405 and memory 401 may correspond to processing
device 802 and main memory 804 respectively as described below with
respect to FIG. 8. In various implementations, computing apparatus
400 can be a standalone host computing device that communicates
with various tenant computing devices. Alternatively, computing
apparatus 400 can be a component of a cloud computing environment
made up of multiple host computing devices, an Internet of Things
(IoT) edge device, or the like. In various implementations, the
computing apparatus 400 can be a physical device hosted by a
computer system. Alternatively, computing apparatus 400 can be a
host computing system that hosts a virtual device that includes the
hardware configuration generator component 410. As described in
further detail below, hardware configuration generator component
410 can be invoked by apparatus 400 to execute a compilation
workflow for a target hardware architecture.
[0035] Hardware configuration generator component 410 can include
source receiver 411, configuration identifier 412, compiler module
413, internal representation generator 414, configuration generator
415, and binary bitstream generator 416. Alternatively, the
functionality of one or more of source receiver 411, configuration
identifier 412, compiler module 413, internal representation
generator 414, configuration generator 415, and binary bitstream
generator 416 may be combined into a single module or divided into
multiple sub-modules.
[0036] Source receiver 411 is responsible for receiving an input
source code (e.g., input source 402) associated with a
configuration for a target hardware architecture. In various
implementations, input source code 402 can include high level
language code (e.g., C, C++, Python, etc.) that includes
instructions associated with functionalities to be performed by the
target hardware architecture. Alternatively, input source code 402
can include target and/or vendor specific Hardware Description
Language (HDL) code. As noted above, hardware configuration
generator component 410 can be invoked to execute a compilation
workflow that includes a series of transformations to convert the
received input source code 402 into a binary bitstream
representation for the target hardware architecture. In some
implementations, the target hardware architecture can include
field-programmable gate arrays (FPGA). Alternatively, other target
hardware architectures can be used.
[0037] In various implementations, source receiver 411 can identify
attributes of the source code 402 and determine whether or not to
perform transformations of the source code (by invoking compiler
module 413) and what transformations should be performed. In some
implementations, source receiver 411 can, after receiving the input
source 402, access workflow 404 to identify a series of compilation
transformations to be performed on the input source 402. For
example, if input source 402 includes high level language code, the
workflow 404 may specify a series of transformations that convert
the source to technology independent internal representations, a
technology dependent internal representation, a technology
dependent logical netlist, a technology mapped logical netlist, an
optimized technology mapped logical netlist, a packed logical
netlist, a placed physical netlist, a routed physical netlist, a
hardware architecture memory mapping, then a final binary bitstream
associated with the hardware architecture. Note that this is only
one workflow example. In other implementations, the workflow 404
can include more or fewer transformation phases than those included
in this list.
[0038] In some implementations, source receiver 411 can determine a
starting point in the workflow 404 in view of the received input
source 402. For example, if the input source 402 includes high level
language code, the workflow 404 can be initiated at a particular
starting transformation stage that transforms the high level
language code into a technology independent internal
representation. Alternatively, if input source code 402 includes
HDL code, then the workflow 404 can be initiated at a later stage,
where the first transformation stage generates the technology
independent logical netlist. In this instance, the initial
transformation stages that operate on high level languages can be
bypassed. Notably, in implementations where input source 402
includes high level language code, any transformation stage that
converts high level language code into HDL code can also be
bypassed to improve efficiency, since the HDL source may not be
needed to generate logical netlist internal representations.
[0039] Configuration identifier 412 is responsible for identifying
configuration information (e.g., configuration data 403) associated
with at least a portion of the configuration for the target
hardware architecture. In various implementations, configuration
data 403 can include information that can be injected at various
different transformation stages of the compilation workflow 404 to
provide context for those stages. In such instances, the
configuration data 403 can be used by a compilation stage of
workflow 404 so that the compilation stage does not need to
reconstruct information that might have otherwise been obscured by
a previous compilation stage. For example, if input source 402
includes HDL code, some information associated with algorithmic
functionality that would be properly represented in high level
language code might be lost. In such instances, configuration data
403 can include algorithmic representations of functionality
associated with the target hardware architecture.
[0040] In an illustrative example, the HDL code may have been
transformed from high level source code that included a matrix
multiplication operation for the target hardware architecture. In
transforming the matrix multiplication operation to HDL code, the
resulting HDL could represent this operation as a series of lower
level multiplication operations and addition operations, thus
losing the context of the matrix multiplication operation.
Configuration data 403 could include the algorithmic information
that describes the matrix multiplication operation so that a
subsequent compilation stage would not need to reconstruct the
functionality from the HDL code. In other words, a subsequent
compilation stage that generates a technology independent logical
netlist component (or other internal representation that may follow
the HDL code) could refer to the configuration data 403 to use the
algorithmic representation of the matrix multiplication
functionality to drive the compilation transformation passes most
efficiently.
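The matrix multiplication example above can be sketched as a stage that consults configuration data before falling back to per-operation mapping. The key names, the "systolic" block, and the generic mapping are illustrative assumptions.

```python
def compile_netlist(hdl_ops, config_data):
    """Sketch of a compilation stage consulting configuration data.

    If config_data carries an algorithmic annotation (e.g., that a
    multiply/add sequence is really a matrix multiplication), the
    stage can emit an optimized block instead of reconstructing the
    intent from the low-level operations.
    """
    if config_data.get("algorithm") == "matmul":
        return ["systolic_matmul_block"]  # hypothetical optimized mapping
    return [f"generic_{op}" for op in hdl_ops]  # per-op fallback
```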
[0041] In other implementations, configuration data 403 can include
data representing a trained machine learning model to direct the
execution of compilation transformation stages within the workflow
404. For example, configuration data 403 can include data
representing a machine learning model that determines whether to
identify additional configuration data for a particular compilation
transformation stage within the workflow. In such instances, the
machine learning model, when executed, can analyze the received
source code (or an internal representation component generated by a
prior compilation stage within the workflow) and determine what
type of source code it is and whether algorithmic data is available
to direct the compilation process (e.g., a machine learning model
can be trained, based on previous workflow executions,
implementations of similar source code, attributes associated with
the target hardware architecture, or other similar information, to
perform one or more determinations described herein). Additionally,
the machine learning model can determine, in view of any identified
configuration information associated with the source or target
hardware, whether or not to apply that configuration data to any
particular compilation transformation stage in the workflow.
[0042] In other implementations, the machine learning model can
determine the compilation passes to be performed at any particular
compilation stage in the workflow. In such cases, the machine
learning model can utilize attributes of the input source or
internal representation produced by a previous stage to determine
how many compiler passes to perform in the current stage.
Similarly, the machine learning model can examine the results of
each compiler pass to determine whether to perform additional
passes based on similarly performed passes in other workflows or
previous executions of the current workflow. Moreover, since there
may be multiple different compiler passes that can be individually
executed, the machine learning model can determine which passes to
execute and in which order to perform their execution. In various
implementations, this can include repetition of individual passes
or repetition of all passes selected for execution. As discussed
above, the machine learning model can be trained, based on previous
workflow executions, implementations of similar source code,
attributes associated with the target hardware architecture, or
other similar information, to perform one or more determinations
described herein.
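The model-directed pass selection, ordering, and repetition described above can be sketched with a callable standing in for the trained model. The loop structure and round limit are assumptions for illustration.

```python
def schedule_passes(model, ir, available_passes, max_rounds=3):
    """Sketch of model-directed pass scheduling.

    `model` is any callable standing in for a trained model: given
    the current IR and a candidate pass, it returns True if the pass
    should run. Passes may repeat across rounds until the model
    declines all of them or the round limit is reached.
    """
    executed = []
    for _ in range(max_rounds):
        ran_any = False
        for p in available_passes:
            if model(ir, p):
                executed.append(p)
                ir = f"{p}({ir})"  # placeholder: apply the pass
                ran_any = True
        if not ran_any:
            break
    return executed, ir
```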
[0043] Compiler module 413 is responsible for performing a set of
compilation passes for a particular stage of the compilation
workflow 404. In the case of an initial compilation transformation
stage, compiler module 413 can perform the set of compilation
passes of the input source code (e.g., high level language code,
HDL code, etc.). Alternatively, in later stages of the workflow
(e.g., stages occurring in the workflow after the first
transformation of received source code), compiler module 413 can
perform the set of compilation passes to transform one internal
representation component into another. For example, in a workflow
that includes transforming source into a technology independent
internal representation, then into a technology dependent internal
representation, compiler module 413 can be invoked to perform the
compiler passes to facilitate these transformations. In other words,
compiler module 413 can be invoked to transform a high level source
into the technology independent internal representation.
Subsequently, compiler module 413 can be invoked again to transform
the technology independent internal representation into a
technology dependent internal representation.
[0044] In various implementations, compiler module 413 can perform
each set of compilation passes in view of the configuration
information identified by configuration identifier 412 (e.g.,
configuration data 403). As noted above, compiler module 413 can
incorporate the configuration information to direct the compilation
passes for each stage of the workflow to provide additional context
for that stage. Thus, the compilation transformation process can
execute more efficiently by avoiding any overhead involved in
rediscovering algorithmic representations of functionality that may
have otherwise been lost (or abstracted out) in the performance of
compilation passes for a previous stage in the workflow. In
instances where the configuration data includes algorithmic
representations of functionalities, compiler module 413 can use
that data to better direct the compilation passes for a particular
stage. Additionally, this type of configuration data can be used to
reduce the number of passes since the compiler would not need to
rediscover/reconstruct the instructions necessary to perform the
associated functionality. In such instances, by receiving the
configuration data separately, compiler module 413 can be extended
for various different workflows and target hardware architectures
without needing to rewrite and/or reconfigure the compiler module
413 for each different implementation.
[0045] Similarly, in implementations where the configuration
information includes machine learning models, compiler module 413
can invoke the model(s) or receive information produced by the
models to direct the execution of the compiler passes. As noted
above, compiler module 413 can determine the compiler passes for
each stage of the workflow using information produced by the
machine learning models. In such instances, the machine learning
models can analyze previous executions of the current workflow,
previous executions of similar stages in other workflows, or other
similar information to direct the current compilation
transformation stage. In some instances, this information can be
used to determine the passes to be executed. Additionally, or
alternatively, this information can be used to direct the compiler
to retrieve additional algorithmic functionality configuration data
that should be incorporated into the currently executing
compilation transformation stage.
[0046] Internal representation generator 414 is responsible for
generating an internal representation component for the target
hardware in view of the compilation passes performed by compiler
413 in a particular stage of the workflow. In various
implementations, compiler 413 can invoke internal representation
generator 414 once a determination has been made that the compiler
413 has completed enough passes to produce the internal
representation component for the executing workflow stage. For
example, in a workflow with a stage that transforms high level
source into a technology independent internal representation,
internal representation generator 414 can generate the resulting
technology independent internal representation component based on
the output of the compiler 413. The resulting internal
representation component can be subsequently passed to additional
stages of the workflow.
[0047] Configuration generator 415 is responsible for generating
additional configuration information in view of the compilation
passes of a stage in the workflow. In other words, at any stage in
a workflow, the compiler 413 can not only receive configuration
information that can be used to direct that set of compiler passes,
but it can also generate additional configuration information that
can be used by later stages of the same workflow. In such
instances, the configuration generator 415 can be invoked by
compiler 413 to generate additional configuration data that may be
produced by that set of compiler passes. For example, during a
compilation stage that transforms source code into an internal
representation component, the configuration generator 415 could
store portions of the source code that are directed to particular
algorithmic functionality that could be used by later stages of the
workflow. Similarly, configuration generator 415 can either
generate or update a machine learning model with information
generated by one workflow stage to further inform a later workflow
stage.
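A stage that both transforms its input and emits configuration data for later stages, as described above, can be sketched as follows. The kernel-extraction heuristic and the returned structure are illustrative assumptions.

```python
def run_stage_with_config_out(source, stage_name):
    """Sketch: a stage that transforms its input and also emits
    configuration data (here, source fragments tagged as algorithmic
    kernels) for consumption by later workflow stages."""
    ir = f"{stage_name}({source})"  # placeholder transformation
    # Hypothetical extraction: record lines containing a known kernel.
    emitted_config = {
        "kernels": [ln for ln in source.splitlines() if "matmul" in ln]
    }
    return ir, emitted_config
```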
[0048] In some implementations, configuration generator 415 can use
configuration information received from a previous workflow stage
to generate new configuration information to use in the current
stage. For example, compiler 413 can receive configuration
information that includes algorithmic representation of
functionality provided by received source code or internal
representation component. The configuration generator 415 can
determine performance metrics associated with the algorithmic
representation in view of the target hardware architecture. In
other words, given knowledge about the intended target hardware,
configuration generator 415 can analyze a transformed internal
representation generated by the current stage of the workflow and
determine whether or not the resulting hardware structure achieves
a particular level of performance. Configuration generator 415 can
subsequently generate information that can be used to direct
performance of additional compilation passes in the current stage
of the workflow, perform one or more other stages of the workflow,
or bypass one or more stages of the workflow.
[0049] For example, configuration generator 415 can determine
whether a resulting circuit connection transfers data within a
particular time or energy threshold. Similarly, configuration
generator 415 can determine whether the internal representation
component associated with an algorithmic functionality (e.g., a
matrix multiplication functionality) performs its operation within
a particular processing and/or energy threshold. If the
threshold(s) are satisfied, additional stages in the workflow can
be performed in accordance with the workflow path. In other words,
the above process can be repeated for subsequent compilation stages
of the workflow. If, however, the performance threshold is not
satisfied for a particular stage, configuration generator 415 can
generate information that can be used to repeat a previous stage
and/or bypass one or more subsequent stages of the workflow. For
example, if a workflow stage results in an internal representation
component with poor performance, configuration generator 415 can
terminate the current stage of the workflow without performing any
subsequent stages. Alternatively, configuration generator 415 can
terminate the current stage and fall back to a previous state to
repeat a previous compilation stage to regenerate the internal
representation component. In addition to falling back or
terminating, the configuration generator 415 can also generate
configuration data that can be input to previous stages in the
workflow in order to meet constraints.
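The threshold check and fallback decision described above can be sketched as a small policy function. The metric names, budget values, and retry limit are illustrative assumptions.

```python
def evaluate_stage(metrics, latency_budget_ns=10.0, energy_budget_nj=5.0):
    """Sketch of the performance-threshold decision: continue the
    workflow, fall back to repeat the previous stage, or terminate."""
    within_budget = (metrics["latency_ns"] <= latency_budget_ns
                     and metrics["energy_nj"] <= energy_budget_nj)
    if within_budget:
        return "continue"  # proceed along the workflow path
    if metrics.get("retries", 0) < 2:
        return "repeat_previous_stage"  # fall back and regenerate the IR
    return "terminate"  # stop rather than waste subsequent stages
```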
[0050] Binary bitstream generator 416 is responsible for generating
the binary bitstream component for the target hardware
architecture. In various implementations, binary bitstream
generator 416 can be invoked as the final stage of a workflow to
generate the representation of the target hardware architecture
circuitry to be used to implement the logic received in the first
stage of the workflow. As noted above, the above processing may be
repeated for each stage of a workflow until binary bitstream
generator 416 is invoked to generate the final bitstream component
for the target hardware. It should be noted that while for
simplicity of illustration, particular workflow stages have been
described above, in various implementations, aspects of the present
disclosure can be applied to workflows with more or fewer stages
that perform similar or differing compilation transformations for
internal representations for a target hardware architecture.
[0051] FIG. 5 depicts a block diagram illustrating an example of a
hardware configuration generator component 510 for facilitating a
compilation framework for hardware configuration generation using
information generated by previously executed compilation workflows.
In some implementations, hardware configuration generator component
510 may correspond to hardware configuration generator component
128 of FIG. 1. As shown in FIG. 5, hardware configuration generator
component 510 may be a component of a computing apparatus 500 that
includes a processing device 505, operatively coupled to a memory
501, to execute hardware configuration generator component 510. In
some implementations,
processing device 505 and memory 501 may correspond to processing
device 802 and main memory 804 respectively as described below with
respect to FIG. 8. In various implementations, computing apparatus
500 can be a standalone host computing device that communicates
with various tenant computing devices. Alternatively, computing
apparatus 500 can be a component of a cloud computing environment
made up of multiple host computing devices, an Internet of Things
(IoT) edge device, or the like. In various implementations, the
computing apparatus 500 can be a physical device hosted by a
computer system. Alternatively, computing apparatus 500 can be a
host computing system that hosts a virtual device that includes the
hardware configuration generator component 510. As described in
further detail below, hardware configuration generator component
510 can be invoked by apparatus 500 to execute a compilation
workflow (e.g., workflow 504) for a target hardware
architecture.
[0052] Hardware configuration generator component 510 can include
source receiver 511, configuration retriever 512, compiler module
513, internal representation generator 514, configuration analyzer
515, and binary bitstream generator 516. Alternatively, the
functionality of one or more of source receiver 511,
configuration retriever 512, compiler module 513, internal
representation generator 514, configuration analyzer 515, and
binary bitstream generator 516 may be combined into a single module
or divided into multiple sub-modules.
[0053] Source receiver 511 is responsible for receiving an input
source code (e.g., input source 502) associated with a
configuration for a target hardware architecture. In various
implementations, input source code 502 can include high level
language code (e.g., C, C++, Python, etc.) that includes
instructions associated with functionalities to be performed by the
target hardware architecture. Alternatively, input source code 502
can include target and/or vendor specific Hardware Description
Language (HDL) code. As noted above, hardware configuration
generator component 510 can be invoked to execute a compilation
workflow (e.g., workflow 504) that includes a series of
transformations to convert the received input source code 502 into
a binary bitstream representation for the target hardware
architecture. In some implementations, the target hardware
architecture can include field-programmable gate arrays (FPGA).
Alternatively, other target hardware architectures can be used.
Source receiver 511 can perform operations as described above with
respect to source receiver 411 of FIG. 4.
[0054] Configuration retriever 512 is responsible for retrieving
configuration information associated with at least a portion of the
configuration for the target hardware architecture (e.g.,
configuration data 503). In various implementations, the retrieved
configuration information is generated by a previous execution of
the compilation workflow. In other words, the configuration
information retrieved by configuration retriever 512 can be
generated by a compilation stage in a different workflow (e.g.,
workflow 504-B) or the same workflow that was executed at a
previous time. In contrast to the process described in FIG. 4 which
describes utilizing configuration data generated by previous stages
of the same workflow, configuration retriever 512 can additionally
retrieve configuration information for the workflow stage that was
generated during a previous execution of the workflow, as well as
configuration information generated during the execution of a
different workflow (e.g., workflow 504-B).
[0055] As described above with respect to FIG. 4, configuration
data 503 can include information that can be injected at various
different transformation stages of the compilation workflow 504 to
provide context for those stages. In some implementations, as
described above, configuration data 503 can include algorithmic
representations of functionality associated with the target
hardware architecture. In some implementations, as described above,
configuration data 503 can include data representing one or more
machine learning models trained to direct the execution of
compilation transformation stages within the workflow 504.
[0056] In some implementations, configuration data 503 can include
information associated with a previous version of the input source
code or internal representation component being generated by an
executing compilation stage of the workflow 504. In such instances,
the configuration data 503 can direct the performance of
compilation passes for a particular stage of the compilation
workflow 504 in view of any detected changes. For example, if
changes made to the source or internal representation component for
a particular workflow stage indicate that the changes should not
affect the resulting binary bitstream component, subsequent
compilation passes or workflow stages can be bypassed or modified
accordingly. In contrast, if the detected changes result in a
global change in the resulting bitstream, then additional passes or
stages can be added to the workflow dynamically.
[0057] In such instances, the configuration data 503 can include a
complete listing of a previous version of the input source code,
HDL, or internal representation. In such instances, the workflow
stage can retrieve this information and execute a comparison
between the current version and the previous version to determine
the differences. Alternatively, the retrieved configuration data
503 can include the changes which can be analyzed directly by the
workflow stage. In these instances, the workflow can execute a
pre-processing stage (not pictured) that analyzes the differences
between the product(s) of the previous version of the workflow with
respect to the current workflow to determine the differences prior
to executing a particular compilation stage. The use of the
information is described in further detail below with respect to
configuration analyzer 515.
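The version comparison described above can be sketched using a standard diff utility. The similarity threshold and the two-way classification are illustrative assumptions.

```python
import difflib

def change_impact(prev_src: str, curr_src: str, threshold: float = 0.02):
    """Sketch of the pre-processing comparison: diff the previous and
    current versions and classify whether the change likely affects
    the resulting bitstream. The ratio threshold is an assumption."""
    sm = difflib.SequenceMatcher(None, prev_src, curr_src)
    changed = 1.0 - sm.ratio()  # fraction of content that differs
    return "bypass" if changed <= threshold else "recompile"
```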
[0058] Compiler module 513 is responsible for performing a set of
compilation passes for a particular stage of the compilation
workflow 504. As described above, in the case of an initial
compilation transformation stage, compiler module 513 can perform
the set of compilation passes of the input source code (e.g., high
level language code, HDL code, etc.). Alternatively, in later
stages of the workflow (e.g., stages occurring in the workflow
after the first transformation of received source code), compiler
module 513 can perform the set of compilation passes to transform
one internal representation component into another. For example, in
a workflow that includes transforming source into a technology
independent internal representation, then into a technology
dependent internal representation, compiler module 513 can be
invoked to perform the compiler passes to facilitate these
transformations. In other words, compiler module 513 can be invoked
to transform a high level source into the technology independent
internal representation. Subsequently, compiler module 513 can be
invoked again to transform the technology independent internal
representation into a technology dependent internal
representation.
[0059] Also as described above, compiler module 513 can perform
each set of compilation passes in view of the configuration
information identified by configuration retriever 512 (e.g.,
configuration data 503). As noted above, compiler module 513 can
incorporate the configuration information to direct the compilation
passes for each stage of the workflow to provide additional context
for that stage. In instances where the configuration data includes
algorithmic representations of functionalities, compiler module 513
can use that data to better direct the compilation passes for a
particular stage. Similarly, in implementations where the
configuration information includes data representing machine
learning models, compiler module 513 can invoke the model(s) or
receive information produced by the models to direct the execution
of the compiler passes.
[0060] Internal representation generator 514 is responsible for
generating an internal representation component for the target
hardware in view of the compilation passes performed by compiler
513 in a particular stage of the workflow. In various
implementations, compiler 513 can invoke internal representation
generator 514 once a determination has been made that the compiler
513 has completed enough passes to produce the internal
representation component for the executing workflow stage. For
example, in a workflow with a stage that transforms high level
source into a technology independent internal representation,
internal representation generator 514 can generate the resulting
technology independent internal representation component based on
the output of the compiler 513. The resulting internal
representation component can be subsequently passed to additional
stages of the workflow.
[0061] Configuration analyzer 515 is responsible for analyzing
configuration information that includes information associated with
a previous version of input source code, HDL, or other internal
representation to direct execution of compilation stages of the
workflow 504. As noted above, the configuration information can
include the previous version of the source/internal representation.
In such instances, configuration analyzer 515 can compare the
previous version to the current version to identify changes between
the two. In some implementations, configuration analyzer 515 can be
invoked as a pre-processor for the workflow so that the comparison
can be conducted before the compilation stages of the workflow are
invoked. Alternatively, configuration analyzer 515 can receive
configuration information that is stored as a list of the changes
between the versions (e.g., generated by a source code change
management system or other similar system or component). In such
instances, configuration analyzer 515 can analyze the changes
directly to determine whether or not the identified changes
indicate an impact to the resulting binary bitstream component for
the target architecture.
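The two analyzer modes described above (comparing stored versions as a pre-processor, or consuming a precomputed change list) can be sketched as follows; the function names and record shape are illustrative assumptions:

```python
import difflib

def changes_from_versions(previous_lines, current_lines):
    """Pre-processor mode: derive the change set by comparing the
    previous version of the source to the current version."""
    diff = difflib.unified_diff(previous_lines, current_lines, lineterm="")
    return [line for line in diff
            if line.startswith(("+", "-"))
            and not line.startswith(("+++", "---"))]

def changes_from_list(change_records):
    """Change-list mode: the changes arrive already enumerated, e.g.
    from a source code change management system."""
    return [record["line"] for record in change_records]
```

Either mode yields a change set that the analyzer can then test against a threshold.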
[0062] In various implementations, configuration analyzer 515 can
analyze the set of differences and determine whether the
differences satisfy a threshold. In such instances, configuration
analyzer 515 can determine whether the differences between the
versions involve a portion of the source or internal representation
that affects performance of the resulting hardware component. For
example, configuration analyzer 515 can determine if the changes
involve the addition of a new algorithmic functionality that may
cause an increase in data transfer or processor resources.
Similarly, configuration analyzer 515 can determine if the changes
increase the complexity of existing functionality (e.g., by
changing a simple mathematical operation to a matrix multiplication
operation). Configuration analyzer 515 can determine whether the
changes meet or exceed a threshold in terms of lines of code
changed, complexity of operation changed, amount of resources
required changed, amount of target hardware circuitry changed, or
any other similar metric.
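The threshold test above can be sketched as a check of several change metrics against per-metric limits; the metric names and limit values here are hypothetical:

```python
# Each metric mirrors one of the criteria named above: lines of code
# changed, complexity of operation changed, and resources required.
THRESHOLDS = {
    "lines_changed": 50,
    "operation_complexity": 3,   # e.g., scalar op replaced by matrix multiply
    "resource_delta": 0.10,      # fractional change in required resources
}

def changes_satisfy_threshold(metrics):
    """Return True if any metric meets or exceeds its threshold, i.e.,
    the changes are judged to impact the resulting bitstream component."""
    return any(metrics.get(name, 0) >= limit
               for name, limit in THRESHOLDS.items())
```

A change set exceeding any one limit triggers the full set of subsequent compilation passes; otherwise the analyzer may generate bypass information.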
[0063] If the threshold is satisfied (e.g., the detected changes
impact the resulting bitstream component), additional compilation
passes for the compilation stage (or additional stages in the
workflow) can be performed in accordance with the workflow path. In
other words, the subsequent compilation stages of the workflow can
be executed in full to regenerate the subsequent components in the
workflow. If, however, the threshold is not satisfied for a
particular stage with respect to the amount of changes (e.g., the
changes are minimal or do not affect the resulting bitstream
component), configuration analyzer 515 can generate information
that can be used to bypass one or more compilation passes of the
current stage of the workflow or modify the portion of the internal
representation that is regenerated for the current stage of the
workflow. In such instances, the compiler 513 can be directed to
regenerate only a portion of the internal representation rather
than execute a complete transformation cycle from scratch. In other
instances, configuration analyzer 515 can generate information that
can be used to bypass subsequent stages of the workflow. For
example, if a workflow stage results in an internal representation
component with minimal changes, configuration analyzer 515 can
determine that one or more subsequent stages do not need to be
executed, and proceed to a later stage in the workflow.
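The bypass behavior described above can be sketched as a workflow driver that skips stages whose inputs are unchanged; the `Stage` structure and flag names are illustrative assumptions:

```python
from dataclasses import dataclass, field
from typing import Callable, List

@dataclass
class Stage:
    name: str
    run: Callable[[list], list]
    must_run: bool = False   # e.g., the final bitstream-generation stage

def execute_workflow(stages, component, change_impacts_bitstream):
    """Run each stage in full when the threshold is satisfied; otherwise
    bypass stages and reuse previously generated components."""
    executed = []
    for stage in stages:
        if change_impacts_bitstream or stage.must_run:
            component = stage.run(component)
            executed.append(stage.name)
        # else: the component produced by a previous execution is reused
    return component, executed
```

When the change set does not impact the bitstream, only the mandatory final stage runs, which corresponds to proceeding directly to a later stage in the workflow.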
[0064] As noted above, binary bitstream generator 516 is
responsible for generating the binary bitstream component for the
target hardware architecture. In various implementations, binary
bitstream generator 516 can be invoked as the final stage of a
workflow to generate the representation of the target hardware
architecture circuitry to be used to implement the logic received
in the first stage of the workflow. As noted above, this
processing may be repeated for each stage of a workflow until
binary bitstream generator 516 is invoked to generate the final
bitstream component for the target hardware. It should be noted
that while for simplicity of illustration, particular workflow
stages have been described above, in various implementations,
aspects of the present disclosure can be applied to workflows with
more or fewer stages that perform similar or differing compilation
transformations for internal representations for a target hardware
architecture.
[0065] FIG. 6 depicts a flow diagram of an example method 600 for
facilitating a compilation framework for hardware configuration
generation, in accordance with one or more aspects of the present
disclosure. Method 600 may be performed by processing logic that
includes hardware (e.g., circuitry, dedicated logic, programmable
logic, microcode, etc.), software (e.g., instructions run on a
processor to perform hardware simulation), or a combination
thereof. Method 600 or each of its individual functions, routines,
subroutines, or operations may be performed by one or more
processors of a computer system (e.g., the host computer system 110
of FIG. 1) implementing the method. In an illustrative example,
method 600 may be performed by hardware configuration generator
component 128 in FIG. 1 or hardware configuration generator
component 410 in FIG. 4. Alternatively, some or all of method 600
might be performed by another module or machine. Method 600 may be
performed by a single processing thread. Alternatively, method 600
may be performed by two or more processing threads, each thread
implementing one or more individual functions, routines,
subroutines, or operations of the method. In an illustrative
example, the processing threads implementing method 600 may be
synchronized (e.g., using semaphores, critical sections, or other
thread synchronization mechanisms). Alternatively, the processing
threads implementing method 600 may be executed asynchronously with
respect to each other. Therefore, while FIG. 6 and the associated
description list the operations of method 600 in a certain order,
various implementations of the method may perform at least some of
the described operations in parallel or in arbitrarily selected
orders.
[0066] At block 605, processing logic receives an input source code
associated with a configuration for a target hardware architecture.
At block 610, processing logic identifies a first configuration
information associated with at least a portion of the configuration
for the target hardware architecture. At block 615, processing
logic performs a first set of compilation passes of the input
source code in view of the first configuration information. At
block 620, processing logic generates, in view of the first set of
compilation passes of the input source code, a first internal
representation component for the target hardware architecture. At
block 625, processing logic generates, in view of the first
internal representation component, a binary bitstream component for
the target hardware architecture.
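The blocks of method 600 form a linear pipeline that can be sketched as follows. Every function body here is a hypothetical stand-in for the corresponding block, not the disclosed implementation:

```python
def identify_configuration(source):                    # block 610
    return {"target": "fpga"}

def perform_compilation_passes(source, config):        # block 615
    return [f"pass({source}, {config['target']})"]

def generate_internal_representation(passes):          # block 620
    return {"ir": passes}

def generate_binary_bitstream(ir):                     # block 625
    return b"\x01\x02"  # placeholder bitstream bytes

def method_600(input_source):                          # block 605 onward
    config = identify_configuration(input_source)
    passes = perform_compilation_passes(input_source, config)
    ir = generate_internal_representation(passes)
    return generate_binary_bitstream(ir)
```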
[0067] FIG. 7 depicts a flow diagram of an example method 700 for
facilitating a compilation framework for hardware configuration
generation using information generated by previously executed
compilation workflows, in accordance with one or more aspects of
the present disclosure. Method 700 may be performed by processing
logic that includes hardware (e.g., circuitry, dedicated logic,
programmable logic, microcode, etc.), software (e.g., instructions
run on a processor to perform hardware simulation), or a
combination thereof. Method 700 or each of its individual
functions, routines, subroutines, or operations may be performed by
one or more processors of a computer system (e.g., the host
computer system 110 of FIG. 1) implementing the method. In an
illustrative example, method 700 may be performed by hardware
configuration generator component 128 in FIG. 1 or hardware
configuration generator component 510 in FIG. 5. Alternatively,
some or all of method 700 might be performed by another module or
machine. Method 700 may be performed by a single processing thread.
Alternatively, method 700 may be performed by two or more
processing threads, each thread implementing one or more individual
functions, routines, subroutines, or operations of the method. In
an illustrative example, the processing threads implementing method
700 may be synchronized (e.g., using semaphores, critical sections,
or other thread synchronization mechanisms). Alternatively, the
processing threads implementing method 700 may be executed
asynchronously with respect to each other. Therefore, while FIG. 7
and the associated description list the operations of method 700
in a certain order, various implementations of the method may
perform at least some of the described operations in parallel or in
arbitrarily selected orders.
[0068] At block 705, processing logic receives an input source code
associated with a configuration for a target hardware architecture.
At block 710, processing logic retrieves a first configuration
information generated by a previous execution of the compilation
workflow, wherein the first configuration information is associated
with at least a portion of the configuration for the target
hardware architecture. At block 715, processing logic performs a
first set of compilation passes of the input source code in view of
the first configuration information. At block 720, processing logic
generates, in view of the first set of compilation passes of the
input source code, a first internal representation component for
the target hardware architecture. At block 725, processing logic
generates, in view of the first internal representation component,
a binary bitstream component for the target hardware
architecture.
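Method 700 differs from method 600 only at block 710: the first configuration information is retrieved from a previous execution of the compilation workflow rather than identified afresh. A minimal cache sketch, with hypothetical names:

```python
# Configuration information persisted by earlier workflow executions,
# keyed by an identifier for the input source code.
_previous_runs = {}

def record_run(source_id, config):
    """Store configuration information generated by a workflow execution."""
    _previous_runs[source_id] = config

def retrieve_prior_configuration(source_id):
    """Block 710: reuse configuration from a previous execution,
    or None if this source has not been compiled before."""
    return _previous_runs.get(source_id)
```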
[0069] FIG. 8 depicts a block diagram of a computer system
operating in accordance with one or more aspects of the disclosure.
In various illustrative examples, computer system 800 may
correspond to a computing device within system architecture 100 of
FIG. 1, apparatus 400 of FIG. 4, and/or apparatus 500 of FIG. 5.
The computer system 800 may be included within a data center that
supports virtualization. Virtualization within a data center
results in a physical system being virtualized using VMs to
consolidate the data center infrastructure and increase operational
efficiencies. A VM may be a program-based emulation of computer
hardware. For example, the VM may operate based on computer
architecture and functions of computer hardware resources
associated with hard disks or other such memory. The VM may emulate
a physical computing environment, but requests for a hard disk or
memory may be managed by a virtualization layer of a host system to
translate these requests to the underlying physical computing
hardware resources. This type of virtualization results in multiple
VMs sharing physical resources.
[0070] In certain implementations, computer system 800 may be
connected (e.g., via a network, such as a Local Area Network (LAN),
an intranet, an extranet, or the Internet) to other computer
systems. Computer system 800 may operate in the capacity of a
server or a client computer in a client-server environment, or as a
peer computer in a peer-to-peer or distributed network environment.
Computer system 800 may be provided by a personal computer (PC), a
tablet PC, a set-top box (STB), a Personal Digital Assistant (PDA),
a cellular telephone, a web appliance, a server, a network router,
switch or bridge, or any device capable of executing a set of
instructions (sequential or otherwise) that specify actions to be
taken by that device. Further, the term "computer" shall include
any collection of computers that individually or jointly execute a
set (or multiple sets) of instructions to perform any one or more
of the methods described herein.
[0071] In a further aspect, the computer system 800 may include a
processing device 802, a volatile memory 804 (e.g., random access
memory (RAM)), a non-volatile memory 806 (e.g., read-only memory
(ROM) or electrically-erasable programmable ROM (EEPROM)), and a
data storage device 818, which may communicate with each other, as
well as with other components of the system via a bus 830.
[0072] Processing device 802 may be provided by one or more
processors such as a general purpose processor (such as, for
example, a complex instruction set computing (CISC) microprocessor,
a reduced instruction set computing (RISC) microprocessor, a very
long instruction word (VLIW) microprocessor, a microprocessor
implementing other types of instruction sets, or a microprocessor
implementing a combination of types of instruction sets) or a
specialized processor (such as, for example, an application
specific integrated circuit (ASIC), a field programmable gate array
(FPGA), a digital signal processor (DSP), a graphics processing
unit (GPU), a network processor, or the like).
[0073] Computer system 800 may further include a network interface
device 808. Computer system 800 also may include a video display
unit 810 (e.g., an LCD), an alphanumeric input device 812 (e.g., a
keyboard), a cursor control device 814 (e.g., a mouse), and a
signal generation device 816.
[0074] Data storage device 818 may include a non-transitory
computer-readable storage medium 828 on which may be stored
instructions 822 embodying any one or more of the methodologies or
functions described herein (e.g., hardware configuration generator
component 128). Instructions 822 may also reside, completely or
partially, within volatile memory 804 and/or within processing
device 802 during execution thereof by computer system 800; hence,
volatile memory 804 and processing device 802 may also constitute
machine-readable storage media.
[0075] While computer-readable storage medium 828 is shown in the
illustrative examples as a single medium, the term
"computer-readable storage medium" shall include a single medium or
multiple media (e.g., a centralized or distributed database, and/or
associated caches and servers) that store the one or more sets of
executable instructions. The term "computer-readable storage
medium" shall also include any tangible medium that is capable of
storing or encoding a set of instructions for execution by a
computer that cause the computer to perform any one or more of the
methods described herein. The term "computer-readable storage
medium" shall include, but not be limited to, solid-state memories,
optical media, and magnetic media.
[0076] The methods, components, and features described herein may
be implemented by discrete hardware components or may be integrated
in the functionality of other hardware components such as ASICS,
FPGAs, DSPs or similar devices. In addition, the methods,
components, and features may be implemented by firmware modules or
functional circuitry within hardware devices. Further, the methods,
components, and features may be implemented in any combination of
hardware devices and computer program components, or in computer
programs.
[0077] Unless specifically stated otherwise, terms such as
"receiving," "identifying," "determining," "generating," or the
like, refer to actions and processes performed or implemented by
computer systems that manipulate and transform data represented
as physical (electronic) quantities within the computer system
registers and memories into other data similarly represented as
physical quantities within the computer system memories or
registers or other such information storage, transmission or
display devices. Also, the terms "first," "second," "third,"
"fourth," etc. as used herein are meant as labels to distinguish
among different elements and may not have an ordinal meaning
according to their numerical designation.
[0078] Examples described herein also relate to an apparatus for
performing the methods described herein. This apparatus may be
specially constructed for performing the methods described herein,
or it may comprise a general purpose computer system selectively
programmed by a computer program stored in the computer system.
Such a computer program may be stored in a computer-readable
tangible storage medium.
[0079] The methods and illustrative examples described herein are
not inherently related to any particular computer or other
apparatus. Various general purpose systems may be used in
accordance with the teachings described herein, or it may prove
convenient to construct more specialized apparatus to perform
method 300, and/or each of its individual functions, routines,
subroutines, or operations. Examples of the structure for a variety
of these systems are set forth in the description above.
[0080] The above description is intended to be illustrative, and
not restrictive. Although the disclosure has been described with
references to specific illustrative examples and implementations,
it will be recognized that the present disclosure is not limited to
the examples and implementations described. The scope of the
disclosure should be determined with reference to the following
claims, along with the full scope of equivalents to which the
claims are entitled.
[0081] Other computer system designs and configurations may also be
suitable to implement the systems and methods described herein. The
following examples illustrate various implementations in accordance
with one or more aspects of the present disclosure.
[0082] Example 1 is a method comprising: receiving, by a processing
device executing a compilation workflow for a target hardware
architecture, an input source code associated with a configuration
for the target hardware architecture; identifying a first
configuration information associated with at least a portion of the
configuration for the target hardware architecture; performing a
first set of compilation passes of the input source code in view of
the first configuration information; generating, in view of the
first set of compilation passes of the input source code, a first
internal representation component for the target hardware
architecture; and generating, in view of the first internal
representation component, a binary bitstream component for the
target hardware architecture.
[0083] Example 2 is the method of Example 1, further comprising:
generating, in view of the first set of compilation passes, a
second configuration information associated with at least an
additional portion of the configuration for the target hardware
architecture.
[0084] Example 3 is the method of Example 2, further comprising:
performing a second set of compilation passes of the first internal
representation component in view of at least one of the first
configuration information or the second configuration information;
and generating, in view of the second set of compilation passes of
the first internal representation component, a second internal
representation component for the target hardware architecture.
[0085] Example 4 is the method of Example 2, wherein the second
configuration information comprises an algorithmic representation
of a functionality provided by the first internal representation
component, the method further comprising: determining a performance
metric associated with the algorithmic representation of the
functionality in the second configuration information in view of
the target hardware architecture; and responsive to determining
that the performance metric satisfies a threshold, performing a
second set of compilation passes of the first internal
representation component in view of at least one of the first
configuration information or the second configuration
information.
[0086] Example 5 is the method of Example 2, wherein the second
configuration information comprises an algorithmic representation
of a functionality provided by the first internal representation
component, the method further comprising: determining a performance
metric associated with the algorithmic representation of the
functionality in the second configuration information in view of
the target hardware architecture; and responsive to determining
that the performance metric does not satisfy a threshold, bypassing
performing an additional set of compilation passes of the first
internal representation component.
[0087] Example 6 is the method of Example 1, wherein the first
configuration information comprises an algorithmic representation
of a functionality associated with the target hardware
configuration.
[0088] Example 7 is the method of Example 1, wherein the first
configuration information comprises data representing a machine
learning model trained to perform at least one of determining
whether to identify an additional configuration information, or
determining a number of compilation passes in the set of
compilation passes.
[0089] Example 8 is a system comprising: a memory; and a processing
device operatively coupled to the memory, the processing device to:
receive an input source code associated with a configuration for a
target hardware architecture; identify a first configuration
information associated with at least a portion of the configuration
for the target hardware architecture; perform a first set of
compilation passes of the input source code in view of the first
configuration information; generate, in view of the first set of
compilation passes of the input source code, a first internal
representation component for the target hardware architecture; and
generate, in view of the first internal representation component, a
binary bitstream component for the target hardware
architecture.
[0090] Example 9 is the system of Example 8, wherein the processing
device is further to: generate, in view of the first set of
compilation passes, a second configuration information associated
with at least an additional portion of the configuration for the
target hardware architecture.
[0091] Example 10 is the system of Example 9, wherein the
processing device is further to: perform a second set of
compilation passes of the first internal representation component
in view of at least one of the first configuration information or
the second configuration information; and generate, in view of the
second set of compilation passes of the first internal
representation component, a second internal representation
component for the target hardware architecture.
[0092] Example 11 is the system of Example 9, wherein the second
configuration information comprises an algorithmic representation
of a functionality provided by the first internal representation
component, and wherein the processing device is further to:
determine a performance metric associated with the algorithmic
representation of the functionality in the second configuration
information in view of the target hardware architecture; and
responsive to determining that the performance metric satisfies a
threshold, perform a second set of compilation passes of the first
internal representation component in view of at least one of the
first configuration information or the second configuration
information.
[0093] Example 12 is the system of Example 9, wherein the second
configuration information comprises an algorithmic representation
of a functionality provided by the first internal representation
component, and wherein the processing device is further to:
determine a performance metric associated with the algorithmic
representation of the functionality in the second configuration
information in view of the target hardware architecture; and
responsive to determining that the performance metric does not
satisfy a threshold, bypass performing an additional set of
compilation passes of the first internal representation
component.
[0094] Example 13 is the system of Example 8, wherein the first
configuration information comprises data representing a machine
learning model trained to perform at least one of determining
whether to identify an additional configuration information, or
determining a number of compilation passes in the set of
compilation passes.
[0095] Example 14 is a non-transitory computer-readable media
storing instructions that, when executed, cause a processing device
to: receive an input source code associated with a configuration
for a target hardware architecture; identify a first configuration
information associated with at least a portion of the configuration
for the target hardware architecture; perform a first set of
compilation passes of the input source code in view of the first
configuration information; generate, in view of the first set of
compilation passes of the input source code, a first internal
representation component for the target hardware architecture; and
generate, in view of the first internal representation component, a
binary bitstream component for the target hardware
architecture.
[0096] Example 15 is the non-transitory computer-readable media of
Example 14, wherein the processing device is further to: generate,
in view of the first set of compilation passes, a second
configuration information associated with at least an additional
portion of the configuration for the target hardware
architecture.
[0097] Example 16 is the non-transitory computer-readable media of
Example 15, wherein the processing device is further to: perform a
second set of compilation passes of the first internal
representation component in view of at least one of the first
configuration information or the second configuration information;
and generate, in view of the second set of compilation passes of
the first internal representation component, a second internal
representation component for the target hardware architecture.
[0098] Example 17 is the non-transitory computer-readable media of
Example 15, wherein the second configuration information comprises
an algorithmic representation of a functionality provided by the
first internal representation component, and wherein the processing
device is further to: determine a performance metric associated
with the algorithmic representation of the functionality in the
second configuration information in view of the target hardware
architecture; and responsive to determining that the performance
metric satisfies a threshold, perform a second set of compilation
passes of the first internal representation component in view of at
least one of the first configuration information or the second
configuration information.
[0099] Example 18 is the non-transitory computer-readable media of
Example 15, wherein the second configuration information comprises
an algorithmic representation of a functionality provided by the
first internal representation component, and wherein the processing
device is further to: determine a performance metric associated
with the algorithmic representation of the functionality in the
second configuration information in view of the target hardware
architecture; and responsive to determining that the performance
metric does not satisfy a threshold, bypass performing an
additional set of compilation passes of the first internal
representation component.
[0100] Example 19 is the non-transitory computer-readable media of
Example 14, wherein the first configuration information comprises
an algorithmic representation of a functionality associated with
the target hardware configuration.
[0101] Example 20 is the non-transitory computer-readable media of
Example 14, wherein the first configuration information comprises
data representing a machine learning model trained to perform at
least one of determining whether to identify an additional
configuration information, or determining a number of compilation
passes in the set of compilation passes.
[0102] Example 21 is a method comprising: receiving, by a
processing device executing a compilation workflow for a target
hardware architecture, an input source code associated with a
configuration for the target hardware architecture; retrieving a
first configuration information generated by a previous execution
of the compilation workflow, wherein the first configuration
information is associated with at least a portion of the
configuration for the target hardware architecture; performing a
first set of compilation passes of the input source code in view of
the first configuration information from the previous execution of
the compilation workflow; generating, in view of the first set of
compilation passes of the input source code, a first internal
representation component for the target hardware architecture; and
generating, in view of the first internal representation component,
a binary bitstream component for the target hardware
architecture.
[0103] Example 22 is the method of Example 21, wherein the
configuration information comprises an algorithmic representation
of a functionality provided by the first internal representation
component, the method further comprising: determining a performance
metric associated with the algorithmic representation of the
functionality in the configuration information in view of the
target hardware architecture; and responsive to determining that
the performance metric satisfies a threshold, performing a second
set of compilation passes of the first internal representation
component in view of at least one of the first configuration
information or the second configuration information.
[0104] Example 23 is the method of Example 21, wherein the
configuration information comprises an algorithmic representation
of a functionality provided by the first internal representation
component, the method further comprising: determining a performance
metric associated with the algorithmic representation of the
functionality in the configuration information in view of the
target hardware architecture; and responsive to determining that
the performance metric does not satisfy a threshold, bypassing
performing an additional set of compilation passes of the first
internal representation component.
[0105] Example 24 is the method of Example 21, wherein the
configuration information comprises information associated with a
previous version of the input source code.
[0106] Example 25 is the method of Example 24, further comprising:
determining a set of differences between the input source code and
the previous version of the input source code; determining whether
the set of differences between the input source code and the
previous version of the input source code satisfies a threshold;
and responsive to determining that the set of differences satisfies
the threshold, performing an additional set of compilation passes
of the input source code.
[0107] Example 26 is the method of Example 24, further comprising:
determining a set of differences between the input source code and
the previous version of the input source code; determining whether
the set of differences between the input source code and the
previous version of the input source code satisfies a threshold;
and responsive to determining that the set of differences does not
satisfy the threshold, bypassing performing an additional set of
compilation passes of the first internal representation
component.
[0108] Example 27 is a system comprising: a memory; and a
processing device operatively coupled to the memory, the processing
device to: receive, by the processing device executing a
compilation workflow for a target hardware architecture, an input
source code associated with a configuration for the target hardware
architecture; retrieve a first configuration information generated
by a previous execution of the compilation workflow, wherein the
first configuration information is associated with at least a
portion of the configuration for the target hardware architecture;
perform a first set of compilation passes of the input source code
in view of the first configuration information from the previous
execution of the compilation workflow; generate, in view of the
first set of compilation passes of the input source code, a first
internal representation component for the target hardware
architecture; and generate, in view of the first internal
representation component, a binary bitstream component for the
target hardware architecture.
[0109] Example 28 is the system of Example 27, wherein the
configuration information comprises an algorithmic representation
of a functionality provided by the first internal representation
component, the processing device further to: determine a
performance metric associated with the algorithmic representation
of the functionality in the configuration information in view of
the target hardware architecture; and responsive to determining
that the performance metric satisfies a threshold, perform a second
set of compilation passes of the first internal representation
component in view of at least one of the first configuration
information or the second configuration information.
[0110] Example 29 is the system of Example 27, wherein the
configuration information comprises an algorithmic representation
of a functionality provided by the first internal representation
component, the processing device further to: determine a
performance metric associated with the algorithmic representation
of the functionality in the configuration information in view of
the target hardware architecture; and responsive to determining
that the performance metric does not satisfy a threshold, bypass
performing an additional set of compilation passes of the first
internal representation component.
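The metric-gated optimization of Examples 28 and 29 — evaluate a performance metric for the algorithmic representation against the target hardware architecture, and run a second set of compilation passes only when a threshold is satisfied — can be sketched as below. Everything here is a hypothetical stand-in: the `InternalRepresentation` class, the budget-fraction metric, and the halving "optimization" are invented for illustration only.

```python
from dataclasses import dataclass

@dataclass
class InternalRepresentation:
    """Illustrative stand-in for the first internal representation component."""
    ops: int               # operation count of the algorithmic representation
    optimized: bool = False

def estimate_metric(ir: InternalRepresentation, target_budget: int) -> float:
    """Hypothetical performance metric: fraction of the target hardware
    architecture's resource budget consumed by the representation."""
    return ir.ops / target_budget

def maybe_optimize(ir: InternalRepresentation, target_budget: int,
                   threshold: float = 0.8) -> InternalRepresentation:
    """Perform a second set of compilation passes only when the metric
    satisfies the threshold; otherwise bypass further passes."""
    if estimate_metric(ir, target_budget) >= threshold:
        # Second set of compilation passes, sketched as a simple reduction.
        return InternalRepresentation(ops=ir.ops // 2, optimized=True)
    return ir  # bypass: the metric does not satisfy the threshold

print(maybe_optimize(InternalRepresentation(ops=900), 1000).optimized)  # True
print(maybe_optimize(InternalRepresentation(ops=100), 1000).optimized)  # False
```

The same gate covers both outcomes: a satisfied threshold triggers the second set of passes (Example 28), while an unsatisfied one bypasses them (Example 29).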
[0111] Example 30 is the system of Example 27, wherein the
configuration information comprises information associated with a
previous version of the input source code.
[0112] Example 31 is the system of Example 30, the processing
device further to: determine a set of differences between the input
source code and the previous version of the input source code;
determine whether the set of differences between the input source
code and the previous version of the input source code satisfies a
threshold; and responsive to determining that the set of
differences satisfies the threshold, perform an additional set of
compilation passes of the input source code.
[0113] Example 32 is the system of Example 30, the processing
device further to: determine a set of differences between the input
source code and the previous version of the input source code;
determine whether the set of differences between the input source
code and the previous version of the input source code satisfies a
threshold; and responsive to determining that the set of
differences does not satisfy the threshold, bypass performing an additional
set of compilation passes of the first internal representation
component.
[0114] Example 33 is a non-transitory computer-readable media
storing instructions that, when executed, cause a processing device
to: receive, by the processing device executing a compilation
workflow for a target hardware architecture, an input source code
associated with a configuration for the target hardware
architecture; retrieve a first configuration information generated
by a previous execution of the compilation workflow, wherein the
first configuration information is associated with at least a
portion of the configuration for the target hardware architecture;
perform a first set of compilation passes of the input source code
in view of the first configuration information from the previous
execution of the compilation workflow; generate, in view of the
first set of compilation passes of the input source code, a first
internal representation component for the target hardware
architecture; and generate, in view of the first internal
representation component, a binary bitstream component for the
target hardware architecture.
[0115] Example 34 is the non-transitory computer-readable media of
Example 33, wherein the configuration information comprises an
algorithmic representation of a functionality provided by the first
internal representation component, the processing device further
to: determine a performance metric associated with the algorithmic
representation of the functionality in the configuration
information in view of the target hardware architecture; and
responsive to determining that the performance metric satisfies a
threshold, perform a second set of compilation passes of the first
internal representation component in view of at least one of the
first configuration information or the second configuration
information.
[0116] Example 35 is the non-transitory computer-readable media of
Example 33, wherein the configuration information comprises an
algorithmic representation of a functionality provided by the first
internal representation component, the processing device further
to: determine a performance metric associated with the algorithmic
representation of the functionality in the configuration
information in view of the target hardware architecture; and
responsive to determining that the performance metric does not
satisfy a threshold, bypass performing an additional set of
compilation passes of the first internal representation
component.
[0117] Example 36 is the non-transitory computer-readable media of
Example 33, wherein the configuration information comprises
information associated with a previous version of the input source
code.
[0118] Example 37 is the non-transitory computer-readable media of
Example 36, the processing device further to: determine a set of
differences between the input source code and the previous version
of the input source code; determine whether the set of differences
between the input source code and the previous version of the input
source code satisfies a threshold; and responsive to determining
that the set of differences satisfies the threshold, perform an
additional set of compilation passes of the input source code.
[0119] Example 38 is the non-transitory computer-readable media of
Example 36, the processing device further to: determine a set of
differences between the input source code and the previous version
of the input source code; determine whether the set of differences
between the input source code and the previous version of the input
source code satisfies a threshold; and responsive to determining
that the set of differences does not satisfy the threshold, bypass
performing an additional set of compilation passes of the first
internal representation component.
[0120] Example 39 is an apparatus comprising: a means to receive,
by a processing device executing a compilation workflow for a
target hardware architecture, an input source code associated with
a configuration for the target hardware architecture; a means to
identify a first configuration information associated with at least
a portion of the configuration for the target hardware
architecture; a means to perform a first set of compilation passes
of the input source code in view of the first configuration
information; a means to generate, in view of the first set of
compilation passes of the input source code, a first internal
representation component for the target hardware architecture; and
a means to generate, in view of the first internal representation
component, a binary bitstream component for the target hardware
architecture.
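The end-to-end workflow enumerated in Example 39 — receive the input source code, identify configuration information, perform a first set of compilation passes, generate a first internal representation component, and generate a binary bitstream component — can be sketched as a single pipeline. Every helper and data shape below is a hypothetical stand-in for the corresponding "means"; real bitstream generation for a target hardware architecture is far more involved.

```python
def compilation_workflow(input_source: str, config_store: dict) -> bytes:
    """Illustrative sketch of the Example 39 pipeline."""
    # Identify configuration information associated with at least a
    # portion of the configuration for the target hardware architecture.
    config = config_store.get("target", {})
    # First set of compilation passes in view of the configuration
    # information (sketched here as simple token filtering).
    tokens = [t for t in input_source.split()
              if t not in config.get("ignore", ())]
    # First internal representation component for the target architecture.
    internal_representation = tuple(tokens)
    # Binary bitstream component generated in view of the internal
    # representation component.
    return " ".join(internal_representation).encode("utf-8")

bitstream = compilation_workflow("add r1 r2 nop",
                                 {"target": {"ignore": ("nop",)}})
print(bitstream)  # b'add r1 r2'
```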
* * * * *