 //===-- README.txt - Notes for Blackfin Target ------------------*- org -*-===//

 * Condition codes
 ** DONE Problem with asymmetric SETCC operations
 The instruction

   CC = R0 < 2

 is not symmetric: there is no R0 > 2 instruction. On the other hand, IF CC
 JUMP can take both CC and !CC as a condition. We cannot pattern-match (brcond
 (not cc), target) because the DAG optimizer removes that kind of thing.

 This is handled by creating a pseudo-register NCC that aliases CC. Register
 classes JustCC and NotCC are used to control the inversion of CC.
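
To illustrate the idea, here is a minimal Python model (not the actual TableGen
patterns) of how a comparison with no direct instruction can be computed as its
inverse and then read through the inverted flag. The set of directly available
comparisons and the names DIRECT, select_compare and evaluate are assumptions
made for this sketch.

```python
# Comparisons assumed to have a direct "CC = ..." form. This set is an
# assumption for the sketch, not taken from the target description.
DIRECT = {"eq", "lt", "le", "ult", "ule"}

# Logical inverse of each condition code.
INVERSE = {
    "eq": "ne", "ne": "eq",
    "lt": "ge", "ge": "lt",
    "le": "gt", "gt": "le",
    "ult": "uge", "uge": "ult",
    "ule": "ugt", "ugt": "ule",
}


def select_compare(cond):
    """Return (direct_cond, use_ncc) for the requested condition.

    use_ncc is True when the consumer has to read the inverted flag, which
    corresponds to placing the value in NotCC instead of JustCC.
    """
    if cond in DIRECT:
        return cond, False
    return INVERSE[cond], True


def evaluate(cond, a, b):
    """Reference semantics for the signed conditions, used to check the rewrite."""
    return {
        "eq": a == b, "ne": a != b,
        "lt": a < b, "ge": a >= b,
        "le": a <= b, "gt": a > b,
    }[cond]
```

For example, select_compare("gt") yields ("le", True): compute CC = R0 <= 2 and
let the branch test !CC.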

 ** DONE CC as an i32 register
 The AnyCC register class pretends to hold i32 values. It can only represent the
 values 0 and 1, but we can copy to and from the D class. This hack makes it
 possible to represent the setcc instruction without having i1 as a legal type.

 In most cases, the CC register is set by a "CC = .." or BITTST instruction, and
 then used in a conditional branch or move. The code generator thinks it is
 moving 32 bits, but the value stays in CC. In other cases, the result of a
 comparison is actually used as an i32 number, and CC will be copied to a D
 register.

 * Stack frames
 ** TODO Use Push/Pop instructions
 We should use the push/pop instructions when saving callee-saved
 registers. They are smaller, and we may even be able to use the push-multiple
 instructions.

 ** TODO requiresRegisterScavenging
 We need more intelligence in determining when the scavenger is needed. We
 should keep track of:
 - Spilling D16 registers
 - Spilling AnyCC registers

 * Assembler
 ** TODO Implement PrintGlobalVariable
 ** TODO Remove LOAD32sym
 It's a hack combining two instructions by concatenation.

 * Inline Assembly

 These are the GCC constraints from bfin/constraints.md:

 | Code  | Register class                            | LLVM |
 |-------+-------------------------------------------+------|
 | a     | P                                         | C    |
 | d     | D                                         | C    |
 | z     | Call clobbered P (P0, P1, P2)             | X    |
 | D     | EvenD                                     | X    |
 | W     | OddD                                      | X    |
 | e     | Accu                                      | C    |
 | A     | A0                                        | S    |
 | B     | A1                                        | S    |
 | b     | I                                         | C    |
 | v     | B                                         | C    |
 | f     | M                                         | C    |
 | c     | Circular I, B, L                          | X    |
 | C     | JustCC                                    | S    |
 | t     | LoopTop                                   | X    |
 | u     | LoopBottom                                | X    |
 | k     | LoopCount                                 | X    |
 | x     | GR                                        | C    |
 | y     | RET*, ASTAT, SEQSTAT, USP                 | X    |
 | w     | ALL                                       | C    |
 | Z     | The FD-PIC GOT pointer (P3)               | S    |
 | Y     | The FD-PIC function pointer register (P1) | S    |
 | q0-q7 | R0-R7 individually                        |      |
 | qA    | P0                                        |      |
 |-------+-------------------------------------------+------|
 | Code  | Constant                                  |      |
 |-------+-------------------------------------------+------|
 | J     | 1<<N, N<32                                |      |
 | Ks3   | imm3                                      |      |
 | Ku3   | uimm3                                     |      |
 | Ks4   | imm4                                      |      |
 | Ku4   | uimm4                                     |      |
 | Ks5   | imm5                                      |      |
 | Ku5   | uimm5                                     |      |
 | Ks7   | imm7                                      |      |
 | KN7   | -imm7                                     |      |
 | Ksh   | imm16                                     |      |
 | Kuh   | uimm16                                    |      |
 | L     | ~(1<<N)                                   |      |
 | M1    | 0xff                                      |      |
 | M2    | 0xffff                                    |      |
 | P0-P4 | 0-4                                       |      |
 | PA    | Macflag, not M                            |      |
 | PB    | Macflag, only M                           |      |
 | Q     | Symbol                                    |      |
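
As a rough sketch of what some of the constant constraints accept, the Python
predicates below check a few of the table entries. The reading of immN as a
signed N-bit value and uimmN as an unsigned N-bit value is an assumption, and
the function names are made up for this example.

```python
MASK32 = 0xFFFFFFFF


def fits_signed(value, bits):
    """True if value fits a signed immediate of the given width (Ks3, Ks7, Ksh, ...)."""
    return -(1 << (bits - 1)) <= value < (1 << (bits - 1))


def fits_unsigned(value, bits):
    """True if value fits an unsigned immediate of the given width (Ku3, Ku5, Kuh, ...)."""
    return 0 <= value < (1 << bits)


def is_single_bit(value):
    """Constraint J: the 32-bit pattern of value equals 1 << N for some N < 32."""
    v = value & MASK32
    return v != 0 and (v & (v - 1)) == 0


def is_single_zero_bit(value):
    """Constraint L: the 32-bit pattern of value equals ~(1 << N)."""
    return is_single_bit(~value)
```

With these, is_single_zero_bit(0xFFFFFFF7) holds (only bit 3 is clear), while
fits_signed(64, 7) does not.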

 ** TODO Support all register classes
 * DAG combiner
 ** Create test case for each Illegal SETCC case
 The DAG combiner may sometimes produce illegal i16 SETCC instructions.

 *** TODO SETCC (srl (ctlz x), 5) == const
 *** TODO SETCC (and load, const) == const
 *** DONE SETCC (zext x) == const
 *** TODO SETCC (sext x) == const

 * Instruction selection
 ** TODO Better immediate constants
 Like ARM, build constants as small imm + shift.
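
A minimal Python sketch of the decomposition, assuming the small immediate is a
signed 7-bit value that is then shifted left. The function name and the
imm_bits default are choices made for this example; a real implementation would
also have to weigh other ways of building the constant.

```python
def split_imm_shift(value, imm_bits=7):
    """Return (imm, shift) with (imm << shift) & 0xFFFFFFFF == value, or None.

    imm must fit a signed immediate of imm_bits bits.
    """
    value &= 0xFFFFFFFF
    if value == 0:
        return 0, 0
    # Strip the trailing zero bits; the left shift recreates them.
    shift = (value & -value).bit_length() - 1
    base = value >> shift
    # Interpret the remaining bits as a signed number of width 32 - shift,
    # so that patterns such as 0xFFFF0000 become -1 << 16.
    width = 32 - shift
    if base >= 1 << (width - 1):
        base -= 1 << width
    lo = -(1 << (imm_bits - 1))
    hi = (1 << (imm_bits - 1)) - 1
    if lo <= base <= hi:
        return base, shift
    return None
```

For example, split_imm_shift(0x00500000) returns (5, 20), while 0x12345678 has
no such form and yields None.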

 ** TODO Implement cycle counter
 We have CYCLES and CYCLES2 registers, but the readcyclecounter intrinsic wants
 to return i64, and the code generator doesn't know how to legalize that.
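
The value the intrinsic should produce can be sketched in Python as below. This
only models combining two 32-bit halves into one 64-bit count, using the generic
read-high, read-low, re-read-high technique to guard against a carry between the
two reads. Whether the hardware needs that guard is not covered here, and
FakeCounter is a made-up stand-in for the real registers.

```python
def combine_cycles(lo, hi):
    """Build the 64-bit count from CYCLES (lo) and CYCLES2 (hi)."""
    return ((hi & 0xFFFFFFFF) << 32) | (lo & 0xFFFFFFFF)


def read_cycle_counter(read_lo, read_hi):
    """Read both halves, retrying if the high half changed in between."""
    while True:
        hi = read_hi()
        lo = read_lo()
        if read_hi() == hi:
            return combine_cycles(lo, hi)


class FakeCounter:
    """Stand-in for the hardware: every read advances the counter by one."""

    def __init__(self, start):
        self.value = start

    def read_lo(self):
        self.value += 1
        return self.value & 0xFFFFFFFF

    def read_hi(self):
        self.value += 1
        return (self.value >> 32) & 0xFFFFFFFF
```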

 ** TODO Instruction alternatives
 Some instructions come in different variants, for example:

   D = D + D
   P = P + P

 Cross combinations are not allowed:

   P = D + D (bad)

 Similarly for the subreg pseudo-instructions:

   D16L = EXTRACT_SUBREG D16, bfin_subreg_lo16
   P16L = EXTRACT_SUBREG P16, bfin_subreg_lo16

 We want to take advantage of the alternative instructions. This could be done by
 changing the DAG after instruction selection.


 ** Multipatterns for load/store
 We should try to identify multipatterns for load and store instructions. The
 available instruction matrix is a bit irregular.

 Loads:

 | Addr       | D | P | D 16z | D 16s | D16 | D 8z | D 8s |
 |------------+---+---+-------+-------+-----+------+------|
 | P          | * | * | *     | *     | *   | *    | *    |
 | P++        | * | * | *     | *     |     | *    | *    |
 | P--        | * | * | *     | *     |     | *    | *    |
 | P+uimm5m2  |   |   | *     | *     |     |      |      |
 | P+uimm6m4  | * | * |       |       |     |      |      |
 | P+imm16    |   |   |       |       |     | *    | *    |
 | P+imm17m2  |   |   | *     | *     |     |      |      |
 | P+imm18m4  | * | * |       |       |     |      |      |
 | P++P       | * |   | *     | *     | *   |      |      |
 | FP-uimm7m4 | * | * |       |       |     |      |      |
 | I          | * |   |       |       | *   |      |      |
 | I++        | * |   |       |       | *   |      |      |
 | I--        | * |   |       |       | *   |      |      |
 | I++M       | * |   |       |       |     |      |      |

 Stores:

 | Addr       | D | P | D16H | D16L | D 8 |
 |------------+---+---+------+------+-----|
 | P          | * | * | *    | *    | *   |
 | P++        | * | * |      | *    | *   |
 | P--        | * | * |      | *    | *   |
 | P+uimm5m2  |   |   |      | *    |     |
 | P+uimm6m4  | * | * |      |      |     |
 | P+imm16    |   |   |      |      | *   |
 | P+imm17m2  |   |   |      | *    |     |
 | P+imm18m4  | * | * |      |      |     |
 | P++P       | * |   | *    | *    |     |
 | FP-uimm7m4 | * | * |      |      |     |
 | I          | * |   | *    | *    |     |
 | I++        | * |   | *    | *    |     |
 | I--        | * |   | *    | *    |     |
 | I++M       | * |   |      |      |     |
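
To show how irregular the matrix is in practice, the Python sketch below picks
the P+offset form for a load from the Loads table. It assumes that uimmNmM means
an unsigned N-bit offset that is a multiple of M (with immNmM the signed
equivalent), and that the shorter form is preferred when both fit. The names
OFFSET_FORMS, LOAD_FORMS, fits and pick_load_form are made up for this example.

```python
# Offset form -> (signed, total bits, required multiple). Assumed reading
# of the names used in the tables.
OFFSET_FORMS = {
    "uimm5m2": (False, 5, 2),
    "uimm6m4": (False, 6, 4),
    "imm16": (True, 16, 1),
    "imm17m2": (True, 17, 2),
    "imm18m4": (True, 18, 4),
}

# Loaded value kind -> P+offset forms from the Loads table, shortest first.
LOAD_FORMS = {
    "D": ["uimm6m4", "imm18m4"],
    "P": ["uimm6m4", "imm18m4"],
    "D16z": ["uimm5m2", "imm17m2"],
    "D16s": ["uimm5m2", "imm17m2"],
    "D8z": ["imm16"],
    "D8s": ["imm16"],
}


def fits(form, offset):
    """True if offset is encodable in the given offset form."""
    signed, bits, multiple = OFFSET_FORMS[form]
    if offset % multiple:
        return False
    if signed:
        return -(1 << (bits - 1)) <= offset < (1 << (bits - 1))
    return 0 <= offset < (1 << bits)


def pick_load_form(kind, offset):
    """Return the first P+offset form that can encode offset, or None."""
    for form in LOAD_FORMS[kind]:
        if fits(form, offset):
            return form
    return None
```

For example, a 32-bit load at offset 60 can use P+uimm6m4, offset 64 needs
P+imm18m4, and offset 2 has no P+offset form at all.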

 * Workarounds and features
 Blackfin CPUs have hardware bugs (anomalies). Each model comes in a number of
 silicon revisions with different anomalies. We learn about the CPU model and
 silicon revision from the -mcpu switch.

 ** Interpretation of -mcpu value
 - -mcpu=bf527 refers to the latest known BF527 revision
 - -mcpu=bf527-0.2 refers to silicon rev. 0.2
 - -mcpu=bf527-any refers to all known revisions
 - -mcpu=bf527-none disables all workarounds

 The -mcpu setting affects the __SILICON_REVISION__ macro and enabled workarounds:

 | -mcpu      | __SILICON_REVISION__ | Workarounds        |
 |------------+----------------------+--------------------|
 | bf527      | Def Latest           | Specific to latest |
 | bf527-1.3  | Def 0x0103           | Specific to 1.3    |
 | bf527-any  | Def 0xffff           | All bf527-x.y      |
 | bf527-none | Undefined            | None               |
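
The mapping in this table can be sketched in Python as follows. The encoding of
revision x.y as (x << 8) | y is inferred from the single 1.3 -> 0x0103 example
above, the latest revision is passed in by the caller because it depends on the
model, and the names encode_revision and parse_mcpu are made up for this
example.

```python
def encode_revision(major, minor):
    """Encode silicon revision major.minor, e.g. 1.3 -> 0x0103 (inferred)."""
    return (major << 8) | minor


def parse_mcpu(value, latest):
    """Split an -mcpu value into (model, silicon_revision, workarounds).

    silicon_revision is None when __SILICON_REVISION__ stays undefined.
    latest is the (major, minor) pair to use when no revision is given.
    """
    model, sep, suffix = value.partition("-")
    if not sep:
        return model, encode_revision(*latest), "latest"
    if suffix == "any":
        return model, 0xFFFF, "any"
    if suffix == "none":
        return model, None, "none"
    major, minor = (int(part) for part in suffix.split("."))
    return model, encode_revision(major, minor), suffix
```

For example, parse_mcpu("bf527-1.3", (0, 2)) returns ("bf527", 0x0103, "1.3").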

 These are the known cores and revisions:

 | Core        | Silicon            | Processors              |
 |-------------+--------------------+-------------------------|
 | Edinburgh   | 0.3, 0.4, 0.5, 0.6 | BF531 BF532 BF533       |
 | Braemar     | 0.2, 0.3           | BF534 BF536 BF537       |
 | Stirling    | 0.3, 0.4, 0.5      | BF538 BF539             |
 | Moab        | 0.0, 0.1, 0.2      | BF542 BF544 BF548 BF549 |
 | Teton       | 0.3, 0.5           | BF561                   |
 | Kookaburra  | 0.0, 0.1, 0.2      | BF523 BF525 BF527       |
 | Mockingbird | 0.0, 0.1           | BF522 BF524 BF526       |
 | Brodie      | 0.0, 0.1           | BF512 BF514 BF516 BF518 |


 ** Compiler implemented workarounds
 Most workarounds are implemented in header files and source code using the
 __ADSPBF527__ macros. A few workarounds require compiler support.

 |  Anomaly | Macro                          | GCC Switch       |
 |----------+--------------------------------+------------------|
 |      Any | __WORKAROUNDS_ENABLED          |                  |
 | 05000074 | WA_05000074                    |                  |
 | 05000244 | __WORKAROUND_SPECULATIVE_SYNCS | -mcsync-anomaly  |
 | 05000245 | __WORKAROUND_SPECULATIVE_LOADS | -mspecld-anomaly |
 | 05000257 | WA_05000257                    |                  |
 | 05000283 | WA_05000283                    |                  |
 | 05000312 | WA_LOAD_LCREGS                 |                  |
 | 05000315 | WA_05000315                    |                  |
 | 05000371 | __WORKAROUND_RETS              |                  |
 | 05000426 | __WORKAROUND_INDIRECT_CALLS    | Not -micplb      |

 ** GCC feature switches
 | Switch                    | Description                            |
 |---------------------------+----------------------------------------|
 | -msim                     | Use simulator runtime                  |
 | -momit-leaf-frame-pointer | Omit frame pointer for leaf functions  |
 | -mlow64k                  |                                        |
 | -mcsync-anomaly           |                                        |
 | -mspecld-anomaly          |                                        |
 | -mid-shared-library       |                                        |
 | -mleaf-id-shared-library  |                                        |
 | -mshared-library-id=      |                                        |
 | -msep-data                | Enable separate data segment           |
 | -mlong-calls              | Use indirect calls                     |
 | -mfast-fp                 |                                        |
 | -mfdpic                   |                                        |
 | -minline-plt              |                                        |
 | -mstack-check-l1          | Do stack checking in L1 scratch memory |
 | -mmulticore               | Enable multicore support               |
 | -mcorea                   | Build for Core A                       |
 | -mcoreb                   | Build for Core B                       |
 | -msdram                   | Build for SDRAM                        |
 | -micplb                   | Assume ICPLBs are enabled at runtime.  |