mirror of https://github.com/openjdk/jdk.git synced 2026-06-11 04:57:12 +00:00


erfang 55d4e81958 8382052: VectorAPI: AArch64: Optimize the lanewise BITWISE_BLEND operation with BSL

The Vector API `lanewise` `BITWISE_BLEND` operation on AArch64 is
currently lowered to a generic vector sequence built from
`XorV(AndV(XorV))` nodes. AArch64 provides a more efficient mapping for
this operation through the NEON `BSL` and SVE2 `BSL` (bitwise select)
instructions.

This change teaches C2 to recognize the `BITWISE_BLEND` patterns and
lower them to the dedicated AArch64 instructions for better performance.
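The change itself is expressed as AArch64 match rules in C2's architecture description files, which are not reproduced here. As a rough illustration only, the hypothetical Java sketch below shows the idea on a toy expression tree. It looks for `XorV(AndV(XorV(a, b), c), a)` and accepts the operands of each commutative node in either order. When it finds that shape, it rewrites it to a single `Bsl` node. The node types and the `match` method are invented for this example and are not C2 classes.

```java
import java.util.Optional;

/** Toy pattern matcher for illustration only; none of these types are C2 classes. */
final class BlendMatcher {

    // Hypothetical toy IR: leaves plus the node kinds of interest.
    interface Node {}
    record Leaf(String name) implements Node {}
    record Xor(Node l, Node r) implements Node {}
    record And(Node l, Node r) implements Node {}
    /** Bsl(sel, t, f): bits of t where sel is 1, bits of f where sel is 0. */
    record Bsl(Node sel, Node t, Node f) implements Node {}

    /** Matches Xor(a, And(Xor(a, b), c)) in any operand order and returns Bsl(c, b, a). */
    static Optional<Node> match(Node n) {
        if (!(n instanceof Xor outer)) {
            return Optional.empty();
        }
        // Xor is commutative, so either operand may be the And node.
        Optional<Node> r = matchOuter(outer.l(), outer.r());
        return r.isPresent() ? r : matchOuter(outer.r(), outer.l());
    }

    private static Optional<Node> matchOuter(Node a, Node candidate) {
        if (!(candidate instanceof And andNode)) {
            return Optional.empty();
        }
        // And is commutative too, so try both operand orders.
        Optional<Node> r = matchAnd(a, andNode.l(), andNode.r());
        return r.isPresent() ? r : matchAnd(a, andNode.r(), andNode.l());
    }

    private static Optional<Node> matchAnd(Node a, Node candidate, Node c) {
        if (!(candidate instanceof Xor inner)) {
            return Optional.empty();
        }
        // The inner Xor must share operand 'a' with the outer Xor; its other operand is b.
        if (inner.l().equals(a)) {
            return Optional.of(new Bsl(c, inner.r(), a));
        }
        if (inner.r().equals(a)) {
            return Optional.of(new Bsl(c, inner.l(), a));
        }
        return Optional.empty();
    }

    public static void main(String[] args) {
        Node a = new Leaf("a"), b = new Leaf("b"), c = new Leaf("c");
        Node tree = new Xor(new And(c, new Xor(b, a)), a);
        System.out.println(match(tree).orElse(tree));
    }
}
```

The sketch compares operands by structural equality. A real compiler matcher works on its own node graph and rule language, so treat this only as a picture of what "recognizing the pattern" means.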

The change includes the AArch64 match rules and assembler support,
updates the AArch64 asm tests, adds IR framework nodes for the new mach
instructions, introduces a new jtreg IR test, and extends the
`MaskedLogicOpts` JMH benchmark to cover 128-bit `long` vectors.

JMH results show **11% - 54%** performance improvements for the
optimized cases, and all jtreg tests (tier1, tier2 and tier3) pass on
SVE2, SVE1, and NEON configurations.

On an NVIDIA Grace (Neoverse-V2) machine with 128-bit SVE2:
```
Benchmark                     Unit   ARRAYLEN  Before   Error  After    Error  Uplift
bitwiseBlendOperationInt128   ops/s  256       3787.49  5.29   4277.64  8.89   1.13
bitwiseBlendOperationInt128   ops/s  512       1888.24  11.02  2143.21  6.32   1.14
bitwiseBlendOperationInt128   ops/s  1024      938.22   6.24   1053.45  14.68  1.12
bitwiseBlendOperationLong128  ops/s  256       1895.45  13.68  2140.31  3.68   1.13
bitwiseBlendOperationLong128  ops/s  512       938.71   5.32   1052.16  14.07  1.12
bitwiseBlendOperationLong128  ops/s  1024      474.15   2.33   526.49   2.62   1.11
```

On an AWS Graviton3 (Neoverse-V1) machine with 256-bit SVE1:
```
Benchmark                     Unit   ARRAYLEN  Before   Error  After    Error  Uplift
bitwiseBlendOperationInt128   ops/s  256       2051.52  13.85  2481.44  0.27   1.21
bitwiseBlendOperationInt128   ops/s  512       995.47   20.77  1235.10  5.70   1.24
bitwiseBlendOperationInt128   ops/s  1024      507.73   9.83   617.59   2.43   1.22
bitwiseBlendOperationLong128  ops/s  256       1000.99  21.50  1235.39  5.48   1.23
bitwiseBlendOperationLong128  ops/s  512       507.73   9.74   617.67   2.32   1.22
bitwiseBlendOperationLong128  ops/s  1024      258.86   0.01   310.70   0.04   1.20
```

On an NVIDIA Grace (Neoverse-V2) machine with 128-bit NEON:
```
Benchmark                     Unit   ARRAYLEN  Before   Error  After    Error  Uplift
bitwiseBlendOperationInt128   ops/s  256       2336.17  13.18  3505.19  19.61  1.50
bitwiseBlendOperationInt128   ops/s  512       1145.50  12.40  1735.24  10.43  1.51
bitwiseBlendOperationInt128   ops/s  1024      571.41   6.51   866.01   3.34   1.52
bitwiseBlendOperationLong128  ops/s  256       1140.38  13.77  1740.28  11.16  1.53
bitwiseBlendOperationLong128  ops/s  512       570.20   7.58   865.67   3.33   1.52
bitwiseBlendOperationLong128  ops/s  1024      280.94   2.58   432.78   0.19   1.54
```
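The `Uplift` column is the ratio `After / Before` of the two throughput scores. The 11% - 54% range quoted in the summary is the smallest and largest of these ratios, minus one. The short sketch below recomputes that range from the `Before`/`After` values copied from the three tables and ignores the `Error` columns. The class name is hypothetical.

```java
/** Recomputes the uplift range from the Before/After scores (ops/s) in the tables. */
final class UpliftRange {

    // {before, after} pairs in table order: SVE2 (Grace), SVE1 (Graviton3), NEON (Grace).
    static final double[][] SCORES = {
        {3787.49, 4277.64}, {1888.24, 2143.21}, {938.22, 1053.45},
        {1895.45, 2140.31}, {938.71, 1052.16}, {474.15, 526.49},
        {2051.52, 2481.44}, {995.47, 1235.10}, {507.73, 617.59},
        {1000.99, 1235.39}, {507.73, 617.67}, {258.86, 310.70},
        {2336.17, 3505.19}, {1145.50, 1735.24}, {571.41, 866.01},
        {1140.38, 1740.28}, {570.20, 865.67}, {280.94, 432.78},
    };

    /** Uplift as reported in the tables: after / before. */
    static double uplift(double before, double after) {
        return after / before;
    }

    /** Returns {min, max} uplift over all rows. */
    static double[] range(double[][] rows) {
        double min = Double.POSITIVE_INFINITY;
        double max = Double.NEGATIVE_INFINITY;
        for (double[] row : rows) {
            double u = uplift(row[0], row[1]);
            min = Math.min(min, u);
            max = Math.max(max, u);
        }
        return new double[] {min, max};
    }

    public static void main(String[] args) {
        double[] r = range(SCORES);
        // Percent improvement is (ratio - 1) * 100.
        System.out.printf("uplift %.2f .. %.2f (%.0f%% .. %.0f%%)%n",
                r[0], r[1], (r[0] - 1) * 100, (r[1] - 1) * 100);
    }
}
```

The minimum comes from the SVE2 `Long128` row at `ARRAYLEN` 1024. The maximum comes from the NEON `Long128` row at the same length.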

2026-05-25 02:08:05 +00:00

.github

8385011: GHA: Enable Linux AArch64 tests

2026-05-20 16:12:28 +00:00

.jcheck

8370890: Start of release updates for JDK 27

2025-12-04 17:01:41 +00:00

bin

8375649: idea.sh script adds source paths in a single, enormous, line to jdk.iml

2026-04-15 09:22:34 +00:00

doc

8378157: Section hyperlink in doc/testing.md refers to building.html instead of building.md