Ehsan Behrangi
|
00cc9be854
|
8381560: AArch64: Optimize String.equals intrinsic
This change improves the AArch64 implementation of String.equals by
introducing SIMD-based fast paths using SVE and NEON.
SVE implementation:
- Uses predicated loads and comparisons for short lengths (len < VL)
- Uses a full predicated loop for longer inputs
- Handles the tail via an overlapped compare at (base + len - VL)
NEON implementation:
- Uses an 8-byte pre-read to simplify tail handling and eliminate
4/2/1-byte scalar branches
- Processes 16-byte chunks using LDP pair loads
- Uses CMP/CCMP to collapse comparisons into a single branch on mismatch
These changes reduce branch pressure and improve throughput for both
short and long strings.
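
As a rough illustration only, the overlapped-tail and single-branch ideas
described above can be modelled in plain Java. This is not the intrinsic
itself (which is emitted as AArch64 assembly); the class name
OverlappedEquals, the method equalsBytes, and the 8/16-byte chunk sizes are
assumptions chosen for the sketch.

```java
import java.lang.invoke.MethodHandles;
import java.lang.invoke.VarHandle;
import java.nio.ByteOrder;

// Hypothetical scalar model of the technique; not the HotSpot intrinsic.
public class OverlappedEquals {
    // Reads 8 bytes from a byte[] at an arbitrary (possibly unaligned) offset.
    private static final VarHandle LONG =
        MethodHandles.byteArrayViewVarHandle(long[].class, ByteOrder.nativeOrder());

    static boolean equalsBytes(byte[] a, byte[] b) {
        if (a.length != b.length) return false;
        int len = a.length;
        if (len < 8) {
            // Inputs shorter than one 8-byte read: simple byte loop.
            for (int i = 0; i < len; i++) {
                if (a[i] != b[i]) return false;
            }
            return true;
        }
        // Pre-read the last 8 bytes. This read may overlap bytes that the
        // loop below also checks, which removes any 4/2/1-byte tail ladder.
        long tailDiff = (long) LONG.get(a, len - 8) ^ (long) LONG.get(b, len - 8);
        int i = 0;
        for (; i + 16 <= len; i += 16) {
            // Two 8-byte compares folded into one branch (XOR, then OR).
            long d0 = (long) LONG.get(a, i) ^ (long) LONG.get(b, i);
            long d1 = (long) LONG.get(a, i + 8) ^ (long) LONG.get(b, i + 8);
            if ((d0 | d1) != 0) return false;
        }
        if (i + 8 <= len) {
            // 8..15 bytes remain: check one more full 8-byte chunk.
            if (((long) LONG.get(a, i) ^ (long) LONG.get(b, i)) != 0) return false;
        }
        // Any bytes still unchecked lie inside the pre-read tail window.
        return tailDiff == 0;
    }
}
```

The overlapped read re-checks a few bytes instead of branching on the exact
remainder, which is the trade-off the commit message describes.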
Correctness:
- The implementation preserves existing semantics and matches behavior
for all lengths
Testing:
- Updated and extended intrinsic tests to cover boundary conditions
and mismatch positions
Benchmark:
Across evaluated macrobenchmarks (DaCapo and Renaissance), most workloads
spend <0.5% of CPU time in String.equals. DaCapo biojava is a notable
exception (~8–9%). In biojava, most String.equals calls are on very short
strings (1–2 bytes), where SVE shows ~1% end-to-end improvement, while
NEON is largely neutral or shows a small regression (~1%).
Measured using JMH on AArch64 (Arm Neoverse V2 CPU).
Values are relative (%) vs baseline. Negative values indicate regressions.
Mismatch results are reported across first (DF), middle (DM),
and last (DL) difference positions.
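
For readers who want to reproduce the mismatch cases, the sketch below shows
one way such inputs could be built. It is an assumption, not the benchmark's
actual setup: the class MismatchInputs, the enum Position, and the choice of
index 0, len / 2, and len - 1 for the DF, DM, and DL cases are all
hypothetical.

```java
// Hypothetical helper; the real JMH benchmark may construct inputs differently.
public class MismatchInputs {
    enum Position { FIRST, MIDDLE, LAST }

    // Returns a copy of base (which must be non-empty) that differs from it
    // in exactly one character at the requested position.
    static String withDifference(String base, Position where) {
        int len = base.length();
        if (len == 0) {
            throw new IllegalArgumentException("an empty string has no mismatch position");
        }
        int index;
        switch (where) {
            case FIRST:  index = 0;       break;
            case MIDDLE: index = len / 2; break;
            default:     index = len - 1; break;
        }
        char[] chars = base.toCharArray();
        // Pick a replacement that is guaranteed to differ from the original.
        chars[index] = (chars[index] == 'x') ? 'y' : 'x';
        return new String(chars);
    }
}
```

For a one-character input all three positions coincide, so the DF, DM, and
DL cases exercise the same comparison under this construction.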
SVE results:
Length | L1_EQ L1_DF L1_DM L1_DL | U16_EQ U16_DF U16_DM U16_DL | Avg
-------+----------------------------+-----------------------------+------
0 | 19.63 | 20.05 | 19.84
1 | 16.59 17.81 16.57 18.34 | 16.02 0.71 0.42 1.39 | 10.98
2 | 16.44 1.32 0.30 -0.16 | 15.90 -5.17 -4.55 -1.09 | 2.87
3 | 26.58 1.60 1.43 27.07 | 30.34 -8.86 -7.06 14.08 | 10.65
7 | 41.47 -2.94 -3.37 39.82 | 24.02 -8.82 -6.27 20.48 | 13.05
8 | 19.08 -1.16 -3.50 -0.90 | 22.49 -9.75 17.50 13.13 | 7.11
9 | 20.17 -4.12 -5.17 19.03 | 9.25 -2.24 21.35 3.39 | 7.71
15 | 19.48 -3.83 -4.50 19.01 | 29.26 -10.06 11.76 17.07 | 9.77
16 | 19.04 -3.15 16.41 16.85 | 38.37 -11.12 13.18 27.70 | 14.66
17 | 8.95 -2.40 5.68 6.38 | 16.32 -1.61 7.49 11.44 | 6.53
31 | 28.87 -0.01 19.79 23.37 | 41.43 -7.57 23.85 35.89 | 20.70
32 | 32.58 3.38 12.39 26.90 | 46.01 -10.99 20.53 44.15 | 21.87
33 | 11.62 -15.20 6.04 13.27 | 32.27 -9.38 20.33 32.28 | 11.40
63 | 44.66 -11.59 37.20 42.56 | 55.41 -10.57 43.19 55.90 | 32.10
64 | 53.99 -2.19 27.04 51.79 | 59.36 -8.72 35.41 60.32 | 34.63
65 | 33.79 -14.01 23.95 29.15 | 48.91 -11.58 36.54 50.03 | 24.60
127 | 62.10 -3.79 47.51 62.79 | 58.13 -8.89 60.68 60.90 | 42.43
128 | 67.38 -2.47 38.62 67.09 | 62.83 -0.38 51.72 61.87 | 43.33
129 | 52.02 -1.42 39.17 49.20 | 55.04 -9.52 53.23 52.81 | 36.32
256 | 66.11 -1.38 56.12 64.93 | 70.67 -3.68 53.67 74.54 | 47.62
Average:
33.03 -2.40 17.46 30.34 | 37.60 -7.27 23.84 33.49 | 20.91
NEON results:
Length | L1_EQ L1_DF L1_DM L1_DL | U16_EQ U16_DF U16_DM U16_DL | Avg
-------+----------------------------+-----------------------------+------
0 | 9.22 | 8.69 | 8.95
1 | 3.07 3.59 1.34 5.42 | 6.36 -6.20 -6.71 -10.59 | -0.47
2 | 3.23 -4.79 -5.67 -4.09 | 8.06 -8.43 -9.89 -9.20 | -3.85
3 | 12.80 -4.16 -3.95 11.28 | 11.94 -14.50 -14.41 11.83 | 1.36
7 | 31.00 -7.21 -12.76 33.59 | 4.73 -17.67 -17.38 1.65 | 1.99
8 | 4.43 -7.20 -4.70 -6.73 | 2.71 -18.05 -3.17 -4.05 | -4.59
9 | -9.33 -19.90 -16.27 -1.80 | 16.65 -23.72 4.26 8.78 | -5.17
15 | -6.96 -16.17 -15.60 -4.01 | 7.46 -24.60 -3.19 77.82 | 1.84
16 | 2.48 -16.38 -2.56 -3.62 | 9.08 -19.29 -5.45 77.93 | 5.27
17 | 4.88 -18.85 -0.18 19.35 | 18.43 -19.80 -8.37 84.96 | 10.05
31 | 6.92 -21.13 -4.62 60.71 | 24.42 -21.81 9.48 188.59 | 30.32
32 | 7.75 -24.20 -5.29 68.23 | 25.33 -20.57 4.17 183.65 | 29.88
33 | 20.23 -20.42 -11.33 98.60 | 23.76 -24.76 5.97 188.57 | 35.08
63 | 30.25 -22.30 14.29 152.37 | 25.02 -28.37 21.43 419.68 | 76.55
64 | 28.99 -22.91 9.03 185.51 | 38.20 -22.82 19.76 446.60 | 85.29
65 | 16.13 -21.77 1.45 211.38 | 27.94 -24.79 17.50 446.80 | 84.33
127 | 33.69 -28.94 28.75 429.23 | 41.75 -24.86 37.35 832.68 | 168.71
128 | 26.28 -29.03 24.13 432.87 | 43.48 -18.53 26.44 810.20 | 164.48
129 | 27.73 -20.30 20.84 439.01 | 44.09 -22.35 30.09 827.38 | 168.31
256 | 53.30 -20.27 26.09 841.37 | 56.66 -21.07 47.41 1604.98 | 323.56
Average:
15.30 -16.97 2.26 156.24 | 22.24 -20.12 8.17 325.70 | 59.10
Observations:
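- SVE improves the per-length average at every tested length, with gains
increasing as input size grows; first-difference (DF) cases regress
slightly on average (-2.40% for L1, -7.27% for U16)
- NEON improves equal-string performance at most lengths (L1_EQ regresses
at lengths 9 and 15)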
- NEON shows regressions for short mismatched inputs due to the loss
of the scalar tbz-based early-exit sequence, which efficiently
detects mismatches at small sizes and at early positions
- The scalar implementation relies on a branchy 4/2/1 tbz ladder,
which is efficient for early mismatches but suboptimal for equal
strings
- The NEON implementation replaces this with a branchless SIMD
approach and performs upfront comparisons of the first and last
8 bytes, improving throughput and late-mismatch detection
|
2026-06-05 12:22:15 +01:00 |
|
Mohamed Issa
|
bb4d2abb0f
|
8382482: Optimize equals scenario in x86 scalar floating point min/max reduction loops
Reviewed-by: sviswanathan, epeter, sparasa
|
2026-05-28 20:16:12 +00:00 |
|
Xueming Shen
|
185d933bb9
|
8376602: [Vector API] Upgrade SLEEF from 3.6.1 to 3.9.0
Reviewed-by: psandoz, fyang, erikj
|
2026-05-27 04:56:50 +00:00 |
|
Xiaohong Gong
|
6a07b21e9a
|
8378737: AArch64: Fix SVE match rule issues for VectorMask.andNot()
Reviewed-by: vlivanov, aph
|
2026-05-21 01:40:27 +00:00 |
|
Alan Bateman
|
4edfc387f1
|
8377070: Update jimage format to support classes compiled with preview feature enabled
Co-authored-by: David Beaumont <dbeaumont@openjdk.org>
Reviewed-by: jpai, coleenp, sgehwolf
|
2026-05-12 10:09:28 +00:00 |
|
Jatin Bhateja
|
7ff7efd59d
|
8358521: Optimize vector operations by reassociating broadcasted inputs
Reviewed-by: epeter, vlivanov, xgong
|
2026-05-12 06:18:37 +00:00 |
|
Daniel Gredler
|
975e209244
|
8380794: AttributedString performance
Reviewed-by: jlu, naoto
|
2026-05-11 23:24:19 +00:00 |
|
Galder Zamarreño
|
af9ed6c022
|
8382881: Swap min/max values and avoid equals min/max values in MinMaxVector
Reviewed-by: roland
|
2026-05-07 09:20:45 +00:00 |
|
Paul Hübner
|
8de6298ed5
|
8379630: Add JMH benchmark to measure the overhead of using captured call state
Reviewed-by: pminborg, jvernee, liach
|
2026-05-07 07:57:44 +00:00 |
|
Quan Anh Mai
|
41a5c032f5
|
8382700: C2: Delay inlining instead of giving up when hit NodeCountInliningCutoff
Co-authored-by: Vladimir Ivanov <vlivanov@openjdk.org>
Co-authored-by: Maurizio Cimadamore <mcimadamore@openjdk.org>
Co-authored-by: Ioannis Tsakpinis <iotsakp@gmail.com>
Reviewed-by: kvn, vlivanov
|
2026-04-30 18:17:38 +00:00 |
|
John Engebretson
|
13c92d0d4d
|
8371656: HashMap.putAll() optimizations
Reviewed-by: smarks
|
2026-04-28 21:38:55 +00:00 |
|
Bhavana Kilambi
|
3384c6736d
|
8366444: Add support for add/mul reduction operations for Float16
Reviewed-by: jbhateja, mchevalier, xgong, epeter
|
2026-04-15 12:27:56 +00:00 |
|
Eric Fang
|
436d291a1c
|
8370863: VectorAPI: Optimize the VectorMaskCast chain in specific patterns
Reviewed-by: xgong, vlivanov, galder
|
2026-04-15 08:24:51 +00:00 |
|
Evgeny Astigeevich
|
9cf2b686bd
|
8381003: [REDO] Mitigate Neoverse-N1 erratum 1542419 negative impact on GCs and JIT performance
Reviewed-by: aph
|
2026-04-07 10:35:31 +00:00 |
|
Emanuel Peter
|
9607a7284d
|
8380513: [VectorAlgorithms] mismatch benchmark and test
Reviewed-by: mchevalier, galder, chagedorn
|
2026-04-01 12:18:49 +00:00 |
|
Emanuel Peter
|
7df06d1489
|
8379395: [VectorAlgorithms] new dot-product implementation using fma
Reviewed-by: mchevalier, chagedorn
|
2026-04-01 12:18:29 +00:00 |
|
Daniel Gredler
|
f46a698113
|
8381015: CharsetEncoder.canEncode(CharSequence) is slow for UTF-8, UTF-16, UTF-32
Reviewed-by: naoto, vyazici
|
2026-03-31 21:46:05 +00:00 |
|
Mohamed Issa
|
1a99655554
|
8378295: Update scalar AVX10 floating point min/max definitions
Reviewed-by: sviswanathan, mhaessig, jbhateja, sparasa
|
2026-03-27 04:56:30 +00:00 |
|
Brian Burkhalter
|
40d65f1063
|
8379583: (fs) Files.copy use of posix_fadvise is problematic on Linux
Reviewed-by: alanb
|
2026-03-26 18:38:04 +00:00 |
|
Joel Sikström
|
4dca6e4ca8
|
8380903: [BACKOUT] Mitigate Neoverse-N1 erratum 1542419 negative impact on GCs and JIT performance
Reviewed-by: aboldtch
|
2026-03-25 14:01:26 +00:00 |
|
Evgeny Astigeevich
|
3737cad6d9
|
8370947: Mitigate Neoverse-N1 erratum 1542419 negative impact on GCs and JIT performance
Co-authored-by: Axel Boldt-Christmas <aboldtch@openjdk.org>
Reviewed-by: shade, eosterlund, aph, aboldtch
|
2026-03-25 12:46:25 +00:00 |
|
Eirik Bjørsnøs
|
0379c0b005
|
8379557: Further optimize URL.toExternalForm
Reviewed-by: vyazici
|
2026-03-18 14:36:58 +00:00 |
|
Emanuel Peter
|
b2728d0a4b
|
8376891: [VectorAlgorithms] add more if-conversion benchmarks and tests
Reviewed-by: qamai, psandoz, xgong, jbhateja
|
2026-03-09 07:26:02 +00:00 |
|
Patrick Strawderman
|
08c8520b39
|
8378698: Optimize Base64.Encoder#encodeToString
Reviewed-by: liach, rriggs
|
2026-03-04 20:04:30 +00:00 |
|
Liam Miller-Cushon
|
0fbf58d8ff
|
8372353: API to compute the byte length of a String encoded in a given Charset
Reviewed-by: rriggs, naoto, vyazici
|
2026-03-04 17:33:32 +00:00 |
|
Fei Yang
|
b7d0cb5fb3
|
8378888: jdk/incubator/vector/Float16OperationsBenchmark.java uses wrong package name
Reviewed-by: jiefu, jbhateja, syan, liach
|
2026-03-02 12:49:01 +00:00 |
|
Jasmine Karthikeyan
|
074044c2f3
|
8342095: Add autovectorizer support for subword vector casts
Reviewed-by: epeter, qamai
|
2026-02-26 05:15:30 +00:00 |
|
Jatin Bhateja
|
6abb29cc07
|
8376794: Enable copy and mismatch Partial Inlining for AMD AVX512 targets
Reviewed-by: sviswanathan, thartmann
|
2026-02-12 06:52:08 +00:00 |
|
Mohamed Issa
|
161aa5d528
|
8371955: Support AVX10 floating point comparison instructions
Reviewed-by: epeter, sviswanathan, sparasa
|
2026-02-09 19:14:46 +00:00 |
|
Eirik Bjørsnøs
|
986d377224
|
8376533: Remove test dependencies on ReferenceQueue$Lock in preparation for JDK-8376477
Reviewed-by: rriggs, shade, cjplummer
|
2026-02-06 17:06:04 +00:00 |
|
Alan Bateman
|
ac6e8d481a
|
8376568: Change Thread::getStackTrace to use handshake op for all cases
Reviewed-by: pchilanomate, sspitsyn
|
2026-02-05 13:46:23 +00:00 |
|
Eric Fang
|
d0e9730783
|
8372980: [VectorAPI] AArch64: Add intrinsic support for unsigned min/max reduction operations
Co-authored-by: Andrew Haley <aph@openjdk.org>
Reviewed-by: aph, xgong
|
2026-02-05 07:58:33 +00:00 |
|
Emanuel Peter
|
06d1345f29
|
8373026: C2 SuperWord and Vector API: vector algorithms test and benchmark
Co-authored-by: Otmar Ertl <otmar.ertl@dynatrace.com>
Reviewed-by: vlivanov, jbhateja, psandoz, xgong
|
2026-01-29 08:39:10 +00:00 |
|
Daniel Gredler
|
992a8ef46b
|
8376226: CharsetEncoder.canEncode(CharSequence) is much slower than necessary
Reviewed-by: alanb, naoto
|
2026-01-27 13:20:26 +00:00 |
|
Hai-May Chao
|
21dc41f744
|
8314323: Implement JEP 527: TLS 1.3 Hybrid Key Exchange
Co-authored-by: Jamil Nimeh <jnimeh@openjdk.org>
Co-authored-by: Weijun Wang <weijun@openjdk.org>
Reviewed-by: wetmore, mullan
|
2026-01-20 16:16:38 +00:00 |
|
Liam Miller-Cushon
|
d433ce5236
|
8369564: Provide a MemorySegment API to read strings with known lengths
Co-authored-by: Per Minborg <pminborg@openjdk.org>
Reviewed-by: jvernee, mcimadamore
|
2026-01-12 15:22:42 +00:00 |
|
Jeremy Wood
|
2a965dffdd
|
8374377: PNGImageDecoder Slow For 8-bit PNGs
Reviewed-by: jdv, prr
|
2026-01-09 09:56:39 +00:00 |
|
Jonas Norlinder
|
c834e4c641
|
8373647: Avoid fstat when opening file for write with RandomAccessFile or FileOutputStream
Reviewed-by: redestad, alanb
|
2026-01-08 16:46:28 +00:00 |
|
Sergey Bylokhov
|
c6246d58f7
|
8374383: Update the copyright year to 2025 in the remaining files under test/ where it was missed
Reviewed-by: jpai
|
2025-12-31 10:04:45 +00:00 |
|
Sergey Bylokhov
|
5c694eab0f
|
8374363: Update copyright year to 2025 for test/micro in files where it was missed
Reviewed-by: phh
|
2025-12-27 04:45:56 +00:00 |
|
Mark Powers
|
817e3dfde9
|
8350711: [JMH] test Signatures.RSASSAPSS failed for 2 threads config
Reviewed-by: hchao, valeriep
|
2025-12-16 18:38:11 +00:00 |
|
Justin Lu
|
81e3757688
|
8373566: Performance regression with java.text.MessageFormat subformat patterns
Reviewed-by: liach, rriggs, naoto
|
2025-12-16 18:11:37 +00:00 |
|
Emanuel Peter
|
650de99fc6
|
8367158: C2: create better fill and copy benchmarks, taking alignment into account
Reviewed-by: qamai, kvn
|
2025-12-12 07:17:17 +00:00 |
|
Hamlin Li
|
6700baa505
|
8357551: RISC-V: support CMoveF/D vectorization
Reviewed-by: fyang, luhenry
|
2025-12-08 13:38:22 +00:00 |
|
Qizheng Xing
|
b83bf0717e
|
8360192: C2: Make the type of count leading/trailing zero nodes more precise
Reviewed-by: qamai, epeter, jbhateja
|
2025-12-08 13:16:39 +00:00 |
|
Jonas Norlinder
|
858d2e434d
|
8372584: [Linux]: Replace reading proc to get thread user CPU time with clock_gettime
Reviewed-by: dholmes, kevinw, redestad
|
2025-12-03 09:35:59 +00:00 |
|
Xueming Shen
|
b97ed667db
|
8365675: Add String Unicode Case-Folding Support
Reviewed-by: rriggs, naoto, ihse
|
2025-12-02 19:47:18 +00:00 |
|
Per Minborg
|
1ce2a44e9f
|
8371571: Consolidate and enhance bulk memory segment ops benchmarks
Reviewed-by: jvernee
|
2025-11-26 15:11:10 +00:00 |
|
Galder Zamarreño
|
a7bb99ed00
|
8372119: Missing copyright header in MinMaxVector
Reviewed-by: chagedorn, thartmann
|
2025-11-24 09:24:19 +00:00 |
|
Josiah Noel
|
ea19ad2ac8
|
8347167: Reduce allocation in com.sun.net.httpserver.Headers::normalize
Reviewed-by: vyazici, dfuchs, michaelm
|
2025-11-20 15:54:25 +00:00 |
|