Why does removing PAE from the Linux kernel configuration significantly reduce PCIe BAR I/O access speed?

I have an Intel Atom J1900 board with 2 GB of memory and a custom FPGA PCIe device that I developed. The device's PCIe information is shown below:

root@GNS:~# lspci -s 03:00.0 -vvv
03:00.0 Non-VGA unclassified device: Analog Devices Device 1536
        Subsystem: Analog Devices Device 0007
        Control: I/O- Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR- FastB2B- DisINTx+
        Status: Cap+ 66MHz- UDF- FastB2B- ParErr- DEVSEL=fast >TAbort- <TAbort- <MAbort- >SERR- <PERR- INTx-
        Latency: 0, Cache Line Size: 64 bytes
        Interrupt: pin A routed to IRQ 95
        Region 0: Memory at d0b00000 (32-bit, non-prefetchable) [size=1M]
        Region 1: Memory at d0a00000 (32-bit, non-prefetchable) [size=1M]
        Region 2: Memory at d0900000 (32-bit, non-prefetchable) [size=1M]
        Region 3: Memory at d0800000 (32-bit, non-prefetchable) [size=1M]
        Region 4: Memory at d0700000 (32-bit, non-prefetchable) [size=1M]
        Region 5: Memory at d0600000 (32-bit, non-prefetchable) [size=1M]
        Capabilities: [50] MSI: Enable+ Count=1/4 Maskable- 64bit+
                Address: 00000000fee0100c  Data: 4172
        Capabilities: [78] Power Management version 3
                Flags: PMEClk- DSI- D1- D2- AuxCurrent=0mA PME(D0-,D1-,D2-,D3hot-,D3cold-)
                Status: D0 NoSoftRst- PME-Enable- DSel=0 DScale=0 PME-
        Capabilities: [80] Express (v2) Endpoint, MSI 00
                DevCap: MaxPayload 128 bytes, PhantFunc 0, Latency L0s <64ns, L1 <1us
                        ExtTag- AttnBtn- AttnInd- PwrInd- RBE+ FLReset- SlotPowerLimit 10.000W
                DevCtl: Report errors: Correctable- Non-Fatal- Fatal- Unsupported-
                        RlxdOrd- ExtTag- PhantFunc- AuxPwr- NoSnoop+
                        MaxPayload 128 bytes, MaxReadReq 512 bytes
                DevSta: CorrErr+ UncorrErr- FatalErr- UnsuppReq- AuxPwr- TransPend-
                LnkCap: Port #1, Speed 2.5GT/s, Width x1, ASPM not supported, Exit Latency L0s <4us, L1 <1us
                        ClockPM- Surprise- LLActRep- BwNot- ASPMOptComp+
                LnkCtl: ASPM Disabled; RCB 64 bytes Disabled- CommClk+
                        ExtSynch- ClockPM- AutWidDis- BWInt- AutBWInt-
                LnkSta: Speed 2.5GT/s, Width x1, TrErr- Train- SlotClk+ DLActive- BWMgmt- ABWMgmt-
                DevCap2: Completion Timeout: Range ABCD, TimeoutDis+, LTR-, OBFF Not Supported
                DevCtl2: Completion Timeout: 50us to 50ms, TimeoutDis+, LTR-, OBFF Disabled
                LnkCtl2: Target Link Speed: 2.5GT/s, EnterCompliance- SpeedDis-
                         Transmit Margin: Normal Operating Range, EnterModifiedCompliance- ComplianceSOS-
                         Compliance De-emphasis: -6dB
                LnkSta2: Current De-emphasis Level: -3.5dB, EqualizationComplete-, EqualizationPhase1-
                         EqualizationPhase2-, EqualizationPhase3-, LinkEqualizationRequest-
        Capabilities: [100 v1] Virtual Channel
                Caps:   LPEVC=0 RefClk=100ns PATEntryBits=1
                Arb:    Fixed- WRR32- WRR64- WRR128-
                Ctrl:   ArbSelect=Fixed
                Status: InProgress-
                VC0:    Caps:   PATOffset=00 MaxTimeSlots=1 RejSnoopTrans-
                        Arb:    Fixed- WRR32- WRR64- WRR128- TWRR128- WRR256-
                        Ctrl:   Enable+ ID=0 ArbSelect=Fixed TC/VC=01
                        Status: NegoPending- InProgress-
        Capabilities: [200 v1] Vendor Specific Information: ID=1172 Rev=0 Len=044 <?>
        Kernel driver in use: MyDriver_PCIe_i686

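When I compiled a kernel without PAE support, BAR I/O access performance for this device dropped significantly: for example, a 4-byte write takes 87 us instead of 13 us, and a 4-byte read takes 77 us instead of 9 us. Here are the test results under the non-PAE kernel, the PAE kernel, and a PAE RT kernel. The test OS is Debian 9 (stretch).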

Kernel version      | Length (bytes) | Write (Mibit/s) | Read (Mibit/s) | Write time (us) | Read time (us)
--------------------|----------------|-----------------|----------------|-----------------|---------------
4.9.0-13-686        | 2048           | 96.45           | 16.26          | 162             | 961
4.9.0-13-686        | 1024           | 65.10           | 15.20          | 120             | 514
4.9.0-13-686        | 512            | 38.30           | 13.38          | 102             | 292
4.9.0-13-686        | 256            | 21.23           | 10.56          | 92              | 185
4.9.0-13-686        | 128            | 10.97           | 7.57           | 89              | 129
4.9.0-13-686        | 64             | 5.55            | 4.70           | 88              | 104
4.9.0-13-686        | 32             | 2.74            | 2.81           | 89              | 87
4.9.0-13-686        | 8              | 0.69            | 0.78           | 88              | 78
4.9.0-13-686        | 4              | 0.35            | 0.40           | 87              | 77
4.9.0-13-686-pae    | 2048           | 181.69          | 17.48          | 86              | 894
4.9.0-13-686-pae    | 1024           | 166.22          | 17.32          | 47              | 451
4.9.0-13-686-pae    | 512            | 139.51          | 17.13          | 28              | 228
4.9.0-13-686-pae    | 256            | 108.51          | 16.41          | 18              | 119
4.9.0-13-686-pae    | 128            | 61.04           | 15.50          | 16              | 63
4.9.0-13-686-pae    | 64             | 37.56           | 13.95          | 13              | 35
4.9.0-13-686-pae    | 32             | 18.78           | 11.10          | 13              | 22
4.9.0-13-686-pae    | 8              | 4.36            | 5.09           | 14              | 12
4.9.0-13-686-pae    | 4              | 2.35            | 3.39           | 13              | 9
4.9.0-18-rt-686-pae | 2048           | 171.70          | 17.42          | 91              | 897
4.9.0-18-rt-686-pae | 1024           | 159.44          | 17.17          | 49              | 455
4.9.0-18-rt-686-pae | 512            | 126.01          | 16.84          | 31              | 232
4.9.0-18-rt-686-pae | 256            | 93.01           | 15.88          | 21              | 123
4.9.0-18-rt-686-pae | 128            | 54.25           | 15.02          | 18              | 65
4.9.0-18-rt-686-pae | 64             | 30.52           | 13.20          | 16              | 37
4.9.0-18-rt-686-pae | 32             | 15.26           | 9.77           | 16              | 25
4.9.0-18-rt-686-pae | 8              | 4.70            | 4.07           | 13              | 15
4.9.0-18-rt-686-pae | 4              | 2.35            | 2.35           | 13              | 13
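
One rough way to read these numbers is to fit each timing series to time ≈ fixed + per_byte × length with ordinary least squares. The sketch below is a hypothetical helper, not part of the original test: the names `SIZES`, `TIMES_US`, `mibit_per_s` and `fit_fixed_and_per_byte` are made up for illustration, only the two non-RT kernels are included, and it assumes the "Mib/s" column means mebibits per second (the helper reproduces that column from the length and time columns, e.g. 2048 bytes in 162 us gives about 96.45). On these numbers the per-byte cost comes out almost identical under both kernels (about 0.43 us/byte for reads and 0.036 us/byte for writes), while the fixed part is roughly 8 us (reads) and 11 us (writes) with PAE versus roughly 74 us and 85 us without it. So the slowdown looks like a constant per-transfer overhead rather than slower individual BAR accesses.

```python
"""Re-derive the throughput column and split each timing series into a
fixed per-transfer cost plus a per-byte cost (ordinary least squares)."""

from statistics import fmean

# Transfer lengths (bytes) and times (us) copied from the table above.
SIZES = [2048, 1024, 512, 256, 128, 64, 32, 8, 4]
TIMES_US = {
    ("4.9.0-13-686", "write"): [162, 120, 102, 92, 89, 88, 89, 88, 87],
    ("4.9.0-13-686", "read"): [961, 514, 292, 185, 129, 104, 87, 78, 77],
    ("4.9.0-13-686-pae", "write"): [86, 47, 28, 18, 16, 13, 13, 14, 13],
    ("4.9.0-13-686-pae", "read"): [894, 451, 228, 119, 63, 35, 22, 12, 9],
}


def mibit_per_s(nbytes, time_us):
    """Throughput in mebibits per second (assumed meaning of the table's 'Mib/s')."""
    return nbytes * 8 / (time_us * 1e-6) / 2**20


def fit_fixed_and_per_byte(sizes, times):
    """Least-squares fit of time = fixed + per_byte * size; returns (fixed, per_byte)."""
    mx, my = fmean(sizes), fmean(times)
    sxx = sum((x - mx) ** 2 for x in sizes)
    sxy = sum((x - mx) * (y - my) for x, y in zip(sizes, times))
    per_byte = sxy / sxx
    return my - per_byte * mx, per_byte


if __name__ == "__main__":
    # Print the fitted fixed and per-byte cost for each kernel/direction.
    for (kernel, op), times in TIMES_US.items():
        fixed, per_byte = fit_fixed_and_per_byte(SIZES, times)
        print(f"{kernel:18} {op:5} fixed ~{fixed:5.1f} us  per byte ~{per_byte:.3f} us")
```

A linear fit is the simplest model that separates per-call from per-byte cost; it says nothing about where the fixed overhead comes from.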