Commit graph

932 commits

Author SHA1 Message Date
ReinUsesLisp
4e35177e23 shader_ir: Implement VOTE
Implement VOTE using Nvidia's intrinsics. Documentation about these can
be found here
https://developer.nvidia.com/reading-between-threads-shader-intrinsics

Instead of using portable ARB instructions I opted to use Nvidia
intrinsics because these are the closest we have to how Tegra X1
hardware renders.

To stub VOTE on non-Nvidia drivers (including nouveau) this commit
simulates a GPU with a warp size of one, returning what is meaningful
for the instruction being emulated:

* anyThreadNV(value) -> value
* allThreadsNV(value) -> value
* allThreadsEqualNV(value) -> true

ballotARB, also known as "uint64_t(activeThreadsNV())", emits

VOTE.ANY Rd, PT, PT;

on nouveau's compiler. This doesn't match exactly to Nvidia's code

VOTE.ALL Rd, PT, PT;

Which is emulated with activeThreadsNV() by this commit. In theory this
shouldn't really matter since .ANY, .ALL and .EQ affect the predicates
(set to PT on those cases) and not the registers.
2019-08-21 14:50:38 -03:00
Fernando Sahmkow
febb88efc4 Common/Alignment: Add noexcept where required. 2019-07-19 21:49:54 -04:00
Fernando Sahmkow
024b5fe91a Kernel: Address Feedback 2019-07-19 11:28:57 -04:00
Fernando Sahmkow
0901c33753 Common: Correct alignment allocator to work on C++14 or higher. 2019-07-19 11:11:42 -04:00
Fernando Sahmkow
9bede4eeed VM_Manager: Align allocated memory to 256bytes
This commit ensures that all backing memory allocated for the Guest CPU
is aligned to 256 bytes. This due to how gpu memory works and the heavy
constraints it has in the alignment of physical memory.
2019-07-19 10:06:08 -04:00
Fernando Sahmkow
8af6e6a052 shader_ir: Implement a new shader scanner 2019-07-09 08:14:36 -04:00
Fernando Sahmkow
3b9d89839d texture_cache: Address Feedback 2019-07-05 09:46:53 -04:00
ReinUsesLisp
de982deb25 common/alignment: Address feedback 2019-06-24 01:47:09 -03:00
ReinUsesLisp
06c4ce8645 shader: Decode SUST and implement backing image functionality 2019-06-20 21:38:33 -03:00
Fernando Sahmkow
94f2be5473 texture_cache: Optimize GetMipBlockHeight and GetMipBlockDepth 2019-06-20 21:36:12 -03:00
ReinUsesLisp
345e73f2fe video_core: Use un-shifted block sizes to avoid integer divisions
Instead of storing all block width, height and depths in their shifted
form:

block_width = 1U << block_shift;

Store them like they are provided by the emulated hardware (their
block_shift form). This way we can avoid doing the costly
Common::AlignUp operation to align texture sizes and drop CPU integer
divisions with bitwise logic (defined in Common::AlignBits).
2019-06-20 21:36:12 -03:00
Fernando Sahmkow
b347543e83 Reduce amount of size calculations. 2019-06-20 21:36:12 -03:00
Lioncash
969cd6dc1d
common/hex_util: Reserve std::string memory ahead of time
Avoids potentially performing multiple reallocations (depending on the
size of the input data) by reserving the necessary amount of memory
ahead of time.

This is trivially doable, so there's no harm in it.
2019-06-12 17:54:11 -04:00
Lioncash
a62088539e
common/hex_util: Combine HexVectorToString() and HexArrayToString()
These can be generified together by using a concept type to designate
them. This also has the benefit of not making copies of potentially very
large arrays.
2019-06-12 17:54:05 -04:00
ReinUsesLisp
dec1cbaf7f cmake: Add missing shader hash file entries 2019-06-06 20:11:48 -03:00
Lioncash
1edf018319 common/math_util: Provide a template deduction guide for Common::Rectangle
Allows for things such as:

auto rect = Common::Rectangle{0, 0, 0, 0};

as opposed to being required to explicitly write out the underlying
type, such as:

auto rect = Common::Rectangle<int>{0, 0, 0, 0};

The only requirement for the deduction is that all constructor arguments
be the same type.
2019-05-31 04:44:02 -03:00
bunnei
ed74a3cb8b
Merge pull request #1931 from DarkLordZach/mii-database-1
mii: Implement MiiManager backend and several mii service commands
2019-05-30 13:26:40 -04:00
Lioncash
e7ab0e9127 common/file_util: Remove unnecessary return at end of void StripTailDirSlashes()
While we're at it, also invert the conditional into a guard clause.
2019-05-23 14:33:29 -04:00
Lioncash
11e9bee91d common/file_util: Make GetCurrentDir() return a std::optional
nullptr was being returned in the error case, which, at a glance may
seem perfectly OK... until you realize that std::string has the
invariant that it may not be constructed from a null pointer. This
means that if this error case was ever hit, then the application would
most likely crash from a thrown exception in std::string's constructor.

Instead, we can change the function to return an optional value,
indicating if a failure occurred.
2019-05-23 14:24:13 -04:00
Lioncash
943f6da1ac common/file_util: Remove duplicated documentation comments
These are already present within the header, so they don't need to be
repeated in the cpp file.
2019-05-23 14:22:12 -04:00
Lioncash
2b1fcc8a14 common/file_util: Make ReadFileToString and WriteStringToFile consistent
Makes the parameter ordering consistent, and also makes the filename
parameter a std::string. A std::string would be constructed anyways with
the previous code, as IOFile's only constructor with a filepath is one
taking a std::string.

We can also make WriteStringToFile's string parameter utilize a
std::string_view for the string, making use of our previous changes to
IOFile.
2019-05-23 13:52:43 -04:00
Lioncash
e3b2539986 common/file_util: Remove unnecessary c_str() calls
The file stream open functions have supported std::string overloads
since C++11, so we don't need to use c_str() here. Same behavior, less
code.
2019-05-23 13:37:47 -04:00
Lioncash
8cd3d9be26 common/file_util: Make IOFile's WriteString take a std::string_view
We don't need to force the usage of a std::string here, and can instead
use a std::string_view, which allows writing out other forms of strings
(e.g. C-style strings) without any unnecessary heap allocations.
2019-05-23 13:35:31 -04:00
Lioncash
a6f7a44aab
common/zstd_compression: Remove #pragma once directive from source file
Introduced in 72477731ed. This is only
necessary within header files.
2019-05-04 01:54:29 -04:00
Zach Hilman
1aa2b99a98 mii: Implement Delete and Destroy file 2019-04-25 08:07:57 -04:00
Zach Hilman
f0db2e3ef3 mii_manager: Cleanup and optimization 2019-04-25 08:07:57 -04:00
Zach Hilman
ca5638a142 common: Extract UUID to its own class
Since the Mii database uses UUIDs very similar to the Accounts database, it makes no sense to not share code between them.
2019-04-25 08:07:57 -04:00
Lioncash
4620ed47a3 common/{lz4_compression, zstd_compression}: Add missing header guards
These two files were missing the #pragma once directive.
2019-04-15 13:00:08 -04:00
bunnei
b42595fa6b
Merge pull request #2391 from lioncash/scope
common/scope_exit: Replace std::move with std::forward in ScopeExit()
2019-04-12 21:52:35 -04:00
Lioncash
0d8ef2d3b9 common/swap: Improve codegen of the default swap fallbacks
Uses arithmetic that can be identified more trivially by compilers for
optimizations. e.g. Rather than shifting the halves of the value and
then swapping and combining them, we can swap them in place.

e.g. for the original swap32 code on x86-64, clang 8.0 would generate:

    mov     ecx, edi
    rol     cx, 8
    shl     ecx, 16
    shr     edi, 16
    rol     di, 8
    movzx   eax, di
    or      eax, ecx
    ret

while GCC 8.3 would generate the ideal:

    mov     eax, edi
    bswap   eax
    ret

now both generate the same optimal output.

MSVC used to generate the following with the old code:

    mov     eax, ecx
    rol     cx, 8
    shr     eax, 16
    rol     ax, 8
    movzx   ecx, cx
    movzx   eax, ax
    shl     ecx, 16
    or      eax, ecx
    ret     0

Now MSVC also generates a similar, but equally optimal result as clang/GCC:

    bswap   ecx
    mov     eax, ecx
    ret     0

====

In the swap64 case, for the original code, clang 8.0 would generate:

    mov     eax, edi
    bswap   eax
    shl     rax, 32
    shr     rdi, 32
    bswap   edi
    or      rax, rdi
    ret

(almost there, but still missing the mark)

while, again, GCC 8.3 would generate the more ideal:

    mov     rax, rdi
    bswap   rax
    ret

now clang also generates the optimal sequence for this fallback as well.

This is a case where MSVC unfortunately falls short, despite the new
code, this one still generates a doozy of an output.

    mov     r8, rcx
    mov     r9, rcx
    mov     rax, 71776119061217280
    mov     rdx, r8
    and     r9, rax
    and     edx, 65280
    mov     rax, rcx
    shr     rax, 16
    or      r9, rax
    mov     rax, rcx
    shr     r9, 16
    mov     rcx, 280375465082880
    and     rax, rcx
    mov     rcx, 1095216660480
    or      r9, rax
    mov     rax, r8
    and     rax, rcx
    shr     r9, 16
    or      r9, rax
    mov     rcx, r8
    mov     rax, r8
    shr     r9, 8
    shl     rax, 16
    and     ecx, 16711680
    or      rdx, rax
    mov     eax, -16777216
    and     rax, r8
    shl     rdx, 16
    or      rdx, rcx
    shl     rdx, 16
    or      rax, rdx
    shl     rax, 8
    or      rax, r9
    ret     0

which is pretty unfortunate.
2019-04-12 00:07:39 -04:00
Lioncash
66b73fd399 common/swap: Mark byte swapping free functions with [[nodiscard]] and noexcept
Allows the compiler to inform when the result of a swap function is
being ignored (which is 100% a bug in all usage scenarios). We also mark
them noexcept to allow other functions using them to be able to be
marked as noexcept and play nicely with things that potentially inspect
"nothrowability".
2019-04-11 20:42:44 -04:00
Lioncash
9cb4b7be40 common/swap: Simplify swap function ifdefs
Including every OS' own built-in byte swapping functions is kind of
undesirable, since it adds yet another build path to ensure compilation
succeeds on.

Given we only support clang, GCC, and MSVC for the time being, we can
utilize their built-in functions directly instead of going through the
OS's API functions.

This shrinks the overall code down to just

if (msvc)
  use msvc's functions
else if (clang or gcc)
  use clang/gcc's builtins
else
  use the slow path
2019-04-11 20:36:19 -04:00
Lioncash
598954436f common/swap: Remove 32-bit ARM path
We don't plan to support host 32-bit ARM execution environments, so this
is essentially dead code.
2019-04-11 20:15:47 -04:00
Lioncash
b569641098 common/scope_exit: Replace std::move with std::forward in ScopeExit()
The template type here is actually a forwarding reference, not an rvalue
reference in this case, so it's more appropriate to use std::forward to
preserve the value category of the type being moved.
2019-04-11 20:01:33 -04:00
bunnei
f14328bf0a
Merge pull request #2300 from FernandoS27/null-shader
shader_cache: Permit a Null Shader in case of a bad host_ptr.
2019-04-07 17:58:27 -04:00
bunnei
21a4e7deea
Merge pull request #2098 from FreddyFunk/disk-cache-zstd
gl_shader_disk_cache: Use Zstandard for compression
2019-04-07 17:48:33 -04:00
Fernando Sahmkow
021cd56bc9 Permit a Null Shader in case of a bad host_ptr. 2019-04-07 07:52:01 -04:00
Lioncash
93b84e9308 common/multi_level_queue: Silence truncation warning in iterator operator++ 2019-04-05 15:35:46 -04:00
Lioncash
33db37e669 common/bit_util: Make CountLeading/CountTrailing functions have the same return types
Makes the return type consistently uniform (like the intrinsics we're
wrapping). This also conveniently silences a truncation warning within
the kernel multi_level_queue.
2019-04-05 15:29:40 -04:00
Lioncash
0e2f617abc common/lz4_compression: Remove #pragma once directive from the cpp file
Introduced within 798d76f4c7, this only
really has an effect within header files.

Silences a -Wpragma-once-outside-header warning with clang.
2019-04-03 22:07:04 -04:00
bunnei
d6374b2522
Merge pull request #2093 from FreddyFunk/disk-cache-better-compression
Better LZ4 compression utilization for the disk based shader cache and the yuzu build system
2019-04-03 21:50:29 -04:00
Lioncash
781ab8407b general: Use deducation guides for std::lock_guard and std::unique_lock
Since C++17, the introduction of deduction guides for locking facilities
means that we no longer need to hardcode the mutex type into the locks
themselves, making it easier to switch mutex types, should it ever be
necessary in the future.
2019-04-01 12:53:47 -04:00
bunnei
a89266bc5e
Merge pull request #2303 from lioncash/thread
common/thread: Remove unused functions
2019-03-30 20:10:32 -04:00
Lioncash
394095438a common/thread: Remove unused functions
Many of these functions are carried over from Dolphin (where they aren't
used anymore). Given these have no use (and we really shouldn't be
screwing around with OS-specific thread scheduler handling from the
emulator, these can be removed.

The function for setting the thread name is left, however, since it can
have debugging utility usages.
2019-03-29 13:26:21 -04:00
unknown
b4857e326f common/zstd_compression: simplify decompression interface 2019-03-29 18:22:08 +01:00
unknown
72477731ed common/zstd_compression: Add Zstandard wrapper 2019-03-29 18:22:08 +01:00
unknown
ca82589350 common: Link libzstd_static 2019-03-29 18:22:07 +01:00
unknown
c791192d64 Addressed feedback 2019-03-29 18:12:42 +01:00
unknown
74cee1b65d gl_shader_disk_cache: Use better compression for transferable and precompiled shader disk chache files 2019-03-29 16:42:19 +01:00
unknown
798d76f4c7 data_compression: Move LZ4 compression from video_core/gl_shader_disk_cache to common/data_compression 2019-03-29 16:42:19 +01:00