The Rendering Technologies of

Contents

1. The Rendering Technologies of
2. Thin G-Buffer 2.0
3. Thin G-Buffer 2.0
4. Target Image
5. Depth
6. RG: Normals
7. B: Glossiness
8. A: Translucency
9. R: Albedo Y
10. G: Albedo CbCr (interleaved)
11. B: Specular intensity
12. G-Buffer Packing
13. G-Buffer Packing (2)
14. Hybrid Deferred Rendering
15. Hybrid Deferred Rendering (2)
16. Thin G-Buffer Benefits
17. Thin G-Buffer Hindsights
18. Volumetric Fog Updates
19. Volumetric Fog Updates
20. Volumetric Fog Shadows
21. Volumetric fog shadows
22. Naive Upscale
23. Bilateral Upscale
24. Silhouette POM
25. Silhouette POM
26. Silhouette POM: Steps
27. Silhouette POM
28. Silhouette POM
29. Massive Grass
30. Massive Grass: Simulation
31. Massive Grass: Simulation
32. Massive Grass: Simulation
33. Massive Grass: Simulation
34. Massive Grass: Simulation
35. Massive Grass: Mesh Merging
36. Massive Grass: Mesh Merging
37. Massive Grass: Update Loop
38. Massive Grass: Challenges
39. Massive Grass: Challenges (2)
40. Massive Grass: Challenges (3)
41. Anti-aliasing
42. DX11 Deferred MSAA: 101
43. DX11 Deferred MSAA
44. Custom Resolve & Per-Sample Mask
45. SV_Coverage
46. Custom Per-Sample Mask
47. Final Result
48. Pixel/Sample Frequency Passes
49. Alpha Test Super-Sampling
50. Alpha Test Super-Sampling
51. Alpha Test Super-Sampling
52. Corner Cases
53. Corner Cases
54. MSAA Friendliness
55. MSAA Friendliness
56. MSAA Friendliness
57. MSAA Correctness vs Performance
58. Conclusion
59. Special Thanks
60. Questions?
61. We are hiring!
62. References
63. Extra Slides
64. Massive Grass: Challenges
Slides and text of this presentation

Slide 1: The Rendering Technologies of

Slide 2: Thin G-Buffer 2.0
For Crysis 3, we wanted:
Minimize redundant drawcalls
Alpha-blended (AB) details on the G-Buffer with proper glossiness
Tons of vegetation => deferred translucency
Multiplatform friendliness

Slide 3: Thin G-Buffer 2.0

Slide 4: Target Image

Slide 5: Depth

Slide 6: RG: Normals

Slide 7: B: Glossiness

Slide 8: A: Translucency

Slide 9: R: Albedo Y

Slide 10: G: Albedo CbCr (interleaved)

Slide 11: B: Specular intensity

Slide 12: G-Buffer Packing
World-space normal packed into 2 components (WIKI00)
Stereographic projection worked OK in practice (and is cheap)
Glossiness + normal Z sign packed together
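A minimal Python sketch of this kind of packing follows. The bit layout (7 bits of glossiness plus the sign of Z in the top bit of one 8-bit channel) and the function names are assumptions for illustration, not the actual CryENGINE encoding. Because the sign is stored separately, the projection only has to handle one hemisphere, so (u, v) always stays inside the unit disk.

```python
def encode_normal_gloss(n, gloss):
    """Pack a unit normal n = (x, y, z) and a glossiness value in [0, 1].

    Returns (u, v, packed): u and v hold the projected normal; `packed` is a
    hypothetical 8-bit value with 7 bits of glossiness and the sign of z in
    the top bit.
    """
    x, y, z = n
    # Stereographic projection of the hemisphere that contains n. Using |z|
    # keeps the denominator in [1, 2], so (u, v) stays inside the unit disk.
    d = 1.0 + abs(z)
    u, v = x / d, y / d
    gloss7 = round(min(max(gloss, 0.0), 1.0) * 127)
    sign_bit = 0x80 if z < 0.0 else 0x00
    return u, v, gloss7 | sign_bit


def decode_normal_gloss(u, v, packed):
    """Invert encode_normal_gloss."""
    s = u * u + v * v
    x = 2.0 * u / (1.0 + s)
    y = 2.0 * v / (1.0 + s)
    z = (1.0 - s) / (1.0 + s)   # non-negative: magnitude of the original z
    if packed & 0x80:
        z = -z                  # restore the hemisphere from the sign bit
    gloss = (packed & 0x7F) / 127.0
    return (x, y, z), gloss
```

In a real G-Buffer, u and v would also be quantized to the target's channel precision; that step is left out here.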

Slide 13: G-Buffer Packing (2)
Albedo in Y’CbCr color space (WIKI01)
Stored in 2 channels via chrominance subsampling (WIKI02)
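The two-channel albedo layout (slides 9 and 10: Y in one channel, Cb and Cr interleaved in another) can be sketched in Python as below. Assumptions: full-range BT.601 (JPEG-style) Y'CbCr coefficients, Cb and Cr alternating in a checkerboard, and the missing chroma borrowed from the horizontal neighbour; the reconstruction filter actually used is not stated in the slides. Rows must be at least 2 pixels wide.

```python
def rgb_to_ycbcr(r, g, b):
    """Full-range BT.601 (JPEG-style) Y'CbCr, chroma offset to 0.5."""
    y = 0.299 * r + 0.587 * g + 0.114 * b
    cb = 0.5 + (b - y) / 1.772
    cr = 0.5 + (r - y) / 1.402
    return y, cb, cr


def ycbcr_to_rgb(y, cb, cr):
    r = y + 1.402 * (cr - 0.5)
    b = y + 1.772 * (cb - 0.5)
    g = (y - 0.299 * r - 0.114 * b) / 0.587
    return r, g, b


def pack_albedo(image):
    """image: rows of (r, g, b). Returns rows of (Y, C), where C alternates
    between Cb and Cr in a checkerboard pattern."""
    packed = []
    for py, row in enumerate(image):
        out_row = []
        for px, rgb in enumerate(row):
            y, cb, cr = rgb_to_ycbcr(*rgb)
            out_row.append((y, cb if (px + py) % 2 == 0 else cr))
        packed.append(out_row)
    return packed


def unpack_albedo(packed):
    """Rebuild RGB, borrowing the missing chroma from a horizontal neighbour,
    which always stores the other chroma channel."""
    image = []
    for py, row in enumerate(packed):
        out_row = []
        for px, (y, c) in enumerate(row):
            nx = px + 1 if px + 1 < len(row) else px - 1
            other = row[nx][1]
            cb, cr = (c, other) if (px + py) % 2 == 0 else (other, c)
            out_row.append(ycbcr_to_rgb(y, cb, cr))
        image.append(out_row)
    return image
```

Luma stays at full resolution, so only chroma detail is lost; regions of constant color round-trip exactly.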

Slide 14: Hybrid Deferred Rendering
Deferred lighting still processed as usual (SOUSA11)
L-Buffers now use bandwidth-friendlier R11G11B10F formats
Precision was sufficient, since material properties are not applied yet
Deferred shading composited via a fullscreen pass
For more complex shading such as hair or skin, process forward passes
Allowed us to drop almost all opaque forward passes
Fewer drawcalls, but G-Buffer passes now have a higher cost
A fast double-Z prepass for some of the closest geometry helps slightly
Overall it was a nice win on all platforms*
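Each channel of R11G11B10F is an unsigned small float with a 5-bit exponent (bias 15) and a 6-bit (R, G) or 5-bit (B) mantissa. The Python sketch below rounds a value to the nearest representable 11- or 10-bit float, to show roughly how much precision the lighting accumulation keeps: for normal values the relative rounding error is at most 2^-7 (about 0.8%) in the 11-bit channels. The handling of out-of-range input (negatives to zero, clamping at the largest finite value) is a simplification chosen for this sketch, not a statement of exact hardware conversion rules.

```python
import math

EXP_BIAS = 15          # 5-bit exponent, same bias as half floats
MAX_EXPONENT = 15      # largest exponent of a finite value
MANTISSA_BITS = {11: 6, 10: 5}


def quantize_small_float(x, total_bits=11):
    """Round x to the nearest unsigned float11 (R, G) or float10 (B) value.

    Sketch choices: negatives map to 0 (the formats have no sign bit) and
    values above the largest finite number are clamped to it.
    """
    m_bits = MANTISSA_BITS[total_bits]
    max_value = 2.0 ** MAX_EXPONENT * (2.0 - 2.0 ** -m_bits)
    if x <= 0.0:
        return 0.0
    if x >= max_value:
        return max_value
    _, e = math.frexp(x)                   # x = f * 2**e with 0.5 <= f < 1
    exponent = max(e - 1, 1 - EXP_BIAS)    # denormals share the minimum exponent
    step = 2.0 ** (exponent - m_bits)      # spacing of representable values
    return min(round(x / step) * step, max_value)


for value in (0.1, 1.0, 3.3, 100.0, 4000.0):
    q = quantize_small_float(value)
    print(value, q, abs(q - value) / value)
```

The error is relative rather than absolute, which is why storing unlit light accumulation (before albedo is applied) in these formats tends to be less visible than storing final shaded color would be.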

Slide 15: Hybrid Deferred Rendering (2)
Deferred (Red) + Forward (Green)

Slide 16: Thin G-Buffer Benefits
Unified solution across all platforms
Deferred rendering with less bandwidth/memory than vanilla
Good for MSAA, and avoids tiled rendering on Xbox 360
Tackles glossiness for transparent geometry on the G-Buffer
Alpha-blended cases, e.g. decals, deferred decals, terrain layers
Can composite all such cases directly into the G-Buffer
Avoids the need for multipass
Deferred sub-surface scattering
Visual + performance win, in particular for vegetation rendering

Slide 17: Thin G-Buffer Hindsights
Why not pack the G-Buffer directly?
Because we need to be able to blend details into the G-Buffer
Would need to decode -> blend -> encode
Or could blend such cases into separate targets (bad for MSAA/consoles)
Programmable blending would have been nice
Transparent cases can’t use the alpha channel for storage*
sRGB output for only a couple of channels, or for all
Would allow for more interesting and optimal packing schemes
While at it, stencil write from the fragment shader would also be handy

Slide 18: Volumetric Fog Updates
Density calculation based on the fog model established for Crysis 1 (WENZEL06)
Deferred pass for opaque geometry
Per-vertex approximation for transparent geometry
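Assuming the WENZEL06 model is exponential height fog, with density f(z) = a·e^(−b·z) falling off with height z, the fog along a view ray has a closed-form integral, so no ray marching is needed. The Python sketch below evaluates that integral and the resulting fog amount; `a`, `b` and the function names are placeholders, and the Crysis 3 shader code itself is not shown in the slides.

```python
import math


def fog_density_integral(cam, pos, a, b):
    """Integral of a * exp(-b * z) along the segment cam -> pos (z is up).

    a: fog density at height 0, b: height falloff (> 0).
    """
    dx, dy, dz = (p - c for p, c in zip(pos, cam))
    ray_length = math.sqrt(dx * dx + dy * dy + dz * dz)
    t = b * dz
    # (1 - exp(-t)) / t, computed with expm1 for accuracy; it tends to 1 as
    # t -> 0 (horizontal rays, where density is constant along the ray).
    factor = 1.0 if t == 0.0 else -math.expm1(-t) / t
    return a * math.exp(-b * cam[2]) * ray_length * factor


def fog_amount(cam, pos, a, b):
    """Fraction of the background hidden by fog: 1 - transmittance."""
    return 1.0 - math.exp(-fog_density_integral(cam, pos, a, b))
```

The same function works per pixel (deferred pass, with pos reconstructed from depth) or per vertex (the transparent-geometry approximation), since it only needs the two endpoints of the ray.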

Slide 19: Volumetric Fog Updates
Minor tuning: artist-controllable gradients (via the Time of Day (ToD) tool)
Height based: density and color for a specified top and bottom height
Radial based: size, color and lobe around the sun position

Slide 20: Volumetric Fog Shadows
Based on TÓTH09: instead of accumulating in-scattered light, accumulate the shadow contribution along the view ray

Slide 21: Volumetric fog shadows
Interleave pass distributes 1024 shadow samples on an 8x8 grid shared by neighboring pixels
Half-resolution destination target
Gather pass computes the final shadow value
Bilateral filtering was used to minimize ghosting and halos
Shadow stored in alpha, 8-bit depth in the red channel
Used 8 taps to compare against the center full-resolution depth
Max sample distance configurable (~150-200 m in C3 levels)
Cloud shadow texture baked into the final result
Final result modifies fog height and radial color
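The interleave/gather split can be illustrated with a small Python sketch: each pixel of an 8x8 tile evaluates only 16 of the 1024 samples along its view ray (every 64th one, offset by its position in the tile), and the gather pass averages over an 8x8 neighbourhood so that every sample index contributes. `is_lit` is a hypothetical stand-in for the shadow-map lookup; the half-resolution target, the bilateral weights and the cloud shadows are left out, and the 8x8 gather window is an assumption.

```python
GRID = 8                                        # 8x8 interleave pattern
TOTAL_SAMPLES = 1024                            # shadow samples per 8x8 tile
PER_PIXEL = TOTAL_SAMPLES // (GRID * GRID)      # 16 samples per pixel


def sample_indices(px, py):
    """Indices (0..1023) of the ray samples evaluated by pixel (px, py).

    Each pixel of an 8x8 tile takes every 64th sample, starting at its
    position in the tile, so the tile as a whole covers all 1024 samples.
    """
    offset = (py % GRID) * GRID + (px % GRID)
    return [offset + k * GRID * GRID for k in range(PER_PIXEL)]


def interleave_pass(width, height, max_dist, is_lit):
    """Lit fraction of each pixel's own samples.

    is_lit(px, py, distance) stands in for a shadow-map lookup at the point
    `distance` metres along the pixel's view ray.
    """
    step = max_dist / TOTAL_SAMPLES
    out = [[0.0] * width for _ in range(height)]
    for py in range(height):
        for px in range(width):
            lit = sum(1 for i in sample_indices(px, py) if is_lit(px, py, (i + 0.5) * step))
            out[py][px] = lit / PER_PIXEL
    return out


def gather_pass(partial):
    """Average each pixel over an 8x8 neighbourhood (clamped at borders)."""
    h, w = len(partial), len(partial[0])
    out = [[0.0] * w for _ in range(h)]
    for py in range(h):
        for px in range(w):
            vals = [partial[y][x]
                    for y in range(max(py - 3, 0), min(py + 5, h))
                    for x in range(max(px - 3, 0), min(px + 5, w))]
            out[py][px] = sum(vals) / len(vals)
    return out
```

The gather assumes neighbouring view rays see similar shadowing; where they do not (depth discontinuities), the bilateral filtering mentioned on the slide is what limits halos.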

Slide 22: Naive Upscale

Slide 23: Bilateral Upscale

Slide 24: Silhouette POM

Slide 25: Silhouette POM
Alternative to tessellation-based displacement mapping
Looked into various approaches; most weren’t practical for production
Current implementation is based on the principle of barycentric correspondence (JESCHKE07)

Slide 26: Silhouette POM: Steps
Transform vertices and extrude - VS
Generate prisms (do not split into tetrahedra) and set up clip planes - GS
Generally, prism sides are bilinear patches; we approximate them by a conservative plane
Note to IHVs: emitting per-triangle constants would be nice!
In theory, on DX11.1, we could emit them via UAV output?
Ray marching - PS
Compute the intersection of the view ray with the prism in WS, and translate it to texture space via barycentric correspondence (JESCHKE07)
Use the resulting texture UV and height for entry and exit to trace the height field
Compute the final UV and selectively discard the pixel (viewer below the height map; view ray leaving the prism before hitting the terrain)
Lots of pressure on the PS, yet the GS is the bottleneck (prism generation)
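The pixel-shader step "use entry and exit UV/height to trace the height field" can be sketched in Python as below. It assumes the entry and exit points have already been mapped to texture space; the linear search followed by one linear refinement is the usual parallax-occlusion-mapping approach and may differ from the actual shader. Function names, the step count and the height convention (0 = base surface, 1 = top of the shell) are assumptions.

```python
def lerp(a, b, t):
    return a + (b - a) * t


def trace_height_field(height_at, entry, exit_, steps=32):
    """March a ray through a prism in texture space.

    entry, exit_: (u, v, h) where the ray enters and leaves the prism;
    h is the ray's height in [0, 1] relative to the shell (0 = base, 1 = top).
    height_at(u, v): height-field value in [0, 1].
    Returns the hit (u, v), or None if the ray leaves the prism before
    hitting the surface (the pixel would be discarded).
    """
    def above(t):
        u, v, h = (lerp(a, b, t) for a, b in zip(entry, exit_))
        return h - height_at(u, v)   # > 0 while the ray is above the surface

    prev_t, prev_d = 0.0, above(0.0)
    if prev_d <= 0.0:
        return entry[0], entry[1]    # already at or below the surface on entry
    for i in range(1, steps + 1):
        t = i / steps
        d = above(t)
        if d <= 0.0:
            # Refine between the last sample above and the first one below.
            hit_t = prev_t + (t - prev_t) * prev_d / (prev_d - d)
            return lerp(entry[0], exit_[0], hit_t), lerp(entry[1], exit_[1], hit_t)
        prev_t, prev_d = t, d
    return None
```

Returning None corresponds to the "view ray leaving the prism before hitting the terrain" discard case; that is what produces the displaced silhouette.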

Slide 27: Silhouette POM

Slide 28: Silhouette POM

Slide 29: Massive Grass

Slide 30: Massive Grass: Simulation
Grass blade instance:
A chain of points held together by constraints
Distance + bending constraints try to maintain the local-space rest pose angle per particle
Physics collision geometry converted into a small sphere set
Collisions handled as plane constraints
No stable collision handling, so overdamp the instance
Applied to vegetation meshes via software skinning
Exposed parameters per group:
Stiffness, damping, wind force factor, random variance
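A minimal position-based sketch of such a blade is shown below in Python. It is an assumption-laden illustration, not the Crysis 3 solver: it uses damped Verlet integration, approximates the bending constraint by pulling each point towards its rest-pose offset from its parent, enforces distance constraints follow-the-leader style (the parent is treated as fixed), and resolves sphere collisions by projecting points onto the tangent plane at the nearest surface point. `stiffness` and `damping` mirror the exposed parameters, but their meaning and default values here are hypothetical; skinning onto the mesh is not shown.

```python
import math


def add(a, b):
    return tuple(x + y for x, y in zip(a, b))


def sub(a, b):
    return tuple(x - y for x, y in zip(a, b))


def scale(a, s):
    return tuple(x * s for x in a)


def length(a):
    return math.sqrt(sum(x * x for x in a))


class GrassBlade:
    """A chain of points; point 0 is the root and never moves."""

    def __init__(self, root, segment_length, segments, stiffness=0.2, damping=0.9):
        self.segment_length = segment_length
        self.rest = [add(root, (0.0, 0.0, i * segment_length)) for i in range(segments + 1)]
        self.pos = list(self.rest)
        self.prev = list(self.rest)
        self.stiffness = stiffness   # pull towards the rest pose, 0..1
        self.damping = damping       # < 1: velocity is damped every step

    def step(self, dt, force=(0.0, 0.0, 0.0), spheres=(), iterations=4):
        # Damped Verlet integration of every non-root point.
        for i in range(1, len(self.pos)):
            velocity = scale(sub(self.pos[i], self.prev[i]), self.damping)
            self.prev[i] = self.pos[i]
            self.pos[i] = add(add(self.pos[i], velocity), scale(force, dt * dt))
        for _ in range(iterations):
            for i in range(1, len(self.pos)):
                parent = self.pos[i - 1]
                # Bending: move towards the rest-pose offset from the parent.
                target = add(parent, sub(self.rest[i], self.rest[i - 1]))
                p = add(self.pos[i], scale(sub(target, self.pos[i]), self.stiffness))
                # Distance: restore the segment length (parent treated as fixed).
                d = sub(p, parent)
                l = length(d)
                if l > 1e-9:
                    p = add(parent, scale(d, self.segment_length / l))
                self.pos[i] = p
            # Collisions: project points inside a sphere onto its tangent plane
            # at the closest surface point, i.e. push them out along the normal.
            for center, radius in spheres:
                for i in range(1, len(self.pos)):
                    d = sub(self.pos[i], center)
                    l = length(d)
                    if 1e-9 < l < radius:
                        self.pos[i] = add(center, scale(d, radius / l))
```

With a damping factor well below 1 the blade settles quickly instead of jittering against colliders, which matches the slide's "overdamp the instance" workaround.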

Slide 31: Massive Grass: Simulation

Slide 32: Massive Grass: Simulation

Slide 33: Massive Grass: Simulation

Slide 34: Massive Grass: Simulation

Slide 35: Massive Grass: Mesh Merging
One patch results in N meshes, where N is the number of materials used
Instances grouped into 16x16x16 meter patches (yes, volumetric)
Typical numbers:
50k-70k visible instances on consoles, > 100k on PC
Instances have 18 to 3.6k vertices depending on mesh complexity
Closest instances simulated every frame
Based on distance: simulation and time-sliced skinning
Instances removed further away
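The grouping rule can be sketched in a few lines of Python: instances are bucketed by the 16 m cell (in all three axes) containing them, then by material, and each (patch, material) bucket would become one merged mesh. Only the 16 m volumetric patch size and the one-mesh-per-material rule come from the slide; the instance tuple layout and function names are hypothetical.

```python
import math
from collections import defaultdict

PATCH_SIZE = 16.0   # metres along x, y and z (patches are volumetric)


def patch_key(position):
    """Integer coordinates of the 16 m cell containing a world position."""
    return tuple(math.floor(c / PATCH_SIZE) for c in position)


def build_patches(instances):
    """instances: iterable of (position, material, mesh_id).

    Returns {patch_key: {material: [mesh_id, ...]}}; each inner list would be
    merged into one mesh, giving N meshes per patch for N materials.
    """
    patches = defaultdict(lambda: defaultdict(list))
    for position, material, mesh_id in instances:
        patches[patch_key(position)][material].append(mesh_id)
    return patches
```

Using floor (rather than truncation) keeps cells consistent across the origin, so positions at -0.5 m and +0.5 m land in different patches.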

Slide 36: Massive Grass: Mesh Merging

Slide 37: Massive Grass: Update Loop
Culling process (for each visible patch):
Mark visible instances
Compute LOD
Check if the instance should be skipped at distance
After culling:
Allocate (from pool) dynamic VB/IB memory for each patch
Sample force fields into a per-patch buffer (coarse discretization 4x4x4)
Sample physics for potential colliders, extract collider geometry
Dispatch sim & skin jobs for each patch
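Reading a coarse 4x4x4 per-patch force field back at an instance position is a trilinear interpolation. The Python sketch below assumes the 64 nodes sit on a regular grid spanning the whole 16 m patch (corners included) and that positions are given relative to the patch's minimum corner; both are assumptions, since the slide only gives the resolution.

```python
GRID = 4            # 4x4x4 samples per patch
PATCH_SIZE = 16.0   # metres


def sample_field(field, local_pos):
    """Trilinearly interpolate a per-patch vector field.

    field[z][y][x] is a 3-vector at grid node (x, y, z); nodes span the patch
    from 0 to PATCH_SIZE along each axis. local_pos is relative to the patch
    minimum corner and is clamped to the patch.
    """
    coords = []
    for c in local_pos:
        g = min(max(c / PATCH_SIZE, 0.0), 1.0) * (GRID - 1)
        i0 = min(int(g), GRID - 2)          # keep i0 + 1 inside the grid
        coords.append((i0, g - i0))
    (x0, fx), (y0, fy), (z0, fz) = coords
    result = [0.0, 0.0, 0.0]
    for dz, wz in ((0, 1 - fz), (1, fz)):
        for dy, wy in ((0, 1 - fy), (1, fy)):
            for dx, wx in ((0, 1 - fx), (1, fx)):
                w = wx * wy * wz
                v = field[z0 + dz][y0 + dy][x0 + dx]
                for k in range(3):
                    result[k] += w * v[k]
    return tuple(result)
```

Sampling the global force fields once into such a small grid means the per-blade simulation only does cheap local lookups.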

Slide 38: Massive Grass: Challenges
Efficient buffer management
Resulting meshes can vary in size per frame
Naive implementation (C2) resulted in bad perf on PC, and in running out of VRAM on consoles due to fragmentation
Current implementation inspired by “Don’t Throw it all Away” (McDONALD12)
Large pools for dynamic IB/VB
Each maintains two free lists (usable and pending)
Each item in the pending list is moved to the main free list as soon as a GPU query guarantees the GPU is done with the pool
Pool sizes: 1.3 MB of main memory on consoles, 16 MB on PC
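The two-free-list scheme can be sketched in Python as below. Monotonically increasing integer fence values stand in for the GPU queries, and first-fit allocation with coalescing is an arbitrary choice; class and method names are hypothetical. The key property is that a released range only becomes reusable once the GPU has provably finished reading it, so the CPU never overwrites geometry that is still in flight.

```python
class DynamicBufferPool:
    """Offsets into one large VB/IB. Freed ranges wait in a pending list
    until the GPU fence recorded at release time has completed."""

    def __init__(self, size):
        self.size = size
        self.free = [(0, size)]    # usable free list: (offset, length)
        self.pending = []          # (fence, offset, length)

    def allocate(self, length):
        for i, (off, ln) in enumerate(self.free):
            if ln >= length:                       # first fit
                if ln == length:
                    del self.free[i]
                else:
                    self.free[i] = (off + length, ln - length)
                return off
        return None                                # pool exhausted

    def release(self, offset, length, fence):
        """The GPU may still read this range until `fence` completes."""
        self.pending.append((fence, offset, length))

    def on_gpu_fence_completed(self, completed_fence):
        """Move every range whose fence has passed to the usable list."""
        still_pending = []
        for fence, off, ln in self.pending:
            if fence <= completed_fence:
                self.free.append((off, ln))
            else:
                still_pending.append((fence, off, ln))
        self.pending = still_pending
        self._coalesce()

    def _coalesce(self):
        self.free.sort()
        merged = []
        for off, ln in self.free:
            if merged and merged[-1][0] + merged[-1][1] == off:
                merged[-1] = (merged[-1][0], merged[-1][1] + ln)
            else:
                merged.append((off, ln))
        self.free = merged
```

Coalescing adjacent free ranges is what keeps a fixed-size pool from fragmenting as mesh sizes change from frame to frame.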

Slide 39: Massive Grass: Challenges (2)
Efficient scheduling:
Patch instances are divided into small groups
Sim job kicked off for each group in the main thread
DP in the render thread has a blocking wait for the sim job
Job considered low-priority
Important:
Avoid unnecessary copies, skin directly to the final destination
Reduce throughput and memory requirements (used half & fixed-point precision everywhere)
PC: ~15 ms, 300 to 600 jobs in worst-case scenarios
Xbox 360: ~16 ms, 800 jobs; PS3: ~10 ms, 100-400 jobs

Slide 40: Massive Grass: Challenges (3)
Alpha-tested geometry, literally everywhere
Massive overdraw, also troublesome for MSAA
Literally the worst-case scenario for RSX due to poor z-cull
Prototyped alternatives (e.g. geometry based)
Art was not happy with these, unfortunately

End solution: keep it simple
G-Buffer stage minimalistic
Consoles: mostly outputting vertex data
Art-side surface coverage minimization

Slide 41: Anti-aliasing
Subjective topic: sharp vs. blurry
Some PC gamers hate blurry, some hate sharp.
Some even love 800x600 and no AA

Slide 42: DX11 Deferred MSAA: 101
The problem:
Multiple passes, and reading/writing from multisampled render targets
SV_SampleIndex / SV_Coverage system-value semantics allow solving this via multipass for pixel/sample frequency passes (THIBIEROZ08)
SV_SampleIndex
Forces pixel shader execution for each sub-sample
SV_SampleIndex provides the index of the sub-sample currently executed
The index can be used to fetch that sub-sample from your multisampled RT
E.g. FooMS.Load( UnnormScreenCoord, nCurrSample)
SV_Coverage
Indicates to the pixel shader which sub-samples were covered during the raster stage
Can also modify sub-sample coverage for a custom coverage mask

Slide 43: DX11 Deferred MSAA
Foundation for almost all our supported AA techniques
Simple theory => troublesome practice
At least with fairly complex, deferred-based engines
Disclaimer:
Non-MSAA-friendly code accumulates fast
Breaks regularly as new techniques are added with no care for MSAA
Pinpoint non-MSAA-friendly techniques, and update them one by one.
Rinse and repeat and you’ll get there eventually.
Will be enforced by default in our future engine versions

Slide 44: Custom Resolve & Per-Sample Mask
Post G-Buffer, perform a custom MSAA resolve:
Outputs sample 0 for lighting/other MSAA-dependent passes
Creates a sub-sample mask in the same pass, rejecting similar samples
Tag stencil with the sub-sample mask

How to combine with existing complex techniques that might be using the stencil buffer already?
Reserve 1 bit from the stencil buffer
Update it with the sub-sample mask
Make use of stencil read/write bitmasks to avoid bit override
Restore it whenever a stencil clear occurs
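The resolve-and-tag step can be modelled on the CPU in Python to show the logic: every pixel forwards sample 0, and pixels whose sub-samples differ noticeably from sample 0 get the reserved stencil bit, while the other stencil bits are left untouched. Which G-Buffer attributes are compared, the thresholds, and the choice of 0x80 as the reserved bit are assumptions here; in the engine this runs as a full-screen pixel shader writing stencil.

```python
STENCIL_SAMPLE_BIT = 0x80   # reserved stencil bit (assumed to be the top bit)


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def samples_differ(s0, s, depth_eps, normal_eps):
    """Compare a sub-sample against sample 0 by relative depth and normal."""
    (d0, n0), (d, n) = s0, s
    return abs(d - d0) > depth_eps * max(d0, 1e-6) or 1.0 - dot(n0, n) > normal_eps


def custom_resolve(pixels, stencil, depth_eps=0.01, normal_eps=0.05):
    """pixels: 2D list; each entry is a list of (linear_depth, unit_normal)
    sub-samples. Returns the resolved (sample 0) image and tags `stencil`
    in place, changing only the reserved bit."""
    resolved = []
    for y, row in enumerate(pixels):
        out_row = []
        for x, samples in enumerate(row):
            s0 = samples[0]
            out_row.append(s0)          # sample 0 feeds the per-pixel passes
            edge = any(samples_differ(s0, s, depth_eps, normal_eps) for s in samples[1:])
            if edge:
                stencil[y][x] |= STENCIL_SAMPLE_BIT
            else:
                stencil[y][x] &= ~STENCIL_SAMPLE_BIT & 0xFF
        resolved.append(out_row)
    return resolved
```

Only tagged pixels later pay for per-sample shading; in typical scenes these are a small fraction of the screen.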

Slide 45: SV_Coverage

Slide 46: Custom Per-Sample Mask

Slide 47: Final Result

Slide 48: Pixel/Sample Frequency Passes
Ensure the sample bit cannot be overridden, via the stencil write mask
StencilWriteMask = 0x7F
Pixel Frequency Passes
Set the stencil read mask to the reserved bits for per-pixel regions (~0x80)
Bind pre-resolved (non-multisampled) target SRVs
Render pass as usual
Sample Frequency Passes
Set the stencil read mask to the reserved bit for per-sample regions (0x80)
Bind multisampled target SRVs
Index the current sub-sample via SV_SampleIndex
Render pass as usual
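The mask arithmetic can be checked with a small Python model of the D3D11 stencil rules (a write replaces only the bits set in the write mask; an EQUAL test compares `ref & read_mask` with `stencil & read_mask`). It reads the slide as: the reserved bit is 0x80, per-pixel passes select stencil values with that bit clear, and per-sample passes select values with it set. That interpretation and the helper names are assumptions.

```python
RESERVED_BIT = 0x80   # per-sample tag written by the custom resolve
WRITE_MASK = 0x7F     # StencilWriteMask for every other pass


def stencil_write(current, value, write_mask=WRITE_MASK):
    """Only the bits set in write_mask are replaced."""
    return (current & ~write_mask & 0xFF) | (value & write_mask)


def stencil_test_equal(ref, current, read_mask):
    """Comparison EQUAL with a read mask: (ref & mask) == (stencil & mask)."""
    return (ref & read_mask) == (current & read_mask)


def runs_pixel_frequency(stencil):
    return stencil_test_equal(0x00, stencil, RESERVED_BIT)


def runs_sample_frequency(stencil):
    return stencil_test_equal(RESERVED_BIT, stencil, RESERVED_BIT)
```

Each pixel passes exactly one of the two tests, so the per-pixel and per-sample versions of a pass together cover the screen once.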

Slide 49: Alpha Test Super-Sampling
Alpha testing is a special case
Default SV_Coverage only applies to triangle edges
Create your own sub-sample coverage mask
E.g. check whether the current sub-sample passes the alpha test (AT) or not, and set its bit
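A Python sketch of building such a coverage mask: evaluate the alpha test at each sub-sample position and set that sample's bit, then AND the result with the rasterizer's coverage so triangle edges stay antialiased as well. The sample offsets below are the D3D11 standard 4x pattern in 1/16-pixel units (stated here as an assumption), and `alpha_at` and the 0.5 reference value are hypothetical.

```python
# D3D11 standard 4x MSAA sample offsets in 1/16 pixel units (assumed pattern).
SAMPLE_OFFSETS_4X = [(-2, -6), (6, -2), (-6, 2), (2, 6)]


def alpha_test_coverage(alpha_at, px, py, input_coverage, alpha_ref=0.5,
                        offsets=SAMPLE_OFFSETS_4X):
    """Return an SV_Coverage-style output mask for pixel (px, py).

    alpha_at(x, y): alpha of the texture mapped at pixel-space position (x, y).
    A bit is set only if the rasterizer covered that sub-sample AND the
    alpha test passes at its position.
    """
    mask = 0
    for i, (ox, oy) in enumerate(offsets):
        x = px + 0.5 + ox / 16.0
        y = py + 0.5 + oy / 16.0
        if alpha_at(x, y) >= alpha_ref:
            mask |= 1 << i
    return mask & input_coverage
```

Compared with Alpha to Coverage, this evaluates the alpha test once per sub-sample, which costs more shading work but gives a coverage mask that follows the actual alpha-tested edge.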

Slide 50: Alpha Test Super-Sampling
Alpha Test SSAA Disabled

Slide 51: Alpha Test Super-Sampling
Alpha Test SSAA Enabled

Slide 52: Corner Cases
Cascaded sun shadow maps:
Doing it “by the book” gets expensive quickly
Render shadows as usual at pixel frequency
Bilateral upscale during the deferred shading composite pass
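A depth-aware (bilateral) upsample of the kind mentioned above and on slides 21-23 is sketched in Python below. It takes the 2x2 nearest low-resolution texels, weights each by its bilinear weight times a Gaussian on the relative depth difference to the full-resolution pixel, and falls back to the closest-depth tap if every tap lies across an edge. The tap count, the Gaussian weight and `sigma` are assumptions (the fog shadows on slide 21 used 8 taps); only the idea of comparing low-resolution depth against full-resolution depth comes from the slides.

```python
import math


def bilateral_upsample(low_val, low_depth, full_depth, px, py, scale=2, sigma=0.05):
    """Upsample one full-resolution pixel from a low-resolution buffer."""
    h, w = len(low_val), len(low_val[0])
    u = (px + 0.5) / scale - 0.5          # position in low-res texel space
    v = (py + 0.5) / scale - 0.5
    x0, y0 = math.floor(u), math.floor(v)
    fx, fy = u - x0, v - y0
    d = full_depth[py][px]
    total_w, total = 0.0, 0.0
    best, best_diff = None, float("inf")
    for dy, wy in ((0, 1 - fy), (1, fy)):
        for dx, wx in ((0, 1 - fx), (1, fx)):
            x = min(max(x0 + dx, 0), w - 1)
            y = min(max(y0 + dy, 0), h - 1)
            diff = abs(low_depth[y][x] - d) / max(d, 1e-6)
            wgt = wx * wy * math.exp(-(diff / sigma) ** 2)
            total_w += wgt
            total += wgt * low_val[y][x]
            if diff < best_diff:
                best, best_diff = low_val[y][x], diff
    if total_w < 1e-6:
        return best        # every tap is across an edge: take the closest depth
    return total / total_w
```

A plain bilinear upsample would blend lit and shadowed texels across the silhouette (producing halos); the depth term removes the contribution of texels that belong to a different surface.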

Slide 53: Corner Cases
Soft particles (or similar techniques accessing depth):
The recommendation to tackle them via per-sample frequency is quite slow in real-world scenarios
Max depth instead works quite OK for most cases and is N times faster

(Figure: Bad vs. Good)
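The trade-off can be made concrete with a small Python sketch of a linear soft-particle fade. The reference version evaluates the fade against every sub-sample depth and averages (N evaluations per pixel). The cheaper version evaluates once against the maximum sub-sample depth, which is assumed here to be linear depth where larger means farther and to be produced once per frame (for example during the custom resolve). The fade formula and names are assumptions for illustration.

```python
def soft_particle_fade(particle_depth, scene_depth, fade_distance):
    """0 where the particle touches the scene, 1 once it is fade_distance in front."""
    return min(max((scene_depth - particle_depth) / fade_distance, 0.0), 1.0)


def fade_per_sample(particle_depth, sample_depths, fade_distance):
    """Reference: one fade evaluation per sub-sample, then average."""
    fades = [soft_particle_fade(particle_depth, d, fade_distance) for d in sample_depths]
    return sum(fades) / len(fades)


def fade_max_depth(particle_depth, sample_depths, fade_distance):
    """Cheaper variant: a single evaluation against the farthest sub-sample."""
    return soft_particle_fade(particle_depth, max(sample_depths), fade_distance)
```

The two results differ only on pixels that straddle a depth edge, which is why the approximation holds up in most cases.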

Slide 54: MSAA Friendliness
MSAA-unfriendly techniques, the usual suspects:
No AA at all, or noticeable bright/dark silhouettes

(Figure: Bad vs. Good)

Slide 55: MSAA Friendliness
MSAA-unfriendly techniques, the usual suspects:
No AA at all, or noticeable bright/dark silhouettes

(Figure: Bad vs. Good)

Slide 56: MSAA Friendliness

Rules of thumb:
Accessing and/or rendering to multisampled render targets?
Then you’ll need to take care to access/output the correct sub-sample
Obviously, always minimize BW: avoid fat formats
The latter is always valid, but even more so for MSAA cases

Slide 57: MSAA Correctness vs Performance
Our goal was correctness and quality over performance
You can always cut some corners, as most games do:
Alpha to Coverage instead of Alpha Test Super-Sampling
Or even no Alpha Test AA
Render only opaque geometry with MSAA
Then render alpha-blended passes without MSAA
Assuming HDR rendering: note that tone mapping is implicitly done post-resolve, resulting in loss of detail in high-contrast regions
Note to IHVs: having explicit access to HW capabilities such as EQAA/CSAA would be nice
Smarter AA combos
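The tone-mapping note can be checked with a little arithmetic. The Python sketch below assumes a simple Reinhard operator x/(1+x), since the deck does not say which operator is used, and compares resolving HDR sub-samples before tone mapping with tone mapping each sub-sample first. On an edge pixel with one very bright and one dark sample, averaging in HDR lets the bright sample dominate, so the edge comes out much brighter than the 50/50 mix the eye expects, and the antialiasing gradient is lost.

```python
def reinhard(x):
    return x / (1.0 + x)


def resolve_then_tonemap(samples):
    """What a plain hardware resolve of an HDR target followed by tone mapping gives."""
    return reinhard(sum(samples) / len(samples))


def tonemap_then_resolve(samples):
    """Tone map each sub-sample, then average in display space."""
    return sum(reinhard(s) for s in samples) / len(samples)


edge = [10.0, 0.1]                   # one bright and one dark sub-sample
print(resolve_then_tonemap(edge))    # about 0.835
print(tonemap_then_resolve(edge))    # about 0.5
```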

Slide 58: Conclusion
What’s next for CryENGINE?
A big next-generation leap is finally upon us
In 2 years’ time, GPUs will be at ~16 TFLOPS, with a ridiculous amount of available memory.
Extrapolate results from there, without >8-year-old consoles slowing progress
4K resolution will bring some interesting challenges/opportunities

Call to arms: there are still a lot of problems to solve
IHVs/Microsoft: PC GPU profilers have a lot of evolving to do! How about a unified GPU profiler that works great for all IHVs?
Microsoft: What’s up with the (lack of) DX11 documentation? Where’s DX12?
You: No great real-time GI / real-time reflections solution yet!

Slide 59: Special Thanks
Nicolas Thibieroz
Chris Auty, Carsten Wenzel, Chris Raine, Chris Bolte, Baldur Karlsson, Andrew Khan, Michael Kopietz, Ivo Zoltan Frey, Desmond Gayle, Marco Corbetta, Jake Turner, Pierre-Ives Donzallaz, Magnus Larbrant, Nicolas Schulz, Nick Kasyan, Vladimir Kajalin...
Uff… let’s just make it shorter:

Thanks to the entire Crytek Team ^_^

Slide 60: Questions?

Tiago@Crytek.com / Twitter: Crytek_Tiago
Carsten@Crytek.com
ChristopherR@Crytek.com / Twitter: Cry_Raine

Slide 61: We are hiring!

Slide 62: References
WENZEL06 – Wenzel, C. “Real-time Atmospheric Effects in Games”, 2006
JESCHKE07 – Jeschke, S. et al. “Interactive Smooth and Curved Shell Mapping”, 2007
THIBIEROZ08 – Thibieroz, N. “Deferred Shading with Multisampling Anti-Aliasing in DirectX10”, 2008
TÓTH09 – Tóth, B. et al. “Real-time Volumetric Lighting in Participating Media”, 2009
SOUSA11 – Sousa, T. “CryENGINE 3 Rendering Techniques”, 2011
McDONALD12 – McDonald, J. “Don’t Throw it all Away”, 2012
WIKI00 – “Stereographic projection”, http://en.wikipedia.org/wiki/Stereographic_projection
WIKI01 – “Y’CbCr”, http://en.wikipedia.org/wiki/YCbCr
WIKI02 – “Chroma subsampling”, http://en.wikipedia.org/wiki/Chroma_subsampling

Slide 63: Extra Slides

Slide 64: Massive Grass: Challenges
Trick: updating an allocation is done with copy-on-write in case the GPU is still using the original location
Consoles: incrementally defragment pools with GPU memory copies
Also possible on PC, but more expensive due to CopySubResource limitations (need scratchpad memory, since CSR won’t allow copies where Dst/Src are the same resource)
Note to IHVs: being able to copy within the same Dst/Src resource, for non-overlapping memory regions, would be handy

Ended up using the allocation & usage scheme for static geometry as well
