Optimizations

Date:2013-07-19 15:52

Just like PCs, mobile platforms such as iOS and Android span devices with widely varying performance. You can easily find a phone whose rendering power is 10x that of another phone. A straightforward way to scale across them:

  1. Make sure it runs okay on the baseline configuration.
  2. Use more eye-candy on higher-performing configurations:
    • Resolution
    • Post-processing
    • MSAA
    • Anisotropy
    • Shaders
    • Fx/particle density, on/off
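
As a minimal sketch of the scaling list above, the script below picks a few settings once at startup. The class name QualityScaler and both thresholds are assumptions for illustration, not values from the Unity documentation; tune them against your own baseline devices.

```csharp
using UnityEngine;

// Hypothetical helper: chooses eye-candy settings once at startup.
public class QualityScaler : MonoBehaviour
{
    // Assumed thresholds for "higher performing"; adjust per project.
    public int highEndMemoryMB = 512;
    public int highEndCoreCount = 4;

    void Start()
    {
        bool highEnd = SystemInfo.graphicsMemorySize >= highEndMemoryMB
                    && SystemInfo.processorCount >= highEndCoreCount;

        // MSAA: 0 disables it; 2, 4 and 8 are the supported sample counts.
        QualitySettings.antiAliasing = highEnd ? 4 : 0;

        // Anisotropy: honor per-texture settings on high end, disable on baseline.
        QualitySettings.anisotropicFiltering =
            highEnd ? AnisotropicFiltering.Enable : AnisotropicFiltering.Disable;

        // Resolution: render at half size on baseline hardware.
        if (!highEnd)
            Screen.SetResolution(Screen.width / 2, Screen.height / 2, true);
    }
}
```

Post-processing, shader variants and particle density can be switched on the same highEnd flag, for example by enabling or disabling the relevant components.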

Focus on GPUs

Graphics performance is bound by fillrate, pixel complexity, and geometric complexity (vertex count). All three can be reduced if you can find a way to cull more renderers. Occlusion culling could help here. Unity automatically culls objects outside the viewing frustum.

On mobile devices you are essentially fillrate bound (fillrate = screen pixels * shader complexity * overdraw), and over-complex shaders are the most common cause of problems. So use the mobile shaders that come with Unity, or design your own but make them as simple as possible. Where possible, simplify your pixel shaders by moving code to the vertex shader.
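
To make the fillrate formula concrete, here is a worked example with assumed numbers: a 960x640 screen, a shader costing 10 cycles per pixel, and an average overdraw of 2.5. These figures are illustrative only, not measurements of any device.

```latex
\begin{align*}
\text{fillrate} &= \text{screen pixels} \times \text{shader complexity} \times \text{overdraw} \\
                &= (960 \times 640) \times 10 \times 2.5 \\
                &= 614{,}400 \times 25 \\
                &= 15{,}360{,}000 \ \text{cycles per frame}
\end{align*}
```

With the same shader and overdraw, a 2048x1536 screen has 3,145,728 pixels and needs 78,643,200 cycles per frame, about 5.1 times as much. This is why resolution and shader cost are the first things to scale down.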

If reducing the Texture Quality in Quality Settings makes the game run faster, you are probably limited by memory bandwidth. In that case compress textures, use mipmaps, reduce texture size, and so on.

LOD (Level of Detail): make objects simpler, or eliminate them completely, as they move further away. The main goal is to reduce the number of draw calls.
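
A minimal sketch of setting up LOD from script, assuming a Unity version that exposes LODGroup.SetLODs. The class name LodSetup, the two renderer fields and the transition heights are hypothetical.

```csharp
using UnityEngine;

// Hypothetical example: two detail levels, then culled entirely.
public class LodSetup : MonoBehaviour
{
    public Renderer highDetail; // assumed: full-resolution mesh renderer
    public Renderer lowDetail;  // assumed: simplified mesh renderer

    void Start()
    {
        LODGroup group = gameObject.AddComponent<LODGroup>();

        // Each LOD is used until the object's screen-relative height drops
        // below its threshold; below the last threshold the object is culled.
        LOD[] lods = new LOD[]
        {
            new LOD(0.5f, new Renderer[] { highDetail }),
            new LOD(0.1f, new Renderer[] { lowDetail })
        };

        group.SetLODs(lods);
        group.RecalculateBounds();
    }
}
```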

Good practice

Mobile GPUs are tightly constrained in how much heat they produce, how much power they use, and how large or noisy they can be. Compared to desktop parts, mobile GPUs therefore have far less bandwidth, lower ALU performance, and less texturing power. Their architectures are also tuned to use as little bandwidth and power as possible.

Unity is optimized for OpenGL ES 2.0, which uses the GLSL ES shading language (similar to HLSL). Built-in shaders are most often written in HLSL (also known as Cg), which is cross-compiled into GLSL ES for mobile platforms. You can also write GLSL directly if you want to, but doing so limits you to OpenGL-like platforms (e.g. mobile + Mac), since there are currently no GLSL->HLSL translation tools. When you use the float/half/fixed types in HLSL, they end up as the highp/mediump/lowp precision qualifiers in GLSL ES.

Here is the checklist for good practice:

  1. Keep the number of materials as low as possible. This makes it easier for Unity to batch objects.
  2. Use texture atlases (large images containing a collection of sub-images) instead of a number of individual textures. These are faster to load, cause fewer state switches, and are batching friendly.
  3. Use Renderer.sharedMaterial instead of Renderer.material if using texture atlases and shared materials.
  4. Forward rendered pixel lights are expensive.
    • Use light mapping instead of realtime lights wherever possible.
    • Adjust the pixel light count in Quality Settings. Essentially only the directional light should be per pixel; everything else should be per vertex. This does depend on the game.
  5. Experiment with the Render Mode of Lights in the Quality Settings to get the correct priority.
  6. Avoid Cutout (alpha test) shaders unless really necessary.
  7. Keep Transparent (alpha blend) screen coverage to a minimum.
  8. Try to avoid situations where multiple lights illuminate any given object.
  9. Try to reduce the overall number of shader passes (shadows, pixel lights, reflections).
  10. Rendering order is critical. In the general case:
    1. fully opaque objects, roughly front-to-back.
    2. alpha tested objects, roughly front-to-back.
    3. skybox.
    4. alpha blended objects (back-to-front if needed).
  11. Post-processing is expensive on mobile devices; use it with care.
  12. Particles: reduce overdraw and use the simplest possible shaders.
  13. Double buffer Meshes that are modified every frame:
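// Assumes meshA, meshB and bufferMesh (Mesh), on (bool), vertices (Vector3[])
// and meshFilter (MeshFilter) are fields of the enclosing MonoBehaviour,
// and that meshA and meshB were created beforehand with identical topology.
void Update () {
  // Flip between the two meshes so each one is rewritten only every other frame
  bufferMesh = on ? meshA : meshB;
  on = !on;
  bufferMesh.vertices = vertices; // modification to mesh
  meshFilter.sharedMesh = bufferMesh;
}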
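
Item 3 of the checklist can be illustrated with a short sketch. The class name AtlasTint and the tint field are hypothetical; the point is only the difference between the two properties.

```csharp
using UnityEngine;

// Hypothetical example contrasting sharedMaterial and material.
public class AtlasTint : MonoBehaviour
{
    public Color tint = Color.white;

    void Start()
    {
        Renderer r = GetComponent<Renderer>();

        // sharedMaterial edits the single material used by every renderer
        // that references it, so those objects can still be batched together.
        r.sharedMaterial.color = tint;

        // By contrast, the line below would create a private copy of the
        // material for this renderer, which adds a material and hurts batching:
        // r.material.color = tint;
    }
}
```

Note that a change made through sharedMaterial affects every object using that material, and in the Editor it modifies the material asset itself.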

Shader optimizations

Checking whether you are fillrate-bound is easy: does the game run faster if you decrease the display resolution? If yes, you are limited by fillrate.
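
One way to run this check on a device is a small debug script that toggles half resolution and shows the frame time. This is a sketch only; the class name FillrateProbe is hypothetical, and vertical sync can hide the difference if the frame rate is already capped.

```csharp
using UnityEngine;

// Hypothetical debug helper: tap or click to toggle half resolution.
public class FillrateProbe : MonoBehaviour
{
    int fullWidth, fullHeight;
    bool halved;

    void Start()
    {
        // Remember the native size so it can be restored.
        fullWidth = Screen.width;
        fullHeight = Screen.height;
    }

    void Update()
    {
        // A touch is also reported as mouse button 0.
        if (Input.GetMouseButtonDown(0))
        {
            halved = !halved;
            int divisor = halved ? 2 : 1;
            Screen.SetResolution(fullWidth / divisor, fullHeight / divisor, true);
        }
    }

    void OnGUI()
    {
        // Smoothed frame time in milliseconds for a quick comparison.
        float ms = Time.smoothDeltaTime * 1000f;
        string label = (halved ? "half" : "full") + " res: " + ms.ToString("F1") + " ms/frame";
        GUI.Label(new Rect(10, 10, 400, 30), label);
    }
}
```

If the frame time drops noticeably at half resolution, the game is likely fillrate-bound.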

Try reducing shader complexity with the following methods:

  • Avoid alpha-testing shaders; use alpha-blended versions instead.
  • Use simple, optimized shader code (such as the 'Mobile' shaders that ship with Unity).
  • Avoid expensive math functions in shader code (pow, exp, log, cos, sin, tan, etc.). Consider using pre-calculated lookup textures instead.
  • Pick the lowest possible number precision format (float, half, fixed in Cg) for best performance.
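
The lookup-texture suggestion above can be sketched on the script side as follows: the script bakes pow(t, exponent) into a one-row texture that a shader could sample instead of calling pow per pixel. The class name SpecularRamp, the exponent, and the shader property name _SpecRamp are all assumptions; the shader that samples the texture is not shown.

```csharp
using UnityEngine;

// Hypothetical example: bake pow(t, exponent) into a 1D lookup texture.
public class SpecularRamp : MonoBehaviour
{
    public float exponent = 32f;              // assumed specular power
    public int width = 256;                   // number of samples in the table
    public string propertyName = "_SpecRamp"; // hypothetical shader property

    void Start()
    {
        // Single-channel texture, no mipmaps, clamped at the edges.
        Texture2D ramp = new Texture2D(width, 1, TextureFormat.Alpha8, false);
        ramp.wrapMode = TextureWrapMode.Clamp;

        for (int x = 0; x < width; x++)
        {
            float t = x / (float)(width - 1);  // 0..1 across the texture
            float v = Mathf.Pow(t, exponent);  // computed once, on the CPU
            ramp.SetPixel(x, 0, new Color(v, v, v, v));
        }
        ramp.Apply(); // upload the pixels to the GPU

        GetComponent<Renderer>().sharedMaterial.SetTexture(propertyName, ramp);
    }
}
```

Whether a texture fetch is actually cheaper than the math depends on the GPU and on how bandwidth-bound the scene is, so measure both variants.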

Focus on CPUs

Games are often limited by the GPU on pixel processing, so they end up with unused CPU power, especially on multicore mobile CPUs. It is therefore often sensible to pull some work off the GPU and put it onto the CPU instead (Unity does all of these): mesh skinning, batching of small objects, particle geometry updates.

These should be used with care, not blindly. If you are not bound by draw calls, then batching is actually worse for performance, as it makes culling less efficient and causes more objects to be affected by lights.

Good practice

  • Don't use more than a few hundred draw calls per frame on mobile devices.
  • FindObjectsOfType (and Unity getter properties in general) are very slow, so use them sensibly.
  • Set the Static property on non-moving objects to allow internal optimizations like static batching.
  • Spend lots of CPU cycles on occlusion culling and better sorting (to take advantage of Early Z-cull).
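
Using FindObjectsOfType "sensibly" usually means calling it once and caching the result. The sketch below assumes the set of Rigidbodies does not change after startup; the class name NearbyCounter and the radius are hypothetical.

```csharp
using UnityEngine;

// Hypothetical example: cache slow lookups instead of repeating them per frame.
public class NearbyCounter : MonoBehaviour
{
    public float radius = 10f; // assumed search radius
    public int nearbyCount;    // result, refreshed every frame

    Transform cachedTransform; // cached getter result
    Rigidbody[] bodies;        // cached scene query

    void Start()
    {
        cachedTransform = transform;
        // One scene-wide search at startup rather than one per frame.
        bodies = FindObjectsOfType(typeof(Rigidbody)) as Rigidbody[];
    }

    void Update()
    {
        Vector3 origin = cachedTransform.position;
        float sqrRadius = radius * radius;
        int count = 0;

        for (int i = 0; i < bodies.Length; i++)
        {
            // Skip bodies destroyed since the cache was built.
            if (bodies[i] == null) continue;
            if ((bodies[i].position - origin).sqrMagnitude < sqrRadius) count++;
        }
        nearbyCount = count;
    }
}
```

If objects are spawned at runtime, the cache has to be refreshed when that happens, for example by having spawned objects register themselves.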

Physics

Physics can be CPU heavy. It can be profiled via the Editor profiler. If Physics appears to take too much CPU time:

  • Tweak Time.fixedDeltaTime (in Project settings -> Time) to be as high as you can get away with. A slow-moving game probably needs fewer fixed updates than a game with fast action. Fast-paced games need more frequent calculations, so fixedDeltaTime must be lower or collisions may be missed.
  • Lower Physics.solverIterationCount (Physics Manager) as far as simulation quality allows.
  • Use as few Cloth objects as possible.
  • Use Rigidbodies only where necessary.
  • Use primitive colliders in preference to mesh colliders.
  • Never ever move a static collider (i.e. a collider without a Rigidbody), as it causes a big performance hit.
    • It shows up in the Profiler as 'Static Collider.Move', but the actual processing is in Physics.Simulate.
    • If moving it is necessary, add a Rigidbody and set isKinematic to true.
  • On Windows you can use NVIDIA's AgPerfMon profiling tool set to get more details if needed.
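
The physics tweaks above can be sketched in one script. The specific values (30 steps per second, 4 solver iterations) and the names PhysicsBudget and movingPlatform are assumptions for illustration. Physics.solverIterationCount is the property name used in this document; newer Unity versions rename it to Physics.defaultSolverIterations.

```csharp
using UnityEngine;

// Hypothetical example: cheaper physics settings plus a kinematic mover.
public class PhysicsBudget : MonoBehaviour
{
    public Collider movingPlatform; // assumed: a collider that has to move
    Rigidbody platformBody;

    void Awake()
    {
        // Fewer physics steps per second than the default 0.02 s (50 Hz).
        Time.fixedDeltaTime = 1f / 30f;

        // Fewer solver iterations: cheaper, but joints and stacks are less stable.
        Physics.solverIterationCount = 4;

        // A collider that moves should not be static: give it a kinematic Rigidbody.
        platformBody = movingPlatform.GetComponent<Rigidbody>();
        if (platformBody == null)
            platformBody = movingPlatform.gameObject.AddComponent<Rigidbody>();
        platformBody.isKinematic = true;
    }

    void FixedUpdate()
    {
        // Move through the Rigidbody so the physics engine tracks the motion.
        Vector3 step = Vector3.right * Time.fixedDeltaTime;
        platformBody.MovePosition(platformBody.position + step);
    }
}
```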

Android

GPU

These are the popular mobile architectures. Both the hardware vendors and the GPU architectures are very different from those in the PC/console space.

  • ImgTec PowerVR SGX - Tile based, deferred: renders everything in small tiles (such as 16x16) and shades only visible pixels
  • NVIDIA Tegra - Classic: renders everything
  • Qualcomm Adreno - Tiled: renders everything in tiles, designed around large tiles (such as 256k). Adreno 3xx can switch to traditional rendering.
  • ARM Mali - Tiled: renders everything in tiles, designed around small tiles (such as 16x16)

Spend some time looking into the different rendering approaches and design your game accordingly. Pay special attention to sorting. Define the lowest-end supported devices early in the development cycle, and test on them with the profiler enabled as you design your game.

Use platform-specific texture compression.

Further reading

Screen resolution

Android version

iOS

GPU

The only architecture to be concerned about is PowerVR (tile based, deferred).

  • ImgTec PowerVR SGX - Tile based, deferred: renders everything in tiles and shades only visible pixels
  • ImgTec PowerVR MBX - Tile based, deferred, fixed function: pre iPhone 4/iPad 1 devices

This means:

  • Mipmaps are not so necessary.
  • Antialiasing and anisotropic filtering are cheap enough; in some cases they are not needed on iPad 3.

And cons:

  • If the vertex data per frame (number of vertices * storage required after the vertex shader) exceeds the internal buffers allocated by the driver, the scene has to be 'split', which costs performance. The driver might allocate a larger buffer after this point, or you might need to reduce your vertex count. This becomes apparent on iPad 2 (iOS 4.3) at around 100 thousand vertices with quite complex shaders.
  • TBDR needs more transistors allocated to the tiling and deferred parts, conceptually leaving fewer transistors for 'raw performance'. It is very hard (i.e. practically impossible) to get GPU timing for a draw call on TBDR, which makes profiling hard.

Further reading

Screen resolution

iOS version

Dynamic Objects

Asset Bundles

  • Asset Bundles are cached on a device up to a certain limit
  • Create them using the Editor API
  • Load
    • Using the WWW API: WWW.LoadFromCacheOrDownload
    • As a resource: AssetBundle.CreateFromMemory or AssetBundle.CreateFromFile
  • Unload
    • AssetBundle.Unload
      • There is an option to unload the bundle but keep the assets loaded from it
      • It can also destroy all the loaded assets, even if they are referenced in the scene
    • Resources.UnloadUnusedAssets
      • Unloads all assets no longer referenced in the scene. So remember to clear references to the assets you don't need.
      • Public and static variables are never garbage collected.
    • Resources.UnloadAsset
      • Unloads a specific asset from memory. It can be reloaded from disk if needed.
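
A minimal sketch of the load and unload calls listed above, using the WWW API named in this document (newer Unity versions replace WWW with UnityWebRequest). The class name BundleLoader, the URL, and the version number are placeholders, and the bundle is assumed to have been built with a main asset.

```csharp
using System.Collections;
using UnityEngine;

// Hypothetical example: download (or reuse the cached copy of) a bundle,
// instantiate its main asset, then release the bundle itself.
public class BundleLoader : MonoBehaviour
{
    public string url = "http://example.com/props.unity3d"; // placeholder URL
    public int version = 1;                                 // assumed cache version

    IEnumerator Start()
    {
        WWW www = WWW.LoadFromCacheOrDownload(url, version);
        yield return www; // wait for the download or cache read to finish

        if (!string.IsNullOrEmpty(www.error))
        {
            Debug.LogError("Bundle load failed: " + www.error);
            yield break;
        }

        AssetBundle bundle = www.assetBundle;
        if (bundle.mainAsset != null)
            Instantiate(bundle.mainAsset);

        // false: free the bundle's compressed data but keep loaded assets alive.
        // true would also destroy the assets, even if the scene still uses them.
        bundle.Unload(false);
        www.Dispose();
    }
}
```

Once nothing references the instantiated objects any more, Resources.UnloadUnusedAssets can reclaim the memory held by their assets.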

Is there any limit on the number of Asset Bundles that can be downloaded at the same time on iOS? (e.g. can we safely download more than 10 Asset Bundles at the same time, or every frame?)

Downloads are implemented via the async API provided by the OS, so the OS decides how many threads need to be created for downloads. When launching multiple concurrent downloads, keep in mind the total bandwidth the device can support and the amount of free memory. Each concurrent download allocates its own temporary buffer, so be careful not to run out of memory.

Resources

  • Assets need to be recognized by Unity to be placed in a build.
  • Add the .bytes file extension to any raw bytes you want Unity to recognize as binary data.
  • Add the .txt file extension to any text files you want Unity to recognize as a text asset.
  • Resources are converted to a platform format at build time.
  • Load them with Resources.Load()
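
A minimal sketch of loading raw bytes through Resources.Load. It assumes a file at Assets/Resources/level1.bytes; that path and the class name LevelDataLoader are hypothetical.

```csharp
using UnityEngine;

// Hypothetical example: read a .bytes file that Unity imported as a TextAsset.
public class LevelDataLoader : MonoBehaviour
{
    void Start()
    {
        // The path is relative to a Resources folder and has no extension.
        TextAsset data = Resources.Load("level1", typeof(TextAsset)) as TextAsset;
        if (data == null)
        {
            Debug.LogWarning("level1 not found in any Resources folder");
            return;
        }

        byte[] raw = data.bytes; // binary payload of the .bytes file
        Debug.Log("Loaded " + raw.Length + " bytes");

        // Release the asset once its contents have been copied out.
        Resources.UnloadAsset(data);
    }
}
```

For a .txt file the same call works, with data.text giving the contents as a string.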

Silly issues checklist

  • Textures without proper compression
    • Different solutions suit different cases, but be sure to compress textures unless you are sure you should not.
    • ETC/RGBA16 - default for Android
      • but you can tweak this depending on the GPU vendor
      • the best approach is to use ETC where possible
      • alpha textures can use two ETC files, with one channel used for alpha
    • PVRTC - default for iOS
      • good for most cases
  • Textures with Get/Set pixels enabled: this doubles the memory footprint, so uncheck it unless Get/Set is needed
  • Textures loaded from JPEG/PNG files at runtime will be uncompressed
  • Big mp3 files marked as decompress on load
  • Additive scene loading
  • Unused Assets that remain uncleaned in memory
    • Static fields
    • asset bundles that were not unloaded
  • If it randomly crashes, try on a devkit or a device with more memory (like iPad 3).

Sometimes there is nothing in the console, just a random crash.

  • Fast script call and stripping may lead to random crashes on iOS. Try without them.

Page last updated: 2012-10-10

Category: Manual | Translation: 悄悄