Optimizations

Date:2013-07-19 15:52

Just like PCs, mobile platforms such as iOS and Android have devices with widely varying levels of performance. You can easily find a phone that is 10x more powerful for rendering than some other phone. A fairly easy way of scaling:


  1. Make sure it runs okay on baseline configuration
  2. Use more eye-candy on higher performing configurations:
    • Resolution
    • Post-processing
    • MSAA
    • Anisotropy
    • Shaders
    • Fx/particles density, on/off
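
One way to act on the list above is to pick settings from a rough device tier at startup. The following is a minimal sketch, not a recommended policy: the class name QualityScaler, the memory and core-count thresholds, and the half-resolution baseline are all illustrative assumptions. Only the SystemInfo, QualitySettings and Screen members are engine API.

```csharp
using UnityEngine;

// Hypothetical startup script: chooses eye-candy from a rough device tier.
public class QualityScaler : MonoBehaviour
{
    // Assumed thresholds; tune them against the devices you actually support.
    const int HighEndGraphicsMemoryMB = 512;
    const int HighEndCoreCount = 4;

    void Awake()
    {
        bool highEnd = SystemInfo.graphicsMemorySize >= HighEndGraphicsMemoryMB
                    && SystemInfo.processorCount >= HighEndCoreCount;

        if (highEnd)
        {
            QualitySettings.antiAliasing = 4; // 4x MSAA
            QualitySettings.anisotropicFiltering = AnisotropicFiltering.Enable;
        }
        else
        {
            QualitySettings.antiAliasing = 0; // MSAA off
            QualitySettings.anisotropicFiltering = AnisotropicFiltering.Disable;
            // Baseline: render at half the native resolution to cut fillrate cost.
            Screen.SetResolution(Screen.width / 2, Screen.height / 2, true);
        }
    }
}
```

Post-processing, shader variants and particle density would be switched in the same branch, using whatever components the project has for them.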

Focus on GPUs

Graphics performance is bound by fillrate, pixel complexity and geometric complexity (vertex count). All three of these can be reduced if you can find a way to cull more renderers. Occlusion culling could help here. Unity will automatically cull objects outside the viewing frustum.


On mobiles you're essentially fillrate bound (fillrate = screen pixels * shader complexity * overdraw), and over-complex shaders are the most common cause of problems. So use the mobile shaders that come with Unity, or design your own but make them as simple as possible. If possible, simplify your pixel shaders by moving code to the vertex shader.
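
The fillrate formula above can be turned into a quick back-of-the-envelope comparison. This is only a sketch: the helper name FillrateEstimate is hypothetical, and "shader complexity" is treated as an assumed relative cost per pixel (1.0 for the simplest shader), not a value Unity reports.

```csharp
// Hypothetical helper: relative fillrate cost from the formula in the text.
public static class FillrateEstimate
{
    // fillrate = screen pixels * shader complexity * overdraw
    // shaderComplexity: assumed relative per-pixel cost (1.0 = simplest shader).
    // overdraw: average number of times each screen pixel is drawn per frame.
    public static double Cost(int width, int height, double shaderComplexity, double overdraw)
    {
        return (double)width * height * shaderComplexity * overdraw;
    }
}
```

For the same shaders and overdraw, FillrateEstimate.Cost(1280, 720, 2.0, 2.5) is four times FillrateEstimate.Cost(640, 360, 2.0, 2.5), which is why halving the resolution on each axis is such an effective baseline setting.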


If reducing the Texture Quality in Quality Settings makes the game run faster, you are probably limited by memory bandwidth. In that case compress textures, use mipmaps, reduce texture size, etc.

LOD (Level of Detail): make objects simpler or eliminate them completely as they move further away. The main goal is to reduce the number of draw calls.


Good practice

Mobile GPUs have huge constraints on how much heat they produce, how much power they use, and how large or noisy they can be. Compared to desktop parts, mobile GPUs therefore have far less bandwidth, ALU performance and texturing power. The GPU architectures are also tuned to use as little bandwidth and power as possible.


Unity is optimized for OpenGL ES 2.0 and uses the GLSL ES shading language (similar to HLSL). Built-in shaders are most often written in HLSL (also known as Cg), which is cross-compiled into GLSL ES for mobile platforms. You can also write GLSL directly if you want to, but doing so limits you to OpenGL-like platforms (e.g. mobile + Mac), since there are currently no GLSL->HLSL translation tools. When you use float/half/fixed types in HLSL, they end up as highp/mediump/lowp precision qualifiers in GLSL ES.

Here is the checklist for good practice:


  1. Keep the number of materials as low as possible. This makes it easier for Unity to batch stuff.
  2. Use texture atlases (large images containing a collection of sub-images) instead of a number of individual textures. These are faster to load, have fewer state switches, and are batching friendly.
  3. Use Renderer.sharedMaterial instead of Renderer.material if using texture atlases and shared materials. Accessing Renderer.material creates a separate copy of the material for that renderer.
  4. Forward rendered pixel lights are expensive.
    • Use light mapping instead of realtime lights wherever possible.
    • Adjust pixel light count in quality settings. Essentially only the directional light should be per pixel, everything else per vertex. This depends on the game, of course.
  5. Experiment with the Render Mode of Lights in the Quality Settings to get the correct priority.
  6. Avoid Cutout (alpha test) shaders unless really necessary.
  7. Keep Transparent (alpha blend) screen coverage to a minimum.
  8. Try to avoid situations where multiple lights illuminate any given object.
  9. Try to reduce the overall number of shader passes (Shadows, pixel lights, reflections).
  10. Rendering order is critical. In the general case:
    1. fully opaque objects roughly front-to-back.
    2. alpha tested objects roughly front-to-back.
    3. skybox.
    4. alpha blended objects (back to front if needed).
  11. Post Processing is expensive on mobiles, use with care.
  12. Particles: reduce overdraw, use the simplest possible shaders.
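  13. Double buffer for Meshes modified every frame:
// Assumed fields (not shown in the original): Mesh meshA, meshB, bufferMesh;
// bool on; Vector3[] vertices; MeshFilter meshFilter.
void Update () {
  // flip between meshes so the one being written is not the one submitted last frame
  bufferMesh = on ? meshA : meshB;
  on = !on;
  bufferMesh.vertices = vertices; // modification to mesh
  meshFilter.sharedMesh = bufferMesh;
}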
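
Item 3 of the checklist (Renderer.sharedMaterial versus Renderer.material) is easy to get wrong, so here is a small hedged illustration. The class name AtlasTint is hypothetical, and the sketch assumes the object has a Renderer whose shader exposes a main color property.

```csharp
using UnityEngine;

// Hypothetical example: retint every object that uses one shared atlas material.
public class AtlasTint : MonoBehaviour
{
    public Color tint = Color.white;

    void Start()
    {
        Renderer r = GetComponent<Renderer>();
        if (r == null || r.sharedMaterial == null)
            return;

        // sharedMaterial edits the material itself, so every renderer using it
        // changes and the objects still share a single material for batching.
        r.sharedMaterial.color = tint;

        // By contrast, touching r.material would clone the material for this
        // renderer only, adding one more material and working against batching.
    }
}
```

Note that changes made through sharedMaterial affect the material asset, so in the editor they persist after leaving play mode.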

Shader optimizations

Checking if you are fillrate-bound is easy: does the game run faster if you decrease the display resolution? If yes, you are limited by fillrate.


Try reducing shader complexity by the following methods:


  • Avoid alpha-testing shaders; instead use alpha-blended versions.
  • Use simple, optimized shader code (such as the 'Mobile' shaders that ship with Unity).
  • Avoid expensive math functions in shader code (pow, exp, log, cos, sin, tan, etc). Consider using pre-calculated lookup textures instead.
  • Pick the lowest possible number precision format (float, half, fixed in Cg) for best performance.

Focus on CPUs

It is often the case that games are limited by the GPU on pixel processing, so they end up with unused CPU power, especially on multicore mobile CPUs. It is therefore often sensible to pull some work off the GPU and put it onto the CPU instead (Unity does all of these): mesh skinning, batching of small objects, particle geometry updates.


These should be used with care, not blindly. If you are not bound by draw calls, then batching is actually worse for performance, as it makes culling less efficient and makes more objects affected by lights!


Good practice

  • Don't use more than a few hundred draw calls per frame on mobiles.
  • FindObjectsOfType (and Unity getter properties in general) are very slow, so use them sensibly.
  • Set the Static property on non-moving objects to allow internal optimizations like static batching.
  • Spend lots of CPU cycles on occlusion culling and better sorting (to take advantage of Early Z-cull).
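
Using FindObjectsOfType "sensibly" usually means calling it once and keeping the result, rather than searching the scene every frame. The sketch below assumes a hypothetical NearestBodyTracker script and a scene whose set of Rigidbodies does not change after startup; objects spawned later would not appear in the cached array.

```csharp
using UnityEngine;

// Hypothetical example: do the slow scene search once, then reuse the result.
public class NearestBodyTracker : MonoBehaviour
{
    public float nearestDistance;  // updated every frame from cached data

    Rigidbody[] bodies;            // cached result of the slow search
    Transform cachedTransform;     // cached instead of using the getter each frame

    void Start()
    {
        bodies = FindObjectsOfType(typeof(Rigidbody)) as Rigidbody[];
        cachedTransform = transform;
    }

    void Update()
    {
        if (bodies == null)
            return;

        float nearestSqr = float.MaxValue;
        Vector3 here = cachedTransform.position;
        for (int i = 0; i < bodies.Length; i++)
        {
            if (bodies[i] == null) continue; // destroyed since Start
            float sqr = (bodies[i].position - here).sqrMagnitude;
            if (sqr < nearestSqr) nearestSqr = sqr;
        }
        nearestDistance = Mathf.Sqrt(nearestSqr);
    }
}
```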

Physics

Physics can be CPU heavy. It can be profiled via the Editor profiler. If Physics appears to take too much time on CPU:


  • Tweak Time.fixedDeltaTime (in Project settings -> Time) to be as high as you can get away with. If your game is slow moving, you probably need fewer fixed updates than games with fast action. Fast-paced games need more frequent calculations, so fixedDeltaTime has to be lower or collisions may be missed.
  • Lower Physics.solverIterationCount (in the Physics Manager) as far as stability allows.
  • Use as few Cloth objects as possible.
  • Use Rigidbodies only where necessary.
  • Use primitive colliders in preference to mesh colliders.
  • Never ever move a static collider (i.e. a collider without a Rigidbody), as it causes a big performance hit.
    • It shows up in the Profiler as 'Static Collider.Move', but the actual processing is in Physics.Simulate.
    • If necessary, add a Rigidbody and set isKinematic to true.
  • On Windows you can use NVidia's AgPerfMon profiling tool set to get more details if needed.
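
The advice about static colliders can be sketched as follows. This is an illustration under assumptions, not the manual's own code: the MovingPlatform class and its velocity field are hypothetical, and the script simply makes sure a collider that has to move carries a kinematic Rigidbody and is moved once per physics step.

```csharp
using UnityEngine;

// Hypothetical moving platform: its collider moves, so it carries a kinematic
// Rigidbody instead of being treated as a static collider.
public class MovingPlatform : MonoBehaviour
{
    public Vector3 velocity = new Vector3(1f, 0f, 0f); // assumed units per second

    Rigidbody body;

    void Awake()
    {
        body = GetComponent<Rigidbody>();
        if (body == null)
            body = gameObject.AddComponent<Rigidbody>();
        body.isKinematic = true; // driven by script, not by forces or gravity
    }

    void FixedUpdate()
    {
        // Runs once per physics step, i.e. every Time.fixedDeltaTime seconds.
        body.MovePosition(body.position + velocity * Time.fixedDeltaTime);
    }
}
```

With the default fixedDeltaTime of 0.02, FixedUpdate and the physics simulation run 50 times per second; raising the value to 0.04 halves that work, at the cost of coarser collision detection.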



These are the popular mobile architectures on Android. Both the hardware vendors and the GPU architectures are very different from the 'usual' GPUs in the PC/console space.


  • ImgTec PowerVR SGX - Tile based, deferred: render everything in small tiles (such as 16x16), shade only visible pixels
  • NVIDIA Tegra - Classic: render everything
  • Qualcomm Adreno - Tiled: render everything in tiles, engineered for large tiles (such as 256k). Adreno 3xx can switch to traditional rendering.
  • ARM Mali - Tiled: render everything in tiles, engineered for small tiles (such as 16x16)

Spend some time looking into different rendering approaches and design your game accordingly. Pay special attention to sorting. Define the lowest-end supported devices early in the dev cycle, and test on them with the profiler on as you design your game.


Use platform specific texture compression.


Further reading

Screen resolution

Android version



On iOS, only the PowerVR architecture (tile based, deferred) needs to be considered.


  • ImgTec PowerVR SGX. Tile based, deferred: render everything in tiles, shade only visible pixels
  • ImgTec PowerVR MBX. Tile based, deferred, fixed function - pre iPhone 4/iPad 1 devices

This means:

  • Mipmaps are not so necessary.
  • Antialiasing and anisotropic filtering are cheap enough; in some cases they are not needed on iPad 3.

And cons:

  • If vertex data per frame (number of vertices * storage required after vertex shader) exceeds the internal buffers allocated by the driver, the scene has to be 'split', which costs performance. The driver might allocate a larger buffer after this point, or you might need to reduce your vertex count. This becomes apparent on iPad 2 (iOS 4.3) at around 100 thousand vertices with quite complex shaders.
  • TBDR needs more transistors allocated for the tiling and deferred parts, leaving conceptually fewer transistors for 'raw performance'. It's very hard (i.e. practically impossible) to get GPU timing for a draw call on TBDR, which makes profiling hard.
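
The vertex data formula in the first point can be used for a rough sanity check. The sketch below is only arithmetic: the VertexBudget helper is hypothetical, and the driver's internal buffer size is not documented in the text, so it has to be supplied by the caller as an assumption.

```csharp
// Hypothetical helper: per-frame vertex data from the formula in the text.
public static class VertexBudget
{
    // vertex data = number of vertices * storage required after the vertex shader
    public static long FrameVertexBytes(int vertexCount, int bytesPerVertexOut)
    {
        return (long)vertexCount * bytesPerVertexOut;
    }

    // assumedBufferBytes is a guess supplied by the caller, not a documented limit.
    public static bool Fits(int vertexCount, int bytesPerVertexOut, long assumedBufferBytes)
    {
        return FrameVertexBytes(vertexCount, bytesPerVertexOut) <= assumedBufferBytes;
    }
}
```

For instance, 100,000 vertices that each output 64 bytes (an assumed figure for a fairly complex shader) come to 6,400,000 bytes per frame. Simpler shaders with fewer interpolated outputs shrink that number without touching the vertex count.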

Further reading

Screen resolution

iOS version

Dynamic Objects

Asset Bundles

  • Asset Bundles are cached on a device to a certain limit
  • Create using the Editor API
  • Load
    • Using WWW API: WWW.LoadFromCacheOrDownload
    • As a resource: AssetBundle.CreateFromMemory or AssetBundle.CreateFromFile
  • Unload
    • AssetBundle.Unload
      • There is an option to unload the bundle but keep the assets loaded from it
      • It can also destroy all the loaded assets, even if they are referenced in the scene
    • Resources.UnloadUnusedAssets
      • Unloads all assets no longer referenced in the scene. So remember to kill references to the assets you don't need.
      • Public and static variables are never garbage collected.
    • Resources.UnloadAsset
      • Unloads a specific asset from memory. It can be reloaded from disk if needed.
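
The load and unload steps above might be combined as in the following sketch. It uses the WWW-based API named in the list, which matches the Unity version this page describes (later versions replace WWW with UnityWebRequest). The BundleLoader class, the URL and the version number are placeholders, and the bundle is assumed to have been built with a main asset.

```csharp
using System.Collections;
using UnityEngine;

// Hypothetical loader built on the WWW API named in the list above.
public class BundleLoader : MonoBehaviour
{
    public string url = "http://example.com/bundles/level1.unity3d"; // placeholder
    public int version = 1; // cache key: change it to force a fresh download

    IEnumerator Start()
    {
        // Wait until the on-device cache is ready to be used.
        while (!Caching.ready)
            yield return null;

        WWW www = WWW.LoadFromCacheOrDownload(url, version);
        yield return www;

        if (!string.IsNullOrEmpty(www.error))
        {
            Debug.LogError("Bundle download failed: " + www.error);
            yield break;
        }

        AssetBundle bundle = www.assetBundle;
        if (bundle.mainAsset != null)
            Instantiate(bundle.mainAsset);

        // false: free the bundle data but keep the assets loaded from it.
        // true would also destroy those assets, even if the scene references them.
        bundle.Unload(false);
        www.Dispose();
    }
}
```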

Is there any limit on the number of AssetBundles that can be downloaded at the same time on iOS? (e.g. can we safely download over 10 AssetBundles at the same time, or every frame?)

Downloads are implemented via the async API provided by the OS, so the OS decides how many threads need to be created for downloads. When launching multiple concurrent downloads, keep in mind the total bandwidth the device can support and the amount of free memory. Each concurrent download allocates its own temporary buffer, so be careful not to run out of memory.


Resources

  • Assets need to be recognized by Unity to be placed in a build.
  • Add the .bytes file extension to any raw bytes you want Unity to recognize as binary data.
  • Add the .txt file extension to any text files you want Unity to recognize as a text asset.
  • Resources are converted to a platform format at build time.
  • Load them at runtime with Resources.Load()
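
As a hedged illustration of the points above, the sketch below loads a raw binary file as a TextAsset and releases it again. The ConfigReader class and the path "levels/table" are hypothetical; the file is assumed to be stored as Assets/Resources/levels/table.bytes, since Resources.Load only finds assets placed under a folder named Resources.

```csharp
using UnityEngine;

// Hypothetical example: read a .bytes file that was placed in a Resources folder.
public class ConfigReader : MonoBehaviour
{
    void Start()
    {
        // The path is relative to a Resources folder and omits the file extension.
        TextAsset raw = Resources.Load("levels/table", typeof(TextAsset)) as TextAsset;
        if (raw == null)
        {
            Debug.LogWarning("levels/table was not found in any Resources folder");
            return;
        }

        byte[] data = raw.bytes; // use raw.text instead for a .txt asset
        Debug.Log("Loaded " + data.Length + " bytes");

        // Release the asset once the data has been copied out.
        Resources.UnloadAsset(raw);
    }
}
```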

Silly issues checklist

  • Textures without proper compression
    • Different solutions for different cases, but be sure to compress textures unless you're sure you should not.
    • ETC/RGBA16 - default for Android
      • but can be tweaked depending on the GPU vendor
      • best approach is to use ETC where possible
      • alpha textures can use two ETC files, with one channel being for alpha
    • PVRTC - default for iOS
      • good for most cases
  • Textures having Get/Set pixels enabled - doubles the footprint, uncheck unless Get/Set is needed
  • Textures loaded from JPEG/PNGs at runtime will be uncompressed
  • Big mp3 files marked as decompress on load
  • Additive scene loading
  • Unused Assets that remain uncleaned in memory
    • Static fields
    • Asset bundles that were not unloaded
  • If it randomly crashes, try on a devkit or a device with 2 GB memory (like iPad 3).
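
To see why the texture items in this checklist matter, a rough footprint estimate helps. The sketch below is plain arithmetic under stated assumptions: the TextureMemory helper is hypothetical, mipmaps are ignored, uncompressed RGBA is taken as 32 bits per pixel, and 4 bits per pixel is used as a typical PVRTC or ETC rate.

```csharp
// Hypothetical helper: approximate texture memory, ignoring mipmaps.
public static class TextureMemory
{
    // getSetPixelsEnabled doubles the footprint, as noted in the checklist.
    public static long Bytes(int width, int height, int bitsPerPixel, bool getSetPixelsEnabled)
    {
        long bytes = (long)width * height * bitsPerPixel / 8;
        return getSetPixelsEnabled ? bytes * 2 : bytes;
    }
}
```

Under these assumptions a 1024x1024 texture takes 4,194,304 bytes uncompressed, 524,288 bytes at 4 bits per pixel, and 8,388,608 bytes when it is uncompressed with Get/Set pixels enabled.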

Sometimes there is nothing in the console, just a random crash:


  • Fast script call and stripping may lead to random crashes on iOS. Try without them.


Category: Manual | Translation: 悄悄