Optimisation

Starship optimisation toolkit

The Starship software optimization suite can be seen as a “Windows mega-mod”: it works across the board, using a variety of customizations and additional software to achieve a range of goals. Around 25 different tools and modifications are combined to create a Windows installation that is more stable and efficient whilst also performing better.

The one hardware element is the super-cooling, which should be seen as an optional extra since it is still in the prototype phase. The software suite only employs techniques and utilities that have been tested for a minimum of 5 years; there is a fine line between stability and performance, and finding the sweet spots can take many years.

The result is something we are proud to be able to offer as a best case setup for creative productions and overall work and play. Everything is designed to maintain compatibility and reduce the risk as much as possible. If things go wrong you can restore everything in a flash and updates come pre-tested for the software you use.

We’ve focused on a specific hardware range to improve our ability to deliver a solid, high-performance machine. This concentrates our delivery and support community, providing maximum compatibility with newer functionality and the broadest support around the web and from the manufacturers.

IO Optimisations

  • Reducing the load on file acquisitions by pre-caching small files in RAM

As the computer loads programs or data, a range of files is read from the drive. On a mechanical hard disk especially, seeking (the needle on the record finding the next track) is by far the slowest part of the process, and small files cause the most delays, often slowing the disk data-rate to a few MB/s.

By filling a memory cache with every small file on the system drive, a relatively small amount of memory goes a long way. A typical case might involve 60,000 files under 64K occupying around 64MB; that data can be loaded into memory within a fraction of a second at boot, and from that moment on the drive never needs to fetch a file under 64K again.

SSDs also benefit from this process in multiple ways. Removing the small-file requests improves the speed of the overall storage subsystem, and allowing the SSD to focus on larger files shaves a tiny amount off almost everything the computer does, since those small files are now served from RAM at GB/s.
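
As a rough sketch of the principle (not the actual caching driver the suite uses), here is how a Python pass over an assumed system partition could pull every file under a hypothetical 64K threshold into RAM:

    import os

    SMALL_FILE_LIMIT = 64 * 1024     # hypothetical threshold: "small" means under 64K
    SYSTEM_ROOT = "C:\\"             # assumed system partition

    def build_precache(root=SYSTEM_ROOT, limit=SMALL_FILE_LIMIT):
        """Walk the drive and hold every small file resident in RAM."""
        cache, total = {}, 0
        for dirpath, _dirs, files in os.walk(root):
            for name in files:
                path = os.path.join(dirpath, name)
                try:
                    size = os.path.getsize(path)
                    if size <= limit:
                        with open(path, "rb") as fh:
                            cache[path] = fh.read()   # later reads never touch the disk
                        total += size
                except OSError:
                    continue                          # locked or inaccessible files are skipped
        print(f"pre-cached {len(cache)} files, {total / 2**20:.1f} MB resident in RAM")
        return cache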

  • Reducing and consolidating file writes via delayed block writes and defragmentation before disk

This involves diverting all disk writes to a memory buffer and holding them there for extended periods of time. It is applied to the system partition only, so it does not affect your data, which is stored on a separate partition using the standard Windows file-writing mechanisms.

Whilst in memory, these writes are often rewritten or deleted, and the memory cache will trim them. In standard usage this reduction in written data can be as small as 15-20% and as high as 60-70%. Data is flushed either when the cache is full or on demand using a keyboard shortcut that flushes the entire cache at once.

In addition, while the data is in memory, defragmentation can be applied to it, which ensures it can be read in one go from the drive if needed in future. Groups of files are clustered together according to the time they were modified, so it makes perfect sense to group the files currently in the cache together.

A typical cache can be anything from 128MB on small machines to 8-16GB on larger ones. It also acts as a fail-safe: you can decide when to flush the data, and in the case of a failure, switching the machine off manually prevents any of the pending changes from ever reaching the system drive.
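
To see why holding writes back saves so much, here is a toy Python model of the coalescing idea, keyed by file and block offset. The real cache works at block level beneath the file system; this only illustrates the principle that repeated writes collapse into one:

    import collections

    class WriteBackCache:
        """Toy model of delayed, coalesced block writes (not the real driver)."""
        def __init__(self):
            self.pending = collections.OrderedDict()   # (path, block_no) -> bytes
            self.bytes_requested = 0                   # what the OS asked to write

        def write(self, path, block_no, data):
            self.bytes_requested += len(data)
            self.pending[(path, block_no)] = data      # overwrite = coalesce

        def flush(self):
            # a real implementation would sort by file/offset and push the
            # surviving blocks to the drive in one near-sequential pass
            flushed = sum(len(d) for d in self.pending.values())
            self.pending.clear()
            return flushed

    cache = WriteBackCache()
    for _ in range(10):                                # e.g. a log rewritten 10 times
        cache.write("system.log", 0, b"x" * 4096)
    flushed = cache.flush()
    print(f"requested {cache.bytes_requested} B, flushed {flushed} B "
          f"({100 - 100 * flushed // cache.bytes_requested}% saved)")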

  • Reducing device load with cluster-size and short-stroke optimizations

The system drive is separated from the data drive and uses slightly larger cluster sizes than the NTFS default of 4K. Using larger clusters reduces the total number of clusters on the drive, making every file-system transaction that little bit more efficient. Data drives go even larger (to 64K), reducing their MFT and CPU footprint.

There is a small amount of space wastage involved in doing this, as larger clusters mean more empty space when storing small files (a 1K file already takes up 4K at the default setting, and more at larger sizes). This is more than compensated for by the reduced footprint of an efficient Windows build and the file precaching.
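
As a rough feel for the trade-off, a short calculation over a hypothetical mix of 60,000 small files (roughly the precaching example above) shows how slack space grows with cluster size:

    import math

    def slack(file_sizes, cluster):
        # each file occupies whole clusters, so slack is the gap to the next boundary
        return sum(math.ceil(s / cluster) * cluster - s for s in file_sizes)

    files = [1024] * 60_000            # hypothetical: 60,000 files of ~1 KB each

    for cluster in (4096, 8192, 65536):
        wasted = slack(files, cluster)
        print(f"{cluster // 1024:>3}K clusters: {wasted / 2**20:7.1f} MB slack")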

Data that is relevant to other data is clumped together on the disk according to the time it was modified. All the files changed in a particular period (say, everything from a Photoshop update or a single work session) end up in one part of the disk, increasing the likelihood of read-ahead cache hits.

Hard disks benefit much more from this approach because the collection of files needed to perform a task tends to sit within a “short stroke” of each other, so the drive spends less time shifting between them. Dramatic improvements can be seen from this approach, with boot and application startup times halved.
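
A minimal sketch of the grouping step, bucketing files by modification day as a crude proxy for “same session” (the toolkit's actual defragmenter presumably uses a finer-grained placement policy):

    import os
    from datetime import datetime
    from itertools import groupby

    def group_by_day(root):
        """Bucket files by modification day, a crude proxy for 'same session'."""
        entries = []
        for dirpath, _dirs, files in os.walk(root):
            for name in files:
                path = os.path.join(dirpath, name)
                try:
                    day = datetime.fromtimestamp(os.path.getmtime(path)).date()
                    entries.append((day, path))
                except OSError:
                    continue
        entries.sort()   # time order, so each bucket stays together
        # the defragmenter would lay each bucket out contiguously on the platter
        return {day: [p for _, p in group]
                for day, group in groupby(entries, key=lambda e: e[0])}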

  • Resulting benefits
    Reduced power draw from the disk subsystem
    Increase in speed and CPU efficiency
    Balanced risk/reward

CPU (Multiple Intel Xeon)

  • More raw performance through overclocking

Minimal overclocking control is all that is necessary to get more out of a CPU while keeping it within its rated power profile. This already improves performance without taking any risk at all. The specialised cooling systems of our machines will hopefully allow us to go much further while remaining within safe margins for professional use.

At the time of writing, the CPU overclocking scene has reached the stage of using liquid nitrogen to push clock speeds into the 8.7GHz range. The earlier approach of Freon cooling has been left behind, but it remains a fairly untapped route to stable 6GHz operation that is safe for domestic and professional use.

  • CPU cache efficiency improved via core affinity and timeslice optimisation

Moving all the system applications onto the back-of-the-queue cores (the last in the list) keeps most of the random threads running around the system from ever touching the majority of cores. There is still enough performance to run all the system tasks, but their presence in the CPU caches and registers is reduced.

Because the main bulk of the CPU cores are now freed up, the cache efficiency of the heavy applications you are running will be closer to 100%. Additionally, forcing longer CPU time-slices for the foreground application gives better cache efficiency.
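
A minimal sketch of the affinity side of this idea, using the psutil package (an assumption; the suite ships its own utilities) to confine a few illustrative background processes to the last two cores:

    import psutil

    CORES = psutil.cpu_count()
    SYSTEM_CORES = list(range(CORES - 2, CORES))        # e.g. [6, 7] on an 8-core machine
    BACKGROUND = {"svchost.exe", "SearchIndexer.exe"}   # illustrative process names

    for proc in psutil.process_iter(["name"]):
        try:
            if proc.info["name"] in BACKGROUND:
                proc.cpu_affinity(SYSTEM_CORES)          # confine to back-of-the-queue cores
        except (psutil.AccessDenied, psutil.NoSuchProcess):
            pass                                         # protected processes are skipped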

  • Easy switching between power modes

Providing shortcut access to CPU power profiles allows easy experimentation without risk of accidental overheating. Lean, built-in software is pre-configured for different “warp-modes”, allowing total control over how hot, fast and hungry your computer is.

    Quiet: forces the computer to remain under a certain threshold, putting efficiency before performance
    Normal: allows the CPU to perform optimally within its rated TDP
    Performance: pushes the CPU outside the boundaries but never close to any risky speed
    Max: pushes the CPU as far as it can go without crashing, although there is always a small risk if you use this
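
One way such shortcuts could be wired up on Windows is to bind each mode to a prepared power plan and activate it with powercfg. The sketch below assumes the stock plan GUIDs for the first three modes and a placeholder for a custom “Max” plan; check the GUIDs on your own machine with powercfg /list:

    import subprocess

    # Mode -> Windows power plan GUID. The first three are the stock
    # Power saver / Balanced / High performance plans; "max" is a placeholder
    # for a custom plan. Confirm with `powercfg /list`.
    WARP_MODES = {
        "quiet":       "a1841308-3541-4fab-bc81-f71556f20b4a",
        "normal":      "381b4222-f694-41f0-9685-ff5bb260df2e",
        "performance": "8c5e7fda-e8bf-4a96-9a85-a6e23a8c635c",
        "max":         "00000000-0000-0000-0000-000000000000",   # hypothetical custom plan
    }

    def set_warp_mode(mode):
        """Activate the power plan bound to the requested warp-mode."""
        subprocess.run(["powercfg", "/setactive", WARP_MODES[mode]], check=True)

    set_warp_mode("normal")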

  • Resulting benefits
    Increase in core speed
    Increase in efficiency of on-silicon branch-prediction, registers and cache
    Instant control over performance modes

Secondary processing (Nvidia GPU, Intel Phi)

  • Dual acceleration from Nvidia Quadro and Intel Xeon Phi coprocessors

Using a combination of high-end Nvidia visual and CUDA processing with Xeon Phi x86 and OpenCL processing provides compatibility with the broadest range of applications. Both 32-bit and 64-bit operations are accelerated, and the two can either be combined on one workload or run multiple loads at the same time.
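
A quick way to confirm that both accelerators are visible to OpenCL applications, assuming the pyopencl package and the vendor runtimes are installed, is to enumerate the platforms:

    import pyopencl as cl

    for platform in cl.get_platforms():              # e.g. "NVIDIA CUDA", "Intel(R) OpenCL"
        for device in platform.get_devices():
            print(f"{platform.name:35} {device.name}")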

  • Adjust GPU performance range settings to accommodate new cooling system (Vbios)

The GPU's power and performance settings can be modified in the firmware held in its onboard memory (the VBIOS). Its safety margins for power draw need to be increased to allow faster operation while staying within the temperature and stability boundaries that keep it viable for commercial and creative use.

  • Test new drivers and provide qualification when it’s time to upgrade

Video drivers are delivered as soon as they are available, and a sub-forum is created to collect people's experiences of updating to or using the new driver, with each application having its own thread. Based on this community feedback we make decisions about each iteration of the official rollout.

  • Resulting benefits
    More performance within safe parameters
    32 and 64 bit OpenCL, CUDA and IA32 acceleration
    Ready for industrial and scientific applications

Operating system

  • A managed and heavily modified Windows distribution

The operating system is stripped down to only the important feature-set for workstation usage. Based on years of testing and produced in collaboration with major contributors in the scene, the modifications take into account the need for absolute stability and are carefully chosen to minimize impact while maximizing performance.

  • Software suite is lean and transparent

A small set of utilities provides the optimisations, all of them very small and selected for their ability to be configured precisely. When done properly, these optimisations are mostly transparent to the user. There are a few extra controls worth knowing about, such as “flush disk now” and the CPU/GPU modes.

  • Easy restore with certified system snapshots

Updates to the operating system are provided in a managed way and rolled out over a 3-month schedule. This improves stability and gives us time to make sure our certified applications are all working correctly. The user is free to update or customise however they want with the knowledge that they have a certified rollback to rely on.

  • Resulting benefits
    Reduced system management
    Increased stability
    Lowered memory, disk, network and CPU footprint

Example 1
Laptop on batteries with 2GB RAM and a Celeron, running a modified Windows 7 Home Basic 32-bit

The stripped-down version of Windows plus all apps takes around 500MB at bootup. In addition there is a 256MB file precache (every file under 1MB precached, with around 2 seconds load time on a 7200RPM HDD) and a 256MB block-level write cache (around 220MB of data with 32MB overhead). This leaves around 1GB of headroom.

I’ve been using this 2GB, 10-year-old Celeron laptop for the last hour to write this document, and here are the before and after screenshots of a cache flush I initiated. Total Write is the amount of data that would have been sent to the disk; Normal Write represents the data that was manually flushed to the disk.

[Screenshots: cache statistics before and after the flush]

This is an IO report of the system partition, which includes Windows and any additional software such as Office. Over the last hour, only 1GB has been read from the drive. That includes the modified Windows bootup, precaching the files, and loading Word and Winamp (to listen to music).

It does not include the music I was listening to or the writing of this document itself, as they are stored on the data partition, which is excluded from this type of caching to maintain data security. So what is being written here is mostly a trickle of data files and logs, system maintenance and other minor operations.

Because none of them were actually written to the disk, the device itself can stay dormant in a low-power state for longer. It wakes up either to write out data because the cache became full, or because the user initiated a flush of everything to the disk; either way, it only has to wake for a few seconds before going back to sleep.

For the most part, nothing that has changed on my system disk at this moment is actually important – even if I did lose power suddenly and lost the 120MB that hadn’t been written yet, that would have no impact on my system and I would not have lost this document.

Since I have been running on batteries this whole time, I appreciate the reduced disk usage. When the operating system trickles small files to the disk, it forces the device to remain active in higher power states for longer periods. I’ve gained a percentage of battery life out of this process, as well as great responsiveness.

Example 2
Mobile workstation

2010 Alienware M15X laptop, Intel Core i7 920XM (Quality Sample), 2013 Nvidia GeForce GTX 770M (GK106 Kepler), 16GB Crucial DDR3 RAM, Samsung 840 SSD (500GB), Windows 10 Technical Preview using DirectX 11 drivers

[Photo: Alienware M15X]

[Benchmark screenshots: AIDA64, Cinebench, Lux, PCMark, synthetic IO]
