
11/05/2010

Reality behind RFS Lag of Samsung Galaxy S

Reality behind RFS Lag - xda-developers.

This write-up is probably missing a lot of facts that we haven't uncovered yet. When we learn more, we can update what we know here.

Background

All data is stored on an 8GB or 16GB MoviNAND chip, of which 2GB is 'system data', and the rest is for user storage. The MoviNAND is one of the first mobile 'smart SSD' chips. That means that the MoviNAND itself handles operations such as wear leveling and physical data lookup, and has its own internal buffers. This cleverness is both good... and very bad.

FSYNC

When writing data to disk, your system and apps will make a call to the driver to 'write some data to file X'. This data will then be placed into kernel filesystem buffers and streamed off as commands to the MoviNAND. The MoviNAND will then slowly accept these commands and place them into its own buffer, and the disk controller itself will then go about its business writing this data to disk, using lookup tables to determine where to write the data to ensure maximum NAND lifetime, etc. It does a lot of work.

The system or apps also have an extra tool, called fsync. When it is used, the kernel and filesystem will flush the buffers for the affected file and ensure the data is written to disk. The calling thread blocks and waits for the fsync call to return, which signals that the data is fully written to disk. The kernel itself waits for an event from the MoviNAND signalling that the data has been completely written.
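To get a feel for how long an fsync blocks on a given device, a small timing sketch like the following can help. It is written in Python purely for illustration (the original discussion contains no code). The helper name `timed_write` and the file name are made up for this example, and the numbers it prints depend entirely on the storage it runs on.

```python
import os
import tempfile
import time


def timed_write(path, payload, sync):
    """Write payload to path and return the seconds spent.

    When sync is True, the time includes a blocking os.fsync() call.
    (Hypothetical helper, for illustration only.)
    """
    start = time.perf_counter()
    with open(path, "wb") as f:
        f.write(payload)
        f.flush()  # push Python's userspace buffer into the kernel
        if sync:
            # Block until the kernel reports the data has reached the device.
            os.fsync(f.fileno())
    return time.perf_counter() - start


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as d:
        path = os.path.join(d, "demo.bin")
        data = b"x" * 4096
        print("buffered write:", timed_write(path, data, sync=False))
        print("write + fsync: ", timed_write(path, data, sync=True))
```

On a device with a slow sync, the second number should be noticeably larger than the first, and the difference is time the calling thread spends blocked.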

In a 'dumb' disk, this fsync is fairly quick - the kernel buffer will be written directly to where the kernel has directed, and the round trip time (RTT) will be as long as it takes for data to be written.

In a 'very smart' desktop SSD, the fsync can return instantly - the disk controller will take the data and place it in its battery-backed buffer, and then go about its wear leveling and writing in the background without bothering the system.

In the 'smart' MoviNAND, the fsync can take a very long time to return - sometimes fsync on MoviNAND will take several seconds (confirm?) to return. This is because the MoviNAND may have a long queue of housekeeping tasks waiting when an fsync is called, and it will complete all of its tasks before returning.

RFS

RFS has a fairly badly written driver, that will call an fsync on file close.

Basically, RFS runs in 'ultra secure' mode by default. This security may not really be needed - I personally don't want it if it means enormous slowdowns. It also doesn't help data security while the system/app is holding a file open, only when it closes the file. The MoviNAND is also fairly smart, and appears to write its cache to disk before turning off, and also appears to have capacitors to keep it alive for a little while in the event of a power cut.

SQLite

Most Android apps use SQLite - a fairly simple database that is easy to embed. SQLite groups writes into transactions: the database is locked for the duration of a write, and multiple database writes can be included in one transaction. At the end of a transaction, SQLite will call fsync on the database file, causing a possibly long wait while the MoviNAND does its thing. Certain applications do not bunch up their writes into a single transaction, and instead do each write in a new transaction. This means that fsync will be called again and again. This isn't really a problem on most devices, where fsync is a fast operation. It is a problem on the SGS, because MoviNAND fsync is very slow.
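The difference between 'one transaction per write' and 'many writes in one transaction' can be sketched with Python's standard sqlite3 module. This is only an illustration. The table name `notes` and the helper `insert_rows` are invented for the example, and Android apps would use the Java SQLite API rather than Python. The point is the number of commits, since each commit is where SQLite syncs the database file.

```python
import os
import sqlite3
import tempfile


def insert_rows(db_path, rows, batched):
    """Insert rows in one transaction (batched) or one transaction per row.

    Returns the number of COMMITs issued for the inserts.
    (Hypothetical helper, for illustration only.)
    """
    # isolation_level=None puts the module in autocommit mode,
    # so transactions are controlled by explicit BEGIN/COMMIT below.
    conn = sqlite3.connect(db_path, isolation_level=None)
    conn.execute("CREATE TABLE IF NOT EXISTS notes (body TEXT)")
    commits = 0
    if batched:
        conn.execute("BEGIN")
        for body in rows:
            conn.execute("INSERT INTO notes (body) VALUES (?)", (body,))
        conn.execute("COMMIT")  # a single sync point for all rows
        commits = 1
    else:
        for body in rows:
            conn.execute("BEGIN")
            conn.execute("INSERT INTO notes (body) VALUES (?)", (body,))
            conn.execute("COMMIT")  # one sync point per row
            commits += 1
    conn.close()
    return commits


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as d:
        rows = [f"row {i}" for i in range(100)]
        print("batched commits:  ", insert_rows(os.path.join(d, "a.db"), rows, True))
        print("unbatched commits:", insert_rows(os.path.join(d, "b.db"), rows, False))
```

With 100 rows, the batched path commits once while the unbatched path commits 100 times, which on a device with a slow fsync is where the lag comes from.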

The various fixes and why they work

Native EXT4 to replace RFS (Voodoo)

By replacing RFS with EXT4, the 'sync on file close' problem is removed. The EXT series of filesystems is also more efficient at allocating information into blocks than RFS/FAT32 is. This means fewer real writes to MoviNAND, which means that the MoviNAND buffer should be less full, and when a sync is called, fewer commands have to be run. When a sync is called on EXT4, it will still be very slow, as the MoviNAND's sync is still slow.
Basically, EXT4 improves filesystem grouping, which leads to fewer commands, and does not have the broken 'sync on file close' that RFS does. It will not heavily improve SQLite database access in certain apps, as the full fsync on transaction end will still have to go through MoviNAND, and will be slow.

When pulling out the battery, there is a chance to lose data that has been written to a file but has not yet been told to sync to disk. This means that EXT4 is less secure than RFS. However, I believe the performance to be worth the risk.

Loopback EXT2 on top of RFS (OCLF)

By creating a loopback filesystem of EXT2, the 'sync on file close' problem is removed as well. Since the loopback file is never closed until the EXT2 filesystem is unmounted, RFS will not call fsync when a file inside the EXT2 loopback is closed. Since a single large file is created on RFS instead of multiple small files, RFS is unable to mis-allocate the file, or fragment it. The actual allocation of filesystem blocks is handled by EXT2. As a note, care should be taken in making the large file on RFS - it MUST align correctly with the MoviNAND boundaries, or operations will be slowed down due to double disk accesses for files, etc. It is unknown whether OCLF is aligning this correctly (how to determine this? 4KB block size gives double the performance of 2KB block size, so it might be aligning it correctly already).
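The 'double disk access' cost of misalignment is simple arithmetic, sketched here in Python. The MoviNAND's real boundary size is not known from this discussion, so the 4096-byte boundary in the example is purely an assumption, and `blocks_touched` is a made-up helper name.

```python
def blocks_touched(offset, length, boundary):
    """Count the boundary-sized device blocks covered by [offset, offset + length).

    (Hypothetical helper; 'boundary' stands in for whatever the flash
    controller's internal block size really is.)
    """
    if length <= 0:
        return 0
    first = offset // boundary
    last = (offset + length - 1) // boundary
    return last - first + 1


if __name__ == "__main__":
    boundary = 4096  # assumed value, for illustration only
    # A 4KB filesystem block that starts on a boundary touches one device block.
    print(blocks_touched(0, 4096, boundary))
    # The same block shifted by 512 bytes straddles two device blocks.
    print(blocks_touched(512, 4096, boundary))
```

If the loopback file starts at an offset that is not a multiple of the boundary, every filesystem block inside it is shifted in the same way, so each access can cost two device operations instead of one.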

Loopback also has the benefit of speeding up SQLite databases (at the expense of a transaction being lost in a power outage, as it could still be in RAM). As always, this is a tradeoff between data safety when the battery is pulled out and performance. When pulling the battery out while using the loopback filesystem, there is a chance of losing the last few seconds of database writes. In practice, this isn't a huge deal for a mobile phone - most lost data will be resynced when the phone reboots. In my opinion, the performance is worth it because of the very slow speed of a sync on MoviNAND.

Loopback EXT2 on top of EXT4

All of the above for normal loopback EXT2 applies. In addition, when the loopback flushes data, it will be flushed to EXT4 instead of RFS. This will probably be better than flushing to RFS, as the RFS driver is not as well written as the EXT4 driver. The difference should not be very large, though.

Journaling

Journaling on an SSD is not required. Your data will not be lost, your puppy will not die. Here is a post made by Theodore Ts'o: http://marc.info/?l=linux-ext4&m=125803982214652&w=2




"But there will be some distinct tradeoffs with omitting the journal, including possibility that sometimes on an unclean shutdown you will need to do a manual e2fsck pass."



Not using a journal is not a big deal, as long as you take care to do a full e2fsck pass when an unclean shutdown has occurred. This is the main reason for a journal - it avoids the need for a full disk check, because the journal can be read quickly instead.

EXT2 vs EXT4

EXT2 appears to work better on the SGS than EXT4. This is because EXT4 has more CPU overhead than EXT2. Journaling is also very bad on MoviNAND. Why? It appears to come down to the command buffer in the MoviNAND controller. A call to update the journal will use a command slot in the MoviNAND's buffer that could otherwise have been used for a real disk write. This means that journaling on MoviNAND is a VERY expensive operation compared to journaling on a 'dumb' disk.

Well, you could technically use EXT4 and simply disable the high-CPU and other features until you are left with EXT2, since EXT4 and EXT2 are basically the same thing.

At any rate, the difference between EXT4 and EXT2 is not very large, and there's no need for flamewars over it - it comes down to a choice of 'running' performance vs 'startup' performance, with EXT2 edging out EXT4 for everyday speed, while EXT4 does not require a long disk check at boot.

Future Work

Rewrite the firmware for the MoviNAND's flash to handle fsyncs properly and not bring the system to its knees. I joke, but this is really the true solution.

Other solutions include hacking EXT's fsync method to return instantly, and ensuring that the real fsync is called when the system shuts down. Or doing nothing: fsync is there for a reason, I guess, and would be fine if MoviNAND's fsync wasn't so very slow.
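The 'return instantly, do the real sync at shutdown' idea can be modelled in user space. The sketch below is only a toy illustration in Python. It is not a kernel patch and not how any existing lagfix works, and `DeferredSync` and its methods are hypothetical names.

```python
import os


class DeferredSync:
    """Toy model: fsync requests return at once; real fsyncs run in flush_all().

    (Hypothetical class, for illustration only.)
    """

    def __init__(self):
        self._pending = set()

    def fsync(self, path):
        # Only remember that this file wants a sync; do not block the caller.
        self._pending.add(path)

    def flush_all(self):
        """Perform the real fsync for every pending file; return how many were synced."""
        synced = 0
        for path in sorted(self._pending):
            fd = os.open(path, os.O_RDONLY)
            try:
                os.fsync(fd)  # the slow, blocking call happens here, once per file
            finally:
                os.close(fd)
            synced += 1
        self._pending.clear()
        return synced
```

Repeated sync requests for the same file collapse into one real fsync, which is where the speedup would come from. The cost is the same as with the other fixes: anything still pending when power is lost was never guaranteed to be on disk.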

There are probably a lot of small details missing from this writeup. They'll be updated when we learn more. Thanks for all the useful discussions and arguments, everyone!




5 comments:

henrykvii said...

Sorry for the lame question, but I'm not 100% sure. Does it mean that all the mentioned lag fix methods are safe for the MoviNAND chip (they do not disable wear leveling for the disk)?

Dennis said...

Most of them should be safe, if they have no bugs and if you use them correctly. None of them introduce additional risk to the system in normal usage (nobody uses plain FAT or EXT2).

The conversion and reverse conversion process is the most dangerous part, which should be fine as there are backups during the process. But there is one thing you should take care of yourself - undo the lagfix before upgrading the firmware, which I forgot to do.

alecao said...

Very good post!! Many questions answered.

links for 2011-01-21 « Doze:12 said...

[...] Reality behind RFS Lag of Samsung Galaxy S – Dennis' Blog [...]

sasa said...

"it comes down to a choice of ‘running’ performance vs ‘startup’ performance, with EXT2 edging out EXT4 for everyday speed, while EXT4 not required a long disk check at boot."

Thanks for the very informative read.

Isn’t everyday speed something everybody wants? I mean, how many times a day do you reboot your phone that a fast boot is something to care about?

I’m gonna leave this EXT4 cult and convert my data/dbdata/cache to EXT2. The only thing I will need to find is a kernel that lets me convert system to EXT4, since speedmod does not let you do that :) DamianGTO looks promising :)