The number of zip files isn't directly related to the number of tasks you have: you could have a few tasks stored in 15,000 zip files, or 15,000 tasks stored in one zip file. (At the moment, I personally have 1,745 tasks in 171 zip files.)

Each zip file represents a change that is being synchronized to other copies of OmniFocus on other machines. (We often refer to these changes using database/finance terminology, calling them "transactions", and we often refer to the different copies of OmniFocus as your "sync clients.") Once every sync client has had an opportunity to see a change, it gets compacted into the "root" transaction (the one whose filename starts with 0), which records the common history that every sync client has already seen.
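
To make the compaction rule concrete, here is a minimal sketch in Python. It is not OmniFocus's actual code: the `SyncStore` class, the integer transaction ids, and the strictly linear history are all assumptions made for illustration (real histories can branch).

```python
from dataclasses import dataclass, field


@dataclass
class SyncStore:
    """Hypothetical model of a sync folder: one root plus pending transactions."""
    root: dict = field(default_factory=dict)     # compacted common history
    pending: list = field(default_factory=list)  # (txn_id, changes), oldest first
    seen_by: dict = field(default_factory=dict)  # client name -> last txn_id seen

    def add(self, txn_id, changes):
        """Record a new transaction (one "zip file" in this toy model)."""
        self.pending.append((txn_id, changes))

    def compact(self):
        """Fold every transaction that ALL clients have seen into the root.

        Returns the number of transactions folded in.
        """
        if not self.seen_by:
            return 0
        floor = min(self.seen_by.values())  # the slowest client sets the limit
        keep, folded = [], 0
        for txn_id, changes in self.pending:
            if txn_id <= floor:
                self.root.update(changes)
                folded += 1
            else:
                keep.append((txn_id, changes))
        self.pending = keep
        return folded


store = SyncStore()
store.add(1, {"task-a": "buy milk"})
store.add(2, {"task-b": "call Bob"})
store.add(3, {"task-a": "buy oat milk"})
store.seen_by = {"mac": 3, "iphone": 1}  # the iPhone has only seen transaction 1
folded = store.compact()
```

In this toy run only transaction 1 is folded into the root, because the iPhone has not yet seen transactions 2 and 3; they stay behind as separate pending entries until it syncs.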

In other words, the number of zip files you have is directly related to how often you make changes and how long it's been since you've synced each client.


OmniFocus is aggressively paranoid about whether a sync client has seen a change or not, so when a client syncs (and writes a new status update of where it is), it leaves the old status update around for at least an hour, just in case another client is also in the middle of syncing (possibly over a slow EDGE network) and might be trying to read the old status. What that means in practice is that a sync client doesn't only tie down the transactions it currently needs to get caught up; it also ties down the transactions it needed prior to its last sync. So if I only sync my laptop once every week, I might have two weeks of transactions tied down by those client records.
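
The "two weeks" arithmetic can be sketched as follows. This is a simplified model, not the real status-record format: the `compaction_floor` function and the use of day numbers as transaction ids are assumptions for illustration only.

```python
def compaction_floor(status_records):
    """Return the oldest transaction id any status record still points at.

    status_records maps a client name to the list of transaction ids named by
    the status records that client currently has on the server: its current
    one plus any older one that has not been cleaned up yet.
    Returns None when there are no records at all.
    """
    return min(
        (txn for records in status_records.values() for txn in records),
        default=None,
    )


# Transaction ids stand in for "day the change was made" (an assumption made
# purely to keep the arithmetic readable). It is day 14 and the laptop has
# not synced yet today; it last synced on day 7 and, before that, on day 0.
records = {
    "desktop": [13, 14],  # syncs daily: previous and current status
    "laptop": [0, 7],     # syncs weekly: previous and current status
}

floor = compaction_floor(records)
pinned_days = 14 - floor  # span of history that cannot be compacted yet
```

Here the laptop's older status record pins history back to day 0, so 14 days of transactions are tied down. If only the current records counted, the floor would be day 7 and only one week would be pinned.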

We're looking at trying to improve how often we're willing to compact old changes, but we have to err on the side of caution: if we don't compact frequently enough, syncing will be slow--but if we compact too frequently, syncing could lose data that a sync client still needs!


There's another problem we're trying to solve: right now, we're only able to compact segments of synced history when they converge to a single branch of changes. I'll explain this in more detail (with graphs!) in a future post.


More importantly, compacting down to fewer zip files isn't the only way to speed up sync processing. In fact, it's not the most important way! It does help, of course, but the real problem is that if you sync a remote change which is earlier than your latest local change, we don't have a good way to apply that to your current database (which has been modified from what the other client would have seen), so we end up rebuilding your database from scratch. Rebuilding the database means we have to reprocess every transaction, and that's where having lots of transactions makes syncing slow. It would be much better if we could just apply the minimal set of changes to your existing database to get it into the right state. (Having 15,000 zip files of history matters a lot less if you're only having to look at the last few of them!)
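
As a rough illustration of why an out-of-order change is expensive today, here is a toy model in Python. The database is a plain dictionary, each transaction is a `(timestamp, changes)` pair with a unique timestamp, and later writes simply win; all of that, along with the `rebuild` and `sync_in` names, is assumed for the sketch and is not the real OmniFocus data model.

```python
def rebuild(transactions):
    """Replay every transaction in timestamp order to produce a fresh database."""
    db = {}
    for _, changes in sorted(transactions, key=lambda txn: txn[0]):
        db.update(changes)
    return db


def sync_in(db, history, remote):
    """Merge one remote transaction into the local state.

    Returns (database, number of transactions that had to be processed).
    """
    latest_local = max((ts for ts, _ in history), default=None)
    history.append(remote)
    if latest_local is None or remote[0] > latest_local:
        # Fast path: the remote change is newer than everything we have,
        # so it can be applied directly on top of the current database.
        db.update(remote[1])
        return db, 1
    # Slow path: the remote change predates a local change, so the whole
    # history is replayed from scratch.
    return rebuild(history), len(history)


history = [(1, {"a": 1}), (3, {"a": 3})]
db = rebuild(history)
db, cost_newer = sync_in(db, history, (4, {"b": 4}))            # fast path
db, cost_older = sync_in(db, history, (2, {"a": 2, "c": 2}))    # slow path
```

In this run the newer change costs one transaction to apply, while the older one forces all four transactions to be reprocessed. With thousands of transactions in the history, the slow path is where the time goes.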

That's the problem I'm solving right now—and in my explorations so far, it makes a huge difference. (No more "Updating with synced data" screens on the iPhone!) But before I can feel comfortable unleashing this optimized sync code on the world, I have lots of testing to do to prove that the incremental changes we're making are getting it to the exact same state as rebuilding the database would.
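
The kind of testing described above can be sketched as a randomized comparison between an incremental apply and a full rebuild. Everything here is hypothetical: the dictionary database, the last-writer-wins rule, unique timestamps, and the `apply_incremental` strategy are stand-ins chosen for illustration, not the optimized sync code itself.

```python
import random


def rebuild(transactions):
    """Reference behaviour: replay every transaction in timestamp order."""
    db = {}
    for _, changes in sorted(transactions, key=lambda txn: txn[0]):
        db.update(changes)
    return db


def apply_incremental(db, history, remote):
    """Hypothetical minimal apply for a late-arriving transaction.

    Writes only those keys of `remote` that no later transaction already in
    `history` has overwritten, then records `remote` in the history.
    """
    ts, changes = remote
    shadowed = set()
    for other_ts, other_changes in history:
        if other_ts > ts:
            shadowed.update(other_changes)  # keys a newer change already owns
    for key, value in changes.items():
        if key not in shadowed:
            db[key] = value
    history.append(remote)


def check_equivalence(rounds=200, seed=0):
    """Deliver transactions in random order; compare against rebuild each step."""
    rng = random.Random(seed)
    for _ in range(rounds):
        stamps = list(range(20))
        rng.shuffle(stamps)  # arrival order differs from timestamp order
        db, history = {}, []
        for ts in stamps:
            changes = {rng.choice("abcde"): ts}
            apply_incremental(db, history, (ts, changes))
            assert db == rebuild(history), (ts, db, history)
    return True


ok = check_equivalence()
```

The check compares the two results after every single step rather than only at the end, so a divergence is caught at the transaction that caused it.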