

Fragment Cache - an introduction / PHP

Thursday 10 June 2010 22:00

This article introduces a powerful form of website speed optimization called Fragment Caching. It builds upon ideas used elsewhere to make the technique even more powerful.

By Patrick van Bergen

Optimization of web content

When a website grows in complexity and starts to depend on numerous sources of data and thousands of lines of code, it slows down, no matter what programming language or database you use. That affects both the time needed to serve a page and the number of pages per second your webserver can serve. There are many ways to tackle this problem, and all of them should be considered: more and faster hardware, more and better software, and source code optimization.

When I was asked to think about ways to optimize our websites so that they could handle much more traffic, my mind wandered back to the year 1996, when I read Michael Abrash's Zen of Graphics Programming. In this book, which is about low-level graphics programming, Abrash does a really good job of explaining the thrill of optimization. One passage in particular stuck with me. In the final chapter of the book he describes how he was working on the texture mapper of a 3D engine.

"My X-Sharp texture mapper was in reasonable assembly—pretty good code, by most standards!—and I felt comfortable with my implementation; but then I got a letter from John Miles, who was at the time getting seriously into 3-D and is now the author of a 3-D game library. (Yes, you can license it from his company, Non-Linear Arts, if you’d like; John can be reached at 70322.2457@compuserve.com.) John wrote me as follows: “Hmm, so that’s how texture-mapping works. But 3 jumps per pixel? Hmph!”

It was the “Hmph” that really got to me.

That was the first shot of juice for my optimizer (or at least blow to my ego, which can be just as productive). John went on to say he had gotten texture mapping down to 9 cycles per pixel and one jump per scanline on a 486 (all cycle times will be for the 486 unless otherwise noted); given that my code took, on average, about 44 cycles and 2 taken jumps (plus 1 not taken) per pixel, I had a long way to go." (p. 687)

This sets Michael on a track to build a texture mapper that is at least that fast. There's a pause at 10 cycles, where he needs to motivate himself to shave off the last cycle. And then there's the mental hurdle of pushing beyond the 9-cycle frontier. When he finally gets past it, he writes:

And there you have it: A five to 10-times speedup of a decent assembly language texture mapper. All it took was some help from my friends, a good, stiff jolt of right-brain thinking, and some solid left-brain polishing—plus the knowledge that such a speedup was possible. Treat every optimization task as if John Miles has just written to inform you that he’s made it faster than your wildest dreams, and you’ll be amazed at what you can do!

How does this apply to web development? What is the fastest way to serve pages to the browser? Serving static content, of course. But the content we're interested in is dynamic: it is created by a web application and it depends on data in the database, on user input, on time, on the code itself, and on the phase of the moon. It just isn't an option to create the page and cache it in its entirety for future use.

So if caching whole pages is not an option, what is the next best thing? To cache parts of the page, put them together, and serve that. You can use a caching tool such as Memcached for this. There's only one catch: the cached content may need to change when any of the things it depends on changes: data, code, user input, and the like. You can pass an expiration time when you add your content to Memcached. The cached content then lives for a limited amount of time and expires. Nice, but we can do better.
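To make time-based expiration concrete, here is a minimal sketch of a page assembled from a cached part. It assumes PHP 7 or later. The class TtlCache, its methods and the function renderPage() are hypothetical: an in-memory stand-in for a cache server such as Memcached, not its real API. The current time is passed in explicitly so that the behaviour is easy to follow.

```php
<?php
// Hypothetical in-memory stand-in for a cache with expiration times.
// A real setup would store the entries in a cache server instead.
class TtlCache
{
    private $items = [];

    // Store $value under $key for $ttl seconds, counted from $now.
    public function set(string $key, string $value, int $ttl, int $now)
    {
        $this->items[$key] = ['value' => $value, 'expires' => $now + $ttl];
    }

    // Return the cached value, or null when it is missing or expired.
    public function get(string $key, int $now)
    {
        if (!isset($this->items[$key]) || $this->items[$key]['expires'] <= $now) {
            unset($this->items[$key]);
            return null;
        }
        return $this->items[$key]['value'];
    }
}

// Build a page from one cached part and one part that is always fresh.
function renderPage(TtlCache $cache, int $now): string
{
    $menu = $cache->get('menu', $now);
    if ($menu === null) {
        $menu = '<ul><li>Home</li></ul>'; // stands in for an expensive query
        $cache->set('menu', $menu, 60, $now);
    }
    return $menu . '<p>Rendered at ' . $now . '</p>';
}
```

The weakness is visible in the sketch: within those 60 seconds the menu is served from the cache even if the underlying data has changed in the meantime.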

What are the factors on which the cache depends?

Let us start to create a list of the things on which our content depends:

  1. The code that creates it, of course
  2. The data from the database, that may be changed by the users of the website
  3. The parameters that are passed to the page (in PHP: $_REQUEST parameters)
  4. Which user is looking at the page, his privileges, the state of his visit to the website
  5. And time itself may be a factor: a page may depend on external data that you don't control and may need to poll every once in a while

So when do we need to update our cache? In other words, when does the cache expire and need to be regenerated? Let us go through this same list once more:

  1. When the code changes. But how do you know whether the code has changed? This can be a problem in itself, but it may also be really easy: for example, when an SVN update occurs (SVN is the source control program we use). Is it possible to clear only those caches that depend on the lines of code that have actually changed? In theory yes, but in practice a piece of content depends on many functions in many files, and these dependencies are very hard to track.
  2. When the data changes that our content depends on. We can be more explicit about this. It should be possible to create a list of things our content depends on.
  3. Instead of changing the content of the cache whenever the parameters of the page change, it would be smarter to create more caches, one for each combination of parameters. However, this is only feasible when the number of combinations is limited.
  4. This is just more data and parameters, really. The user is a parameter if the content depends on him/her, otherwise it isn't. The state of the visit is data that depends on the user parameter.
  5. Time is the easiest factor. The cache may expire every minute, every day at midnight, or on one specific date such as the 23rd of March, although that last case is unlikely.
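Item 3, one cache per combination of parameters, comes down to building a cache key from the parameters. The following is a minimal sketch, assuming PHP 7 or later; the function name fragmentCacheKey() is hypothetical and only serves as an illustration.

```php
<?php
// Hypothetical helper: derive one cache key per combination of parameters.
function fragmentCacheKey(string $fragmentId, array $params): string
{
    // Sort by parameter name so that the order of the parameters does not matter.
    ksort($params);
    // serialize() keeps names, values and types apart; md5() keeps the key short.
    return $fragmentId . ':' . md5(serialize($params));
}

$a = fragmentCacheKey('product', ['product_id' => '12', 'lang' => 'en']);
$b = fragmentCacheKey('product', ['lang' => 'en', 'product_id' => '12']);
// $a and $b are the same key; a different product_id yields a different key.
```

Note that serialize() distinguishes the integer 12 from the string '12'. Request parameters arrive as strings, so cast the values consistently before building the key.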

To expand on item 2, data, we need to be able to determine when the data has changed so that the cache can be expired. This is the hardest part, so let's go deeper into it.

Cache expiration based on data changes

There are three ways in which the cache can be expired after the data has changed.

Just before using the cache, the code using the cache may check if the data this cache depends on has changed. Simple, but computationally expensive. And it may not even be simple. For example: just before showing relational data, check if the last modified date of this data is different from the date you have stored along with the cache. This may be so costly that it just isn't worth the caching. On the other hand, if your content depends on external data that changes only infrequently, a simple check if that data has changed may save a lot of time.

The code that manages the data may clear the cache, the moment the data changes. Queries are commonly wrapped in some form of data model class. Such a class will detect a change in the data and at that point it may clear all caches that depend on it. It has a list of caches that need to be cleared. However, this is a case of strong coupling, and very undesirable. It means that your data needs to be aware of all the views that depend on it and needs to change whenever the views change. A violation of the MVC principle. That said, this type of expiration is very easy to implement. So if you're clueless about how to expire your cache in a decent way and caching is vital to your application, you may choose this form of expiration over a grinding website.

The preferred form, that I have not seen anywhere yet (which doesn't mean it has not been used before), is to use a broker. When the cache is created, the broker is informed of all the data sources this cache depends on. When a data source has changed, it just tells the broker that the data has changed. The broker then finds the caches that depend on this data, and expires them. This is both elegant and efficient. The moment the cache is called, there is no need to check if it is expired. The broker does not even need to be notified directly, it may even just listen to events that are dispatched by the system whenever data changes (thanks Taco, for this improvement on the design).
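The broker idea can be sketched in a few lines. This is an illustration only, assuming PHP 7 or later: the class CacheBroker and its methods store() and fetch() are hypothetical names, and the in-memory arrays stand in for a real cache store.

```php
<?php
// Sketch of a cache-expiration broker. CacheBroker, store() and fetch() are
// hypothetical; they are not the API of the code described in this article.
class CacheBroker
{
    private $caches = [];     // cache id => cached content
    private $dependents = []; // data source id => [cache id => true]

    // Store content and register the data sources it depends on.
    public function store(string $cacheId, string $content, array $dataSourceIds)
    {
        $this->caches[$cacheId] = $content;
        foreach ($dataSourceIds as $sourceId) {
            $this->dependents[$sourceId][$cacheId] = true;
        }
    }

    // Reading needs no validity check: expired caches are already gone.
    public function fetch(string $cacheId)
    {
        return $this->caches[$cacheId] ?? null;
    }

    // Called by, or on behalf of, the code that manages the data.
    public function dataSourceChanged(string $sourceId)
    {
        foreach (array_keys($this->dependents[$sourceId] ?? []) as $cacheId) {
            unset($this->caches[$cacheId]);
        }
        unset($this->dependents[$sourceId]);
    }
}

$broker = new CacheBroker();
$broker->store('product_list', '<ul>...</ul>', ['all_products']);
$broker->store('news_box', '<div>...</div>', ['news']);
$broker->dataSourceChanged('all_products'); // drops product_list, keeps news_box
```

The data side only knows data source ids and the view side only knows which ids it depends on, so neither needs to know about the other.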

In the code presented here, we provide handles for all these forms of cache invalidation.

Side effects

There is one more topic I need to address before introducing the main subject. Code that produces content is often intermingled with code that does not directly produce content / output. Let me give you some examples:

  1. The code may increase a visit counter in the database
  2. It may write HTTP headers
  3. Other code that needs to be executed every time the page is built

It is not always possible or desirable to separate these side effects from the generated content. But you can see that they need to be treated differently, or the side effects will only happen the moment the cache is built, and be omitted the next time the page is called.

 

Fragment cache

A fragment cache is a cache for part of a page. It also offers some management functions for the cache, to make life easier.

In Ruby on Rails a fragment cache looks like this:

 

<% cache('all_available_products')  do %>
...content to be cached...
<% end %>

 

By the way, RailsLab created a really insightful video on fragment caching.

Notice the fragment identifier ("all_available_products") and the fact that the content of the fragment is sent to the browser immediately.

We see a similar pattern in an interesting PHP framework called Yii.

 

if($this->beginCache('all_available_products')) {
    echo "content to be cached...";
    $this->endCache();
}

 

This pattern is perfect for simple caches. When expiration conditions get more complicated, the code tends to get messy:

 

<?php if($this->beginCache($id, array(
        'dependency'=>array(
        'class'=>'system.caching.dependencies.CDbCacheDependency',
        'sql'=>'SELECT MAX(lastModified) FROM Post')))) { ?>
    ...content to be cached...
<?php $this->endCache(); } ?>

 

In our code, this pattern looks similar:

 

if (!FragmentCache::beginCache('all_available_products')) {
        echo "content to be cached...";
        FragmentCache::endCache();
}

 

The fragment id 'all_available_products' can be left out if you don't need to reference it anywhere else. A fragment id will be created for you, based on the filename it occurs in, and the starting line of the fragment in it. You can also pass a second parameter to beginCache, the parameter $ttl (time-to-live) which tells the fragment after how many seconds it should be invalidated. Between the beginCache and endCache you are allowed to start other, nested, fragments.
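A begin/end pair like this is typically built on PHP's output buffering. The sketch below assumes PHP 7 or later and is not the implementation behind the examples above: the class BufferedFragmentCache is hypothetical, it keeps its cache in a static array that only lives for one request, and the default $ttl of 3600 seconds is an assumption. Its beginCache() returns true when the content must be generated, as in the Yii example; the example above negates the return value, so the convention there is presumably the opposite.

```php
<?php
// Hypothetical sketch of a begin/end fragment cache built on output buffering.
// A real implementation would keep $store in a shared cache, not in a static array.
class BufferedFragmentCache
{
    private static $store = []; // fragment id => ['html' => ..., 'expires' => ...]
    private static $stack = []; // fragments that are currently being built

    // Returns true when the caller must generate the content.
    public static function beginCache(string $id, int $ttl = 3600): bool
    {
        $hit = self::$store[$id] ?? null;
        if ($hit !== null && $hit['expires'] > time()) {
            echo $hit['html']; // cache hit: output it, the caller skips the body
            return false;
        }
        self::$stack[] = [$id, $ttl];
        ob_start(); // capture everything the body echoes
        return true;
    }

    public static function endCache()
    {
        list($id, $ttl) = array_pop(self::$stack);
        $html = ob_get_clean();
        self::$store[$id] = ['html' => $html, 'expires' => time() + $ttl];
        echo $html; // pass the fresh content on to the page (or to an outer fragment)
    }
}

if (BufferedFragmentCache::beginCache('all_available_products', 60)) {
    echo "content to be cached...";
    BufferedFragmentCache::endCache();
}
```

Because output buffers nest, a fragment that is started between beginCache() and endCache() ends up inside the buffer of the outer fragment.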

We deliberately limit beginCache to these two parameters, because we want to avoid the clutter we find in the beginCache function of Yii. For this reason, we decided on a class-based solution for more complex fragments. This is what a complex fragment call looks like in our code.

 

echo FragmentCache::getCache('FragmentAllAvailableProducts');

 

Instead of passing all parameters directly to 'beginCache', we pass the name of a fragment class to the function 'getCache'. Two things are different in this code from the examples before:

  1. The content generating code is not included on the spot, only the class containing the code is named (FragmentAllAvailableProducts).
  2. The content is not output directly to the browser, it is returned as the result of a function call.

This offers extra properties that the Ruby and Yii patterns do not have:

  1. The fragment can be included on more than one page without having to duplicate the code or to wrap it in an extra function.
  2. The returned cache contents can be processed before it is sent to the browser. This adds to the flexibility of the code.
  3. To determine if the cache is still valid, we can use any PHP code, and not just some predefined clauses.

The actual fragment class looks like this:

 

class FragmentAllAvailableProducts extends Fragment
{   
    public function getOutput()
    {
        return "content to be cached...";
    }
}

And so the actual benefit of our syntax is that complex expiration conditions can be neatly wrapped in a class, where they can make use of all the richness of OOP.
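To give an idea of what happens behind getCache(), here is a guess at the control flow, assuming PHP 7 or later. Only the names Fragment, FragmentCache, getCache(), getOutput(), getTTL() and isValid() come from this article (the last two are introduced in the feature list). The method bodies, the default time-to-live, the per-request static array and the demo class FragmentBuildCounter are assumptions made for this sketch.

```php
<?php
// Guess at how a class-based fragment cache could work internally.
// The names come from the article; every method body here is an assumption.
abstract class Fragment
{
    abstract public function getOutput(): string;

    public function getTTL(): int
    {
        return 3600; // assumed default time-to-live in seconds
    }

    public function isValid(): bool
    {
        return true; // fragments may override this with custom checks
    }
}

class FragmentCache
{
    private static $store = []; // class name => ['html' => ..., 'expires' => ...]

    public static function getCache(string $fragmentClass): string
    {
        $fragment = new $fragmentClass();
        $entry = self::$store[$fragmentClass] ?? null;
        if ($entry !== null && $entry['expires'] > time() && $fragment->isValid()) {
            return $entry['html']; // still fresh: getOutput() is not called
        }
        $html = $fragment->getOutput();
        self::$store[$fragmentClass] = [
            'html' => $html,
            'expires' => time() + $fragment->getTTL(),
        ];
        return $html;
    }
}

// Hypothetical demo fragment that counts how often it is built.
class FragmentBuildCounter extends Fragment
{
    public static $builds = 0;

    public function getOutput(): string
    {
        self::$builds++;
        return 'built ' . self::$builds . ' time(s)';
    }
}

$first = FragmentCache::getCache('FragmentBuildCounter');
$second = FragmentCache::getCache('FragmentBuildCounter'); // served from the cache
```

The caller never sees whether the content came from the cache; all expiration logic stays inside the fragment class.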

That covers the basics. Let's go on to describe the current set of features.

Features

In these examples we omit the getOutput() method from the class, for brevity.

Expiration interval:

Expire the cache after a time-to-live of n seconds (here n = 5):

 

class FragmentAllAvailableProducts extends Fragment
{
    public function getTTL()
    {
        return 5;
    }
}

 

Using parameters

A fragment may depend on parameters. These may have been passed as request parameters to the page, or they may have been created inside the page. A different cache will be created for each combination of the given parameters.

This is how the parametrized cache is retrieved with a single parameter:

 

echo FragmentCache::getCache('FragmentSingleProduct', array('product_id' => $productId));

 

The parameters are available in every method of the Fragment class, via $this->fragmentParameters. In this example we create a different cache for each product:

 

class FragmentSingleProduct extends Fragment
{   
    public function getOutput()
    {
        $productId = $this->fragmentParameters['product_id'];
        return "do something with $productId...";
    }
}

 

Side effects

As described above, your fragment may contain code that needs to be executed each time the page is built. This code cannot be part of the "getOutput()" function, so it is placed in "execute()". You may also consider moving this code out of the fragment altogether. However, when the fragment is nested (see below), it becomes important to place the code inside the fragment.

 

class FragmentAllAvailableProducts extends Fragment
{   
    public function execute()
    {
        header('Content-type: application/pdf');
    }
}

 

Fragment nesting

Fragments may be nested. This can be useful if several fragments make use of the same piece of code. When you turn this piece of code into a fragment, it will need to be executed only once, and then cached for the other fragments. The "execute()" code of the child fragments will be executed each time a parent fragment is included.

An example:

 

class FragmentAllAvailableProducts extends Fragment
{   
    public function getOutput()
    {
        $part1 = FragmentCache::getCache('FragmentProductIntroduction');
        $part2 = FragmentCache::getCache('FragmentProductList'); // hypothetical child name; a fragment must not include itself
        $part3 = FragmentCache::getCache('FragmentProductFooter');
        return $part1 . $part2 . $part3;
    }
}

 

When a child fragment becomes invalid, the parent will become invalid as well, automatically.
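One way to get this automatic propagation is to record, while a parent is being built, which child fragments it requests. The sketch below shows that idea under stated assumptions; it is not the mechanism used by the code in this article. It assumes PHP 7 or later, the class NestedCache and its methods are hypothetical, and fragments are represented by plain callables to keep the example short.

```php
<?php
// Hypothetical sketch: propagate invalidation from child fragments to their parents.
class NestedCache
{
    private $store = [];    // fragment id => content
    private $parents = [];  // child id => [parent id => true]
    private $building = []; // stack of fragment ids that are being built right now

    // Return the cached content, building it with $builder on a miss.
    public function get(string $id, callable $builder): string
    {
        // The fragment that is being built right now depends on this one.
        if ($this->building) {
            $this->parents[$id][end($this->building)] = true;
        }
        if (!isset($this->store[$id])) {
            $this->building[] = $id;
            try {
                $this->store[$id] = $builder($this);
            } finally {
                array_pop($this->building);
            }
        }
        return $this->store[$id];
    }

    // Drop a fragment and, recursively, every fragment that embeds it.
    public function invalidate(string $id)
    {
        unset($this->store[$id]);
        foreach (array_keys($this->parents[$id] ?? []) as $parentId) {
            $this->invalidate($parentId);
        }
    }
}

$cache = new NestedCache();
$footer = function () {
    return '<footer>...</footer>';
};
$page = function (NestedCache $c) use ($footer) {
    return '<h1>Products</h1>' . $c->get('footer', $footer);
};
$cache->get('page', $page);
$cache->invalidate('footer'); // also drops 'page', because it embeds the footer
```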

Data source dependencies

This is the most advanced part of the code. In this part you can specify the data your fragment depends on. The data is identified by data source identifiers. These identifiers may be any type of string, as long as it is clear that they are systemwide representations of pieces of data. A single relation may be identified by "relation_19866". The set of all relations may be identified by "all_relations" or whatever. What matters is that the identifier uniquely identifies the data and that the identifier is actively used in the system to notify the FragmentCache that the data has changed.

By specifying the data sources this fragment depends on, the FragmentCache framework will create a link between these data sources and the cache. Whenever a data source changes, and the FragmentCache manager is informed, the cache of this fragment will be expired automatically by the cache manager.

 

class FragmentAllAvailableProducts extends Fragment
{   
    public function getDataSourceIds()
    {
        return array('all_products');
    }
}

 

This example fragment shows all products. Therefore it depends on the data of all products. If any of the existing products changes, the cache needs to expire and be refreshed. The same holds if a new product is introduced, so it is not possible to list the ids of all products in the getDataSourceIds() function. Instead, a data source id is introduced that represents all products. Now the code that manages products is responsible for notifying the FragmentCache when any product is added, removed or modified. At that time it calls:

 

FragmentCache::dataSourceChanged('all_products');

 

And the FragmentCache will remove the cache for FragmentAllAvailableProducts.

Custom expiration code

When the above expiration methods are not sufficient to handle your specific expiration case, you may define your own in the function isValid(). This function is called just before the cache is retrieved. If "isValid()" fails, a new cache will be created.

 

class FragmentAllAvailableProducts extends Fragment
{   
    public function isValid()
    {
        return ...custom validation code...;
    }
}

 

You often use this function to determine if the data has changed: if the data changes, the cache will be invalid. To know if the data has changed, you need to store what the data was like when the cache was built. For this reason, you can use the state variable $fragmentData. In the following example, a hash is created of a bunch of data. Then this hash is compared to the hash that was stored when the cache was created. If the hash is different, the cache is invalid and should be replaced:

 

class FragmentAllAvailableProducts extends Fragment
{   
    public function isValid()
    {
        $currentData = ...get data as an array...;
        $currentHash = md5(implode('#', $currentData));
        $originalHash = $this->getFragmentData('myhash', '');
        return $originalHash === $currentHash; // strict: == can treat numeric-looking hashes as equal
    }
    public function getOutput()
    {
        $currentData = ...get data as an array...;
        $currentHash = md5(implode('#', $currentData));
        $this->setFragmentData('myhash', $currentHash);
        return ...create output from data...;
    }
}

 

You can use setFragmentData() and getFragmentData() to store any data you want to preserve across requests.

Cache priming

The first user to visit the page whose cache has just expired will have to wait until the cache is rebuilt. This may be unacceptable to you in some cases. If so, you may want to think about off-line cache priming. This means that you create a background process that periodically checks these caches. Just call getCache() and the cache will be recreated only if needed. If you want to force the cache to be recreated, call recreateCache().

 

FragmentCache::recreateCache('FragmentAllAvailableProducts');

 

If you are wondering whether you can just call clearCache() and getCache() instead: you can, of course. But building a cache takes time, and during this time no cached output is available. recreateCache() only destroys the old cache when the new one is completely ready.
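The difference between the two approaches can be sketched as follows, assuming PHP 7 or later. The class SwapCache and its methods are hypothetical and only illustrate the order of operations; they are not the article's recreateCache().

```php
<?php
// Hypothetical sketch of "build first, swap afterwards".
class SwapCache
{
    private $store = [];

    public function get(string $id)
    {
        return $this->store[$id] ?? null;
    }

    // Naive variant: while $builder runs, readers find no cache at all.
    public function clearAndRebuild(string $id, callable $builder)
    {
        unset($this->store[$id]);
        $this->store[$id] = $builder();
    }

    // The old content stays readable until the new content is complete.
    public function recreate(string $id, callable $builder)
    {
        $fresh = $builder();        // may take a long time
        $this->store[$id] = $fresh; // a single overwrite at the very end
    }
}

$cache = new SwapCache();
$cache->recreate('products', function () {
    return 'version 1';
});
$cache->recreate('products', function () use ($cache) {
    // While the new version is being built, the old one is still served.
    return 'version 2, built while "' . $cache->get('products') . '" was live';
});
```

With a shared cache, requests that arrive during the rebuild keep getting the old content instead of triggering a rebuild of their own.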

Explicit cache clearing

Clearing a single cache from within your code is only needed in exceptional cases. Above I described the case where the code that manages the data may clear the cache, the moment the data changes. This code could call

 

FragmentCache::clearCache($cacheId);

 

Clearing all caches is only needed when the code changes. Call this function:

 

FragmentCache::clearAllCaches();

 

Fragment Cache as a design principle

Fragment caching asks for a new way of looking at your code. Dependencies on data need to be made explicit. Forgetting a dependency means that the cache will not be updated when the forgotten data source changes.

The message to take away is that caching should not be an afterthought. It should be part of the design process. Not only will this decrease the chance of missing a cache expiration point, it will also guide you to write more efficient code. Each time you add a data dependency you are forced to think about it. Is it really necessary? Can it be made more specific so that the cache does not need to be rebuilt as often?

If performance is an important quality in your architecture, spending time working out a good Fragment Cache design is time well spent. It may even serve as the basic building block. Let me know your findings.

I will leave you with this comforting quote:

There are only two hard things in Computer Science: cache invalidation and naming things.

--Phil Karlton

