Joi Ito is asking if long RSS items are rude:
Are long RSS items rude? More and more people are reading inside of news readers and not bothering to go to the blogs themselves. (My logs show this.) Should we put full text of the blog entry in the RSS feed, even if it’s long? It will surely slow your refresh rate. Has anyone written a style guide for RSS feeds? It’s a moving target, but I would be interested to hear about how readers and writers are designing their RSS feeds.
Jason Kottke suggests offering at least two feed options: one with full items, one with excerpts. Some weblogs already offer this, as I recall.
Currently this weblog shows full items only. I’ve considered following Jason’s suggestion, and I may do so as soon as time (and my slowly growing PHP skills) allow. But personally, I prefer getting full items in the RSS feed, as I now do 95% of my reading inside NetNewsWire – and I’ll probably skip an item altogether if the excerpt is too short or not descriptive, or if there’s no proper title.
The problem with excerpts lies in how they’re generated. If I recall correctly, Movable Type has an option to write both the full article and a separate excerpt. I doubt that many users take advantage of this, and apparently the usual practice is to have the RSS feed cut the item off after a certain number of bytes or words. In theory this could be a good thing, forcing people to say what they’re going to say before saying it, but it often doesn’t work that way. I myself often lead off with a quote from somewhere else, which would cause a simplistic excerpting algorithm to cut the item off before my own comments start.
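To make the problem concrete, here is a minimal sketch of the kind of word-count excerpting described above. The function name `naive_excerpt`, the 12-word limit, and the sample post are all made up for illustration; this is not Movable Type's actual algorithm.

```python
def naive_excerpt(text: str, max_words: int = 20) -> str:
    """Cut the text off after max_words words, adding "..." if anything was dropped."""
    words = text.split()
    if len(words) <= max_words:
        return text
    return " ".join(words[:max_words]) + "..."


# Hypothetical post that leads off with a quote, followed by the author's comment.
post = (
    '"Are long RSS items rude? More and more people are reading inside of '
    'news readers and not bothering to go to the blogs themselves." '
    "My own take: I prefer full items."
)

excerpt = naive_excerpt(post, max_words=12)
print(excerpt)
```

With a limit this short the excerpt ends inside the quote, so a subscriber never sees the author's own comment.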
So how large is a full-item RSS feed? According to NNW’s statistics window, my average feed size is around 22K. Since I implemented ETag/If-Modified-Since support, my average NNW download size hovers around 5K. Frankly, I don’t think this is an unreasonable amount of bandwidth. On the other hand, some of the feeds I subscribe to don’t use ETags, and their feed sizes are considerably larger. The heaviest feed I currently subscribe to is from Jon Udell’s weblog, which comes in at 60K on average – and it’s downloaded in full every time.
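For readers who haven't implemented this, the server-side decision looks roughly like the sketch below. It is a simplified illustration, not my actual feed generator: `make_etag` and `respond` are hypothetical names, the ETag is assumed to be a hash of the feed body, and only a single exact ETag is compared (no weak validators or lists).

```python
import hashlib
from email.utils import formatdate, parsedate_to_datetime
from typing import Dict, Tuple


def make_etag(body: bytes) -> str:
    """Derive a quoted ETag from the feed body (assumption: a content hash)."""
    return '"' + hashlib.md5(body).hexdigest() + '"'


def respond(body: bytes, last_modified: float,
            request_headers: Dict[str, str]) -> Tuple[int, Dict[str, str], bytes]:
    """Return (status, headers, payload); 304 with no payload if the client is current."""
    etag = make_etag(body)
    headers = {
        "ETag": etag,
        "Last-Modified": formatdate(last_modified, usegmt=True),
    }

    # If-None-Match, when present, takes precedence over If-Modified-Since.
    if_none_match = request_headers.get("If-None-Match")
    if if_none_match is not None:
        if if_none_match == etag:
            return 304, headers, b""
        return 200, headers, body

    since = request_headers.get("If-Modified-Since")
    if since is not None:
        try:
            if int(last_modified) <= parsedate_to_datetime(since).timestamp():
                return 304, headers, b""
        except (TypeError, ValueError):
            pass  # unparseable date: fall through and send the full feed

    return 200, headers, body


feed = b"<rss version='2.0'>...</rss>"   # placeholder feed body
modified = 1044316800.0                  # arbitrary example timestamp

status, headers, payload = respond(feed, modified, {})
status_again, _, payload_again = respond(feed, modified, {"If-None-Match": headers["ETag"]})
print(status, len(payload), status_again, len(payload_again))
```

The first request gets the whole feed; a repeat request that echoes the ETag gets an empty 304, which is where the bandwidth saving comes from.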
One trick for producing lighter feeds is to avoid HTML-encoding the item text by wrapping it in a CDATA section. Here’s how the first item in Jon’s current feed starts out:
<content:encoded><table align="right" border="0" cellpadding="6" cellspacing="0">
<tr><td>
<a href="http://www.windley.com/categories/networkingAndWifi/2003/02/04.html#a421"><img width="200" src="http://weblog.infoworld.com/udell/gems/windleyPringles.jpg"></a>
<div align="center" class="realsmall">Phil Windley</div>
</td></tr>
</table>
Hey, Phil Windley's...
</content:encoded>
and for comparison, here’s how an item from my own feed begins:
<content:encoded><![CDATA[<b>By Rainer Brockerhoff:</b><br /><br /><table width="90%" cellspacing="1" cellpadding="3" border="0" align="center"><tr> <td><b>Rainer Brockerhoff wrote:</b></td> </tr> <tr> <td>...just after being chided by my editor for not turning in a couple of articles that are somewhat overdue... </td> </tr></table><br />...
]]></content:encoded>
I’m comparing RSS 2.0 formats here. Actually, Jon’s feed is even heavier because he’s duplicating the full item content inside both the <description> and <content:encoded> tags. Even stranger, the <content:encoded> content isn’t encoded at all, since no CDATA section is included.
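A rough way to see the size difference between entity-escaping and CDATA is sketched below. The helper names `as_escaped` and `as_cdata` are hypothetical, and the sample markup is just a shortened fragment in the style of my own feed.

```python
from html import escape


def as_escaped(html: str) -> str:
    """Wrap item HTML in <content:encoded> with &, < and > entity-escaped."""
    return "<content:encoded>" + escape(html, quote=False) + "</content:encoded>"


def as_cdata(html: str) -> str:
    """Wrap item HTML in <content:encoded> using a CDATA section."""
    # "]]>" may not appear inside CDATA, so split it across two sections.
    safe = html.replace("]]>", "]]]]><![CDATA[>")
    return "<content:encoded><![CDATA[" + safe + "]]></content:encoded>"


sample = "<tr><td><b>Rainer Brockerhoff wrote:</b></td></tr>"

escaped_size = len(as_escaped(sample).encode("utf-8"))
cdata_size = len(as_cdata(sample).encode("utf-8"))
print(escaped_size, cdata_size)
```

Every escaped `<` or `>` costs three extra bytes, while the CDATA wrapper is a fixed twelve bytes per item, so markup-heavy items come out smaller with CDATA.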
Now, of course I’m not picking on Jon specifically here. But one thing that helped me a lot while debugging my feed was to subscribe to my own feed and use NNW’s “View RSS Source” and “Validate this Feed” contextual menu commands.
Finally, one pitfall with including HTML item content in the <content:encoded><![CDATA[…]]></content:encoded> format is carrying over the item’s full formatting. I rewrote the feed generator to exclude all external tables, style sheet references, <span> and <div> tags, and am working on eliminating all superfluous whitespace.
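The tag and whitespace cleanup can be approximated as in the sketch below. It is an illustration only, not my generator's code: `slim_item_html` is a hypothetical name, it uses regular expressions rather than a real HTML parser, and collapsing whitespace this way would damage <pre> blocks.

```python
import re

# Opening or closing <span>/<div> tags, with any attributes.
_WRAPPER_TAGS = re.compile(r"</?(?:span|div)\b[^>]*>", re.IGNORECASE)
_WHITESPACE = re.compile(r"\s+")


def slim_item_html(html: str) -> str:
    """Drop <span>/<div> wrappers (keeping their contents) and collapse whitespace."""
    without_wrappers = _WRAPPER_TAGS.sub("", html)
    return _WHITESPACE.sub(" ", without_wrappers).strip()


item = '<div class="post">\n  <span style="color: red">Hello</span>,\n  <b>world</b>\n</div>'
slimmed = slim_item_html(item)
print(slimmed)
```

Only the wrappers go; inline markup such as <b> survives, so the item still reads properly in a news reader.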