WordPress RSS Import

XML-compliant RSS 0.92, 1.0, 2.0 batch import and synchronization

This is an alternative to the RSS importer in WordPress 1.2, providing several additional features.

What It Does

  • It imports RSS files into your WordPress weblog.
  • It handles all major RSS variants, including 0.92, 1.0, and 2.0.
  • It imports single files from either your local drive or from a URL you specify.
  • It imports entire folder hierarchies of RSS files (blogBrowser-style: one folder per year, one file per month), making it a general-purpose weblog batch import tool using RSS as the exchange format.
  • It aggregates RSS feeds, if you point one or more copies of it at feeds on the web and set it to run regularly. (Even when run frequently, it won’t import the same item twice.) You can use this to maintain more than one WordPress site that shares the same content, such as a test site and a production site.
  • It handles time zones in a sophisticated way, preserving the timezone offset so that each item can appear on your weblog under the author’s original local time, while using GMT for all date comparisons.
  • It respects and stores modification dates if given in the RSS file.
  • If modification dates are given in the RSS file, it can optionally import only new or changed posts, leaving alone any posts that haven’t changed or that have been changed more recently on the local machine.
  • Using the above feature and two or more copies of WordPress, it can synchronize the weblogs bidirectionally or multi-directionally. New and changed posts on any one weblog will automatically show up on the others.
  • It complies with the XML specification, for correct behavior with XML namespaces with arbitrary prefixes and CDATA sections in arbitrary locations, both of which can trip up a regular-expression-based parser.
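
The "import only new or changed posts" rule, with GMT used for every date comparison, can be illustrated with a short sketch. This is Python rather than the script's own PHP, and it is not the script's actual code; the function names to_gmt and should_import are hypothetical, and the sample timestamps are made up for the illustration.

```python
from datetime import datetime, timedelta, timezone


def to_gmt(stamp):
    """Normalize an offset-aware timestamp to GMT for comparison."""
    return stamp.astimezone(timezone.utc)


def should_import(incoming_modified, stored_modified):
    """Return True if the incoming item is new or newer than the local copy.

    stored_modified is None when the item does not exist locally yet.
    """
    if stored_modified is None:
        return True
    return to_gmt(incoming_modified) > to_gmt(stored_modified)


# Two authors in different time zones (sample values, not real data).
pacific = timezone(timedelta(hours=-7))
tokyo = timezone(timedelta(hours=9))

incoming = datetime(2004, 5, 30, 9, 0, tzinfo=pacific)  # 16:00 GMT
stored = datetime(2004, 5, 31, 0, 30, tzinfo=tokyo)     # 15:30 GMT, May 30

# The comparison happens in GMT; the original offset is kept for display.
print(should_import(incoming, stored))
print(incoming.isoformat())
```

Comparing in GMT is what makes synchronization safe when the weblogs run in different time zones: the local wall-clock times above suggest the stored copy is newer, but in GMT the incoming item is the later one.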

What It Doesn’t Do

  • It doesn’t handle malformed XML. Because the standard XML parser it uses accepts only well-formed XML, some invalid RSS files may be rejected. If your RSS files are not well-formed, you must use a regexp-based parser. In practice, since this script is intended to be used with RSS feeds over which you have control, this is not expected to be a significant limitation. Note that the RSS file does not have to be strictly valid (according to the Feed Validator) to be parsed; most of the ways in which an RSS file could be invalid would still get past the much less stringent baseline test of XML well-formedness.
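
To make the well-formedness point concrete, here is a minimal sketch using Python's standard xml.sax module, not the script's own PHP parser. The class ItemHandler, the function parse_feed, and both sample feeds are hypothetical. The first feed binds the Dublin Core namespace to an arbitrary prefix and puts a CDATA section in the middle of a title; the second is not well-formed and is rejected.

```python
import io
import xml.sax
from xml.sax.handler import ContentHandler, feature_namespaces

DC_NS = "http://purl.org/dc/elements/1.1/"  # Dublin Core namespace URI


class ItemHandler(ContentHandler):
    """Collect the title and Dublin Core date of each <item>."""

    def __init__(self):
        super().__init__()
        self.items = []
        self._current = None  # dict for the item being read, if any
        self._field = None    # name of the field being read, if any

    def startElementNS(self, name, qname, attrs):
        uri, local = name
        self._field = None
        if local == "item":
            self._current = {}
        elif self._current is not None:
            # Match on namespace URI, never on the prefix used in the file.
            if uri is None and local == "title":
                self._field = "title"
            elif uri == DC_NS and local == "date":
                self._field = "date"

    def characters(self, content):
        # CDATA and ordinary text both arrive here, possibly in pieces.
        if self._current is not None and self._field:
            previous = self._current.get(self._field, "")
            self._current[self._field] = previous + content

    def endElementNS(self, name, qname):
        uri, local = name
        if local == "item":
            self.items.append(self._current)
            self._current = None
        self._field = None


def parse_feed(xml_bytes):
    """Parse RSS bytes and return a list of item dicts."""
    handler = ItemHandler()
    parser = xml.sax.make_parser()
    parser.setFeature(feature_namespaces, True)
    parser.setContentHandler(handler)
    parser.parse(io.BytesIO(xml_bytes))
    return handler.items


FEED = b"""<?xml version="1.0"?>
<rss version="2.0" xmlns:when="http://purl.org/dc/elements/1.1/">
  <channel>
    <item>
      <title><![CDATA[Tags like <b> & ampersands]]> survive</title>
      <when:date>2004-05-30T09:00:00-07:00</when:date>
    </item>
  </channel>
</rss>"""

BROKEN = b"<rss><channel><item><title>Unclosed</item></channel></rss>"

print(parse_feed(FEED))
try:
    parse_feed(BROKEN)
except xml.sax.SAXParseException as err:
    print("rejected:", err.getMessage())
```

The handler never sees the prefix "when", only the namespace URI it is bound to, which is why an unusual prefix does not matter. The second feed fails before any item is returned, which is the trade-off described above.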

The script is called “Bootleg RSS Import”, for lack of a better term.

Downloads

Version 1.2a1
view | download

Updated for WordPress 1.2. Put this in your wp-admin folder. This version adds support for time zones parsed out of the RSS file, in either RFC822 or W3CDTF (ISO8601) formats. It also removes code for adding the modification date field, since WordPress 1.2 already includes it, and replaces all legacy use of addslashes() with mysql_escape_string().
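
As an illustration of reading time zones from both date formats, here is a minimal Python sketch (the script itself is PHP and may work differently). The function name parse_feed_date is hypothetical. The sketch assumes W3CDTF values in the complete date-and-time form only, and it treats an RFC822 date without a usable zone as GMT, which is an assumption of the sketch rather than documented behavior.

```python
import re
from datetime import datetime, timezone
from email.utils import parsedate_to_datetime

# Complete W3CDTF date-time, e.g. 2004-05-30T09:00:00-07:00 or ...Z
W3CDTF_START = re.compile(r"^\d{4}-\d{2}-\d{2}T")


def parse_feed_date(text):
    """Parse an RFC822 or W3CDTF date, keeping its time zone offset."""
    text = text.strip()
    if W3CDTF_START.match(text):
        # Spell out a trailing "Z" as an explicit zero offset.
        if text.endswith("Z"):
            text = text[:-1] + "+00:00"
        stamp = datetime.fromisoformat(text)
    else:
        stamp = parsedate_to_datetime(text)
    if stamp.tzinfo is None:
        # Assumption: no zone information means GMT.
        stamp = stamp.replace(tzinfo=timezone.utc)
    return stamp


rfc822 = parse_feed_date("Sun, 30 May 2004 09:00:00 -0700")
w3cdtf = parse_feed_date("2004-05-30T16:00:00Z")

print(rfc822.isoformat())                           # 2004-05-30T09:00:00-07:00
print(rfc822.astimezone(timezone.utc).isoformat())  # 2004-05-30T16:00:00+00:00
print(rfc822 == w3cdtf)                             # True
```

The parsed value keeps the author's offset for display, while converting to GMT gives a value that can be compared safely across feeds.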

Version 1.2a1 RSS exporter
view | download

Replaces WordPress’s RSS generator (wp-rss.php), adding support for modification dates and time zones.

Version 1.0
view | download

For older versions of WordPress. Works with WordPress 0.9, 1.0, and possibly 1.1 (not tested). Optionally adds a modification date field to the WordPress database if one is missing. This is the version I originally contributed to the WordPress project.

History

When I was evaluating WordPress 0.9, I needed to write an import filter to handle my older posts. (For the personal and work sites I maintain, I use a combination of WordPress and my own weblogging tool, and still need to bridge the two. My own tool uses an archive folder of RSS files as its native data format.)

I contributed the filter to the project before WordPress 1.0, but the RSS import feature that eventually appeared in WordPress 1.2 used a different approach: a regexp-based parser, rather than a SAX-based one. The regexp-based parser has the advantage of working with more types of broken XML feeds, but at some cost in correctness when parsing valid XML feeds.

So I continued to use this SAX-based tool. It also has some other features I needed, including the ability to synchronize weblogs, parse folders full of RSS files, parse modification dates, and parse and preserve time zones.