tal explains DNS SOA records

Updated: 11/4/1999

What does the data in a DNS SOA record mean?

Let's look at a typical SOA record:

bell-labs.com.   IN SOA     dirty.research.bell-labs.com. tal.research.bell-labs.com. (
                        883071700       ;serial (version)
                        3600    ;refresh period
                        900     ;retry refresh this often
                        604800  ;expiration period
                        3600    ;minimum TTL
                        )

The first couple fields:

The "dirty.research.bell-labs.com." is the DNS master for the zone. (The old "primary/secondary" terminology has recently been replaced with "master/slave".) For zone transfers, that's just a comment: slaves find their master from their own configuration, not from this field. (Dynamic update and NOTIFY implementations do look at it.) The "tal.research.bell-labs.com." means that if you have problems with this domain, contact tal@research.bell-labs.com. @'s have special meaning in zone files, so we replace it with a "." and deal.

The serial number is a serial number. When you make changes to the zone, you give the new data a new serial number. However, it is important that the serial number always increase. Zone transfers happen when the serial number on the slave is LOWER than the one on the master. This is very different from doing zone transfers when the serial number is DIFFERENT. The reason the transfer happens when the serial number is LOWER is that if you have multiple masters with mismatched serial numbers, you don't want slaves to get into zone transfer battles.

(NOTE: As far as most humans can tell, "LOWER" means "having a lower value" just like you'd expect. The reality is not so simple, as the number "loops around to zero" when the numbers get very high. The way this is handled is that very high numbers are considered "lower" than very low numbers. However, only true DNS nerds will ever experience this. Forget I ever mentioned it.)
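
For the true DNS nerds: the "loops around to zero" comparison can be sketched in a few lines of Python. This is a minimal sketch that assumes 32-bit serial numbers and the usual sequence-space rule (a serial is "lower" if you can reach the other one by counting forward less than halfway around the circle). The function names serial_lt and slave_should_transfer are made up for this illustration and are not taken from any name server.

```python
SERIAL_BITS = 32                    # assumption: SOA serials are 32-bit unsigned
MOD = 2 ** SERIAL_BITS              # serials wrap around at this value
HALF = 2 ** (SERIAL_BITS - 1)       # half of the circle


def serial_lt(a: int, b: int) -> bool:
    """Return True if serial a counts as "lower" than serial b.

    a is lower than b when stepping forward from a reaches b in fewer
    than HALF steps. If the two are exactly HALF apart, neither is lower.
    """
    a %= MOD
    b %= MOD
    return a != b and ((b - a) % MOD) < HALF


def slave_should_transfer(slave_serial: int, master_serial: int) -> bool:
    """A slave pulls the zone only when its serial is LOWER than the master's."""
    return serial_lt(slave_serial, master_serial)


if __name__ == "__main__":
    print(slave_should_transfer(883071700, 883071701))  # ordinary case
    print(serial_lt(4294967295, 0))                     # wrap-around case
```

Note that slave_should_transfer(883071701, 883071700) is False: a slave that is somehow ahead of its master does not transfer, which is exactly the LOWER-versus-DIFFERENT distinction.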

Zone transfers:

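Zone transfers and individual queries are two different things. Zone transfers have to be manually configured on a server that wants to be a slave. Here are the details about zone transfers:

The refresh period is how often the slave checks the master's serial number to see if it needs a new copy of the zone. The retry period is how often the slave tries again when a refresh check fails. The expiration period is how long the slave keeps answering from its copy of the zone when it can't reach the master; after that, it stops serving the zone.

Example #1: a fast-changing environment: Expire everything after a month, but do refreshes every hour, and retries every 10 minutes. Minimum TTL of 30 minutes.
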
fast.com.   IN SOA      ns1.fast.com. dude.fast.com. (
                        883081600 ;serial (version)
                        3600      ;refresh period
                        600       ;retry refresh this often
                        2592000   ;expiration period
                        1800      ;minimum TTL
                        )

Example #2: a static environment: Expire everything after a month, do refreshes once a day, and retries every hour. Minimum TTL of 30 minutes.

static.com.   IN SOA    ns1.static.com. slwmver.static.com. (
                        883081601 ;serial (version)
                        86400     ;refresh period
                        3600      ;retry refresh this often
                        2592000   ;expiration period
                        1800      ;minimum TTL
                        )
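
All those numbers are in seconds, and it is easy to be off by a zero. Here is a small sanity-check sketch in Python that turns the timers from the two examples into something readable. The humanize helper and the dictionary layout are inventions for this illustration, not part of any DNS tool.

```python
UNITS = [("day", 86400), ("hour", 3600), ("minute", 60), ("second", 1)]


def humanize(seconds: int) -> str:
    """Render a number of seconds as days, hours, minutes and seconds."""
    parts = []
    for name, size in UNITS:
        count, seconds = divmod(seconds, size)
        if count:
            parts.append(f"{count} {name}{'s' if count != 1 else ''}")
    return ", ".join(parts) or "0 seconds"


# Timer values copied from the two example SOA records.
FAST = {"refresh": 3600, "retry": 600, "expire": 2592000, "minimum": 1800}
STATIC = {"refresh": 86400, "retry": 3600, "expire": 2592000, "minimum": 1800}

if __name__ == "__main__":
    for label, soa in (("fast.com", FAST), ("static.com", STATIC)):
        for field, value in soa.items():
            print(f"{label:12} {field:8} {value:>8}  = {humanize(value)}")
```

The "month" in both examples comes out as exactly 30 days.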

Individual records:

Now we've covered all the SOA numbers except the minimum TTL.

Every line of a zone is a "RR" (resource record). The format looks like this:

www.yahoo.com.   344 IN  A       204.71.200.67

Now you most likely know that if you are in the "yahoo.com" zone, you don't need to spell out ".yahoo.com"; just "www" will do. If you have multiple records for the same name (for example, an A and an MX record) you don't need to list the name in every RR.

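However, there is also a TTL (time to live) value for every RR in the zone. In the above example, www.yahoo.com's A record is valid for another 344 seconds. If there is no integer listed in the RR, then the "minimum TTL" value from the SOA record is used. (Name servers that follow RFC 2308 take this default from a $TTL directive at the top of the zone file instead, and use the SOA minimum as the TTL for negative answers.) So, in the bell-labs.com domain a record like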

bell-labs.com.   IN MX    10 dusty.research.bell-labs.com.

will default to a TTL of 3600 seconds (because that's the minimum TTL listed in the SOA).
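
The defaulting rule can be sketched like this in Python. It is a minimal illustration of the behaviour described here (SOA minimum used as the default TTL), with made-up names (RR, apply_default_ttl); it does not parse real zone files and is not how any particular name server is written.

```python
from typing import List, NamedTuple, Optional


class RR(NamedTuple):
    name: str
    ttl: Optional[int]   # None means no TTL was written on the line
    rtype: str
    rdata: str


def apply_default_ttl(records: List[RR], soa_minimum: int) -> List[RR]:
    """Fill in the SOA minimum TTL on every record that lacks its own TTL."""
    return [r if r.ttl is not None else r._replace(ttl=soa_minimum)
            for r in records]


if __name__ == "__main__":
    zone = [
        RR("www.yahoo.com.", 344, "A", "204.71.200.67"),
        RR("bell-labs.com.", None, "MX", "10 dusty.research.bell-labs.com."),
    ]
    for rr in apply_default_ttl(zone, soa_minimum=3600):
        print(rr.name, rr.ttl, "IN", rr.rtype, rr.rdata)
```

The record with an explicit TTL keeps it; only the one without a TTL picks up the 3600 from the SOA.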

The defaults are loaded into the data record by the master or slave when the zone is loaded (from a file or across the network). The DNS client that requests that record doesn't need to also get the SOA to find out what the TTL should have been. In fact, a DNS client that does no zone transfers should never need to fetch a SOA record.

All the expiration/refresh/retry information from the section on zone transfers is pretty much useless for this section. The TTLs on RRs are handled by a totally different mechanism. If you get a RR, you are told how many seconds it is valid. When that time is up, you are supposed to re-get the data. Most clients don't, but what can ya do.

Propagation:

(I need to write a bit here about how TTLs propagate. The short version: if you get a RR from a master or slave, the TTL will be the full value. If you get a RR from any other host (DNS cache, DNS caching server, etc.), the TTL you get is whatever they got, minus how old it is. So, if I am a DNS cache and I get a RR with a TTL of 3600 seconds, I should pass that TTL on to my client. However, if a second client asks me for that same RR 50 seconds later, I should give them the RR with a TTL of 3550, because it is now 50 seconds closer to being out of date.)
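
Until that section gets written properly, here is the countdown as a Python sketch. The CachedRR class is hypothetical and only models the arithmetic; a real caching server does a great deal more. Time is passed in explicitly so the example is repeatable.

```python
class CachedRR:
    """One cached resource record, remembering when it was fetched."""

    def __init__(self, rdata: str, ttl: int, fetched_at: float) -> None:
        self.rdata = rdata
        self.ttl = ttl                # TTL as received from upstream
        self.fetched_at = fetched_at  # clock reading when we got it

    def remaining_ttl(self, now: float) -> int:
        """TTL to hand to a client: the original TTL minus the record's age."""
        age = int(now - self.fetched_at)
        return max(0, self.ttl - age)

    def is_expired(self, now: float) -> bool:
        """True once the record has used up its whole TTL."""
        return self.remaining_ttl(now) == 0


if __name__ == "__main__":
    entry = CachedRR("204.71.200.67", ttl=3600, fetched_at=1000.0)
    print(entry.remaining_ttl(now=1000.0))  # first client, asked right away
    print(entry.remaining_ttl(now=1050.0))  # second client, 50 seconds later
```

Once is_expired returns True, the cache is supposed to throw the record away and ask again.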

Handling change:

In the static environment you might have a situation where you plan on making a big change at 5pm on Wednesday and you want the change to propagate quickly. You have been propagating TTLs of 1 day for ages. How do you get the change on Wednesday to propagate instantly?

In a worst-case scenario, changes take N+M seconds to propagate (where N is the refresh period and M is the default TTL). That's because a slave might have gotten the zone right before the new zone data was available, AND then a client might have gotten a RR from that slave that it thinks is valid for M seconds. Since M and N are usually the same, the real propagation is often 2N seconds long.
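
The N+M arithmetic, spelled out in Python. This is just the formula from the paragraph above with numbers plugged in; it assumes slaves only learn about changes through their refresh checks, and the function name is made up for the illustration.

```python
DAY = 86400
HOUR = 3600


def worst_case_propagation(refresh: int, ttl: int) -> int:
    """Worst-case seconds before everyone sees a change: N (refresh) + M (TTL)."""
    return refresh + ttl


if __name__ == "__main__":
    # 1-day refresh and 1-day TTL: up to 2 days before everyone has the change.
    print(worst_case_propagation(DAY, DAY) / DAY, "days")
    # 1-hour refresh and 1-hour TTL: up to 2 hours.
    print(worst_case_propagation(HOUR, HOUR) / HOUR, "hours")
```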

So, how do we make this change on Wednesday? It takes some pre-planning.

On Monday morning you would set the SOA to have 1-hour refreshes and 1-hour TTLs. All the data sitting out there with 24-hour refreshes and TTLs will eventually time out and be reloaded with the new 1-hour/1-hour timeouts. You have to make this change on Monday, since you need 2 days (2N) for it to propagate.
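
If you would rather have the computer count backwards for you, here is a Python sketch of the deadline. The date is a made-up example (it happens to be a Wednesday) and latest_safe_lowering is a hypothetical helper. It assumes the old refresh and TTL are both 1 day, as in the story above.

```python
from datetime import datetime, timedelta


def latest_safe_lowering(change_at: datetime, old_refresh: int, old_ttl: int) -> datetime:
    """Latest moment to lower the timers so the old values have aged out in time.

    The old refresh period and old TTL can keep stale timers alive for up to
    old_refresh + old_ttl seconds, so count back that far from the change.
    """
    return change_at - timedelta(seconds=old_refresh + old_ttl)


if __name__ == "__main__":
    change_at = datetime(1999, 11, 10, 17, 0)   # hypothetical Wednesday, 5pm
    deadline = latest_safe_lowering(change_at, old_refresh=86400, old_ttl=86400)
    print("Lower the timers no later than", deadline.strftime("%Y-%m-%d %H:%M"))
```

The deadline lands on Monday at 5pm, so doing it Monday morning leaves a few hours of slack.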

When you make the change on Wednesday at 5pm, it should propagate in one or two hours. Nearly an instant change.

Let things settle for a while. Take a day to make sure the changes you made are the way you really want them. You'll want any fixes to propagate quickly too. The additional load on your servers will be minor, since the check to see if a serial number has changed is extremely low-impact on server performance.

Finally, when you are confident your changes were correct (for example, on Friday), you can set your SOA back to the way it used to be: huge refreshes and huge TTLs.