Help, My Web site Is Down

When it comes to accessing a web site, the old saw that perception is reality applies. If you or your customers cannot get to your web site, then as far as you or they are concerned, the web site is down. The next question, of course, is what can be done about it.

The Internet is an anarchy of many, many interconnected networks. To access your web site on our servers at our service provider's data center (One Communications Data Center in Marlboro, MA), a user's browser has to:

  1. Get the IP address of your web site from a Name Server.
  2. Traverse these interconnected networks from the user's ISP to our server. This can take 10-30 hops.

A failure at any of these points will make the web site appear to be down from that user's point of view. The instructions below will help you isolate the source of the problem.
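
For readers who prefer a script, the two steps in the list above can be sketched in Python. This is only a rough illustration: the function name check_site, the wording of its messages, and the choice of port 80 are assumptions made for this example, not a tool we provide.

```python
import socket


def check_site(hostname, port=80, timeout=5.0,
               resolver=socket.gethostbyname,
               connector=socket.create_connection):
    """Return a short description of which step failed, if any.

    check_site, its messages and port 80 are illustrative assumptions.
    resolver and connector can be replaced with stand-ins for testing.
    """
    # Step 1: ask a name server for the site's IP address.
    try:
        address = resolver(hostname)
    except OSError:  # socket.gaierror is a subclass of OSError
        return "DNS failure: could not resolve " + hostname

    # Step 2: cross the networks between this machine and the server.
    try:
        connection = connector((address, port), timeout)
    except OSError:  # covers timeouts and refused connections
        return "Access failure: could not reach " + address
    connection.close()
    return "OK: reached " + address
```

Calling check_site("www.mmcis.com") from the affected computer reports which of the two steps failed there. The result says nothing about what other users see.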

The first step is to open a DOS command window:

  1. Press the Windows key + R
  2. Enter cmd and press the Enter key
  3. You should see a command window like this:

DNS Failure

You can check if you are getting correct address resolution by typing in the command:

ping www.mmcis.com

(where www.mmcis.com is the name of your web site). The output of this command will tell us two things.

If the response is Unknown host, then you are having some kind of DNS (name server) issue. Some possible causes:

  • Your domain registration has expired (check with your domain registrar's whois service)
  • The name server for your DNS manager is not available. If we are your DNS manager and you cannot get to our servers, this would result in a DNS failure.

If the address is not of the form 155.212.12.nnn, then there is a more complicated DNS problem. Contact us at webmaster@mmcis.com and tell us what IP address you are getting. It would also help us if you told us who your connectivity provider (ISP) is or what your default name server's address is. If you are behind a firewall or broadband router, the problem may be more complex than we can treat here.

If the response is similar to the screen shot above, you are getting the correct address and you are able to reach our servers.
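
The three DNS outcomes described above can also be checked with a few lines of Python. This is a minimal sketch, not a replacement for the ping test: the names lookup and classify_address and the result strings are made up for this example, and the prefix 155.212.12. is taken from the address form given above.

```python
import socket

EXPECTED_PREFIX = "155.212.12."  # address form given in the text above


def lookup(hostname):
    """Return the IPv4 address for hostname, or None if the lookup fails."""
    try:
        return socket.gethostbyname(hostname)
    except socket.gaierror:
        return None


def classify_address(address, expected_prefix=EXPECTED_PREFIX):
    """Classify a lookup result; None means the name did not resolve."""
    if address is None:
        return "unknown host"        # DNS (name server) issue
    if not address.startswith(expected_prefix):
        return "unexpected address"  # the more complicated DNS problem
    return "correct address"
```

For example, classify_address(lookup("www.mmcis.com")) reports which of the three cases applies on the computer where it is run. It uses that computer's configured name server, just as ping does.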

Access Failure

If you are getting the correct address but the pings are timing out, then there is a failure somewhere along the route from your computer to our servers.

To find out where that might be, enter the command:

tracert www.mmcis.com

You should get back a listing like this:

Tracing route to www.mmcis.com [155.212.12.6]
over a maximum of 30 hops:

1 1 ms <10 ms <10 ms 192.168.1.1
2 9 ms 7 ms 14 ms 73.171.231.34
3 7 ms 7 ms * ge-1-39-ur01.maynard.ma.boston.comcast.net [68.87.156.145]
4 6 ms * 7 ms 10g-8-1-ur01.framingham.ma.boston.comcast.net [68.87.144.125]
5 7 ms 14 ms 8 ms 10g-9-1-ur01.westroxbury.ma.boston.comcast.net [68.87.144.121]
6 8 ms * 7 ms 10g-7-1-ar02.needham.ma.boston.comcast.net [68.87.145.53]
7 8 ms 7 ms 7 ms 12.125.33.33
8 13 ms 13 ms 13 ms gbr1-p60.cb1ma.ip.att.net [12.123.40.138]
9 16 ms 14 ms 15 ms tbr1-p013401.cb1ma.ip.att.net [12.122.11.193]
10 15 ms 17 ms 13 ms tbr2-cl16.n54ny.ip.att.net [12.122.10.22]
11 15 ms 15 ms 12 ms 12.122.80.237
12 14 ms 13 ms 13 ms att-gw.dc.aol.com [192.205.32.2]
13 16 ms 13 ms 13 ms 0.ge-5-1-0.XL3.NYC4.ALTER.NET [152.63.3.113]
14 13 ms 18 ms 15 ms 0.so-1-2-0.XL1.NYC1.ALTER.NET [152.63.21.21]
15 14 ms 13 ms 13 ms POS6-0.GW12.NYC1.ALTER.NET [152.63.29.193]
16 16 ms 16 ms 15 ms 65.200.139.145
17 18 ms 17 ms 16 ms ny1-bb2-ge-0-0-0-2.conversent.net [204.17.109.6]
18 20 ms 18 ms 20 ms ct1-bb1-ge-0-3-0.conversent.net [209.113.217.115]
19 21 ms 21 ms 21 ms ct1-bb2-ae0-100.conversent.net [209.113.217.194]
20 22 ms 24 ms 23 ms ma1-bb1-as0.conversent.net [209.113.217.229]
21 28 ms 23 ms 24 ms ma1-gw4-ge0-0-0-2.conversent.net [216.41.101.5]
22 23 ms 23 ms 28 ms nedv-bb1-as0.conversent.net [209.113.139.2]
23 24 ms 24 ms 24 ms host6.155.212.12.conversent.net [155.212.12.6]

Trace complete.

This shows us that there are 23 hops from my Comcast connection to the server. If there is a failure at some point in the route, you will start to see three *'s instead of the times from that hop onward. That is the point of failure. You may also get intermediate lines with *'s because some routers are programmed to ignore the packet type used in a trace route. If the last line shows the IP address of your server, those lines can be ignored.
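
The rule for reading a trace can be written down as a short Python sketch. The function name find_failure_hop is hypothetical, and the code assumes the tracert layout shown in the listing above, where a silent hop has * in place of all three times.

```python
def find_failure_hop(trace_lines):
    """Return the hop number where replies stop for good, or None.

    A hop counts as silent when all three probes show '*'. Silent hops
    that are followed by a later responding hop are ignored.
    """
    failure_hop = None
    for line in trace_lines:
        fields = line.split()
        if not fields or not fields[0].isdigit():
            continue  # header, blank line, or "Trace complete."
        hop = int(fields[0])
        if fields[1:].count("*") >= 3:
            if failure_hop is None:
                failure_hop = hop  # start of a run of silent hops
        else:
            failure_hop = None  # a later hop answered, so reset
    return failure_hop
```

Given the lines of the listing above, the function returns None because the last hop answered. If the trace instead ended in a run of silent hops, it would return the number of the first hop in that run.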

Most often these outages are a router issue at the point where the local ISP connects to its upstream provider. The second most common cause is a problem where two of the major backbone providers interconnect. It can be a software problem, a hardware outage, a power issue, and so on.

There is nothing anyone can do to ensure that everyone has access 24/7/365 to any particular point on the Internet (such as our servers). There are thousands of ISPs and several dozen major backbone providers (see http://www.nthelp.com/maps.htm for some of them). Just do the math to figure out how many backbone interconnect points there are, and you can see why even these critical resources typically do not have route-around capabilities. Medium to large ISPs have multiple upstream providers so that a failure in one of their routes to the backbones can be routed around (our server collocation provider has this capability, for example), but rerouting takes time that may be measured in hours, since it involves reprogramming internal routers. Even when they are able to reroute, capacity is often reduced, which can look like an outage to some users. Smaller ISPs typically depend on a single upstream provider without any fallback connectivity.
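
To put a rough number on "do the math": if each pair of backbone providers interconnected exactly once, the count would be the number of distinct pairs, n(n-1)/2. That one-link-per-pair model is a simplifying assumption of this sketch, as are the provider counts used as examples. Real providers do not all connect to each other, and some pairs connect in several places.

```python
from math import comb


def pairwise_interconnects(providers):
    """Number of distinct provider pairs, assuming one link per pair."""
    return comb(providers, 2)


# Illustrative values for "several dozen" providers (assumed, not measured).
for n in (24, 36, 48):
    print(n, "providers:", pairwise_interconnects(n), "possible pairs")
```

Even under this simplified model, 36 providers give 630 possible pairs, which suggests why a backup path is not in place for every one of them.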

If the failure is not at a point within the conversent.net network, you can usually assume that other people are able to get to the web site even if you cannot.

One way to test this is to see if Google's machines can get to it. Point your browser to

http://www.google.com/language_tools?hl=en

and enter your web site address in the "Translate a web page" form. Select English to any language and see if Google produces a translated page.

You can also use one of the many network tool web sites such as http://network-tools.com/.

Don't use these sites to isolate your name server or routing problem, since they might succeed where you are failing. You can use them to see whether others are able to get to your web site even if you cannot. Of course, the Google servers or the network tool web sites might be on the same portion of the Internet that is blocked from accessing our servers. Try a few different network tool sites if the first one fails.
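
The reasoning in this last section can be summarized as a small Python sketch. The function name interpret_checks and the wording of its messages are illustrative assumptions. It only combines results you have already gathered by hand and does not contact any of these sites itself.

```python
def interpret_checks(you_can_reach, outside_results):
    """Combine your own result with results from outside checkers.

    outside_results is a list of True/False values, one per third-party
    site tried. The wording of the returned messages is illustrative.
    """
    if you_can_reach:
        return "site is reachable from your location"
    if any(outside_results):
        return "others can reach the site; the problem is on your route"
    if not outside_results:
        return "no outside check yet; try a network tool site"
    return "no outside checker got through; try a few more sites"
```

For example, interpret_checks(False, [False, True]) reports that others can reach the site, which matches the advice to try more than one checker before drawing a conclusion.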