Way too often after working out a tough problem, I'm just too confused or tired to blog about it. Tonight is like that but I'm writing anyway. I just got my gateway machine routing again and it was disgustingly simple (considering the amount of time it was offline). I'm not going to explain all the details today but basically I have a machine that serves as a gateway between the LAN here in the supersecret headquarters of Late Night PC and the rest of the Internet at large. Asterisk, FreePBX, an iptables firewall, DNS server, and DHCP server all run on this CentOS box. The gateway machine connects between the DSL modem on eth1 and the last router for the LAN on eth0.
I had everything running nice and smooth. I was even working the kinks out of a traffic shaping script so VoIP, online games and big downloads wouldn't interfere with one another. There are a few machines in here and a few users. Sometimes the best part of having so many people at home is all the network problem-solving I get to do. It's a big lab for me to experiment in. And of course I love my family too. I enjoy their company and not just their network traffic.
A month or so ago I was in Toronto visiting my sister and I got a call that the LAN couldn't reach the Internet. Or some of it couldn't. That sucked but it could have waited until I got back. Except that it appeared that my development machine, which can be seen from the Internet, was serving up all kinds of nasties. By which I mean dirty pictures. By which I mean... never mind, I couldn't look at the page long enough to see. I couldn't wrap my head around what was happening. When I tried to SSH in I got no response. The web server signature didn't match mine, but when I asked Candace to do a ps aux, there didn't seem to be another web server running. The giveaway was when she disconnected my machine from the router and the nasties didn't stop. The DNS record for my server was pointing somewhere else. I use dynamic DNS for this one and all I could guess was that my record wasn't updated and someone else with an infected machine got my old IP address. Anyhow, Google saw it. I finally removed that stuff from Google a little while ago (and Google responded very quickly).
At the time though, I talked Candace through bypassing the gateway machine. I just undid that tonight, and I have a few random network bits that I'd like to make a note of and share with anyone interested.
First off, I was getting this
So I had screwed up my testing by trying to simplify it. I moved the network cabling around to put the gateway machine back inline in its proper place. I rebooted it to make sure I'd get normal reboot conditions and not rely on anything temporary (at least not without knowing it).
After rebooting, ppp0 came up but I still couldn't resolve names.
Yeah that typo actually happened (I'm not proud). But DNS was clearly running on the DNS server that I wanted (ruby - the gateway machine).
I figured out that the DNS server was running but /etc/resolv.conf was looking at the router for DNS instead of ruby. The temporary solution to this is to tweak it:
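For anyone following along, here is a minimal sketch of that kind of tweak. It assumes the DNS server answers on the gateway's loopback address (127.0.0.1 is a placeholder, not necessarily the address I used), and set_nameserver is a helper name made up for this example. Keep in mind that dhclient-script may overwrite the file again the next time a lease is renewed.

```shell
#!/bin/sh
# Sketch: replace the nameserver entries in a resolv.conf-style file.
# set_nameserver is a hypothetical helper; 127.0.0.1 in the usage line
# below is an assumed address for the local DNS server.

set_nameserver() {
    # $1 = file to edit, $2 = nameserver address to use
    tmp="$(mktemp)" || return 1
    # Keep every non-nameserver line (search, domain, comments).
    grep -v '^nameserver' "$1" > "$tmp"
    # Add the single nameserver we actually want.
    echo "nameserver $2" >> "$tmp"
    # Write back in place so ownership and permissions are preserved.
    cat "$tmp" > "$1"
    rm -f "$tmp"
}

# Usage, as root:
#   set_nameserver /etc/resolv.conf 127.0.0.1
```

Rewriting through a temporary file keeps the search and domain lines intact instead of clobbering the whole file.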
My resolv.conf has a line that says it was written by dhclient-script. What happens is, at some point, ifup ppp0 runs. This script uses information in /etc/sysconfig/network-scripts/ifcfg-ppp0 to configure the interface and make the PPPoE connection. If you're trying to connect to your ISP with PPPoE on DSL then the "normal" way to do it from the command line is using ifup. I think everything else (GUIs and all that) is a wrapper around that script. So when ifup ppp0 runs, it uses dhclient to work as a DHCP client and get an IP address for this interface. This can be a little confusing for a gateway machine since (at least in my case) it's also running the DHCP server for the LAN.
When dhclient runs it somehow gets the idea that 192.168.3.1 is the nameserver it should list in resolv.conf. I don't know where it gets that idea at the moment and I'm out of steam for tonight. I can see though, from the dhclient-script manpage, that it supports some hooks. In this case, I could create a script (/etc/dhclient-up-ppp0-hooks) which dhclient-script would find and run before creating a new resolv.conf. In that script I'd have access to the nameservers and could tweak them in the variable $new_domain_name_servers.
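A rough sketch of what such a hook might contain is below. I haven't written or tested this on the gateway. It assumes dhclient-script sources the hook with $new_domain_name_servers already set to a space-separated list, and both PREFERRED_DNS and prefer_nameserver are names invented for this example (127.0.0.1 is a placeholder address).

```shell
# Hypothetical hook body. dhclient-script is assumed to source this
# file before it writes resolv.conf.
# PREFERRED_DNS is a placeholder; set it to the address of the DNS
# server that should be listed first.
PREFERRED_DNS="${PREFERRED_DNS:-127.0.0.1}"

prefer_nameserver() {
    # $1 = preferred server, remaining args = servers from the lease.
    # Prints the preferred server first, then the others, skipping
    # any duplicate of the preferred one.
    preferred="$1"
    shift
    out="$preferred"
    for ns in "$@"; do
        [ "$ns" = "$preferred" ] || out="$out $ns"
    done
    echo "$out"
}

# Word splitting of the unquoted variable is intentional: each
# address in the list becomes its own argument.
new_domain_name_servers="$(prefer_nameserver "$PREFERRED_DNS" ${new_domain_name_servers:-})"
```

This version puts the preferred server first and keeps whatever the lease supplied as a fallback. Dropping the lease-supplied servers entirely would be a one-line change.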
So why am I blogging about this instead of writing that script? Frankly I'm not convinced it's the right solution. The DHCP server seems to be configured to send out the right address. Other DHCP clients get the right nameserver address but this one doesn't. So I'm going to investigate more before giving in to this idea.
Oh and while I'm on the subject of DHCP, dhcpd wasn't starting up earlier tonight. I think dhcpd was disabled so that I could use the DHCP server built into the router while my gateway was out of commission. To get it running at boot I just did this
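For reference, on a SysV-init CentOS box the usual tools for this are chkconfig and service. The sketch below assumes the init script is named dhcpd (the stock name in the CentOS package). enable_service is a wrapper name invented for this example.

```shell
#!/bin/sh
# Sketch: enable a SysV-init service at boot and start it right away.
# enable_service is a hypothetical helper; it must be run as root.

enable_service() {
    svc="$1"
    # Bail out early on systems that don't ship chkconfig.
    if ! command -v chkconfig >/dev/null 2>&1; then
        echo "chkconfig not found; this sketch only covers SysV-init systems" >&2
        return 1
    fi
    # Register the service for its default runlevels, then start it
    # now instead of waiting for the next reboot.
    chkconfig "$svc" on && service "$svc" start
}

# Usage, as root:
#   enable_service dhcpd
```

Running chkconfig --list dhcpd afterwards shows which runlevels the service is enabled in.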
I know this isn't all perfectly clear. If you've got a question about the stuff I touched on here, just leave a comment below.