I refreshed the server list about 20-30 times in a row. The only listings that kept changing status were the bottom 3 listings under Legends and Preferred. Nothing ever seemed to change under Standard servers. |
Okay, so:
Servers that are having problems. Are they all kept up to date on latest SVN? |
Nothing has been changed on the loginserver in months, so you guys are going to have to throw some debug code into the worldserver and see what exactly is causing it to disconnect.
Something that may be happening though.. now that I think about it, the paths being taken to the data center for our server farm had been a little screwy lately. Can you guys post some traceroutes from the servers having the issue? Thanks. |
Odd thing is that I can run a continuous ping to the LS without a blip, so it doesn't seem like it is an issue with dropping packets or routing or DNS. I also confirmed (same as what AX already did) that I get the same result using eqemulator.net or .org or using the IP directly.
And some of the servers do stay current, but some of them definitely do not. One example is THF, which is still on Rev 1452 or so, and they seem to be having the issue along with most of the other servers. I am guessing the LS just got restarted, because I looked at the list and it looked like this:
Code:
Version Server Information Status Legends Servers |
Also, here is a screenshot showing double listing of EZ and Irreverent servers:
http://stormhavenserver.com/download...le-servers.jpg |
Last 2 posts are me restarting the LS as I was working on the code.
Still not getting trace routes. All you guys have done is claim it isn't working and not provided me with any help figuring it out. You're going to have to do a little bit more. Start getting debug code in to figure out why the world is getting disconnected. Is it timing out, is the loginserver disconnecting it, etc. And get me those trace routes from the servers. |
I've put in changes that will fix a world displaying as down due to making a second connection before the first connection was declared stale.
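A minimal sketch of the idea behind that change, assuming a simplified model in which the loginserver keeps one entry per world name. The `WorldRegistry` class, its method names, and the 30-second stale timeout are hypothetical and are not taken from the actual loginserver source.

```python
import time

STALE_AFTER = 30.0  # seconds; hypothetical timeout, not the real loginserver value


class WorldRegistry:
    """Tracks one live connection per world server name (hypothetical model)."""

    def __init__(self, stale_after=STALE_AFTER, clock=time.monotonic):
        self.stale_after = stale_after
        self.clock = clock
        self.worlds = {}  # name -> (connection_id, last_seen)
        self.duplicates_removed = 0

    def register(self, name, connection_id):
        """Register a connection, replacing any earlier one for the same name."""
        old = self.worlds.get(name)
        if old is not None and old[0] != connection_id:
            # A second connection arrived before the first was declared stale.
            # Replace the old entry rather than listing the world twice or as down.
            self.duplicates_removed += 1
        self.worlds[name] = (connection_id, self.clock())

    def heartbeat(self, name, connection_id):
        """Refresh last_seen, but only for the connection currently registered."""
        entry = self.worlds.get(name)
        if entry is not None and entry[0] == connection_id:
            self.worlds[name] = (connection_id, self.clock())

    def is_up(self, name):
        """A world counts as up while its last contact is within the timeout."""
        entry = self.worlds.get(name)
        return entry is not None and self.clock() - entry[1] < self.stale_after
```

A climbing `duplicates_removed` count for a world would match what is described here: the world keeps opening new connections while the loginserver still holds the old one.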
EZ and THF look to be reconnecting pretty often. I can see right now the loginserver is having to remove a duplicate worldserver pretty much every time these worldservers are connecting. This means that it is NOT the loginserver that is disconnecting them. Something is causing the WORLDSERVER to reconnect on its own. Quote:
|
Here is a traceroute from Storm Haven:
Code:
traceroute eqemulator.org
My best guess so far is that the problem is somehow related to the fairly new (6 months old or so) code that allows servers to connect to multiple Login Servers at the same time. I know that when I removed the config to connect to LSs other than the EQEmu one, my server was suddenly able to stay connected to the EQEmu LS much better.
And, Rogean, I don't mean to seem like I am pushing this issue all on you or anything. It is just hard to figure out a connection issue when looking at only one side of the connection. If there didn't seem to be any other oddities going on, I would assume that the problem was on my end only and would work on it quietly by myself until it was resolved. Since it seems that it may not just be my server, it doesn't hurt to investigate the possibility of the issue being with the EQEmu LS or even one of the other LSs like tsahosting.net or peqtgc.net.
I wonder: if one of the other LSs was disconnecting servers, would they try to reconnect to all the LSs they are configured to connect to, even the ones they are already connected to? I should check with Gaeorn to see if he is seeing anything odd on tsahosting.net lately. |
Here are some web tests that may or may not be of help:
http://www.indeep76.com/eqemulator.net/
http://www.indeep76.com/checks/eqemulator.net/6851/
I reported the problems starting a few days ago because that's when they started. I mentioned it was after you did maintenance work, as an observation that might be of help to you; prior to that date, everything was running fine, and I haven't changed a thing. I noticed an impressive number of servers in the same situation, so I deduced the problem was on the LS side. Quote:
|
The host for my primary server (running 1771) reported packet loss issues from an over-utilized circuit, but is in the process of resolving the problem.
I've been connected for around 24 hours now with no problems so I'm inclined to think my issues were purely host related and just coincidentally timed, but I'm keeping a close eye on it. I've updated my test server (completely different host) from 1771 to 1777 to see how it behaves. |
I'm also experiencing this problem.
I've changed nothing in my exes or anything; it was working fine a few days ago. |
Quote:
1. Login Server A is up and running perfectly fine without an issue.
2. Login Server B starts to experience an issue that is causing it to constantly drop world servers.
3. Multiple servers are configured to connect to both LS A and B.
4. As the servers start getting disconnected from Login Server B frequently, they start sending reconnects for both A and B at the same time even though A has remained stable.
5. Login Server A reacts badly to servers trying to register again that are already connected and registered. Somehow this causes all external servers (not on the same LAN as LS A) to experience connection issues with LS A.
Now, that example is probably not possible at all, but if nothing has changed to cause this problem, I am grasping at straws. No servers have the info Rogean wants, because from a network perspective, everything looks perfect as far as I can tell. I never drop a ping to eqemulator.net, and traceroutes and response times look great. I get about 35ms response time consistently, so that is not the problem.
I think part of the problem is that P1999 and PEQ both run on the same host as the LS, so they will never experience this issue if it only affects external networks, which is what it appears to be. And without P99 having this issue, I am sure it is harder for Rogean to troubleshoot from his end, or even to verify there is a problem at all. If P99 was experiencing this issue as well, it probably would have been resolved already.
Considering that so many servers are experiencing and voicing issues with the EQEmu LS only, I think it is pretty clear that the problem is there. Since PEQ and P99 don't have this problem at all, and all other servers seem to be having at least some sort of issue (some worse than others), I think that points to some type of issue with internal vs external networking. It could be a port issue, a routing issue, a DNS issue, or just about anything coming into the EQEmu hosting or anywhere up to that point.
It could even be something related to the recent DDoS attack on P99 and related to how the hosting service assisted in mitigating the attack (assuming they did something). Considering there was "hardware maintenance" on the 15th and a recent DDoS attack, it seems like one of those issues/changes could be to blame. Without us knowing exact details on issues/changes like that, it is hard to assist in possible suggestions for the cause and resolution of the issues. |
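The five-step scenario above comes down to one question: when a single login server link drops, which login servers does the world re-register with? A minimal sketch of the two possible behaviors follows. The `servers_to_reconnect` function and its `reconnect_all` flag are hypothetical and do not describe what the worldserver code actually does.

```python
def servers_to_reconnect(status, reconnect_all=False):
    """Return the login servers a world would send a new registration to.

    status maps login server host -> True if that link is currently connected.
    reconnect_all=True models the suspected behavior (re-register everywhere
    as soon as any one link drops); False re-registers only on dropped links.
    """
    dropped = [host for host, connected in status.items() if not connected]
    if not dropped:
        # Every link is healthy, so no registration is sent at all.
        return []
    return list(status) if reconnect_all else dropped


# Link to eqemulator.net is healthy, link to tsahosting.net has dropped.
status = {"eqemulator.net": True, "tsahosting.net": False}
isolated = servers_to_reconnect(status)
suspected = servers_to_reconnect(status, reconnect_all=True)
```

If the worldserver behaved like the `reconnect_all=True` case, a flaky second login server would produce duplicate registrations on a stable one, which is what step 4 describes.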
That makes a lot of sense actually, since the easiest way to combat a DDoS attack is to allow only so many connection attempts from an address in a given amount of time before either banning the address or temp banning it. If our external world servers lose sync and keep trying to reconnect, the LS may be seeing that as a DDoS attack and causing issues. Even more so if, as you hypothesized, a world server is registered with multiple LSs and, when one goes out of sync, it resends the world server login request to every LS in its list.
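To make the rate-limiting idea concrete, here is a minimal sketch of a per-address sliding-window limiter. It is an assumption about how such mitigation could work in general, not a description of what the EQEmu host or loginserver does. The `ConnectionLimiter` class and its thresholds are hypothetical.

```python
from collections import defaultdict, deque


class ConnectionLimiter:
    """Allow at most max_attempts connections per address within window seconds."""

    def __init__(self, max_attempts=5, window=60.0):
        self.max_attempts = max_attempts
        self.window = window
        self.attempts = defaultdict(deque)  # address -> timestamps of allowed attempts

    def allow(self, address, now):
        """Return True if this attempt is accepted, False if it is throttled."""
        recent = self.attempts[address]
        # Forget attempts that have aged out of the window.
        while recent and now - recent[0] >= self.window:
            recent.popleft()
        if len(recent) >= self.max_attempts:
            return False
        recent.append(now)
        return True
```

Under a scheme like this, a world server that reconnects every few seconds would quickly hit the limit and start getting refused, while a server that reconnects rarely would never notice it.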
|
I've mentioned these things to Rogean before this thread started. I've had all the same issues on two servers and I've seen them consistently. Not that they are ultimately keeping people from playing, but there still is a definite difference AFTER whatever 'maintenance' happened. As far as what has changed specifically, that will have to be on Rogean's side. And I'm sure it's one of those problems that is very hard to reproduce or trace on his end.
|
I am not familiar with EQEmu setup yet, but here is a simple test:
Log in and continue to the "Server Select" screen. Do NOTHING and wait 60 seconds. If you get a pop-up stating "Error - A timeout occurred", you're going nowhere. You can hit OK and wait another 60 seconds to see whether the server list refreshes or you get the error again, but this is in vain. You will always get the error unless you restart the client. If the list of servers updates, you're good to go and can get onto the server of your choice. |