EQEmulator Forums

Support::Windows Servers: Possible Login Server Issue (https://www.eqemulator.org/forums/showthread.php?t=32733)

izmael 12-24-2010 06:45 PM

Hey folks, I'm just a player on one of the servers, but this has gotten so annoying that I can't help but lurk over here. Basically, it's a MAJOR pain to log in right now, and it's been this way for several days.

I don't know how the world server and the login server communicate, but (let me know if I'm wrong) I have a feeling that:

1. The client establishes a TCP connection to the login server in order to submit login and password. This works maybe 30% of the time. I guess it gets some kind of token in return.

2. When the above works, the world server establishes a TCP connection to the login server to verify the token information given by the client. This appears to also work about 30% of the time.

Therefore, on average, you need something like 10 attempts to log in one character (if, like me, you box 11 chars... feel my pain.)
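The "10 attempts" figure can be checked with a quick calculation. Assuming (as a simplification) that every attempt is independent and each of the two steps succeeds about 30% of the time, a full login succeeds with probability 0.3 × 0.3 = 0.09, and the expected number of tries until the first success is 1/0.09, or roughly 11 per character. A minimal sketch:

```python
def expected_attempts(p_login: float, p_world: float) -> float:
    """Expected tries until both steps succeed in the same attempt.

    Assumes attempts are independent, so the number of tries follows a
    geometric distribution with success probability p_login * p_world.
    """
    p = p_login * p_world
    if not 0.0 < p <= 1.0:
        raise ValueError("success probabilities must be in (0, 1]")
    return 1.0 / p


per_char = expected_attempts(0.3, 0.3)
print(f"per character: {per_char:.1f} tries")             # per character: 11.1 tries
print(f"11 boxed characters: {11 * per_char:.0f} tries")  # 11 boxed characters: 122 tries
```

The 30% rates are the rough guesses from the list above, not measurements.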


Could anyone with access to the server and/or the firewalls/routers immediately in front of it use tcpdump or a similar network tool to see why the connections are being dropped or reset? I have managed Unix and IP networks for a long time and will gladly help if needed.
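Alongside a packet capture, the failure rate of the TCP link between a world server and the login server can be measured from the outside. The sketch below repeatedly opens a plain TCP connection and returns the fraction of handshakes that completed. The host and port you pass in are your own settings (nothing here is an official EQEmu address), and a completed handshake only shows the port is reachable, not that the login exchange itself works.

```python
import socket
import time


def probe_tcp(host: str, port: int, attempts: int = 10,
              timeout: float = 3.0, delay: float = 0.0) -> float:
    """Open `attempts` TCP connections to host:port; return the success ratio.

    socket.create_connection completes the full three-way handshake, so a
    timeout or reset here points at the network path or the listener.
    """
    if attempts <= 0:
        raise ValueError("attempts must be positive")
    ok = 0
    for _ in range(attempts):
        try:
            with socket.create_connection((host, port), timeout=timeout):
                ok += 1
        except OSError as exc:  # refused, reset, timed out, unreachable...
            print(f"connect to {host}:{port} failed: {exc!r}")
        time.sleep(delay)
    return ok / attempts
```

For example, `probe_tcp("login.example.net", port, attempts=50, delay=2.0)` (hypothetical host name) run from a world server host during a bad period would show whether connections fail at the TCP level at all.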

By the way, if the login server is behind a Linux NAT gateway, I'd check the ip_conntrack table somewhere in /proc. It might be full (especially if you moved to a new host recently and the firewall changed or something like that). A full ip_conntrack table would give exactly those symptoms.
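A quick way to see how close the connection-tracking table is to its limit is to compare the current entry count with the configured maximum. This sketch assumes a kernel using nf_conntrack, which exposes both counters under /proc/sys/net/netfilter/; older 2.6-era kernels used /proc/sys/net/ipv4/netfilter/ip_conntrack_count and ip_conntrack_max instead, so adjust the paths for your gateway.

```python
from pathlib import Path

# Assumed locations on kernels with nf_conntrack loaded; older kernels
# exposed /proc/sys/net/ipv4/netfilter/ip_conntrack_{count,max} instead.
COUNT_PATH = Path("/proc/sys/net/netfilter/nf_conntrack_count")
MAX_PATH = Path("/proc/sys/net/netfilter/nf_conntrack_max")


def conntrack_usage(count_path: Path = COUNT_PATH,
                    max_path: Path = MAX_PATH) -> float:
    """Return the fraction of the connection-tracking table in use (0.0 to 1.0)."""
    count = int(count_path.read_text().strip())
    limit = int(max_path.read_text().strip())
    return count / limit
```

Run on the gateway, a value near 1.0 during the bad periods would support the full-table theory, since new connections that cannot get a tracking entry are dropped.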

It could be a million other things, though... and tcpdump is your friend in these situations.

Let me know if I can help.

izmael 12-24-2010 06:49 PM

As for the problem being "hard to reproduce or trace on Rogean's end": if the login server runs on Linux (or another flavor of Unix), I'll gladly help. There's a wonderful utility called "strace" that you can use to see what happens inside a running process. =)
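As an illustration of the strace suggestion, the sketch below builds a command that attaches to an already-running process and logs only its socket-related system calls with timestamps. The PID and output file are placeholders; attaching requires ptrace permission (typically root or the same user), and strace must be installed.

```python
from __future__ import annotations

import shlex
import subprocess


def strace_network_cmd(pid: int, outfile: str) -> list[str]:
    """Build an strace command for a running process.

    -f follows threads/child processes, -tt adds microsecond timestamps,
    -e trace=network keeps only socket-related syscalls, -o writes to a file,
    and -p attaches to the given PID.
    """
    return ["strace", "-f", "-tt", "-e", "trace=network",
            "-o", outfile, "-p", str(pid)]


def run_strace(pid: int, outfile: str) -> int:
    """Run strace until interrupted with Ctrl-C; returns its exit status."""
    cmd = strace_network_cmd(pid, outfile)
    print("running:", shlex.join(cmd))
    return subprocess.call(cmd)
```

Timestamps make it possible to line up failed connect/recv calls in the trace with the moments players see login failures.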

Rogean 12-25-2010 03:18 AM

The fact that you guys are saying it works for a while after starting the worldserver, and then at some point it starts having problems... Then goes back to no problems after restarting the world server?

How is that not an issue with the worldserver?

There are a handful of servers having this issue but there are a LOT of servers not having any issue at all. I've connected a server from my house and it hasn't had any problems maintaining loginserver connections.

You guys are going to have to put some debug code into the TCP handling of the worldserver to figure out exactly what's going on.
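The kind of debug code meant here is logging around every send, receive and disconnect on the world-to-login link, with enough timing to tell a silent stall from an explicit close. The world server itself is C++, so this Python class is only a hypothetical stand-in showing what to log, not a patch for the real code.

```python
import logging
import socket
import time

log = logging.getLogger("ls_link")


class TracedSocket:
    """Wrap a connected socket and log sends, receives and disconnects."""

    def __init__(self, sock: socket.socket, name: str) -> None:
        self.sock = sock
        self.name = name
        self.last_rx = time.monotonic()  # time of the last data received

    def send(self, data: bytes) -> None:
        try:
            self.sock.sendall(data)
            log.debug("%s: sent %d bytes", self.name, len(data))
        except OSError as exc:
            log.warning("%s: send failed: %r", self.name, exc)
            raise

    def recv(self, bufsize: int = 4096) -> bytes:
        try:
            data = self.sock.recv(bufsize)
        except OSError as exc:
            log.warning("%s: recv failed: %r", self.name, exc)
            raise
        if not data:
            # An empty read means the peer closed the connection cleanly.
            log.warning("%s: peer closed connection %.1fs after last data",
                        self.name, time.monotonic() - self.last_rx)
        else:
            log.debug("%s: received %d bytes", self.name, len(data))
            self.last_rx = time.monotonic()
        return data
```

Distinguishing "peer closed" from "recv failed" (for example a reset) is the key detail: the first points at the login server deciding to drop the link, the second at something in between.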

izmael 12-25-2010 09:28 AM

Rogean,

I think people tend to "blame" the login server for that issue because it started happening exactly after it went down for "hardware maintenance".

trevius 12-25-2010 09:56 AM

The only servers I haven't seen an issue on so far for sure are P99 and PEQ and maybe VZ/TZ. Nearly all of the other preferred or Legends servers have been having an issue over the past few days. Not all of them have replied to this thread yet to report their issue. And as far as I can tell, many of the normal servers have also had connection issues too.

I was running Rev1769 on Storm Haven when this issue started, a few days after I updated to that code. During troubleshooting, I downgraded to Rev1757 to see if that would help at all. My server ran on Rev1757 perfectly fine for at least a week when I was originally running it, so reverting should have fixed it, but it did not.

If the issue were from the SVN code, I assume PEQ would also have it by now, as they stay as current as Storm Haven, if not more so. But since they are on the same host as the LS, I highly doubt PEQ will ever see this issue.

Quote:

Originally Posted by Rogean (Post 195548)
The fact that you guys are saying it works for a while after starting the worldserver, and then at some point it starts having problems... Then goes back to no problems after restarting the world server?

How is that not an issue with the worldserver?

When a large number of servers are having the same issue, all starting at the same time, I think that increases the chance that the problem is on the Login Server end. Yes, it does not sound like any normal type of networking issue, but it could be any number of things. Maybe the LS isn't replying to worldservers as quickly or as often as it should, I dunno.
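One way to test the "LS isn't replying quickly enough" theory from the world server side is to time each request/reply pair on the link and flag requests left unanswered too long. This is a hypothetical helper: the idea of a request/reply pair and the timeout value are assumptions, not a description of EQEmu's actual login server protocol.

```python
from __future__ import annotations

import time
from typing import Callable


class ReplyWatchdog:
    """Measure reply latency and detect overdue replies on one link."""

    def __init__(self, timeout: float,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self.timeout = timeout
        self.clock = clock                 # injectable for testing
        self.sent_at: float | None = None  # when the outstanding request went out
        self.latencies: list[float] = []

    def sent(self) -> None:
        """Record that a request was just sent."""
        self.sent_at = self.clock()

    def replied(self) -> float:
        """Record a reply and return how long it took."""
        if self.sent_at is None:
            raise RuntimeError("reply without an outstanding request")
        latency = self.clock() - self.sent_at
        self.sent_at = None
        self.latencies.append(latency)
        return latency

    def overdue(self) -> bool:
        """True if a request has been outstanding longer than the timeout."""
        return (self.sent_at is not None
                and self.clock() - self.sent_at > self.timeout)
```

Logged latencies that creep upward before a disconnect would point at the LS side; flat latencies followed by an abrupt drop would point elsewhere.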

BTW, Rogean, what is the name of the test server you have running at home, and would it be possible to open it to the public for testing? I would like to check on it from time to time over a day or so to verify that it stays on the LS list and keeps accepting connections on a regular basis, without intermittent issues or eventually dropping off the list completely. I can't really monitor it much today, but I probably could starting tomorrow.
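Checking a server "from time to time over a day or so" produces a series of listed/not-listed observations. The sketch below summarizes such samples into the fraction of checks where the server was listed and the longest observed outage. How each sample is taken (viewing the server list, attempting a login) is left open, and the function name is hypothetical.

```python
from __future__ import annotations


def summarize_samples(samples: list[tuple[float, bool]]) -> tuple[float, float]:
    """Summarize (timestamp_seconds, listed) observations.

    Returns (fraction of samples where the server was listed,
             longest outage in seconds). An outage runs from the first failed
    sample to the next good one, or to the last sample if it never recovered.
    """
    if not samples:
        raise ValueError("no samples")
    samples = sorted(samples)
    up = sum(1 for _, listed in samples if listed)
    longest = 0.0
    down_since = None
    for ts, listed in samples:
        if not listed and down_since is None:
            down_since = ts                    # outage begins
        elif listed and down_since is not None:
            longest = max(longest, ts - down_since)
            down_since = None                  # outage ends
    if down_since is not None:                 # still down at the last check
        longest = max(longest, samples[-1][0] - down_since)
    return up / len(samples), longest
```

Because outages are only seen at sampling points, the reported length is a lower bound whose precision depends on how often you check.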

Also, when was the last time the LS was restarted? Looks like it is starting to get a bit off:

http://stormhavenserver.com/download...ginserver2.jpg

Tabasco 12-25-2010 11:53 AM

My main server stayed up for a little over 48 hours. For that duration it would frequently drop to pending and then come back to online. It came right back up with an LSReconnect and stayed up for another 12 hours or so before going down again. That's the host with known packet loss issues. (r1771)

My test server, on a different host, hasn't even dropped to pending in the last 48 hours. (r1777)

I'd be glad to coordinate for some more specific tests.

Rogean 12-25-2010 03:23 PM

My test server is the Code Cave.

Servers showing up as [] is a known issue; I know what's causing it, and I'll fix it soon. It is not something that would cause other servers to be disconnected.

The loginserver was rebooted last night to put more debug code in.

Rogean 12-30-2010 10:13 AM

Is this resolved?

I'm not seeing servers getting bumped off anymore.

Lillu 12-30-2010 11:43 AM

I don't want to jinx it but it seems all fine for 2-3 days now. Thanks for fixing it :)

Congdar 12-30-2010 05:32 PM

I'm not seeing those reconnect messages anymore.

Lillu 12-31-2010 01:58 PM

Well, I might have jinxed it. We just had a reboot and the server keeps reconnecting to the EQEmu LS (same issue as before). Our private LS is fine.

Anyway, Happy New Year :)

Xiggie3 01-01-2011 06:56 PM

On EZ Server, it seems this issue is back.

Xiggie3 01-04-2011 03:43 AM

So is anything being done about this? I just wanted to know if I should switch over to the other login server.

trevius 01-04-2011 12:22 PM

Yeah, this issue is still going on for Storm Haven as well. As long as I keep EQEmu as the only LS we are configured for, it stays connected most of the time, with far fewer disconnects. As soon as I configure the other LSs, it starts dropping off of EQEmu very quickly and eventually stops reconnecting to it at all.

Very odd issue indeed. Hopefully I can find some time to help troubleshoot the issue more from my side (as much as is possible anyway).

trevius 01-04-2011 12:57 PM

I just noticed that the LS for peqtgc.net and tsahosting.net both resolve to the same IP address. I am not sure whether one of those two took over for the other, or whether they are both being run from the same host. My server had been configured to connect to both, but it doesn't seem to actually connect if I use my tsahosting login info for my server. I think it did connect last time using no login info (which is what I used for the peqtgc.net LS). So, I assume the tsahosting.net login server is now actually the peqtgc.net login server. I wonder if servers trying to use both (like mine was) is what is causing all of the weird reconnect issues.

Gaeorn or Cavedude, can one of you confirm which LS is running at your host now?

Code:

C:\Users\Trevius>nslookup eqemulator.net
Server:  DD-WRT
Address:  192.168.1.1

Non-authoritative answer:
Name:    eqemulator.net
Address:  67.23.190.71


C:\Users\Trevius>nslookup peqtgc.net
Server:  DD-WRT
Address:  192.168.1.1

Non-authoritative answer:
Name:    peqtgc.net
Address:  72.67.6.37


C:\Users\Trevius>nslookup tsahosting.net
Server:  DD-WRT
Address:  192.168.1.1

Non-authoritative answer:
Name:    tsahosting.net
Address:  72.67.6.37

I am pretty sure we should only be set to use one of those, but not both, as I don't think both can run on the same host and use the same port. Once we know for sure which to use, maybe if all servers correct their config, it will resolve the problem seen on the EQEmu LS as well?
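The same check can be run over a whole list of configured login servers: resolve each host name and report addresses that more than one entry points at. This is a minimal sketch; the resolver is passed in so the grouping can be tested without DNS, and a shared IP alone does not prove two entries are the same login server (one host could serve two on different ports).

```python
from __future__ import annotations

import socket
from collections import defaultdict
from typing import Callable, Iterable


def shared_addresses(hosts: Iterable[str],
                     resolve: Callable[[str], str] = socket.gethostbyname
                     ) -> dict[str, list[str]]:
    """Group host names by resolved IPv4 address; keep only shared addresses.

    Names that fail to resolve are reported and skipped.
    """
    by_ip: dict[str, list[str]] = defaultdict(list)
    for host in hosts:
        try:
            by_ip[resolve(host)].append(host)
        except OSError as exc:  # socket.gaierror is a subclass of OSError
            print(f"could not resolve {host}: {exc}")
    return {ip: names for ip, names in by_ip.items() if len(names) > 1}
```

Fed the nslookup answers above as a lookup table, it would report 72.67.6.37 as shared by peqtgc.net and tsahosting.net, which is the overlap that prompted the question about which entry to keep.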


All times are GMT -4.