EQEmulator Forums

EQEmulator Forums (https://www.eqemulator.org/forums/index.php)
-   Support::Windows Servers (https://www.eqemulator.org/forums/forumdisplay.php?f=587)
-   -   Possible Login Server Issue (https://www.eqemulator.org/forums/showthread.php?t=32733)

trevius 12-20-2010 10:28 AM

Possible Login Server Issue
 
I haven't been able to confirm this yet, but it looks like there may be a Login Server issue affecting any server that is restarted or drops off the list and tries to reconnect. My server, Storm Haven, as well as Dragon Soul and Irreverent, all seem to be fluctuating between Down and Pending status on the LS Status page:

0.8.0 [View] Pending Irreverent - The Solo Server
0.8.0 [View] Pending Storm Haven
0.8.0 [View] Down Dragon Soul

From a client, they just keep popping in and out of the list. Eventually, they just give up and no one can connect.

I assume this is only temporary and Rogean will be able to fix it (if it really is a LS issue). Mostly just posting about it here so any other servers with the same problem don't waste a ton of time trying to figure out the cause like I did :P

Rogean 12-20-2010 07:01 PM

What operating systems are running on the servers having this issue?

I think it has something to do with that. I've never seen the problem occur on Windows servers.

Rogean 12-20-2010 07:06 PM

Also, there are two different issues:

One issue is the loginserver list page on the website showing a server as down when it is in fact up, connected, and able to be logged into. This happens when a new connection is made to the loginserver before the old connection is recognized as stale. After the new connection is made, the old connection times out and marks the server as down in the database. The solution would be to automatically kick off any servers with the same World ID when a connection is made.
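For illustration only, the "kick off any servers with the same World ID" idea could look roughly like the Python sketch below. It is not the loginserver's actual code; `WorldConnection`, `WorldRegistry`, and `register` are hypothetical names invented for the example.

```python
from dataclasses import dataclass, field


@dataclass
class WorldConnection:
    """Hypothetical stand-in for one worldserver TCP connection."""
    world_id: int
    fd: int
    is_open: bool = True

    def close(self) -> None:
        self.is_open = False


@dataclass
class WorldRegistry:
    """Keeps at most one live connection per World ID."""
    by_world_id: dict = field(default_factory=dict)

    def register(self, conn: WorldConnection):
        """Register conn and return the stale connection it replaced, if any."""
        stale = self.by_world_id.get(conn.world_id)
        if stale is not None and stale is not conn:
            # Close the old connection immediately, so it cannot time out
            # later and mark the (still connected) server as down.
            stale.close()
        self.by_world_id[conn.world_id] = conn
        return stale
```

With this shape, the second connection for World ID 768 replaces the first at registration time, so there is no later timeout left to flip the status page to "Down".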

And then there's the second issue, which I think is the more relevant issue to this thread, where certain servers will lose connection to the loginserver, and then only attempt to reconnect for a certain amount of time before giving up.

trevius 12-20-2010 07:53 PM

I run Debian Linux. I do know that there is a specific issue where if connectivity is lost to the LS, it will only try so many times before giving up, and I know that only happens on Linux. But, that is not what this issue is. This issue is that even while my connectivity to the LS is rock solid, Storm Haven will not stay connected to it. I restart my server, it connects briefly, then disconnects and reconnects multiple times until it finally stops. I am not sure what is causing it to get removed from the server list.

I would think the problem is on my side, except it seems like Irreverent and Dragon Soul are having the same issue. Maybe they are having their own troubles that are unrelated, though.

I am still looking into more possibilities on my end, but it is definitely a new type of issue I haven't seen happen before.

Congdar 12-20-2010 08:53 PM

When I checked my server status this morning, it was 'Down' according to the web page. I run on Windows Server 2003 R2. When I remoted into the server, my world.exe window was showing:
Code:

[12.20. - 13:46:28] [WORLD__LS] Connecting to login server: eqemulator.net:5998
[12.20. - 13:46:28] [WORLD__LS] Connected to Loginserver: eqemulator.net:5998
[12.20. - 14:08:12] [WORLD__LS] Connecting to login server: eqemulator.net:5998
[12.20. - 14:08:13] [WORLD__LS] Connected to Loginserver: eqemulator.net:5998
[12.20. - 14:29:47] [WORLD__LS] Connecting to login server: eqemulator.net:5998
[12.20. - 14:29:47] [WORLD__LS] Connected to Loginserver: eqemulator.net:5998
[12.20. - 14:51:21] [WORLD__LS] Connecting to login server: eqemulator.net:5998
[12.20. - 14:51:21] [WORLD__LS] Connected to Loginserver: eqemulator.net:5998
[12.20. - 15:02:24] [WORLD__LS] Connecting to login server: eqemulator.net:5998
[12.20. - 15:02:24] [WORLD__LS] Connected to Loginserver: eqemulator.net:5998
[12.20. - 15:24:08] [WORLD__LS] Connecting to login server: eqemulator.net:5998
[12.20. - 15:24:08] [WORLD__LS] Connected to Loginserver: eqemulator.net:5998
[12.20. - 15:45:42] [WORLD__LS] Connecting to login server: eqemulator.net:5998
[12.20. - 15:45:43] [WORLD__LS] Connected to Loginserver: eqemulator.net:5998
[12.20. - 16:07:27] [WORLD__LS] Connecting to login server: eqemulator.net:5998
[12.20. - 16:07:27] [WORLD__LS] Connected to Loginserver: eqemulator.net:5998
[12.20. - 16:29:01] [WORLD__LS] Connecting to login server: eqemulator.net:5998
[12.20. - 16:29:02] [WORLD__LS] Connected to Loginserver: eqemulator.net:5998
[12.20. - 16:50:46] [WORLD__LS] Connecting to login server: eqemulator.net:5998
[12.20. - 16:50:46] [WORLD__LS] Connected to Loginserver: eqemulator.net:5998

I restarted the server and it seemed everything was back to normal, but after a couple of hours the above scenario repeated.

EDIT: The server is up and listed in the client and I can log in just fine. Still listed as 'Down' in the web page, but there's something causing frequent loss of connection.
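A quick way to see how regular the reconnects are is to measure the gap between consecutive "Connecting to login server" lines. The Python sketch below does that for the log format shown above; the function name `reconnect_gaps` is made up for the example, and because the log timestamps carry no year, the `year` argument is an assumption.

```python
import re
from datetime import datetime

# Matches timestamps like "[12.20. - 13:46:28]" (month.day) on WORLD__LS
# connection-attempt lines; a leading PID column, if present, is ignored.
STAMP = re.compile(
    r"\[(\d{2})\.(\d{2})\. - (\d{2}):(\d{2}):(\d{2})\] "
    r"\[WORLD__LS\] Connecting to login server"
)


def reconnect_gaps(lines, year=2010):
    """Return timedeltas between consecutive login-server connection attempts."""
    stamps = []
    for line in lines:
        match = STAMP.search(line)
        if match:
            month, day, hour, minute, second = map(int, match.groups())
            stamps.append(datetime(year, month, day, hour, minute, second))
    return [later - earlier for earlier, later in zip(stamps, stamps[1:])]


sample = [
    "[12.20. - 13:46:28] [WORLD__LS] Connecting to login server: eqemulator.net:5998",
    "[12.20. - 13:46:28] [WORLD__LS] Connected to Loginserver: eqemulator.net:5998",
    "[12.20. - 14:08:12] [WORLD__LS] Connecting to login server: eqemulator.net:5998",
    "[12.20. - 14:29:47] [WORLD__LS] Connecting to login server: eqemulator.net:5998",
]
for gap in reconnect_gaps(sample):
    print(gap)  # prints 0:21:44, then 0:21:35
```

Gaps that cluster tightly around one value would hint at a timer or timeout somewhere, whereas scattered gaps would look more like network trouble.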

trevius 12-21-2010 01:24 AM

Maybe other servers aren't having the same exact problem mine is, but it looks like THF may also be doing the same thing. I see them listed as down and going into Pending from time to time as well even though the server is up. They have alternate LS connectivity, so it isn't really impacting them much yet.

Another odd thing is that I keep seeing duplicates of Irreverent and THF in the EQ Client Server List.

Still looking into it on my end as well, but this just seems like a weird problem to me so far.

trevius 12-21-2010 02:44 AM

Well, I think I finally have Storm Haven stably connected to the LS (for now anyway). What I finally ended up doing was disabling connectivity to the alternate login servers we were connected to (tsahosting.net and peqtgc.org). I am not sure how/if that was causing the problem, but it was the last thing I tried before it started staying connected.

Angelox 12-21-2010 08:35 AM

This is a problem, and it started a few days ago, after the EqEmu Login Server was taken down for a while for "maintenance".
If you want to see who has the problem, you can reload the servers page a few times (wait a while between reloads):
http://www.eqemulator.org/index.php?pageid=serverlist
You'll see a group of servers constantly going from "down" to "pending" to "online", then down again (many of them just stay "down").
This wasn't the case before (it started a few days ago).
BTW Trevius, you still have the problem too.

Tabasco 12-21-2010 10:01 AM

It looks like I'm having the same issue. It stays up for a while and then I lose LS connection.

Code:

23945 [12.20. - 00:32:11] [COMMON__THREADS] Ending TCPConnectionLoop with thread ID -93160592
23945 [12.20. - 00:32:16] [WORLD__LS] Connecting to login server: eqemulator.net:5998
23945 [12.20. - 00:32:16] [COMMON__THREADS] Starting TCPConnectionLoop with thread ID -76375184
23945 [12.20. - 00:32:16] [WORLD__LS] Connected to Loginserver: eqemulator.net:5998
23945 [12.20. - 00:59:34] [COMMON__THREADS] Ending TCPConnectionLoop with thread ID -76375184
23945 [12.20. - 00:59:39] [WORLD__LS] Connecting to login server: eqemulator.net:5998
23945 [12.20. - 00:59:39] [COMMON__THREADS] Starting TCPConnectionLoop with thread ID -59589776
23945 [12.20. - 00:59:39] [WORLD__LS] Connected to Loginserver: eqemulator.net:5998
23945 [12.20. - 01:26:57] [COMMON__THREADS] Ending TCPConnectionLoop with thread ID -59589776
23945 [12.20. - 01:27:02] [WORLD__LS] Connecting to login server: eqemulator.net:5998
23945 [12.20. - 01:27:02] [COMMON__THREADS] Starting TCPConnectionLoop with thread ID -42804368
23945 [12.20. - 01:27:02] [WORLD__LS] Connected to Loginserver: eqemulator.net:5998
23945 [12.20. - 01:54:20] [COMMON__THREADS] Ending TCPConnectionLoop with thread ID -42804368
23945 [12.20. - 01:54:25] [WORLD__LS] Connecting to login server: eqemulator.net:5998

That's on Ubuntu 10.04 x86-64 running 1771

It looks like I can have a day or more of uptime between issues.

songie 12-21-2010 10:10 AM

Mine seems to have been constantly down since a couple of days ago. Whether I restart it or not, it doesn't take long to go "down" again.

Rogean 12-21-2010 04:03 PM

Quote:

Originally Posted by Tabasco (Post 195394)
It looks like I'm having the same issue. It stays up for a while and then I lose LS connection.

Code:

23945 [12.20. - 00:32:11] [COMMON__THREADS] Ending TCPConnectionLoop with thread ID -93160592
23945 [12.20. - 00:32:16] [WORLD__LS] Connecting to login server: eqemulator.net:5998
23945 [12.20. - 00:32:16] [COMMON__THREADS] Starting TCPConnectionLoop with thread ID -76375184
23945 [12.20. - 00:32:16] [WORLD__LS] Connected to Loginserver: eqemulator.net:5998
23945 [12.20. - 00:59:34] [COMMON__THREADS] Ending TCPConnectionLoop with thread ID -76375184
23945 [12.20. - 00:59:39] [WORLD__LS] Connecting to login server: eqemulator.net:5998
23945 [12.20. - 00:59:39] [COMMON__THREADS] Starting TCPConnectionLoop with thread ID -59589776
23945 [12.20. - 00:59:39] [WORLD__LS] Connected to Loginserver: eqemulator.net:5998
23945 [12.20. - 01:26:57] [COMMON__THREADS] Ending TCPConnectionLoop with thread ID -59589776
23945 [12.20. - 01:27:02] [WORLD__LS] Connecting to login server: eqemulator.net:5998
23945 [12.20. - 01:27:02] [COMMON__THREADS] Starting TCPConnectionLoop with thread ID -42804368
23945 [12.20. - 01:27:02] [WORLD__LS] Connected to Loginserver: eqemulator.net:5998
23945 [12.20. - 01:54:20] [COMMON__THREADS] Ending TCPConnectionLoop with thread ID -42804368
23945 [12.20. - 01:54:25] [WORLD__LS] Connecting to login server: eqemulator.net:5998

That's on ubuntu 10.04 x86-64 running 1771

It looks like I can have a day or more of uptime between issues.

That looks like an issue with the worldserver.

Tabasco 12-21-2010 04:32 PM

Quote:

Originally Posted by Rogean (Post 195422)
That looks like an issue with the worldserver.

I'm seeing it in the logs from time to time now even when it's definitely up. It's a recent development, but I'll see if my host is having an issue.

trevius 12-22-2010 08:08 AM

Looks like EZ Server can be added to the list of servers having this connection issue as well:

Quote:

0.8.0 [View] Pending EZ Server

Most likely, the only reason PEQ and P1999 don't have this issue as well is because they are hosted on the same host as the LS is. This is almost definitely not a problem on the server-side, as it makes more sense that the single point of failure for all of these servers is the EQEmu LS.

melkor_41 12-22-2010 05:25 PM

Add Redemption to that; Windows server here.

And it wasn't loginserver related, it was DNS related: your DNS (eqemulator's) lost the entries for eqemulator.net, or was not resolving them for external hosts.

Angelox 12-22-2010 06:31 PM

For the record, I already tried eqemulator.net, eqemulator.org, and finally the IP address.

Huppy 12-22-2010 09:27 PM

I refreshed the server list about 20-30 times in a row. The only listings that kept changing status were the bottom 3 listings under Legends and Preferred. Nothing ever seemed to change under Standard servers.

Rogean 12-22-2010 09:51 PM

Okay, so:

For the servers that are having problems: are they all kept up to date with the latest SVN?

Rogean 12-23-2010 12:55 AM

Nothing has been changed on the loginserver in months, so you guys are going to have to throw some debug code into the worldserver and see what exactly is causing it to disconnect.

Something that may be happening though.. now that I think about it, the paths being taken to the data center for our server farm had been a little screwy lately. Can you guys post some traceroutes from the servers having the issue?

Thanks.

trevius 12-23-2010 01:30 AM

Odd thing is that I can run a continuous ping to the LS without a blip, so it doesn't seem like it is an issue with dropping packets or routing or DNS. I also confirmed (same as what AX already did) that I get the same result using eqemulator.net or .org or using the IP directly.
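One caveat worth noting: ping uses ICMP, so a clean continuous ping does not by itself exercise a TCP connection to the loginserver port. A small TCP-level probe is sketched below in Python. The host and port in the usage note are taken from the logs earlier in the thread; the function name `tcp_probe` and the timeout value are assumptions made for the example.

```python
import socket
import time


def tcp_probe(host, port, timeout=5.0):
    """Attempt one TCP connect; return (ok, seconds_elapsed, error_text)."""
    start = time.monotonic()
    try:
        # create_connection resolves the name and completes the TCP handshake.
        with socket.create_connection((host, port), timeout=timeout):
            return True, time.monotonic() - start, ""
    except OSError as exc:
        return False, time.monotonic() - start, str(exc)
```

Calling `tcp_probe("eqemulator.net", 5998)` every few seconds and logging the failures would show whether TCP connects to the LS port fail at times when ICMP pings still succeed.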

And some of the servers do stay current, but some of them definitely do not. An example of one is THF, which is still on Rev 1452 or so and they seem to be having the issue as well as most of the other servers.

I am guessing the LS just got restarted, because I looked at the list and it looked like this:

Code:

Version        Server Information        Status        Legends Servers
0.8.0        [View]        594        Project 1999
0.8.0        [View]        Pending        EZ Server
0.8.0        [View]        Pending        The Hidden Forest
0.8.0        [View]        Pending        PEQ The Grand Creation
0.8.0        [View]        Down        Rogean Development
Version        Server Information        Status        Preferred Servers
0.8.0        [View]        Pending        Dragon Soul
0.8.0        [View]        Pending        Irreverent - The Solo Server
0.8.0        [View]        Pending        Storm Haven
0.8.0        [View]        Pending        Vallon Zek/Tallon Zek
0.7.0        [View]        Pending        Scars of Amerous
0.8.0        [View]        Down        Akka's Funhouse SoD Lvl 250
Version        Server Information        Status        Standard Servers
0.8.0        [View]        93        EQTitan [Legit PoP/LDoN/GoD]
0.8.0        Unregistered        5        Zzz -=[Cruel]=- zzZ - TESTING -
0.8.0        [View]        3        Alakamin
0.8.0        [View]        1        testlock
0.8.0        [View]        Pending        Sarus
0.8.0        [View]        Pending        The Order of Sin
0.8.0        [View]        Pending        Discordant Whisper
0.8.0        [View]        Pending        Last World
0.8.0        [View]        Pending        LOTR: The Years of Trees
0.8.0        [View]        Pending        Mythic Lair
0.8.0        [View]        Pending        Raid Addicts (Custom-Legit)
0.8.0        [View]        Pending        The Realm
0.8.0        [View]        Pending        The Redemption
0.8.0        [View]        Pending        Keegan/Voidd Classic PvP
0.8.0        [View]        Pending        under construction - testing
0.8.0        [View]        0        [EQ-Heroes] Under Development
0.8.0        Unregistered        Pending        Advents Shadow
0.8.0        Unregistered        Pending        Birthright
0.8.0        Unregistered        Pending        Club Zek - Legit PvP Progression -BETA-
0.8.0        Unregistered        0        Darkstar
0.8.0        Unregistered        Pending        Evolution Server (Custom-Legit) eqevolutionserver.com
0.8.0        Unregistered        Pending        Foray
0.8.0        Unregistered        Pending        Gemini (Legit-GoD) [www.geminiserver.com]
0.8.0        Unregistered        0        Haven
0.8.0        Unregistered        Pending        Morden Rasp - [PvE/Legit/Custom] In Development.
0.8.0        Unregistered        Pending        MortalQuest Redux
0.8.0        Unregistered        Pending        MyEQsf [Rev-1777Bots Spells GMcommands Legit]
0.8.0        Unregistered        0        Queen Of Love CEQ [In Testing]
0.8.0        Unregistered        Pending        RENO LOVES WIDDLE
0.8.0        Unregistered        Pending        Sandbox
0.8.0        Unregistered        0        Specialty's Dev Server
0.7.0        Unregistered        Pending        PEQ says change me!
0.7.0        Unregistered        Pending        ~Shattered Euphoria~
        Unregistered        Pending        Akka's X5 Fun House Test
0.8.0        [View]        Down        [PROJECT AXCLASSIC] The Rathe
0.8.0        [View]        Down        Fredsbox 51/50 Solo IntelliBots
0.8.0        [View]        Down        Stair's test server
0.8.0        [View]        Down        Infinity
0.8.0        [View]        Down        JJ's PEQ Testbed
0.8.0        [View]        Down        The Forge
0.8.0        [View]        Down        [ Keeper Of the Vale ]
0.8.0        [View]        Down        Blizz's Everquest Server
0.8.0        [View]        Down        An_Evil_Enchanter01 Production
0.8.0        [View]        Down        Aristoxenus
0.8.0        [View]        Down        AsianTime[Start 56lvl,Epic1.0]
0.8.0        [View]        Down        Asto's World
0.8.0        [View]        Down        Blood of the Akkadian
0.8.0        [View]        Down        Desdemona
0.8.0        [View]        Down        Discordant Whisper - Test
0.8.0        [View]        Down        EQEmu by Good Times Gang
0.8.0        [View]        Down        Infinity for Testing
0.8.0        [View]        Down        Morell-Thule: Resurrection
0.8.0        [View]        Down        ProjectZek
0.8.0        [View]        Down        Scarlet Horizon Reborn
0.8.0        [View]        Down        SirensBane
0.8.0        [View]        Down        That One Server
0.8.0        [View]        Down        Tunare (D'yc)
0.8.0        [View]        Down        Wanderlust - Legit/SoD/Fbook
0.8.0        [View]        Down        [BP]~ ~Burning Prince~ ~[CEQ]
0.8.0        [View]        Down        Taiwan-CEQ Sun's Style EverQuest


trevius 12-23-2010 01:42 AM

Also, here is a screenshot showing double listing of EZ and Irreverent servers:

http://stormhavenserver.com/download...le-servers.jpg

Rogean 12-23-2010 02:01 AM

Last 2 posts are me restarting the LS as I was working on the code.

Still not getting trace routes.

All you guys have done is claim it isn't working and not provided me with any help figuring it out.

You're going to have to do a little bit more. Start getting debug code in to figure out why the world is getting disconnected. Is it timing out, is the loginserver disconnecting it, etc.

And get me those trace routes from the servers.
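The distinction being asked for (timing out versus being disconnected by the other end) can be made at the socket level. The worldserver itself is not written in Python, so the sketch below only illustrates the idea; `classify_disconnect` and its return labels are hypothetical names, not anything from the EQEmu source.

```python
import socket


def classify_disconnect(sock, idle_timeout):
    """Wait for one read on sock and report what ended the wait.

    Returns "timeout" if nothing arrived in idle_timeout seconds,
    "closed_by_peer" if the remote end closed the connection cleanly,
    "reset_by_peer" on a TCP reset, "data" if bytes arrived, or an
    "error: ..." string for any other socket error.
    """
    sock.settimeout(idle_timeout)
    try:
        data = sock.recv(4096)
    except socket.timeout:
        return "timeout"
    except ConnectionResetError:
        return "reset_by_peer"
    except OSError as exc:
        return f"error: {exc}"
    # An empty read on a stream socket means the peer closed its side.
    return "closed_by_peer" if data == b"" else "data"
```

Logging a label like this next to each "Connecting to login server" line would show which side is ending the connection.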

Rogean 12-23-2010 02:18 AM

I've put in changes that will fix a world displaying as down due to making a second connection before the first connection was declared stale.

EZ and THF look to be reconnecting pretty often.

I can see right now the loginserver is having to remove a duplicate worldserver pretty much every time these worldservers are connecting. This means that it is NOT the loginserver that is disconnecting them. Something is causing the WORLDSERVER to reconnect on its own.

Quote:

[20101223.011610] [World] Connected: (+2) [EZ (Linux) x4 Exp] EZ Server - Custom Zones, Vendors, Quests, Items, etc, <EZ-Login> 63.131.xxx.xx V0.8.0
[20101223.011610] [World Duplicate] Disconnecting FD (58,37) [EZ Server - Custom Zones, Vendors, Quests, Items, etc], WorldID(768,768)

trevius 12-23-2010 02:36 AM

Here is a traceroute from Storm Haven:

Code:

traceroute eqemulator.org
traceroute to eqemulator.org (67.23.190.71), 30 hops max, 40 byte packets
1 DD-WRT (192.168.1.1) 0.575 ms 0.833 ms 1.692 ms
2 10.167.160.1 (10.167.160.1) 10.798 ms 11.767 ms 12.060 ms
3 96-34-52-44.static.unas.mo.charter.com (96.34.52.44) 12.246 ms 12.372 ms 12.472 ms
4 96-34-48-57.static.unas.mo.charter.com (96.34.48.57) 16.408 ms 17.219 ms 17.501 ms
5 96-34-2-194.static.unas.mo.charter.com (96.34.2.194) 17.683 ms 17.780 ms 17.877 ms
6 96-34-0-130.static.unas.mo.charter.com (96.34.0.130) 33.103 ms 27.883 ms 28.071 ms
7 96-34-2-123.static.unas.mo.charter.com (96.34.2.123) 31.412 ms 29.735 ms 32.252 ms
8 96-34-78-89.static.kgpt.tn.charter.com (96.34.78.89) 33.349 ms 33.782 ms 96-34-72-5.static.unas.mo.charter.com (96.34.72.5) 33.636 ms
9 96-34-72-39.static.unas.mo.charter.com (96.34.72.39) 38.576 ms 96-34-72-37.static.unas.mo.charter.com (96.34.72.37) 38.232 ms 96-34-78-33.static.kgpt.tn.charter.com (96.34.78.33) 37.299 ms
10 dtr02spbgsc-tge-2-3.spbg.sc.charter.com (96.34.64.40) 38.731 ms 38.865 ms 38.969 ms
11 96-34-67-181.static.unas.sc.charter.com (96.34.67.181) 39.294 ms 39.414 ms 39.499 ms
12 96-34-67-179.static.unas.sc.charter.com (96.34.67.179) 47.360 ms 48.357 ms 48.467 ms
13 96-34-67-175.static.unas.sc.charter.com (96.34.67.175) 36.332 ms 34.701 ms 37.690 ms
14 68-115-195-86.static.spbg.sc.charter.com (68.115.195.86) 39.402 ms 39.257 ms 39.486 ms
15 te-5-5.rtr2.avl1.netriplex.com (67.23.161.254) 38.752 ms 39.037 ms 42.180 ms
16 eqemulator.net (67.23.190.71) 43.860 ms 43.474 ms 43.192 ms
XXXX@stormhaven:~$

Not sure that will be of any assistance though. I haven't noticed any packet loss during continuous pings while the problem is happening. I do agree that better LS connectivity logging might help to isolate problems like this. Heck, for all I know, it could be my server that caused the EQEmu website server list to start having status display issues. Though if that was the case, it would mean that any one server could potentially cause it for all of them, which would be a good thing to resolve/prevent anyway.

My best guess so far is that the problem is somehow related to the fairly new (6 months old or so) code that allows servers to connect to multiple Login Servers at the same time. I know that when I removed the config to connect to LSs other than the EQEmu one, my server was suddenly able to stay connected to the EQEmu LS much better.

And, Rogean, I don't mean to seem like I am pushing this issue all on you or anything. It is just hard to figure out a connection issue only looking at a single side of the connection. If there didn't seem to be any other oddities going on, I would assume that the problem was on my end only and would work on it quietly by myself until it was resolved. Since it seems that it may not just be my server, it doesn't hurt to investigate the possibility of the issue being with the EQEmu LS or even one of the other LS like tsahosting.net or peqtgc.net.

I wonder: if one of the other LSs was disconnecting servers, would the servers try to reconnect to all of the LSs they are configured for, even the ones they are still connected to? I should check with Gaeorn to see if he is seeing anything odd on tsahosting.net lately.

Angelox 12-23-2010 09:26 AM

Here's some web tests that may or may not be of help;
http://www.indeep76.com/eqemulator.net/
http://www.indeep76.com/checks/eqemulator.net/6851/
I reported the problems as starting a few days ago because that's when they started. I mentioned that it was after you did maintenance work as an observation that might be of help to you. Prior to that date, everything was running fine, and I haven't changed a thing. I noticed an impressive number of servers in the same situation, so I deduced the problem was on the LS side.

Quote:

My best guess so far is that the problem is somehow related to the fairly new (6 months old or so) code that allows servers to connect to multiple Login Servers at the same time. I know that when I removed the config to connect to LSs other than the EQEmu one, my server was suddenly able to stay connected to the EQEmu LS much better.

Not that it matters, but I can guarantee you this is new, as of a few days ago. Currently, I'm only single-connected (to the AXClassic LS), because I can't keep a stable connection to the EqEmu LS anymore. When I try to "Single LS" connect to EqEmu, it lasts a few minutes and drops.

Tabasco 12-23-2010 11:10 AM

The host for my primary server (running 1771) reported packet loss issues from an over-utilized circuit, but is in the process of resolving the problem.
I've been connected for around 24 hours now with no problems so I'm inclined to think my issues were purely host related and just coincidentally timed, but I'm keeping a close eye on it.

I've updated my test server (completely different host) from 1771 to 1777 to see how it behaves.

Astal 12-23-2010 05:37 PM

I'm also experiencing this problem.

I've changed nothing in my exes or anything; it was working fine a few days ago.

trevius 12-24-2010 02:47 PM

Quote:

Originally Posted by Angelox (Post 195473)
Not that it matters, but I can guarantee you this is new, as of a few days ago. currently, I'm only single connected (to AXCLassic LS) , because I can't keep a stable connection to EqEmu LS anymore. When I try to "Single LS" connect to EqEMu, it lasts a few mins and drops.

Yeah, I agree that this is a very new issue. My theory was that with the addition of multiple Login Server connections, it might open up new possibilities where one Login Server could potentially bleed an issue into another Login Server. I don't know the details enough to really know if it is possible or not, but I think it could be something like this:

1. Login Server A is up and running perfectly fine without an issue.
2. Login Server B starts to experience an issue that is causing it to constantly drop world servers.
3. Multiple servers are configured to connect to both LS A and B.
4. As the servers start getting disconnected from Login Server B frequently, they start sending reconnects for both A and B at the same time even though A has remained stable.
5. Login Server A reacts badly to servers trying to register again that are already connected and registered. Somehow this causes all external servers (not on the same LAN as LS A) to experience connection issues with LS A.

Now, that example is probably not possible at all, but if nothing has changed to cause this problem, I am grasping at straws.
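Whether step 4 can happen depends on whether a reconnect is scoped to the one login server that dropped or applied to every configured login server. The Python sketch below shows the per-server scoping that would rule the scenario out. It is purely illustrative: `LoginServerLink` and `reconnect_dropped` are hypothetical names, and nothing in the thread confirms how the worldserver actually behaves here.

```python
class LoginServerLink:
    """Hypothetical record of one worldserver-to-loginserver connection."""

    def __init__(self, name):
        self.name = name
        self.connected = False
        self.connect_count = 0

    def connect(self):
        self.connected = True
        self.connect_count += 1


def reconnect_dropped(links):
    """Reconnect only the links that are down; leave healthy links untouched."""
    reconnected = []
    for link in links:
        if not link.connected:
            link.connect()
            reconnected.append(link.name)
    return reconnected
```

If the real code instead called the equivalent of `connect()` on every link whenever any one of them dropped, a flaky Login Server B would generate duplicate registrations on a healthy Login Server A, which is what steps 4 and 5 describe.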

No servers have the info Rogean wants, because from a network perspective, everything looks perfect as far as I can tell. I never drop a ping to eqemulator.net and traceroutes and response times look great. I get about 35ms response time consistently, so that is not the problem.

I think part of the problem is that P1999 and PEQ both run on the same host as the LS, so they will never experience this issue if it is only from external networks, which is what it appears to be. And without P99 having this issue, I am sure it is harder for Rogean to troubleshoot from his end, or to even verify there is a problem at all. If P99 was experiencing this issue as well, it probably would have been resolved already.

Considering that so many servers are experiencing and voicing issues with the EQEmu LS only, I think it is pretty clear that the problem is there. Since PEQ and P99 don't have this problem at all, and all other servers seem to be having at least some sort of issue (some worse than others), I think that points to some type of issue with networking internal vs external. It could be a port issue, a routing issue, a DNS issue, or just about anything coming into the EQEmu hosting or anywhere up to that point. It could even be something related to the recent DDoS attack on P99 and related to how the hosting service assisted in mitigating the attack (assuming they did something).

Considering there was "hardware maintenance" on the 15th and a recent DDoS attack, it seems like one of those issues/changes could be to blame. Without us knowing exact details on issues/changes like that, it is hard to assist in possible suggestions for the cause and resolution of the issues.

erik_llewellyn 12-24-2010 03:38 PM

That makes a lot of sense, actually, since the easiest way to combat a DDoS attack is to allow only so many connection attempts from an address in a given amount of time before banning the address, permanently or temporarily. If our external world servers lose sync and keep trying to reconnect, the LS may be seeing that as a DDoS attack, and that may be causing the issues. Even more so if, as you hypothesized, multiple LSs sync to one world server, and when one goes out of sync the world server resends its login request to all the LSs in its list.
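The kind of per-address limit described here can be sketched as a sliding-window counter. This is a generic Python illustration, not anything known about the hosting provider's or the loginserver's actual DDoS mitigation; `ConnectionRateLimiter` and the example thresholds are assumptions.

```python
from collections import defaultdict, deque


class ConnectionRateLimiter:
    """Allow at most max_attempts connections per address per time window."""

    def __init__(self, max_attempts, window_seconds):
        self.max_attempts = max_attempts
        self.window_seconds = window_seconds
        self._attempts = defaultdict(deque)  # address -> accepted timestamps

    def allow(self, address, now):
        """Return True if a connection from address at time now is accepted."""
        recent = self._attempts[address]
        # Forget attempts that have aged out of the window.
        while recent and now - recent[0] >= self.window_seconds:
            recent.popleft()
        if len(recent) >= self.max_attempts:
            return False
        recent.append(now)
        return True
```

A worldserver that reconnects in a tight loop would hit such a limit quickly: with `ConnectionRateLimiter(3, 60)`, the fourth attempt from one address inside a minute is refused, while other addresses are unaffected.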

Akkadius 12-24-2010 04:54 PM

I've mentioned these things to Rogean before this thread started. I've had all the same issues on two servers and I've seen them consistently. Not that they are ultimately keeping people from playing, but there still is a definitive difference AFTER whatever 'maintenance' happened. As far as what has changed specifically, that will have to be on Rogean's side. And I'm sure it's one of those problems that is very hard to reproduce or trace on his end.

mtgtnt 12-24-2010 05:44 PM

I am not familiar with EQEmu setup yet, but here is a simple test:

Login, continue to "Server Select" Screen.
Do NOTHING, wait 60 seconds.


If you get a pop-up stating "Error - A timeout occurred", you're going nowhere.

You can hit OK, and wait again for 60 seconds to see if the server list refreshes or you get the error, but this is in vain. You will always get the error unless you restart the client.

If the list of servers updates, you're good to go; you can get onto the server of your choice.

izmael 12-24-2010 06:45 PM

Hey folks, I'm just a player on one of the servers but this got so annoying I can't help but lurk over here. Basically, it's a MAJOR pain to log in right now, been this way for several days.

I have no knowledge about how the world server and the login server communicate, but (let me know if I'm wrong), I've got a feeling that:

1. The client establishes a TCP connection to the login server in order to submit login and password. This works maybe 30% of the time. I guess it gets some kind of token in return.

2. When the above works, the world server establishes a TCP connection to the login server to verify the token information given by the client. This appears to also work about 30% of the time.

Therefore on average you need like 10 attempts before logging in one character (if, like me, you box 11 chars... feel my pain.)
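The arithmetic behind that estimate, assuming the two steps succeed independently and each attempt is independent of the previous ones (both are assumptions, and the 30% figures are the rough guesses given above), works out as follows:

```python
def expected_attempts(p_client_login, p_world_verify):
    """Mean number of tries until both steps succeed in the same attempt.

    With independent steps the per-attempt success probability is the
    product, and the mean of a geometric distribution is 1 / p.
    """
    p_success = p_client_login * p_world_verify
    return 1.0 / p_success


print(round(expected_attempts(0.3, 0.3), 1))  # prints 11.1
```

So two 30% steps give a 9% chance per attempt and about 11 attempts on average per character, in line with the "like 10 attempts" figure.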


Can anyone with access to the server and/or the firewalls/routers immediately in front of the server use tcpdump or a similar network tool to see why the connections are being dropped or reset? I have managed Unix and IP networks for a long time and will gladly help if needed.

By the way, if the login server is behind a Linux NAT gateway, I'd check the ip_conntrack table somewhere in /proc. It might be full (especially if you moved to a new host recently and the firewall changed or something). A full ip_conntrack table would give exactly those symptoms.
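A rough way to run that check is sketched below in Python. The two /proc locations are assumptions about where the counters live: newer kernels usually expose `nf_conntrack_count` and `nf_conntrack_max` under /proc/sys/net/netfilter, older ones `ip_conntrack_count` and `ip_conntrack_max` under /proc/sys/net/ipv4/netfilter, and neither exists unless connection tracking is loaded. The function names and the 90% threshold are made up for the example.

```python
from pathlib import Path

# Assumed locations of the conntrack counters (newer naming first).
CANDIDATE_DIRS = [
    (Path("/proc/sys/net/netfilter"), "nf_conntrack"),
    (Path("/proc/sys/net/ipv4/netfilter"), "ip_conntrack"),
]


def conntrack_usage(dirs=CANDIDATE_DIRS):
    """Return (count, maximum) from the first readable location, else None."""
    for directory, prefix in dirs:
        try:
            count = int((directory / f"{prefix}_count").read_text())
            maximum = int((directory / f"{prefix}_max").read_text())
        except (OSError, ValueError):
            continue  # files missing or unreadable here; try the next one
        return count, maximum
    return None


def is_nearly_full(count, maximum, threshold=0.9):
    """True when the table is at or above threshold of its capacity."""
    return maximum > 0 and count / maximum >= threshold
```

If `conntrack_usage()` returns a count close to the maximum on the gateway, new connections would be dropped even though existing flows and pings keep working.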

It could be a million other things though... and tcpdump is your friend in these situations.

Let me know if I can help.

izmael 12-24-2010 06:49 PM

As for the problem being "hard to reproduce or trace on Rogean's end": if the login system runs on Linux (or another flavor of Unix), I'll gladly help. There's a wonderful utility called "strace" that you can use to see what happens inside a running process. =)

Rogean 12-25-2010 03:18 AM

The fact that you guys are saying it works for a while after starting the worldserver, and then at some point it starts having problems... Then goes back to no problems after restarting the world server?

How is that not an issue with the worldserver?

There are a handful of servers having this issue but there are a LOT of servers not having any issue at all. I've connected a server from my house and it hasn't had any problems maintaining loginserver connections.

You guys are going to have to put some debug code into the TCP handling of the worldserver to figure out exactly what's going on.

izmael 12-25-2010 09:28 AM

Rogean,

I think people tend to "blame" the login server for that issue because it started happening exactly after it went down for "hardware maintenance".

trevius 12-25-2010 09:56 AM

The only servers I haven't seen an issue on so far for sure are P99 and PEQ and maybe VZ/TZ. Nearly all of the other preferred or Legends servers have been having an issue over the past few days. Not all of them have replied to this thread yet to report their issue. And as far as I can tell, many of the normal servers have also had connection issues too.

I was running Rev1769 on Storm Haven when this issue started a few days after I updated to that code. During the troubleshooting process, I downgraded to Rev1757 to see if that would help at all. My server ran on Rev1757 perfectly fine for at least a week when I was originally running it, so reverting back should have fixed it, but it did not.

If the issue was from the SVN, PEQ would also have the issue by now I assume, as they stay as current as Storm Haven, if not more-so. But being that they are on the same host as the LS, I highly doubt PEQ will see this issue ever.

Quote:

Originally Posted by Rogean (Post 195548)
The fact that you guys are saying it works for a while after starting the worldserver, and then at some point it starts having problems... Then goes back to no problems after restarting the world server?

How is that not an issue with the worldserver?

When a large number of servers are having the same issue all starting at the same time, I think it increases the chance that the problem is on the Login Server end. Yes, it does not sound like any normal type of networking issue, but it could be any number of things. Maybe the LS isn't replying to worldservers as quickly or as often as it should be, I dunno.

BTW, Rogean, what is the name of the test server you have running at home, and would it be possible to open it to the public for testing? I would like to check on it from time to time over a day or so to verify that it remains on the LS list and that connections to the server work on a regular basis, without intermittent issues or the server eventually dropping off the list completely. I can't really monitor it much today, but probably starting tomorrow I could.

Also, when was the last time the LS was restarted? Looks like it is starting to get a bit off:

http://stormhavenserver.com/download...ginserver2.jpg

Tabasco 12-25-2010 11:53 AM

My main server stayed up for a little over 48 hours. For that duration it would drop to pending and then back to online frequently. It came right back up with an LSReconnect and stayed up for another 12 or so before going down again. That's the host with known packet loss issues. (r1771)

My test server, on a different host, hasn't even dropped to pending in the last 48 hours. (r1777)

I'd be glad to coordinate for some more specific tests.

Rogean 12-25-2010 03:23 PM

My test server is the Code Cave.

The servers showing up as [] is a known issue, I know what's causing it, and I'll fix it soon. It is not something that would affect other servers being disconnected.

The loginserver was rebooted last night to put more debug code in.

Rogean 12-30-2010 10:13 AM

Is this resolved?

I'm not seeing servers getting bumped off anymore.

Lillu 12-30-2010 11:43 AM

I don't want to jinx it but it seems all fine for 2-3 days now. Thanks for fixing it :)

Congdar 12-30-2010 05:32 PM

I'm not seeing those reconnect messages anymore.


All times are GMT -4.