Server keeps timing out due to memory usage - How to fix this?

Sutra

Breathe the body deep
BuSo Pro
Joined
Oct 28, 2015
Messages
606
Likes
577
Degree
2
#1
Over the last few months my site has been sporadically going down. I spoke with support at Knownhost and they said the server's services were killed due to using too much memory.

They found there were a bunch of brute force attacks so I had them block the offending IPs. Then I added rate limiting and throttling via Wordfence.

That helped. The outages decreased; however, they still happen sometimes. I spoke to support again and this time they said the 2 biggest factors they see are:
  1. Googlebot crawling excessively
  2. Admin-ajax.php called every time someone accesses a new page. Regarding the admin-ajax they said: "...while probably negligible as far as memory resources are concerned, it seems to be being queried very often (like someone is currently logged into the page, editing a post or creating a new post, and haven't logged out)..."
How do I fix this?
 

CCarter

If they cease to believe in u, do u even exist?
Staff member
BuSo Pro
Boot Camp
Digital Strategist
Joined
Sep 15, 2014
Messages
2,008
Likes
4,475
Degree
5
#2
How much RAM does your server have?
 

Sutra

#3
My server has 3G of RAM.

Support told me I am using about half of that, and have a little bit more than half of the total available to use.

              total        used        free      shared  buff/cache   available
Mem:           3.0G        1.3G        1.3G        115M        434M        1.6G
Swap:            0B          0B          0B
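
A minimal sketch for reading the output above programmatically, assuming the single-letter unit suffixes (B, K, M, G) shown in this thread. The function names are illustrative and not part of any tool. One thing it makes visible: a Swap total of 0B means no swap space is configured on this server at all, so a swap "used" value of 0B cannot show whether memory ever ran short.

```python
# Bytes per unit suffix, as printed by `free -h` in the paste above.
UNIT_BYTES = {"B": 1, "K": 1024, "M": 1024**2, "G": 1024**3, "T": 1024**4}


def parse_size(text):
    """Convert a value such as '434M' or '3.0G' to bytes."""
    return float(text[:-1]) * UNIT_BYTES[text[-1]]


def parse_free(output):
    """Return {row_label: {column_name: bytes}} for each row of `free -h`."""
    lines = [line for line in output.splitlines() if line.strip()]
    header = lines[0].split()
    rows = {}
    for line in lines[1:]:
        label, *values = line.split()
        # The Swap row has fewer columns; zip stops at the shorter list.
        rows[label.rstrip(":")] = {
            name: parse_size(value) for name, value in zip(header, values)
        }
    return rows


FREE_OUTPUT = """
total used free shared buff/cache available
Mem: 3.0G 1.3G 1.3G 115M 434M 1.6G
Swap: 0B 0B 0B
"""

report = parse_free(FREE_OUTPUT)
available_pct = round(100 * report["Mem"]["available"] / report["Mem"]["total"])
swap_configured = report["Swap"]["total"] > 0

print(f"Memory available: {available_pct}% of total")
print(f"Swap configured: {swap_configured}")
```

A snapshot like this only describes the moment it was taken, so it is most useful when captured close to an outage.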
 

CCarter

#4
If your server isn't hitting SWAP then it's not a RAM problem. When your server/computer runs out of RAM it then starts writing to the hard drive, which is WAYYY slower than RAM memory - and that is called SWAP.

Since you aren't hitting SWAP, Knownhost's support is at best guessing what the problem might be. You need someone to diagnose it for real by getting SSH access and looking at what's running throughout the day.

LOL at Googlebot crawling being the problem. The problem is you've got a shitty plugin (or badly coded theme) that's calling that admin-ajax.php script and causing you problems. There is a chance that you've been hacked/compromised and your server is being used to send mass emails or something - perhaps one of those brute force attacks worked. But again this is all guesswork.

My first action would be to turn off all plugins and see if the site gets back to normal. If things stabilize then you know it's a plugin or a badly coded theme.
 

Sutra

#5
According to G Webmaster Tools, I have a bunch of 404 errors (due to me removing pagination and redirecting everything a while back - obviously I didn't do it right, hah). Is it possible that the 404s/redirects are causing the overload? As in, Googlebot is crawling and rapidly going in circles, taking up a ton of resources?
 
Joined
Apr 7, 2016
Messages
237
Likes
142
Degree
1
#6
Google could be getting caught up in those pages but they're not likely going to hammer your site. Do you have access to your server logs? Dump them into something like Screaming Frog's Log Analyzer and you can visualize what bots are getting caught up in and how frequently.

New Relic could also show you what pages and processes are using up the most resources and give you a time frame of when it happens.
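
If you'd rather get a quick first look before using one of those tools, here is a rough sketch that tallies an access log by path, status code, and whether the user agent mentions Googlebot. It assumes the common Apache "combined" log format; the sample lines and the `summarize` function are made up for illustration, so adjust the pattern to whatever your server actually writes.

```python
import re
from collections import Counter

# Assumed format: Apache "combined" access log.
LOG_PATTERN = re.compile(
    r'(?P<ip>\S+) \S+ \S+ \[(?P<time>[^\]]+)\] '
    r'"(?P<method>\S+) (?P<path>\S+) [^"]*" '
    r'(?P<status>\d{3}) \S+ "(?P<referer>[^"]*)" "(?P<agent>[^"]*)"'
)


def summarize(lines):
    """Count requests per path (query string removed), status, and bot flag."""
    paths, statuses, agents = Counter(), Counter(), Counter()
    for line in lines:
        match = LOG_PATTERN.match(line)
        if match is None:
            continue  # skip lines that do not fit the assumed format
        paths[match["path"].split("?")[0]] += 1
        statuses[match["status"]] += 1
        agents["Googlebot" if "Googlebot" in match["agent"] else "other"] += 1
    return paths, statuses, agents


# Hypothetical sample lines, not taken from the site in this thread.
SAMPLE_LINES = [
    '203.0.113.5 - - [10/Oct/2017:13:55:36 +0000] "POST /wp-admin/admin-ajax.php HTTP/1.1" 200 58 "http://example.com/post/" "Mozilla/5.0"',
    '66.249.66.1 - - [10/Oct/2017:13:55:37 +0000] "GET /page/2/ HTTP/1.1" 404 512 "-" "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)"',
    '203.0.113.9 - - [10/Oct/2017:13:55:40 +0000] "POST /wp-admin/admin-ajax.php?action=x HTTP/1.1" 200 58 "-" "Mozilla/5.0"',
]

paths, statuses, agents = summarize(SAMPLE_LINES)
for path, count in paths.most_common(5):
    print(count, path)
```

To run it on a real log, pass an open file object to `summarize` instead of the sample list.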
 

Ryuzaki

女性以上のお金
Staff member
BuSo Pro
Digital Strategist
Joined
Sep 3, 2014
Messages
2,969
Likes
5,443
Degree
7
#7
@Sutra,

Every time your website is hit by a visitor, cron jobs as well as admin-ajax.php are run alongside the Heartbeat API to check a bunch of nonsense. This was built in for sites that get very little traffic to trigger cron jobs instead of just running them when the timer runs out.

... Every single page load checks and runs crap unnecessarily. This is native Wordpress behavior that doesn't interfere often because most sites never get enough traffic for it to matter.

In your wp-config.php file at the root of your Wordpress installation, add this:

PHP:
/** Limit Cron Runs **/
define('WP_CRON_LOCK_TIMEOUT', 900 );
The number 900 is in seconds. 900 seconds is 15 minutes. You can set it lower than 15 minutes if you want, that's just where I set it because I don't have any crons that have to be run any more urgent than that. I could probably go higher.

But what this will do is cut this crap down from running on every single page load to ONCE and only once every 15 minutes. That should reduce that load drastically, if not fix your problem entirely.
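
To picture the effect described above, here is a toy simulation of a time-based lock. It is only an illustration of the idea, not WordPress's actual code, and the `CronLock` class is a made-up name. It assumes one page load per minute for an hour.

```python
class CronLock:
    """Allow a job to start only if `timeout` seconds have passed since the last start."""

    def __init__(self, timeout_seconds):
        self.timeout = timeout_seconds
        self.last_run = None  # timestamp of the most recent allowed run

    def try_run(self, now):
        if self.last_run is not None and now - self.last_run < self.timeout:
            return False  # still inside the lock window, skip this page load
        self.last_run = now
        return True


lock = CronLock(timeout_seconds=900)

# Assumed traffic: one page load every 60 seconds for one hour.
page_loads = list(range(0, 3600, 60))
runs = sum(1 for t in page_loads if lock.try_run(t))

print(f"{len(page_loads)} page loads, {runs} cron runs")
```

With a 900 second window, the hour of simulated traffic triggers a handful of runs instead of one per page load.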

This won't stop the Heartbeat API, which you probably want running, but I'm pretty sure it limits its execution. It might goof up any real-time plugins you're using for stat gathering, but if you're running a normal content site it's fine.

If it turns out this makes a difference but not the full difference you need you can disable the Heartbeat API entirely or limit it like I showed with the WP_CRON_LOCK_TIMEOUT above. There's a plugin for it or you can find some functions.php code for it. Usually the Heartbeat API only fires from backend activity but it's available for plugin developers to abuse and fire from frontend users, which might be what's happening in your case.

Other things to consider if that doesn't fix it would be to make sure you're on PHP7 (if all your plugins are compatible), maybe run the P3 Plugin Profiler to see if it can spot any one or two plugins acting crazy, and you can try the Query Monitor plugin to see if you have any loops or queries going nuts and isolate that to a specific plugin.
 

Calamari

BuSo Pro
Boot Camp
Joined
Oct 6, 2014
Messages
744
Likes
869
Degree
3
#8
That's great specific advice from @Ryuzaki and it sounds like it may be the root of your problem, but I'd still get a dev to check out the server logs. I bet there are some other clues in there that combined with what Ryuzaki said will get you fixed up.

It's a shame you're paying a lot of money to Knownhost for fully managed hosting and they can't offer you any real support.
 
Joined
Jul 30, 2015
Messages
131
Likes
70
Degree
0
#9
Sutra said: "Over the last few months my site has been sporadically going down. I spoke with support at Knownhost and they said the server's services were killed due to using too much memory."
Sutra said: "My server has 3G of RAM. Support told me I am using about half of that, and have a little bit more than half of the total available to use."

Those two data points (services killed for excessive memory, using only half the available memory) obviously come from different points in time. So no point in analyzing further.

Ask support for memstat output at the time your services are being killed. Not when everything is running well.
 

Sutra

#10
@Ryuzaki Thank you mucho. I just added that to the php file. Will see how it goes.

@CCarter @builder Thank you for the detailed info. I've just requested the memstat info to investigate this further.
 
Joined
Sep 3, 2015
Messages
15
Likes
6
Degree
0
#11
Sounds like an ajax script/plugin that's just not dying. Do you have any plugins that are using ajax to retrieve data? Calling admin-ajax.php at every page refresh shouldn't be an issue. More than likely the issue is resources stacking up from some rogue script that's using the backend of WP.

Did you already try installing debugbar in WP and checking which functions, hooks and actions are being run on the page(s) that are causing issues?
 

Sutra

#13
@Ryuzaki Funny you ask, I was actually going to post about this again.

I added the define('WP_CRON_LOCK_TIMEOUT', 900) line like you suggested. That seemed to help, but then over the last couple of weeks the timeouts started happening again. Knownhost wasn't much help, basically saying the same things as before. However, this time they did recommend I use a plugin to change the login URL, which I did. Also on their suggestion I disabled cron jobs through WP entirely. Now the jobs are only submitted via the server.

The last thing they suggested is to reduce the MaxRequestWorkers setting. They say it's a bit high for current resources. I told them to hold off on that for now though. It sounded to me like it would affect users even further - but I could be wrong.
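
For context on that setting: MaxRequestWorkers caps how many requests Apache serves at the same time, and requests beyond the cap wait in a queue rather than starting more processes. A common rule of thumb is to divide the memory left over for Apache by the average memory per worker. The sketch below does that arithmetic. The 3G total comes from this thread; the reserved amount and the per-worker size are placeholder assumptions that would need to be measured on the actual server.

```python
def estimate_max_request_workers(total_ram_mb, reserved_mb, avg_worker_mb):
    """Rule-of-thumb worker cap: memory left for Apache / average worker size."""
    if avg_worker_mb <= 0:
        raise ValueError("avg_worker_mb must be positive")
    usable_mb = total_ram_mb - reserved_mb
    if usable_mb <= 0:
        raise ValueError("reserved_mb leaves no memory for workers")
    return max(1, int(usable_mb // avg_worker_mb))


TOTAL_RAM_MB = 3 * 1024  # the 3G server described in this thread
RESERVED_MB = 1024       # assumption: memory kept for MySQL, the OS, and everything else
AVG_WORKER_MB = 60       # assumption: average resident size of one Apache/PHP worker

workers = estimate_max_request_workers(TOTAL_RAM_MB, RESERVED_MB, AVG_WORKER_MB)
print(f"Estimated MaxRequestWorkers: {workers}")
```

If the configured value is far above an estimate like this, a traffic spike can start more workers than the RAM can hold.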

After all that, the outages still happened a few times. So a few days ago I removed a few plugins I thought might be the issue and I also installed Query Monitor and P3 Plugin Profiler.

Yesterday and today there haven't been outages, however, at times the site goes really slow, on both the front-end and the Wordpress backend. So it seems like something may still be up. I tried running the scan for the P3 Plugin Profiler and let it run for about 90 minutes. The progress bar only showed about 15% done so I stopped it.

I asked Knownhost for the memstat output as @builder suggested. Support replied, "The only outputs we have available are the OOM log entries, and limited output from sar -r."

So after all that I have some questions, hah:
  1. Should I reduce the MaxRequestWorkers setting as support suggested?
  2. When I ran the P3 Plugin Profiler I thought something was wrong because it was taking so long, thus I stopped it. But should I just let it run overnight?
  3. If P3 Plugin Profiler completes, what should I be looking for in the report?
  4. What exactly am I looking for within the Query Monitor info?
  5. Anything else I should check/do?
 

Ryuzaki

#14
@Sutra, P3 Plugin Performance Profiler should be completing in minutes, not hours. I'm not sure how it does it, but it takes a measurement of the runtime for all of your plugins and displays them in a pie chart.

It tells you your overall plugin loading time and the percentage of your total page load time associated with your plugins (and also if they're running a zillion MySQL queries). The fact that you can't get it to run at all suggests you may have a runaway plugin.

Query Monitor can help. What it does is add a drop down in the Wordpress menu bar that lets you see how long it took your page to load and how much of that was associated with Wordpress queries.


If there's anything very wrong it will let you know and identify which queries are the problem. With some searching you can identify whether a plugin is causing the issue or if there are some crazy queries in your theme that can't be cached. Anything like "most popular posts" that writes to the database on each page load and changes which posts are displayed can't really be cached. Stuff like that will pop out.

Which model of VPS are you using at Knownhost? Could it be that it's time to upgrade? Can you tell us what plugins you're using, or if not have you done any searches related to this issue with each plugin involved in the search?
 

Calamari

#15
You're just taking shots in the dark here and the performance of your site is directly tied to your traffic.

Why the resistance to hiring a developer to properly troubleshoot?
 
Joined
Jul 30, 2015
Messages
131
Likes
70
Degree
0
#16
Sutra said: "I asked Knownhost for the memstat output as @builder suggested. Support replied, 'The only outputs we have available are the OOM log entries, and limited output from sar -r.'"
Can you ask for (1) sar output at the time of the event and (2) OOM logs?
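
Once you have the OOM logs, a small script can show which processes get killed and how big they were at the time. This is a rough sketch that assumes the kernel's usual "Killed process PID (name) ... anon-rss:NNNkB" line format. The sample lines are invented for illustration and `summarize_oom_kills` is a made-up helper name.

```python
import re
from collections import defaultdict

# Assumed kernel log format for OOM kills; adjust if your logs differ.
KILLED_PATTERN = re.compile(
    r"Killed process (?P<pid>\d+) \((?P<name>[^)]+)\)"
    r".*?anon-rss:(?P<rss_kb>\d+)kB"
)


def summarize_oom_kills(lines):
    """Return {process_name: {'kills': count, 'max_rss_mb': largest anon-rss seen}}."""
    summary = defaultdict(lambda: {"kills": 0, "max_rss_mb": 0.0})
    for line in lines:
        match = KILLED_PATTERN.search(line)
        if match is None:
            continue  # not a "Killed process" line
        entry = summary[match["name"]]
        entry["kills"] += 1
        entry["max_rss_mb"] = max(entry["max_rss_mb"], int(match["rss_kb"]) / 1024)
    return dict(summary)


# Hypothetical sample lines, not real output from the server in this thread.
SAMPLE_LOG = [
    "Oct 10 13:55:36 host kernel: Out of memory: Kill process 2101 (httpd) score 301 or sacrifice child",
    "Oct 10 13:55:36 host kernel: Killed process 2101 (httpd) total-vm:512340kB, anon-rss:98304kB, file-rss:0kB",
    "Oct 10 14:02:11 host kernel: Killed process 1877 (mysqld) total-vm:1843200kB, anon-rss:409600kB, file-rss:0kB",
    "Oct 11 03:20:05 host kernel: Killed process 2544 (httpd) total-vm:498112kB, anon-rss:102400kB, file-rss:0kB",
]

result = summarize_oom_kills(SAMPLE_LOG)
for name, stats in result.items():
    print(name, stats["kills"], f"{stats['max_rss_mb']:.0f} MB")
```

The timestamps on those lines are also what you would line up against the sar output to see memory usage at the time of each kill.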

If it is feasible, move the site to a different server and see if the problem still persists. Virtualization has brought with it a whole host of weird behaviors by VMs - sometimes as a result of incorrect configurations, sometimes as a result of bugs.
 