MySQL InnoDB versus MyISAM - My Experience

CCarter

I'm going to tell you right now, if you are building a system on MySQL, the engine you use is more important than you realize. I just found this out the hard way.

First off, if you've got tables which are being written to constantly from multiple angles, MyISAM is the worst engine for this, since it locks the whole table while writing. It may not matter on a small app that's being written to once a minute, but when you get to 5 to 10 writes a second from multiple "workers" or processes, the toll will be there. It will be small at the beginning, but when you start factoring in that some customers might be on the other side of the world, that 0.5 second load just went to 2 seconds. Sounds small, but then you get a burst of traffic, and that 2 seconds is now 4 to 6 seconds. That's just to deliver the data to end users.

You're not even thinking about latency from crossing multiple networks, loading elements like images and javascript, or the processing on your server's side AND on the end customer's end as well - easily 15 seconds IF you are using the wrong table type.
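If you want to see whether table-level locking is actually biting you, MySQL exposes a couple of basic checks. This is just a rough sketch of where I'd look - the commands and status variable are standard MySQL, but how bad the numbers have to be before they matter depends on your workload:

Code:
-- Tables currently locked / in use right now
SHOW OPEN TABLES WHERE In_use > 0;

-- How often queries had to sit and wait for a table lock since startup;
-- if this climbs fast on a MyISAM table under concurrent writes, that's your culprit
SHOW GLOBAL STATUS LIKE 'Table_locks_waited';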

Here is the thing about all this theory and processing speed - it's hard to visualize and understand what's going on without actual charts. So I'm going to let you peek into the backend of a SaaS I run and how we monitor speed.

Every request is logged for data size, processing, and speed - for the worker drones. But for users, we concentrate on one thing - speed. Let me show you some gradual screenshots of what was going on in terms of speed at the beginning, with MyISAM being used on 1 of the multiple tables users were accessing (users access multiple tables per request):

[Screenshot: "Loading Time in Seconds" chart, roughly the last 200 days]


The chart covers from roughly the last 200 days up to the last 60 days. What you are looking at is the "Loading Time in Seconds" chart, for requests to the main function of the SaaS. You'll notice at the green arrow, we made some improvements that reduced the time once we saw things were getting a bit laggy.

We improved the servers' RAM and speed, upgraded data connections, and moved to bigger servers - that's the cause of that overall drop. But you will notice that when there were tons of people on the system, there were spikes of average loading times hitting 5 seconds. Overall, though, they are below 5 seconds after the green arrow.

But don't let that fool you, look closely... you see another increase happening, and that's one of those "long term" increases that you have to watch, since as data grows, your overall time to pull data grows - at least so I thought. Here is a closer look at that gradual range:

[Screenshot: close-up of the gradual loading-time increase]


That's going to be a problem because it's not enough to keep upgrading servers and RAM to infinity - you have to optimize code. But to optimize code, you'll have to bend your mind on how to reduce the overall pressure on the system, otherwise in 3 months you'll be at 10 second requests, and then your customers will start complaining that your system is slow. Remember, these are people in the digital age - 3 seconds is too long. Also take into consideration that we've got international customers, so people in North America were seeing 1-2 second loads, but the international crowd was seeing way above 10 second loading times in order to get to that 10 second average.

That's when I kept digging around the internet to find how to make a system faster. I had heard of InnoDB, but just thought, meh, it might cut down on 1 or 2 seconds. But I was desperate, so I said fuck it and switched the one table - not even the biggest table we were accessing, but the one that was getting hammered the 2nd hardest - to InnoDB and prayed for a stay of execution. Give me at least 2 seconds less. Here were the results:

[Screenshot: loading times after switching the table to InnoDB]


Wow, do you see the difference? An even closer look:

[Screenshot: an even closer look at the post-InnoDB drop]


Now a look at it on a single day, hour by hour:

[Screenshot: a single day of loading times, hour by hour]


Average times for this one interface are less than a second - never before had that happened at any level, and this table has more data than it's ever had. Simply switching a single table to InnoDB dropped ALL pressure on ALL overall requests to the system, and it's so fast no one even thinks about "speed" anymore. When you click, it loads - period. When your customers aren't even in the realm of complaining about speed, when their old 10 second averages are down to 0.3 second averages, you've hit a new threshold.

Going forward, our mission was no longer about finding another tweak to stay the execution - we could focus on things like adding new features and continuing to improve the overall experience with other techniques. It's like going from a 56K modem to a 1 Gbps connection: speed is no longer a problem, you've got it to a point where no one notices or can even complain, so now you can concentrate on other things. That one thing, changing a single table to InnoDB, did this. Imagine if I start using some voodoo like memcache or redis...

What's ironic is that the only reason I was using MyISAM was because it was the default setup. As of 5.5, MySQL started defaulting to InnoDB... just double check what your system is on, and IF it makes sense to alter tables to InnoDB. It made sense in our scenario.
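If you want to check, it looks roughly like this - a minimal sketch, and the database and table names here are made up, not from my actual system:

Code:
-- See which engine each of your tables is currently on
SELECT TABLE_NAME, ENGINE
FROM information_schema.TABLES
WHERE TABLE_SCHEMA = 'your_database';

-- Convert a hot, write-heavy table to InnoDB
-- (this rebuilds the table and blocks writes while it runs, so do it off-peak)
ALTER TABLE hot_table ENGINE=InnoDB;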

There is one thing I will say about this whole experiment: I would not have been able to do any of this if I didn't monitor data requests and the system as closely as I do. So if you are running a system where you aren't even able to monitor what's going on, you might want to get things in place, even the basics, so you know what the averages look like and what the outliers are, because you can't improve upon what you do not measure or know.
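For the "even the basics" part, something as simple as a timing log and a daily average query goes a long way. This is just a hedged sketch of the idea - the table and column names are made up, not how we actually do it:

Code:
-- Bare-bones request timing log
CREATE TABLE request_log (
    id          BIGINT UNSIGNED AUTO_INCREMENT PRIMARY KEY,
    endpoint    VARCHAR(255) NOT NULL,
    duration_ms INT UNSIGNED NOT NULL,
    logged_at   DATETIME NOT NULL
) ENGINE=InnoDB;

-- Daily average and worst case, so you can spot the slow creep before customers do
SELECT DATE(logged_at) AS day,
       AVG(duration_ms) AS avg_ms,
       MAX(duration_ms) AS worst_ms
FROM request_log
GROUP BY DATE(logged_at)
ORDER BY day;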

That's my contribution to the programming sections :smile:
 
Excellent post and a great example of using system performance data to pinpoint bottlenecks. Back "in the day" we used MyISAM mostly because it was smoking fast for mostly-read use cases, such as blogs. InnoDB was a dog as far as reads went. These days the two are pretty comparable in read speed, although in certain cases MyISAM can still be faster in that department. Of course the gain in read speed is lost when many writes start to happen.

InnoDB works pretty well out of the box, but there are some parameters that will really make it fly, probably the most important being innodb_buffer_pool_size. Here's a good link that describes the bulk of these: http://www.percona.com/blog/2013/09/20/innodb-performance-optimization-basics-updated/
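For anyone who wants to peek at what they're running with right now, something like this works - just a sketch; the 12G value below is only an illustration, and the usual rule of thumb (a large chunk of RAM on a dedicated DB box) is covered in that Percona link:

Code:
-- Current buffer pool size, in bytes (the stock default is quite small)
SHOW VARIABLES LIKE 'innodb_buffer_pool_size';

-- The setting itself lives in my.cnf, e.g.:
--   [mysqld]
--   innodb_buffer_pool_size = 12G
-- and generally needs a restart to take effect on older MySQL versions.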
 
Nice analysis.

There are quite a few default settings in MySQL that do weird things. This one isn't really weird, as the behaviour can be explained: MOST use cases of MySQL have a lot more reads than writes, so MyISAM made sense as the default.
As we move to more and more interactive uses of websites, InnoDB will become more important.

::emp::
 
Another thing that will catch you out with MySQL is that it defaults to latin1 encoding rather than UTF-8. If you put a system into production with this and then enter UTF-8 data, you'll end up with junk data in your database that you can never convert back correctly. I'd suggest starting any new database with UTF-8 as the default encoding.
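Something like this at creation time is all it takes - a minimal sketch with a made-up database name; utf8mb4 is MySQL's full 4-byte UTF-8, which you generally want over plain utf8:

Code:
-- Create the database with a UTF-8 default so new tables inherit it
CREATE DATABASE myapp
  CHARACTER SET utf8mb4
  COLLATE utf8mb4_unicode_ci;

-- Check what an existing database is defaulting to
SELECT DEFAULT_CHARACTER_SET_NAME, DEFAULT_COLLATION_NAME
FROM information_schema.SCHEMATA
WHERE SCHEMA_NAME = 'myapp';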
 
Oh the stories we could tell CCarter on what we learned with coding up SERPWoo...

InnoDB, persistent connections, writing tons of MySQL inserts in 1 write, effective uses of LIMIT and ORDER BY, etc...
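The "tons of inserts in 1 write" bit is just multi-row INSERT syntax - batching rows into one statement instead of thousands of round trips. A hedged sketch, with a made-up table that has nothing to do with SERPWoo's actual schema:

Code:
-- One statement, many rows - far fewer round trips and lock acquisitions
INSERT INTO rankings (keyword_id, position, checked_at) VALUES
  (101, 3,  NOW()),
  (102, 17, NOW()),
  (103, 1,  NOW());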
 
In all honesty, just use Postgres. I really mean that. There is no reason to use MySQL anymore (except legacy apps etc). Postgres's performance is far superior for almost every query now, it has newer, more useful datatypes, and is just generally nicer.

There are almost no reasons not to use PostgreSQL now, other than "I know MySQL, I've used it before" - ie: scared of change.
 
MySQL works for my projects - I'm not a fan of jumping to new shiny objects "just because". Note - 98% of SW is in Perl - everyone assumes it's PHP, but it's good old-fashioned Perl. If it works, it works. Adoption and bringing on developers that can work with readily known systems is easier than trying to hire and find technology-specific developers. Changing from MySQL to Postgres, or to PHP or Ruby on Rails, is not going to make me a dollar more - in fact it might reduce my dollars because of the new learning curve.

I'm definitely trying to avoid these MySQL versus XYZ debates, because they lead to one end result - if you know Postgres and you've learned it better than a newbie at it, you'll know all the little things to make it work at an optimal level, versus the newbie. It's like everything else in life. If I know how to drive a Formula 1 race car and you don't, you probably are going to lose in a race with me until you learn.

So simply switching databases is just resetting the clock on a whole project, since there is a new learning curve that has to be dealt with - which NO END USER sees. That time can be better spent focusing efforts on other things like marketing, improving user experience by making interfaces easier, getting feedback to improve current features and gather ideas for future features, and doing guerrilla marketing tactics to generate more revenue. It's about time management for me. Does it work? Is it fast enough so users do not notice a thing? Perfecto, let's move on to more pressing matters like generating money.

I see coders trying to improve upon things that absolutely no end user will see. It will not make a difference to the end user whether I'm using MySQL, Cassandra, or Postgres if I don't know all the things that make each database special and how to optimize it for the best experience - and since neither the user experience nor revenue is going to improve at any level, I'd rather spend time on things that drive revenue like marketing, versus learning a new XYZ technology, because what I've got works and works fast for users.
 

Usually I'd agree, except Postgres is essentially a drop-in replacement for MySQL in 99% of cases (definitely so if you use an ORM), with sane defaults. That almost entirely eliminates the learning curve.
 
I really like that you can store JSON in Postgres. For example, on one of my projects to monitor some data, I added rules in Postgres as JSON. Each rule has an ID, and I can reference them many-to-many with the campaigns the rule is applied to.

Rules db (simplified)
ID | Rule(JSON)
1 | ....


where ... is something like:
Code:
{"scope": campaign, conditions: [{"param":"ctr", "logic": "<", "value": 1}, {"param":"cost", "logic": ">", "value": 40}], "action": "raiseNotificiation"}

so AND is handled by adding another condition to a rule, and OR by adding a new rule.

Then I have another table for campaigns with many-to-many like this:

CampaignId | RuleId
1 | 1
1 | 2
2 | 1

So you have the freedom of JSON and unstructured data, while still being able to use it relationally.
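In SQL that setup looks roughly like this - a hedged sketch with made-up names, not the exact schema from that project:

Code:
-- Rules stored as JSON documents
CREATE TABLE rules (
    id   serial PRIMARY KEY,
    rule jsonb NOT NULL
);

-- Many-to-many link between campaigns and rules
CREATE TABLE campaign_rules (
    campaign_id integer NOT NULL,
    rule_id     integer NOT NULL REFERENCES rules (id),
    PRIMARY KEY (campaign_id, rule_id)
);

-- All rules attached to campaign 1, pulling a field straight out of the JSON
SELECT r.id, r.rule->>'action' AS action
FROM rules r
JOIN campaign_rules cr ON cr.rule_id = r.id
WHERE cr.campaign_id = 1;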
 