Data Evolving Over Time.

In my previous blog post, I briefly touched on how I decided to switch to RethinkDB as I previously kept everything in memory and saved it to files on disk.

That's not the whole story.

Back when I started the bot, I needed a way to save user data and cooldowns so I could access them later, so I did what any inexperienced newbie would do: I stored it all in the application's memory. The cooldowns lived in one global object, mapping each user's ID to the timestamp of when they last ran a command, and the user data was saved to disk, one file per user.

{
   "123":{
      "adventure":1543568602245
   }
}

And it worked.

"But if it worked, why did you have to change it?"

Well Joel, I lied. It didn't work.

...

Well, not perfectly at least.

Using this approach, with one file for each user, wasn't great. In fact, it wasn't even good; it just led to issues. See, when you open a file with NodeJS, you use the built-in `fs` API to open the file and write to it.

But you have to close the file when you're done, something I didn't know at the time. So I kept running into "too many open files" errors, and the bot would crash and lose a bunch of unsaved data.
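For illustration, this is roughly what that bug looks like (a hypothetical saveUser helper, not the bot's actual code). Open file descriptors with `fs.open` and never close them, and the process eventually hits the OS limit and throws EMFILE:

const fs = require("fs");

// Leaky version: every save opens a file descriptor and never releases it.
// After enough saves, the process dies with "EMFILE: too many open files".
function saveUserLeaky(id, data) {
    fs.open(`./users/${id}.json`, "w", (err, fd) => {
        if (err) throw err;
        fs.write(fd, JSON.stringify(data), (err) => {
            if (err) throw err;
            // fs.close(fd) is never called here, so the descriptor leaks.
        });
    });
}

// Safer version: fs.writeFile opens, writes and closes the file in one call.
function saveUser(id, data, callback) {
    fs.writeFile(`./users/${id}.json`, JSON.stringify(data), callback);
}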

So, I had to move the data.

Now, at this point I was already handling tons of throughput, and I needed the cooldowns to survive a crash, which is another issue with storing them in application memory: if the process dies, your data is gone.

And I went searching for a solution.

At first, I thought of running another NodeJS process that would keep the cooldowns in memory, but I decided against it. The server I was using at the time, some server space a good friend of mine had lent me, didn't have a lot of RAM, and it had the same issue as before: if the process crashed, the data was gone.

I needed a database.

At the time, the only database I knew of was MySQL, which I didn't want to work with because of the SQL, and which wasn't suitable for my purpose of just storing a simple JSON object anyway.

So I asked around.

At this point, I had gotten into several groups with other developers whose bots were larger than mine and who had more experience with large-scale applications, which proved to be a huge help when I was starting out with this project.

Enter Redis, an in-memory data structure store used as a database, cache and message broker, which one of my fellow developers recommended to me. Perfect for my use case, I thought.

And I went to town.

Moving the user data over to Redis hit a big roadblock right away.

Since Redis is a key-value store, data isn't stored the way it traditionally is in a SQL database, where you get a new row for each new record; instead, you map one specific value to another.

This got me thinking. "How do I make this work with all my JSON data?".

"Ofcourse!", I thought, "I'll just store the entire JSON object with all the user data in one key!"

...

That was a bad decision.

At the time, I thought it was genius: it would solve all my worries about having the data in files, and it would be fast.

So, I started saving the data.

// Serialize the entire user base into a single Redis key.
client.set("users", JSON.stringify(users));

Now, at this point I was still keeping the user data in the application's memory: loading it from Redis on startup, modifying the object whenever a user executed a command, then serializing it all and saving it back to Redis again.
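In rough strokes, the cycle looked something like this (a simplified sketch with hypothetical names, not the bot's actual code):

// On startup: load the entire user base into one in-memory object.
let users = {};
client.get("users", (err, reply) => {
    if (err) throw err;
    users = JSON.parse(reply);
});

// On every command: mutate the in-memory object...
function onCommand(userId) {
    users[userId].coins += 10;
    // ...then re-serialize and rewrite ALL users, not just the one that changed.
    client.set("users", JSON.stringify(users));
}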

This proved very slow after a while.

Once I got a lot of users, this system didn't hold up. Response times were in the tens of seconds, all because of the way I was saving the data, and I needed another solution.

Enter RethinkDB

As I previously mentioned, I didn't want to deal with MySQL, since SQL is a pain and it doesn't handle dynamic, schema-less data very well without some sort of ORM library.

This is where RethinkDB, a free and open-source, distributed document-oriented database that stores JSON documents with dynamic schemas, comes in.

I set up a server and wrote a script to migrate the data from Redis to RethinkDB: it read the big value from Redis, parsed it, and sent each user's object to RethinkDB.

// "client" is the Redis client; "r" is the rethinkdb driver and "conn" an open connection.
client.get("users", (err, reply) => {
    if (err) throw err;
    let users = JSON.parse(reply);

    let start = new Date().valueOf();

    // Every top-level key of the parsed object is a user ID; insert each
    // user's object as its own document, merging the ID in as the primary key.
    r.expr(users).do(function(obj) {
        return obj.keys().filter(function(key) {
            return obj(key).typeOf().eq("OBJECT");
        }).forEach(function(key) {
            return r.db("DB_NAME").table("users").insert(obj(key).merge({"id": key}));
        });
    }).run(conn, (err, cursor) => {
        if (err) throw err;
        console.dir(cursor);
        let end = new Date().valueOf();
        console.log("Time taken: " + ((end - start) / 1000) + " seconds");
    });
});

So, we're good to go, right?

Not quite.

I was still keeping the cooldowns in memory since I hadn't needed to move them yet; I was still running a single shard, and they weren't mission-critical to the operation of the bot.

Now, this worked great while I only had a single shard, but as I added more shards, I noticed something.

The cooldowns weren't in sync across shards, meaning that users could do an adventure on one shard and then do it on another shard directly afterwards, completely bypassing the cooldown system.

"What did you do then?"

Well, after rewriting everything else to use RethinkDB, I decided that moving the cooldowns there as well was the best action to take, since I already had all the other data there. And so I did, and it worked great.

...

Until I got more users.

See, on each new command, I had to read from the database when the user last executed the command, and then write the new timestamp back, meaning one read and one write per command.
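Roughly, each command triggered a round-trip like this (a sketch with a hypothetical cooldowns table, not the bot's actual schema):

// One read per command...
r.table("cooldowns").get(userId).run(conn, (err, row) => {
    if (err) throw err;
    let last = (row && row.adventure) || 0;
    if (Date.now() - last < COOLDOWN_MS) return; // still on cooldown

    // ...and one write per command, updating the row if it already exists.
    r.table("cooldowns")
        .insert({ id: userId, adventure: Date.now() }, { conflict: "update" })
        .run(conn, (err) => { if (err) throw err; });
});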

Now, RethinkDB is built for high throughput, but it's not a good idea to have it read and write tens of thousands of rows each second.

"So, you're out of luck then?"

Not really, Joel.

See, I still had the Redis server running, and Redis is built for extremely high throughput. So I moved the cooldown data there instead, which reduced the load on RethinkDB a whole lot and improved the bot's response times at the same time.
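Since Redis is shared by every shard, it also fixes the sync problem from earlier. A minimal sketch, assuming one key per user and command (the key scheme and helper name are hypothetical):

// SET with EX and NX only creates the key if it doesn't already exist, and
// expires it automatically, so whichever shard writes first wins and every
// shard sees the same cooldown.
function tryStartCooldown(userId, command, seconds, callback) {
    client.set(`cooldown:${userId}:${command}`, Date.now(), "EX", seconds, "NX", (err, reply) => {
        if (err) return callback(err);
        callback(null, reply === "OK"); // "OK" = cooldown started; null = still cooling down
    });
}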

In conclusion

Use an appropriate data storage method for your application, and research and test your options. Don't settle for one thing just because it looks easiest; it might be worth putting in the extra effort to learn something new.