MongoDB

I want to apologize for the delay on this. I wrote 95% of this article months ago, then life got busy. Now I am back with some free time to finish it and touch on a few new areas of knowledge to share. This is a very simple subject, so this won’t be long at all.

The Code

In the same fashion as all of my coding tweets, I will first give you all of the code for the entry (just make sure you insert your own values before using it). So here is the code:

Prerequisites

As with most Node projects, there are some modules that need to be installed. Run the following in the directory of your project:

Code Analysis

I will be skipping the part of the code where I open up the Twitter stream and the db connection. If you wish to know how these work please see my earlier entry here.

This small block of code defines the schema for the documents that we are inserting into Mongo. Each document has the tweet ID (which comes from Twitter, is unique for each tweet, and is so large that we must take it as a string), the date the tweet was sent, the text of the tweet, and the username of the person who tweeted. It’s pretty simple and self-explanatory.
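To see why the ID has to be taken as a string, here is a minimal sketch in plain JavaScript. The ID value is made up for illustration and is not from a real tweet; the only point is that a JavaScript number cannot hold a 19-digit integer exactly.

```javascript
// A made-up 19-digit ID, about the size of a real tweet ID (assumption).
const idFromTwitter = '1234567890123456789';

// Converting to a Number silently rounds to the nearest representable double.
const asNumber = Number(idFromTwitter);
const numberRoundTrips = String(asNumber) === idFromTwitter;

// Keeping the value as a string preserves every digit.
const asString = idFromTwitter;
const stringRoundTrips = asString === idFromTwitter;

console.log('safe integer?', Number.isSafeInteger(asNumber)); // false
console.log('number round-trips?', numberRoundTrips);         // false
console.log('string round-trips?', stringRoundTrips);         // true
```

Anything above Number.MAX_SAFE_INTEGER (2^53 - 1) is at risk of this rounding, which is why the string form is the safer choice here.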

This just creates the model using mongoose. Simple as that. Make sure you remember the name passed in as the first argument, as that is now the object type that you will instantiate. Thank you, JavaScript, for making this kind of crazy for those of us that don’t use you often.

Note the first line, where it instantiates the object. The name is the same name you passed into your model function previously, so don’t mistype it or it will throw some rather interesting errors at you.

Next is the easy part, where we are just taking out the pieces of the tweet and putting them into our model. Nothing fancy here, but notice that our tweet_id is a string, not any form of integer.
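As a rough sketch of this step without mongoose, the helper below copies the same four pieces out of a tweet into a plain object. The helper name, the sample payload, and every key except tweet_id are assumptions for illustration; the input field names (id_str, created_at, text, user.screen_name) follow the tweet JSON that Twitter's v1.1 API returned.

```javascript
// Hypothetical helper: copies the four fields described above out of a tweet.
// Only tweet_id is named in the article; the other key names are assumptions.
function toTweetDocument(tweet) {
  return {
    tweet_id: tweet.id_str, // the string form of the ID, never the numeric tweet.id
    // Parsing Twitter's timestamp format relies on the engine's lenient Date parser.
    created_at: new Date(tweet.created_at),
    text: tweet.text,
    username: tweet.user.screen_name,
  };
}

// Trimmed-down, made-up payload in the shape of a v1.1 tweet.
const sampleTweet = {
  id_str: '1234567890123456789',
  created_at: 'Wed Oct 10 20:19:24 +0000 2018',
  text: 'Hello from the stream',
  user: { screen_name: 'example_user' },
};

const doc = toTweetDocument(sampleTweet);
console.log(doc);
```

With mongoose, the same assignments would be made on an instance of the model instead of a plain object.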

All this does is save the model to Mongo. Very simple and straightforward.

Lastly, just let this run for a few hours and then check your collection to see what has been inserted. I will continue with a more complicated example later that does more with the Twitter API.

The Code

Here is a simple example of how to create a Node.js client that reads tweets off of a user stream and inserts them as JSON into a MongoDB collection.

To make it simple for people that don’t need an explanation of the code, here is the code in its entirety (just be aware there are parts you need to change for your own data):

If you need to start from the get-go, pick up reading here, as I will explain exactly what this thing does. If you are new to Node.js, please read my article here before continuing, as it might help with the above syntax and with understanding Node.js better.

To run this, simply put it into a text file and run: node <filename>.js

Prerequisites

Before running this, there are a few commands you should run first. They install the modules this code needs and are pretty self-explanatory. They are:

You will also need to set up a Twitter account with an application registered for this. If you are unsure of how to do this, read this document. It’s a great intro that I found when setting up mine. The process is not really that intuitive, so maybe Twitter will make this easier to find in the future.

Notes on MongoDB Hosting

If you don’t want to host your own MongoDB instance for whatever reason, I can strongly recommend the hosting provider that I use. It is called MongoLab and these guys are awesome. They give you a free instance for up to 512MB of stored data, and in the few months I have used them they have been very reliable, with no downtime so far. Plus their editor works well for viewing JSON data like the data this program inserts.

The Code Broken Down and Dissected

Now let’s break down the code itself and explain what each section does.

This is pretty easy to look at and see what it does. It creates an instance of MongoClient from the just-installed module and tries to connect with it. Make sure you change my URL string to whatever yours is. If the connection fails, it will throw an error for you to see on the command line. Next it tells the Mongo module which collection to use (if you are new to Mongo, a collection is a rough equivalent of a table in the SQL world). You will get the collection name from your Mongo instance.

This is the part where we connect to Twitter. You will need to log into Twitter and find the application you registered earlier. On that page is a list of keys that you will use in the code above. Just match up the labels with the keys in the code. Simple as pie (well, sometimes pie isn’t that easy).

The first line initializes a user stream from a Twitter account and registers a callback function. The second line sets up an event handler for when data is received from the Twitter stream and registers a callback function, passing the data from the tweet into that function.
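The shape of those two lines can be sketched with Node's built-in EventEmitter standing in for the Twitter stream. Everything here is an assumption for illustration: openFakeUserStream is a made-up function, not part of any Twitter module, and the tweets are fake. Only the pattern (a callback that receives the stream, then a 'data' handler on it) mirrors the description above.

```javascript
const { EventEmitter } = require('events');

// Stand-in for the Twitter client (assumption): it hands a stream object to a
// callback, and that stream emits a 'data' event for every tweet.
function openFakeUserStream(onStream) {
  const fakeStream = new EventEmitter();
  onStream(fakeStream);
  return fakeStream;
}

const received = [];

// First line: open the stream and register a callback that receives it.
const fakeStream = openFakeUserStream(function (stream) {
  // Second line: register the handler that runs for each incoming tweet.
  stream.on('data', function (tweet) {
    received.push(tweet);
    console.log('got tweet:', tweet.text);
  });
});

// Simulate two tweets arriving on the stream.
fakeStream.emit('data', { id_str: '1', text: 'first' });
fakeStream.emit('data', { id_str: '2', text: 'second' });
```

With a real Twitter module the events arrive on their own as people tweet; the emit calls here only exist to drive the stand-in.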

The first line of this code inserts the data (which comes from Twitter as JSON) into your Mongo collection and registers a callback function. The code inside that callback simply counts the number of records in the collection and spits it out to the console. The nesting of multiple callbacks can get confusing, but after writing a few it becomes pretty easy to follow, and a good editor will show the indentation and syntax coloring for you. I use vim as an editor, but there are plenty more out there.
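Here is a minimal sketch of that insert-then-count nesting. The collection is an in-memory stand-in written for this example, not the MongoDB driver, so its insert and count methods are assumptions; the real driver's method names and signatures depend on the version you install. The stand-in also calls its callbacks immediately, where a real database would call them later.

```javascript
// In-memory stand-in for a collection (assumption, not the MongoDB driver).
// Its methods follow the Node convention of callback(err, result).
function createFakeCollection() {
  const docs = [];
  return {
    insert(doc, callback) {
      docs.push(doc);
      callback(null, doc);
    },
    count(callback) {
      callback(null, docs.length);
    },
  };
}

const collection = createFakeCollection();
const counts = [];

function storeTweet(tweet) {
  // Outer callback: runs once the insert has finished.
  collection.insert(tweet, function (err) {
    if (err) throw err;
    // Inner callback: runs once the count is available.
    collection.count(function (err, count) {
      if (err) throw err;
      counts.push(count);
      console.log('documents in collection:', count);
    });
  });
}

storeTweet({ id_str: '1', text: 'first' });
storeTweet({ id_str: '2', text: 'second' });
```

Each level of indentation is one step that waits on the step outside it, which is the part that gets hard to read as more steps are added.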

Conclusion

This was a very simple example of how to insert tweets into a MongoDB collection. Just be warned that if you run this for any extended amount of time it will take up a LOT of space. For example, I inserted 124 tweets and they take up about 360KB, or roughly 2.9KB per tweet. Scale that up and you can see it will take a ton of space if you let it go for long. I recommend breaking down the JSON and inserting only what you need if you want to run this for any significant amount of time. I will probably touch on a way to do this in a future posting once I have a design and some code for it.
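To put rough numbers on that scaling, here is a small back-of-the-envelope sketch. It assumes storage grows linearly at the rate measured above (124 tweets in about 360KB), which ignores indexes and per-collection overhead, so treat the results as ballpark figures only. The 512MB figure is the free-tier limit mentioned earlier; the function names are made up for this example.

```javascript
// Figures reported above: 124 full tweets took about 360 KB.
const SAMPLE_TWEETS = 124;
const SAMPLE_KB = 360;
const KB_PER_TWEET = SAMPLE_KB / SAMPLE_TWEETS;

// Estimated storage in megabytes for a given number of full tweets.
function estimateMegabytes(tweetCount) {
  return (tweetCount * KB_PER_TWEET) / 1024;
}

// How many full tweets fit into a given number of megabytes.
function tweetsThatFit(megabytes) {
  return Math.floor((megabytes * 1024) / KB_PER_TWEET);
}

console.log(KB_PER_TWEET.toFixed(1) + ' KB per tweet');
console.log(estimateMegabytes(100000).toFixed(0) + ' MB for 100,000 tweets');
console.log(tweetsThatFit(512) + ' tweets fit in 512 MB');
```

At this rate a 512MB instance fills up after roughly 180,000 full tweets, which is why trimming each document down to the fields you need matters on a busy stream.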