Thursday, January 3, 2013

Connect to MongoLab with Pymongo

If you've read my blog before, you'll know that I'm a huge fan of Heroku. Recently I've been working on a project that is hosted on Heroku and uses the MongoLab add-on (I'm also a huge fan of MongoDB). The project stores documents in the database and then displays them to the user. Because of the nature of the project, I didn't want to build an authentication/authorization system just to post new documents to the database from the web. Pushing new docs from a command-line utility on my computer was good enough. Why not use mongoimport? Well, the input file to mongoimport has to be formatted so that each JSON object is contained on a single line. You can imagine how tedious it would be to squeeze a complicated document onto a single line, and the result is nearly unreadable. What I wanted was to create a nice, readable .json file containing a document for the database and then do something like
someutil --server 1234.mongolab.com --port 8000 --db test --user "username" --pw "password" --file widget.json
The utility would then parse the .json file and add the object in widget.json to my MongoLab db.
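For comparison, the one-object-per-line format described above can be produced from a readable file with a few lines of Python. This is only a sketch of that conversion, not part of the utility in this post; the function name to_single_line and the shortened sample document are made up for illustration.

```python
import json

def to_single_line(pretty_json_text):
    """Collapse a pretty-printed JSON document into a single line."""
    document = json.loads(pretty_json_text)
    # Without an indent argument, json.dumps emits no newlines
    return json.dumps(document)

pretty = """
{
  "Name": "My Widget",
  "Parts": [
    {"Name": "Screws", "Quantity": 100}
  ]
}
"""

line = to_single_line(pretty)
print(line)
```

The content is unchanged; only the layout differs, which is why keeping the readable file as the source and letting a script handle the upload is the more comfortable workflow.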


Enter Python

I just recently started tinkering with Python and love it for its simplicity, so I decided to write a quick app to post JSON documents to a MongoLab instance. All told, it took me about an hour to figure things out, and I want to share the code as it may be useful to somebody else trying to connect to a MongoLab database using Python. If you want to skip the explanation, just scroll to the bottom for the full source code listing.
The first thing we need to do is import the necessary modules
import sys
import pymongo
from pymongo import MongoClient
import json
The pymongo driver is the officially supported MongoDB driver from 10gen and works nicely. We'll need the MongoClient class from that module, the sys module for reading the command-line arguments, and the json module for parsing the file. I'll skip over the dirty argument parsing code here, but it's included in the full listing below for reference. For now we'll pretend that all of the arguments are hard-coded.
Next we need to create a connection to our database.
connection = MongoClient("ds1111.mongolab.com", 37000)
db = connection["testdb"]
# MongoLab has user authentication
db.authenticate("username", "password")
I just made up the server and port values, but you can get the real ones straight from your MongoLab dashboard, where it shows the connection URI. I also made up the database name testdb for this example; if you're using Heroku, you can get the database name from your MongoLab dashboard as well. Below is a sample from my MongoLab dashboard.
[Image: MongoLab Dashboard]
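Since the dashboard shows a connection URI, it may be handy to connect with that URI directly. MongoClient accepts a mongodb:// URI as its first argument, and in PyMongo 4 and later db.authenticate() no longer exists, so the credentials have to be passed this way (or as keyword arguments). Below is a minimal sketch, assuming Python 3; the helper names build_mongo_uri and connect are hypothetical, and the host, port, and credentials are the same made-up values used in this post.

```python
from urllib.parse import quote_plus

def build_mongo_uri(user, password, host, port, db_name):
    """Assemble a mongodb:// URI, percent-escaping the credentials."""
    return "mongodb://%s:%s@%s:%d/%s" % (
        quote_plus(user), quote_plus(password), host, port, db_name)

def connect(uri):
    """Open a client from a URI (requires pymongo to be installed)."""
    # Imported lazily so build_mongo_uri can be used without pymongo
    from pymongo import MongoClient
    return MongoClient(uri)

uri = build_mongo_uri("username", "p@ssword", "ds1111.mongolab.com", 37000, "testdb")
print(uri)
```

Escaping matters because characters such as @ or : in a password would otherwise be read as part of the URI structure.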
Once you have a connection you can access your collections.
widgets = db.widgets
This will set the widgets variable to a collection in your testdb database called widgets. Now we're ready to read the .json file into a variable.
# The with block closes the file once it has been parsed
with open("widget.json") as json_data:
  new_widget = json.load(json_data)
At this point you should have an object in new_widget that is ready to be loaded into the Mongo database. You could check first whether a document with similar attributes already exists and update it, but for now we'll just create a brand new document each time and let it get an automatically generated _id.
widgets.insert(new_widget)
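The "check first and update" alternative mentioned above can be sketched as an upsert. This is not what the utility in this post does; it assumes PyMongo 3.0 or later (where update_one and insert_one exist, and where insert() is deprecated and later removed in 4.0), and it assumes the Name field identifies a widget. The function names build_upsert and upsert_widget are made up for illustration.

```python
def build_upsert(document, key="Name"):
    """Return the (filter, update) pair for an upsert keyed on one field."""
    query = {key: document[key]}
    update = {"$set": document}
    return query, update

def upsert_widget(collection, document, key="Name"):
    """Update the matching document, or insert it if none matches."""
    query, update = build_upsert(document, key)
    # collection is expected to be a pymongo Collection (PyMongo 3.0+)
    return collection.update_one(query, update, upsert=True)

query, update = build_upsert({"Name": "My Widget", "Price": 19.95})
print(query)
print(update)
```

With a real collection this would be called as upsert_widget(db.widgets, new_widget). Running the utility twice on the same file would then leave one document in the collection instead of two.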
That's it! That's all of the Python code required to read a .json file and create a new document in a MongoLab database collection. I used the following file to test my code.
{
  "Name": "My Widget",
  "Type": 3,
  "Price": 19.95,
  "Description": "A simple widget that does practically nothing",
  "Parts": [
    {"Name": "Screws", "Quantity": 100},
    {"Name": "Speaker", "Quantity": 1},
    {"Name": "Batteries", "Quantity": 2}
  ]
}
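To see what json.load hands to the driver for a file like this one, the short sketch below parses the same document from a string and checks that the top level is a single JSON object, since that is what ends up as one document in the collection. The helper name load_document is hypothetical and not part of the utility.

```python
import json

def load_document(text):
    """Parse JSON text and insist that the top level is a single object."""
    parsed = json.loads(text)
    if not isinstance(parsed, dict):
        raise ValueError("expected a JSON object at the top level")
    return parsed

widget_text = """
{
  "Name": "My Widget",
  "Type": 3,
  "Price": 19.95,
  "Description": "A simple widget that does practically nothing",
  "Parts": [
    {"Name": "Screws", "Quantity": 100},
    {"Name": "Speaker", "Quantity": 1},
    {"Name": "Batteries", "Quantity": 2}
  ]
}
"""

widget = load_document(widget_text)
# JSON objects become dicts, arrays become lists, numbers become int or float
print(type(widget).__name__, type(widget["Parts"]).__name__)
```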
And here is the resulting document showing up in my MongoLab database.
[Image: MongoLab Document]
Here is the full listing of the quick and dirty application I wrote to test the concept.
#!/usr/bin/python

import sys
import pymongo
from pymongo import MongoClient
import json

numArgs = len(sys.argv)
args = {}
numDocsUpdated = 0
processingArg = False
currentArg = ""

for i, arg in enumerate(sys.argv):
  if i == 0:
    continue

  if processingArg:
    args[currentArg] = arg
    processingArg = False

  if arg.startswith("--"):
    currentArg = arg.lstrip('-')
    args[currentArg] = ""
    processingArg = True

connection = None
db = None
widgets = None

if args.get("file") is None:
  print("You need to specify a JSON file to upload using the --file argument")
  sys.exit(1)

required = ("server", "db", "port", "user", "pw")
if all(args.get(name) is not None for name in required):

  # Use the command line args
  connection = MongoClient(args.get("server"), int(args.get("port")))
  db = connection[args.get("db")]

  # MongoLab has authentication
  db.authenticate(args.get("user"), args.get("pw"))

else:
  # Just connect to the local machine database
  connection = MongoClient()
  db = connection.test_db

widgets = db.widgets

# Parse the JSON file; the with block closes it afterwards
with open(args.get("file")) as json_data:
  widget = json.load(json_data)

# Add the widget to the database

print("Adding item to database")
widgets.insert(widget)
numDocsUpdated = numDocsUpdated + 1

print("")
print("")

print("DBLoader updated %d documents in the database" % numDocsUpdated)
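The hand-rolled argument loop in the listing above could also be replaced with the standard library's argparse module. The following is only a sketch of that alternative, using the same option names as the listing; the function names parse_args and use_remote are made up for illustration.

```python
import argparse

def parse_args(argv=None):
    """Parse the same options the listing above reads from sys.argv."""
    parser = argparse.ArgumentParser(description="Load a JSON document into MongoDB")
    parser.add_argument("--file", required=True, help="JSON file to upload")
    parser.add_argument("--server", help="database host")
    parser.add_argument("--port", type=int, help="database port")
    parser.add_argument("--db", help="database name")
    parser.add_argument("--user", help="database user")
    parser.add_argument("--pw", help="database password")
    return parser.parse_args(argv)

def use_remote(args):
    """True when every remote connection option was supplied."""
    return None not in (args.server, args.port, args.db, args.user, args.pw)

args = parse_args(["--server", "ds1111.mongolab.com", "--port", "37000",
                   "--db", "testdb", "--user", "username", "--pw", "password",
                   "--file", "widget.json"])
print(args.port, use_remote(args))
```

Calling parse_args() with no argument reads sys.argv, and argparse then takes care of the missing --file error message, the integer conversion of --port, and a --help screen.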
Hopefully these instructions are useful to someone out there. If you found this helpful, then please let me know about it. Likewise, if you have any questions, I'd love to help you get it working in your environment.

6 comments:

  1. Brilliant - after I wasted hours trying to integrate a simple python script with Azure's documentdb, I read this one article and get it running with mongolabs in about 20 mins. Thanks!

    1. Awesome Josh. Glad it was useful. Thanks for taking the time to let me know. I really appreciate it.

  2. Hi,

    Thanks a lot..You solved my authentication failed issue.

    THanks,
    Charles.
