Monday, August 29, 2011

MongoDB - Scalable Solution for Persistence of Media Files

MongoDB is a non-relational, schemaless database that stores records in BSON, a binary representation of JSON. Its main strengths are scalability, easy-to-integrate APIs (available for many web application development languages) and simple command-line usage on various operating systems. Records are persisted in collections (the counterpart of tables in relational databases), and writes are non-transactional, which makes database operations considerably faster than in relational databases. The trade-off is that MongoDB should not be used where a lost record cannot be accepted, as in banking and e-commerce systems.

Some Use Cases of MongoDB

Being schemaless and non-transactional makes MongoDB a good fit for the following scenarios:

  • Archiving: the schema of a relational database changes over time, which makes it difficult to archive old data in a relational store. A schemaless database can keep records of different shapes side by side.
  • Logging: inserts are fast in MongoDB because they are non-transactional. However, the same property means that a few inserts may be missed over a large number of insertions.
  • Real Time Analytics: it can be used to track the real-time performance metrics (page views, unique visits, etc.) of a given website.
  • User Information Persistence for Identity Provider Systems: user information such as registration, ratings, session data and profile can be saved in MongoDB for Identity Provider or SSO systems.

Using MongoDB for File Storage

GridFS (Grid File Store) is a MongoDB specification under which a large file is saved by splitting it into smaller chunks of data (256 KB each by default). The file is saved using two collections:

  • files: the meta-information such as object id, size, insertion date and chunk size goes here.
  • chunks: the file data is saved in this collection, one document per chunk.
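
To make the chunking idea concrete, here is a minimal sketch in plain Java. It does not call MongoDB at all: the class name ChunkSplitSketch and its split method are made up for this illustration, and the only figure taken from the text above is the 256 KB default. It just shows how a byte array gets cut into numbered, fixed-size pieces, which is roughly the bookkeeping a GridFS driver does before writing the chunk documents.

```java
import java.util.ArrayList;
import java.util.List;

/** Illustrative only: mimics how a GridFS-style store cuts a payload into chunks. */
final class ChunkSplitSketch {

    /** Default chunk size quoted in the text (256 KB). */
    static final int DEFAULT_CHUNK_SIZE = 256 * 1024;

    /** Splits data into consecutive chunks; only the last chunk may be shorter. */
    static List<byte[]> split(byte[] data, int chunkSize) {
        if (chunkSize <= 0) {
            throw new IllegalArgumentException("chunkSize must be positive");
        }
        List<byte[]> chunks = new ArrayList<>();
        for (int offset = 0; offset < data.length; offset += chunkSize) {
            // The remaining byte count caps the size of the final chunk.
            int length = Math.min(chunkSize, data.length - offset);
            byte[] chunk = new byte[length];
            System.arraycopy(data, offset, chunk, 0, length);
            chunks.add(chunk);
        }
        return chunks;
    }

    public static void main(String[] args) {
        byte[] file = new byte[600 * 1024]; // a hypothetical 600 KB media file
        List<byte[]> chunks = split(file, DEFAULT_CHUNK_SIZE);
        for (int n = 0; n < chunks.size(); n++) {
            System.out.println("chunk n=" + n + " length=" + chunks.get(n).length);
        }
    }
}
```

For the hypothetical 600 KB payload in main, the split gives three chunks: two full 256 KB chunks and a final 88 KB one.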

Insertion API using GridFS

Inserting a record requires its meta-information, passed as key-value pairs in a Map. The other required parameters are the file data as a byte array and the collection name, which is used to establish the connection with the server.
import java.util.Map;
import com.mongodb.DB;
import com.mongodb.DBCollection;
import com.mongodb.gridfs.GridFS;
import com.mongodb.gridfs.GridFSInputFile;

public String insertContent(byte[] fileData, String myCollection, Map<String, String> metainfo)
        throws MongoInsertionException, RepositoryConnectionException {
    mylogger.debug("Inserting record in Mongo Database.");
    DBCollection dbCollection = getMongoDBConnection(myCollection);
    DB db = dbCollection.getDB();
    GridFS myFS = new GridFS(db, myCollection);
    GridFSInputFile gridFileInput = myFS.createFile(fileData);
    // Copy every caller-supplied key/value pair onto the file document.
    for (Map.Entry<String, String> entry : metainfo.entrySet()) {
        gridFileInput.put(entry.getKey(), entry.getValue());
    }
    // Assumed step, missing from the listing as extracted: save the file and
    // read back its generated id (otherwise mongoid would always be null).
    gridFileInput.save();
    Object generatedId = gridFileInput.getId();
    String mongoid = (generatedId == null) ? null : generatedId.toString();
    if (mongoid == null) throw new MongoInsertionException();
    mylogger.debug("RECORD ADDED SUCCESSFULLY IN MONGO!!!");
    return mongoid;
}

Deletion of Record in GridFS

A record is deleted by passing its object id to the GridFS instance.
import org.bson.types.ObjectId;

private void deleteRecord(String id, String collectionName)
        throws RepositoryConnectionException {
    DBCollection dbCollection = getMongoDBConnection(collectionName);
    DB db = dbCollection.getDB();
    GridFS myFS = new GridFS(db, collectionName);
    // Removes the files entry together with its chunks.
    myFS.remove(new ObjectId(id));
}

Updating a Record in MongoDB using GridFS

File data stored through GridFS cannot be updated in place. An update can instead follow this logic:
  • Insert a record with the new file data.
  • In the meta-information of the newly added record (child), add the id of the old record.
  • In the meta-information of the old record (parent), add the id of the newly added record.
import java.util.HashMap;
import java.util.Map;
import com.mongodb.gridfs.GridFSDBFile;

public boolean updateRecord(String id, String collectionName, byte[] updatedValue)
        throws MongoInsertionException, RepositoryConnectionException {
    // Store the new data as a child record that points back at its parent.
    Map<String, String> metainfo = new HashMap<String, String>();
    metainfo.put("parentid", id);
    String newid = insertContent(updatedValue, collectionName, metainfo);
    /* update meta information of old id */
    DBCollection dbCollection = getMongoDBConnection(collectionName);
    DB db = dbCollection.getDB();
    GridFS myFS = new GridFS(db, collectionName);
    GridFSDBFile gridFSDBFile = myFS.find(new ObjectId(id));
    /* if the parent already points at an earlier update, delete that record */
    String currentUpdatedRecord = (String) gridFSDBFile.get("updatedversion");
    if (currentUpdatedRecord != null && !"".equals(currentUpdatedRecord)) {
        deleteRecord(currentUpdatedRecord, collectionName);
    }
    gridFSDBFile.put("updatedversion", newid);
    // Assumed step, not in the listing as extracted: persist the changed meta-information.
    gridFSDBFile.save();
    mylogger.debug("The meta information updatedversion updated to value:" + newid);
    return true;
}
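
The parent/child bookkeeping is easier to follow without the database in the way, so here is a small in-memory sketch of it. Nothing in it touches MongoDB: VersionLinkSketch, its HashMap "store" and the counter-based ids are all hypothetical stand-ins. Only the "parentid" and "updatedversion" keys come from the code above.

```java
import java.util.HashMap;
import java.util.Map;

/** In-memory stand-in for the files collection: id -> meta-information. Not MongoDB code. */
final class VersionLinkSketch {
    private final Map<String, Map<String, String>> files = new HashMap<>();
    private int nextId = 1;

    /** Stores a record and returns its id (a counter here, an ObjectId in MongoDB). */
    String insert(Map<String, String> metainfo) {
        String id = "id-" + nextId++;
        files.put(id, new HashMap<>(metainfo));
        return id;
    }

    /** Adds a replacement (child) record for parentId and repoints the parent at it. */
    String update(String parentId) {
        Map<String, String> parent = files.get(parentId);
        if (parent == null) {
            throw new IllegalArgumentException("unknown id: " + parentId);
        }
        Map<String, String> childMeta = new HashMap<>();
        childMeta.put("parentid", parentId);
        String childId = insert(childMeta);
        // put returns the previous pointer, i.e. the child that is now stale.
        String stale = parent.put("updatedversion", childId);
        if (stale != null) {
            files.remove(stale);
        }
        return childId;
    }

    /** Returns the id whose data a search for the given id should read, or null if unknown. */
    String resolve(String id) {
        Map<String, String> record = files.get(id);
        if (record == null) {
            return null;
        }
        String newer = record.get("updatedversion");
        return newer != null ? newer : id;
    }

    boolean exists(String id) {
        return files.containsKey(id);
    }

    public static void main(String[] args) {
        VersionLinkSketch store = new VersionLinkSketch();
        String original = store.insert(new HashMap<String, String>());
        store.update(original);
        String latest = store.update(original);
        System.out.println(original + " now resolves to " + latest);
    }
}
```

One consequence of this design is that callers keep using the original (parent) id forever, and at most one child exists per parent at any time, because each update deletes the previous child.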

Searching of Record in GridFS

With the update logic above, a search starts from the parent record and follows the child id saved in its meta-information to reach the latest file data.

import java.io.InputStream;
import org.apache.commons.io.IOUtils;

public static byte[] searchRecord(String id, String collectionName)
        throws RepositoryConnectionException {
    try {
        mylogger.debug("id=" + id + " collectionName=" + collectionName);
        DBCollection dbCollection = getMongoDBConnection(collectionName);
        DB db = dbCollection.getDB();
        GridFS myFS = new GridFS(db, collectionName);
        GridFSDBFile gridFSDBFile = myFS.find(new ObjectId(id));
        // Follow the parent's "updatedversion" pointer to the newest data, if any.
        String newid = (String) gridFSDBFile.get("updatedversion");
        if (newid != null && !"".equals(newid)) {
            gridFSDBFile = myFS.find(new ObjectId(newid));
        }
        InputStream in = gridFSDBFile.getInputStream();
        return IOUtils.toByteArray(in);
    } catch (IllegalArgumentException e) {
        // Thrown by new ObjectId(...) when the id string is malformed.
        mylogger.error("UNABLE TO SEARCH RECORD!!!", e);
        return null;
    } catch (Exception e) {
        mylogger.error("UNABLE TO SEARCH RECORD!!!", e);
        return null;
    }
}
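
The IllegalArgumentException branch in searchRecord is the one hit when the id string cannot be turned into an ObjectId, which is written as 24 hexadecimal characters (12 bytes). If you would rather reject a bad id before going to the database at all, a pre-check along these lines could work. This is only a sketch: ObjectIdCheckSketch and looksLikeObjectId are hypothetical names, and the check uses plain Java rather than any driver call.

```java
/** Illustrative pre-check for ObjectId strings; plain Java, no driver classes involved. */
final class ObjectIdCheckSketch {

    /** Returns true when id is exactly 24 hexadecimal characters. */
    static boolean looksLikeObjectId(String id) {
        if (id == null || id.length() != 24) {
            return false;
        }
        for (int i = 0; i < id.length(); i++) {
            char c = id.charAt(i);
            boolean hex = (c >= '0' && c <= '9')
                    || (c >= 'a' && c <= 'f')
                    || (c >= 'A' && c <= 'F');
            if (!hex) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        // The sample id below is made up for the demonstration.
        System.out.println(looksLikeObjectId("4e5b7c2f9a1d3e0012345678"));
        System.out.println(looksLikeObjectId("not-an-id"));
    }
}
```

A caller could run this check on the incoming id first and return early, so that the catch block is left for genuinely unexpected failures.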
