Fixed schema: RDBMSs offer many niceties (referential integrity, relationships, triggers...) but force you to store every object in a fixed schema (migrations are a pain!)
Basically, there are several different kinds of NoSQL databases:
Key/Value (Scalaris, Tokyo Cabinet, Voldemort): store data in key/value pairs. They are very fast and highly scalable, but they are difficult to query and make it harder to model real-world problems
Tabular (Cassandra, HBase, Hypertable, Google BigTable): store data in tabular structures, but the columns may change over time and each row may have only a subset of the columns
Document Oriented (CouchDB, MongoDB, Riak, Amazon SimpleDB): like Key/Value, but they let you nest more values under a key. This is a nice paradigm for programmers because it becomes easy, especially with scripting languages (Python, Ruby, PHP...), to implement a one-to-one mapping between the objects in the code and the objects (documents) in the database
Graph (Neo4j, InfoGrid, AllegroGraph): store objects and relationships as nodes and edges of a graph. For situations that fit this model, like hierarchical data, this solution can be much faster than the other ones
MongoDB is a document-oriented NoSQL database. With such a database it is very easy to map the programming objects we want to store to documents in the database. JSON is a well-suited standard for this mapping, and that is what MongoDB uses: it stores JSON documents in the database.
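To make the object-to-document mapping concrete, here is a minimal sketch that uses only Python's standard `json` module. The county record and its field names are made up for illustration; they are not taken from the shapefile used later.

```python
import json

# A hypothetical record, written as a plain Python dict with a nested value.
county = {
    "name": "Example County",
    "state": "XX",
    "population": 12345,
    "geom": {"type": "Point", "coordinates": [-100.0, 40.0]},
}

# Serialize the object to a JSON document...
as_json = json.dumps(county, sort_keys=True)

# ...and read it back: the nested structure survives the round trip.
restored = json.loads(as_json)
```

The point is that no schema or mapping layer is needed: the nested dict is already shaped like the document that would be stored.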
To improve performance, MongoDB stores JSON in an efficient binary format called BSON. BSON stands for Binary JSON and is a binary serialization of JSON-like documents.
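To give an idea of what "binary serialization" means here, the following is a hand-rolled sketch of the BSON layout for a flat document. It is only an illustration covering a few value types, not MongoDB's or PyMongo's implementation; in practice the driver does the encoding for you. The function name `encode_bson` is my own.

```python
import struct


def encode_bson(doc):
    """Encode a flat dict of str, bool, int (32-bit) and float values as BSON."""
    body = b""
    for key, value in doc.items():
        name = key.encode("utf-8") + b"\x00"  # field name as a C string
        if isinstance(value, bool):  # checked before int: bool is an int subclass
            body += b"\x08" + name + (b"\x01" if value else b"\x00")
        elif isinstance(value, str):
            data = value.encode("utf-8") + b"\x00"
            # string: type 0x02, then byte length (including the null), then bytes
            body += b"\x02" + name + struct.pack("<i", len(data)) + data
        elif isinstance(value, int):
            body += b"\x10" + name + struct.pack("<i", value)  # int32, little endian
        elif isinstance(value, float):
            body += b"\x01" + name + struct.pack("<d", value)  # 64-bit double
        else:
            raise TypeError("unsupported value type: %r" % type(value))
    # The total size counts its own 4 bytes plus the trailing 0x00 terminator.
    return struct.pack("<i", len(body) + 5) + body + b"\x00"


encoded = encode_bson({"hello": "world"})
```

Every element carries its type and length up front, so a reader can skip over fields without parsing them as text, which is where the efficiency over plain JSON comes from.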
For more information on the topic I highly recommend reading this interesting article: NoSQL Ecosystem
Can I use a NoSQL database to store GIS data?
In the last few months I have become progressively more interested in this topic, and I decided to test a way to store and read GIS data in a NoSQL database.
Managing GIS data with NoSQL could be a winning approach in circumstances where performance and scalability are major issues.
To do so I decided to use MongoDB, because of its nice Python API and its rich query expression syntax.
But note that it should be very easy to reproduce my experiment with other NoSQL databases.
To manage GIS data, it was not difficult to choose GDAL/OGR, the popular (and beautiful!) FOSS toolkit for GIS developers.
I am not sure whether documenting my experiments will be useful to anyone, but I decided to assemble this tutorial anyway, at least to document my experience in case I later need to consider a NoSQL technology for production.
Please note that there have already been several attempts to manage GIS data with NoSQL databases (and surely I am missing some others, so please feel free to add a comment or email me about this):
For the code in this tutorial I have used this shapefile: Census usa counties 2000.
Download it if you want to follow this sample step by step.
Now it is time to run some code. Just copy and paste the following scripts into a file and execute it:
the code is heavily commented, so understanding what is going on should be simple
for accessing the MongoDB entities I am using the PyMongo API
for accessing the shapefile features I am using the OGR API
to import a shapefile into a MongoDB collection, I iterate over all the features of the shapefile with OGR and copy them into the MongoDB collection. Note that I use a query filter (in SQL form) if I do not want to copy all the features
to export a shapefile from a MongoDB collection, I iterate over all the documents of the collection with PyMongo and copy them into a new shapefile. If I do not want to export the whole collection to the shapefile, I can provide a query filter (in NoSQL form)
finally, I run some queries on the features stored in MongoDB
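As a rough illustration of the import step described above, here is a minimal sketch. It is not the original script: the function names, the `geom` key, storing the geometry as WKT text and the use of `insert_one` (the call in recent PyMongo versions) are all my assumptions.

```python
def feature_to_document(feature):
    """Convert one OGR feature into a plain dict that PyMongo can store."""
    document = {}
    # Copy every attribute field, keyed by its field name.
    for index in range(feature.GetFieldCount()):
        name = feature.GetFieldDefnRef(index).GetName()
        document[name] = feature.GetField(index)
    # Store the geometry as WKT text (an assumption of this sketch).
    geometry = feature.GetGeometryRef()
    document["geom"] = geometry.ExportToWkt() if geometry is not None else None
    return document


def import_layer(layer, collection):
    """Copy each feature of an OGR layer into a MongoDB collection."""
    count = 0
    for feature in layer:
        collection.insert_one(feature_to_document(feature))
        count += 1
    return count


def import_shapefile(path, collection, where=None):
    """Open a shapefile, optionally filter it SQL-style, and import it."""
    from osgeo import ogr  # imported here so the helpers above work without GDAL

    datasource = ogr.Open(path)
    if datasource is None:
        raise IOError("could not open %s" % path)
    layer = datasource.GetLayer(0)
    if where:
        # For example "STATE_NAME = 'Texas'"; the field name is hypothetical.
        layer.SetAttributeFilter(where)
    return import_layer(layer, collection)
```

With a MongoDB server running, a collection could be obtained with something like `pymongo.MongoClient()["gisdb"]["counties"]` (database and collection names are hypothetical) and passed to `import_shapefile`. The NoSQL-style counterpart of the SQL filter is then a plain dict given to `collection.find`, for instance `{"STATE_NAME": "Texas"}`, again assuming such a field exists in the data.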