“We need a database therefore we need SQL
Server.” How many times have you heard this? This is so often the default
position for people wanting to persist data on the server and, as the old adage
goes, SQL Server becomes the hammer and every database requirement the nail. It
has simply become “the standard” for many people.
SQL Server is a beast. It does a lot. It’s awesome at what it
does, but it always does
more than I actually need. Azure Table Storage is simple, at least
relatively speaking. You have a table, it’s partitioned, it has rows. You put
stuff into that table then you query it back out, usually by referencing the
partition and row keys.
One of the things that attracted me to Table
Storage is that it’s not constrained to a server or a VM or any logical
construct that’s governed by finite resources (at least not “finite” within a
reasonable definition), rather it’s a service. You don’t pay for CPU and RAM or
put it on a particular server, you
pay for the number of transactions and the amount of storage you need.
So what’s the “bad” bit? It’s something new to
learn, a “foreign entity” to most people, if you like. It has its own
idiosyncrasies, quite different ways of working with data and interacting with
the system and you can’t SELECT * FROM Foo. But it’s easy and it is massively impressive and
that’s what I want to walk you through here now.
Blobs, queues and tables
In days gone by, binary data loaded into a website (such as images in a content
management system) would be stored either on the file system in IIS or within
SQL Server. Both options cause problems of different natures and varying
degrees. In Azure, you’d load it into blob storage and that’s one of
the storage types available. There are other uses for blob storage as well but
there’s no need to go into that here.
Another storage type is “queues” which are essentially just message queues. They’re
very awesome and serve a very valuable role in providing asynchronicity between
system components. Again, I won’t go into them here but read up on How to use the
Queue Storage Service if it sounds interesting.
And finally there’s Table Storage which is obviously the focus of this post. The
main point I wanted to make was that these three paradigms all exist within
each and every storage account whether you elect to use them or not.
One storage account can have many tables.
Each table has many partitions.
Each partition has many rows.
Each row has a partition key, a row key and a timestamp.
You store an entity in the row and that entity inherits from Microsoft.WindowsAzure.Storage.Table.TableEntity.
You create a partition and a row by inserting an entity with a partition key and a row key.
You retrieve a row by searching with the partition and row key and casting the result back to the entity type.
That’s it in a nutshell and it’s not even the simplistic version – that’s just how it
works. There are features and nuances and other things you can do with it but
that’s the bulk of it and it’s all I really needed in order to write this post.
Conceptually, partitions are not that different to the alphabetised sections of a phonebook
(remember those paper ones they used to dump on your doorstep?) Rather than
just chucking all the numbers randomly into the book, they’re nicely
distributed into 26 different “partitions” (A-Z) then of course each person is
neatly ordered alphabetically within there. It does wonderful things in terms
of the time it takes to actually find what you need.
In Azure Table Storage, partitions can have a similarly big impact on performance
because the partition is the unit by which the data gets physically distributed and looked up.
Let’s do the “Hello World” of Table Storage using the data structure described above.
Now we’ll use that access key and whack it into a connection string
and it looks just like this:
<connectionStrings>
<add name="StorageConnectionString"
connectionString="DefaultEndpointsProtocol=https;AccountName=haveibeenpwned;AccountKey=mykey" />
</connectionStrings>
Just like a SQL connection string (kinda). Note the endpoint protocol – you can go
HTTP or HTTPS. Clearly the secure option is preferable.
Into the code itself, we begin by getting a reference to the storage instance:
var connString = ConfigurationManager.ConnectionStrings["StorageConnectionString"].ConnectionString;
var storageAccount = CloudStorageAccount.Parse(connString);
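For reference, the snippets throughout this post assume the classic WindowsAzure.Storage package (that’s where the TableEntity base class mentioned earlier lives), so something like the following using directives are in play:
using System.Configuration;                 // ConfigurationManager
using Microsoft.WindowsAzure.Storage;       // CloudStorageAccount
using Microsoft.WindowsAzure.Storage.Table; // CloudTableClient, TableEntity, TableOperation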
Now we’ll get a reference to the table within that storage instance and if it
doesn’t already exist, we’ll just create it (obviously just a first-run thing):
var tableClient = storageAccount.CreateCloudTableClient();
var table = tableClient.GetTableReference("BreachedAccount");
table.CreateIfNotExists();
Before we can start chucking stuff in there, we need that entity I mentioned earlier
so let’s create a BreachedAccount that inherits from TableEntity.
public class BreachedAccount : TableEntity
{
    public BreachedAccount() { }

    public string Websites { get; set; }
}
Notice how we haven’t created partition or row keys? They’re inherited from the
TableEntity and what it means is that we can now do this:
var breachedAccount = new BreachedAccount
{
    PartitionKey = "bar.com",
    RowKey = "foo",
    Websites = "Adobe;Stratfor"
};
var insertOperation = TableOperation.Insert(breachedAccount);
table.Execute(insertOperation);
If we were to add another attribute to the BreachedAccount
entity then we’d see that stored too, even though the data already in there
conforms to a different schema. That’s the neat thing about many NoSQL database
implementations in that you’re not constrained to a single schema within
the one container.
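Just to sketch what that looks like (BreachDate here is a hypothetical property, purely for illustration), the entity can simply grow a new attribute later on:
// The same entity at some later point with a hypothetical extra attribute added
public class BreachedAccount : TableEntity
{
    public BreachedAccount() { }

    public string Websites { get; set; }

    // New property - no schema migration required; rows inserted before it
    // existed still read back fine and just return null for BreachDate
    public DateTime? BreachDate { get; set; }
}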
Now if I want to insert 154 million records I don’t really want
to do 154 million individual inserts as each connection does have some
overhead. What I want to do is batch it and
that looks more like this:
var batch = new TableBatchOperation();
batch.Insert(breachedAccount1);
batch.Insert(breachedAccount2);
batch.Insert(breachedAccount3);
table.ExecuteBatch(batch);
Batching is about more than just committing a
bunch of rows at one time, it also has an impact on cost. Remember how Azure
Table Storage charges you $0.0000001 per “transaction”? I want to bring
that cost down a little and I can do this by batching because a batch is one transaction. However, there are some caveats.
Firstly, you can only batch records into the same partition. I can’t insert foo@bar.com
and foo@fizz.com within the same batch. However, I can insert
foo@bar.com and buzz@bar.com at the same time as I’m using the domain as the
partition key. What this meant is that when I wrote the code to process the
records I had to sort the data by domain so that I could keep all
the records for the partition together and batch them. This
makes sense in the context of Julie’s earlier comment about the partition being
tied to a machine somewhere.
Secondly, you can only batch up to 100 rows at a time. Those 32 million Hotmail
addresses? That’s 320,000 batches thank you very much. This meant my importer
needed to not only enumerate through accounts ordered by domain, but each time
it had a collection of 100 it needed to commit them before moving on. Per the
previous point, it obviously also had to commit the batch as soon as it got to
the next domain as it couldn’t commit to multiple partitions
in the one batch.
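As a rough sketch only (not the actual importer, and assuming LINQ plus an accounts sequence already populated with the domain as PartitionKey and the alias as RowKey), the shape of that loop is something like this:
// Group by partition key (the domain), then chunk each group into batches of 100
var batches = accounts
    .GroupBy(a => a.PartitionKey)
    .SelectMany(g => g
        .Select((a, i) => new { a, i })
        .GroupBy(x => x.i / 100, x => x.a));

foreach (var chunk in batches)
{
    var batch = new TableBatchOperation();
    foreach (var account in chunk)
        batch.Insert(account);
    table.ExecuteBatch(batch); // one transaction per batch of up to 100 rows
}
In reality you wouldn’t group 154 million rows in memory like this – you’d stream the pre-sorted data and flush the batch every 100 rows or whenever the domain changes – but the constraints are the same either way.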
Ok, so we’ve got data in the system, but that’s just the start. Of course it’s also
the hardest bit so that’s good, let’s now pull records back out. Obviously I’ve
designed the whole thing to be ultra fast in terms of reading data based on the
email address. Remember that this is what the partition and row keys are composed
of (the domain and the alias respectively).
It’s pretty much the same deal as earlier in terms of needing a storage account
object and then a table client after which you can just invoke the “Retrieve”
method and pass it the partition key (the domain) and the row key (the alias):
var retrieveOperation = TableOperation.Retrieve<BreachedAccount>("bar.com", "foo");
var retrievedResult = table.Execute(retrieveOperation);
var breachedAccount = (BreachedAccount)retrievedResult.Result;
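One small thing worth calling out (this is the classic SDK behaviour as I understand it): if there’s no row under that partition and row key, Execute doesn’t throw, the Result on the TableResult just comes back null, so a lookup for an account that isn’t in any breach looks something like this:
var retrieveOperation = TableOperation.Retrieve<BreachedAccount>("bar.com", "foo");
var retrievedResult = table.Execute(retrieveOperation);
var breachedAccount = retrievedResult.Result as BreachedAccount;

if (breachedAccount == null)
{
    // Nothing stored under that partition and row key - the account isn't in any breach
}
And that’s really the point of composing the keys out of the email address: every lookup is a single point query against one partition and one row, which is what makes it so fast.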