DataTables Are Useful in SharePoint – So Why Not Use Them?

Let me start with a truism: if you want to query SharePoint data, the query should run directly on SQL Server. After all, it’s designed to deal with large volumes of data using all kinds of high-performance algorithms. That’s why running a CAML query, i.e. the most widely used kind of query in SharePoint, is de facto equivalent to running a native SQL query in the database. Plus, the general idea of having a whole server farm is that each server can specialize in a certain role: the database retrieves data and the front-ends handle client requests. It’s an orderly system where everything has its place, and usually it’s best to leave it like that.

Real life, being real life, usually provides surprises – or at least users with requirements that fit poorly into your otherwise perfectly designed data structure, made in different times and for different reasons. In a dynamic business environment this is practically unavoidable, and it’s something we, developers, have to deal with. The problem is that, depending on the specific requirements, an ill-fitting piece can upset your system entirely, that is – for everyone. The most obvious solution – and usually the most costly – is to redo your existing system so that the ill-fitting part becomes a bit less ill-fitting.

That’s probably not the most appealing thought. In the case of an established system with years of life behind it, the sheer workload and the risk of pretty much everything going wrong at some point are usually enough for decision makers to banish the idea right at the start. So there may be no choice but to try to somehow fit the odd part into the bigger whole.

Even if the odd part has the potential to blow up the bigger whole entirely.

So how could you go about fitting said odd part that goes against set design? I would like to propose a certain approach that is often forgotten by SharePoint developers focused on using SharePoint features and missing a broader ASP.NET arsenal – an arsenal that’s still at their disposal, after all.

Let’s start with the following statement: if you need to query a data source many times in order to fulfill your requirements, you may consider storing part of that source in the memory of the server that is executing your business logic, and query the cached data instead of the database. This will obviously put an extra load on the server, but may prove beneficial system-wide, especially if you have multiple servers dealing with business logic (WFEs, for instance).

An example could be a fancy dashboard used by one department of your organization. As dashboards go, they can present the same data on one page in different ways, often juxtaposing data subsets so that users can see them from different angles. A single dashboard, depending on its complexity, could very well require a dozen or even several dozen queries against the data source in order to get everything it needs. If the queries aren’t too complex and don’t involve much data, that shouldn’t be a problem, assuming your database server is not underpowered and/or taxed with handling a barrage of requests. But what if your database server is stressed? And, to make things even more interesting, what if the query you need would be throttled because of the List View Threshold?

You can, obviously, force SharePoint to execute the query anyway, threshold or no threshold. Just set SPQuery.QueryThrottleMode to SPQueryThrottleOption.Override and run the code in an elevated context, like this:

SPSecurity.RunWithElevatedPrivileges(delegate()
{
    using (SPSite site = new SPSite("http://MySharePointPortal/sites/MySharePointSite"))
    {
        using (SPWeb web = site.OpenWeb())
        {
            SPQuery query = new SPQuery();
            query.Query = "";   // empty query text, returns all items
            query.QueryThrottleMode = SPQueryThrottleOption.Override;

            SPList list = web.GetList("/sites/MySharePointSite/Lists/MySharePointList");
            SPListItemCollection results = list.GetItems(query);
        }
    }
});

(Note that even with QueryThrottleMode set to SPQueryThrottleOption.Override, you have to execute the query in the context of an account that has Full Read or Full Control permissions on the Web Application in question – the query will then run with the limit for Auditors and Administrators instead of the standard List View Threshold. Executing the query within an SPSecurity.RunWithElevatedPrivileges block is the most obvious way, since it then runs in the context of the System Account, which also has the benefit of being non-personal, but in your case it may be better to use some other account.)
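
If you’d rather not run the query as the System Account, one option is to open the site with a dedicated account’s SPUserToken instead. A minimal sketch, assuming a hypothetical service account DOMAIN\svc_dashboard that has been granted Full Read on the Web Application:

// Resolve the dedicated account's token under elevation, then open a new SPSite
// in that account's context and run the throttled query there.
SPUserToken token = null;
SPSecurity.RunWithElevatedPrivileges(delegate()
{
    using (SPSite site = new SPSite("http://MySharePointPortal/sites/MySharePointSite"))
    using (SPWeb web = site.OpenWeb())
    {
        token = web.EnsureUser(@"DOMAIN\svc_dashboard").UserToken;   // hypothetical account name
    }
});

using (SPSite site = new SPSite("http://MySharePointPortal/sites/MySharePointSite", token))
using (SPWeb web = site.OpenWeb())
{
    SPQuery query = new SPQuery();
    query.QueryThrottleMode = SPQueryThrottleOption.Override;

    SPList list = web.GetList("/sites/MySharePointSite/Lists/MySharePointList");
    SPListItemCollection results = list.GetItems(query);
}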

The approach requiring SPQueryThrottleOption.Override is definitely not the best one, but in reality it might be the only viable option once you consider other aspects, like the time needed to redesign the whole system from the data structure upwards and the inherent risk that would involve. Still, if our fancy dashboard requires us to query the same big list 10, 20 or even more times in order to extract various subsets of data, and each of those queries exceeds the List View Threshold, forcing SQL Server to lock the entire table (i.e. the entire SharePoint content in that content database), it may prove to be, well, quite disastrous.

Instead, if your WFEs have power to burn, you might consider moving larger amounts of data into their memory and doing all the querying there.

Let’s use a simple sales portal as an example. Let’s assume our main focus is an Orders list containing a whole lot of information about the orders our Sales department makes, which need to be analyzed continuously. As one might imagine, the list is quite extensive – the result of years of good work on the Sales department’s part. Yet even though it holds only text data, the sheer number of records makes working with it a challenge, as queries against it nearly always hit the List View Threshold.

Let’s also assume that apart from the Orders list we’d need to include data on our company’s product portfolio, its customers and delivery partners that are involved in delivering said products to said customers.

Now let’s consider the following chunk of code:

SPSecurity.RunWithElevatedPrivileges(delegate()
{
    using (SPSite site = new SPSite("http://MySharePointPortal/sites/SalesPortal"))
    {
        using (SPWeb web = site.OpenWeb())
        {
            SPList lstOrders = web.GetList("/sites/SalesPortal/Lists/Orders");
            SPList lstProducts = web.GetList("/sites/SalesPortal/Lists/Products");
            SPList lstCustomers = web.GetList("/sites/SalesPortal/Lists/Customers");
            SPList lstDeliveryPartners = web.GetList("/sites/SalesPortal/Lists/DeliveryPartners");

            // Copy each list's items into an in-memory DataTable.
            // Note: for a list that already exceeds the List View Threshold, .Items itself
            // would be throttled; use an SPQuery with QueryThrottleMode = Override (as above)
            // or a narrowed query (as shown further down).
            DataTable dtOrders = lstOrders.Items.GetDataTable();
            DataTable dtProducts = lstProducts.Items.GetDataTable();
            DataTable dtCustomers = lstCustomers.Items.GetDataTable();
            DataTable dtDeliveryPartners = lstDeliveryPartners.Items.GetDataTable();
        }
    }
});

Doesn’t look like much, but because we retrieve data from several lists at once and then store it in the server’s memory, we don’t need to deal with the database anymore. We can do all the querying in memory. Plus, we can now use all the versatility of ADO.NET instead of the somewhat limited CAML language. Want to use LINQ to join two tables, even on non-lookup fields? Nothing simpler:

// Requires a reference to System.Data.DataSetExtensions (for AsEnumerable/Field<T>).
// Note: GetDataTable() often returns every column typed as string, so the numeric
// Field<int>/Field<decimal> reads below may need to be replaced with parsing the string values.
var OrdersDeliveryPartners = from orders in dtOrders.AsEnumerable()
                             join deliverypartners in dtDeliveryPartners.AsEnumerable()
                                 on orders.Field<int>("DeliveryPartnerID") equals deliverypartners.Field<int>("ID")
                             select new
                             {
                                 OrderNumber = orders.Field<string>("Number"),
                                 OrderValue = orders.Field<decimal>("Value"),
                                 DeliveryPartnerName = deliverypartners.Field<string>("Name"),
                                 DeliveryPartnerAddress = deliverypartners.Field<string>("Address")
                             };

We can use the standard DataTable.Select method as well:

DataRow[] result = dtOrders.Select("Value >= 10000");

…and so on. And, since we’re using standard DataTable objects, we can easily bind them to pretty much any .NET data presentation control. Plus, no matter how much we query, we won’t ever hit the dreaded List View Threshold, because we no longer deal with the database at all.
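
For example, assuming a hypothetical ASP.NET GridView named grdOrders declared in the web part or page markup, binding one of the cached tables is just a couple of lines:

// Bind the in-memory Orders table to a standard ASP.NET grid control;
// a filtered result (e.g. from DataTable.Select or a DataView) can be bound the same way.
grdOrders.DataSource = dtOrders;
grdOrders.DataBind();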

OK, so if everything’s so rosy, what’s the catch? Well, obviously the workload this approach puts on the WFEs (or other servers processing your business logic). All the data retrieved from the database has to be stored in memory (so if your WFEs already utilize most of their RAM, this may not be a viable solution for you), and then the CPU has to get involved in processing that data in order to find what you need. And ADO.NET is not as efficient at handling large data stores as SQL Server is.

Yet the approach can still be useful in lots of scenarios, especially if you refine it a bit. For starters, we can trim the initial volume of data by focusing only on a subset of the whole. For instance, we may not need every order in the Orders list for our dashboard, since the Sales department is only interested in orders processed in the last 60 days. We can include an appropriate Where clause in the initial CAML query to retrieve only the relevant data:

// Cut-off date in ISO 8601 format, as expected by CAML (SPUtility lives in Microsoft.SharePoint.Utilities)
string date = SPUtility.CreateISO8601DateTimeFromSystemDateTime(
    DateTime.UtcNow.Subtract(TimeSpan.FromDays(60)));

SPQuery queryOrders = new SPQuery();
queryOrders.Query = string.Format(@"
    <Where>
        <Geq>
            <FieldRef Name='Modified' />
            <Value Type='DateTime'>{0}</Value>
        </Geq>
    </Where>", date);
queryOrders.QueryThrottleMode = SPQueryThrottleOption.Override;

DataTable dtOrders = lstOrders.GetItems(queryOrders).GetDataTable();

And how dynamic does the dashboard need to be in the first place? Does the Sales department require the data to be updated as soon as it changes on the list, or are they interested in a general picture with, let’s say, a 24-hour window of precision? In the latter case we can cache the retrieved data with an appropriate time expiration policy in place, so that we need to do all the retrieval only once every 24 hours:

Cache.Insert("Orders", dtOrders, null, DateTime.UtcNow.AddHours(24), Cache.NoSlidingExpiration);

Cache.Insert("Products", dtProducts, null, DateTime.UtcNow.AddHours(24), Cache.NoSlidingExpiration);

Cache.Insert("Customers", dtCustomers, null, DateTime.UtcNow.AddHours(24), Cache.NoSlidingExpiration);

Cache.Insert("DeliveryPartners", dtDeliveryPartners, null, DateTime.UtcNow.AddHours(24), Cache.NoSlidingExpiration);

If you can cache the data for even 15 minutes, it can significantly decrease the workload put on your database, especially if users tend to load the dashboard repeatedly during their normal work. In the scope of a whole department, even a 15-minute caching window could mean hundreds fewer work-heavy queries. Which, I believe, is reason enough to at least give this idea a good thought.
