Thursday, October 25, 2007

Managing an Agile team

Wow, it's been a while. Hopefully, I can get back into the swing of posting now that things have calmed down a little at the new job.

Speaking of the new job...

One of the biggest challenges I face daily is finding a practical methodology for executing on 'Agile' projects. I'm not talking about whether I choose SCRUM or Extreme or some other developer-focused methodology but rather a link between the 'project' side of software development and the 'code' side. Most of my reading and experience with Agile leans heavily toward the developers and writing code. However, there is much more to a software project than writing code. How does a Project Manager manage an agile project? How does he monitor progess and stay on budget?

I mean to dive into this over the next few months and I'll be marking my findings here.

Monday, May 07, 2007

The Siren Call of Web Services

Most of the solutions that I've worked on over the last few years have implemented a Service Oriented Architecture (SOA) and generally use web services to implement the services. Usually, the web services have worked out great but occasionally they have failed spectacularly. Why have they failed? Read on to find out.

The "Siren call"
The great thing about web services is that they are really easy to create and (i.e. web service project template) and just as easy to consume (i.e. web reference) due to tight integration in Visual Studio. They are so easy to use that even new developers can be productive in a short period of time.
Web Services also make traversing application or network boundaries easy since constructing a SOAP call is easy (thanks to Visual Studio) and HTTP over port 80 will get you past most firewalls.

The "Danger"
The thing to keep in mind is that web services are slow. The ease of development and simple usability mean that we pay a price in performance.
Any call to a web service needs to be translated into SOAP, sent over the wire as wordy HTTP, translated from SOAP, service logic executed, translated back into SOAP, sent back as wordy HTTP, and finally translated back from SOAP. This may be a simplification of the actual process but you get the idea. There is a lot of activity around even invoking a simple "hello world" web service.

The "Fate"
At best, a slow web service means that your speedy application needs to wait while the service is invoked. Most of the time this is the only problem you'll have and it can be managed in a variety of ways (i.e. caching, asynchronous calls, etc.).
At worst, a slow web service could take down your beautiful solution and duplicate or lose important data.

How could a web service do so much damage?

It generally comes down to threads. Specifically, not having any available threads to handle another service request. I'm not claiming to know exactly how the threading works but every call to a web service requires threads from IIS and .NET. These technologies maintain a pool of threads that service requests and then wait for the next one.
Normally, a request can be serviced quite quickly and the thread is freed quickly. However, under high load a thread might not free up in time to service the next request so another thread is needed to handle it. As requests hit the server the threads are allocated and freed as quickly as possible but under high load requests can 'pile up' until eventually there are no threads left to service the next request. When this happens you will begin to see longer delays in service response as the caller waits for the service to answer while the service waits for a thread to free up. Around this time you will likely start to see "request timed out" errors.
If the duress continues and requests continue to arrive then the server will ultimately give up and start throwing IIS and/or .NET exceptions such as "There were not enough free threads in the ThreadPool object to complete the operation."
It is at these times when I have noticed unexpected conditions to arise that can create unpredictable results. If the caller gives up on the service (timeout expires) you would expect the service to perish but there are certain boundary conditions by which a service can continue to execute even though the client has given up on it. The service may not fully complete executing but even a few lines can cause problems. We have seen duplicated rows inserted into a database due to this problem when the client retries the service thinking the service has timed out but the service continued to execute.

The "Safety Net"
The first thing to do when working with web services is have a good understanding of the maximum load that service will have to bear. Plan for that load and test the crap out of it before going live. Hit that service as hard as you can for as long as you can before you trust it with your data. I've seen very innocuous web services cause absolute chaos under extreme duress so don't assume that it will 'just work'.

The second thing is to actively work to reduce the execution time of that service to the smallest time possible. Threads are only used during execution so if the service executes quickly then it will free that thread up quickly.

Lastly, think about farming out your web servers. There are a multitude of configuration settings for IIS and .NET to tune threading but all of these settings will ultimately fail under extreme load. It might just die later but it will still die eventually if the traffic is extreme enough. The best way to give your application more threads to work with is to use multiple servers. Usually, these would be physical web servers but more commonly farms of virtual servers are being used quite effectively.

I hope you excuse my crude metaphor to hapless sailors but perhaps my little brain dump helps you plan your web services better in the future and avoid some of the pain I've experienced.

Wednesday, February 21, 2007

Avoid Deadlocks using the SQL Adapter and BizTalk Server 2006

As applications scale it is often necessary to add additional servers in order to take advantage of the power of parallel processing. For BizTalk solutions this means that there can be two or more BizTalk hosts running identical solutions and all sharing the same resources. BizTalk does a good job of managing multiple hosts internally (i.e. maintaining unique identifiers) but sharing external resources (i.e. a database) needs to be handled more carefully. This article describes a strategy for maximizing efficiency and minimizing conflict between multiple BizTalk Hosts communicating with a single database.

The primary reason why you would want to use the following approach is to avoid errors like this:
"Transaction (Process ID 56) was deadlocked on lock resources with another process and has been chosen as the deadlock victim. Rerun the transaction. Error Number : -2147467259."

The approach described in this article follows the pattern where two (or more) BTS servers are polling a single database using identical SQL Adapter port configurations but this approach also applies to a single BTS server where deadlocks are an issue under load.


The following goals were used to define how this strategy should behave:

  1. No duplicate records
  2. Maximize use of all BTS instances
  3. No SQL locking conflicts
  4. Acquire data as quickly as possible

The first three requirements can be handled by a carefully crafted stored procedure (see section 1.3) while the fourth is controlled by BizTalk. In the receive port properties set Poll While Data Found to True. This will make BizTalk keep reading the database until there are no more records to process. If you do not set this option then the solution will not scale to handle larger volumes.

The Aquisition Stored Procedure

Due to the fine-grain control that stored procedures provide it is strongly recommended that they be used over a direct SQL query. Especially since a direct query will not allow you to manage SQL locking which can be problematic with multiple BTS hosts competing for the same resource. The basic steps of the acquisition procedure are (complete source code available in section 1.5):

  1. Set the Transaction Isolation Level
  2. Acquire unique identifier of the record
  3. Mark the record as processed
  4. Return the contents of the record

Set Transaction Isolation Level

By default BizTalk wraps all calls made with the SQL Adapter in a transaction with an isolation level of SERIALIZABLE. This level is much more aggressive with resource locking than the default level of READ COMMITTED. As a result it is important to specify the Transaction Isolation Level for the transaction within the procedure. In the sample provided the transaction level of READ COMMITTED is.

In order to maximize concurrency while eliminating conflict it is often necessary to further refine resource locking by the use SQL locking. Locking hints provide a fine degree of control over how SQL handles the visibility/availability of records to other transactions.

It’s worth noting that there is more than one way to approach the way that SQL locking is handled and that a REPEATABLE READ solution with some modifications to the locking hints could also theoretically work. Although, since the sample solution discussed in this document has been implemented and tested in a real-world application so it is the recommended solution.

Acquire Unique Identifier

The first step is to get the unique identifier of the record to process. Generally, this is a select statement with any WHERE or JOIN expressions necessary to return a single row’s identifier where the processed flag is not set (see the next step details). In the sample provided TOP and ROWCOUNT are also used to insure a single row result. It is important to note that the acquisition procedure will only acquire and process a single row. It is left up to the BTS receive port to poll the database and control execution of the acquisition procedure to consume multiple rows.

Two SQL locking hints are used to lock the row (UPDLOCK) and read past (READPAST) any existing locks. The row is locked at this point (as early as possible) to prevent another SQL instance from also acquiring it for the lifetime of the transaction. The read past hint is used to skip past any existing locks (e.g. created by other instances of this procedure) without stopping and waiting for the lock to clear which is the default behavior.

It is recommended that the number of JOINS be minimized so as to reduce the chance of locking a row in a table required by other processes. One possible solution is to populate a ‘Staging’ table and then consume rows from there.

Mark As Processed

Using the identifier acquired in the previous step, a processed flag is set to indicate that this record has been consumed by BizTalk and should not be included in future acquisition attempts. The query used to acquire a unique record identifier specifically avoids rows that have been processed so once this is set it will not be consumed again.

In the sample provided, another field called ProcessedOn is set at the same time to date stamp the time that the record was consumed by BizTalk. This field is not strictly required but can be handy for auditing and testing purposes.

Get Message Contents

The final step of the acquisition procedure is to return the contents of the row to the SQL Adapter. The sample provided uses SELECT * but this is only for demonstration purposes and is not recommended for real-world scenarios.

Since the stored procedure is designed to facilitate the BTS SQL Adapter it is important to include the FOR XML AUTO clause in the SELECT statement. This will return the results of the query in a form (XML) that BizTalk can consume.

Finally, when the transaction ends all locks are released.

SQL Locking Hints

There are a variety of specialized locking hints available in SQL Server 2000. Below is a summary of hints that are useful for sharing a single resource by multiple hosts (descriptions taken from SQL 2000 Books Online go there for more information).

READPAST

Specifies that locked rows are skipped (read past). For example, assume table T1 contains a single integer column with the values of 1, 2, 3, 4, 5. If transaction A changes the value of 3 to 8 but has not yet committed, a SELECT * FROM T1 (READPAST) yields values 1, 2, 4, 5. READPAST applies only to transactions operating at READ COMMITTED isolation and reads past only row-level locks. This lock hint is used primarily to implement a work queue on a SQL Server table.

ROWLOCK

Specifies that a shared row lock is taken when a single shared page or table lock is normally taken.

UPDLOCK

Specifies that update locks instead of shared locks are taken while reading the table, and that they are held until the end-of-statement or end-of-transaction.

READ UNCOMMITTED

Implements dirty read, or isolation level 0 locking, which means that no shared locks are issued and no exclusive locks are honored. When this option is set, it is possible to read uncommitted or dirty data; values in the data can be changed and rows can appear or disappear in the data set before the end of the transaction. This option has the same effect as setting NOLOCK on all tables in all SELECT statements in a transaction. This is the least restrictive of the four isolation levels.


REPEATABLE READ

Locks are placed on all data that is used in a query, preventing other users from updating the data, but new phantom rows can be inserted into the data set by another user and are included in later reads in the current transaction. Because concurrency is lower than the default isolation level, use this option only when necessary.

SERIALIZABLE

Places a range lock on the data set, preventing other users from updating or inserting rows into the data set until the transaction is complete. This is the most restrictive of the four isolation levels. Because concurrency is lower, use this option only when necessary. This option has the same effect as setting HOLDLOCK on all tables in all SELECT statements in a transaction.


Sample Stored Procedure

/*
This stored procedure efficiently consumes a single row for multiple BizTalk without deadlocks.
Receive Ports using the SQL Adapter.

Author: Adam.Bowron@Visiphor.com
Created: October 25, 2004
*/
CREATE Procedure Shared_BTS_Select
AS

BEGIN TRANSACTION

--Explicitly set the Transaction Isolation Level since the procedure relies on it
SET TRANSACTION ISOLATION LEVEL READ COMMITTED
DECLARE @Record_ID integer
SET NOCOUNT ON
--Acquire a single row and lock it so that another concurrent query will ignore it.
--READPAST is used to skip locked rows without waiting for them to release.
--UPDLOCK is used to lock the row for the lifetime of the transaction
SET ROWCOUNT 1
SELECT
@Record_ID = Source_Table_ID FROM
Source_Table WITH (READPAST, UPDLOCK) WHERE
Processed = 0
ORDER BY
Last_Update_Dt ASC
SET ROWCOUNT 0
--Mark the row as processed so that it is ignored by future queries
UPDATE
Source_Table
SET
Processed = 1,
ProcessedOn = getdate()
WHERE
Source_Table_ID = @Record_ID
--Return the contents of the row in XML form to the SQL Adapter.
SELECT
*
FROM
Source_Table
WHERE
Source_Table_ID = @Record_ID FOR XML AUTO

COMMIT TRANSACTION
GO


References

SQL 2000 Books Online (transaction/locking hints are copied verbatim)

Thursday, February 01, 2007

"Best Before" Phase

There are many software development methodologies and all of them have variations on phases with a project. There are names like "Envisioning", "Planning", "Stabilization", etc. All good names with valid, serious, meanings...

At this point, I'd like to officially introduce another phase that lays on top of all the stodgy official phases. I call it the Best Before phase. This is generally near the end of the project when you start to notice that things in the fridge expire on or after your project is due to be completed. There's a funny emotional reaction to seeing that the expiry date on the carton of milk is a few days after your project is scheduled to complete. The reaction ranges from relief ("Thank god it's almost over") to outright fear ("I'll never get it done in time"). Either way, you know you're in the Best Before phase.

"I don't know phases"
- Doctor Evil

Friday, January 26, 2007

One-Way Web Services with BizTalk Orchestrations

Today I want to pass on a nifty trick that gives you the ability to invoke 'fire and forget' one-way web services within a BizTalk orchestration.

My problem was that I wanted the port to handle delivery (including retries) and not have the orchestration wait until the result of the send was successful or not. Normally, doing this with ports is generally quite easy since you just use a one-way send port. However, with web services there is actually a trick to getting your web service to behave as a one-way service.

If you follow the standard procedure for hooking up a web service in an Orchestration you will get a request/response logical port shape. You get this even if the webmethod is a 'void' method that does not return anything. You can try creating your own port type but that causes other problems with web services since you'll get errors like AssemblyName context property was not valid unless you add special code to set some context properties.

The best solution is to inform BizTalk that the web service is 'one-way' so that the configured port type reflects this pattern. The way to do this is quite easy. All you need to do is add a parameter to your web method called SoapDocumentMethod with a specifier indicating the message pattern.

Sample code looks like this:

[WebMethod]
[SoapDocumentMethod(OneWay=true)]
public void OneWayService ()
{
}

When you add a web reference to your Orchestration project BizTalk now configures the port type to be one-way and the port shape reflects this. It's easy if you know the trick!

I'd like to thanks Matt Milner for posting the solution on the microsoft.public.biztalk.orchestration newsgroup.