Wednesday, June 23, 2010

Taking advantage of impulse purchases

At Hip Digital, we manage lots (and lots) of music stores. One thing that I've come to believe is that we are missing out on lots (and lots) of purchases because we don't capture the impulse when it hits our users. Currently, when a user wants to buy music they need to go through the tried-and-true add to cart -> checkout -> enter billing details -> confirm -> download process. While the process is very familiar it is not particularly quick so we lose lots (and lots) of buyers at each step. Based on attrition stats (about 50% bail at each step), it appears that only a fraction of users still feel the impulse by the time they download the music. That means we are missing out on revenue!
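To put rough numbers on that, here's a quick back-of-the-envelope sketch (assuming the ~50% attrition applies at each of the four steps after the initial add-to-cart click):

    # Rough funnel math: assume ~50% of users drop off at each of the four
    # transitions after 'add to cart' (checkout, billing, confirm, download).
    steps = ["checkout", "enter billing details", "confirm", "download"]
    drop_off = 0.5  # assumed per-step attrition

    remaining = 1.0
    for step in steps:
        remaining *= (1 - drop_off)
        print(f"after {step}: {remaining:.1%} of impulse buyers remain")

    # Prints ~6.3% after the final step: the vast majority of impulses
    # never turn into a completed download.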

As a way to combat attrition, we looked at moving to a 'one-click' model similar to some other vendors out there. The ideal is a three-step click -> confirm -> download process. We wanted to keep the 'confirm' step so that people didn't accidentally buy something, but otherwise we wanted music in the user's hands as quickly as possible. Technically, this is actually not that hard to implement. We need to set up some kind of registration process (with a prompt to register at the moment they click to buy, if they haven't already) that collects personal and billing information. We then store that info permanently and reference it whenever the user has the impulse to buy, so the user avoids having to re-enter their information every time. Naturally, if we store all that info we also need to let the user manage it, so some kind of account management interface is also needed.
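As a rough sketch of the 'store it once, reference it forever' idea (the names and fields below are hypothetical, not our actual schema), registration boils down to persisting a billing profile that later one-click purchases can look up:

    from dataclasses import dataclass
    from typing import Dict, Optional

    @dataclass
    class BillingProfile:
        # Collected once at registration, editable from the account management
        # interface, and referenced on every subsequent purchase.
        user_id: int
        name: str
        email: str
        payment_token: str  # opaque token from the payment provider, never raw card data

    # In-memory stand-in for the persistent profile store.
    _profiles: Dict[int, BillingProfile] = {}

    def register(profile: BillingProfile) -> None:
        _profiles[profile.user_id] = profile

    def get_profile(user_id: int) -> Optional[BillingProfile]:
        # The one-click flow calls this instead of asking the user to re-enter details.
        return _profiles.get(user_id)

    register(BillingProfile(42, "Ada", "ada@example.com", "tok_abc123"))
    print(get_profile(42))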

What else? Well, we then change all of our 'add to cart' buttons to something like 'buy now' or 'download now' (the jury's out on which is better psychologically). When the user clicks the button they are prompted to log in (once per visit), register (once ever), or confirm (every time) instead of starting down the old process. Ideally, the prompt is some kind of pop-up so the page context is maintained. After confirmation, the pop-up presents the download link (or a download manager for multi-track purchases like albums) and the user's impulse is satisfied!
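Sketching that flow in code (hypothetical function names; the real prompts would be pop-ups that keep the page context):

    def prompt_login_or_register(session):
        # Pop-up: log in (once per visit) or register (once ever).
        session["user_id"] = 42          # stand-in for the real auth result

    def prompt_confirm(track_id):
        # Pop-up: confirm (every time), so nothing is bought by accident.
        return True

    def handle_buy_now(session, track_id):
        if "user_id" not in session:
            prompt_login_or_register(session)
        if not prompt_confirm(track_id):
            return None
        # Charge the stored billing profile, then satisfy the impulse immediately.
        return f"/download/{track_id}"   # or a download manager for albums

    print(handle_buy_now({}, track_id=1001))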

So, why haven't we already done this since we've already figured it all out? Aside from some possible patent issues (I can't believe someone can actually patent a 'one-click purchase', but that's another story) the main stumbling block is transaction costs. Yes, that's right. The cost to process a transaction is so big that we hope a user waits until they have a full shopping cart before buying. Our take on a $0.99 track is quite small (labels and publishers get most of it), so even a $0.25 transaction fee is deadly to our bottom line if a user only buys one track. It's not so bad if they have $5 or $10 worth of music in their cart, though. Thankfully, PayPal recently released a new micropayment transaction plan that goes a long way toward helping us sell individual tracks. It prices transactions at $0.05 + 5%, which brings the fee on a $0.99 track down to around $0.10 from close to $0.29.
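The arithmetic behind that claim, as a quick sketch (using the ~$0.29 figure above as the point of comparison for what we pay today):

    TRACK_PRICE = 0.99

    def micropayment_fee(price, fixed=0.05, rate=0.05):
        # PayPal's micropayment plan: $0.05 + 5% of the sale amount.
        return fixed + rate * price

    fee = micropayment_fee(TRACK_PRICE)
    print(f"micropayment fee on a ${TRACK_PRICE:.2f} track: ${fee:.2f}")   # ~$0.10
    print(f"savings vs. the ~$0.29 we pay today: ${0.29 - fee:.2f} per track")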

Now suddenly the idea of capturing those impulse purchases is starting to make financial sense! We still need further analysis comparing the potential lost impulse revenue against the higher accumulated transaction costs of more, smaller transactions, but my gut tells me that we'd likely come out on top if we caught more sales. I'll try to post some of our analysis as a follow-up if I can.
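To give a flavour of the shape of that comparison, here's an illustrative sketch; every conversion rate and revenue-share number below is made up purely for illustration, not a real Hip Digital figure:

    IMPULSES = 1000        # hypothetical 'buy' impulses in some period
    TRACK_PRICE = 0.99
    OUR_SHARE = 0.15       # assumed fraction of each sale we keep before fees

    def net(buyers, tracks_per_order, fee_fixed, fee_rate):
        order_value = tracks_per_order * TRACK_PRICE
        fee = fee_fixed + fee_rate * order_value
        return buyers * (order_value * OUR_SHARE - fee)

    # Cart model: few impulses survive the funnel, but orders are larger and
    # the (assumed) standard ~$0.30 + 2.9% fee is amortized over several tracks.
    cart_model = net(buyers=int(IMPULSES * 0.06), tracks_per_order=5,
                     fee_fixed=0.30, fee_rate=0.029)

    # One-click model: many more impulses convert, but as single-track orders
    # on the $0.05 + 5% micropayment plan.
    one_click = net(buyers=int(IMPULSES * 0.40), tracks_per_order=1,
                    fee_fixed=0.05, fee_rate=0.05)

    print(f"cart model: ${cart_model:.2f}   one-click: ${one_click:.2f}")

With these made-up numbers the one-click model edges ahead; the point of the real analysis is to plug in our actual conversion and revenue-share figures and see whether that holds.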

UPDATE: I had this post in my drafts waiting to go when I read a compelling post by another blogger today. The post was titled How Paypal can help save media - and itself and it makes a very solid argument for further refining the one-click purchase process I described above by removing the need to register or even log in! PayPal could offer a cross-site solution similar to Facebook's 'Like' buttons that would provide safe, secure, convenient purchases. How great would that be!

Let's just hope the transaction prices are reasonable...


Thursday, June 03, 2010

Heavy Reads and Heavy Writes

One of the interesting challenges we have here at Hip Digital is managing our vast catalog of metadata. We have millions and millions of data points that all need to be collected, indexed, and ultimately made available for browsing and searching. We've spent a lot of time improving our technology and processes with some notable successes. For example, switching from SQL Server to SOLR to power our searches was a huge win with average search time dropping from 15 seconds to less than two (even under load) AND with search relevancy improving tenfold. However, what I wanted to write about today was how we turned our queries upside down and opened up a new world of performance and granularity that previously escaped us.

So, what do I mean when I say we turned our queries 'upside down'? Traditionally, with relational databases, SELECT statements with JOINs and WHERE clauses are used to filter and locate the correct records. This works well for small-to-medium data sets with simple filter conditions (i.e. WHERE x = y) but, in our experience, it starts to fail spectacularly when you move to large data sets that need fine-grained filtering. Here's our story of how we evolved beyond the middling to the big time.

First, let's set the stage. Our metadata database is composed of the following entity tiers:
  1. Labels - a handful
  2. Sub Labels - thousands
  3. Artists - tens of thousands
  4. Albums - hundreds of thousands
  5. Tracks - millions
All are loosely related to each other in descending order of how they are listed, so filtering higher in the order will filter out more records than filtering lower down. For example, if we filter at the Label level we can eliminate vast swaths of the catalog, but if we filter at the Album level we are only filtering the few tracks that belong to that album. Ok, now we're all on the same page for structure, so let's talk about how we go about filtering efficiently even at the lowest tier.

Filtering at the uppermost tier is quite easy and efficient just using good old-fashioned WHERE clauses, since there are only a few Labels and SQL is pretty good at handling two or three WHERE conditions even on fairly sizable tables. However, it completely falls apart when we try to filter at a lower tier, say, Artists in the middle. If we wanted to launch a music store with a small number of artists we could use the WHERE approach, but what if we had 200 artists to include while excluding all the others? The initial approach would be to create a big WHERE IN clause with a long comma-delimited list of artists (see the sketch below). This might work for a very low-traffic situation but wouldn't handle scale very well. Besides, what if we then wanted to add another 100 artists? The query duration would continue to increase dramatically with every additional condition. Even getting creative with stored procedures, views, functions, etc. would not save us.
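For concreteness, this is roughly the shape of the query we were trying to escape (table and column names are illustrative, not our real schema); note that the IN list, and the optimizer's work, grows with every artist added to a store:

    # Build the brute-force query: one IN-list entry per whitelisted artist.
    store_artist_ids = [101, 102, 103]          # imagine 200, then 300, then more...
    placeholders = ", ".join("?" for _ in store_artist_ids)

    query = f"""
        SELECT t.TrackId, t.Title
        FROM Tracks t
        JOIN Albums al ON al.AlbumId = t.AlbumId
        JOIN Artists ar ON ar.ArtistId = al.ArtistId
        WHERE ar.ArtistId IN ({placeholders})
    """
    print(query)   # the WHERE clause (and the query plan) keeps growing with the store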

So what's the solution to adding more conditions without increasing query time dramatically? The solution is to embed filter information within the record itself. What do I mean by this? Well, we add a new column to all the entity tables whose value describes which music stores the content is available in. Yes, we are talking about maintaining millions and millions of records - one for each entity. It's a big job, no doubt, since we release music stores constantly and for each one we need to update all of the entities. Clearly, careful management is required, but now we are doing most of the work up front before a user hits our site, as opposed to while a user is browsing the catalog and expecting the site to refresh instantly.
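On the write side, launching a store then looks something like this sketch (hypothetical names, and a deliberately naive encoding; how we actually pack the availability value is covered next):

    # Launch-time 'heavy write' (illustrative): stamp the new store onto every
    # entity row that should appear in it, across all tiers, before launch.
    def add_store(availability_value, store_id):
        # Placeholder encoding: a comma-separated list of store ids. The real
        # column uses the bitwise/string scheme described below.
        stores = set(availability_value.split(",")) if availability_value else set()
        stores.add(str(store_id))
        return ",".join(sorted(stores))

    # For each tier (Labels, Sub Labels, Artists, Albums, Tracks), read the rows
    # destined for the new store, update their availability value, write it back.
    row = {"TrackId": 9001, "StoreAvailability": "3,12"}
    row["StoreAvailability"] = add_store(row["StoreAvailability"], 47)
    print(row)   # e.g. {'TrackId': 9001, 'StoreAvailability': '12,3,47'}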

We have found that bitwise operations are extremely fast for this type of approach, but a standard 32-bit integer only gives us 32 distinct flags. That might be enough for some, but at any given time we have over 100 live sites, so we've developed an approach that stores a string value representing a larger number, which is then translated into an operable value before the bitwise check. There's probably room for optimization here, which we will look to include in a future iteration.
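Here's a minimal sketch of the mechanics (our production encoding differs; hex is used here purely for illustration): a plain integer bitmask caps out at 32 sites, so the flags are widened by storing them as a string that decodes back into an arbitrarily large integer before the bitwise test.

    def set_store(mask: int, store_id: int) -> int:
        # Mark an entity as available in the given store.
        return mask | (1 << store_id)

    def available_in(mask: int, store_id: int) -> bool:
        # The fast bitwise membership test.
        return bool(mask & (1 << store_id))

    def encode(mask: int) -> str:
        # What actually gets written into the entity row's availability column.
        return format(mask, "x")

    def decode(value: str) -> int:
        return int(value, 16)

    mask = 0
    for store in (3, 47, 112):          # well past the 32-flag limit of one integer
        mask = set_store(mask, store)

    stored = encode(mask)               # written once, up front, per entity
    print(stored, available_in(decode(stored), 47), available_in(decode(stored), 8))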

By embedding filter data within the record all of our queries' WHERE conditions are now very short and consistent no matter what level of filtering may be required for a given site; whether it's 100 or 500 artists the WHERE clause is exactly the same. This means that SQL sees fairly flat query durations across the board with all tiers of entity filtering.
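Concretely, the read-side query ends up looking the same for every store (illustrative names, assuming the bitwise encoding sketched above):

    # Read-side query: one short condition, identical for every store and tier.
    STORE_FLAG = 1 << 7          # hypothetical flag for the current store

    query = """
        SELECT t.TrackId, t.Title
        FROM Tracks t
        WHERE t.StoreAvailability & ? = ?   -- same clause for 100 or 500 artists
    """
    print(query, (STORE_FLAG, STORE_FLAG))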

The real beauty of this approach is that we can now filter even at the Track level, with amazingly fine-grained control over what content is available on which site. Track-level filtering would have been impossible previously.

Now we have a solution that optimizes filter queries across the enterprise and opens up new opportunities for filtering that did not exist before. THAT's the power of heavy writes, and I encourage you to investigate them for yourselves.