SQL Server 2005 Data Mining Page 1 asp:feature LANGUAGES:VB.NET SubscriptionServices ASP.NETVERSIONS: ALL Subscribe RenewSubscription asp.netNOWNewsletter SQL ChangeofAddress PayAnInvoice Server SubscriptionPackages asp.netPRO Articles 2005 AffiliateSpotlight 411asp.netDirectory Data NewProducts BookReviews ProductReviews Mining Opinion BackIssues CreateaWebCross- Reprints/E-prints Search sellApplication Downloads PremiumDownloads Informant ByRamanIyerandJesperLind ContactUs AdvertisewithUs Theconceptofcross-sellis WriteForUs familiartomostofus.What yourfriendlyneighborhood McDonald’ssalespersondoes LatestFeatures whenyouorderacheeseburger •NUnitASP isexactlywhatAmazon.comor •UndertheHood buy.comaredoingonlinewhen youadditemstoyourshopping •DynamicDropdowns cartandyougetalistofother •ValidateUser-entered itemsyoumightalsolike. Data Youcanaddthisfunctionalityto •TotalRecall yourASP.NETpageby ArticleRating employingthepowerofdata mining,usingsimpleSQL-like Ratethisarticleona queriestoproducehigh-quality scalefrom0to5 recommendations.Microsoft nmlkj 5Best SQLServer2005,currently nmlkj 4 availabletoover200,000 MSDNsubscribersinBeta2, nmlkj 3 includesadvanceddatamining nmlkj 2 capabilitiesthatareavailable nmlkj 1 programmaticallyviastandard interfaceslikeADO.NET.This nmlkj 0Worst willallowyoutointegratecross- sellintoyourWebstore Submit applicationwithminimaleffort. Email BeforedevelopingaWeb cross-sellapplication,weneed tobuildtheserver-side intelligencethatwillenablethe Tellafriend applicationtocomeupwith aboutthisarticle! smartproduct recommendations.This processinvolves: Preparingthedatayoualreadyhaveaboutpastcustomers formining; Designingaminingmodelforthepurposeofmaking recommendationstonewcustomers; DeployingthemodeltoAnalysisServerandtrainingitwith thedatapreparedearlier;and SettingupsecuritytoallowASP.NETtoquerythetrained model. http://www.aspnetpro.com/newsletterarticle/2004/10/asp200410ri_l/asp200410ri_l.asp 09/03/2005 17:34:27 SQL Server 2005 Data Mining Page 2 Thefirstpartofthisarticlewillexplainkeyconceptsand walkyouthroughtheaboveprocessusingtheSQL Server2005DataMiningtools.Inthelatterhalfwe’ll demonstratetheeasewithwhichyoucanintegratethe predictivepoweroftheminingmodelintoyourASP.NET Webapplicationusingstraightforwarddatabaseaccess code. BeforeYouStart InstallMicrosoftSQLServer2005AnalysisServicesBeta 2.Thiswillsetuptheservercomponentsaswellasthe designandmanagementtoolsthatwe’lluseinthisarticle: BusinessIntelligenceStudioandSQLServer ManagementStudio,respectively.Wealsorecommend yougooverthedataminingtutorialincludedwithBeta2. Note:We’llrefertoyourrunninginstanceofSQLServer 2005AnalysisServicesas“AnalysisServer”intherestof thearticle.Theserviceshowsupas“AnalysisServices (MSSQLSERVER)”underServicesinAdministrative Tools. MiningYourCustomerPurchaseData Hereweoutlinetheprocessofbuildingtheback-end frameworkthatminesyourhistoricalcustomermovie purchasetransactionsandextractstheknowledgeneeded tomakecross-sellrecommendationstonewcustomers. Thisknowledgeisembeddedintheminingmodelthat we’lldesign. Datapreparationisanimportantaspectofanydata miningprocess.WithSQLServerDataMining,however, itisalsopossibletominethetransactiondatainyour relationaldatabasedirectly.Forsimplicity,we’lltakethis approachandassumethatasingletransactiontable containsyourcustomerpurchaseinformation,asshown inFigure1. Figure1:Moviepurchasedatausedbysampleapplication. TheAnalysisServicesprojectinSQLServer2005 BusinessIntelligenceDevelopmentStudioprovidesthe frameworkformodelingdataandbuildingaminingmodel thatlearnscustomerbuyingpatternsfromexistingdata gatheredfrompriorpurchases.Wethenusethetrained modeltogeneraterecommendationsfornewcustomers. Thefirststepistoidentifytheentitywhosebehaviorwe areinterestedinanalyzingforthepurposeofourcross- sellapplication.Acaserepresentsallinformation(also referredtoas“attributes”)knownaboutthisentity.Inthis scenario,eachdistinctcustomerintheCustomerMovies tableandthesetofmoviestheypurchasedformsacase. SQL2005DataMiningusestheconceptofanested tabletorepresentavariable-lengthcollectionofattributes ofthesamekindassociatedwithacase.Foreach customerthereisasetofrowscontainingthelistof moviespurchased,whichcanberepresentedasanested table(asshowninFigure2). Figure2:Theminingmodel’sviewofthecustomermoviepurchasedata. http://www.aspnetpro.com/newsletterarticle/2004/10/asp200410ri_l/asp200410ri_l.asp 09/03/2005 17:34:27 SQL Server 2005 Data Mining Page 3 Thedefinitionofacaseanditsassociatedattributesis knownasaminingstructureinSQL2005DataMining. Thenextstepistobuildaminingmodelbyselectinga miningalgorithmandspecifyinghowthecolumnsinthe miningstructurewillbeusedbythealgorithmtoprocess theinputdataandextractusefulknowledgefromit.The AssociationRulesalgorithmisagoodfitforourscenario. Itlearnswhichitemsarelikelytobeboughttogetherand utilizesthatinformationtopredictotheritemsgiventhe itemsthecustomerhasselected.MarkingtheMovies nestedtableasbothInputandPredictableenablesthe modeltomakepredictionsusingthemoviesprovidedas input.NotethattheminingmodelinSQLServer2005 DataMiningisadatabaseobjectthatholdsboththe definitionoftheinputtotheknowledgeextractionprocess andtheoutput,whichconsistsofpatternsorrules learnedbyapplyingtheselecteddataminingalgorithmto processtheinputdata. Aswe’llsee,theMiningModelWizardinSQLServer 2005BusinessIntelligenceDevelopmentStudiobuildsa miningstructureandamodelusingthealgorithmselected onthefirstpage. Next,theminingmodeldefinitionissentaspartofa deploymentpackagetotheserverwhereitistrained. Trainingcasesconsistofinformationwehavecollected frompastpurchases.Thedeploymentpackageis generatedfromtheAnalysisServicesprojectthatwe designthemodelin,anditincludesbindingstothedata sourcethatAnalysisServerusesforobtainingthetraining cases. Toimprovethequalityofrecommendationsmadefornew customersthemodelcanbeperiodicallyre-trainedas morecustomerdataisaddedtothetransactiondatabase. Forlargedatasetsthiswilltypicallybecarriedoutduring off-peakhoursoragainstareplicaofthetransaction database.SQLServerDataTransformationServices (DTS)canbeusedtosetupapackagetoperformsuch periodicupdates. Finally,wemustsetuppermissionssoourapplication canquerythetrainedmodel. BuildingaCross-sellMiningModelinSQLServer 2005 Seetheendofthisarticleforinformationabout downloadingthecompleteAnalysisServerprojectbuilt usingthestepsoutlinedhereforproducingthemining modelwe’llutilizeintheASP.NETcodesamplelaterin thisarticle: 1) CreateanewAnalysisServicesprojectnamed MovieRecommendationsinBusinessIntelligence DevelopmentStudio. 2) AddanewDataSourcepointingto MovieData.mdb,theAccessdatabaseincludedwiththe sampleproject(availablefordownload;seeendofarticle fordetails). 3) AddaDataSourceViewbasedonthe DataSource.Selecttheonlytable,CustomerMovies, presentintheDataSource. 4) Right-clickontheMiningModelscollectionand selectNewMiningModeltolaunchtheMiningModel Wizard. 5) PickFromexistingrelationaldatabaseordata warehouseontheSelectDefinitionMethodpageandclick Next. 6) PickMicrosoftAssociationRulesasthedata miningtechniquetouseonthenextpage. 7) SelecttheDataSourceViewcreatedinstep3. http://www.aspnetpro.com/newsletterarticle/2004/10/asp200410ri_l/asp200410ri_l.asp 09/03/2005 17:34:27 SQL Server 2005 Data Mining Page 4 8) Onthenextpage,thesingleCustomerMovies tablepresentinourDataSourceViewisshown.Markitas bothCaseandNested. 9) ClickNexttogototheTrainingDatapagewhere weneedtospecifythecolumnsweareinterestedin includingfromeachselectedtable(seeFigure3).Inour scenario,theCustomerMoviestableservesasthesource forourcasesaswellasthenestedtabledataassociated witheachcase(thisiswhyitislistedtwice).Fromthe firstinstancepickCustomerIdasthekeysinceituniquely identifieseachcustomer.Fromthesecondinstancepick MovieasKey,Input,andPredictable. 10) ClickFinishtocompletethewizardandbuildthe MovieRecommendationsminingstructureandmodel(see Figure4). Figure3:Selectingcolumnsweareinterestedinmodelingforcross-sellintheData MiningWizard. Figure4:Thecross-sellminingmodelinBusinessIntelligenceDevelopmentStudio. Theabovestepscreateadefinitionofourcross-sell miningmodelandassociatedobjectsinthedevelopment environment.Therearetwostepstodeploythemining modeltotheAnalysisServerandtrainit: 1) Right-clickontheMovieRecommendationsproject intheSolutionExplorerandselectProperties.Verifythat theServerpropertyintheDeploymentsectionof ConfigurationPropertiespointstotheserverhostingyour AnalysisServicesinstance.Closethedialogbox. 2) Right-clickagainontheMovieRecommendations projectintheSolutionExplorerandselectDeploy.This sendstheclient-sidedefinitionstotheserverandinitiates trainingoftheminingmodel. WemustsetupaccesspermissionsinAnalysisServer forIISusingtheSQLmanagementtoolforourASP.NET applicationtousethetrainedminingmodel: 1) OpenSQLServerManagementStudio. http://www.aspnetpro.com/newsletterarticle/2004/10/asp200410ri_l/asp200410ri_l.asp 09/03/2005 17:34:27 SQL Server 2005 Data Mining Page 5 2) ClickConnectintheObjectExplorer,select AnalysisServer,andconnecttotheAnalysisServer hostingyourmodel. 3) LocatetheMovieRecommendationsdatabasein theDatabasescollection,rightclickonRoles,andselect NewRole.ThisbringsuptheCreateRoledialogbox. 4) OntheGeneralpageenterInternet_Userasthe rolename.ChecktheReadDefinitioncheckboxunder Setthedatabasepermissionsforthisrole. 5) ClickMembershipintheleftpaneandselecta page.AddyourIISuser(thedefaultis IUSR_machinename)totherolebyclickingAdd. 6) NowclickMiningStructureintheleftpane.This showstheCustomerMoviesminingstructureandthe CustomerMoviesminingmodelownedbythemining structure.DropdownthepermissionlistunderAccess andselectReadforbothobjects.AlsochecktheRead Definitioncheckboxforboth. 7) ClickOKtoaddtheRolewiththeabove permissionset. RecommendingProductsBasedontheUser’s ShoppingBasket Nowwe’rereadytoproducemovierecommendationsin ourWebapplicationbyrunningaSQL-likequeryagainst theAnalysisServerthatholdsourtrainedminingmodel. We’veputtogetheraminimalapplication(showninFigure 5)thatdemonstratestheideasbehindarealdeployment, focusingonthegenerationofthepredictionqueryfor gettingrecommendations.TheWebcustomeris assumed tohaveoneormoreitemsintheshoppingbasket,andfor simplicitywehaveatextboxwhereitemscanbeentered manually(separatedbysemicolons).ClickingAddItems toCartdisplaystheitemsintheshoppingbasketandalso showsalistofrecommendations. Figure5:Asimpleshoppingbasketapplication. ThecodebehindthebuttonclickisshowninFigure6; youcanseethatit’squitesimple. PrivateSubButton1_Click(ByValsenderAs Object, _ ByVal eAsSystem.EventArgs) 'Handles Me.Button1.Click 'ParsetheinputintoanArrayListofstrings. DimalInputItemsAsNewArrayList() DimsplitcharAsChar() ={";"c} DimszInputItemsAsString() = Me.TextBox1.Text.Split(splitchar,20) DimiAsInteger Fori = 0ToszInputItems.Length - 1 alInputItems.Add(szInputItems(i).Trim()) Next i 'Additemstotheshoppingbasket. dgShoppingBasket.DataSource =alInputItems dgShoppingBasket.DataBind() 'Gettop5recommendations. DimalRecommendedItemsAsNewArrayList(5) http://www.aspnetpro.com/newsletterarticle/2004/10/asp200410ri_l/asp200410ri_l.asp 09/03/2005 17:34:27 SQL Server 2005 Data Mining Page 6 GetRecommendations(alInputItems, alRecommendedItems) 'Displayrecommendations. dgRecommendations.DataSource = alRecommendedItems dgRecommendations.DataBind() EndSub'Button1_Click Figure6:Populateshoppingbasketandrecommendations. ThemethodbuildsanArrayListofstringsfromtheitems intheshoppingbasketandpassesittothe GetRecommendationssubroutine,requestingthetopfive recommendationsbasedontheinputitems.Weusetwo DataGridobjects:dgShoppingBasket,toholdtheitemsin theuser’sshoppingbasket;anddgRecommendations,to displaythegeneratedrecommendations.Thereal workhorseistheGetRecommendationssubroutine. ThecoreoftheGetRecommendationssubroutineisthe constructionofthepredictionjoinquery(seeFigure7) thatgetssenttoAnalysisServerandreturnsthelistof fiverecommendations. SELECTFLATTENED TopCount(Predict([CustomerMovies], INCLUDE_STATISTICS), $AdjustedProbability, 5) FROM[MovieRecommendations] NATURALPREDICTIONJOIN (SELECT (SELECT'StarWars' AS[Movie] UNION SELECT'TheMatrix'AS [Movie] ) AS[CustomerMovies] )AS t Figure7:ObtainrecommendationsusingDMXpredictionjoinagainstminingmodel. TheSQL-likequerylanguagesupportedbyAnalysis ServerforqueryingminingmodelsiscalledDMX.The DMXqueryinFigure7performsapredictionjointhat takesthetwomoviesfromtheuser’sshoppingbasket, formsacase,andjoinsitwiththeminingmodelto produceanoutputrowsetcontainingalistofpredicted recommendations. TheGetRecommendationssubroutine,showninFigure8, storesmostofthequeryincompile-timestringtemplates. Whatneedstobefilledinistheinputdata. PrivateSharedSubGetRecommendations( _ ByValvInputItemsAsArrayList, _ ByRefvRecommendedItemsAsArrayList) 'TemplatesforgeneratingDMXpredictionjoin statement. DimstrDMX1AsString = _ "SELECTFLATTENEDTopCount(" + _ "Predict([CustomerMovies], INCLUDE_STATISTICS)," + _ "$AdjustedProbability,5)From[Movie Recommendations] " + _ "NATURALPREDICTIONJOIN(SELECT(" DimstrDMX2AsString =")AS[Customer Movies])ASt" 'Iterateshoppingbasketandproduceinput case. DimcItemsAsInteger=vInputItems.Count DimstrDMXAsString ="" DimiAsInteger Fori = 0TocItems - 1 DimitemAsString =vInputItems(i).ToString() item=item.Replace("'","''") strDMX+="SELECT " +"'"+item+"'AS"+ "[Movie]" Ifi <cItems - 1Then strDMX+= "UNION " EndIf Next i http://www.aspnetpro.com/newsletterarticle/2004/10/asp200410ri_l/asp200410ri_l.asp 09/03/2005 17:34:27 SQL Server 2005 Data Mining Page 7 'PuttogetherDMXpredictionqueryto 'get5recommendations. strDMX =strDMX1 +strDMX +strDMX2 'ConnecttoAnalysisServerandexecutequery. DimasSessionAsNewAnalysisServerSession asSession.Connect() IfFalse = asSession.ExecuteAndFetchResult(strDMX)Then Return EndIf 'Readpredictionresultsandbuildlist 'ofrecommendations. vRecommendedItems.Clear() WhileasSession.asDataReader.Read() DimtypeAsString = asSession.asDataReader.GetDataTypeName(0) Iftype="DBTYPE_WVARCHAR"Ortype="String" Then Try DimvalAsString = asSession.asDataReader.GetString(0) vRecommendedItems.Add(val) CatcheAsException Console.WriteLine(e.Message) EndTry EndIf EndWhile 'DisconnectfromAnalysisServer. asSession.DisConnect() EndSub 'GetRecommendations. Figure8:Convertshoppingbaskettoinputcaseforminingmodelpredictionandget recommendations. Asexplainedbrieflyduringtheminingmodeldesign phase,dataispassedtominingmodelsintheformof cases.Eachcaserepresentsalltheinformationwehave aboutasinglecustomer.Whenwemakeapredictionfor anewcustomer,theknowninformationaboutthe customerispassedasaninputcasetotheminingmodel. Inthisscenariotheinputcaseisthelistofmoviesinthe customer’sshoppingbasket. Notethatthelistofmovieswasrepresentedasanested tableintheminingmodeldefinition;therefore,weneedto supplytheinputcaseinthesameform.Thisis accomplishedbythenestedsub-selectafterthe NATURALPREDICTIONJOINclauseinFigure7.The UNIONoperationisusedtoaddeachmovieasa separaterowinthenestedtable.Whenwesupplycases inlinelikethis,wedon’tneedtoprovideavalueforthe casekey,CustomerId,sincethatisnotusedforthe predictionoperation. Withtheaboveinputcasedataformatinmindit’snow easytowalkthroughtheGetRecommendationscodeand understandwhatit’sdoing. Westartthequerygenerationprocessbydeclaring templatesfortheDMXstatement.Wetheniterateover theinputitems,thecontentsoftheuser’sshopping basket,toproduceaninputcasewithanestedtable formatasdescribedabove.Theinputcaseisappended to therestofthetemplatetoproducethequerythatwillbe senttotheserver. Next,aninstanceoftheAnalysisServerSessionclassis instantiatedtoconnecttoAnalysisServerandexecutethe queryagainstthetrainedminingmodel.Thecodeforthis classisincludedinthedownload.The AnalysisServerSessionclassusesADOMD.NET,the managedproviderforAnalysisServer,toconnecttothe server.ADOMD.NETimplementsthesameconnection, command,anddatareaderinterfacesasstandard ADO.NET(System.Data.Oledb),sotherestofthecode http://www.aspnetpro.com/newsletterarticle/2004/10/asp200410ri_l/asp200410ri_l.asp 09/03/2005 17:34:27 SQL Server 2005 Data Mining Page 8 looksexactlylikeADO.NETcode—butoffersbetter performance.(TouseADOMD.NETyou’llneedtoadda referencetoMicrosoft.AnalysisServices.AdomdClient.dll, installedbytheSQLServer2005Beta2Client ComponentsSetup.Ifitisnotlistedunder.NETinthe AddReferencedialogboxinVisualStudio,browseto ProgramFiles\Microsoft.NET\ADOMD.NET\90andaddit fromthatlocation.) Thequeryresultsarefetched,againusingastandard datareaderinterfaceexposedbyADOMD.NETandthe arraylistofoutputitemspopulatedwiththe recommendationsreturnedbythepredictionjoin.Note thattheDMXquerywegenerateusestheFLATTENED keywordtoavoidhavingtoperformhierarchicalresult navigationtofetchtheresults. UndertheHood Inthissectionweexplainwhathappensontheserver whenitreceivesapredictionqueryandhowthe knowledgeacquiredbytheminingmodelmaybeexplored further. Duringthetrainingprocess,theAssociationRulesmodel learnsasetofrulesthatareusedtogenerateproduct recommendations.Ifarulesuchas“[Camera,Film]-> Batteries”wasdiscoveredandthecustomer’sshopping basketcontainsCameraandFilm,thenthisrulefires.Of course,theremaybeotherrulesthatpredictBatteriesas well,inwhichcasetherulewiththehighestscoreis used.Thescoreassignedtoarule,alsoknownasits Importance,takesintoaccountandcompensatesforthe factthattheprobabilityforarulemaybehighjust becausethetargetitemispopularinthedataset.Assume that10%ofcustomersbuy StarWars irrespectiveof otherpurchases,and BladeRunner isboughtbyjust3%. Iftworulespredict StarWars andBladeRunner withthe sameprobability,thescorefor StarWars willbelower sinceit’ssopopular.However,theadvancedusercan finetunethescoreusingalgorithmparameters. Finally,thetop nhighestscoringrulesareusedto generatetherecommendations.Thelastparameterinthe TopCountfunction(again,seeFigure7)setsanupper limitonthenumberofitemsreturnedbytheprediction algorithm. Therulesareorganizedbytheitemspredictedandsorted indescendingorderbasedonthescore.Theprediction algorithmcanavoidlookingatverylargesetsofrulesto achievegoodpredictionperformance. Additionalinsightintoacustomer’spurchasingbehavior maybegainedbyusingtheviewerssuppliedaspartof theBusinessIntelligenceDevelopmentStudio.These includetheAssociationRulesViewer(showninFigure9) forbrowsingtherulesandtheDependencyNetViewer thatgraphicallyshowstherelationshipbetweenitems. Thegraphlayoutalgorithmillustrateshowthestrongest correlatedproductsareclustered. http://www.aspnetpro.com/newsletterarticle/2004/10/asp200410ri_l/asp200410ri_l.asp 09/03/2005 17:34:27 SQL Server 2005 Data Mining Page 9 Figure9:Viewingmodelcontent. TheAssociationRulesviewerdisplaystheimportancefor eachruleandrankstherulesbythisscore.Asexplained earlier,importanceservesasabettermeasurethan probabilityforfindinginterestingrulesinyourdata. Conclusion Thisarticlemerelyscratchesthesurfaceofthe possibilitiesthatSQLServer2005DataMiningpresents foryourASP.NETapplications.Oneideaforextending thissamplewouldbetoaddothercustomerattributes suchasdemographicsasinputstopotentiallyimprovethe qualityoftherecommendations.OtherpossibleWeb applicationsinclude: Targetedadsbasedonanalysisofbrowsingbehaviorusing theSequence_Clusteringalgorithm. AnonlineHelpdeskthatfindsthemostappropriateanswer forauser’squerybyusingtheNeural_Networkalgorithmin conjunctionwithtextminingtechnologiesavailableinSQL Server2005DataTransformationServices. ThedataminingtutorialforSQLServer2005Analysis ServicesisavailableontheReadinessKitCDincluded withtheBeta2package.VisittheMicrosoftbeta newsgroupat microsoft.beta.yukon.analysisservices.dataminingorthe DataMiningforumat http://www.sqljunkies.com/Forums/ ShowForum.aspx?ForumID=38ifyouhavequestionsabout SQLServer2005DataMining.Thisarticleoffersan outlineofthesimplifiedminingprocessusedforthis specificscenario.Forageneralizedversionofthis methodology,refertoCRISP-DMat http://www.crisp- dm.org. Thesamplecodeinthisarticleisavailablefor download. RamanIyer isaSoftwareDesignEngineeratMicrosoft Corp.andafoundingmemberoftheSQLServerData Miningdevelopmentteamthere.Hecanbereachedat mailto:[email protected]. JesperLind isaResearchSoftwareDesignEngineerat MicrosoftResearchandamemberoftheMachine LearningandStatisticsteam.Hecanbereachedat mailto:[email protected]. http://www.aspnetpro.com/newsletterarticle/2004/10/asp200410ri_l/asp200410ri_l.asp 09/03/2005 17:34:27 SQL Server 2005 Data Mining Page 10 InformantCommunicationsGroup, Inc. 5105FlorinPerkinsRoad Sacramento,CA95826 Phone:(916)379-0609•Fax: (916)379-0610 Copyright©2005InformantCommunicationsGroup.AllRightsReserved.• SiteUseAgreement •Sendfeedback tothe Webmaster •Important informationabout privacy http://www.aspnetpro.com/newsletterarticle/2004/10/asp200410ri_l/asp200410ri_l.asp 09/03/2005 17:34:27