ebook img

to view abstracts in advance PDF

293 Pages·2008·0.92 MB·English
by  
Save to my drive
Quick download
Download
Most books are stored in the elastic cloud where traffic is expensive. For this reason, we have a limit on daily download.

Preview to view abstracts in advance

Proceedings nd 22 IEEE International Parallel and Distributed Processing Symposium IPDPS2008AdvanceProgramAbstracts Abstracts for both contributed papers and all workshops have been compiledto allow authorsto checkaccuracyand sothat visitors to this Websitemaypreviewthepaperstobepresentedattheconference. Full proceedings of the conference will be published on a cdrom to be dis- tributedtoregistrantsattheconference. Summary of Contents IEEEInternationalParallelandDistributedProcessingSymposium 1 Session1: Algorithms-Scheduling 2 Session2: Applications-GeneralApplications 5 Session3: Architecture-Input/Output 8 Session4: Software-RedundancyandFaults 11 Session5: Algorithms-NumericalAlgorithms 14 Session6: Applications-P2PSystemsArchitecture 17 Session7: Architecture-Multi-core 21 Session8: Software-ImplementingMessagePassing 24 Session9: Algorithms-P2PandOverlayNetworks 27 Session10: Applications-Grids 30 Session11: Architecture-Supercomputing/SIMD 34 PlenarySession: BestPapers 37 Session12: Algorithms-CommunicationAlgorithms 41 Session13: Applications-P2PStructure 44 Session14: Architecture-Power/SMT/ILP 47 Session15: Software-TuningandPerformance 50 Session16: Algorithms-Theory 53 Session17: Applications-P2PReliabilityandTrust 56 Session18: Architecture-Networks 59 Session19: Software-LanguageFeaturesandImplementation 63 Session20: Algorithms-FaultTolerance 66 Session21: Applications-Sensors 69 Session22: Applications-WebApplications 72 Session23: Software-ResourceManagementandScheduling 75 Session24: Algorithms-SensorNetworks 78 i Session25: Applications-AdditionalApplications 81 Session26: Applications-ApplicationsandtheCellProcessor 84 HeterogeneityinComputingWorkshop 87 WorkshoponParallelandDistributedReal-TimeSystems 94 ReconfigurableArchitecturesWorkshop 101 WorkshoponHigh-LevelParallelProgrammingModels&SupportiveEnvironments 122 WorkshoponJavaandComponentsforParallelism,Distribution andConcurrency 130 WorkshoponNatureInspiredDistributedComputing 136 WorkshoponHigh PerformanceComputationalBiology 146 AdvancesinParallelandDistributedComputingModels 152 CommunicationArchitectureforClusters 162 NSFNextGenerationSoftwareProgram 168 High-Performance,Power-AwareComputing 198 High PerformanceGridComputing 204 WorkshoponSystemManagementTechniques,Processes,andServices 211 WorkshoponParallelandDistributedScientificandEngineeringComputing 218 PerformanceModeling,Evaluation,andOptimisation ofUbiquitousComputingandNetworked Systems 232 DependableParallel,DistributedandNetwork-CentricSystems 242 InternationalWorkshoponSecurityinSystemsandNetworks 250 InternationalWorkshoponHotTopicsinPeer-to-PeerSystems 256 WorkshoponLarge-Scale,VolatileDesktopGrids 263 WorkshoponMulti-ThreadedArchitecturesandApplications 271 WorkshoponParallelandDistributedComputinginFinance 277 WorkshoponLarge-ScaleParallelProcessing 284 ii IEEE International Parallel & Distributed Processing Symposium IPDPS 2008 1 SESSION 1 Algorithms - Scheduling 2 Efficient Resource Management Using Advance Reservations for Heterogeneous Grids ClarisCastillo,GeorgeN.RouskasandKhaledHarfoush DepartmentofComputerScience NorthCarolinaStateUniversity Raleigh,NC27695 Email: {ccastil,rouskas,kaharfou}@ncsu.edu Supportforadvance reservationsofresourcesplaysakeyroleinGridresourcemanagement asitenables thesystemto meetuserexpectationswithrespecttotimerequirementsandtemporaldependenceofapplications, increasespredictability of the system and enables co-allocation of resources. Despite these attractive features, adoption of advance reservations is limitedmainlyduetothefactthatrelatedalgorithmsaretypicallycomplexandfailtoscaletolargeandloadedsystems. In thisworkweconsidertwoaspectsofadvancereservations. First,weinvestigatetheimpactofheterogeneityonGridresource management when advance reservations are supported. Second, we employ techniques from computational geometry to developanefficientheterogeneity-awareschedulingalgorithm. OurmainfindingisthatGridsmaybenefitfromhighlevels of resource heterogeneity, independently of the total system capacity. Our results show that our algorithm performs well acrossseveraluserandsystemperformanceandovercomethelackofscalabilityandadaptabilityofexistingmechanisms. Online Scheduling in Grids UweSchwiegelshohn AndreiTchernykh TechnischeUniversita¨tDortmund CICESEResearchCenter RoboticsResearchInstitute 22860Ensenada,BajaCalifornia,Mexico 44221Dortmund, Germany [email protected] [email protected] RaminYahyapour TechnischeUniversita¨tDortmund ITandMediaCenter 44221Dortmund, Germany [email protected] ThispaperaddressesnonclairvoyantandnonpreemptiveonlinejobschedulinginGrids. Intheappliedbasicmodel, the Gridsystemconsistsofalargenumberofidenticalprocessorsthataredividedintoseveralmachines. Jobsareindependent, theyhaveafixeddegreeofparallelism,andtheyaresubmittedovertime.Further,ajobcanonlybeexecutedontheprocessors belongingtothesamemachine. Itisourgoaltominimizethetotalmakespan. WeshowthattheperformanceofGareyand Graham’slistschedulingalgorithmissignificantlyworseinGridsthaninmultiprocessors.ThenwepresentaGridscheduling algorithmthatguaranteesacompetitivefactorof5. Thisalgorithmcanbeimplementedusinga“jobstealing”approachand maybewellsuitedtoserveasastartingpointforGridschedulingalgorithmsinrealsystems. 3 Scheduling with Storage Constraints ErikSaule,Pierre-Franc¸oisDutotandGre´goryMounie´ LIG-MOAISTeam GrenobleUniversite´s,France {firstname.lastname}@imag.fr Cumulative memory occupation is a quite intuitive but not so studied constraint in scheduling. The interest in such a constraint is present in multi-System-on-Chip, embedded systems for storing instruction code, or in scientific computation forstoringresults.Memoryoccupationseenasaconstraintisimpossibletosolvewithapproximationalgorithms.Webelieve thattransformingtheconstraintintoasecondobjectivetooptimizehelpstodealwithsuchconstraints. The problem addressed in this paper is to schedule tasks on identical processors in order to minimize both maximum completion time and maximum cumulative memory occupation. For independent tasks, a family of algorithms with good approximationratiosbasedonaPTASisgiven. Severalapproximationratiosareprovedtobeimpossibletoachievewithany schedule. TheprecedenceconstrainedcaseisthenstudiedandafamilyofperformanceguaranteedalgorithmsbasedonList Schedulingisproposed. Finally,optimizingthemeancompletiontimeasathirdobjectiveisalsostudiedandatri-objective algorithmisgiven. Portioned Static-Priority Scheduling on Multiprocessors ShinpeiKatoandNobuyuki Yamasaki SchoolofScienceforOpenandEnvironmentalSystems KeioUniversity,Japan {shinpei,yamasaki}@ny.ics.keio.ac.jp Thispaperproposesanefficientreal-timeschedulingalgorithmformultiprocessorplatforms.Thealgorithmisaderivative oftheRateMonotonic(RM)algorithm,withitsbasisontheportionedschedulingtechnique. Thetheoreticaldesignofthe algorithm is well implementable for practical use. The schedulability of the algorithm is also analyzed to guarantee the worst-case performance. The simulation results show that the algorithm achieves higher system utilizations, in which all tasksmeetdeadlines,withasmallnumberofpreemptionscomparedtotraditionalalgorithms. 4 SESSION 2 Applications - General Applications 5 A Parallel Software Toolkit for Statistical 3-D Virus Reconstructions from Cryo Electron Microscopy Images Using Computer Clusters with Multi-core Shared-Memory Nodes YiliZhengandPeterC.Doerschuk SchoolofElectricalandComputer Engineering,PurdueUniversity WestLafayette,IN47907-1285 USA DepartmentofBiomedicalEngineeringandSchoolof ElectricalandComputerEngineering,CornellUniversity Ithaca,NY14853-5401 USA A statistical approach for computing 3-D reconstructions of virus particles from cryo electron microscope images and minimalpriorinformationhasbeendevelopedwhichcansolvearangeofspecificproblems. Thestatisticalapproachcauses high computation and storage complexity in the software implementation. A parallel software toolkit is described which allowstheconstructionofsoftwaretargetedatcommodityPCclusterswhichismodular,reusable,anduser-transparentwhile atthesametimedeliveringnearlylinearspeeduponpracticalproblems. Modeling and Predicting Application Performance on Parallel Computers Using HPC Challenge Benchmarks WaynePfeifferandNicholasJ.Wright SanDiegoSupercomputerCenter,LaJollaCA92093-0505, USA {pfeiffer,nwright}@sdsc.edu Amethodispresentedformodelingapplicationperformanceonparallelcomputersintermsoftheperformanceofmicro- kernelsfromtheHPCChallengebenchmarks. Specifically,theapplicationruntimeisexpressedasalinearcombinationof inversespeedsandlatenciesfrommicrokernelsorsystemcharacteristics.Themodelparametersareobtainedbyanautomated seriesofleastsquaresfitsusingbackwardeliminationtoensurestatisticalsignificance. Ifnecessary, outliersaredeletedto ensure that the final fit is robust. Typically three or four terms appear in each model: at most one each for floating-point speed, memorybandwidth, interconnectbandwidth, andinterconnectlatency. Suchmodelsallowpredictionofapplication performanceonfuturecomputersfromeasier-to-makepredictionsofmicrokernelperformance. ThemethodwasusedtobuildmodelsforfourbenchmarkproblemsinvolvingthePARATECandMILCscientificappli- cations. Thesemodelsnotonlydescribeperformancewellonthetencomputersusedtobuildthemodels,butalsodoagood jobofpredictingperformanceonthreeadditionalcomputerswithnewerdesignfeatures. Forthefourapplicationbenchmark problems with six predictions each, the relative root mean squared error in the predicted run times varies between 13 and 16%. ThemethodwasalsousedtobuildmodelsfortheHPLandG-FFTEbenchmarksinHPCC,includingfunctionaldepen- dencesonproblemsizeandcorecountfromcomplexityanalysis. ThemodelforHPLpredictsperformanceevenbetterthan theapplicationmodelsdo,whilethemodelforG-FFTEsystematicallyunderpredictsruntimes. 6 Optimizations in Financial Engineering: The Least-Squares Monte Carlo Method of Longstaff and Schwartz AnamitraRoyChoudhury AlanKing IBMIndiaResearchLab,Plot4 IBMT.J.WatsonResearchCenter, Block-C,Institutional Area 10598Yorktown Heights,NY,USA VasantKunj,NewDelhi,India SunilKumar YogishSabharwal Dept. ofElectricalEngineering, IBMIndiaResearchLab,Plot4 UniversityofSouthernCalifornia, Block-C,Institutional Area LosAngeles,USA VasantKunj,NewDelhi,India InthispaperweidentifyimportantopportunitiesforparallelizationintheLeast-SquaresMonteCarlo(LSM)algorithm, due to Longstaff and Schwartz, for the pricing of American options. The LSM method can be divided into three phases: Path-simulation, CalibrationandValuation. Wedescribehoweachofthesephasescanbeparallelized, withmorefocuson theCalibrationphase,whichisinherentlymoredifficulttoparallelize. We implemented these parallelization techniques on Blue Gene using the Quantlib open source financial engineering package. We achieved up to factor of 9 speed-up for the Calibration phase and 18 for the complete LSM method on a 32 processorBG/Psystemusingmonomialbasisfunctions. Massively Parallel Cosmological Simulations with ChaNGa PritishJetley,FilippoGioachin,CelsoMendesandLaxmikantV.Kale´ Dep. ofComputerScience,UniversityofIllinois atUrbana-Champaign, USA {pjetley2, gioachin,cmendes,kale}@uiuc.edu ThomasQuinn Dep. ofAstronomy,UniversityofWashington,USA [email protected] Cosmologicalsimulatorsareanimportantcomponentinthestudyoftheformationofgalaxiesandlargescalestructures, and can help answer many important questions about the universe. Despite their utility, existing parallel simulators do not scale effectively on modern machines containing thousands of processors. In this paper we present ChaNGa, a recently released production simulator based on the Charm++ infrastructure. To achieve scalable performance, ChaNGa employs variousoptimizationsthatmaximizetheoverlapbetweencom-putationandcommunication.Wepresentexperimentalresults ofChaNGasimulationsonmachineswiththousandsofprocessors,includingtheIBMBlueGene/LandtheCrayXT3. The paper goes on to highlight efforts toward even more effcient and scalable cosmological simulations. In particular, novel loadbalancingschemesthatbasedecisionsoncertaincharacteristicsoftree-basedparticlecodesarediscussed. Further,the multisteppingcapabilitiesofChaNGaarepresented,asaresolutionstotheloadimbalancethatsuchmultiphasesimulations face. Weoutlinekeyrequirementsforaneffectivepracticalimplementationandconcludebydiscussingpreliminaryresults fromsimulationsrunwithourmultiphaseloadbalancer. 7

Description:
The algorithm is a derivative Anamitra Roy Choudhury .. index to a key table, suffers from setup failures and static membership testing for keys. Abhinav Bhatelé1, Sameer Kumar2, Chao Mei1, James C. Phillips3, Gengbin
See more

The list of books you might like

Most books are stored in the elastic cloud where traffic is expensive. For this reason, we have a limit on daily download.