
IIT GATE Exam 2023 : CS - Computer Science and Information Technology : data warehouse

Ratul Banerjee
  

Spatial databases and text databases --- Spatial databases follow Open Geospatial Consortium (OGC) standards. They are used to draw 3D geometric shapes and to add onto or modify existing 3D geometric shapes; they work against an observer reference and are endowed with 3D spatial indexes. Text databases hold data as words and free text and respond to ad hoc queries.

Data mining and web mining --- Data mining deals with offline, structured text or numeric data on which pattern matching can be applied directly. Web mining deals with online, unstructured or semi-structured text or numeric data, where fixed-schema pattern matching does not apply directly.

Types of web mining --- Web usage mining analyses online data stored in web logs. Web content mining deals with the semi-structured content of web pages. Web structure mining deals with the hyperlink structure of the WWW.

Challenges in web mining --- data segmentation, customer accessibility, etc.

Decision tree pruning --- Where data has noise or outliers, decision tree induction can overfit, and trees are pruned to offset this. Trees can be pre-pruned (growth is halted early) or post-pruned (subtrees are removed from a fully grown tree); post-pruning is the more reliable approach. Post-pruning is usually evaluated on a separate set of tuples that is independent of the training samples, because the training samples alone cannot expose overfitted branches.

Outlier analysis --- can use clustering or regression. In clustering, outliers are detected as data points that do not belong to any cluster; clustering is generally preferred over regression for this purpose.

Data characterization --- a data mining task that summarizes the general features of a target class of data (a sample or population).

Data discrimination --- a data mining task that compares the features of a target class against one or more contrasting classes.

OLAP --- OLAP, or online analytical processing, is a software technology that, like a data warehouse (DWH), performs query and analysis only. OLAP:
gives fast indexed access to precomputed, summarized data
uses denormalized tables
stores historical data
acts as a blueprint for operational intelligence
can slice and dice data
can roll up, drill down, etc.

OLAP servers --- ROLAP, MOLAP.
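The slice, dice, roll-up and drill-down operations named above can be illustrated with a minimal sketch in plain Python. The fact table, the dimension names and the helper functions (roll_up, slice_, dice) are all hypothetical and invented for illustration; a real OLAP server would run these against precomputed cubes rather than a list of tuples.

```python
from collections import defaultdict

# Hypothetical fact table: (year, region, product, sales)
FACTS = [
    (2022, "East", "Pen", 100),
    (2022, "East", "Ink", 40),
    (2022, "West", "Pen", 70),
    (2023, "East", "Pen", 120),
    (2023, "West", "Ink", 30),
    (2023, "West", "Pen", 90),
]
DIMS = ("year", "region", "product")


def roll_up(facts, keep):
    """Sum sales over every dimension that is not named in `keep`."""
    idx = [DIMS.index(d) for d in keep]
    totals = defaultdict(int)
    for row in facts:
        totals[tuple(row[i] for i in idx)] += row[-1]
    return dict(totals)


def slice_(facts, dim, value):
    """Slice: fix a single dimension to one value."""
    i = DIMS.index(dim)
    return [row for row in facts if row[i] == value]


def dice(facts, **allowed):
    """Dice: keep rows whose value on each named dimension is in the allowed set."""
    return [
        row for row in facts
        if all(row[DIMS.index(d)] in values for d, values in allowed.items())
    ]


by_year = roll_up(FACTS, ("year",))                    # coarse view
by_year_region = roll_up(FACTS, ("year", "region"))    # drill down one level
east_only = slice_(FACTS, "region", "East")
pens_2023 = dice(FACTS, year={2023}, product={"Pen"})
```

Drill-down is shown here simply as rolling up to a finer set of dimensions (year and region instead of year alone), which is the reverse direction of the roll-up.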
ROLAP servers are more scalable and less flexible than MOLAP servers. Tools mentioned in connection with OLAP include erwin, Cognos, etc. OLAP conforms to the FASMI principles: Fast Analysis of Shared Multidimensional Information.

Classification accuracy --- the ratio of correctly classified test cases to the total number of test cases. In terms of the confusion matrix:
accuracy = (true positives + true negatives) / (true positives + true negatives + false positives + false negatives)
precision = true positives / (true positives + false positives)
sensitivity = true positives / (true positives + false negatives)
F1 score = 2 x precision x sensitivity / (precision + sensitivity)

Normalization --- Min-max normalization scales attribute values to the range 0 to 1. Z-score normalization rescales attribute values so that their mean = 0 and their standard deviation = 1.

Nominal attributes --- neither the order of the data nor the difference between values is meaningful, e.g. blood groups.

Ordinal attributes --- the order of the data is important, but the difference between values is not, e.g. socio-economic status.

Interval attributes --- both the order of the data and the difference between values are important, but there is no true zero.

Ratio attributes --- the order of the data, the difference between values and a true zero are all important, where 0.0 = none or no data, e.g. enzyme concentration, temperature in kelvin.

CRISP-DM process model of data mining --- CRISP-DM stands for Cross-Industry Standard Process for Data Mining, an open standard process model. The process flow has 6 phases (business understanding, data understanding, data preparation, modeling, evaluation, deployment) that are not rigidly ordered and can move back and forth. The outermost layer is a cyclical process running from the data source to deployment, and the lessons learned at the end of each phase are a valuable benchmark.

Principal component analysis --- raw data is transformed through an orthogonal transformation of coordinates to give the data as principal components.

Metadata --- data about data, stored in metadata repositories. Technical metadata is accessed by the DBA, the DWH administrator and programmers; business metadata is accessed by end users.

Decision tree --- a data mining algorithm; ID3 is a common algorithm for building one and works on data homogeneity. Entropy measures the extent to which data is homogeneous: for perfectly homogeneous data entropy = 0, and for a two-class sample split evenly between the classes entropy = 1. Entropy is computed from a frequency table, and information gain is the reduction in entropy produced by a split. A decision tree is constructed by computing the information gain of each attribute, splitting on the attribute with the highest gain, and repeating on each branch until entropy is at a minimum.

Dissociation rules = negative association rules.

OLAP servers --- ROLAP, MOLAP and hybrid OLAP (HOLAP) servers.
ROLAP, or relational OLAP, is an intermediate server between server-side back-end processing and client-side front-end user tools. ROLAP is more scalable and less flexible than MOLAP.
MOLAP, or multidimensional OLAP, stores data in array-based multidimensional storage engines; data cubes with more than 3 dimensions are called hypercubes. MOLAP is less scalable and more flexible than ROLAP.
HOLAP, or hybrid OLAP, combines the higher scalability of ROLAP with the greater flexibility of MOLAP servers.

FASMI principles of OLAP --- FASMI stands for Fast Analysis of Shared Multidimensional Information:
fast --- OLAP provides fast access to precomputed, summarized data; most responses are expected within about 5 seconds
analysis --- OLAP supports the analytical queries users need
shared --- data sharing is enabled
multidimensional --- OLAP supports MDDBMS that use parallel processing and improve scalability; ETL provides support to the servers
information --- OLAP stores metadata and historical information

Discrete and continuous attributes --- Discrete attributes are count based; continuous attributes are quantitative measurements. When grouped into class intervals, discrete data uses non-overlapping, inclusive intervals in which both limiting values belong to the class, whereas continuous data uses intervals with overlapping boundaries of the exclusive type, in which just one limiting value belongs to the class.
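Two of the calculations in the notes above, the confusion-matrix metrics and the entropy and information gain used by ID3, can be checked with a minimal sketch. The function names, the counts and the tiny weather-style data set are hypothetical and chosen only for illustration; the metric function assumes none of its denominators are zero.

```python
from collections import Counter
from math import log2


def classification_metrics(tp, tn, fp, fn):
    """Accuracy, precision, sensitivity and F1 from confusion-matrix counts.

    Assumes tp + fp > 0 and tp + fn > 0 (no zero denominators).
    """
    precision = tp / (tp + fp)
    sensitivity = tp / (tp + fn)
    return {
        "accuracy": (tp + tn) / (tp + tn + fp + fn),
        "precision": precision,
        "sensitivity": sensitivity,
        "f1": 2 * precision * sensitivity / (precision + sensitivity),
    }


def entropy(labels):
    """Shannon entropy (base 2) of a list of class labels."""
    total = len(labels)
    return sum(-(n / total) * log2(n / total) for n in Counter(labels).values())


def information_gain(rows, attribute, target):
    """Entropy of `target` minus the weighted entropy after splitting on `attribute`."""
    base = entropy([row[target] for row in rows])
    groups = {}
    for row in rows:
        groups.setdefault(row[attribute], []).append(row[target])
    remainder = sum(len(g) / len(rows) * entropy(g) for g in groups.values())
    return base - remainder


# Hypothetical counts and rows, for illustration only.
metrics = classification_metrics(tp=40, tn=45, fp=5, fn=10)
rows = [
    {"outlook": "sunny", "windy": True, "play": "no"},
    {"outlook": "sunny", "windy": False, "play": "no"},
    {"outlook": "rainy", "windy": True, "play": "yes"},
    {"outlook": "rainy", "windy": False, "play": "yes"},
]
best = max(["outlook", "windy"], key=lambda a: information_gain(rows, a, "play"))
```

In this toy data, outlook separates the classes perfectly while windy tells us nothing, so an ID3-style split would pick outlook first.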
Discrete attributes are represented as isolated points and continuous attributes as connected points on a graph.

DWH architecture --- A DWH follows a layered architecture. The 3-tier architecture is common, though a 2-tier architecture also occurs.
The 3-tier DWH architecture has the following layers:
bottom layer --- centralized RDBMS servers and MDDBMS that use parallel processing and improve scalability; ETL provides external functions to the servers
middle layer --- OLAP servers
top layer --- user programs that interact with the OLAP servers
The 2-tier DWH architecture separates the physical data layer from the user programs.

Aggregate functions of OLAP on multidimensional data include:
roll up --- moves from a low level of abstraction to a high level of abstraction through generalization on common attributes
drill down --- moves from a high level of abstraction to a low level of abstraction through specialization
slice and dice --- perform selection and projection on the dimensions of the data; a slice fixes one dimension, a dice selects on two or more
pivot --- rotates and cross-tabulates the data
sort data, select nominal attributes, etc.

ETL --- ETL, or extract-transform-load, is important preprocessing for the data staging and data acquisition components of a data warehouse and for data mining. It begins with data cleaning, which entails noise removal. Data is then extracted from the source systems into a staging database, the extracted data is transformed, and the result is loaded into the DWH. During the load process the DWH is essentially offline. There are 3 types of load: initial load, incremental load and full refresh. The initial load populates all tables for the first time, an incremental load applies only the changes since the last load, and a full refresh erases and reloads the contents of the tables.

Regression --- a data mining task that predicts numeric values from past examples. Linear regression finds a straight-line equation between one independent predictor variable and one dependent response variable.
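As a minimal sketch of fitting that straight line, the snippet below uses ordinary least squares, which is an assumption here since the notes do not name a fitting method. The function name fit_line and the sample values are hypothetical.

```python
def fit_line(xs, ys):
    """Ordinary least-squares fit of y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Co-variation of x and y, and variation of x, around their means.
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


# Hypothetical past examples: one predictor, one response.
slope, intercept = fit_line([1, 2, 3, 4], [3, 5, 7, 9])
predicted = slope * 5 + intercept  # predict the response for a new x = 5
```

The sample points lie exactly on y = 2x + 1, so the fitted slope and intercept recover that line; with noisy data the same formulas give the line that minimizes the squared vertical errors.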
Non-linear regression fits a curve, for example a quadratic equation, instead of a straight line. Regression with 2 or more independent predictor variables and one dependent response variable is called multiple regression.

Classification --- a data mining task that predicts discrete, unordered class labels, placing similar data in the same class. Eager classifiers build a general model from the training tuples before any query arrives. Lazy classifiers store the training tuples and defer generalization until a query arrives; both kinds are supervised learning. Unlike eager classifiers, lazy classifiers can deal with complex data and/or incomplete problem domains. Examples of eager classifiers: decision tree, Bayesian. Examples of lazy classifiers: case-based reasoning (CBR), etc.
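The lazy approach can be illustrated with a minimal k-nearest-neighbour sketch. k-NN is not mentioned in the notes; it is swapped in here as a commonly cited lazy learner that is simpler to show than CBR. The class name LazyKNN and the sample points are hypothetical.

```python
from collections import Counter
from math import dist


class LazyKNN:
    """k-nearest-neighbour classifier: stores tuples, generalizes only at query time."""

    def __init__(self, k=3):
        self.k = k
        self.samples = []

    def fit(self, points, labels):
        # "Training" only stores the labelled tuples; no model is built.
        self.samples = list(zip(points, labels))

    def predict(self, query):
        # All the work happens here: find the k closest stored tuples and vote.
        nearest = sorted(self.samples, key=lambda s: dist(s[0], query))[: self.k]
        votes = Counter(label for _, label in nearest)
        return votes.most_common(1)[0][0]


# Hypothetical labelled training tuples.
points = [(1, 1), (1, 2), (2, 1), (8, 8), (8, 9), (9, 8)]
labels = ["low", "low", "low", "high", "high", "high"]

knn = LazyKNN(k=3)
knn.fit(points, labels)
near_origin = knn.predict((2, 2))
far_corner = knn.predict((7, 7))
```

An eager classifier such as a decision tree would do its work in the fit step instead and keep only the resulting model; here the training labels are still required, which is why lazy learning remains supervised.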
