Developments in database technology have led to a wide variety of data being stored in huge collections. This variety makes analyzing a generic database a strenuous task in knowledge discovery. One approach is to summarize large datasets in such a way that the resulting summary dataset is of a manageable size. The histogram has received significant attention as a summarization (representative) object for large databases, but it suffers from computational and space complexity. We propose transforming the histogram object into a Piecewise Linear Regression (PLR) line object and suggest that PLR objects can be less computationally and storage intensive than histograms. To carry out cluster analysis, we also propose a distance measure for computing the distance between PLR lines. A case study is presented based on real data from the online education system LMS. It demonstrates that PLR is a powerful knowledge representative for very large databases.

Knowledge Discovery in Databases (KDD) is defined as the nontrivial process of identifying valid, novel, potentially useful, and ultimately understandable patterns in data. It has become a very important process for businesses and organizations because it provides information that is essential to the business and to future business decisions. Discovering knowledge from large volumes of data naturally brings up data mining, the field that handles large volumes of data and derives useful knowledge from them.

As we live in a digital world, structured and unstructured data accumulate steadily from sources such as transactions, social media, sensors, digital images, videos, audio and click streams, in domains including healthcare, retail, energy and utilities. For instance, observations indicate that 3 billion pieces of content are shared on Facebook every month, and that the photos viewed every 16 seconds in Picasa could cover a football field. Such a massive volume of stored data is generally termed “Big Data”. Turning big data into knowledge is a challenging task, and standard statistical methods cannot be applied directly to represent such huge data. The open challenge in big data is therefore big data analysis, which is concerned with organizing and analyzing large sets of data to discover patterns and knowledge.

To incorporate new concepts in knowledge representation, Diday introduced the term Symbolic Data Analysis (SDA). SDA refers to summarizing a large dataset in such a way that the resulting summary dataset is of a manageable size and yet retains as much of the knowledge in the original dataset as possible. These summarized data are termed symbolic data. Symbolic data are generally contrasted with classical data: in classical data each data point consists of a single (categorical or quantitative) value, whereas symbolic data can contain internal variation (lists, ranges, etc.) and can be structured.
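The idea of condensing classical records into symbolic data, where one description carries internal variation such as a range or a list, can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration: the function name `summarize`, the field names `course`, `score` and `device`, and the sample records are hypothetical and do not come from the LMS case study. Intervals and value lists are only two of the possible symbolic descriptions, and the sketch does not cover histograms or PLR lines.

```python
from collections import defaultdict


def summarize(records, group_key, numeric_vars, categorical_vars):
    """Aggregate classical records into one symbolic description per group.

    Each numeric variable becomes an interval (min, max) and each
    categorical variable becomes a sorted list of the distinct values seen.
    """
    groups = defaultdict(list)
    for rec in records:
        groups[rec[group_key]].append(rec)

    symbolic = {}
    for key, rows in groups.items():
        desc = {"count": len(rows)}
        for var in numeric_vars:
            values = [r[var] for r in rows]
            desc[var] = (min(values), max(values))      # interval-valued
        for var in categorical_vars:
            desc[var] = sorted({r[var] for r in rows})  # multi-valued list
        symbolic[key] = desc
    return symbolic


# Hypothetical classical data: one single value per variable per record.
records = [
    {"course": "math", "score": 62, "device": "mobile"},
    {"course": "math", "score": 88, "device": "desktop"},
    {"course": "math", "score": 75, "device": "mobile"},
    {"course": "physics", "score": 54, "device": "desktop"},
    {"course": "physics", "score": 91, "device": "desktop"},
]

symbolic = summarize(records, "course", ["score"], ["device"])
for course, desc in symbolic.items():
    print(course, desc)
```

Five classical records collapse into two symbolic descriptions, one per course. The summary size depends on the number of groups rather than the number of records, which is the property that makes this kind of representation attractive for very large datasets.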