Knowledge Discovery
Knowledge discovery is the science of finding patterns within sets of data. This data can consist of anything from Boolean values (did he go left or right?) to sports scores, web visit statistics, laboratory analyses, or other scientific measurements.
By itself, data is simply measurement transformed into numbers and statistics, but it often contains a great deal of knowledge that can be put to good use. Also known as data mining, knowledge discovery has been applied successfully in many lucrative fields, including:
- Government surveillance
- Customer relationship management
- Digital mapping
- Medical risk assessment
- Market analysis
- Educational research
Depending on how large the data set is, knowledge discovery may or may not require an automated program. The largest data set available today, and certainly the richest source of mineable knowledge, is the Internet.
Internet Knowledge Discovery
More than one billion people can connect to the Internet, where over one hundred fifty million websites hold over five trillion megabytes of data. That is a vast amount of information, amounting to the sum total of media published over the world's public electronic networks. Internet Knowledge Discovery is the science of finding patterns in this massive stew of information through such means as:
- User/purchase tracking
- Search popularity ranking
- Site architecture analysis
Once programs designed to sort data into organized tables have tracked, ranked, and analyzed it, it is often the job of a specialist to extract, or discover, “knowledge” from the data. Knowledge, in this case, means conclusions that can be drawn from the data, often with regard to a specific purpose or question: for instance, “What look is hot right now?”, “What type of online behavior leads to suspicious activity?”, or “What movie are people most interested in seeing?”
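As a minimal sketch of the ranking step, assuming an upstream tracking program has already reduced each logged search to a movie title, the question “What movie are people most interested in seeing?” can be answered by counting searches per title. The log contents and the `rank_by_popularity` helper are hypothetical illustrations, not part of any particular mining product.

```python
from collections import Counter

# Hypothetical tracked search log: one entry per search, already
# normalized to a movie title by an upstream program.
search_log = [
    "Movie A", "Movie B", "Movie A", "Movie C",
    "Movie A", "Movie B",
]


def rank_by_popularity(events, top_n=3):
    """Return (item, count) pairs sorted from most to least frequent."""
    return Counter(events).most_common(top_n)


ranking = rank_by_popularity(search_log)
most_wanted, hits = ranking[0]
print(f"Most searched-for movie: {most_wanted} ({hits} searches)")
for title, count in ranking:
    print(title, count)
```

Counting is the simplest possible popularity measure; a real ranking would typically also weight recent searches more heavily and filter out automated traffic.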
Three Sub-Fields
Internet Knowledge Discovery, also known as Web Data Mining, is generally divided into three main sub-fields:
- Web Usage Mining
- Web Content Mining
- Web Structure Mining
These correspond roughly to the three means listed above. Web usage mining is geared toward answering the question “How do people make their way through the Internet?”; web content mining, “What do people publish to the Internet?”; and web structure mining, “How do people arrange what they publish on the Internet?”
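The following toy sketch, with hypothetical data and helper names, shows one simple kind of measurement for each sub-field: counting page-to-page transitions in user sessions (usage), counting words across published pages (content), and counting incoming links per page (structure). Real systems use far more sophisticated techniques; counting in-links, for example, is only a crude stand-in for link-analysis algorithms such as PageRank.

```python
from collections import Counter

# Hypothetical usage data: ordered page visits per user session.
sessions = [
    ["/home", "/movies", "/movies/a"],
    ["/home", "/movies", "/movies/b"],
    ["/home", "/news"],
]
# Hypothetical content data: the text published on each page.
pages = {
    "/movies/a": "new action movie trailer",
    "/movies/b": "action movie review",
    "/news": "local news",
}
# Hypothetical structure data: outgoing hyperlinks from each page.
links = {
    "/home": ["/movies", "/news"],
    "/movies": ["/movies/a", "/movies/b"],
    "/movies/a": ["/movies/b"],
    "/movies/b": ["/movies/a"],
    "/news": ["/home"],
}


def usage_transitions(sessions):
    """Web usage mining: how often users move from one page to the next."""
    return Counter(
        (src, dst) for s in sessions for src, dst in zip(s, s[1:])
    )


def content_terms(pages):
    """Web content mining: how often each word appears across pages."""
    return Counter(word for text in pages.values() for word in text.split())


def structure_inlinks(links):
    """Web structure mining: how many pages link to each page."""
    return Counter(dst for targets in links.values() for dst in targets)


print(usage_transitions(sessions).most_common(1))
print(content_terms(pages).most_common(2))
print(structure_inlinks(links).most_common())
```

Here the usage data shows that most visitors go from `/home` to `/movies`, the content data shows that “action” and “movie” are the most frequent words, and the structure data shows that the two movie pages receive the most links.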
These three fields of Internet Knowledge Discovery often inform one another in a cyclical loop. For instance, if you’re trying to figure out what to put on your website in order to make money, you need to do some web usage mining to understand the latest user trends. Conversely, if you’re trying to understand how people use the Internet, you need to do some web content mining to get a fuller picture of what is available online.
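To make the first example concrete, here is a minimal sketch, under assumed inputs, of usage mining informing content decisions: topic demand is estimated from search counts, topic supply from the number of published pages, and topics are ranked by searches per page. The topic lists, the `underserved_topics` helper, and the scoring formula are all hypothetical choices for illustration.

```python
from collections import Counter

# Hypothetical inputs: topics users searched for (from web usage mining)
# and topics already covered by published pages (from web content mining).
searched_topics = ["cooking", "cooking", "travel", "cooking", "gardening", "travel"]
published_topics = ["travel", "travel", "travel", "cooking"]


def underserved_topics(searched, published):
    """Rank topics by searches per published page (higher = more unmet demand).

    Adding 1 to the page count avoids division by zero for topics
    that no page covers yet.
    """
    demand = Counter(searched)
    supply = Counter(published)  # missing topics count as 0
    scores = {topic: demand[topic] / (supply[topic] + 1) for topic in demand}
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)


for topic, score in underserved_topics(searched_topics, published_topics):
    print(f"{topic}: {score:.2f}")
```

With these inputs, cooking ranks first (3 searches against 1 page), gardening second, and travel last, since three pages already cover it.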
Cyclical Internet Knowledge Discovery Loop
Many sophisticated web mining programs are written with the cyclical nature of knowledge discovery in mind, since each pass can uncover deeper levels of knowledge. They run multiple data mining algorithms in sequence: first mining a set of data, then transforming the results into several sets of abstracted knowledge about that data, and then mining those sets of knowledge for further patterns.
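A minimal sketch of such a loop is shown below. It assumes the mining step is simple frequent-pair counting (a simplified stand-in for association-rule algorithms such as Apriori) and that the abstraction step re-describes each session as the set of patterns it contains; both choices, and all names and data, are hypothetical.

```python
from collections import Counter
from itertools import combinations


def frequent_pairs(transactions, min_support):
    """Mining step: item pairs that co-occur in at least `min_support` transactions."""
    counts = Counter(
        pair
        for t in transactions
        # sorted() gives each pair a canonical order so (a, b) == (b, a).
        for pair in combinations(sorted(set(t)), 2)
    )
    return {pair for pair, n in counts.items() if n >= min_support}


def abstract(transactions, patterns):
    """Abstraction step: re-describe each transaction by the patterns it contains."""
    return [{p for p in patterns if set(p) <= set(t)} for t in transactions]


def mining_loop(transactions, min_support, rounds=2):
    """Alternate mining and abstraction, keeping the patterns found at each level."""
    levels = []
    for _ in range(rounds):
        patterns = frequent_pairs(transactions, min_support)
        if not patterns:
            break  # nothing left to abstract
        levels.append(patterns)
        transactions = abstract(transactions, patterns)
    return levels


# Hypothetical sessions: the pages each visitor viewed.
sessions = [
    ["home", "movies", "reviews"],
    ["home", "movies", "reviews"],
    ["home", "movies", "tickets"],
    ["reviews", "tickets"],
]

for level, patterns in enumerate(mining_loop(sessions, min_support=2), start=1):
    print(f"Level {level}: {sorted(patterns)}")
```

On this toy data, the first pass finds that home, movies, and reviews are frequently visited together; the second pass, run on the abstracted sessions, finds that those patterns in turn tend to appear together in the same sessions.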