Data-mining is the practice of creating a database by extracting information from text or video documents of multiple formats. This can be done manually, algorithmically, or by a combination of both.
Manual data-mining begins by determining a set of organising principles for the data, including how the information should be categorised. These principles are determined not only by the content of the source material, but also by considering how that information will ultimately be analysed or presented.
The computer science algorithms in the domain of automated data-mining are known as ‘information retrieval’ (IR) algorithms. Algorithmic data-mining is a powerful mechanism to search vast amounts of content for keywords or phrases, and alongside machine learning processes can be employed to retrieve relevant images, audio, and video from large datasets.