Deep learning for financial services analytics: novel algorithms for industry classification, industry assignment, and patent valuation

Publisher

University of Delaware

Abstract

Over the past years, the financial services industry has seen a growing number of business solutions driven by financial technologies (FinTech). A key ingredient of FinTech is the use of diverse data sources that are difficult or impossible to incorporate into traditional data analytics pipelines. Two notable sources are network data and textual data, which have been shown to provide useful information for applications such as credit scoring, fraud detection, and asset pricing. Despite their promise for the financial services industry, textual and network data remain underutilized in current practice: a financial document is usually represented by counting the frequencies of the unique words it contains, while the structure of a patent citation network is often summarized by the number of citations each patent makes or receives. This thesis takes a step toward a general framework, based on deep learning methods, for mining business insights from textual and network data. Specifically, it studies three related business problems that are fundamental in the financial services industry: industry classification, industry assignment, and patent valuation. For each problem, the relevant textual and network data are identified, and a specialized deep learning model is designed to exploit the unique structure embedded in that problem.

The first problem, industry classification (IC), refers to the activity of identifying economically related firms. The availability of business descriptions in firms' financial reports brings a new perspective to IC: industry boundaries are frequently reshaped by disruptive innovations, and such shifts are hard to capture dynamically without the ability to automatically digest a vast volume of financial reports in real time. In this regard, one methodological contribution of the thesis is a novel text representation method that allows the semantic similarity between two firms' business description documents to be measured more accurately. Given firms' pairwise similarities derived from their business descriptions, the most economically related firms of a focal firm can be identified simply by ranking all other firms by their similarity to the focal firm. The superiority of the resulting industry classification system is then verified empirically by benchmarking it against several state-of-the-art IC approaches.

The second problem, industry assignment (IA), is rooted in traditional expert-driven industry classification systems, in which groups of experts define the hierarchical structure of industry categories and then assign firms to the predefined industries. In machine learning terms, IA can be formulated as a standard classification problem in which firms are instances and the industries of an expert-defined hierarchy are labels. In terms of the data leveraged, whereas IC is purely text-based, IA goes a step further by simultaneously considering textual data and a special type of network data: the tree structure of an industry hierarchy. The resulting IA classification model constitutes the second methodological contribution of the thesis.

While both IC and IA address the question of which firms are related in their economic activities, patent valuation (PV) tackles a different but related problem: which firms are likely to bring disruptive technologies that are highly valued by the market and by future innovation activities? Accurately assessing the value of a company's patent portfolio is critical for pinpointing the industry in which creative destruction is about to occur. Compared with IA, the network data employed by PV are much more complicated. The innovation network is heterogeneous (both patents and firms are nodes; patents are connected via directed citations, while firms are linked by various kinds of economic relationships), dynamic (new firms keep entering the market and novel technologies are constantly being invented), and attributed (both patents and firms carry side information such as patent documents and financial ratios). Simultaneously modeling these three properties of the innovation network gives rise to the third methodological contribution of the thesis.
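The ranking step described for industry classification, in which a focal firm's most economically related peers are found by sorting all other firms by similarity, can be sketched as follows. This is a minimal illustration under stated assumptions, not the thesis's method: the thesis's own text representation is not described in this record, so each firm's business description is assumed to be already encoded as a fixed-length vector, cosine similarity is used as a stand-in similarity measure, and the firm names and vectors are hypothetical toy values.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)


def rank_peers(focal, embeddings, k=None):
    """Rank every firm other than `focal` by descending similarity to it.

    `embeddings` maps a firm identifier to an assumed, precomputed document
    vector; `k` optionally truncates the list to the top-k peers.
    """
    scores = [
        (firm, cosine(embeddings[focal], vec))
        for firm, vec in embeddings.items()
        if firm != focal
    ]
    scores.sort(key=lambda pair: pair[1], reverse=True)
    return scores if k is None else scores[:k]


# Hypothetical toy vectors standing in for encoded business descriptions.
embeddings = {
    "FirmA": [0.9, 0.1, 0.0],
    "FirmB": [0.8, 0.2, 0.1],
    "FirmC": [0.0, 0.1, 0.9],
}
print(rank_peers("FirmA", embeddings, k=2))
```

Under this sketch, FirmB ranks ahead of FirmC as FirmA's closest peer because their toy vectors point in nearly the same direction. Any learned document representation could be dropped in as the vector source without changing the ranking logic.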

Description

"At the request of the author or degree granting institution, this graduate work is not available to view or purchase until August 31 2026."--ProQuest abstract/details page.
