Method of Using Graph Model to Store Algorithm Dependencies
In the era of big data, the number of algorithms used for data processing is growing explosively. Large collections of algorithms are currently managed either by classifying and labeling the algorithms or by storing the task flows composed of them task by task, and little attention has been paid to the topological relationships between algorithms across the task set. As domain knowledge and task flows accumulate, the dependencies between algorithms become increasingly important. To meet the requirements of managing massive numbers of algorithms, this study proposes a management method that splits branched dependencies into unbranched dependencies. Because a graph database with index-free adjacency searches topological relationships by following pointers, it avoids Join operations and is inherently suited to managing algorithm dependencies. In addition, to exploit the reusability of algorithm modules, this study proposes connection points, which are used to represent dependency edges in the graph model. Connection points distinguish the position of an algorithm module in different task flows, so that an algorithm module reused by multiple tasks needs to be represented by only one node in the graph. Finally, the proposed algorithm relationship management method is validated on specific projects. The validation shows that the method has significant advantages in scenarios where the number of algorithms is large and the algorithm modules are highly reusable.
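The idea of splitting branched dependencies and sharing one node per reused module can be illustrated with a minimal in-memory sketch. It is not the implementation used in this study: the class names, the `(task, chain)` label standing in for a connection point, the source-to-sink splitting strategy, and the example module names are all assumptions made for illustration, and plain Python object references stand in for the pointers of an index-free adjacency graph database.

```python
class ModuleNode:
    """One node per algorithm module, shared by every task flow that reuses it."""

    def __init__(self, name):
        self.name = name
        # Direct references to outgoing edges: neighbors are reached by
        # following pointers, without a global index lookup or a join.
        self.out_edges = []


class DependencyEdge:
    """Dependency edge carrying a hypothetical connection-point label (task, chain)."""

    def __init__(self, target, task, chain):
        self.target = target
        self.task = task
        self.chain = chain


class AlgorithmGraph:
    def __init__(self):
        self.nodes = {}

    def node(self, name):
        # Reuse the existing node if the module is already in the graph.
        if name not in self.nodes:
            self.nodes[name] = ModuleNode(name)
        return self.nodes[name]

    def add_chain(self, task, chain, module_names):
        # Store one unbranched dependency chain for the given task.
        for src, dst in zip(module_names, module_names[1:]):
            self.node(src).out_edges.append(
                DependencyEdge(self.node(dst), task, chain)
            )

    def follow(self, task, chain, start):
        # Walk a single chain by following only edges with a matching label.
        path = [start]
        current = self.nodes[start]
        while True:
            edge = next(
                (e for e in current.out_edges
                 if e.task == task and e.chain == chain),
                None,
            )
            if edge is None:
                return path
            current = edge.target
            path.append(current.name)


def split_into_chains(flow, sources):
    """Split a branched, acyclic task flow into unbranched source-to-sink chains."""
    chains = []

    def walk(name, path):
        successors = flow.get(name, [])
        if not successors:
            chains.append(path)
            return
        for nxt in successors:
            walk(nxt, path + [nxt])

    for source in sources:
        walk(source, [source])
    return chains


# Hypothetical task flows: task t1 branches after "clean"; task t2 reuses
# the "load" and "stats" modules in a different position.
flow_t1 = {"load": ["clean"], "clean": ["fft", "stats"],
           "fft": ["report"], "stats": ["report"]}
flow_t2 = {"load": ["stats"], "stats": ["export"]}

graph = AlgorithmGraph()
chains_t1 = split_into_chains(flow_t1, ["load"])
chains_t2 = split_into_chains(flow_t2, ["load"])
for i, chain in enumerate(chains_t1):
    graph.add_chain("t1", i, chain)
for i, chain in enumerate(chains_t2):
    graph.add_chain("t2", i, chain)
```

In this sketch the branched flow of task t1 becomes two unbranched chains, and the modules shared by t1 and t2 appear once in `graph.nodes`; the label on each edge is what lets `follow` recover the position of a shared module in a particular task flow.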