A lead-lag analysis of the topic evolution patterns for preprints and publications

This paper applies latent Dirichlet allocation (LDA) and regression analysis in a lead-lag analysis that identifies how topic evolution patterns differ between arXiv preprints and Web of Science (WoS) papers in astrophysics over a twenty-year period (1992-2011). Fifty topics were generated for arXiv and for WoS using an LDA algorithm, and regression models were then used to explain four types of topic growth patterns. Based on the slopes of the fitted curves, the paper redefines topic trends and popularity. Results show that arXiv and WoS share similar topics in a given domain but differ in their evolution trends. Topics in WoS lose their popularity much earlier, and their periods of popularity are shorter, than those in arXiv. This work demonstrates that open access preprints have a stronger growth tendency than traditional printed publications.
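The slope-based reading of topic trends can be illustrated with a minimal sketch. It assumes each topic is already summarized as a yearly share of documents and fits a plain straight line; the abstract does not specify the regression models, the four growth patterns, or any thresholds, so the function names (`ols_slope`, `trend_label`, `peak_year`), the tolerance, and the sample data are all hypothetical.

```python
def ols_slope(years, shares):
    """Ordinary least-squares slope of shares regressed on years."""
    n = len(years)
    mean_x = sum(years) / n
    mean_y = sum(shares) / n
    sxx = sum((x - mean_x) ** 2 for x in years)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(years, shares))
    return sxy / sxx


def trend_label(slope, tol=1e-9):
    """Label a trend by the sign of its slope (tol is an assumed cutoff)."""
    if slope > tol:
        return "rising"
    if slope < -tol:
        return "declining"
    return "flat"


def peak_year(years, shares):
    """Year in which the topic share is largest."""
    return max(zip(years, shares), key=lambda pair: pair[1])[0]


# Hypothetical yearly topic shares, not data from the paper.
years = [2000, 2001, 2002, 2003, 2004]
arxiv_share = [0.010, 0.012, 0.014, 0.016, 0.018]
wos_share = [0.010, 0.014, 0.012, 0.010, 0.008]

for name, series in (("arXiv", arxiv_share), ("WoS", wos_share)):
    slope = ols_slope(years, series)
    print(name, trend_label(slope), "peak:", peak_year(years, series))
```

In this toy input the WoS series peaks earlier and has a negative slope while the arXiv series is still rising, which is the kind of contrast the lead-lag comparison looks for. A real analysis would likely need curve families richer than a straight line.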

