Patterns of interest change in stack overflow

Thank you for visiting nature.com. You are using a browser version with limited support for CSS. To obtain the best experience, we recommend you use a more up-to-date browser (or turn off compatibility mode in Internet Explorer). In the meantime, to ensure continued support, we are displaying the site without styles and JavaScript.
Advertisement
Scientific Reports volume 12, Article number: 11466 (2022) Cite this article
433 Accesses
6 Altmetric
Metrics details
Stack Overflow is currently the largest programming-related question and answer community, containing multiple programming areas. The change of user’s interest is the micro-representation of the intersection of macro-knowledge and has been widely studied in scientific fields, such as literature data sets. However, there is still very little research for the general public, such as the question and answer community. Therefore, we analyze the interest changes of 2,307,720 users in Stack Overflow in this work. Specifically, we classify the tag network in the community and vectorize the topic of questions to quantify the user’s interest change patterns. Results show that the changing pattern of user interest has the characteristic of a power-law distribution, which is different from the exponential distribution of scientists’ interest change, but they are all affected by three features, heterogeneity, recency, and proximity. Furthermore, the relationship between users’ reputations and interest changes is negatively correlated, suggesting the importance of concentration, i.e., those who focus on specific areas are more likely to gain a higher reputation. In general, our work is a supplement to the public interest changes in science, and it can also help community managers better design recommendation algorithms and promote the healthy development of communities.
In recent years, benefiting from the ongoing process of datafication, more and more data are being collected and analyzed to discover human activity patterns. Meanwhile, the constantly developing and cooperation of computer and social science prompt scientists to explore the essential features of these activities, e.g., innovation. Yet, little is known about the underlying strategies of exploring knowledge. Interest drives humans to explore knowledge in different domains, resulting in different exhibit strategies and further affecting future success. Current research in science shows that interest shift patterns are the representations when people decide to adopt what kind of knowledge exploration strategy. For scientists, shifting in interest will affect their productivity and the investment received Furthermore, with the growth of their careers, the probability of scientists switching research fields will increase. When scientists try to shift their interests, different exploration strategies result from the trade-off between stable productivity and creative innovation. Basically, these strategies can be divided into two branches, conservative and radical strategies. Conservative strategy prefers to select existing and more traditional research directions. These directions may help scientists maintain stable productivity. However, when the knowledge exploration strategy is within narrow boundaries, it is unlikely to be the source of the most fruitful idea. On the contrary, the radical strategy is prone to explore those new areas, bringing breakthrough results, praise, and success to scientists. Meanwhile, innovation and novel insights are more likely to be sourced from exploring areas Nevertheless, it is also a risky strategy, often associated with failure, reduced productivity, and the challenge of advancing ideas in the new academic world Despite the internal impact of the researcher’s characteristics, the external academic environment also affects scientists’ interest shift and thus adopts different exploration strategies, such as the investors’ strategie and team collaboration
Persistence is a noble quality for humanity. In science, for those researchers who stick to their research fields, persistence will bring them rewards, for example, the lower probability of interest switching is often causing a higher number of citations. However, recent research shows that shift interest is necessary for success, e.g., those scientists who accumulate ideas in the exploration stage and then concentrate on the focus research field in the development stage are more likely to lead to the “hot streak” emergence. Moreover, there are many other factors that affect interest shift patterns, such as genderobility, reputatio, and mento. Although there are many influencing factors, the macro patterns of interest shifting are regular, especially for scientists. Scientists have a high degree of regularity in their careers, e.g., Matthew effect in contribution random influence in publication. However, there are still questions about whether these patterns exist in the general public? To address this question, we study the interest change patterns of the general public in the Q & community.
Question and Answering (Q &A) websites provide a channel for the general public to seek knowledge. Although they may not be as professional as scientific journals, they are useful and popular with the general public. Stack Exchange is one of the most popular Q &A websites, containing many Q &A communities in different particular domains. In these communities, Stack Overflow is the one for people seeking questions and answers in programming, providing the convenience to developers for catching up with the rapid development of various skills and paradigms In order to attract more users, the Q &A communities introduce the gamification mechanism. This gamification mechanism motivates users to enhance their continuous learning ability and participation, e.g., reputation score, upvotes, downvotes, bounties, and badges. In the community, reputation also implies the authority of users, a high reputation will encourage users to contribute knowledge to the community and regards as a badge of glory. Because of the high contribution made by Stack Overflow to the programming ecology, lots of researchers begin to study the collective intelligence behind the Q &A communities, such as the network in the community, the unanswered questions, low-quality posts, and the quality of answers. This platform also provides us the opportunity to study the interest change patterns of the general public, e.g., will the general public shift their interest to get a higher reputation? And the interest pattern study is also meaningful for the community managers to better adjust the recommendation system to attract more users.
To investigate the interest change patterns of the general public, we explore the interest change patterns in Stack Overflow and analyze the features that affect the patterns, more detailly, the main contributions of our work are as follows:
Firstly, our study quantifies the changes in user interest in Stack Overflow and explores the overall pattern of interest changes.
Secondly, our study finds that changes in user interest are affected by three features: heterogeneity, recency, and proximity. The specific effects of these three features have been explored, and random experiments have been designed to prove them.
Thirdly, we study the relationship between users’ interest and prestige changes and find that users with high prestige have lower interest changes.
The rest of the paper is organized as follows. The “Methods” section introduces the dataset and presents the method to quantify the interest change. Then, the experiments and results are shown in the “Results” section. Finally, the “Discussion” section concludes with our works and future works.
Our work is based on the publicly available dataset in Stack Exchange(https://archive.org/download/stackexchange), the main focus is Stack Overflow Q &A community, and the time frame spans Jul. 2008 to Sep. 2016. As summarized in Table, the dataset provides all the posts, including questions and answers, tags, posting dates, and user reputation. The statistical distributions of users and tags are shown in Supplementary Fig. S1. It can be seen that both distributions are subject to a power-law distribution, which means that most users tend to ask few questions (such as less than 50) and a large number of submitted tags are used only a few times (such as less than 50 times). In order to quantify the pattern of interest change of the individual, there need to be sufficient questions. Therefore, our work focus on the active users who asked more than 50 questions, totaling 31,303 users. Furthermore, tags are the words selected by the users to cover the question’s domain broadly. To make sure the tags represent the technical directions of questions, only tags that occur at least 50 times are focused, totaling 19,978 tags.
Inspired by Jia’s work, in this work, we analyze the sequence of user questions in Stack Overflow and quantitatively show how individuals shift their interest focus over time. To capture the evolution of interest and systematically address the interest patterns of Q &A community users, we calculate each user’s topic vector. Furthermore, the question’s topic is abstracted to the tags. However, the tag is mainly determined by the poster, thus may cause custom labels that have never appeared before, which will result in too many tags. Thus, in order to further condense the topic, we construct the tag network. Specifically, the nodes represent the tags in the tag network, and the tags are connected if they co-occurrence in the same question. The tag network is then divided into communities by the Infomap algorithm, an efficient discovery non-overlapping community algorithm based on information theory. Finally, this tag network is divided into 327 communities and about 100 communities with high attention (Supplementary Fig. S2), i.e., containing lots of tags. The characteristics of the tag network are provided in Supplementary Tab. S1. Each community represents a topic or a main technical direction in Stack Overflow.
When a user submits a question (Q_i), the corresponding tags constitute a tagged tuple, e.g., (A1, B2, C3), where the capital letter indicates the topic to which the tag belongs. Further, the topic tuple can be represented by (A, B, C). Additionally, for a given set of questions submitted by a community user, the topic vector ({textbf {V}} = (t_1, … t_i, … t_N)) represents the user’s interest, ({textbf {V}}in R^N), N is the number of topics in the Stack Overflow. Where (t_i = 0) if the user has not submitted the ith topic, otherwise (t_i = sum _{q=1}^{m}f_{i, Q_q}/m), (f_{i, Q_q}) is the normalized frequency of occurrence of the ith topic in the submitted question (Q_q) and m is the number of questions in subsequence. As an example shown in Fig., taking two consecutive questions as a subsequence, e.g., ((Q_1), (Q_2)) with (m = 2), the tag tuples of the questions are (E1, N5, O10) and (E4, K5, E7) respectively, and the topic tuples are (E, N, O) and (E, K, E) respectively. Thus, the element value of topic E can calculate as ((1/3+2/3)/2=1/2), because topic E appears once in (Q_1) and twice in (Q_2). The detailed definitions of bolded words are provided in Supplementary Note 1.
An example is to calculate interest change (J (m=2)). Firstly, the technical topic of the question is determined according to the question tags. Then, the initial topic vector (V_{b}) and the final topic vector (V_{e}) are generated based on the initial and final m questions. Finally, the interest change J is measured according to the complementary cosine similarity between topic vectors.
The user’s interest may change over time, thus, to quantify this pattern, our study takes the first and last m questions to characterize the interest change. Specifically, as shown in Fig. 1, the beginning topic vector (V_b) and end topic vector (V_e) is calculated through the first and last m questions. Then the interest change can quantify by the complementary cosine similarity as:
Equation () captures the user’s interest change from individual activities in the Q &A community in a topic view. Extremely, if (J=0), the beginning and end m questions share the same topic, which means the user’s interest never changes. Contrarily, if (J=1), the beginning m questions’ topic is different from the last m questions, which means the user’s interest completely changes, in other words, the user no longer participates in the original topic of interest.
The dataset we used for Stack Overflow is publicly available(https://archive.org/download/stackexchange) and cc-by-sa 4.0 license. All methods were carried out in accordance with relevant guidelines and regulations.
To exhibit the overall scenery of the interest changes for the entire community, we plot the distribution of users’ interest changes in Stack Overflow. As shown in Fig, this distribution follows a power-law distribution, which indicates that most Q &A users have little changes in their topic interests, however, there are still users who significantly switch their topic interests, albeit very rarely. Furthermore, it is interesting to find that the distribution in the Q &A community is quite different from the academic i.e., the distribution of research interest in the academic follows an exponential distribution but in Stack Overflow follows a power-law distribution. Compare with the academic field, the proportion of users with large J in Stack Overflow is higher. In order to characterize what affects the pattern of interest change in detail, our study investigates three features: heterogeneity, recency and proximity, and the corresponding experiments demonstrate as follows. Furthermore, in our experiment, (m=15), however, the different values of m (such as (m= 5,10,20)) are also tested, and the power-law distribution characteristics of the J distribution remain unchanged.
Distribution of interest change J. The blue dotted line is the proportion of J. We choose least squares regression to fit the data, and the fitted result is shown with the red line ((P sim J^{-0.608})), P is the proportion of J. The proportion decreases with the increase of J and can be well fitted by the power-law function.
For an individual in the Q &A community, her attention to different topics may not be homogeneous, which means her interest range may contain the core interest subjects coexistence with the few other occasionally touched topics. For example, the mobile phone developer may use JAVA and Android tags and occasionally appears in Windows tag. To verify this, we plot the frequency of topic tuples in Fig… The power-law distribution clearly demonstrates the heterogeneity feature in the individuals’ interest topic. To further explore this feature, we remove the heterogeneity of the topic tuple sequence, i.e., only the topic tuples that appear for the first time are retained, and the remaining recurring topic tuples are replaced with zeros, thus the length of the sequence does not change (Figa), then exhibit the comparison result in Fig. The difference in distribution is quite significant for the original and modified J distribution. The modified J distribution shows a sharply rising trend followed by a slowly falling, eliminating the original data’s power-law decrease. This phenomenon is similar in the academic field, that is, after removing heterogeneity, the proportion of people with small J decreases significantly in the academic field. It implies that heterogeneity plays a role in limiting interest changes in both fields. The difference between the heterogeneity in academic publication and that in Stack Overflow is that the frequency of the number of questions with the same topic tuples submitted by users decreases slower than that of the papers with the same topic tuples published by scientists. Additionally, a jump occurs when (J=1), which is mostly because of our way of removing the heterogeneity. The high repetition between the beginning and end topic tuples causes the smaller end topic vector. Extremely, if all the elements in the end topic tuples have appeared before, then (V_e=0) and (J=1).
Frequency of topic tuple. K is the number of times that the same topic tuple occurs in a question sequence. The blue dotted line is the proportion of K. We choose least squares regression to fit the data, and the fitted result is shown with the red line ((P sim K^{-1.22} mathom {e}^{-0.028 K})), P is the proportion of K. The distribution shows a power-law distribution with an exponential cutoff, which indicates the heterogeneity of interest change.
Heterogeneity of interest change. (a) Removal of heterogeneity. Only the topic tuple that appears for the first time is retained, to ensure that the user’s attention to each different domain is the same in the modified sequence. The removed questions are replaced with zero vectors. (S_O) is the original sequence, and (S_R) is the modified sequence. (b) The distribution of J after excluding heterogeneity. The blue line is the J of the original data, and the orange line is the J of the data after removing heterogeneity.
Recency is the tendency to redo things similar to what has been done recently. To investigate this feature in Stack Overflow, we focus on the distance between the topic tuples, denoting as (Delta {d}), which is defined as the number of different topic tuples between two identical topic tuples. Calculating (Delta {d}) on the entire topic tuples sequence, we can get the (Delta {d}) sequence, as the example shown in Fig. . Then we construct a null model, which reshuffling the original topic tuple sequences of the user. For the question sequence, the length of the sequence is constant, but the order is shuffled (Fig. b). To compare the distribution (P(Delta {d})) with the reshuffled distribution (P_{0}(Delta {d})), we plot the distribution of ratio (P(Delta {d})/P_{0}(Delta {d})) as a function of (Delta {d}), as shown in Fig.. It is found that the ratio declines as (Delta {d}) increases, which implies the Q &A community users tend to submit questions in the same domain as they have recently submitted, and rarely return to their original interest after turning to a new interest, prompting users to explore the new domains continually. Furthermore, the reshuffled model eliminates the power-law decrease observed in the original distribution and behaviors a steeper decrease with an exponential distribution from the view of interest change, as shown in Find. The significant changes in the interest change distribution verify the recency feature does exist in the Q &A communities when users explore their interest. Compared to the academic field the trend of observed J distribution after excluding recency is similar in the small J range, the proportion of people with smaller J (near 0.2) is larger than the original distribution. This phenomenon implies that recency plays a similar role in increasing the proportion of J in both fields. However, excluding recency in Stack Overflow prompts the observed distribution from a power-law distribution to an exponential distribution, while in the academic publication, the distribution maintains exponential but decays steeper. As the form of distribution changes from power-law to exponential in Stack Overflow, the proportion of users with extremely large J decreases more significantly than in academic publications. Excluding recency changes the distribution of J from power-law to exponential in Stack Overflow but not in academic publications, which implies that recency affects the users more than scientists. To further illustrate the role of recency, we compare the proportion of the first m topic tuples repeated in the last m topic tuples before and after removing the recency of the sequence. The result shows that on average 17.74% of the topic tuples in the original data are repeated, and this proportion rises to 31.57% after removing the recency, indicating that the recency hinders people from returning to the older direction of interest.
The recency of interest change. (a) An example to get the (Delta {d}) sequence. The number in the sequence represents the interval between the questions with the same topic tuple, denoted as 0 if a topic tuple is the first occurrence. (b) Removal of recency. For the reshuffled sequence, randomly change the order of each question. (S_O) is the original sequence, and (S_R) is the modified sequence. (c) The ratio between the distribution of (Delta {d}) of real data (P(Delta {d})) and that of the reshuffled sequence (P_{0}(Delta {d})). The ratio implies that questions raised recently have had a greater impact on users than those raised long ago. (d) The distribution of J after excluding recency. The blue line is the J of the original data, and the orange line is the J of data after removing recency.
Unlike the recency feature describing the user’s interest pattern from a time perspective, the proximity feature studies the pattern from the topic’s geographic view. In the Q &A community, the proximity feature is reflected in the situation when users want to explore a new interest domain, the domain they chose is more similar to their current interest domain than a new field. To verify this, we focus on the proximity distance with the definition of interest change holds. Specially, we replace each distinct topic tuple by randomly choosing a topic tuple in the topic tuple pool which stores all topic tuples in the data, and keeps the length of the sequence not changed. It should be noted that, in the randomized sequence, the number of each topic tuple and the order the topic tuple is used are retained. For example, as shown in Fig. 6a, the original topic tuples sequence (S_O) is “(a, b, c), (a, b, c), (a, b, d), (a, b, c), …”, we replace (a, b, c) with (a, I, f) and (a, b, d) with (a, b, h), respectively. Where the topic tuples (a, I, f) and (a, b, h) are randomly chosen from the pool, which stores all topic tuples in the data. Finally, the modified sequence (S_R) is “(a, i, f), (a, i, f), (a, b, h), (a, i, f), …”, whose relative position of topic tuples has not changed. In this way, the modified sequence simulates that when the user changes their interest field, the new field has no relationship to the current field. The obtained distribution shows that excluding the proximity feature simultaneously reduces the proportion of users with small J ((J < 0.3)) and large J ((J > 0.7)), which fits the Normal distribution (mythical {N}(mu, sigma ^{2})) well (the value of chi-square is 0.0076, which is quite small), as shown in Fig.b, where (mu ) is mean and (sigma ^{2}) is the variance. The phenomenon is different in the academic publication, after excluding the proximity, only the proportion of scientists with small J decreased. The decreases in the proportion of users in Stack Overflow with small J and that of scientists with small J imply that proximity is one of the reasons for their interests to change slightly. However, proximity has different effects on different users in Stack Overflow. The decrease in the proportion of users with large J implies that the effect of proximity is also one of the reasons for their interests change vastly, they will explore fields that are less relevant to the initial field after being affected by proximity. In summary, the proportion both of small and large J in Stack Overflow is reduced after excluding proximity, which implies that proximity has the effect of limiting or promoting interest change, and the effect is different for different users.
The proximity of interest changes. (a) Removal of proximity. Replace each distinct topic tuple by randomly choosing a topic tuple in the topic tuple pool. The number and order of each topic tuple’s occurrences are unchanged. (S_O) is the original sequence, and (S_R) is the modified sequence. (b) The distribution of J after excluding proximity. The blue line is the J of the original data, and the orange line is the J of the data after removing proximity.
Scientists pay great attention to their research quality and impact, they collaborate and earn reputations in academia. Interestingly, reputation also prompted the Q &A community users to be more active in the community, e.g., submitting high-quality questions and answers quickly. These phenomena trigger us to explore the relationship between reputation and user behavior in exploring interest. To do this, we first check users’ average short-term interest change quarterly. Specifically, denoting (J_s) to quantify the short-term interest change of users. The calculation of (J_s) is similar to J, but topic tuples in two consecutive quarter-time windows are used instead of the beginning and end topic tuples. The short-term interest change refers to the interest change between the questions in the adjacent two quarters. When calculating (J_s), we use the quarter as the time window instead of m questions and calculate the topic vector, then calculate interest changes of adjacent quarters in the sequence as shown in Fig. . In order to calculate average short-term interest change (angle J_s rangle ) in the ith quarter, we calculate all users’ (J_s) in adjacent I and (i+1) quarter, and normalized them with the number of users. The time users post their first question is chosen as the start point of the quarter-time window for each user. Figure depicts the evolution of (angle J_s range ) over time, where the time window is selected as a quarter. The observed increasing trend indicates that users are accustomed to continuously switching interests. Scientists switch research fields for productivity, but it will negatively affect their influence Inspired by this phenomenon, we raise the question of how would users’ changing interests affect their reputation? To address this question, we study the relationship between reputation and J for active and inactive users, as shown in Fig. The active users are selected if a user raises questions every month from the beginning to the end during the whole career, conversely, the user who has not asked a question for a month is considered as an inactive user. The figure shows that the interest change J negatively correlated with user reputation, whether active or inactive users. Furthermore, the reputation of active users is always higher than inactive users when interest change J is small. One plausible explanation is that exploring new domains is a risky strategy, not all explorations are fruitful. Continuous switching of interests may make individuals impossible to develop knowledge and capabilities in the focal domain. Furthermore, the reputation also will be attributed to users who continuously contribute to the same domain. This pattern underlines the importance of concentration and may not be a particular case for the general public when exploring their interests. Similar patterns are observed among the scientists, e.g., Ref finds that the scientists with high citations have a lower probability to change their research direction in their career periods.
The difference between questions selected when calculating J and (J_s). To calculate (J_s), we use quarter window instead of m questions, e.g, in the first quarter, it contains three topic tuples, i.e., (a,b,c), (a,b,c) and (a,b,d), in the adjacent second quarter, it contains two topic tuples, i.e., (a,b,c) and (a,b,f).
Relationship between time and (angle J_s range ). The average short-term interest change (angle J_s range ) is linearly correlated with the time that users spend in the community.
Relationship between user reputation and interest change. The blue dot represents the average reputation of active users with similar J values (angle R_a range ). The orange dot represents the average reputation of inactive users with similar J values (angle R_i range ). The solid lines are the fitted lines of reputation and J. The blue line is the fitted line of (angle R_a range ), and the orange line is the fitted line of (angle R_i range ). The shaded areas are the confidence interval (0.95). The coefficient of determination (R^2) of active and inactive users is 0.594 and 0.261, respectively. In order to exclude the influence of career length on reputation, we only selected users whose career length is 40–60 quarters.
In summary, our work studies the Q &A community user’s interest change patterns. Interestingly, our findings show that the user’s interest change follows a power-law distribution, which is entirely different from the research interest change distribution of the scientists (exponential distribution), indicating that users in the community are more inclined to exploration strategy. Compared to scientists, due to scientists’ characteristics, i.e., the long-term accumulation of discipline knowledge, scientists are more inclined to explore in the previous research stage and then concentrate on their current topics. Despite this, the relationship between user interest change and reputation indicates that if users want to get a higher reputation in the Stack Overflow community, concentrating on the topic is still necessary. This phenomenon also highlights the difference between the general public and scientists in exploring knowledge strategies. Moreover, the user’s interest may shift to a new domain that is entirely different from the original over time, suggests that the community managers could consider the characteristics of user interest change when designing recommendation systems, e.g., pay more attention to the user’s current interests than consider all historical interest.
Furthermore, we study the three important features that significantly infer the observed distribution of interest change: heterogeneity, recency, and proximity. The heterogeneity makes users’ exploratory behavior more conservative in the Q &A community, while the recency feature has the opposite effect, it makes users explore new domains and result in a broader variety of interest change. The proximity feature prevents the interest change of users from presenting a Gaussian distribution. It increases the proportion of users with extreme interest change, e.g., the small-scale and large-scale interest change, which may be a reason for the power-law distribution of interest change. Moreover, the literature on research interest patterns of scientists also supports these trends of exploring knowledge. Furthermore, in this work, we only focus on the interest sequence and ignore the timescale, which is another important feature. In future work, we will consider the timescale and investigate the explosive interest emerging in a short time. Additionally, in this work, we only consider the most straightforward community algorithm, however, the division result of the tag network may be influenced by the hypernym-hyponym relations. Thus, in the future, to make our division results more accurate, we will consider the hypernym-hyponym relationship in the division algorithm.
In general, our results provide a supplement to human interest research, showing how these features affect the patterns of interest in the Q & communities and demonstrate the difference between the general public and scientific researchers in exploring knowledge. The current results would allow further expansion to uncover other interest behaviors in other communities as well as the relationships with different contribution types.
Iribarren, J. L. & Moro, E. Impact of human activity patterns on the dynamics of information diffusion. Phys. Rev. Lett. 103, 038702 (2009).
ADS  Article  Google Scholar
Kwan, M.-P. & Lee, J. Geovisualization of human activity patterns using 3d gis: A time-geographic approach. Spatial. Integer. Soc. Sci. 27, 721–744 (2004).
Google Scholar
Hasan, S., Zhan, X. & Ukkusuri, S. V. Understanding urban human activity and mobility patterns using large-scale location-based data from online social media. In Proceedings of the 2nd ACM SIGKDD international workshop on urban computing, pp. 1–8 (2013).
Fu, C. et al. A novel spatiotemporal behavior-enabled random walk strategy on online social platforms. IEEE Trans. Comput. Soc. Syst. 1–11 (2021).
Clauset, A., Larremore, D. B. & Sinatra, R. Data-driven predictions in the science of science. Science 355, 477–480 (2017).
ADS  CAS  Article  Google Scholar
Fortunato, S., Bergstrom, C. T., Börner, K. et al. Science of science. Science 359, eaao0185 (2018).
Toivonen, T. et al. Social media data for conservation science: A methodological overview. Biol. Conserv 233, 298–315 (2019).
Article  Google Scholar
Schmidt, A. L. et al. Anatomy of news consumption on Facebook. Proc. Natl. Acad. Sci. 114, 3035–3039 (2017).
ADS  CAS  Article  Google Scholar
Bromham, L., Dinnage, R. & Hua, X. Interdisciplinary research has consistently lower funding success. Nature 534, 684–687 (2016).
ADS  CAS  Article  Google Scholar
Zeng, A. et al. Increasing trend of scientists to switch between topics. Nat. Commun. 10, 1–11 (2019).
ADS  Article  Google Scholar
Bourdieu, P. The specificity of the scientific field and the social conditions of the progress of reason. Soc. Sci. Inf. 14, 19–47 (1975).
Article  Google Scholar
Foster, J. G., Rzhetsky, A. & Evans, J. A. Tradition and innovation in scientists’ research strategies. Am. Social. Rev. 80, 875–908 (2015).
Article  Google Scholar
Uzzi, B., Mukherjee, S., Stringer, M. & Jones, B. Atypical combinations and scientific impact. Science 342, 468–472 (2013).
ADS  CAS  Article  Google Scholar
Rzhetsky, A., Foster, J. G., Foster, I. T. & Evans, J. A. Choosing experiments to accelerate collective discovery. Proc. Natl. Acad. Sci. 112, 14569–14574 (2015).
ADS  CAS  Article  Google Scholar
Azoulay, P., Graff Zivin, J. S. & Manso, G. Incentives and creativity: Evidence from the academic life sciences. The RAND J. Econ. 42, 527–554 (2011).
Merton, R. K. Priorities in scientific discovery: A chapter in the sociology of science. Am. sociological review 22, 635–659 (1957).
Article  Google Scholar
Shape, D. The structure of scientific revolutions. Philos. Rev. 73, 383–394 (1964).
Article  Google Scholar
Wuchty, S., Jones, B. F. & Uzzi, B. The increasing dominance of teams in the production of knowledge. Science 316, 1036–1039 (2007).
ADS  CAS  Article  Google Scholar
Wu, L., Wang, D. & Evans, J. A. Large teams develop and small teams disrupt science and technology. Nature 566, 378–382 (2019).
ADS  CAS  Article  Google Scholar
Li, W., Aste, T., Cacciola, F. & Livan, G. Early coauthorship with top scientists predicts success in academic careers. Nat. Commun. 10, 1–9 (2019).
Article  Google Scholar
Liu, L., Dehmamy, N., Chown, J., Giles, C. L. & Wang, D. Understanding the onset of hot streaks across artistic, cultural, and scientific careers. arXiv preprint arXiv:2103.01256 (2021).
Larivière, V. et al. Bibliometrics: Global gender disparities in science. Nat. News 504, 211–221 (2013).
Article  Google Scholar
Ley, T. J. & Hamilton, B. H. The gender gap in NIH grant applications. Science 322, 1472–1474 (2008).
CAS  Article  Google Scholar
Franzoni, C., Scellato, G. & Stephan, P. The mover’s advantage: The superior performance of migrant scientists. Econ. Lett. 122, 89–93 (2014).
Article  Google Scholar
Sugimoto, C. R. et al. Scientists have the most impact when they’re free to move. Nat. News 550, 29–31 (2017).
CAS  Article  Google Scholar
Deville, P. et al. Career on the move: Geography, stratification, and scientific impact. Sci. Rep. 4, 4770 (2014).
CAS  Article  Google Scholar
Petersen, A. M., Fortunato, S., Pan, R. K. & other. Reputation and impact in academic careers. Proc. Natl. Acad. Sci 111, 15316–15321 (2014).
Liénard, J. F., Achakulvisut, T., Acuna, D. E. & David, S. V. Intellectual synthesis in mentorship determines success in academic careers. Nat. Commun. 9, 1–13 (2018).
Article  Google Scholar
Jia, T., Wang, D. & Szymanski, B. K. Quantifying patterns of research-interest evolution. Nat. Hum. Behav. 1, 1–7 (2017).
Article  Google Scholar
Merton, R. K. The Matthew effect in science: The reward and communication systems of science are considered. Science 159, 56–63 (1968).
ADS  CAS  Article  Google Scholar
Sinatra, R., Wang, D., Deville, P. et al. Quantifying the evolution of individual scientific impact. Science 354, aaf5239 (2016).
Liu, L. et al. Hot streaks in artistic, cultural, and scientific careers. Nature 559, 396–399 (2018).
ADS  CAS  Article  Google Scholar
Barua, A., Thomas, S. W. & Hassan, A. E. What are developers talking about? an analysis of topics and trends in a stack overflow. Empir. Softw. Eng. 19, 619–654 (2014).
Article  Google Scholar
Fu, C., Zheng, Y., Li, S., Xuan, Q. & Ruan, Z. Predicting the popularity of tags in StackExchange qa communities. In 2017 International Workshop on Complex Systems and Networks (IWCSN), 90–95 (IEEE, 2017).
Fu, C., Zheng, Y., Liu, Y., Xuan, Q. & Chen, G. Nes-to: Network embedding similarity-based transfer learning. IEEE Trans. Netw. Sci. Eng. 7, 1607–1618 (2019).
MathSciNet  Article  Google Scholar
Wang, X., Ran, Y. & Jia, T. Measuring similarity in co-occurrence data using ego-networks. Chaos: Interdiscip. J. Nonlinear Sci. 30, 013101 (2020).
Papoutsoglou, M., Kapitsaki, G. M. & Angelis, L. Modeling the effect of the badges gamification mechanism on personality traits of stack overflow users. Simul. Model. Pract. Theory 105, 102157 (2020).
Article  Google Scholar
Zhou, J., Wang, S., Bezemer, C.-P. & Hassan, A. E. Bounties on technical q & sites: A case study of stack overflow bounties. Empir. Softw. Eng. 25, 139–177 (2020).
Article  Google Scholar
Seaborn, K. & Fels, D. I. Gamification in theory and action: A survey. Int. J. Hum.-Comput. Stud. 74, 14–31 (2015).
Article  Google Scholar
Jin, J., Li, Y., Zhong, X. & Zhai, L. Why users contribute knowledge to online communities: An empirical study of an online social q &a community. Inf. Manag. 52, 840–849 (2015).
Article  Google Scholar
Gyongyi, Z., Koutrika, G., Pedersen, J. & Garcia-Molina, H. Questioning yahoo! answers (Tech. Rep, Stanford InfoLab, 2007).
Google Scholar
Asaduzzaman, M., Mashiyat, A. S., Roy, C. K. & Schneider, K. A. Answering questions about unanswered questions of stack overflow. In 2013 10th Working Conference on Mining Software Repositories (MSR), 97–100 (IEEE, 2013).
Ponzanelli, L., Mocci, A., Bacchelli, A., Lanza, M. & Fullerton, D. Improving low-quality stack overflow post detection. In 2014 IEEE international conference on software maintenance and evolution, 541–544 (IEEE, 2014).
Shah, C. & Pomerantz, J. Evaluating and predicting answer quality in community QA. In Proceedings of the 33rd international ACM SIGIR conference on Research and development in information retrieval, 411–418 (2010).
Rosvall, M. & Bergstrom, C. T. Maps of random walks on complex networks reveal community structure. Proc. Natl. Acad. Sci. 105, 1118–1123 (2008).
ADS  CAS  Article  Google Scholar
Oettl, A. Honour the help. Nature 489, 496–497 (2012).
ADS  CAS  Article  Google Scholar
Dong, Y., Ma, H., Shen, Z. & Wang, K. A century of science: Globalization of scientific collaborations, citations, and innovations. In Proceedings of the 23rd ACM SIGKDD international conference on knowledge discovery and data mining, 1437–1446 (2017).
Wang, Y. Understanding the reputation differences between women and men on stack overflow. In 2018 25th Asia-Pacific Software Engineering Conference (APSEC), 436–444 (IEEE, 2018).
Calefato, F., Lanubile, F., Marasciulo, M. C. & Novielli, N. Mining successful answers in a stack overflow. In 2015 IEEE/ACM 12th Working Conference on Mining Software Repositories, 430–433 (IEEE, 2015).
Bosu, A. et al. Building reputation in StackOverflow: an empirical investigation. In 2013 10th Working Conference on Mining Software Repositories (MSR), 89–92 (IEEE, 2013).
Tan, Y., Wang, X. & Jia, T. From syntactic structure to semantic relationship: Hypernym extraction from definitions by recurrent neural networks using the part of speech information. In International Semantic Web Conference, 529–546 (Springer, 2020).
Download references
This work was supported by the National Natural Science Foundation of China under Grant (62103374) and Zhejiang Fundamental Public Welfare Research Project under Grant (LGF21G010003 and LGF20F020016).
Institute of Cyberspace Security, Zhejiang University of Technology, Hangzhou, 310023, China
Chenbo Fu, Xinchen Yue, Bin Shen & Shanqing Yu
College of Information Engineering, Zhejiang University of Technology, Hangzhou, 310023, China
Chenbo Fu, Xinchen Yue, Bin Shen & Shanqing Yu
Computational Communication Research Center, Beijing Normal University, Zhuhai, 519087, China
Yong Min
School of Journalism and Communication, Beijing Normal University, Beijing, 100875, China
Yong Min
You can also search for this author in PubMed Google Scholar
You can also search for this author in PubMed Google Scholar
You can also search for this author in PubMed Google Scholar
You can also search for this author in PubMed Google Scholar
You can also search for this author in PubMed Google Scholar
C.F. and X.Y. wrote the main manuscript text. C.F., S.Y., and Y.M. conceived the experiments. C.F., B.S., and X.Y. conducted the experiments. All authors analyzed the results and reviewed the manuscript.
Correspondence to Chenbo Fu.
The authors declare no competing interests.
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution, and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made. The images or other third party material in this article are included in the article’s Creative Commons license unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons license and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this license, visit http://creativecommons.org/licenses/by/4.0/.
Reprints and Permissions
Fu, C., Yue, X., Shen, B. et al. Patterns of interest change in a stack overflow. Sci Rep 12, 11466 (2022). https://doi.org/10.1038/s41598-022-15724-3
Download citation
Received: 12 December 2021
Accepted: 28 June 2022
Published: 06 July 2022
DOI: https://doi.org/10.1038/s41598-022-15724-3
Anyone you share the following link with will be able to read this content:
Sorry, a shareable link is not currently available for this article.

Provided by the Springer Nature SharedIt content-sharing initiative
By submitting a comment you agree to abide by our Terms and Community Guidelines. If you find something abusive or that does not comply with our terms or guidelines please flag it as inappropriate.
Advertisement
Advanced search
Scientific Reports (Sci Rep) ISSN 2045-2322 (online)
© 2022 Springer Nature Limited
Sign up for the Nature Briefing newsletter — what matters in science, free to your inbox daily.

Leave a Reply

Your email address will not be published.