With the rapid growth in the number of social media users and the ever-increasing volume of data exchanged among them, the era of big data has arrived. Integrating big data yields substantial benefits, which has made it a research hotspot. However, big data drawn from multiple sources is heterogeneous, and this multi-source heterogeneity constrains big data integration. Moreover, the growing volume of social media data reduces the efficiency of data integration. This study is concerned with developing a novel framework for a data integration system that can manage the heterogeneity of massive social media data. The framework comprises four layers: a data source layer, an application layer, a resource layer, and a visualization layer. It establishes correlations between data stored in distributed data sources.