The development of information and communication technologies has produced massive human sensing data sets, such as points of interest, mobile phone data, and social media data. These data sets offer an alternative, human-centered perception of urban spaces and have therefore become effective supplements for remote sensing tasks. This letter presents an exploratory framework for examining the scale effect of fusing remote sensing and human sensing. Physical and social semantics are extracted from raw remote sensing images and human sensing data, respectively. A dynamic weighting strategy is developed to explore the fusion of the two sources. Using urban function inference as an example, we evaluate the scale effect by varying the weights assigned to remote sensing and human sensing. The experiment demonstrates that fusing remote sensing and human sensing enables us to recognize multiple types of urban functions. The results are also significantly affected by the scale.
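The weighting idea can be illustrated with a minimal sketch. The abstract does not specify the form of the weighting strategy, so the code below assumes a simple convex combination of two class-probability vectors, one derived from remote sensing and one from human sensing, for a single spatial unit. The function names (`fuse`, `infer_function`, `sweep_weights`), the three class labels, and the probability values are all hypothetical and chosen only for illustration.

```python
def fuse(p_rs, p_hs, w):
    """Convex combination of two class-probability vectors.

    p_rs: class probabilities from remote sensing (assumed input)
    p_hs: class probabilities from human sensing (assumed input)
    w:    weight on remote sensing, in [0, 1]; human sensing gets 1 - w
    """
    if not 0.0 <= w <= 1.0:
        raise ValueError("w must lie in [0, 1]")
    if len(p_rs) != len(p_hs):
        raise ValueError("probability vectors must have the same length")
    return [w * a + (1.0 - w) * b for a, b in zip(p_rs, p_hs)]


def infer_function(p_rs, p_hs, w, labels):
    """Return the label with the highest fused probability."""
    fused = fuse(p_rs, p_hs, w)
    best = max(range(len(fused)), key=fused.__getitem__)
    return labels[best]


def sweep_weights(p_rs, p_hs, labels, steps=10):
    """Infer the label for evenly spaced weights from 0 to 1."""
    return [
        (i / steps, infer_function(p_rs, p_hs, i / steps, labels))
        for i in range(steps + 1)
    ]


# Hypothetical example: one spatial unit, three urban function classes.
labels = ["residential", "commercial", "industrial"]
p_rs = [0.6, 0.3, 0.1]  # illustrative values, not results from the letter
p_hs = [0.2, 0.7, 0.1]  # illustrative values, not results from the letter

for w, label in sweep_weights(p_rs, p_hs, labels, steps=4):
    print(f"w={w:.2f} -> {label}")
```

Under these assumptions, repeating the sweep on spatial units of different sizes and comparing which weight performs best at each size would be one way to probe the scale effect the letter describes.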