Abstract

The most common mass transit modes in metropolitan cities include buses, subways, and taxicabs, each of which contributes to a complex, interconnected network that delivers urban dwellers to their destinations. Understanding the intertwined usage of these three transit modes at different places and times allows for better sensing of urban mobility and the built environment. In this article, we leverage a comprehensive dataset of bus, metro, and taxicab ridership from Shenzhen, China to unveil the spatio-temporal interplay between different mass transit modes. To achieve this goal, we develop a novel spectral clustering framework that imposes spatio-temporal similarities between mass transit mode usage in urban space and differentiates urban spaces associated with distinct ridership patterns of mass transit modes. Five resulting categories of urban spaces are identified and interpreted with auxiliary knowledge of the city’s metro network and land-use functionality. In general, the categories of urban spaces are associated with different accessibility levels (such as high-, medium-, and low-ranked) and different urban functionalities (such as residential, commercial, leisure-dominant, and home–work balanced). The results indicate that different mass transit modes cooperate or compete depending on the demographic and socioeconomic attributes of the underlying urban environments. Our proposed analytical framework provides a novel and effective way to explore the mass transit system and the functional heterogeneity in cities. It shows great potential for assisting policymakers and municipal managers in optimizing the allocation of public transportation facilities and the city-wide distribution of daily commuting.