Program evaluations often implicitly aim to be informative about the effects an intervention would have if it were rolled out in the real world, independent of an impact evaluation. This chapter highlights generalizability and scalability as two characteristics of evidence that are relevant for these “real-world” applications of interventions, and it provides concrete guidance to help researchers improve both. Generalizability can be maximized by clearly specifying, as part of the study design, the real-world scenario researchers hope to learn about, and then randomly sampling participants, program providers, and program sites from their real-world counterparts. Assessing scalability is more subtle, but we describe two experimental designs that make progress on it. We conclude by describing two non-experimental research designs that can be used to directly evaluate programs that are implemented independent of an impact evaluation.