ABSTRACT
This chapter recasts research design and data collection for AI-rich settings. It covers conversational agents and web crawlers for responsive, multi-language surveys; video and audio observation enabled by computer vision and automatic speech recognition (ASR) for fine-grained behavioral capture; and automated preprocessing (outlier detection, denoising, and synthetic data augmentation) to strengthen analysis. We distinguish when synthetic data, agent-based simulations, and “virtual pilots” are valid surrogates and provide checklists for provenance, versioning, and FAIR (findable, accessible, interoperable, and reusable) documentation. Validity threats (coverage bias, platform skew, and annotation error) and ethical constraints (privacy and consent in digital spaces) are paired with mitigation tactics: reweighting, secure enclaves, and differential privacy. The goal is a design mindset that blends automation with methodological rigor and ethical compliance.
