Computational methods for detecting textual similarity provide powerful tools for exploring linguistic patterns, formulaic language, and textual transmission in historical corpora. However, in
Ancient Greek studies, these approaches have mostly been tested on small datasets or employed
in targeted search tasks. In this paper, we investigate what insights emerge when similarity measures are applied across a large and diverse corpus of Greek texts spanning multiple periods and
genres. Our approach is fully unsupervised and does not rely on prior assumptions or predefined
queries. We use two well-established techniques: MinHash with locality-sensitive
hashing (LSH) to identify repeated orthographic patterns, and transformer-based embeddings to
capture semantic relationships across texts. We first apply both methods to the Database of
Byzantine Book Epigrams (DBBE), a curated dataset with verse- and epigram-level similarity
clusters. Its relatively compact and structured nature makes it an ideal testbed for probing the
behavior of the similarity algorithms. We then scale up to a larger, more heterogeneous corpus
of Greek texts spanning roughly 400 BC to 1500 AD. Applying MinHash-LSH reveals repeated
formulae across textual traditions, while clustering transformer-based embeddings uncovers conceptual and thematic relationships, highlighting recurring motifs and ideas despite orthographic
variation. Our findings demonstrate how unsupervised methods suited to high-volume data can
uncover structures and relationships that targeted, query-based studies may overlook.
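
To make the MinHash-LSH step concrete, the following is a minimal pure-Python sketch of the general technique, not the pipeline used in this paper. The character shingle size (k=3), the number of hash functions (64), the band count (16), the use of seeded BLAKE2b hashes in place of random permutations, and the three toy verses are all assumptions chosen for illustration.

```python
import hashlib
from collections import defaultdict
from itertools import combinations


def shingles(text, k=3):
    """Return the set of overlapping character k-grams of a normalized text."""
    text = " ".join(text.lower().split())  # lowercase, collapse whitespace
    return {text[i:i + k] for i in range(max(len(text) - k + 1, 1))}


def _hash(token, seed):
    """Deterministic 64-bit hash of a token under a given seed."""
    data = f"{seed}:{token}".encode("utf-8")
    return int.from_bytes(hashlib.blake2b(data, digest_size=8).digest(), "big")


def minhash_signature(tokens, num_perm=64):
    """MinHash signature: the minimum hash value per seeded hash function."""
    return tuple(min(_hash(t, seed) for t in tokens) for seed in range(num_perm))


def estimate_jaccard(sig_a, sig_b):
    """Fraction of matching signature positions estimates Jaccard similarity."""
    return sum(a == b for a, b in zip(sig_a, sig_b)) / len(sig_a)


def lsh_candidate_pairs(signatures, bands=16):
    """Band the signatures; texts sharing any full band become candidates."""
    if not signatures:
        return set()
    rows = len(next(iter(signatures.values()))) // bands
    if rows == 0:
        raise ValueError("bands must not exceed the signature length")
    pairs = set()
    for b in range(bands):
        buckets = defaultdict(list)
        for doc_id, sig in signatures.items():
            buckets[sig[b * rows:(b + 1) * rows]].append(doc_id)
        for ids in buckets.values():
            pairs.update(combinations(sorted(ids), 2))
    return pairs


# Toy input (illustrative only): two orthographic variants and an unrelated line.
verses = {
    "v1": "ὥσπερ ξένοι χαίρουσιν ἰδεῖν πατρίδα",
    "v2": "ὥσπερ ξένοι χαίρουσι ἰδεῖν πατρίδα",
    "v3": "μῆνιν ἄειδε θεὰ Πηληϊάδεω Ἀχιλῆος",
}
signatures = {vid: minhash_signature(shingles(v)) for vid, v in verses.items()}
candidates = lsh_candidate_pairs(signatures)
```

Because only texts that share a band are compared, the number of pairwise checks stays manageable as the corpus grows, which is what makes this family of methods suitable for high-volume data. Candidate pairs would normally be verified afterwards with `estimate_jaccard` or an exact comparison.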