Just the Right Touch: How Synology Photos Stacks Your Similar Photos, Letting Algorithms Help Organize Your Collection
Nicola Hung
July 25, 2025


To capture your child’s brightest smile, you might press the shutter ten times in a row. To achieve the perfect sunset tone, you may save five filtered versions. While modern photography allows us to record beautiful moments, it also makes our photo albums cluttered and hard to manage.

This scenario is familiar to us all. But an interesting question follows: when we talk about “organizing similar photos,” what exactly defines “similar”?

Is it a series of sunset shots from the same scene with slightly different sun positions? Is it the same composition with one person added or removed? Or is it multiple versions of the same photo after being cropped and edited with different software? In different contexts, everyone’s judgment criteria may vary significantly.

So how can a fixed algorithm respond to such flexible, even somewhat subjective human organizing logic? Today, we’ll take you behind the scenes to examine the trade-offs between technology and user experience that we encountered when developing the “Stack Similar Items” feature in Synology Photos.

Trade-off One: How to Calibrate the Algorithm’s Standard for “Similarity”?

At the beginning of development, the first philosophical question our team faced was: what should be the relationship between algorithm-defined “similarity” and “similarity” in human eyes?

We discovered that human judgment is extremely flexible. When organizing photos from a child’s birthday party, we might consider several sequential shots with slightly different expressions as “similar.” But when selecting a landscape photo to print, even minor differences in cloud positions might be enough for us to view them as two separate works.

An algorithm cannot fully replicate this complex human mindset. It needs an objective, consistent standard. Therefore, before teaching the computer how to “understand” similar photos, we must first teach it how to “grasp the essentials.”

We introduced the PDQ (Perceptual Hash) algorithm. This process is remarkably clever—it automatically ignores unimportant “surface details” in photos, such as whether you saved the image as PNG or JPG, whether the resolution is high or low, or if there’s some digital noise. The algorithm focuses only on capturing the core outlines, lines, and light-dark distribution of the entire photo, quickly creating a unique “soul sketch” for each of your photos. This “algorithmic artist’s” sketching process generally consists of several steps:

  1. Focus on the Essence: First, the algorithm converts color photos to grayscale. This is like an artist ignoring colors when drafting, focusing instead on capturing the subject’s outline, lights, and shadows. This step ensures we focus on the visual structures that the human eye is most sensitive to, rather than being distracted by variable colors.

  2. Unify Perspective: Next, regardless of whether your photo is a portrait shot on a phone or a landscape from a camera, the algorithm scales them all to a fixed small size. This is like an artist placing all reference photos on the same canvas for comparison, ensuring that the basis for comparison is consistent and won’t be misjudged due to differences in original dimensions. The challenge here is finding the “optimal resolution.” If scaled too small, although calculation speed is extremely fast, too many image details will be lost, causing clearly different photos to be misidentified as similar. Therefore, we adopted a relatively larger standard size that both retains enough key features and maintains efficient computation, achieving the best balance between accuracy and performance.

  3. Deconstruction and Extraction: This is the most critical step. The algorithm uses a mathematical tool called “Discrete Cosine Transform (DCT)” to break down the image into the most basic “visual textures” and “layout patterns.” You can imagine it as an artist breaking down a person’s facial features into basic geometric shapes and analyzing their relative positions. This process effectively extracts the most important visual features of the image.

  4. Generate Fingerprints and Quality Scores: Finally, the algorithm generates a code consisting of 256 zeros and ones based on these extracted core features. This is the unique “soul sketch” of this photo—its visual fingerprint. More interestingly, the PDQ algorithm also gives this photo a quality score, evaluating its clarity. It’s like an artist noting after completing a sketch: “This reference photo is a bit blurry.” This score helps us determine if this “sketch” is clear enough for meaningful comparison.
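For readers who enjoy seeing the idea in code, here is a minimal, illustrative Python sketch of a DCT-based perceptual hash following the four steps above. This is not Synology's implementation, and it simplifies real PDQ considerably (PDQ works from a larger downscaled luminance image and also computes the quality score described in step 4); it only shows the shape of the pipeline: downscale, transform, threshold into 256 bits.

```python
import math

def dct_2d(block):
    """Naive 2-D DCT-II of a square matrix (list of lists of floats)."""
    n = len(block)
    out = [[0.0] * n for _ in range(n)]
    for u in range(n):
        for v in range(n):
            s = 0.0
            for x in range(n):
                for y in range(n):
                    s += (block[x][y]
                          * math.cos((2 * x + 1) * u * math.pi / (2 * n))
                          * math.cos((2 * y + 1) * v * math.pi / (2 * n)))
            out[u][v] = s
    return out

def perceptual_hash(gray, size=16):
    """Illustrative PDQ-style fingerprint of a grayscale image
    (a 2-D list of 0-255 values): downscale, DCT, then threshold
    the coefficients into a 256-character string of zeros and ones."""
    h, w = len(gray), len(gray[0])
    # Step 2: unify perspective -- block-average down to size x size
    small = [[0.0] * size for _ in range(size)]
    for i in range(size):
        for j in range(size):
            r0, r1 = i * h // size, max(i * h // size + 1, (i + 1) * h // size)
            c0, c1 = j * w // size, max(j * w // size + 1, (j + 1) * w // size)
            vals = [gray[r][c] for r in range(r0, r1) for c in range(c0, c1)]
            small[i][j] = sum(vals) / len(vals)
    # Step 3: deconstruct into frequency-domain "visual textures"
    coeffs = dct_2d(small)
    # Step 4: threshold each coefficient against the median -> 256 bits
    flat = [coeffs[u][v] for u in range(size) for v in range(size)]
    median = sorted(flat)[len(flat) // 2]
    return "".join("1" if c > median else "0" for c in flat)
```

Because the bits encode coarse structure rather than raw pixels, re-saving the same image in a different format or at a different resolution should produce an identical or nearly identical fingerprint.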

With the “soul sketch” as a basis for judgment, the next key question is: how “similar” must two sketches be to consider them part of the same group? This is a threshold setting issue that has been the core of countless debates and tests within our team.

Technically, we use a method called “Hamming Distance” to quantify the differences between two “sketches.” You can imagine that each “sketch” consists of 256 key feature points. We stack two sketches together, compare these 256 feature points one by one, and then count how many points are different. This “number of differences” is the Hamming distance between them.
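The comparison itself is simple to express in code. Here is a minimal sketch, assuming each 256-bit fingerprint is stored as a string of zeros and ones (a real system would more likely store integers and use bitwise XOR for speed):

```python
def hamming_distance(fp_a: str, fp_b: str) -> int:
    """Count the positions at which two equal-length
    bit-string fingerprints differ."""
    if len(fp_a) != len(fp_b):
        raise ValueError("fingerprints must be the same length")
    return sum(a != b for a, b in zip(fp_a, fp_b))

# Two fingerprints that differ in one bit out of every four:
a = "1010" * 64
b = "1011" * 64
hamming_distance(a, b)  # 64 positions differ
```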

  • A Hamming distance of 0 to 15 means the two sketches are virtually identical, almost certainly copies of the same photo.

  • A small Hamming distance (e.g., between 16 and 40) typically indicates that the photos have only undergone minor cropping, scaling, or brightness adjustments, with highly similar visual content.

  • A larger Hamming distance (e.g., between 41 and 80) means that although the two photos may be thematically related, they already show significant differences in composition, people, or scenery.
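The bands above could be expressed as a simple lookup. The cut-off values here (15, 40, 80) mirror the examples in the list and are illustrative only, not Synology's actual production thresholds:

```python
def similarity_band(distance: int) -> str:
    """Map a Hamming distance (0-256) to the rough
    relatedness bands described above; thresholds are illustrative."""
    if distance <= 15:
        return "near-identical copy"
    if distance <= 40:
        return "minor edit (crop/scale/brightness)"
    if distance <= 80:
        return "related but visibly different"
    return "unrelated"
```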

If the threshold is too strict, even slight adjustments to brightness or cropping would cause the algorithm to consider them as two different photos, defeating the purpose of automatic organization. If the threshold is too loose, photos with similar compositions but vastly different subjects (e.g., a beach during the day, a snowy landscape at night) might be incorrectly stacked together, creating even greater organizational confusion.

After countless tests and discussions, we established a core design principle: better to stack more than to miss. We chose a relatively loose but rigorously verified threshold.

Our goal is to ensure that similar photos found by the algorithm have “a high probability of being related” in the user’s eyes. Users might think, “Although these look alike, I’d prefer to view them separately,” but rarely would they think, “These two photos have nothing to do with each other; why are they stacked together?”

The thinking behind this decision is simple: the time required for users to manually “remove” a few photos from an already organized collection is far less than that needed to manually “find and merge” photos scattered throughout their library. This is also why we currently only provide “Unstack” and “Remove from stack” options, and haven’t enabled users to manually create stacks. We position the algorithm as an efficient “screening assistant,” not a “boss” making final decisions for you. It completes 80% of the tedious work, while preserving that final 20% of organizational pleasure filled with personal preferences for you.

Trade-off Two: How to Efficiently Complete Comparisons in Massive Data?

After defining “similarity,” the next challenge is engineering: how to efficiently complete comparison work in a digital library that may store millions of photos without slowing down the entire NAS?

If every newly uploaded photo were compared against all photos in the database using “visual fingerprints,” it would be a computational disaster. For this reason, we designed a multi-level task allocation and optimization mechanism.

First, we added limitations to the algorithm’s “vision.” We observed that the vast majority of similar photos (such as burst shots and edited versions) are created at close time points. Therefore, we stipulated that whether it’s a newly uploaded photo or a system background scheduled comparison task, it will only be compared with photos taken within a 24-hour window (±12 hours). This “time window” design narrows the comparison range by several orders of magnitude, avoiding unnecessary performance waste from the start.
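The idea of the time window can be sketched in a few lines. This is an illustrative helper, not Synology's code; the function name and the list-based library are hypothetical, and a real system would instead run an indexed timestamp query against the photo database:

```python
from datetime import datetime, timedelta

def candidates_in_window(new_time, library, window_hours=12):
    """Keep only photos taken within +/- window_hours of the new photo,
    shrinking the comparison set before any fingerprint comparison runs.
    `library` is a list of (photo_id, taken_at) tuples."""
    delta = timedelta(hours=window_hours)
    return [photo_id for photo_id, taken_at in library
            if abs(taken_at - new_time) <= delta]

library = [
    ("birthday_001", datetime(2025, 7, 25, 10, 0)),
    ("last_week",    datetime(2025, 7, 18, 10, 0)),
    ("birthday_099", datetime(2025, 7, 25, 23, 0)),
]
candidates_in_window(datetime(2025, 7, 25, 12, 0), library)
# only the two same-day shots remain as comparison candidates
```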

Second, we fully utilize every bit of computing power in the ecosystem. On the mobile app side, when you upload photos via your phone, we leverage your phone’s powerful processor to pre-calculate the PDQ sketch locally, then upload it to the NAS along with the photo. This means that the most computationally intensive step is handled by your phone, and the NAS only needs to focus on receiving data and performing fingerprint database comparisons.

Finally, we seamlessly integrated the backend comparison and stacking tasks into the DSM core system scheduler. This design makes Synology Photos’ comparison tasks part of the entire system’s intelligent resource scheduling rather than operating in isolation. It ensures that similarity calculations don’t frequently wake up hibernating hard drives, maintaining your NAS’s energy efficiency.

In this way, it coordinates with background tasks from other packages (such as Download Station’s download schedules or Hyper Backup’s backup schedules). This means that Photos’ organization work is scheduled at the most appropriate time, avoiding multiple background tasks competing for system resources simultaneously and enhancing the overall stability and predictability of your NAS.

Conclusion: Smart Assistance, You Decide

The “Stack Similar Photos” feature in Synology Photos was built around a core idea: to make technology your capable assistant, not a replacement for your decision-making.

We’ve invested substantial resources in optimizing the algorithm’s accuracy, improving the efficiency of large-scale comparisons, and ensuring the entire process doesn’t burden your NAS performance. At the same time, we’ve given you maximum freedom—from setting stack covers to one-click unstacking to removing any photo from a stack.

We believe that the best automation completes the tedious preliminary work for users, then returns the fun of final organization to them. We hope that as you enjoy a tidy timeline, you can also feel this “just the right touch” that we’ve carefully designed for you.