The Commons Machinery project is taking the next step towards bringing order into the arcane world of media file metadata on the web. With Elog.io, they aim to unclutter the chaos somewhat and help credit creators. It’s an uphill battle, but the good news is: you can help.
After all these years, preserving metadata in media files and properly crediting authors is still a major issue on the Internet. Image editing software could be smarter, and popular social networks could stop wiping credits and descriptions from pictures (let alone claiming they own them). It’s a huge mess, and we are in it up to our necks.
The Commons Machinery project was launched in 2013 to tackle the problem of fixing this unhealthy state of affairs. The project was backed by the Shuttleworth Foundation early on.
By the end of 2013, they had enough proof-of-concept code to show that this bold enterprise was for real, and that the suggested solution was, in fact, just about doable. At least for public domain content and content available under any of the Creative Commons licenses.
At the time they had rough, yet working preliminary patches for Inkscape, GIMP, and Aloha HTML editor, as well as extensions for Mozilla Firefox and LibreOffice.
However, eventually they quit this leg of the race and refocused on Elog.io, a web service that takes “fingerprints” off works of art known to be in the public domain or CC-licensed, and matches user-submitted media files against its database of such fingerprints.
Elog.io will help you, should your fancies take you there, to discover that a terrifying painting of a handsome young woman cutting the head off a bearded man at the advice of some old scoundrel is, in fact, “Judith Beheading Holofernes” by Caravaggio.
As of November 2014, Elog.io had taken digital fingerprints off ca. 23 million photographs from Wikimedia Commons. That is to say, you should not expect the service to help you discover the true origin of a photo you saw on Facebook where a humorous cat is taking a teddy bear to the cleaners. At least not just yet. But old art, photos from public archives etc. are definitely in there already.
With extensions for Google Chrome and Mozilla Firefox to help you match images in your browser window against Elog.io’s database, the team, now comprised of Jonas Öberg (Sweden) and Artem Popov (Russia), is certain they can take it to the next level. Hence the newly launched Indiegogo campaign.
The team is asking for $6,000 in return for 4 months of development and getting 18 million additional photographs under open licenses or in the public domain into Elog.io.
In the best traditions of talking rather than doing, LGW spoke to Jonas Öberg, CEO of Commons Machinery.
At some point in the past you stopped working on your libraries for end-user software to aid crediting collaborators, abandoned your patches for Inkscape and GIMP, and completely switched to working on Elog.io. How do you currently prioritize projects that are part of Commons Machinery?
The prototyping we did for Inkscape and GIMP had one particular goal in mind: to figure out how much work it would be to implement support for persistent metadata, and how to get it in & out of the applications.
We sidelined those projects when it became clear that (1) it’s a hell of a lot of work, and (2) it’s even more work when you consider that you also need to engage the community in this work to get any changes into the core.
It’s not so much that it takes time (it does that as well, but not only), but that it’s an expensive context switch. Two persons working four hours a week, consistently, on Inkscape and GIMP, respectively, could accomplish much more over time than one person working on both Inkscape and GIMP at the same time.
What has to be done to resume the work on the projects that were put aside?
From our side, we’d love it if someone took ownership of what we started by simply forking our code on GitHub and continuing development within each respective community. We’ll happily give what support we can to that person, and that person should ideally come from within each respective community.
The other option is that we eventually come back to this and start working on it again. One of two things needs to happen for that to work:
Either we find funding to do that particular work. But if someone is willing to fund that work, it’s better that they get someone from the Inkscape/GIMP communities than pay us to do it.
Or we manage well enough with Elog.io that we eventually get to the point of having tighter integration between Elog.io and applications like Inkscape/GIMP, at which point this would become relevant in part again.
That’s so far into the future though that we haven’t even begun thinking about when that could be.
What’s the status of libraries that you developed for end-user content authoring software to simplify development of metadata preservation and crediting of collaborators — libcredit and others?
I do believe they’re useful, but they’re not hooked into the Elog.io infrastructure idea. Their use would likely be in ensuring that end-user software supports and can manage metadata at all, which is a good precondition for later hooking it into Elog.io, of course.
(Artem Popov chimes in to clarify…)
Librecontext is the new name for libremix. It works with RDF metadata (as libcredit does too), but for the catalog we switched from RDF to MediaAnnotations. So, yeah, while they’re probably useful, they’re not related to our current work on Elog.io and not useful for integrating elog.io with desktop software.
Libgetmetadata was originally written to collect metadata “on the fly”, i.e. work directly in a browser, but has been rewritten to work server-side and is seldom useful outside of very specific kinds of web applications. The best way to work with the catalog now is the API.
Let’s talk about Elog.io. I’m getting an impression that Elog.io focuses on what seems achievable — helping people who have good intentions crediting other people’s work, but also on something that’s more difficult to do — finding the origin of a picture you stumble upon. This suggests fighting against, at the very least, two gorillas.
The first one is discoverability, that is, immediate availability of Chrome and Firefox extensions by default — there’s a high risk that Elog.io may not become a mainstream tool, unless it’s shipped by default and is, generally, just there. How are you going to deal with this?
You’re right, and my hope is that we’ll see not only our own plugins (which I guess can be seen more as a demonstration of what can be done) but also plugins and implementations into the main trunks of other software directly.
For instance, it doesn’t seem as if it would be a lot of work to implement support in GIMP such that when you load an image, its metadata is looked up in Elog.io and offered as default metadata within GIMP.
Oh-kay. The second creepy anthropoid would be modifications with all kinds of intent. Right now the FAQ states that the Blockhash algorithm you picked doesn’t work very well with modifications, and that you’ve deliberately set the bar at verbatim copying.
However, it’s a legitimate use to take a CC-BY-SA licensed image and modify it to fit, e.g., the dimensions of a featured image in a blog post.
And then there’s malicious intent. As you know, technology is often abused. A common tactic to create an image that serves as proof for, e.g., fake news is to mirror the original image and then slightly retouch it, so that it doesn’t look doctored and yet cannot be found with Google image search. But that means you’d start dabbling with forensics one way or another. So, is setting the bar at verbatim copying a temporary technical decision, or a permanent ideological one, or something completely different?
It’s mostly technical, and a matter of solving more easily solvable problems first.
There are, of course, algorithms that can match images more broadly, including finding derivative works. What happens though is that as you match more derivative works, you also match more images that aren’t actually derivatives or copies of the original (false positives). For instance, a photo of a church matching another church (or even the same church, but taken by a different user).
The use case we started with requires us to be more authoritative. When we match an image, we want to be reasonably certain that we’re showing a true match. It’s less of a problem for us, if we miss a few matches that we could have made, as long as we don’t present potentially false information.
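The trade-off described above can be illustrated with a minimal sketch. It is not the actual Blockhash implementation and not Elog.io code: it is a simplified block-mean fingerprint, and the names block_hash, hamming, is_match and max_distance are hypothetical. The image is assumed to be a grayscale grid whose sides divide evenly into blocks. Each block becomes one bit, and two images match when their fingerprints differ in no more than max_distance bits.

```python
from statistics import median


def block_hash(pixels, bits=4):
    """Return bits*bits values (0 or 1) describing a grayscale image.

    Assumption for brevity: `pixels` is a list of equal-length rows and both
    sides of the image are multiples of `bits`.
    """
    height, width = len(pixels), len(pixels[0])
    block_h, block_w = height // bits, width // bits
    means = []
    for by in range(bits):
        for bx in range(bits):
            block = [
                pixels[y][x]
                for y in range(by * block_h, (by + 1) * block_h)
                for x in range(bx * block_w, (bx + 1) * block_w)
            ]
            means.append(sum(block) / len(block))
    # A block is "bright" (1) if its mean exceeds the median of all block means.
    mid = median(means)
    return [1 if m > mid else 0 for m in means]


def hamming(a, b):
    """Count the positions where two fingerprints differ."""
    return sum(x != y for x, y in zip(a, b))


def is_match(a, b, max_distance=0):
    """A strict threshold means fewer matches, but also fewer false ones."""
    return hamming(a, b) <= max_distance


original = [[x * 10 + y for x in range(8)] for y in range(8)]
brighter = [[value + 20 for value in row] for row in original]  # same picture
mirrored = [row[::-1] for row in original]                      # flipped copy

print(is_match(block_hash(original), block_hash(brighter)))  # True
print(is_match(block_hash(original), block_hash(mirrored)))  # False
```

In this toy example a uniformly brightened copy keeps the same fingerprint, while a mirrored copy differs in every bit. Raising max_distance far enough would make the mirrored copy match too, but the same loosening would also start matching unrelated pictures, which is the false positive problem mentioned above.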
That said though: I think you could see this as an ideological stance too. We don’t believe in policing derivative works. As you say, there will always be ways around it such that algorithms can’t detect it. We don’t want that kind of arms race. Instead, we think the focus should be on those who do want to do good.
For such cases, we can also do a lot more that doesn’t involve algorithms! Such as implementing application support that automatically registers derivative works in Elog.io and sets the right source works. That would be much better than any algorithm!
Wouldn’t that count as policing users instead of policing derivative works though?
We’re trying to do neither, of course. If we really wanted to police users, we’d implement tight restrictions in Elog.io regarding what you can and can not do, or enforce certain actions. That’s not something we’re keen to do.
We see the information in Elog.io and our presentation of it more as an informational signpost on the road, and a helpful guide as you work. An application might offer to register the user’s derivative work, but it wouldn’t do so automatically without consent.
Another application may tell the user that they’re about to alter a non-derivative licensed work, but it won’t disable editing functionality just because of it.
Did you discuss your technology with the largest publishers of CC-licensed content such as Flickr, Soundcloud, Vimeo, YouTube etc.?
Flickr, yes. They’re very happy about what we’re doing and they don’t mind us using their research data to start with to load information from Flickr into our database. After that, we’ll need to hit their API a bit more, but this also doesn’t seem to be a concern for them. We’ve poked them a bit about getting more/easier access, and while they can do that, it’s not their top development priority yet.
What about the largest publishers of proprietary content such as Facebook, Pinterest, Instagram and others (admittedly, YouTube et al. too), some of which even drop metadata entirely? Is there some light at the end of the tunnel?
Yes and no. It depends on what your goal is. We’ve spoken to PicScout which holds a lot of proprietary content from Getty Images and many others. They don’t have a problem as such sharing information with us so we could match images from their catalogs too. The problem is that they have a legal department: they think that just by virtue of having a “Copy” button, Elog.io would indirectly support fraudulent use of images from them.
So they would like to see Elog.io disable the copy button for works that are matched against their database. That’s something that could perhaps be done, but it’s a slippery slope, because the next issue is that, as we discovered when we last peeked into their database, their catalogs also contain some public domain works which some copyright registry claims are owned by someone. So we can’t fully trust what’s in their catalog without risking excluding some public domain works.
It’s possible we can negotiate something around this, such as making sure that if a work is matched in both Wikimedia Commons and PicScout, we give preference to the Commons image, but they aren’t too happy about that either, since they feel it might lead to people uploading “their” images to Wikimedia Commons under a false license.
There’s a lot of negativity regarding YouTube’s policy of blocking content because of alleged reuse of copyrighted material by the users. Do you think Elog.io could become an additional tool to improve the matching of soundtracks against the database that specifically covers a large body of public domain and otherwise free to use works of art?
The Sintel case on YouTube does come to mind, indeed. What would happen in our world is that the Blender Foundation or a member of the community would have added Sintel to Elog.io (or their own Elog.io catalog installation; we’ve built with a distributed design in mind). YouTube, upon receiving a notice of infringement, would look up that same work in the Elog.io catalog, and if there’s a match with conflicting information from the infringement notice, it would keep the video up and trigger a flag in YouTube’s system that a person, rather than a computer, needs to evaluate this conflict and determine what to do.
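The flow described above can be sketched roughly as follows. This is only an illustration of the decision logic, not an Elog.io or YouTube API: CatalogEntry, Notice, review_notice, the fingerprint string and the claimant name are all hypothetical, and the catalog is assumed to be a plain dictionary keyed by fingerprint.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CatalogEntry:
    """What an open catalog might know about a work (hypothetical fields)."""
    title: str
    rights_holder: str
    license: str


@dataclass(frozen=True)
class Notice:
    """An infringement notice as received by the hosting service (hypothetical)."""
    fingerprint: str
    claimed_rights_holder: str


def review_notice(notice, catalog):
    """Decide what to do with a notice, given a fingerprint-keyed catalog."""
    entry = catalog.get(notice.fingerprint)
    if entry is None:
        # The catalog knows nothing about this work; it cannot help here.
        return "no-catalog-entry"
    if entry.rights_holder != notice.claimed_rights_holder:
        # Conflict: keep the video up and let a person, not a computer, decide.
        return "keep-up-and-flag-for-human-review"
    return "consistent-with-catalog"


catalog = {
    "fingerprint-of-sintel": CatalogEntry("Sintel", "Blender Foundation", "CC BY 3.0"),
}
notice = Notice("fingerprint-of-sintel", "Example Claimant")
print(review_notice(notice, catalog))  # keep-up-and-flag-for-human-review
```

The point of the sketch is that the catalog does not settle the dispute by itself. It only supplies a second source of information, so that conflicting claims end up in front of a human.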
At the moment, content holders are completely trusting towards information given to them by content creators like Sony. Having an open catalog of information about what digital works are in the public domain or openly licensed would be useful for content holders to have something to push back against the content creators with. Whether that would happen in practice remains to be seen, but we continue to be hopeful.
Let’s say we are now in 2016. There are hundreds of millions of items in the Elog.io database. Your next step?
Read/write. I’d hope we’d get there sooner than 2016, but I’m sure reality would get in the way. This is where we make it possible to accept and curate user-contributed information to Elog.io, both for your own personal use and to help curate more and higher quality metadata about works being shared and used.
It’s hard not to notice that Commons Machinery is only two people now which, presumably, contributed to lower “appetites” with regard to requested funding (as compared to your first campaign). What happened?
Partly money: we did the basic research & development with the support from the Shuttleworth Foundation, as you know. They supported this for a period of two years, but decided that now that we’ve shown that the technology works and it’s just a matter of scaling up and continuing development, they wouldn’t learn anything new from continuing to support us.
So we’re slowly learning to stand on our own now. If we’re successful with the Indiegogo campaign, I think that sets us on a good path forward.