Danbooru Image tagger/renamer << NEED HELP!!!

Kyrdua:
wee, i've tagged 70% of my gallery now. this thing works like a charm!



i haven't tried tagging with XP though.

UPDATE:  tagged with XP now, works perfectly!

Jorin:
That looks awesome. I've been looking for something like this for a while, especially since I want to keep artist metadata in the files. I gave it a try, but I got this output:


--- Code: ---
D:\Download>python sheska.py 85c0fc90513b9a2ff6ccbd4f314eb329.jpg
Processing 85c0fc90513b9a2ff6ccbd4f314eb329.jpg
Checking for original hash...  hashing... Warning: Tag 'md5sum' does not exist
Nothing to do.
 ok
Fetching tags from the internets...
Retrying with hash implied from filename...
Warning: Tag 'md5sum' does not exist
Nothing to do.
FILE_NOT_FOUND
--- End code ---

The weird thing is I downloaded this image right off Danbooru, so the md5 should be ok. Other images I've tested have pulled maybe 1 or 2 tags, but not all of them. I used the updated script with the tag and Exiftool is in my path, so I'm not sure what's missing.
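
For anyone checking the same thing, here is a rough Python sketch (not part of sheska.py; the function names are made up for illustration) that hashes a file and compares the digest with the file name. It can show whether the file's bytes have changed since download: writing metadata into a file changes its MD5, so the name and the content hash no longer agree.

```python
import hashlib
import os


def file_md5(path, chunk_size=1 << 20):
    """Return the hex MD5 digest of the file's current bytes."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        # Read in chunks so large images are not loaded into memory at once.
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def name_matches_content(path):
    """True if the file name (without extension) equals the content MD5."""
    stem = os.path.splitext(os.path.basename(path))[0].lower()
    return stem == file_md5(path)
```

If `name_matches_content` returns False for a file saved straight from a booru, the file has probably been modified since, and the hash in the name is the one to search with.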

Kyrdua:
sheska uses sankaku to search for tags by default. you have to change the json url to this:

unparsedJson = urlopen('http://danbooru.donmai.us/post/index.json?tags=md5%3A' + image.hash)

it got "file not found" because some pics that are in sankaku are not in danbooru and vice versa.
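
A hedged sketch of the lookup described above, written for Python 3 (the line quoted above is from the original script and may target an older Python). The endpoint is the one from the post and may no longer respond; the response shape (a JSON list of posts, each with a space-separated "tags" string) is an assumption, and the function names are illustrative only.

```python
import json
from urllib.parse import urlencode

# Endpoint copied from the post above; it may have changed since then.
DANBOORU_INDEX = "http://danbooru.donmai.us/post/index.json"


def md5_query_url(md5_hex, base=DANBOORU_INDEX):
    """Build the md5 search URL; urlencode turns ':' into %3A."""
    return base + "?" + urlencode({"tags": "md5:" + md5_hex})


def tags_from_response(body):
    """Pull the tag list out of a JSON response body.

    ASSUMPTION: body is a JSON list of posts, each carrying a
    space-separated "tags" string. An empty list means no match,
    which is the "file not found" case for that site.
    """
    posts = json.loads(body)
    if not posts:
        return None
    return posts[0].get("tags", "").split()
```

Since a picture can be on one site and not the other, a caller could try several base URLs in turn and stop at the first one where `tags_from_response` returns something other than None.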

ph4zr:
Hm... it's not necro if I post after someone else bumped it, is it? Also, I apologize in advance for this huge friggin' wall of text.

Re: alt-text as file name (old): The problem with that is there seems to be a 255* character limit on file names. The total character length of all tags, combined with spacing, can easily run well beyond that on heavily tagged images. If you're going to go that route, it's also a pain to do it for each individual image if you're more than just a casual browser (i.e. if you batch download), so you'd want to automate the process somehow. At that point, you might as well just put the tags into the image itself, if it's supported, or store them locally in some kind of custom database.

*The exact number I hear varies, anywhere from 249(?) to 260, and most programs also impose an artificial X character limit on the full path name when naming files. The latter can be circumvented, although it's a pain if you're moving things using the default Windows copy handler, but I'm not sure about file names specifically. I also don't know if it's -just- a Windows limitation or a file system limitation.

--- Quote from: Xiong Chiamiov on February 04, 2011, 08:01:55 PM ---
--- Quote ---and i think a photo organizer for imageboards is a great idea, definitely would be great if you'll make one.

--- End quote ---
I'll post here if I start working on it.

--- End quote ---
If you do that (or if you open a new thread for the project), please do make a note of it in this post so I see it in my updated topics. ...'twould be most enlightening (and useful). XD
Also, if you ever happen to implement an open-source, full, standalone booru browser that supports downloading and tagging images, as well as filtering out images you already own, that'd be awesome! -hint hint- For boorus that support the API, it'd probably mostly be an issue of designing a functional GUI, since any code you need for pulling images and tag data could just be ported into a module and re-used willy-nilly. Boorus conveniently have a unique post-id for every image uploaded, so once you match those against your own images, existence is easy enough to verify for a given collection. That's just me being greedy (and lazy), though.

Heck, if you felt really ambitious, you could even have the browser pull the images from your local store rather than retrieving them remotely, for an ever so slight performance boost. But there are a lot of features I'd love to see in such a browser. That's just the tip of the iceberg.

The farthest I've gotten is extracting tag data for the files as saved by DD, and extracting tag data from Picasa .ini files, cross-checking file names between what I have and what I originally downloaded, looking for a match (as well as unmatched files). At some point I toyed with a perl script for parsing a file and adding tags to an image, or going in reverse and checking for existing tags, but I have since lost that script, so it's as if I never did it to begin with. Matching the pieces together is mostly an issue of modularizing the code and committing to a contract for each piece of functionality.

What I haven't actually tried is importing tag data into the images from the DB, or parsing file names for the original md5 hash (where it exists). The former because I've been using PHP for everything, the latter because my naming conventions have changed over time, and finding the hash is thus non-trivial at this point. In any case, I lost my previous image collection, so for now I've just been trying to rebuild it rather than (re-)tag everything or make sure I don't download duplicates.
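
On the "parsing file names for the original md5 hash" part, a minimal Python sketch, assuming the hash survives somewhere in the name as an unbroken run of exactly 32 hex digits. Names that lost or split the hash will not match, and the function name is hypothetical.

```python
import re

# Exactly 32 hex digits, not part of a longer hex run (so a 40-digit
# SHA-1 in the name is not mistaken for an MD5).
MD5_PATTERN = re.compile(r"(?<![0-9a-fA-F])[0-9a-fA-F]{32}(?![0-9a-fA-F])")


def md5_from_filename(name):
    """Return the first MD5-looking token in a file name, or None."""
    match = MD5_PATTERN.search(name)
    return match.group(0).lower() if match else None
```

Because the pattern only needs the hash to appear somewhere in the name, it tolerates naming conventions that add prefixes or tag text around it.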

Besides that, any GUI I touch turns into utter trash, so the best I could ever do is a PHP based web interface, or a really clunky and bloated Java one. PHP obviously requires the user has it installed (if you want to manage your own images), and clunky and bloated is... obviously worthless. The infrastructure is another story, and I already have PHP and a server installed for development, so for me that's a non-issue (beyond actually creating it, oy). Such a project would obviously never be fit to go live for personal use, though.
As I got sidetracked into mentioning in the previous spoiler... I've been toying with the idea of a photo retagger or tag manager to work w/ the Danbooru Downloader's** SQLite data. I didn't want to start the project during the semester b/c it'd suck me into its world and I'd never do my HW, but this summer I'd like to do something about my ~10GB collection of (untagged) images. If you actually started such a project, the code could potentially be re-purposed or modified to both scan over images for updated tags and check for updated posts to specific tag sets. Useful for finally finding out who that "character-request" actually belonged to, for example.

My project would mostly be focused on applying data I already have, whereas yours would be/is/seems to be mostly about getting data the user didn't grab in the first place (i.e., what you want to do is more complicated than just parsing text and importing modules)*. XD
*Outsourcing, FTW!

**If you haven't used it before, Danbooru Downloader is a Firefox addon for downloading from boorus. It stores all of the tagging information it recognizes from the site in a SQLite DB, making it a perfect resource for tag harvesting (well tagged images, anyway). So far the only problem I've had with it is that it doesn't detect ratings properly on gelbooru, making browsing images in mixed company potentially... interesting, to say the least. Well, that and having to remember to reset the "always show original image" flag on gelbooru after clearing my cache/cookies, as the "download full-size/original image" functionality apparently doesn't work properly there either. Instead it downloads the samples until I reload the page w/ the flag properly set.

Related to the "storing tags in file names" idea, it can also be configured to attempt to name the file with as many tags as will fit. I mostly find it useful for keeping tabs on characters and artists, and sorting into folders by copyright. Actually saving the tags can get rather messy, with only a partial match on anything too far down the tag list. Configuration for (limited) whitelisting and blacklisting, as well as special naming conventions for whitelisted tags, is also available.
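
The "as many tags as will fit" idea can be sketched in a few lines of Python. This is only an illustration, not Danbooru Downloader's actual logic: the 255 default comes from the limit discussed earlier in the thread, the limit is counted in characters (some file systems count bytes), and characters that are illegal in file names are not handled.

```python
def tags_to_filename(tags, extension, max_length=255):
    """Join leading tags with spaces until the name would exceed max_length.

    Tags are taken in the order given, so tags far down the list are
    the first to be dropped.
    """
    kept = []
    used = len(extension)
    for tag in tags:
        # One extra character for the separating space after the first tag.
        cost = len(tag) + (1 if kept else 0)
        if used + cost > max_length:
            break
        kept.append(tag)
        used += cost
    return " ".join(kept) + extension
```

Putting whitelisted tags (artist, character, copyright) at the front of the list before calling this keeps the ones that matter most from being cut off.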

One other "quirk" it has is that it stores tags in a single field, separated by commas. I'm not sure what the official specification for tag data is, but preserving any tags with commas in them requires you pay attention when you extract them. So far I've never found a tag starting or ending with a space, so I usually just look for that.
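
One reading of the space trick above, as a Python sketch: assume the field separates tags with a comma followed by a space, and that a comma inside a tag is never followed by a space. Both are assumptions about the stored format, so check them against the actual DB before relying on this.

```python
def split_tag_field(field, separator=", "):
    """Split a single tag field on comma+space, keeping commas inside tags.

    ASSUMPTION: tags never start or end with a space, so a comma that
    is followed by a space is always a separator.
    """
    return [tag for tag in field.split(separator) if tag]
```

If the field turns out to use a bare comma with no space, this approach cannot tell separators from commas inside tags, and the tags would need to be matched against a known tag list instead.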
But I'm not so sure about pulling tag data directly from boorus for existing images that aren't already associated with a post (on the client side)—i.e. any file a user downloaded manually and renamed. Without the exact hash value you'd have a hell of a time figuring out what's what on a one-by-one basis. You could use iqdb to find the original image pretty easily for an individual, unmodified image—preferably sized down a bit for upload—but does iqdb have an API you can use to do it via a script? Managing any more than a few dozen images this way would be painstakingly slow.

Python, hm? Never got past "Hello world" with it, although you hear great things. I was planning on using perl for the exiftool package, and php for dealing with the SQLite DBs, dumping tag data into a more friendly format for parsing with perl, or just perl if I export to a csv format first. How do you handle tagging with python? It looks like a system call to the standalone binary version of exiftool, but I'm not familiar enough w/ python to rule out something else.
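
For what it's worth, calling the standalone exiftool binary from Python usually looks something like the sketch below. Whether sheska.py does exactly this is not confirmed here, and the function names are made up. The `-TAG+=VALUE` form (add an item to a list-type tag) and `-overwrite_original` are regular exiftool options; the choice of `XMP:Subject` as the target field is an assumption.

```python
import subprocess


def exiftool_command(image_path, tags, field="XMP:Subject"):
    """Build an exiftool argument list that adds each tag to a list field."""
    command = ["exiftool", "-overwrite_original"]
    # One -FIELD+=value argument per tag appends to the list-type field.
    command += ["-{}+={}".format(field, tag) for tag in tags]
    command.append(image_path)
    return command


def write_tags(image_path, tags):
    """Run exiftool on one image; needs the exiftool binary on PATH."""
    return subprocess.run(exiftool_command(image_path, tags), check=True)
```

Passing the arguments as a list rather than one command string avoids shell quoting problems with tags that contain spaces, quotes, or parentheses.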

If you actually want -all- of the tags, and not just ones you whitelist (or don't blacklist), it probably wouldn't be too difficult to just have perl connect to said downloader DB and tag any supported images you downloaded via DD using the exiftool module. I'm a bit pickier about which tags I accept, though. Some I have no use for.

PNG and GIF files are problematic, but Picasa's .ini files are simply formatted enough that adding -known- tag information is a fairly trivial task, provided you have a way of locating the relevant file. All the data is stored without any pathing information, so it's easy enough to find the relevant entry once you're in the right directory. If you want to keep any tags you already manually entered, it might be slightly more complicated, but not significantly so.

Mostly the issue is maintaining any special information Picasa put in there itself, but you can probably safely ignore those lines and copy them verbatim into the new .ini file. Doing it efficiently for crowded directories might be less straightforward. Going image by image could potentially result in thousands of reparses and edits to the same file if you don't group images by directory when you update tagging information.
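
The "group images by directory" step can be sketched in Python as below. It only does the grouping, so that each directory's .ini file needs to be read and rewritten once per batch; the actual .ini editing is left out because the exact format is not described here. The function name is hypothetical.

```python
import os
from collections import defaultdict


def group_by_directory(paths):
    """Map each directory to the list of file names inside it.

    Picasa's per-directory data has no path information, so the bare
    file name is the key to look up once the directory is known.
    """
    groups = defaultdict(list)
    for path in paths:
        directory, name = os.path.split(path)
        groups[directory].append(name)
    return dict(groups)
```

A caller would loop over the result, open that directory's .ini once, apply every pending tag update for the listed file names, and write it back once.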

Also (IMO) non-trivial is merging directories or moving files to new directories automatically. Due to the way Picasa stores information in .ini files, you would need to parse the relevant .ini files, updating them to add or remove entries from the list as you add or remove files from the directory, respectively. So far Picasa doesn't seem smart enough to detect that a file has moved and update the .ini files, and for some reason I've never had luck telling it to put all images from one directory into another without jumping through various hoops.

For other programs I'm not sure, since it would largely depend on how they handle tag data that can't be imported into the image itself.
On a related note, I remember coming across a batch downloader for booru sites on one of the imageboard forums. It was written in perl, but horribly commented (i.e., not commented at all). The code itself looked simple enough that if you were familiar w/ HTTP requests in perl you could probably make a bit of sense of it. I've never used them, though (or OOP in perl, for that matter), so I got lost fairly quickly trying to add comments for eventual modification.

If I find it again I'll link to it if there's any desire. If I can't I probably have the code here somewhere, with whatever comments I added to it. The poster did note that it was broken for some sites, although I'm not sure which ones in particular. Probably gelbooru, for one... =/

Jorin:

--- Quote from: ph4zr on April 26, 2011, 09:01:00 AM ---Hm... it's not necro if I post after someone else bumped it, is it?

--- End quote ---

Sorry if I was off base for bumping the thread. It was only on the second page and seemed reasonably new, so I figured I'd give it a shot.

Kyrdua, replacing that URL like you showed fixed the problem. It looks like I had confused something in the script previously, because there were two JSON URLs. I also needed to replace instances of xmp:TagsList with xmp:Subject in order to get Picasa to properly read the tags. It works awesome now (except Picasa displays the tags in reverse-alphabetical order for some reason...).

Thanks a lot for the help! :) This will make it so much easier to manage my thousands of images. Any suggestions on how to feed multiple filenames into the command line aside from typing them manually? * wildcards seem to cause a problem.
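
On the wildcard question: the Windows command prompt passes `*.jpg` to the program as literal text instead of expanding it, so the script has to do the expansion itself. A minimal Python sketch of that, assuming the script receives its file arguments as a plain list (how sheska.py actually reads its arguments is not shown in this thread, and the function name is made up):

```python
import glob


def expand_patterns(patterns):
    """Expand wildcard arguments into file names.

    Arguments that match nothing are passed through unchanged, so
    plain file names (including ones with odd characters) still work.
    """
    files = []
    for pattern in patterns:
        matches = sorted(glob.glob(pattern))
        files.extend(matches if matches else [pattern])
    return files
```

In a script this would typically be called as `expand_patterns(sys.argv[1:])` (with `import sys`), with the main loop then processing each returned name in turn.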
