
Danbooru Image tagger/renamer << NEED HELP!!!


ph4zr:

--- Quote from: Tiffanys on April 30, 2011, 07:12:54 AM ---I use Tagbooru to download whole gobs of junk: http://code.google.com/p/tagbooru/

--- End quote ---
I tend to shy away from batch downloaders, just because I like to see what I'm putting in my library. There are some images I'd just as soon not add, that would be incredibly hard to find if I batched all of Touhou... as I'm sure I'd do first thing. ;D

--- Quote from: Jorin on April 30, 2011, 06:06:17 AM ---I didn't actually know there was a standard way to store tag types/prefixes.

--- End quote ---
Well, what I was talking about is just the convention the boorus seem to use to determine tag types. "copy:*" is just how they tell you to tag images when you upload or edit images. If the booru just has a separate table where it stores tags of a certain type, they might not be flagged or prefixed in any special way when you request them via API.

One reason for separate tables would be that it's likely more efficient to search copyrights if they're all in one table than it is if you're doing pattern matching on literally thousands of tags.
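A minimal sketch of reading that prefix convention on the client side. Only the "copy:" prefix comes from the discussion above; the `known_types` parameter and the "general" fallback label are assumptions made for illustration, not something any booru is confirmed to return.

```python
def split_tag(raw, known_types=("copy",)):
    """Split 'type:name' into (type, name); untyped tags fall back to 'general'."""
    prefix, sep, name = raw.partition(":")
    # Only treat the text before the colon as a type if it is a known prefix,
    # so tags that merely contain a colon are left alone.
    if sep and prefix in known_types:
        return prefix, name
    return "general", raw


print(split_tag("copy:touhou"))
print(split_tag("long_hair"))
```

Checking against a list of known prefixes, rather than splitting on every colon, keeps tags such as emoticons intact.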


--- Quote from: Jorin on April 30, 2011, 06:06:17 AM ---I noticed that DD didn't pull the tags when I tried to use it. The subject field is empty when I read it with Exiftool. Should it be tagging the images, or just only downloading them?

--- End quote ---
DD doesn't do image tagging. I'm not sure if Firefox extensions can be given file modification privileges, so it might not even be possible to tag them as you download them. Then there's the issue of finding a Javascript module that understands image tagging in the first place.


--- Quote from: Jorin on April 30, 2011, 06:06:17 AM ---It would be incredibly useful to link the functionality of DD and the Sheska script (or just Exiftool if that's possible) if the latter situation is the case, that way you'd get images downloaded into organized folders, and also searchable with the internal tags in an image viewer.

--- End quote ---
You could probably write a script to grab the tags from DD's SQLite DB and pass them to Exiftool. If you're saving the md5 hash in the filename, Sheska should be able to successfully look up the file information. Alternatively, DD stores hash and site information in its SQLite DB, so you could look it up there, too. At that point you're already interacting with the DB, though, so it'd probably be easier to use Exiftool.
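A rough sketch of that DB-to-Exiftool idea. DD's actual schema isn't shown anywhere in this thread, so the `images` table and its `md5` and `tags` columns are hypothetical stand-ins, and the in-memory database only exists so the example runs on its own. The Exiftool options used (`-overwrite_original`, and repeating `-XMP:Subject=` once per tag) are standard Exiftool usage.

```python
import sqlite3


def tags_for_hash(conn, md5):
    """Return the tags stored for one md5 hash (hypothetical schema)."""
    row = conn.execute(
        "SELECT tags FROM images WHERE md5 = ?", (md5,)
    ).fetchone()
    # Tags are assumed to be stored as one space-separated string.
    return row[0].split() if row else []


def exiftool_command(path, tags):
    """Build an exiftool call that writes each tag as an XMP Subject entry."""
    cmd = ["exiftool", "-overwrite_original"]
    cmd += ["-XMP:Subject=" + tag for tag in tags]
    return cmd + [path]


# Stand-in database so the sketch runs without DD installed.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE images (md5 TEXT PRIMARY KEY, tags TEXT)")
conn.execute(
    "INSERT INTO images VALUES (?, ?)",
    ("0123456789abcdef0123456789abcdef", "touhou hakurei_reimu"),
)

tags = tags_for_hash(conn, "0123456789abcdef0123456789abcdef")
print(exiftool_command("example.jpg", tags))
```

To actually tag a file, the returned list could be passed to `subprocess.run(cmd, check=True)` on a machine where Exiftool is on the PATH.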

Unless of course you need updated tags. Poorly tagged or less recently downloaded images would be good candidates for a re-lookup.

--- Quote from: Jorin on April 30, 2011, 06:06:17 AM ---I noticed DD has an option to control which sites trigger it, and that information ultimately directs images to different local folders. If it's possible to run Sheska or a Sheska-ish thing on an image automatically after it downloads with DD, it should also be possible to slipstream the correct json url for whichever site the file came from, that way the tags could be included automatically from their source as well.

--- End quote ---
You'd have to be monitoring the folders for changes to detect new files. I've never used DD's logging myself, but with it enabled you might be able to pull new file names from the log, which is probably easier than monitoring entire directories. Well, I think it is. I've never actually tried monitoring a folder for changes. Picasa seems to do okay detecting new images, though.
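For what it's worth, the simplest form of folder monitoring is just polling: take a snapshot of the file names, take another one later, and diff the two. The sketch below assumes a single flat download folder; the function names are made up for illustration, and this is not how DD or Picasa are known to do it.

```python
import os


def snapshot(folder):
    """Return the set of file names currently in folder (non-recursive)."""
    with os.scandir(folder) as entries:
        return {entry.name for entry in entries if entry.is_file()}


def new_files(folder, seen):
    """Return (sorted names added since `seen`, the updated snapshot)."""
    current = snapshot(folder)
    return sorted(current - seen), current
```

A caller would keep the returned snapshot and call `new_files` again every few seconds (for example inside a loop with `time.sleep`), handing each newly added name to the tagger.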

Sorting into folders based on site is also just a default setting in DD. You can disable it so all images get lumped together.


--- Quote from: Jorin on April 30, 2011, 06:06:17 AM ---If a standalone *booru viewer is in the cards, colour-coding the prefixes and making sure they're searchable would be great. Blacklisting and whitelisting is already accounted for within DD, so various tags could be filtered. Also, image viewers like Picasa should allow you to detect unwanted tags and mass-remove them from your collection as well.

--- End quote ---
I didn't think my original white- and black-list comment through entirely when I phrased it as such. I actually meant I'd like it if such a tagger would ignore tags for metadata purposes, i.e., keep the image, but don't tag it with certain tags. Mainly I'd want it because some tags I don't have many images for, and I'm not a big fan of orphans. On the topic of DD's white- and black-listing, the only problem I have with that is it limits how many tags you can list.
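A small sketch of that "keep the image, skip the tag" filter. The ignore list, the per-tag library counts, and the `min_count` threshold for what counts as an orphan are all assumptions chosen for illustration.

```python
from collections import Counter


def tags_to_write(image_tags, ignored, library_counts, min_count=2):
    """Keep tags that are neither ignored nor orphans in the library."""
    return [
        tag for tag in image_tags
        if tag not in ignored and library_counts.get(tag, 0) >= min_count
    ]


# Hypothetical library statistics: how many images carry each tag.
counts = Counter({"touhou": 40, "hat": 1, "bow": 12})
print(tags_to_write(["touhou", "hat", "bow", "tagme"], {"tagme"}, counts))
```

The image itself is never rejected here; only the list of tags that would be written into its metadata gets shorter.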

/rest is down at bottom under "/general dev"


--- Quote from: Jorin on April 30, 2011, 06:06:17 AM ---It looks like almost all the ingredients are here to make a useful app. What's missing still is a process to mass-tag a bunch (like, hundreds or thousands) of files already downloaded, provided they're not renamed and/or they have their original md5 hash.

--- End quote ---
I haven't looked at Sheska yet. If a filename has the md5 hash (and no really random 32 character A-Za-z0-9 sequences), it should be easy enough to go through a list of files and extract it from the name. DD seems to rely entirely on text parsing and the DOM, which is useful, but I think using the APIs (when available) would allow for faster and more efficient downloading/re-tagging. If a site doesn't expose its API that's another story, but I don't know if that's true of any of the booru sites at the moment.
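As a sketch of the extraction step: an md5 hash is written as 32 hexadecimal characters, so a regular expression can pull it out of a filename. The helper name and the sample filenames are made up for illustration, and the pattern assumes the name contains at most one such run.

```python
import re

# Exactly 32 hex characters; the lookarounds reject longer hex runs.
MD5_RE = re.compile(r"(?<![0-9a-fA-F])[0-9a-fA-F]{32}(?![0-9a-fA-F])")


def md5_from_filename(name):
    """Return the lowercase md5 found in a filename, or None if there is none."""
    match = MD5_RE.search(name)
    return match.group(0).lower() if match else None


print(md5_from_filename("danbooru_D41D8CD98F00B204E9800998ECF8427E.jpg"))
print(md5_from_filename("vacation_photo.png"))
```

Files that return `None` have presumably been renamed and would need a different lookup, such as hashing the file contents, which only matches if the file is byte-identical to the one the booru hashed.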

Then of course there are the ethical issues surrounding code re-use. If it's all open source it's no big deal, but I'm not sure if DD is released under such a license. The gel-rating's fix was minor enough I didn't see an issue, but technically I probably should have asked for permission. I haven't seen any newer versions, though, so I'm guessing it's either been abandoned or deemed "final"*.

*IMO, no project is ever actually "final" or "done". At most they reach the end of their development life-cycle and are no longer worked on.

/general dev
The main issue I have with such a thing is that I'm horrible with front ends. Well, actually, I'm not that slick in general to begin with, but my front ends tend to fall noticeably lower on the quality scale. The other issue is my actual follow through is terrible. I remember posting months ago that I'd like to take a look at a piece of code for generating OST offers... I think I opened it once, commented two lines, and haven't looked at it since. x.x I probably should take a look at it again, while text parsing and HTML generation are still "fresh" from this semester's coursework.

Even assuming I got as far as starting it, once I had a working script or makeshift API for my script to interact with, that can be tweaked as I need it, I'd probably start to lose focus and go play with something else. =/

School is getting out soon, though, so I dunno. I still have to get around to tagging my own library, so at the very least I'll be trying to use the DD SQLite DB, and probably interface with and/or re-purpose Xiong's Sheska a bit. At this moment I have Xiong's Sheska, tagbooru, and the old DLer I found in the danbooru forums, as well as DD itself and some kind of JS... thingy, to look at. There's no lack of code inspiration, I'm sure.

Jorin:
Just a quick reply because I'm busy applying for a job (in user experience design!). I'm interested in user interfaces and I'd have fun working on this. It wouldn't have to be extremely powerful, just slick and functional enough to do the job well. New features could be added over time anyway.

ph4zr:
Hmm...? What kind of languages would work best? The only ones I'd be comfortable drafting it in are Java, PHP, Perl, or possibly Python if I can pick it up easily enough. C++ for an eventual port, maybe, but I'm not sure I'd want to start it there b/c there's so much more I can screw up. I haven't dealt with pointers or OOP in C++ in literally years, nor have I used it for any kind of non-trivial application or network communication. Then there's the fact it's easier to port the others to other platforms than it is a C++ project. I doubt most would want to compile the source themselves, after all.

Java is probably the most portable, but also has the most overhead. C++ is probably the fastest, but its compiled code is also the least portable. PHP, Perl, and Python are somewhere in the middle, with each having its own strong suits. AFAIK all of them have the necessary modules for doing most of the tasks, although PHP and Python may have to use system calls to exiftool to actually tag images. There probably is an app module to do it directly, but I haven't come across one.

What would be best for GUIs I have absolutely no idea. Since you do that kind of thing, your perspective will probably differ. For me at least, all of my Java interfaces have been godawful ugly, but I'm not a very graphic-y person, either.
Honestly I'm not very dynamic with languages. I stick to what I've used for the most part. Still, if there's another language you had in mind, it never hurts to learn something new, either. And there are almost certainly users on the forum who have the necessary expertise to do it in some language of choice, though AFAIK xiong is the only one who has expressed even potential interest in a similar project, with fairly low priority at that.

If a project did get started, it might be appropriate to either create a dedicated thread for it or move the discussions/planning to a different environment. If it doesn't violate site rules, maybe they wouldn't really care, but there's still the fact that trading code via DL links isn't very efficient. I've never attempted a collaborative project, so I wouldn't know how to go about it, but it can't be too hard to set up GitHub or SVN. /famous last words?

In any case, I couldn't even think about doing anything at all for at least another week. Posting in forums is a tad less time consuming than coding. ...the latter of which I should already be doing for my final... =/

Jorin:
I figure it's good to be as lightweight and portable as possible, and it would be cool if it could work on mobile devices as well (kind of like the Pixiv app). I think I'd rule out Java and C++ to start at least. I'm not sure whether PHP could do this on its own. Perl and Python might be the best for compatibility. Whichever can do this without needing Exiftool would be a big help, since exif metadata isn't present in anything besides .jpg files. There are a lot of .png and .gif files in this collection and it would be necessary to write metadata for them too.

My problem is I'm good on the research and design side, but I have a gaping hole in my skillset when it comes to actually programming. :( I could prototype it and design the overall architecture and interface down to the mouse-click, but I wouldn't know where to start for actually building it. I'm starting to learn how to program, but it's a bit of a learning curve for me.

But for now it's always possible to brainstorm. If and when it's possible to make this a project (perhaps if anyone else is interested too) it would be fun to get involved. It would be possible to at least get started with a wiki or something. Those are good for version control and note taking and are especially helpful for multi-person projects. No joke: wikis got me through grad school.

ph4zr:
/formats
I don't really know enough about image formats to be familiar with the metadata PNG and GIF files can store. Exiftool is convenient because it provides a library for writing metadata to JPEGs. Even Picasa doesn't store any metadata in PNGs or GIFs, so my guess is there isn't an accepted tagging standard. A more general approach would be to just store the metadata externally, but then obviously it's not stored with the images themselves, making re-organization somewhat of a hassle.
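A sketch of the external-storage approach that softens the re-organization problem: key the tags by the md5 of the file's contents rather than by its path, so a moved or renamed image still finds its entry. The single JSON index file and the function names are assumptions made for illustration.

```python
import hashlib
import json


def file_md5(path):
    """Hash a file's contents in chunks so large images aren't read at once."""
    digest = hashlib.md5()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(65536), b""):
            digest.update(chunk)
    return digest.hexdigest()


def save_tags(index_path, image_path, tags):
    """Store tags for an image in a JSON index keyed by content hash."""
    try:
        with open(index_path, encoding="utf-8") as f:
            index = json.load(f)
    except FileNotFoundError:
        index = {}  # first run: start a new index
    index[file_md5(image_path)] = sorted(tags)
    with open(index_path, "w", encoding="utf-8") as f:
        json.dump(index, f, indent=2)


def load_tags(index_path, image_path):
    """Look up tags by content hash; returns [] for unknown images."""
    with open(index_path, encoding="utf-8") as f:
        return json.load(f).get(file_md5(image_path), [])
```

The trade-off is the one already noted: image viewers won't see these tags, and re-saving or editing an image changes its hash, which orphans its entry in the index.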

/mobile|porting
Your comment about mobile devices perplexes me slightly. Is a smartphone or tablet type device appropriate for hosting a user's image library? If you are just browsing that's one thing, but if you are also downloading it seems you'd want a desktop environment of some sort.

I've never actually programmed for a mobile device before, but my understanding is they tend to differ a fair bit in terms of what your available options are for languages and design. If you were designing a mobile app I suspect you'd have to build it with a specific mobile device in mind, although the underlying logic could probably remain untouched.

Honestly I'd be happy just to get it working on Windows in whatever language was most convenient. Porting could come later, although designing with portability in mind would definitely simplify it. As long as you don't rely too heavily on system calls or language/platform specific libraries and functionality, everything else is just translation.
Which is obviously no small feat even when you understand both languages, for which fansub TLs provide more than sufficient evidence.

/experience
Experience probably wouldn't really matter, at least not to me. I'd approach it more as I would a hobby or learning experience than a job. Besides that, even just designing the interface would be a step up. What was or wasn't possible, or was simply too much of a hassle to actually implement, could likely be dealt with on a case by case basis.

I tend to start with a top down approach for design and prototyping and then implement it from the bottom up, so knowing what functionality the GUI requires would make it easier to design the back-end. I.e., knowing what you want to do makes it easier to figure out what you need to do to make it happen.

Heck, they actually pay people to design pretty looking interfaces, so it's not like "just" designing an interface is necessarily a trivial task.
