Author Topic: [proj] Booru standalone browser, (re)tagger, and cataloguer [STG: brainstorming]  (Read 2189 times)

Offline ph4zr

  • Member
  • Posts: 346
  • Izaaaaaaya-kuuun!
Project
Booru standalone browser, (re)tagger, and cataloguer
(spawning thread)
Recent changes, current subject(s)

Stage
Brainstorming (2011.05.15)

Abstract
The project, if a project gets started, aims to allow a user to:

  • Tag or re-tag images they have downloaded from boorus
  • Browse boorus and download images
  • Maintain a library of images they have downloaded from boorus, and use this to prevent re-downloading of old images

(Hypothetical) Project members (updated: 2011.05.17)

Similar or (concept) related projects/scripts/et cetera (updated: 2011.05.20)
-.- d-list BBcode not supported on SMF, it seems.

Since this is just a brainstorming stage, and a project isn't even guaranteed to get underway, most of the following is speculative or otherwise not finalized.

In Depth (Not currently applicable)

Some points to discuss

A final note
Even if this project got started, I wouldn't necessarily be a/the project lead. Someone with more experience in coding, design, or leadership would likely be better suited.
/Plus I'm a bit of a control freak in those kinds of situations.

/I'm running out of steam here, so I'll leave it at that for now. 2011.05.15

MODS: /reserving next post "just in case". Hopefully that's acceptable.
« Last Edit: May 20, 2011, 06:20:03 AM by ph4zr »
Oh flickering blaze burn...
Why use skill when you can just spam fireball? /mage <3

Offline ph4zr

/reserved as per final line

Updates
Similar [...] (2011.05.20)
project members (2011.05.17)

Current
/general /APIs
For Danbooru, Gelbooru, and iqdb.org, I have specifications/examples for the APIs. I haven't looked at the other boorus yet. As I understand it, Gelbooru uses a custom implementation of the booru engine. I don't know if other boorus (kona, imouto, et cetera) use the same version or API as Danbooru, or even if Danbooru uses the "official" booru engine.

As for reverse lookup, I don't think other image search engines would be particularly relevant to this case.

/boorus
The only ones I've considered are danbooru, gelbooru, konachan, imouto, and sankaku. Ideally lookups will be modularized to the point that any additions can be done either via a configuration file or by swapping the module that handles them, but if that doesn't happen, are there any others I should consider?
« Last Edit: May 20, 2011, 06:21:58 AM by ph4zr »

Offline tomoya-kun

  • Member
  • Posts: 6374
  • Reporting for duty.
Sounds cool, but I am without programming skill.


BBT Team Riko Suminoe #000002

Offline Jorin

  • Member
  • Posts: 12
I'd be up for GUI development and metadata storage!


Offline ph4zr

Quote from: Jorin
I'd be up for GUI development and metadata storage!

/noted+updated
I won't really be doing anything this week, though. Just finished with finals week and I feel like a break. Plus I need to get the summer routine sorted.
It probably seems like I'm always on break, but yeah.

/metadata
You wouldn't have to use an external file for JPEGs, at least. You definitely can store metadata in PNGs, and I -think- exiftool supports it, but there is no standard for storing tags. So, yes, an external file would be the easier solution there.

As for metadata storage, by this you mean storing and querying tag information, yes? So API queries resulting in tag writes would defer the writes to you, but still handle the queries themselves? What about local searches?

/Probably too many questions considering the "hypothetical" stage, but I might as well bring them up while we're on topic.


Offline Jorin

And I'm back! I had a bogus PC crash to deal with, but I'm good to go now. Yes, still very early, but to keep the ball rolling:

Quote from: ph4zr
As for metadata storage, by this you mean storing and querying tag information, yes? So API queries resulting in tag writes would defer the writes to you, but still handle the queries themselves? What about local searches?

Exactly, that's what I meant. For local searches we'll want to depend on a local database that matches MD5 hashes with tags and local file paths. I also figure it's best to call the API only when you really need to, in order to minimize server load. If the file is local, just use the local database to retrieve it (all the tags will be there). If the file is new (i.e. just downloaded from the booru) or needs to be updated (still probably too early to elaborate on this), use the API.
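
A minimal sketch of the "local database first, API only on a miss" idea, assuming SQLite as the store. The table layout, the function names, and the `fetch_remote` callback are all hypothetical, invented here for illustration; nothing in the thread fixes a schema or a language.

```python
import hashlib
import sqlite3


def open_library(path=":memory:"):
    """Open (or create) the local library database. The schema is an assumption."""
    db = sqlite3.connect(path)
    db.execute(
        "CREATE TABLE IF NOT EXISTS images (md5 TEXT PRIMARY KEY, path TEXT NOT NULL)"
    )
    db.execute(
        "CREATE TABLE IF NOT EXISTS tags ("
        "md5 TEXT NOT NULL, tag TEXT NOT NULL, PRIMARY KEY (md5, tag))"
    )
    return db


def md5_of(data: bytes) -> str:
    """Hex MD5 digest of the file contents, used as the lookup key."""
    return hashlib.md5(data).hexdigest()


def store(db, md5, path, tags):
    """Record a file and replace whatever tags were stored for it."""
    db.execute("INSERT OR REPLACE INTO images VALUES (?, ?)", (md5, path))
    db.execute("DELETE FROM tags WHERE md5 = ?", (md5,))
    db.executemany("INSERT INTO tags VALUES (?, ?)", [(md5, t) for t in set(tags)])
    db.commit()


def local_tags(db, md5):
    """Tags stored locally for one file, sorted alphabetically."""
    rows = db.execute(
        "SELECT tag FROM tags WHERE md5 = ? ORDER BY tag", (md5,)
    ).fetchall()
    return [row[0] for row in rows]


def get_tags(db, md5, path, fetch_remote):
    """Answer from the local database; call fetch_remote only for unknown files."""
    known = db.execute("SELECT 1 FROM images WHERE md5 = ?", (md5,)).fetchone()
    if not known:
        # fetch_remote is a hypothetical callback that would wrap the booru API.
        store(db, md5, path, fetch_remote(md5))
    return local_tags(db, md5)


def find_by_tag(db, tag):
    """Local search: paths of all files carrying the given tag."""
    rows = db.execute(
        "SELECT i.path FROM images i JOIN tags t ON t.md5 = i.md5 "
        "WHERE t.tag = ? ORDER BY i.path",
        (tag,),
    ).fetchall()
    return [row[0] for row in rows]
```

With this shape, a second `get_tags` call for the same hash never reaches the server, which is the load-saving behaviour described above.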

Quote from: ph4zr
/general /APIs
As I understand it, Gelbooru uses a custom implementation of the booru engine. I don't know if other boorus (kona, imouto, et cetera) use the same version or API as Danbooru, or even if Danbooru uses the "official" booru engine.

As for reverse lookup, I don't think other image search engines would be particularly relevant to this case.

Yeah, Gelbooru seems to play differently. I noticed that with the string for JSON queries in DanbooruDownloader:

Danbooru: /post/index.json?%_query% - this looks normal, like most of the others.
Gelbooru's is just empty... It just goes to index.html or something. That confused the Sheska script, and it confused me too. So I don't know how to pull tags off of Gelbooru yet. (Edit: Afterthought. I wonder if Gelbooru just forgoes JSON and uses XML only.)

Quote from: ph4zr
/boorus
The only ones I've considered are danbooru, gelbooru, konachan, imouto, and sankaku. Ideally lookups will be modularized to the point that any additions can be done either via a configuration file or by swapping the module that handles them, but if that doesn't happen, are there any others I should consider?

For the reasons above, I fully agree about making lookups modular. That's great for functionality, and for giving users more choice. I'm not sure about any other providers to include. We can already set up modules for the big ones you mentioned, and let feedback guide us from there. :)
« Last Edit: May 31, 2011, 12:43:20 AM by Jorin »

Offline ph4zr

I've been playing the hell out of Vesperia (still), so I haven't really taken much time to look at it further. As far as tag lookups go, as long as you rate limit them it shouldn't be too bad. I installed a plugin to delay loading of tabs just because I have a serious case of tabitis (1-200+, even without boorus), but even before that I had a couple dozen booru tabs loading simultaneously and didn't get locked out. Granted, that was fairly sporadic behavior, and I wouldn't recommend hammering the server on purpose.
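
For what "rate limit them" could look like in practice, here is a small sketch that enforces a minimum gap between lookups. The class name and the interval are assumptions; the thread does not say what any booru actually tolerates. The clock and sleep functions are injectable only so the behaviour can be checked without really waiting.

```python
import time


class RateLimiter:
    """Enforce a minimum number of seconds between consecutive requests."""

    def __init__(self, min_interval, clock=time.monotonic, sleep=time.sleep):
        self.min_interval = min_interval
        self._clock = clock
        self._sleep = sleep
        self._last = None  # time of the previous request, None before the first

    def wait(self):
        """Block until at least min_interval has passed since the last call."""
        now = self._clock()
        if self._last is not None:
            remaining = self.min_interval - (now - self._last)
            if remaining > 0:
                self._sleep(remaining)
                now = self._clock()
        self._last = now
```

Calling `limiter.wait()` before every tag lookup would spread a batch of queries out instead of firing them all at once.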

I figure a few more days and I'll either get seriously tired of Vesperia, or "complete" the game. God help me* if Minecraft gets patched again, though.
*Poorly chosen figure of speech. Could be Thor, Zeus, or Haruhi, though.

/Searches: I actually meant "will you be handling the searches along with/as part of the metadata aspect?"

/Gelbooru API: Yes. They only provide documentation for an XML based API. Pulling tags off of it isn't too bad, and I actually prefer XML queries anyway, since they could be directly cached client side for processing. Every major language has a package to handle XML parsing, so it's not really a big deal to get the information. In any case, their search functionality is down at the moment.
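
As a rough illustration of how little is needed to pull tags out of an XML response, the sketch below uses Python's standard `xml.etree.ElementTree`. The sample document is made up: the `posts`/`post` element names and the `md5` and `tags` attributes are assumptions about the response shape, not something taken from Gelbooru's documentation.

```python
import xml.etree.ElementTree as ET

# Hypothetical response body; the element and attribute names are assumptions.
SAMPLE = """<posts count="2" offset="0">
  <post id="1" md5="aaa" tags=" sky cloud " />
  <post id="2" md5="bbb" tags="1girl" />
</posts>"""


def tags_by_md5(xml_text):
    """Map each post's MD5 hash to its list of tags."""
    result = {}
    for post in ET.fromstring(xml_text).iter("post"):
        md5 = post.get("md5")
        if md5 is None:
            continue  # skip entries that cannot be matched to a local file
        # Tags are assumed to be space-separated inside a single attribute.
        result[md5] = post.get("tags", "").split()
    return result
```

Since the raw XML is plain text, the same string could also be written to disk as a client-side cache and parsed again later.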

/Modular: I'm generally in favor of modular designs. They let me swap things out or change things without touching program code. The best I could do with my ability, though, is configuration files describing how to get information; actual coding for plugin functionality is beyond my level of experience. Still, I suspect a simple XML file or similar detailing the fields and query format would more than suffice.
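
One way the "simple XML file detailing the fields and query format" might look, as a sketch only. The attribute names (`base`, `path`, `format`, `query_param`, `limit_param`), the site names, and the example hosts are all invented for illustration; only the `/post/index.json` path comes from this thread.

```python
import xml.etree.ElementTree as ET
from urllib.parse import urlencode

# Hypothetical site definitions. Adding a booru would mean adding a <site> line.
CONFIG = """
<sites>
  <site name="json_site" base="http://booru-a.example" path="/post/index.json"
        format="json" query_param="tags" limit_param="limit" />
  <site name="xml_site" base="http://booru-b.example" path="/post/index.xml"
        format="xml" query_param="tags" limit_param="limit" />
</sites>
"""


def load_sites(xml_text):
    """Read every <site> element into a plain dict keyed by site name."""
    sites = {}
    for node in ET.fromstring(xml_text).iter("site"):
        sites[node.get("name")] = dict(node.attrib)
    return sites


def build_url(site, tags, limit=20):
    """Build a search URL from a site definition and a list of tags."""
    params = {
        site["query_param"]: " ".join(tags),
        site["limit_param"]: str(limit),
    }
    return site["base"] + site["path"] + "?" + urlencode(params)
```

The program code only ever reads the dict, so differences between boorus stay in the configuration file; a site whose API differs more deeply would still need its own module.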

/Updating: IMO the easiest solution would probably be to store some program specific metadata in a database, including per image information on updates, et cetera. From there you could offer the user the option of automatically updating tags on older images, and customize things like how often, whether to stop after a set number of non-updates, et cetera. I'd probably set the default to "don't update", since it could get out of hand with larger libraries if you tried to update 10k+ images every five days... all at the same time, no less. The user can always request a tag update anyway.
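
The update options described above could be modelled roughly like this. It is a sketch under assumptions: the class and field names are hypothetical, and the five-day interval and the limit of three unchanged checks are placeholder defaults, not decisions made in this thread. Updating is off by default, as suggested.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import Optional


@dataclass
class UpdatePolicy:
    """User-configurable update settings (defaults are placeholders)."""
    enabled: bool = False                  # default: "don't update"
    interval: timedelta = timedelta(days=5)
    max_unchanged: int = 3                 # stop after this many non-updates


@dataclass
class ImageRecord:
    """Program-specific metadata kept per image."""
    last_checked: Optional[datetime] = None
    unchanged_checks: int = 0


def needs_update(record, policy, now):
    """Decide whether an image's tags should be refreshed automatically."""
    if not policy.enabled:
        return False
    if record.unchanged_checks >= policy.max_unchanged:
        return False  # tags looked stable for long enough; stop checking
    if record.last_checked is None:
        return True
    return now - record.last_checked >= policy.interval


def record_check(record, tags_changed, now):
    """Store the outcome of one update attempt."""
    record.last_checked = now
    record.unchanged_checks = 0 if tags_changed else record.unchanged_checks + 1
```

A manual "update tags now" request would simply bypass `needs_update`, so the user can always force a refresh.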

As you say, though, it's still way too early to go into too much depth on that, since the basic framework would need to be in place first. My plan is to just start small, and set up a temporary GUI to test small searches and finding tags for single images and small sets. Once that works, I'll probably move on to tracking local files and matching them to booru queries, with dummy calls to update tag information.

Offline Jorin

Quote from: ph4zr
/Searches: I actually meant "will you be handling the searches along with/as part of the metadata aspect?"

Oh, will I be able to handle searches as well as metadata? Sorry, I had misunderstood that! I think I'd only be able to do the local side of the search process. That would probably just amount to SQL queries or something along those lines. Accessing the API would be a lot harder for me at this point, though I think it's also possible to search simply by generating a URL query. If that's all it takes I could probably set something up.

Quote from: ph4zr
/Gelbooru API: Yes. They only provide documentation for an XML based API. Pulling tags off of it isn't too bad, and I actually prefer XML queries anyway.

That sounds good. Definitely want to have support for both JSON and XML. I was wanting to get Sheska going with Gelbooru, but I'm at a loss as to how to edit it to use XML rather than JSON. I have a lot of images downloaded from Gelbooru that I can't tag right now.

I like the way you've thought about starting this. Make it simple at first, then build on it step by step.

Offline bloody000

  • Member
  • Posts: 1401
All similar software has been, is being and will be abused by OCD individuals who just have to get all the pictures they will never look at. Image Post sites will suffer, just ask dovac for first hand experience.
All you have to do is study it out. Just study it out.

Offline Jorin

Quote from: bloody000
All similar software has been, is being and will be abused by OCD individuals who just have to get all the pictures they will never look at. Image Post sites will suffer, just ask dovac for first hand experience.

First hand experience on this would be great. I wonder whether the number of people who use this software irresponsibly would exceed the number of people who use it responsibly, especially since there's already other software out there that allows this kind of behaviour. This project would also emphasize managing a local collection, which means people theoretically would have an easier time looking at all those pictures they've downloaded.