
[proj] Booru standalone browser, (re)tagger, and cataloguer [STG: brainstorming]


ph4zr:
Project
Booru standalone browser, (re)tagger, and cataloguer
(spawning thread)
Recent changes, current subject(s)

Stage
Brainstorming (2011.05.15)

Abstract
The project, if a project gets started, aims to allow a user to:


* Tag or re-tag images they have downloaded from boorus
* Browse boorus and download images
* Maintain a library of images they have downloaded from boorus, and use this to prevent re-downloading of old images
(Hypothetical) Project members (updated: 2011.05.17)
~don't be silly, there isn't even a project yet.

Expressed interest:

* Jorin: gui, meta-data storage
* ph4zr: back-end
*** for anyone I haven't confirmed wants their username associated with the (hypothetical) project

Similar or (concept) related projects/scripts/et cetera (updated: 2011.05.20)
-.- d-list BBcode not supported on SMF, it seems.
I have not actually tried using most of these, although I will probably test them out in the near future.

* tagbooru
A command line based mass downloader for boorus. As of last I checked, tags are saved in file names, which makes them subject to name and path length limits. Could likely be easily modified to save tags to a file.
* Sheska (see thread)
A script by BBT site dev Xiong Chiamiov, which looks up image tags from a booru using an md5 hash.

* Danbooru client for linux systems
A GUI browser for some booru sites. I have not tried it personally. It appears to support downloading and querying tags, as well as tag blacklists.
* booru download script(s) posted in danbooru forums (search link)
There are a few scripts and revisions posted in this thread. Their functionality seems largely similar to tagbooru, in that both are command line scripts which save tags to file names. No direct link, as boorus typically host BL images.
Er, I meant "blacklisted", but "boy's love" is also accurate.
* Danbooru Downloader Firefox Addon (brothersoft dl link / post about gelbooru ratings fix)
What it says on the tin: it allows you to download images from boorus. Obviously you could always do this, but DD can auto-download on page load, has some whitelist and blacklist support, and saves information on downloaded images to its own SQLite DB.
/whether you want it to or not

* (updated: 2011.05.20) Danbooru Downloader (unrelated to above)
A GUI for downloading images from supported booru based sites.
I just found this on 2011.05.15
(updated: 2011.05.20)
I checked out Danbooru Downloader (the non-addon). Pretty nifty. A bit buggy—it crashes on me every once in a while, and it likes to hang during certain queries.

Anyway, it allows you to perform a tag search and modify a few parameters, as well as load thumbnails. It does provide links to the source and the *booru image link, but not the post page, though you can infer that from the post ID anyway. You can also choose which images to download, if you're willing to sift through all of the thumbnails, or you can just select all and add them to the download queue. There is also a seemingly nifty batch download feature, but I haven't tested it much due to the paging issue with gelbooru, noted below. The batch option does allow you to select multiple boorus, though.

For gelbooru the paging options don't work as coded, but you can get around that by using {searched-tag}&pid={page-number}, since the query options get joined together. It doesn't seem to support singling out things like copyright or character tags, although you can have it add tags to the file-name. You'll still have the max path limit, though, so you might not get the tags you need to sift through it all.
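
As a rough sketch of why that workaround works: if the tool simply pastes the tag field onto the end of its query string, then anything after an `&` in that field becomes its own parameter. The function names and the example.invalid base URL below are made up for illustration; only the `&pid=` trick comes from the description above.

```python
from urllib.parse import quote_plus


def build_tag_field(tags, page=None):
    """Return the text to type into the downloader's tag field.

    Assumption: the tool appends this field verbatim to its query string,
    so a trailing "&pid=N" is read by the server as a separate parameter.
    """
    field = "+".join(quote_plus(tag) for tag in tags)
    if page is not None:
        field += f"&pid={page}"
    return field


def build_url(base, tag_field):
    """Mimic a tool that joins its options naively (hypothetical behaviour)."""
    return f"{base}&tags={tag_field}"


if __name__ == "__main__":
    field = build_tag_field(["some_tag", "other_tag"], page=2)
    print(build_url("http://example.invalid/index.php?page=post", field))
```

Whether `pid` counts pages or posts is something to check against the site before relying on it.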

Since this is just a brainstorming stage, and a project isn't even guaranteed to get underway, most of the following is speculative or otherwise not finalized.

In Depth (Not currently applicable)
* Tagging for PNG and GIF images
Tagging, while possible, does not seem to be standardized for PNG images. I have no idea about GIF.
Possible solution(s): Store tag data in external files. Examples: flat files (think Picasa .ini style); application local DB (SQLite, flat file); /other?
* Modified images
Re-tagging an image will change its md5 hash, which is the easiest way to look up an image. Re-naming an image that has been modified may make it impossible to use any hash that may have been stored with the filename.
Possible solution(s): Defer to an image lookup site, such as IQDB; look up images based on user defined tags/features, after filtering off matched images; /other?

* Request rates/program hangs
If the program attempts to update tags on a massive library, depending on implementation the program may hang and be completely unusable. Also, it may make too many requests to the server, resulting in ... who knows? /never hit a limit manually.
Possible solution(s): Rate limiting server requests (play nice). Offload lookups and image pulls to a (single) separate thread, so the program is still usable.

* /other
Features brought up in the other thread:

* Use the library to load local versions of images rather than wasting time pulling them from the servers.
* Use the library to filter off posts of images already possessed. I.e., to see only new images while browsing.
* Whitelist and blacklist tags.
* Batch download images in tag sets
* /other
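
A minimal sketch of how the library could drive this filtering, assuming each post carries an md5 and a set of tags. `Post`, `filter_posts`, and the rule that a blacklist hit beats a whitelist hit are assumptions for illustration, not anything decided in the thread.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Post:
    md5: str
    tags: frozenset


def filter_posts(posts, library_md5s, whitelist=frozenset(), blacklist=frozenset()):
    """Yield only posts that are new to the library and pass the tag lists."""
    for post in posts:
        if post.md5 in library_md5s:
            continue  # already possessed, so hide it while browsing
        if post.tags & blacklist:
            continue  # any blacklisted tag rejects the post
        if whitelist and not (post.tags & whitelist):
            continue  # a non-empty whitelist requires at least one match
        yield post


if __name__ == "__main__":
    posts = [
        Post("aaa", frozenset({"hat", "solo"})),
        Post("bbb", frozenset({"hat", "spoilers"})),
        Post("ccc", frozenset({"scenery"})),
    ]
    new = filter_posts(posts, library_md5s={"ccc"}, blacklist=frozenset({"spoilers"}))
    print([p.md5 for p in new])
```

Keeping the library as a plain set of hashes makes the "already possessed" check cheap even for a large collection.
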
/shameless
Features I personally want, either as part of this project or as part of a side-project:

* Pull tags from Danbooru Downloader's SQLite DB.
* Preserve user added tags
* Offer to overwrite user added tags
* Offer to only add tags specified in a list (avoid mostly useless tags)
* Pull tags from Picasa .ini files /for existing libraries, and possibly other programs, if desired. Write out Picasa .ini files, as well as files for other programs, if desired. /fairly straightforward, but:
Concern: Is this even legal? Picasa is closed source, after all.
* Maintain a list of tags for monitoring, which will be used to automatically pull new images in the tag set.
* Offer option to -not- tag images directly, i.e., to keep image tags in external files, so as to preserve hashes.
* Offer option to use an externally specified library configuration file, for example, for privacy reasons. /e.g., explicit versus safe set
* /other
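
The tag-monitoring idea runs straight into the request-rate concern from the In Depth section, so here is a sketch of the "single separate thread plus rate limiting" suggestion. `RateLimitedWorker`, the one-request-per-interval policy, and the `fetch` callback are hypothetical; a real version would call whichever booru API ends up being used.

```python
import queue
import threading
import time

_STOP = object()  # sentinel telling the worker thread to exit


class RateLimitedWorker:
    """Run queued jobs on one background thread, at most one per interval."""

    def __init__(self, fetch, min_interval=1.0):
        self._fetch = fetch
        self._min_interval = min_interval
        self._jobs = queue.Queue()
        self.errors = []  # (job, exception) pairs, so one failure doesn't kill the thread
        self._thread = threading.Thread(target=self._run, daemon=True)
        self._thread.start()

    def submit(self, job):
        """Queue a job and return immediately, so the GUI stays usable."""
        self._jobs.put(job)

    def close(self):
        """Finish the jobs already queued, then stop the thread."""
        self._jobs.put(_STOP)
        self._thread.join()

    def _run(self):
        last = None
        while True:
            job = self._jobs.get()
            if job is _STOP:
                return
            if last is not None:
                wait = self._min_interval - (time.monotonic() - last)
                if wait > 0:
                    time.sleep(wait)  # play nice with the server
            last = time.monotonic()
            try:
                self._fetch(job)
            except Exception as exc:
                self.errors.append((job, exc))


if __name__ == "__main__":
    worker = RateLimitedWorker(lambda tag: print("checking", tag), min_interval=0.5)
    for tag in ["tag_one", "tag_two"]:
        worker.submit(tag)
    worker.close()
```

A fixed delay between requests is the simplest possible policy; what interval counts as polite would have to come from each site's rules.
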
Some points to discuss
Language of choice
While a multiple language option might be viable, keeping it to one language (at a time) would probably simplify things. If portability or performance is an issue, it could likely be ported once the program logic and GUI design were there.

Concerns:

* Tagging
Ideally, the language should offer a package for tagging support. The only one I know of offhand is Perl, which has exiftool. System calls could be made, but this is... not entirely desirable.

* Text
If external files or a DB type implementation are used for saving tags, something strong in text-processing is probably ideal. /Also, if the XML style API is used, it will be important here, as well.

* Image
As a browser of boorus, it has to be able to display images efficiently. It should probably have support for down-scaling images favorably. /Nothing I've dealt with.

* GUI
Something that looks reasonably nice/smooth would be good, IMO. Ease of design would be nice, too, unless you'd be interested in doing it yourself, and are willing to deal with ... GUI.
/ph4zr hates GUI stuff

* Portability
While ports aren't impossible, it would be easier to design with cross-platform support in mind.

* SSE: size/speed/efficiency
What it says. It shouldn't be bloated; it should be reasonably fast; and it should be fairly easy to code the desired functionality. /In this day and age, I doubt speed will be much of a problem.

Development environment
Where to host the project source during development

Suggestions:

* wiki
* github
* svn
* /similar
Design approach

Thoughts:

* Top-down
Plan user interface and required features, draft an API and then implement it. /thoughts?
~IMO has the advantage of maintaining order, at the risk of assuming functionality that might be difficult to implement.

* Bottom-up
Implement basic methods for accomplishing small steps in a larger design, and gradually build up to a GUI. /thoughts?
~IMO has the advantage of being easy to test as you go, at the risk of top-level actions being more limited in power, and development being slightly more disorganized without a set framework.

* Mixed
Work from both directions. /thoughts?
~IMO has the advantage that issues with the top-level functionality and lower-level implementation can be addressed as both sides realize they aren't meeting the other's needs, or there are issues with interaction. Essentially I see this as the back-end being designed with certain goals in mind, an API being exposed to top-level implementation, and concessions being made by either side as necessary. Risks: Uh... I'll think of something. Ability to work together and willingness to make concessions is a possible one, though. /incredible bias

* /other
I have no idea.

Both the top-down and bottom-up approaches also have the issue of forcing individuals to work on less preferred aspects. The mixed approach would allow those with a preference for GUI to work on that, and those with a preference for cold, hard logic to work on the lower end. /bias. Obviously if it were a team project it would be more of a guideline than a rule, anyway.


There was probably some other important aspect that I wanted to bring up, but I can't for the life of me remember what it was. /template follows
{Some Other Important Aspect}

Concerns:

* Cannot remember
(stuff)

/other stuff

A final note
Even if this project got started, I wouldn't necessarily be a/the project lead. Someone with more experience in coding, design, or leadership would likely be better suited.
/Plus I'm a bit of a control freak in those kinds of situations.

/I'm running out of steam here, so I'll leave it at that for now. 2011.05.15

MODS: /reserving next post "just in case". Hopefully that's acceptable.

ph4zr:
/reserved as per final line

Updates
Similar [...] (2011.05.20)
project members (2011.05.17)

Current

--- Quote from: ph4zr on May 17, 2011, 04:45:55 AM ---/general /APIs
For Danbooru, Gelbooru, and iqdb.org, I have specifications/examples for the APIs. I haven't looked at the other boorus yet. As I understand it, Gelbooru uses a custom implementation of the booru engine. I don't know if other boorus (kona, imouto, et cetera) use the same version or API as Danbooru, or even if Danbooru uses the "official" booru engine.

As for reverse lookup, I don't think other image search engines would be particularly relevant to this case.

/boorus
The only ones I've considered are danbooru, gelbooru, konachan, imouto, and sankaku. Ideally lookups will be modularized to the point that any additions can either be done via a configuration file or a swapping of the module handling them, but if that isn't possible or doesn't happen, are there any others I should consider?

--- End quote ---

tomoya-kun:
Sounds cool, but I am without programming skill.

Jorin:
I'd be up for GUI development and metadata storage!

On that note, it looks like we're heading towards an external file. That will easily fix both issues of format compatibility (gif, png) and disrupting the md5 hash as a result of writing new tags to the file.

ph4zr:

--- Quote from: Jorin on May 16, 2011, 08:56:42 PM ---I'd be up for GUI development and metadata storage!

On that note, it looks like we're heading towards an external file. That will easily fix both issues of format compatibility (gif, png) and disrupting the md5 hash as a result of writing new tags to the file.
--- End quote ---
/noted+updated
I won't really be doing anything this week, though. Just finished with finals week and I feel like a break. Plus I need to get the summer routine sorted.
It probably seems like I'm always on break, but yeah.

/metadata
You wouldn't have to use an external file for JPEGs, at least. You definitely can store metadata in PNGs, and I -think- exiftool supports it, but there is no standard for storing tags. So, yes, an external file would be the easier solution there.
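
A minimal sketch of the external-storage option, assuming a local SQLite file keyed by the image's md5. The schema, `TagStore`, and the `user`/`booru` source labels are invented for illustration; the point is that the image bytes (and so the hash) are never touched, and refreshing booru tags leaves user-added tags alone.

```python
import hashlib
import sqlite3


def file_md5(path, chunk_size=1 << 20):
    """Hash a file in chunks so large images don't need to fit in memory."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


class TagStore:
    """Tags live in the database, never inside the image file."""

    def __init__(self, db_path=":memory:"):
        self.db = sqlite3.connect(db_path)
        self.db.execute(
            "CREATE TABLE IF NOT EXISTS tags ("
            " md5 TEXT NOT NULL, tag TEXT NOT NULL, source TEXT NOT NULL,"
            " PRIMARY KEY (md5, tag))"
        )

    def add_user_tag(self, md5, tag):
        with self.db:
            self.db.execute(
                "INSERT OR REPLACE INTO tags VALUES (?, ?, 'user')", (md5, tag)
            )

    def set_booru_tags(self, md5, tags):
        """Replace tags pulled from a booru; rows marked 'user' are kept."""
        with self.db:
            self.db.execute(
                "DELETE FROM tags WHERE md5 = ? AND source = 'booru'", (md5,)
            )
            self.db.executemany(
                "INSERT OR IGNORE INTO tags VALUES (?, ?, 'booru')",
                [(md5, tag) for tag in tags],
            )

    def tags_for(self, md5):
        rows = self.db.execute(
            "SELECT tag FROM tags WHERE md5 = ? ORDER BY tag", (md5,)
        )
        return [row[0] for row in rows]

    def search(self, tag):
        """Local search: md5s of every image carrying the given tag."""
        rows = self.db.execute(
            "SELECT md5 FROM tags WHERE tag = ? ORDER BY md5", (tag,)
        )
        return [row[0] for row in rows]
```

Flat per-directory files would work along the same lines; SQLite just makes the local search side a one-line query.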

As for metadata storage, by this you mean storing and querying tag information, yes? So API queries resulting in tag writes would defer the writes to you, but still handle the queries themselves? What about local searches?

/Probably too many questions considering the "hypothetical" stage, but I might as well bring them up while we're on topic.

You don't really have to use spoiler tags to talk about stuff like that. I did it just to separate sections in an easy to navigate manner, since I don't have a TOC search system in place.

/general /APIs
For Danbooru, Gelbooru, and iqdb.org, I have specifications/examples for the APIs. I haven't looked at the other boorus yet. As I understand it, Gelbooru uses a custom implementation of the booru engine. I don't know if other boorus (kona, imouto, et cetera) use the same version or API as Danbooru, or even if Danbooru uses the "official" booru engine.

As for reverse lookup, I don't think other image search engines would be particularly relevant to this case.

/boorus
The only ones I've considered are danbooru, gelbooru, konachan, imouto, and sankaku. Ideally lookups will be modularized to the point that any additions can either be done via a configuration file or a swapping of the module handling them, but if that isn't possible or doesn't happen, are there any others I should consider?
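
One way the config-or-module idea might look, sketched with a handler registry. The handler names, the `SITES` table, and `example_booru` are placeholders, and the handlers return dummy data rather than calling any real API; which boorus actually share an API is still an open question.

```python
from typing import Callable, Dict

# Hypothetical site table; in practice this could be read from a config file.
# Adding a site that speaks an existing API style is then a one-line change.
SITES = {
    "danbooru": {"handler": "danbooru_style"},
    "gelbooru": {"handler": "gelbooru_style"},
    "example_booru": {"handler": "danbooru_style"},  # made-up addition
}

HANDLERS: Dict[str, Callable[[str, str], dict]] = {}


def handler(name):
    """Register a lookup function under an API-style name."""
    def register(func):
        HANDLERS[name] = func
        return func
    return register


@handler("danbooru_style")
def lookup_danbooru_style(site, md5):
    # Placeholder: a real handler would query the site's API here.
    return {"site": site, "md5": md5, "via": "danbooru_style"}


@handler("gelbooru_style")
def lookup_gelbooru_style(site, md5):
    # Placeholder for a site with a custom implementation; swap this module out.
    return {"site": site, "md5": md5, "via": "gelbooru_style"}


def lookup(site, md5):
    """Dispatch a tag lookup to whichever handler the site is configured for."""
    return HANDLERS[SITES[site]["handler"]](site, md5)


if __name__ == "__main__":
    print(lookup("example_booru", "abc123"))
```

Sites with genuinely different APIs get their own handler; everything else is just another row in the table.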
