Hydrus Manual

Hydrus 101, the basics of being an autistic waifu hoarder.

What is Hydrus

 

hydrus network - client and server

The hydrus network client is a desktop application written for Anonymous and other internet enthusiasts with large media collections. It organises your files into an internal database and browses them with tags instead of folders, a little like a booru on your desktop. Tags and files can be anonymously shared through custom servers that any user may run. Everything is free, nothing phones home, and the source code is included with the release. It is developed mostly for Windows, but builds for Linux and macOS are available (perhaps with some limitations, depending on your situation).

The software is constantly being improved. I try to put out a new release every Wednesday by 8pm Eastern.

Currently importable filetypes are:

On the Windows and Linux builds, an MPV window is embedded to play video and audio smoothly. For files like pdf, which cannot currently be viewed in the client, it is easy to launch any file with your OS's default program.

The client can download files and parse tags from a number of websites, including by default:

And can be extended to download from more locations using easily shareable user-made downloaders. It can also be set to 'subscribe' to any gallery search, repeating it every few days to keep up with new results.

The program's emphasis is on your freedom. There is no DRM, no spying, no censorship. The program never phones home.

If you would like to try it, I strongly recommend you check out the help and getting started guide. A copy is included with the release as well.

So:

Starting out


Introduction

on being anonymous

Nearly all sites use the same pseudonymous username/password system, and nearly all of them have the same drama, sockpuppets, and egotistical mods. Censorship is routine. That works for many people, but not for me.

I enjoy being anonymous online. When you aren't afraid of repercussions, you can be as truthful as you want. You can have conversations that can happen nowhere else. It's fun!

I've been on the imageboards for a long time, saving everything I like to my hard drive. After a while, the whole collection was just too large to manage on my own.

the hydrus network

So! I'm developing a program that helps people organise their files together anonymously. I want to help you do what you want with your stuff, and that's it. You can share some tags and files with other people if you want to, but you don't have to connect to anything if you don't. The default is complete privacy, no sharing, and every upload requires a conscious action on your part. I don't plan to ever record metrics on users, nor serve ads, nor charge for my software. The software never phones home.

This does a lot more than a normal image viewer. If you are totally new to the idea of personal media collections and tagging, I suggest you start slow, walk through the getting started guides, and experiment doing different things. If you aren't sure on what a button does, try clicking it! You'll be importing thousands of files and applying tens of thousands of tags in no time.

The client is chiefly a file database. It stores your files inside its own folders, managing them far better than an explorer window or some online gallery. Here's a screenshot of one of my test installs with a search showing all files:

As well as the client, there is a server that anyone can run to store files or tags for sharing between many users. The mechanics of running a server are usually confusing to new users, so wait a little while before you explore this. Some users run a public tag repository with hundreds of millions of tags that you can access and contribute to if you wish.

I have many plans to expand the client and the network.

statement of principles

None of the above are currently true, but I would love to live in a world where they were. My software is an attempt to move us a little closer.

I try to side with the person over the authority, the distributed over the centralised. I still use gmail and youtube just like pretty much everyone, but I would rather be using different systems, especially in ten years. No one seemed to be making what I wanted for file management, so I decided to do it myself, and here we are.

If, after a few months, you find you enjoy the software and would like to further support it, I have set up a simple no-reward patreon, which you can read more about here.

license

These programs are free software. Everything I, hydrus dev, have made is under the Do What The Fuck You Want To Public License, Version 3, as published by Kris Craig. See https://github.com/sirkris/WTFPL/blob/master/WTFPL.md for more details.

Do what the fuck you want to with my software, and if shit breaks, DEAL WITH IT.


Getting started: Installing

If any of this is confusing, a simpler guide is here, and some video guides are here!

downloading

You can get the latest release at my github releases page.

I try to release a new version every Wednesday by 8pm EST and write an accompanying post on my tumblr and a Hydrus Network General thread on 8chan.moe /t/.

installing

The hydrus releases are 64-bit only. If you are a python expert, there is the slimmest chance you'll be able to get it running from source on a 32-bit machine, but it would be easier just to find a newer computer to run it on.

for Windows:

for macOS:

for Linux:

from source:

Hydrus stores all its data—options, files, subscriptions, everything—entirely inside its own directory. You can extract it to a usb stick, move it from one place to another, have multiple installs for multiple purposes, wrap it all up inside a truecrypt volume, whatever you like. The .exe installer writes some unavoidable uninstall registry stuff to Windows, but the 'installed' client itself will run fine if you manually move it.

However, for macOS users: the Hydrus App is non-portable and puts your database in ~/Library/Hydrus (i.e. /Users/[You]/Library/Hydrus). You can update simply by replacing the old App with the new, but if you wish to backup, you should be looking at ~/Library/Hydrus, not the App itself.

updating

Hydrus is imageboard-tier software, wild and fun but unprofessional. It is written by one Anon spinning a lot of plates. Mistakes happen from time to time, usually in the update process. There are also no training wheels to stop you from accidentally overwriting your whole db if you screw around. Be careful when updating. Make backups beforehand!

Hydrus does not auto-update. It will stay the same version unless you download and install a new one.

Although I put out a new version every week, you can update far less often if you want. The client keeps to itself, so if it does exactly what you want and a new version does nothing you care about, you can just leave it. Other users enjoy updating every week, simply because it makes for a nice schedule. Others like to stay a week or two behind what is current, just in case I mess up and cause a temporary bug in something they like.

A user has written a longer and more formal guide to updating, and information on the 334->335 step here.

The update process:

 

Unless the update specifically disables or reconfigures something, all your files and tags and settings will be remembered after the update.

Releases typically need to update your database to their version. New releases can retroactively perform older database updates, so if the new version is v255 but your database is on v250, you generally only need to get the v255 release, and it'll do all the intervening v250->v251, v251->v252, etc... update steps in order as soon as you boot it. If you need to update from a release more than, say, ten versions older than current, see below. You might also like to skim the release posts or changelog to see what is new.
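The version-by-version catch-up can be pictured as a simple loop over migration steps. A minimal sketch of the idea (the function and dict names here are hypothetical, not hydrus's real internals):

```python
def update_database(db_version, target_version, migrations):
    """Apply every intervening update step in order.
    'migrations' maps a version number to a function that upgrades the
    database from that version to the next one."""
    while db_version < target_version:
        step = migrations[db_version]   # e.g. the v250->v251 step
        step()
        db_version += 1
    return db_version
```

A new release simply ships the whole chain of steps, so booting a v255 build over a v250 database runs five of them back to back.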

Clients and servers of different versions can usually connect to one another, but from time to time, I make a change to the network protocol, and you will get polite error messages if you try to connect to a newer server with an older client or vice versa. There is still no need to update the client--it'll still do local stuff like searching for files completely fine. Read my release posts and judge for yourself what you want to do.

clean installs

This is only relevant if you update and cannot boot at all.

Very rarely, hydrus needs a clean install. This can be due to a special update like when we moved from 32-bit to 64-bit or needing to otherwise 'reset' a custom install situation. The problem is usually that a library file has been renamed in a new version and hydrus has trouble figuring out whether to use the older one (from a previous version) or the newer.

In any case, if you cannot boot hydrus and it either fails silently or you get a crash log or system-level error popup complaining in a technical way about not being able to load a dll/pyd/so file, you may need a clean install, which essentially means clearing any old files out and reinstalling.

However, you need to be careful not to delete your database! It sounds silly, but at least one user has made a mistake here. The process is simple, do not deviate:

After that, you'll have a 'clean' version of hydrus that only has the latest version's dlls. If hydrus still will not boot, I recommend you roll back to your last working backup and let me, hydrus dev, know what your error is.

big updates

If you have not updated in some time--say twenty versions or more--doing it all in one jump, like v250->v290, is likely not going to work. I am doing a lot of unusual stuff with hydrus, change my code at a fast pace, and do not have a ton of testing in place. Hydrus update code often falls to bitrot, and so some underlying truth I assumed for the v255->v256 code may not still apply six months later. If you try to update more than 50 versions at once (i.e. trying to perform more than a year of updates in one go), the client will give you a polite error rather than even try.

As a result, if you get a failure on trying to do a big update, try cutting the distance in half--try v270 first, and then if that works, try v270->v290. If it doesn't, try v260, and so on.
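The 'cut the distance in half' strategy is just recursive bisection. A sketch of the procedure, where try_update is a hypothetical stand-in for 'install that version over a copy of the database and see if it boots':

```python
def plan_update_steps(current, target, try_update):
    """Halve the jump until every individual hop succeeds.
    Returns the list of (from, to) hops to perform in order."""
    if try_update(current, target):
        return [(current, target)]
    if target - current <= 1:
        raise RuntimeError(f'even the single step {current}->{target} fails')
    mid = (current + target) // 2
    return (plan_update_steps(current, mid, try_update)
            + plan_update_steps(mid, target, try_update))
```

In practice you perform each hop by hand, restoring from backup whenever one fails, but the search pattern is the same.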

If you narrow the gap down to just one version and still get an error, please let me know. I am very interested in these sorts of problems and will be happy to help figure out a fix with you (and everyone else who might be affected).

backing up

Maintaining a regular backup is important for hydrus. The program stores a lot of complicated data that you will put hours and hours of work into, and if you only have one copy and your hard drive breaks, you could lose everything. This has happened before, and it sucks to go through. Don't let it be you.

If you do not already have a backup routine for your files, this is a great time to start. I now run a backup every week of all my data so that if my computer blows up or anything else awful happens, I'll at worst have lost a few days' work. Before I did this, I once lost an entire drive with tens of thousands of files, and it felt awful. If you are new to saving a lot of media, I hope you can avoid what I felt. ;_;

I use ToDoList to remind me of my jobs for the day, including backup tasks, and FreeFileSync to actually mirror over to an external usb drive. I recommend both highly (and for ToDoList, I recommend hiding the complicated columns, stripping it down to a simple interface). It isn't a huge expense to get a couple-TB usb drive either--it is absolutely worth it for the peace of mind.

By default, hydrus stores all your user data in one location, so backing up is simple:

Do not put your live database in a folder that continuously syncs to a cloud backup. Many of these services will interfere with a running client and can cause database corruption. If you still want to use a system like this, either turn the sync off while the client is running, or use the above backup workflows to safely back up your client to a separate folder that syncs to the cloud.
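If you would rather script a mirror yourself than use a GUI tool, the core of the job is just 'make the backup folder identical to the db folder'. A naive sketch (only ever run something like this while the client is closed; the paths and function name are examples):

```python
import pathlib
import shutil

def mirror_backup(db_dir, backup_dir):
    """Crude mirror: wipe the old backup and copy the whole db folder.
    Real sync tools like FreeFileSync only copy the differences, which
    is much faster for a big client."""
    backup = pathlib.Path(backup_dir)
    if backup.exists():
        shutil.rmtree(backup)
    shutil.copytree(db_dir, backup)
```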

I recommend you always back up before you update, just in case there is a problem with my code that breaks your database. If that happens, please contact me, describing the problem, and revert to the functioning older version. I'll get on any problems like that immediately.


Getting started: Files

If any of this is confusing, a simpler guide is here, and some video guides are here!

a warning

Hydrus can be powerful, and you control everything. By default, you are not connected to any servers and absolutely nothing is shared with other users--and you can't accidentally one-click your way to exposing your whole collection--but if you tag private files with real names and click to upload that data to a tag repository that other people have access to, the program won't try to stop you. If you want to do private sexy slideshows of your shy wife, that's great, but think twice before you upload files or tags anywhere, particularly as you learn. It is impossible to contain leaks of private information.

There are no limits and few brakes on your behaviour. It is possible to import millions of files. For many new users, their first mistake is downloading too much too fast in overexcitement and becoming overwhelmed. Take things slow and figure out good processing workflows that work for your schedule before you start adding 500 subscriptions.

the problem

If you have ever seen something like this--

--then you already know the problem: using a filesystem to manage a lot of images sucks.

Finding the right picture quickly can be difficult. Finding everything by a particular artist at a particular resolution is unthinkable. Integrating new files into the whole nested-folder mess is a further pain, and most operating systems bug out when displaying 10,000+ thumbnails.

so, what does the hydrus client do?

Let's first focus on importing files.

When you first boot the client, you will see a blank page. There are no files in the database and so there is nothing to search. To get started, I suggest you simply drag-and-drop a folder with a hundred or so images onto the main window. A dialog will appear confirming what you want to import. Ok that, and a new page will open. Thumbnails will stream in as the software processes each file.

The files are being imported into the client's database. The client discards their filenames.

Notice your original folder and its files are untouched. You can move the originals somewhere else, delete them, and the client will still return searches fine. In the same way, you can delete from the client, and the original files will remain unchanged--import is a copy, not a move, operation. The client performs all its operations on its internal database, which holds copies of the files it imports. If you find yourself enjoying using the client and decide to completely switch over, you can delete the original files you import without worry. You can always export them back again later.
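Under the hood, an import is essentially 'hash the file and copy it into the internal store under that hash'. Hydrus really does key files by their SHA-256 hash; the directory layout below is simplified and the function name is made up:

```python
import hashlib
import pathlib
import shutil

def import_file(src_path, store_dir):
    """Copy-style import: the original file is never touched, and a
    file that is already in the store is recognised by its hash and
    skipped rather than copied again."""
    data = pathlib.Path(src_path).read_bytes()
    digest = hashlib.sha256(data).hexdigest()
    dest = pathlib.Path(store_dir) / digest
    if not dest.exists():
        shutil.copyfile(src_path, dest)
    return digest
```

This is also why the client discards filenames: the hash is the identity, so the same file imported from two different folders only ever exists once in the database.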

 

Now:

The client can currently import the following mimetypes:

Support for some of the more complicated filetypes is imperfect. For the Windows and Linux built releases, hydrus now embeds an MPV player for video, audio and gifs, which provides smooth playback with audio, but some other environments do not support MPV and so will fall back where possible to the native hydrus software renderer, which does not support audio. When something does not render how you want, right-clicking on its thumbnail presents the option 'open externally', which will open the file in the appropriate default program (e.g. ACDSee, VLC).

The client can also download files from several websites, including 4chan and other imageboards, many boorus, and gallery sites like deviant art and hentai foundry. You will learn more about this later.

inbox and archiving

The client sends newly imported files to an inbox, just like your email. Inbox acts like a tag, matched by 'system:inbox'. A small envelope icon is drawn in the top corner of all inbox files:

If you are sure you want to keep a file long-term, you should archive it, which will remove it from the inbox. You can archive from your selected thumbnails' right-click menu, or by pressing F7. If you make a mistake, you can spam Ctrl-Z for undo or hit Shift-F7 on any set of files to explicitly return them to the inbox.

Anything you do not want to keep should be deleted by selecting from the right-click menu or by hitting the delete key. Deleted files are sent to the trash. They will get a little trash icon:

A trashed file will not appear in subsequent normal searches, although you can search the trash specifically by clicking the 'my files' button on the autocomplete dropdown and changing the file domain to 'trash'. Undeleting a file (shift+delete) will return it to 'my files' as if nothing had happened. Files that remain in the trash will be permanently deleted, usually after a few days. You can change the permanent deletion behaviour in the client's options.
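The timed purge can be thought of as a periodic sweep over trash timestamps. A toy sketch with a hypothetical three-day grace period (the real limit is whatever you set in the options):

```python
def files_due_for_deletion(trashed_at, now, max_age=3 * 86400):
    """'trashed_at' maps a file hash to the time it was trashed, in
    seconds. Returns the hashes whose grace period has expired and
    which are therefore due for permanent deletion."""
    return sorted(h for h, t in trashed_at.items() if now - t > max_age)
```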

A quick way of processing new files is--

filtering

Let's say you just downloaded a good thread, or perhaps you just imported an old folder of miscellany. You now have a whole bunch of files in your inbox--some good, some awful. You probably want to quickly go through them, saying yes, yes, yes, no, yes, no, no, yes, where yes means 'keep and archive' and no means 'delete this trash'. Filtering is the solution.

Select some thumbnails, and either choose filter->archive/delete from the right-click menu or hit F12. You will see them in a special version of the media viewer, with the following controls:

Your choices will not be committed until you finish filtering.

This saves time.

lastly

The hydrus client's workflows are not designed for half-finished files that you are still working on. Think of it as a giant archive for everything excellent you have decided to store away. It lets you find and remember these things quickly.

In general, hydrus is good for individual files like you commonly find on imageboards or boorus. Although advanced users can cobble together some page-tag-based solutions, it is not yet great for multi-file media like comics and definitely not as a typical playlist-based music player.

If you are looking for a comic manager to supplement hydrus, check out this user-made guide to other archiving software here!

And although the client can hold millions of files, it starts to creak and chug when displaying or otherwise tracking more than about 40,000 or so in a single gui window. As you learn to use it, please try not to let your download queues or general search pages regularly sit at more than 40 or 50k total items, or you'll start to slow other things down. Another common mistake is to leave one large 'system:everything' or 'system:inbox' page open with 70k+ files. For these sorts of 'ongoing processing' pages, try adding a 'system:limit=256' to keep them snappy. One user mentioned he had regular gui hangs of thirty seconds or so, and when we looked into it, it turned out his handful of download pages had three million files queued up! Just try and take things slow until you figure out what your computer's limits are.


Getting started: Tags

If any of this is confusing, a simpler guide is here, and some video guides are here!

how do we find files?

So, you have stored some media in your database. Everything is hashed and cached. You can search by inbox and resolution and size and so on, but if you really want to find what you are looking for, you will have to use tags.

FAQ: what is a tag?

Your client starts with one local tags service, called 'my tags', which keeps all of its file->tag mappings in your client's database where only you can see them. It is a good place to practise. So, select a file and press F3:

The autocomplete dropdown in the manage tags dialog works much like the one in a normal search page--you type part of a tag, and matching results will appear below. You select the tag you want with the arrow keys and hit enter. Since your 'my tags' service doesn't have any tags in it yet, you won't get any results here except the exact match of what you typed. If you want to remove a tag, enter the exact same thing again or double-click it in the box above.
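The matching itself is conceptually simple. A naive sketch (hydrus's real autocomplete is fancier, with namespace handling, wildcards and sibling lookups):

```python
def autocomplete(fragment, known_tags):
    """Return every known tag that contains the typed fragment,
    case-insensitively, in sorted order."""
    fragment = fragment.lower()
    return sorted(t for t in known_tags if fragment in t.lower())
```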

Prefixing a tag with a category and a colon will create a namespaced tag. This helps inform the software and other users about what the tag is. Examples of namespaced tags are:

The client is set up to draw common namespaces in different colours, just like boorus do. You can change these colours in the options.
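A namespaced tag is just the text before and after the first colon. A minimal sketch of the split:

```python
def split_tag(tag):
    """'character:samus aran' -> ('character', 'samus aran');
    an unnamespaced tag gets an empty namespace. Only the first
    colon counts, so subtags may themselves contain colons."""
    if ':' in tag:
        namespace, subtag = tag.split(':', 1)
        return namespace, subtag
    return '', tag
```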

Once you are happy with your tags, hit 'apply' or just press enter on the text box if it is empty.

The tags are now saved to your database. Searching for any of them will return this file and anything else so tagged:

If you add more tags or system predicates to a search, you will limit the results to those files that match every single one:

You can also exclude a tag by prefixing it with a hyphen (e.g. '-heresy').

OR searching

Searches find files that match every search 'predicate' in the list (it is an AND search), which makes it difficult to search for files that include one OR another tag. More recently, simple OR search support was added. All you have to do is hold down Shift when you enter/double-click a tag in the autocomplete entry area. Instead of sending the tag up to the active search list up top, it will instead start an under-construction 'OR chain' in the tag results below:

You can keep searching for and entering new tags. Holding down Shift on new tags will extend the OR chain, and entering them as normal will 'cap' the chain and send it to the complete and active search predicates above.

Any file that has one or more of those OR sub-tags will match.

If you enter an OR tag incorrectly, you can either cancel or 'rewind' the under-construction search predicate with these new buttons that will appear:

You can also cancel an under-construction OR by hitting Esc on an empty input. You can add any sort of search term to an OR search predicate, including system predicates. Some unusual sub-predicates (typically a '-tag', or a very broad system predicate) can run very slowly, but they will run much faster if you include non-OR search predicates in the search:

This search will return all files that have the tag 'fanfic' and one or more of 'medium:text', a positive value for the like/dislike rating 'read later', or PDF mime.
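In boolean terms, the whole search is an AND of predicates, where an OR chain is a single predicate with several acceptable alternatives. A sketch of just the tag-matching logic (system predicates, ratings and '-tag' exclusions are ignored here):

```python
def file_matches(file_tags, search):
    """'search' is a list of predicates; each predicate is a single
    tag or a set of OR'd alternatives. Every predicate must be
    satisfied for the file to match."""
    file_tags = set(file_tags)
    for predicate in search:
        alternatives = predicate if isinstance(predicate, set) else {predicate}
        if not alternatives & file_tags:
            return False
    return True
```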

tag repositories

It can take a long time to tag even small numbers of files well, so I created tag repositories so people can share the work.

Tag repos store many file->tag relationships. Anyone who has an access key to the repository can sync with it and hence download all these relationships. If any of their own files match up, they will get those tags. Access keys will also usually have permission to upload new tags and ask for incorrect existing ones to be deleted.

Anyone can run a tag repository, but it is a bit complicated for new users. I ran a public tag repository for a long time, and now this large central store is run by users. It has hundreds of millions of tags and is free to access and contribute to.

To connect with it, please check here.

If you add it, your client will download updates from the repository over time and, usually when it is idle or shutting down, 'process' them into its database until it is fully synchronised. The processing step is CPU and HDD heavy, and you can customise when it happens in file->options->maintenance and processing. As the repository synchronises, you should see some new tags appear, particularly on famous files that lots of people have.

Tags are rich, cpu-intensive metadata. The Public Tag Repository has been growing since 2011 and now holds hundreds of millions of mappings, all of which your client will eventually download and index. As of 2020-03, that means about 4GB of bandwidth and file storage, and your database itself will grow by 25GB! It will take hours of total processing time to fully synchronise. Because of mechanical drive latency, HDDs are often too slow to process hundreds of millions of tags in reasonable time, so syncing with large repositories is only recommended if your hydrus db is on an SSD. Even then, it is best to work on this in small pieces in the background, either during idle time or shutdown time, so unless you are an advanced user, just leave it to download and process on its own--it usually takes a couple of weeks to quietly catch up.

You can watch more detailed synchronisation progress in the services->review services window.

Your new service should now be listed on the left of the manage tags dialog. Adding tags to a repository works very similarly to the 'my tags' service except hitting 'apply' will not immediately confirm your changes--it will put them in a queue to be uploaded. These 'pending' tags will be counted with a plus '+' or minus '-' sign:

Notice that a 'pending' menu has appeared on the main window. This lets you start the upload when you are ready and happy with everything that you have queued.

When you upload your pending tags, they will commit and look to you like any other tag. The tag repository will anonymously bundle them into the next update, which everyone else will download in a day or so. They will see your tags just like you saw theirs.

If you attempt to remove a tag that has been uploaded, you may be prompted to give a reason, creating a petition that a janitor for the repository will review.

I recommend you not spam tags to the public tag repo until you get a rough feel for the guidelines and my original tag schema thoughts, or just lurk until you get the idea. It roughly follows what you will see on a typical booru. The general rule is to only add factual tags--no subjective opinions.

You can connect to more than one tag repository if you like. When you are in the manage tags dialog, pressing the up or down arrow keys on an empty input switches between your services.

FAQ: why can my friend not see what I just uploaded?


Getting started: Downloading

downloading

The hydrus client has a sophisticated and completely user-customisable download system. It can pull from any booru or regular gallery site or imageboard, and also from some special examples like twitter and tumblr. A fresh install will by default have support for the bigger sites, but it is possible, with some work, for any user to create a new shareable downloader for a new site.

The downloader is highly parallelisable, and while the default bandwidth rules should stop you from running too hot and downloading so much at once that you annoy the servers you are downloading from, there are no brakes in the program on what you can get.
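A bandwidth rule is essentially 'no more than N bytes (or requests) per time window'. A toy sketch of the bookkeeping involved (the class name and numbers are made up, not hydrus's real defaults):

```python
import collections

class BandwidthRule:
    """Allow a download to start only while usage recorded in the
    last 'period' seconds is still under 'max_bytes'."""
    def __init__(self, max_bytes, period):
        self.max_bytes = max_bytes
        self.period = period
        self.usage = collections.deque()   # (timestamp, bytes) records

    def can_start(self, now):
        while self.usage and self.usage[0][0] <= now - self.period:
            self.usage.popleft()           # forget expired records
        return sum(b for _, b in self.usage) < self.max_bytes

    def report(self, now, num_bytes):
        self.usage.append((now, num_bytes))
```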

It is very important that you take this slow. Many users get overexcited with their new ability to download 500,000 files and then do so, only discovering later that 98% of what they got was junk that they now have to wade through. Figure out what workflows work for you, how fast you process files, what content you actually want, how much bandwidth and hard drive space you have, and prioritise and throttle your incoming downloads to match. If you can realistically only archive/delete filter 50 files a day, there is little benefit to downloading 500 new files a day. START SLOW.

It also takes a decent whack of CPU to import a file. You'll usually never notice this with just one hard drive import going, but if you have twenty different download queues all competing for database access and individual 0.1-second hits of heavy CPU work, you will discover your client starts to judder and lag. Keep it in mind, and you'll figure out what your computer is happy with. I also recommend you try to keep your total loaded files/urls to be under 20,000 to keep things snappy. Remember that you can pause your import queues, if you need to calm things down a bit.

let's do it

Open the new page selector with F9 and then hit download->gallery:

You can do a test download here of a few files if you want, but don't start downloading loads of stuff until you have read about parsing tags!

So, when you want to start a new download, you first select the source with the button--by default, it is probably 'Artstation' for you--and then type in a query in the text box and hit enter. The download will soon start and fill in information, and thumbnails should stream in, just like the hard drive importer. The downloader typically works by walking through the search's gallery pages one by one, queueing up the found files for later download. There are several intentional delays built into the system, so do not worry if work seems to halt for a little while--you will get a feel for it with experience.

The thumbnail panel can only show results from one queue at a time, so double-click on an entry to 'highlight' it, which will show its thumbs and also give more detailed info and controls in the 'highlighted query' panel. I encourage you to explore the highlight panel over time, as it can show and do quite a lot. Double-click again to 'clear' it.

It is a good idea to 'test' larger downloads, either by visiting the site itself for that query, or just waiting a bit and reviewing the first files that come in. Just make sure that you are getting what you thought you would, whether that be verifying that the query text is correct or that the site isn't only giving you bloated gifs or other bad quality files. The 'file limit', which stops the gallery search after the set number of files, is also great for limiting fishing expeditions (such as overbroad searches like 'wide_hips', which on the bigger boorus have 100k+ results and return variable quality). If the gallery search runs out of new files before the file limit is hit, the search will naturally stop (and the entry in the list should gain a ⏹ 'stop' symbol).

Note that some sites only serve 25 or 50 pages of results, despite their indices suggesting hundreds. If you notice that one site always bombs out at, say, 500 results, it may be due to a decision on their end. You can usually test this by visiting the pages hydrus tried in your web browser.

In general, particularly when starting out, artist searches are best. They are usually fewer than a thousand files and have fairly uniform quality throughout.

parsing tags

But we don't just want files--most sites offer tags as well. By default, hydrus does not fetch any tags for downloads. As you use the client, you will figure out what sorts of tags you are interested in and shape your parsing rules appropriately, but for now, let's do a test that just gets everything--click tag import options:

By default, all 'tag import options' objects defer to the client's defaults. Since we want to change this queue from the current default of 'get nothing' to 'get everything', uncheck the top default checkbox and then click 'get tags' on a tag service, whether that is your 'my tags' or the PTR if you have added it. Hit apply and run a simple query for something, like 'blue_eyes' on one of the boorus. Pause its gallery search after a page or two, and then pause the import queue after a dozen or so files come in--they should be really well tagged!

It is easy to get tens of thousands of tags this way. Different sites offer different kinds and qualities of tags, and the client's downloaders (which were designed by me, the dev, or a user) may parse all or only some of them. Many users like to just get everything on offer, but others only ever want, say, 'creator', 'series', and 'character' tags. If you feel brave, click that 'all tags' button on tag import options, which will take you into hydrus's advanced 'tag filter', which allows you to whitelist or blacklist the incoming list of tags according to whatever your preferences are.

The file limit and file/tag import options on the upper panel, if changed, will only apply to new queries. If you want to change the options for an existing queue, either do so on its highlight panel or use the 'set options to queries' button.

Tag import options can get complicated. The blacklist button will let you skip downloading files that have certain tags (perhaps you would like to auto-skip all images with 'gore', 'scat', or 'diaper'?), again using the tag filter. The 'additional tags' also allow you to add some personal tags to all files coming in--for instance, you might like to add 'process into favourites' to your 'my tags' for some query you really like so you can find those files again later and process them separately. That little 'cog' icon button can also do some advanced things. I recommend you start by just getting everything (or nothing, if you really would rather tag everything yourself), and then revisiting it once you have some more experience. Once you have played with this a bit, let's fix your preferences as the new default:

default tag import options

Hit network->downloaders->manage default tag import options. Set a new default for 'file posts', and that will be the default (that we originally turned off above) for all gallery download pages (and subscriptions, which you will learn about later). You can have different tag import options for each site, but again, we will leave it simple for now.

watching threads

If you are an imageboard user, try going to a thread you like and drag-and-drop its URL (straight from your web browser's address bar) onto the hydrus client. It should open up a new 'watcher' page and import the thread's files!

With only one URL to check, watchers are a little simpler than gallery searches, but as that page is likely receiving frequent updates, it checks it over and over until it dies. By default, the watcher's 'checker options' will regulate how quickly it checks based on the speed at which new files are coming in--if a thread is fast, it will check frequently; if it is running slow, it may only check once per day. When a thread falls below a critical posting velocity or 404s, checking stops.
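The velocity-based idea can be sketched in a few lines. This is my own illustration of the concept, not hydrus's actual checker code; the function name, the default of aiming for roughly five new files per check, and the clamp bounds are all assumptions for the example.

```python
# Hypothetical sketch of velocity-based checking, NOT hydrus's real code.
# The idea: pick a check period so that roughly `files_per_check` new files
# are expected each time, clamped between sane minimum and maximum bounds.

def next_check_period(files_seen: int, seconds_elapsed: int,
                      files_per_check: int = 5,
                      min_period: int = 60,           # never faster than once a minute
                      max_period: int = 86400) -> int:  # never slower than once a day
    if files_seen == 0:
        return max_period  # dead or dying thread: back off to the slowest rate
    velocity = files_seen / seconds_elapsed  # files per second
    period = files_per_check / velocity      # seconds until ~5 new files expected
    return int(max(min_period, min(max_period, period)))
```

A thread that produced 100 files in the last hour would be checked every few minutes, while one that produced two files in a day would be clamped to a daily check.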

In general, you can leave the checker options alone, but you might like to revisit them if you are always visiting faster or slower boards and find you are missing files or getting DEAD too early.

bandwidth

It will not be too long until you see a "bandwidth free in xxxxx..." message. As a long-term storage solution, hydrus is designed to be polite in its downloading--both to the source server and your computer. The client's default bandwidth rules have some caps to stop big mistakes, spread out larger jobs, and at a bare minimum, no domain will be hit more than once a second.

All the bandwidth rules are completely customisable. They can get quite complicated. I strongly recommend you not look for them until you have more experience. I especially strongly recommend you not ever turn them all off, thinking that will improve something, as you'll probably render the client too laggy to function and get yourself an IP ban from the next server you pull from.
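The 'no domain more than once a second' floor amounts to tracking the last request time per domain. Here is a minimal sketch of that rule, purely for illustration; hydrus's real bandwidth manager tracks much more (data totals, time windows, per-context rules), and the class and method names here are invented.

```python
from collections import defaultdict

# Illustrative sketch of a per-domain minimum-interval rule, not hydrus's
# actual bandwidth manager. Callers ask how long they must wait before
# hitting a domain again, and record each request they make.

class DomainRateLimiter:
    def __init__(self, min_interval: float = 1.0):
        self.min_interval = min_interval
        self.last_request = defaultdict(float)  # domain -> timestamp of last hit

    def wait_time(self, domain: str, now: float) -> float:
        """Seconds still to wait before this domain may be hit again."""
        elapsed = now - self.last_request[domain]
        return max(0.0, self.min_interval - elapsed)

    def record(self, domain: str, now: float) -> None:
        self.last_request[domain] = now
```

The real rules layer several of these caps together, which is why a big queue politely spreads itself out over hours rather than hammering a server.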

If you want to download 10,000 files, set up the queue and let it work. The client will take breaks, likely even to the next day, but it will get there in time. Many users like to leave their clients on all the time, just running in the background, which makes these sorts of downloads a breeze--you check back in the evening and discover your download queues, watchers, and subscriptions have given you another thousand things to deal with.

Again: the real problem with downloading is not finding new things, it is keeping up with what you get. Start slow and figure out what is important to your bandwidth budget, hard drive budget, and free time budget. Almost everyone fails at this.

subscriptions

Subscriptions are a way to automatically recheck a good query in future, to keep up with new files. Many users come to use them. When you are comfortable with downloaders and have an idea of what you like, come back and read the subscription help, which is here.

other downloading

There are two other ways of downloading, mostly for advanced or one-off use.

The url downloader works like the gallery downloader but does not do searches. You can paste downloadable URLs to it, and it will work through them as one list. Dragging and dropping recognisable URLs onto the client (e.g. from your web browser) will also spawn and use this downloader.

The simple downloader will do very simple parsing for unusual jobs. If you want to download all the images in a page, or all the image link destinations, this is the one to use. There are several default parsing rules to choose from, and if you learn the downloader system yourself, it will be easy to make more.

logins

The client now supports a flexible (but slightly prototype and ugly) login system. It can handle simple sites and is as completely user-customisable as the downloader system. The client starts with multiple login scripts by default, which you can review under network->downloaders->manage logins:

Many sites grant all their content without you having to log in at all, but others require it for NSFW or special content, or you may wish to take advantage of site-side user preferences like personal blacklists. If you wish, you can give hydrus some login details here, and it will try to login--just as a browser would--before it downloads anything from that domain.

For multiple reasons, I do not recommend you use important accounts with hydrus. Use a throwaway account you don't care much about.

To start using a login script, select the domain and click 'edit credentials'. You'll put in your username/password, and then 'activate' the login for the domain, and that should be it! The next time you try to get something from that site, the first request will wait (usually about ten seconds) while a login popup performs the login. Most logins last for about thirty days (and many refresh that 30-day timer every time you make a new request), so once you are set up, you usually never notice it again, especially if you have a subscription on the domain.

Most sites only have one way of logging in, but hydrus does support more. Hentai Foundry is a good example--by default, the client performs the 'click-through' login as a guest, which requires no credentials and means any hydrus client can get any content from the start. But this way of logging in only lasts about 60 minutes or so before having to be refreshed, and it does not hide any spicy stuff, so if you use HF a lot, I recommend you create a throwaway account, set the filters you like in your HF profile (e.g. no guro content), and then click the 'change login script' in the client to the proper username/pass login.

The login system is new and still a bit experimental. Don't try to pull off anything too weird with it! If anything goes wrong, it will likely delay the script (and hence the whole domain) from working for a while, or invalidate it entirely. If the error is something simple, like a password typo or current server maintenance, go back to this dialog to fix and scrub the error and try again. If the site just changed its layout, you may need to update the login script. If it is more complicated, please contact me, hydrus_dev, with the details!

If you would like to login to a site that is not yet supported by hydrus (usually ones with a Captcha in the login page), see about getting a web browser add-on that lets you export a cookies.txt (either for the whole browser or just for that domain) and then drag and drop that file onto the hydrus network->data->review session cookies dialog. This sometimes does not work if your add-on's export formatting is unusual. If it does work, hydrus will import and use those cookies, which skips the login by making your hydrus pretend to be your browser directly. This is obviously advanced and hacky, so if you need to do it, let me know how you get on and what tools you find work best!
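The cookies.txt these add-ons export is the old Netscape format: '#' comment lines, then one cookie per line as seven tab-separated fields. A minimal parser looks like the sketch below; this is my own illustration of the format, and hydrus's actual importer is more tolerant of formatting quirks than this.

```python
# Minimal parser for the Netscape cookies.txt format that browser add-ons
# typically export. Fields per line: domain, include-subdomains flag, path,
# secure flag, expiry (unix time), name, value.

def parse_cookies_txt(text: str) -> list:
    cookies = []
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith('#'):
            continue  # skip blank lines and comments
        fields = line.split('\t')
        if len(fields) != 7:
            continue  # malformed line; a real importer might warn here
        domain, include_subdomains, path, secure, expiry, name, value = fields
        cookies.append({
            'domain': domain,
            'include_subdomains': include_subdomains == 'TRUE',
            'path': path,
            'secure': secure == 'TRUE',
            'expiry': int(expiry),
            'name': name,
            'value': value,
        })
    return cookies
```

If your add-on's export fails to import, checking whether it actually follows this tab-separated layout is a good first diagnostic.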

Starting out

Getting started: Ratings

The hydrus client supports two kinds of ratings: like/dislike and numerical. Let's start with the simpler one:

like/dislike

This can set one of two values to a file. It does not have to represent like or dislike--it can be anything you want. Go to services->manage services->local->like/dislike ratings:

You can set a variety of colours and shapes.

numerical

This is '3 out of 5 stars' or '8/10'. You can set the range to whatever whole numbers you like:

As well as the shape and colour options, you can set how many 'stars' to display and whether 0/10 is permitted.

If you change the star range at a later date, any existing ratings will be 'stretched' across the new range. As values are collapsed to the nearest integer, this is best done for scales that are multiples. 2/5 will neatly become 4/10 on a zero-allowed service, for instance, and 0/4 can nicely become 1/5 if you disallow zero ratings in the same step. If you didn't intuitively understand that, just don't touch the number of stars or zero rating checkbox after you have created the numerical rating service!
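The 'stretch' described above is just a proportional rescale rounded to the nearest whole star. Here is a sketch of the arithmetic to make the examples concrete; this is my own illustration, not hydrus's internals.

```python
# Sketch of the proportional rating 'stretch' described above (an
# illustration, not hydrus's actual code). The old rating is converted to a
# fraction of its old scale, then rescaled and rounded onto the new one.

def stretch_rating(value: int, old_stars: int, old_allows_zero: bool,
                   new_stars: int, new_allows_zero: bool) -> int:
    old_low = 0 if old_allows_zero else 1
    new_low = 0 if new_allows_zero else 1
    fraction = (value - old_low) / (old_stars - old_low)
    return round(new_low + fraction * (new_stars - new_low))
```

With this arithmetic, 2/5 on a zero-allowed service becomes exactly 4/10, while in-between conversions land on whichever integer is nearest, which is why non-multiple rescales get messy.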

now what?

Ratings are displayed in the top-right of the media viewer:

Hovering over each control will pop up its name, in case you forget which is which. You can then set them with a left- or right-click. Like/dislike and numerical have slightly different click behaviour, so have a play with them to get their feel. Pressing F4 on a selection of thumbnails will open a dialog with a very similar layout, which will let you set the same rating to many files simultaneously.

Once you have some ratings set, you can search for them using system:rating, which produces this dialog:

On my own client, I find it useful to have several like/dislike ratings set up as one-click pseudo-tags, like the 'OP images' above.


Access keys

The PTR is now run by users with more bandwidth than I had to give, so the bandwidth limits are gone! If you would like to talk with the new management, please check the discord.

A guide and schema for the new PTR is here.

first off

I have purposely not pre-baked any default repositories into the client. You have to choose to connect yourself. The client will never connect anywhere until you tell it to.

For a long time, I ran the Public Tag Repository. It grew to 650 million tags and I no longer had the bandwidth or janitor time it deserved. It is now run by users.

I created a 'frozen' copy of the PTR when I stopped running it. If you are an advanced user, you can run your own new tag repository starting from that frozen point or, if you know python or SQLite and wish to play around with its data, get more easily accessible Hydrus Tag Archives of its tags and siblings and pairs, right here.

easy setup

Hit help->add the public tag repository and you will be all set up.

manually

To add a new repository to your client, hit services->manage services and click the add button:

Here's the info so you can copy it:

It is worth checking the 'test address' and 'test access key' buttons just to double-check your firewall and key are all correct.

Tags are rich, cpu-intensive metadata. The Public Tag Repository has been growing since 2011 and now has hundreds of millions of mappings, and your client will eventually download and index them all. As of 2020-03, it requires about 4GB of bandwidth and file storage, and your database itself will grow by 25GB! It will take hours of total processing time to fully synchronise. Because of mechanical drive latency, HDDs are often too slow to process hundreds of millions of tags in reasonable time, so syncing with large repositories is only recommended if your hydrus db is on an SSD. Even then, it is best to let the client work on this in small pieces in the background, either during idle time or shutdown time, so unless you are an advanced user, just leave it to download and process on its own--it usually takes a couple of weeks to quietly catch up.

jump-starting an install

A user kindly manages a store of update files and pre-processed empty client databases to get you synced quicker. This is generally recommended for advanced users or those following a guide, but if you are otherwise interested, please check it out:

https://cuddlebear92.github.io/Quicksync/

The Next Step


more getting started with files

exporting and uploading

There are many ways to export files from the client:


adding new downloaders

all downloaders are user-creatable and -shareable

Since the big downloader overhaul, all downloaders can be created, edited, and shared by any user. Creating one from scratch is not simple, and it takes a little technical knowledge, but importing what someone else has created is easy.

Hydrus objects like downloaders can sometimes be shared as data encoded into png files, like this:

This contains all the information needed for a client to add a realbooru tag search entry to the list you select from when you start a new download or subscription.

You can get these pngs from anyone who has experience in the downloader system. An archive is maintained here.

To 'add' the easy-import pngs to your client, hit network->downloaders->import downloaders. A little image-panel will appear onto which you can drag-and-drop these png files. The client will then decode the png and go through it, looking for interesting new objects, and automatically import and link them up without you having to do anything more. The only further input needed on your end is a 'does this look correct?' check right before the actual import, just to make sure there isn't some mistake or other glaring problem.

Objects imported this way will take precedence over existing functionality, so if one of your downloaders breaks due to a site change, importing a fixed png here will overwrite the broken entries and become the new default.


thoughts on a public tagging schema

This document was originally written for when I ran the Public Tag Repository. This is now run by users, so I am no longer an authority for it. I am briefly editing the page and leaving it as a record for some of my thoughts on tagging if you are interested. You can, of course, run your own tag repositories and do your own thing additionally or instead.

A newer guide and schema for the PTR is here.

seriousness of schema

This is not all that important; it just makes searches and cooperation easier if most of us can mostly follow some guidelines.

We will never be able to easily and perfectly categorise every single image to everyone's satisfaction, so there is no point defining every possible rule for every possible situation. If you do something that doesn't fit, fixing mistakes is not difficult.

If you are still not confident, just lurk for a bit. See how other people have tagged the popular images and do more of that.

you can add pretty much whatever the hell you want, but don't screw around

The most important thing is: if your tag is your opinion, don't add it. 'beautiful' is an unhelpful tag because no one can agree on what it means. 'lingerie', 'blue eyes', and 'male' or 'female' are better since reasonable people can generally agree on what they mean. If someone thinks blue-eyed women are beautiful, they can search for that to find beautiful things.

You can start your own namespaces, categorisation systems, whatever. Just be aware that everyone else will see what you do.

If you are still unsure about the difference between objective and subjective, here's some more examples:

Of course, if you are tagging a picture of someone holding a sign that says 'beautiful', you can bend the rules. Otherwise, please keep your opinions to yourself!

numbers

Numbers should be written '22', '1457 ce', and 'page:3', unless as part of an official title like 'ocean's eleven'. When the client parses and sorts numbers, it does so intelligently, so just use '1' where you might before have done '01' or '001'. I know it looks ugly sometimes to have '2 girls' or '1 cup', but the rules for writing numbers out in full are hazy for special cases.

(Numbers written as 123 are also readable by many different language-speakers, while 'tano', 'deux' and 'seven' are not.)
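The 'intelligent' sorting referred to above is commonly done with a natural-sort key: split each tag into text and number runs, and compare the number runs as integers rather than as strings. This is a standard trick shown for illustration; hydrus's own comparison may differ in detail.

```python
import re

# Natural-sort key: 'page:10' sorts after 'page:2' because the digit runs
# are compared as integers, not character by character.

def natural_sort_key(tag: str):
    return [int(part) if part.isdigit() else part
            for part in re.split(r'(\d+)', tag)]

tags = ['page:10', 'page:2', 'page:1']
tags.sort(key=natural_sort_key)  # ['page:1', 'page:2', 'page:10']
```

This is why '1' works just as well as '01' or '001'; the padding buys you nothing once numbers are compared numerically.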

plurals

Nouns should generally be singular, not plural. 'chair' instead of 'chairs', 'cat' instead of 'cats', even if there are several of the thing in the image. If there really are many of the thing in the image, add a separate 'multiple' or 'lineup' tag as appropriate.

Ignore this when the thing is normally said in its plural (usually paired) form. Say 'blue eyes', not 'blue eye'; 'breasts', not 'breast', even if only one is pictured.

acronyms and synonyms

I personally prefer the full 'series:the lord of the rings' rather than 'lotr'. If you are an advanced user, please help out with tag siblings to encourage this.

character:anna (frozen)

I am not fond of putting a series name after a character because it looks unusual and is applied unreliably. It is done to separate same-named characters from each other (particularly when they have no canon surname), which is useful in places that search slowly, have thin tag areas on their web pages, or usually only deal in single-tag searches. For archival purposes, I generally prefer that namespaces are stored as the namespace and nowhere else. 'series:harry potter' and 'character:harry potter', not 'harry potter (harry potter)'. Some sites even say things like 'anna (disney)'. It isn't a big deal, but if you are adding a sibling to collapse these divergent tags into the 'proper' one, I'd prefer it all went to the simple and reliable 'character:anna'. Even better would be migrating towards a canon-ok unique name, like 'character:princess anna of arendelle', which could have the parent 'series:frozen'.

Including nicknames, like 'character:angela "mercy" ziegler', can be useful to establish uniqueness, but it is not mandatory. 'character:harleen "harley quinn" frances quinzel' is probably overboard.

protip: rein in your spergitude

In developing hydrus, I have discovered two rules to happy tagging:

  1. Don't try to be perfect.
  2. Only add those tags you actually use in searches.

Tagging can be fun, but it can also be complicated, and the problem space is gigantic. There is always work to do, and it is easy to exhaust oneself or get lost in the bushes agonising over whether to use 'smile' or 'smiling' or 'smirk' or one of a million other split hairs. Problems are easy to fix, and this marathon will never finish, so do not try to sprint. The ride never ends.

The sheer number of tags can also be overwhelming. Importing all the many tags from the boorus is totally fine, but if you are typing tags yourself, I suggest you try not to exhaustively tag everything in the image. You will save a lot of time and ultimately be much happier with your work. Anyone can see what is in an image just by looking at it--tags are primarily for finding things. Character, series and creator namespaces are a great place to start. After that, add what you are interested in, be that 'blue sky' or 'midriff'.

newer thoughts on presentation preferences

Since developing and receiving feedback for the siblings system, and then in dealing with siblings with the PTR, I have come to believe that the most difficult disagreement to resolve in tagging is not in what is in an image, but how those tags should present. It is easy to agree that an image contains a 'bikini', but should that show as 'bikini' or 'clothing:bikini' or 'general:bikini' or 'swimwear:bikini'? Which is better?

This is impossible to answer definitively. There is no perfect dictionary that satisfies everyone, and opinions are fairly fixed. My intention for future versions of the sibling and tag systems is to allow users to broadly tell the client some display rules such as 'Whenever you have a clothing: tag, display it as unnamespaced' and eventually more sophisticated ones like 'I prefer slang, so show pussy instead of vagina'.

siblings and parents

Please do add siblings and parents! If it is something not obvious, please explain the relationship in your submitted reason. If it is something obvious (e.g. 'wings' is a parent of 'angel wings'), don't bother to put a reason in; I'll just approve it.

My general thoughts:


Getting started with subscriptions

Do not try to create a subscription until you are comfortable with a normal gallery download page! Go here.

Let's say you found an artist you like. You downloaded everything of theirs from some site, but one or two pieces of new work are posted every week. You'd like to keep up with the new stuff, but you don't want to manually make a new download job every week for every single artist you like.

what are subs?

Subscriptions are a way of telling the client to regularly and quietly repeat a gallery search. You set up a number of saved queries, and the client will 'sync' with the latest files in the gallery and download anything new, just as if you were running the download yourself.

Subscriptions only work for booru-like galleries that put the newest files first, and they only keep up with new content--once they have done their first sync, which usually gets the most recent hundred files or so, they will never reach further into the past. Getting older files, as you will see later, is a job best done with a normal download page.

Here's the dialog, which is under network->downloaders->manage subscriptions:

This is a very simple example--there is only one subscription, for safebooru. It has two 'queries' (i.e. searches to keep up with).

It is important to note that while subscriptions can have multiple queries (even hundreds!), they generally only work on one site. Expect to create one subscription for safebooru, one for artstation, one for paheal, and so on for every site you care about. Advanced users may be able to think of ways to get around this, but I recommend against it as it throws off some of the internal check timing calculations.

Before we trip over the advanced buttons here, let's zoom in on the actual subscription:

This is a big and powerful panel! I recommend you open the screenshot up in a new browser tab, or in the actual client, so you can refer to it.

Despite all the controls, the basic idea is simple: Up top, I have selected the 'safebooru tag search' download source, and then I have added two artists--"hong_soon-jae" and "houtengeki". These two queries have their own panels for reviewing what URLs they have worked on and further customising their behaviour, but all they really are is little bits of search text. When the subscription runs, it will put the given search text into the given download source just as if you were running the regular downloader.

For the most part, all you need to do to set up a good subscription is give it a name, select the download source, and use the 'paste queries' button to paste what you want to search. Subscriptions have great default options for almost all query types, so you don't have to go any deeper than that to get started.

Do not change the max number of new files options until you know exactly what they do and have a good reason to alter them!

how do subscriptions work?

Once you hit ok on the main subscription dialog, the subscription system should immediately come alive. If any queries are due for a 'check', they will perform their search and look for new files (i.e. URLs it has not seen before). Once that is finished, the file download queue will be worked through as normal. Typically, the sub will make a popup like this while it works:

The initial sync can sometimes take a few minutes, but after that, each query usually only needs thirty seconds' work every few days. If you leave your client on in the background, you'll rarely see them. If they ever get in your way, don't be afraid to click their little cancel button or call a global halt with network->pause->subscriptions--the next time they run, they will resume from where they were before.

Similarly, the initial sync may produce a hundred files, but subsequent runs are likely to only produce one to ten. If a subscription comes across a lot of big files at once, it may not download them all in one go--but give it time, and it will catch back up before you know it.

When it is done, it leaves a little popup button that will open a new page for you:

This can often be a nice surprise!

what makes a good subscription?

The same rules as for downloaders apply: start slow, be hesitant, and plan for the long-term. Artist queries make great subscriptions as they update reliably but not too often and have very stable quality. Pick the artists you like most, see where their stuff is posted, and set up your subs like that.

Series and character subscriptions are sometimes valuable, but they can be difficult to keep up with and have highly variable quality. It is not uncommon for users to only keep 15% of what a character sub produces. I do not recommend them for anything but your waifu.

Attribute subscriptions like 'blue_eyes' or 'smile' make for terrible subs as the quality is all over the place and you will be inundated by too much content. The only exceptions are for specific, low-count searches that really matter to you, like 'contrapposto' or 'gothic trap thighhighs'.

If you end up subscribing to eight hundred things and get ten thousand new files a week, you made a mistake. Subscriptions are for keeping up with things you like. If you let them overwhelm you, you'll resent them.

Subscription syncs are somewhat fragile. Do not try to play with the limits or checker options to download a whole 5,000 file query in one go--if you want everything for a query, run it in the manual downloader and get everything, then set up a normal sub for new stuff. There is no benefit to having a 'large' subscription, and it will trim itself down in time anyway.

It is a good idea to run a 'full' download for a search before you set up a subscription. As well as making sure you have the exact right query text and that you have everything ever posted (beyond the 100 files deep a sub will typically look), it saves the bulk of the work (and waiting on bandwidth) for the manual downloader, where it belongs. When a new subscription picks up off a freshly completed download queue, its initial subscription sync only takes thirty seconds since its initial URLs are those that were already processed by the manual downloader. I recommend you stack artist searches up in the manual downloader using 'no limit' file limit, and when they are all finished, select them in the list and right-click->copy queries, which will put the search texts in your clipboard, newline-separated. This list can be pasted into the subscription dialog in one go with the 'paste queries' button again!

The entire subscription system assumes the source is a typical 'newest first' booru-style search. If you dick around with some order_by:rating/random metatag, it will not work reliably.

how often do subscriptions check?

Hydrus subscriptions use the same variable-rate checking system as its thread watchers, just on a larger timescale. If you subscribe to a busy feed, it might check for new files once a day, but if you enter an artist who rarely posts, it might only check once every month. You don't have to do anything. The fine details of this are governed by the 'checker options' button. This is one of the things you should not mess with as you start out.

If a query goes too 'slow' (typically, this means no new files for 180 days), it will be marked DEAD in the same way a thread will, and it will not be checked again. You will get a little popup when this happens. This is all editable as you get a better feel for the system--if you wish, it is completely possible to set up a sub that never dies and only checks once a year.

I do not recommend setting up a sub that needs to check more than once a day. Any search that is producing that many files is probably a bad fit for a subscription. Subscriptions are for lightweight searches that are updated every now and then.


(you might like to come back to this point once you have tried subs for a week or so and want to refine your workflow)


ok, I set up three hundred queries, and now these popup buttons are a hassle

On the edit subscription panel, the 'presentation' options let you publish files to a page. The page will have the subscription's name, just like the popup button does, but it cuts out the middle-man and 'locks it in' more than the button, which will be forgotten if you restart the client. Also, if a page with that name already exists, the new files will be appended to it, just like a normal import page! I strongly recommend moving to this once you have several subs going. Make a 'page of pages' called 'subs' and put all your subscription landing pages in there, and then you can check it whenever is convenient.

If you discover your subscription workflow tends to be the same for each sub, you can also customise the publication 'label' used. If multiple subs all publish to the 'nsfw subs' label, they will all end up on the same 'nsfw subs' popup button or landing page. Sending multiple subscriptions' import streams into just one or two locations like this can be great.

You can also hide the main working popup. I don't recommend this unless you are really having a problem with it, since it is useful to have that 'active' feedback if something goes wrong.

Note that subscription file import options will, by default, only present 'new' files. Anything already in the db will still be recorded in the internal import cache and used to calculate next check times and so on, but it won't clutter your import stream. This is different to the default for all the other importers, but when you are ready to enter the ranks of the Patricians, you will know to edit your 'loud' default file import options under options->importing to behave this way as well. Efficient workflows only care about new files.

how exactly does the sync work?

Figuring out when a repeating search has 'caught up' can be a tricky problem to solve. It sounds simple, but unusual situations like 'a file got tagged late, so it inserted deeper than it ideally should in the gallery search' or 'the website changed its URL format completely, help' can cause problems. Subscriptions are automatic systems, so they tend to be a bit more careful and paranoid about problems, lest they burn 10GB on 10,000 unexpected diaperfur images.

The initial sync is simple. It does a regular search, stopping if it reaches the 'initial file limit' or the last file in the gallery, whichever comes first. The default initial file sync is 100, which is a great number for almost all situations.

Subsequent syncs are more complicated. It ideally 'stops' searching when it reaches files it saw in a previous sync, but if it comes across new files mixed in with the old, it will search a bit deeper. It is not foolproof, and if a file gets tagged very late and ends up a hundred deep in the search, it will probably be missed. There is no good and computationally cheap way at present to resolve this problem, but thankfully it is rare.
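The stop condition can be sketched as a toy loop. This is my own simplification of the behaviour described above, not hydrus's real sync code; the 'stop after a run of seen URLs' threshold is an invented parameter for the example.

```python
# Toy sketch of a subscription's subsequent sync (not hydrus's real code):
# walk the gallery newest-first, keep new URLs, and stop once we hit a solid
# run of already-seen URLs or the periodic file limit.

def sync(gallery_urls, seen_urls, periodic_limit=100, stop_after_seen=5):
    new_urls = []
    consecutive_seen = 0
    for url in gallery_urls:  # newest first
        if url in seen_urls:
            consecutive_seen += 1
            if consecutive_seen >= stop_after_seen:
                break  # clearly caught up with the previous sync
        else:
            consecutive_seen = 0  # a late-inserted new file: keep searching
            new_urls.append(url)
            if len(new_urls) >= periodic_limit:
                break  # periodic file limit exceeded; the sub will complain
    return new_urls
```

Note how a single new file mixed in with old ones resets the counter and pushes the search deeper, while a file inserted far below the stop point is simply never reached.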

Remember that an important 'staying sane' philosophy of downloading and subscriptions is to focus on dealing with the 99.5% you have before worrying about the 0.5% you do not.

The amount of time between syncs is calculated by the checker options. Based on the timestamps attached to existing urls in the subscription cache (either added time, or the post time as parsed from the url), the sub estimates how long it will be before n new files appear, and the next check is scheduled for that time. Unless you know what you are doing, checker options, like file limits, are best left alone. A subscription will naturally adapt its checking speed to the file 'velocity' of the source, and there is usually very little benefit to trying to force a sub to check at a radically different speed.

If you want to force your subs to run at the same time, say every evening, it is easier to just use network->pause->subscriptions as a manual master on/off control. The ones that are due will catch up together, the ones that aren't won't waste your time.

Remember that subscriptions only keep up with new content. They cannot search backwards in time in order to 'fill out' a search, nor can they fill in gaps. Do not change the file limits or check times to try to make this happen. If you want to ensure complete sync with all existing content for a particular search, use the manual downloader.

In practice, most subs only need to check the first page of a gallery, since usually only the first two or three URLs are new.

periodic file limit exceeded

If, during a regular sync, the sub keeps finding new URLs and never hits a block of already-seen URLs, it will stop upon hitting its 'periodic file limit', which is also usually 100. When it happens, you will get a popup message notification. There are two typical reasons for this:

1. The search simply produced more new files since the last check than the periodic limit allows.
2. The site or the downloader changed, so URLs the client has already seen now all look 'new'.

The first case is a natural accident of statistics. The subscription now has a 'gap' in its sync. If you want to get what you missed, you can fill in the gap with a manual downloader page. Just download to 200 files or so, and the downloader will quickly do a one-time pass through the URLs in the gap.

The second case is a safety stopgap for hydrus. If a site decides to have /post/123456 style URLs instead of post.php?id=123456 style, hydrus will suddenly see those as entirely 'new' URLs. It could also be because of an updated downloader, which pulls URLs in API format or similar. This is again thankfully quite rare, but it triggers several problems--the associated downloader usually breaks, as it does not yet recognise those new URLs, and all your subs for that site will parse through and hit the periodic limit for every query. When this happens, you'll usually get several periodic limit popups at once, and you may need to update your downloader. If you know the person who wrote the original downloader, they'll likely want to know about the problem, or may already have a fix sorted. It is often a good idea to pause the affected subs until you have it figured out and working in a normal gallery downloader page.

I put character queries in my artist sub, and now things are all mixed up

On the main subscription dialog, there are 'merge' and 'separate' buttons. These are powerful, and they will walk you through the process of pulling queries out of a sub or merging them into a different one. Only subs that use the same download source can be merged. Give them a go, and if it all goes wrong, just hit the cancel button on the dialog.

The Next Step

Filtering Duplicates

duplicates

As files are shared on the internet, they are often resized, cropped, converted to a different format, altered by the original or a new artist, or turned into a template and reinterpreted over and over and over. Even if you have a very restrictive importing workflow, your client is almost certainly going to get some duplicates. Some will be interesting alternate versions that you want to keep, and others will be thumbnails and other low-quality garbage you accidentally imported and would rather delete. Along the way, it would be nice to merge your ratings and tags to the better files so you don't lose any work.

Finding and processing duplicates within a large collection is impossible to do by hand, so I have written a system to do the heavy lifting for you. It currently works on still images, but an extension for gifs and video is planned.

Hydrus finds potential duplicates using a search algorithm that compares images by their shape. Once these pairs of potentials are found, they are presented to you through a filter like the archive/delete filter to determine their exact relationship and if you want to make a further action, such as deleting the 'worse' file of a pair. All of your decisions build up in the database to form logically consistent groups of duplicates and 'alternate' relationships that can be used to infer future information. For instance, if you say that file A is a duplicate of B and B is a duplicate of C, A and C are automatically recognised as duplicates as well.

This all starts on--

the duplicates processing page

On the normal 'new page' selection window, hit special->duplicates processing. This will open this page:

Let's go to the preparation page first:

The 'similar shape' algorithm works on distance. Two files with 0 distance are likely exact matches, such as resizes of the same file or lower/higher quality jpegs, whereas those with distance 4 tend to be hairstyle or costume changes. You will start on distance 0 and should not expect to ever go above 4 or 8 or so. Going too high increases the danger of being overwhelmed by false positives.

If you are interested, the current version of this system uses a 64-bit phash to represent the image shape and a VPTree to search different files' phashes' relative hamming distance. I expect to extend it in future with multiple phash generation (flips, rotations, and 'interesting' image crops and video frames) and most-common colour comparisons.
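For the really curious, the hamming distance between two 64-bit phashes is simply the count of differing bits--the example hash values here are made up:

```python
def hamming_distance(phash_a: int, phash_b: int) -> int:
    """Count the differing bits between two 64-bit perceptual hashes."""
    return bin(phash_a ^ phash_b).count('1')

# identical shapes give distance 0, the 'exact match' starting point
hamming_distance(0xF0F0F0F0F0F0F0F0, 0xF0F0F0F0F0F0F0F0)  # 0
# one flipped bit gives distance 1
hamming_distance(0xF0F0F0F0F0F0F0F0, 0xF0F0F0F0F0F0F0F1)  # 1
```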

Searching for duplicates is fairly fast per file, but with a large client with hundreds of thousands of files, the total CPU time adds up. You can do a little manual searching if you like, but once you are all settled here, I recommend you hit the cog icon on the preparation page and let hydrus do this page's catch-up search work in your regular maintenance time. It'll swiftly catch up and keep you up to date without you even thinking about it.

Start searching on the 'exact match' search distance of 0. It is generally easier and more valuable to get exact duplicates out of the way first.

Once you have some files searched, you should see a potential pair count appear in the 'filtering' page.

the filtering page

Processing duplicates can be real trudge-work if you do not set up a workflow you enjoy. It is a little slower than the archive/delete filter, and sometimes takes a bit more cognitive work. For many users, it is a good task to do while listening to a podcast or having a video going on another screen.

If you have a client with tens of thousands of files, you will likely have thousands of potential pairs. This can be intimidating, but do not worry--due to the A, B, C logical inferences described above, you will not have to go through every single one. The more information you put into the system, the faster the number will drop.

The filter has a regular file search interface attached. As you can see, it defaults to system:everything, but you can limit what files you will be working on simply by adding new search predicates. You might like to only work on files in your archive (i.e. that you know you care about to begin with), for instance. You can choose whether both files of the pair should match the search, or just one. 'creator:' tags work very well at cutting the search domain to something more manageable and consistent--try your favourite creator!

If you would like an example from the current search domain, hit the 'show some random potential pairs' button, and it will show two or more files that seem related. It is often interesting and surprising to see what it finds! The action buttons below allow for quick processing of these pairs and groups when convenient (particularly for large cg sets with 100+ alternates), but I recommend you leave these alone until you know the system better.

When you are ready, launch the filter.

the duplicates filter

We have not set up your duplicate 'merge' options yet, so do not get too into this. For this first time, just poke around, make some pretend choices, and then cancel out and choose to forget them.

Like the archive/delete filter, this uses quick mouse-clicks, keyboard shortcuts, or button clicks to action pairs. It presents two files at a time, labelled A and B, which you can quickly switch between just as in the normal media viewer. As soon as you action them, the next pair is shown. The two files will have their current zoom-size locked so they stay the same size (and in the same position) as you switch between them. Scroll your mouse wheel a couple of times and see if any obvious differences stand out.

Please note the hydrus media viewer does not currently work well with large resolutions at high zoom (it gets laggy and may have memory issues). Don't zoom in to 1600% and try to look at jpeg artifact differences on very large files, as this is simply not well supported yet.

The hover window on the right also presents a number of 'comparison statements' to help you make your decision. Green statements mean this current file is probably 'better', and red the opposite. Larger, older, higher-quality, more-tagged files are generally considered better. These statements have scores associated with them (which you can edit in file->options->duplicates), and the file of the pair with the highest score is presented first. If the files are duplicates, you can generally assume the first file you see, the 'A', is the better, particularly if there are several green statements.

The filter will need to occasionally checkpoint, saving the decisions so far to the database, before it can fetch the next batch. This allows it to apply inferred information from your current batch and reduce your pending count faster before serving up the next set. It will present you with a quick interstitial 'confirm/back' dialog just to let you know. This happens more often as the potential count decreases.

the decisions to make

There are three ways a file can be related to another in the current duplicates system: duplicates, alternates, or false positive (not related).

False positive (not related) is the easiest. You will not see completely unrelated pairs presented very often in the filter, particularly at low search distances, but if the shape of face and hair and clothing happen to line up (or geometric shapes, often), the search system may make a false positive match. In this case, just click 'they are not related'.

Alternate relations are files that are not duplicates but obviously related in some way. Perhaps a costume change or a recolour. Hydrus does not have rich alternate support yet (but it is planned, and highly requested), so this relationship is mostly a 'holding area' for files that we will revisit for further processing in the future.

Duplicate files are of the exact same thing. They may be different resolutions, file formats, encoding quality, or one might even have a watermark, but they are fundamentally different views on the exact same art. As you can see with the buttons, you can select one file as the 'better' or say they are about the same. If the files are basically the same, there is no point stressing about which is 0.2% better--just click 'they are the same'. For better/worse pairs, you might have reason to keep both, but most of the time I recommend you delete the worse.

You can customise the shortcuts under file->shortcuts->duplicate_filter. The defaults are:

merging metadata

If two duplicates have different metadata like tags or archive status, you probably want to merge them. Cancel out of the filter and click the 'edit default duplicate metadata merge options' button:

By default, these options are fairly empty. You will have to set up what you want based on your services and preferences. Setting a simple 'copy all tags' is generally a good idea, and like/dislike ratings also often make sense. The settings for better and same quality should probably be similar, but it depends on your situation.

If you choose the 'custom action' in the duplicate filter, you will be presented with a fresh 'edit duplicate merge options' panel for the action you select and can customise the merge specifically for that choice. ('favourite' options will come here in the future!)

Once you are all set up here, you can dive into the duplicate filter. Please let me know how you get on with it!

what now?

The duplicate system is still incomplete. Now that the db side is solid, the UI needs to catch up. Future versions will show duplicate information on thumbnails and the media viewer and allow quick-navigation to a file's duplicates and alternates.

For now, if you wish to see a file's duplicates, right-click it and select file relationships. You can review all its current duplicates, open them in a new page, appoint the new 'best file' of a duplicate group, and even mass-action selections of thumbnails.

You can also search for files based on the number of file relations they have (including when setting the search domain of the duplicate filter!) using system:file relationships. You can also search for best/not best files of groups, which makes it easy, for instance, to find all the spare duplicate files if you decide you no longer want to keep them.

I expect future versions of the system to also auto-resolve easy duplicate pairs, such as clearing out pixel-for-pixel png versions of jpgs.

game cgs

If you import a lot of game CGs, which frequently have dozens or hundreds of alternates, I recommend you set them as alternates by selecting them all and setting the status through the thumbnail right-click menu. The duplicate filter, being limited to pairs, needs to compare all new members of an alternate group to all other members once to verify they are not duplicates. This is not a big deal for alternates with three or four members, but game CGs provide an overwhelming edge case. Setting a group of thumbnails as alternate 'fixes' their alternate status immediately, discounting the possibility of any internal duplicates, and provides an easy way out of this situation.

more information and examples

the duplicates system

(advanced nonsense, you can skip this section. tl;dr: duplicate file groups keep track of their best quality file, sometimes called the King)

Hydrus achieves duplicate transitivity by treating duplicate files as groups. Although you action pairs, if you set (A duplicate B), that creates a group (A,B). Subsequently setting (B duplicate C) extends the group to be (A,B,C), and so (A duplicate C) is transitively implied.

The first version of the duplicate system attempted to record better/worse/same information for all files in a virtual duplicate group, but this proved very complicated, workflow-heavy, and not particularly useful. The new system instead appoints a single King as the best file of a group. All other files in the group are beneath the King and have no other relationship data retained.

This King represents the group in the duplicate filter (and in potential pairs, which are actually recorded between duplicate media groups--even if most of them at the outset only have one member). If the other file in a pair is considered better, it becomes the new King, but if it is worse or equal, it merges into the other members. When two Kings are compared, whole groups can merge!
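A toy model of the grouping--emphatically not the client's actual schema--shows how transitivity and the King fall out of simple group merges:

```python
class DuplicateGroups:
    """Toy model: each file belongs to one group; each group has a King."""

    def __init__(self):
        self._group = {}  # file -> the (shared) set of files in its group
        self._king = {}   # file -> the King of its group

    def _ensure(self, f):
        if f not in self._group:
            self._group[f] = {f}
            self._king[f] = f

    def set_better(self, better, worse):
        """Action a pair: 'better' is a duplicate of, and superior to,
        'worse'. Their groups merge under the better side's King."""
        self._ensure(better)
        self._ensure(worse)
        if self._group[better] is self._group[worse]:
            return  # already duplicates
        merged = self._group[better] | self._group[worse]
        new_king = self._king[better]
        for f in merged:
            self._group[f] = merged
            self._king[f] = new_king

    def king(self, f):
        self._ensure(f)
        return self._king[f]

    def duplicates(self, f):
        self._ensure(f)
        return self._group[f] - {f}
```

Setting (A better than B) and then (B better than C) leaves one group {A, B, C} with King A--the (A duplicate C) relationship is implied without ever being actioned.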

Alternates are stored in a similar way, except the members are duplicate groups rather than individual files and they have no significant internal relationship metadata yet. If α, β, and γ are duplicate groups that each have one or more files, then setting (α alt β) and (β alt γ) creates an alternate group (α,β,γ), with the caveat that α and γ will still be sent to the duplicate filter once just to check they are not duplicates by chance. The specific file members of these groups, A, B, C and so on, inherit the relationships of their parent groups when you right-click on their thumbnails.

False positive relationships are stored between pairs of alternate groups, so they apply transitively between all the files of either side's alternate group. If (α alt β) and (ψ alt ω) and you apply (α fp ψ), then (α fp ω), (β fp ψ), and (β fp ω) are all transitively implied.

Some fun. And simpler.

The Next Step

Reducing program lag

hydrus is cpu and hdd hungry

The hydrus client manages a lot of complicated data and gives you a lot of power over it. To add millions of files and tags to its database, and then to perform difficult searches over that information, it needs to use a lot of CPU time and hard drive time--sometimes in small laggy blips, and occasionally in big 100% CPU chunks. I don't put training wheels or limiters on the software either, so if you search for 300,000 files, the client will try to fetch that many.

In general, the client works best on snappy computers with low-latency hard drives where it does not have to constantly compete with other CPU- or HDD- heavy programs. Running hydrus on your games computer is no problem at all, but if you leave the client on all the time, then make sure under the options it is set not to do idle work while your CPU is busy, so your games can run freely. Similarly, if you run two clients on the same computer, you should have them set to work at different times, because if they both try to process 500,000 tags at once on the same hard drive, they will each slow to a crawl.

If you run on an HDD, keeping it defragged is very important, and good practice for all your programs anyway. Make sure you know what this is and that you do it.

maintenance and processing

I have attempted to offload most of the background maintenance of the client (which typically means repository processing and internal database defragging) to time when you are not using the client. This can either be 'idle time' or 'shutdown time'. The calculations for what these exactly mean are customisable in file->options->maintenance and processing.

If you run a quick computer, you likely don't have to change any of these options. Repositories will synchronise and the database will stay fairly optimal without you even noticing the work that is going on. This is especially true if you leave your client on all the time.

If you have an old, slower computer though, or if your hard drive is high latency, make sure these options are set for whatever is best for your situation. Turning off idle time completely is often helpful as some older computers are slow to even recognise--mid task--that you want to use the client again, or take too long to abandon a big task half way through. If you set your client to only do work on shutdown, then you can control exactly when that happens.

reducing search and general gui lag

Searching for tags via the autocomplete dropdown and searching for files in general can sometimes take a very long time. It depends on many things. In general, the more predicates (tags and system:something) you have active for a search, and the more specific they are, the faster it will be.

You can also look at file->options->speed and memory, again especially if you have a slow computer. Increasing the autocomplete thresholds is very often helpful. You can even force autocompletes to only fetch results when you manually ask for them.

Having lots of thumbnails open or downloads running can slow many things down. Check the 'pages' menu to see your current session weight. If it is about 50,000, or you have individual pages with more than 10,000 files or download URLs, try cutting down a bit.

finally - profiles

Lots of my code remains unoptimised for certain situations. My development environment only has a few thousand images and a few million tags. As I write code, I am usually more concerned with getting it to work at all rather than getting it to work fast for every possible scenario. So, if something is running slow for you, but your computer is otherwise working fine, let me know and I can almost always speed it up.

Let me know:

A profile is a large block of debug text that lets me know which parts of my code are running slow for you. A profile for a single call looks like this.

It is very helpful to me to have a profile. You can generate one by going help->debug->xxx profile mode, which tells the client to generate profile information for every subsequent xxx request. This can be spammy, so don't leave it on for a very long time (you can turn it off by hitting the help menu entry again).

For most problems, you probably want db profile mode.

Turn on a profile mode, do the thing that runs slow for you (importing a file, fetching some tags, whatever), and then check your database folder (most likely install_dir/db) for a new 'client profile - DATE.log' file. This file will be filled with several sets of tables with timing information. Please send that whole file to me, or if it is too large, cut what seems important. It should not contain any personal information, but feel free to look through it.

There are several ways to contact me.

Advanced Usage

Advanced Usage

Advanced usage: General

this is non-comprehensive

I am always changing and adding little things. The best way to learn is just to look around. If you think a shortcut should probably do something, try it out! If you can't find something, let me know and I'll try to add it!

advanced mode

To avoid confusing clutter, several advanced menu items and buttons are hidden by default. When you are comfortable with the program, hit help->advanced mode to reveal them!

searching with wildcards

The autocomplete tag dropdown supports wildcard searching with '*'.

The '*' will match any number of characters. Every normal autocomplete search has a secret '*' on the end that you don't see, which is how full words get matched from you only typing in a few letters.

This is useful when you can only remember part of a word, or can't spell part of it. You can put '*' characters anywhere, but you should experiment to get used to the exact way these searches work. Some results can be surprising!
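Under the hood this is plain glob-style matching. Here is a sketch using Python's fnmatch, with a made-up tag list--the real client searches its database, not a list, but the matching semantics are the same idea:

```python
from fnmatch import fnmatchcase

tags = ['character:rei ayanami', 'series:neon genesis evangelion',
        'series:evangelion', 'page:1']

def wildcard_search(query, tag_list):
    """Match tags against a wildcard query. Like the autocomplete,
    a trailing '*' is implied if the query does not end with one."""
    if not query.endswith('*'):
        query += '*'
    return [t for t in tag_list if fnmatchcase(t, query)]

wildcard_search('series:evan', tags)  # prefix match via the implied '*'
wildcard_search('*gelion', tags)      # explicit leading wildcard
```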

You can select the special predicate inserted at the top of your autocomplete results (the highlighted '*gelion' and '*va*ge*' above). It will return all files that match that wildcard, i.e. every file for every other tag in the dropdown list.

This is particularly useful if you have a number of files with commonly structured over-informationed tags, like this:

In this case, selecting the 'title:cool pic*' predicate will return all three images in the same search, where you can conveniently give them some more-easily searched tags like 'series:cool pic' and 'page:1', 'page:2', 'page:3'.

exclude deleted files

In the client's options is a checkbox to exclude deleted files. It recurs pretty much anywhere you can import, under 'import file options'. If you select this, any file you ever deleted will be excluded from all future remote searches and import operations. This can stop you from importing/downloading and filtering out the same bad files several times over. The default is off. You may wish to have it set one way most of the time, but switch it the other just for one specific import or search.

inputting non-english languages

If you typically use an IME to input Japanese or another non-English language, you may have encountered problems with the autocomplete tag entry control: you need Up/Down/Enter to navigate the IME, but the autocomplete steals those key presses to navigate its list of results. To fix this, press Insert to temporarily disable the autocomplete's key event capture. The text box will change colour to let you know it has released its normal key capture. Use your IME to get the text you want, then hit Insert again to restore the autocomplete to normal behaviour.

tag display

If you do not like a particular tag or namespace, you can easily hide it with services->manage tag display:

This image is out of date, sorry!

You can exclude single tags, like as shown above, or entire namespaces (enter the colon, like 'species:'), or all namespaced tags (use ':'), or all unnamespaced tags (''). 'all known tags' will be applied to everything, as well as any repository-specific rules you set.

A blacklist excludes whatever is listed; a whitelist excludes whatever is not listed.

This censorship is local to your client. No one else will experience your changes or know what you have censored.

importing and adding tags at the same time

Add tags before importing on file->import files lets you give tags to the files you import en masse, and intelligently, using regexes that parse filenames:

This should be somewhat self-explanatory to anyone familiar with regexes. I hate them, personally, but I recognise they are powerful and exactly the right tool to use in this case. This is a good introduction.
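As a sketch of the idea, here is a regex pulling 'title' and 'page' tags out of a 'title - page' style filename. The pattern, namespaces, and function name are just an example, not what the dialog does internally:

```python
import re

def tags_from_filename(filename):
    """Parse a 'title - page.ext' filename into namespaced tags,
    e.g. 'cool pic - 03.jpg' -> title and page tags."""
    m = re.match(r'(?P<title>.+?) - (?P<page>\d+)\.\w+$', filename)
    if m is None:
        return []  # filename does not fit the pattern; no tags
    return [f"title:{m.group('title')}",
            f"page:{int(m.group('page'))}"]
```

A well-formatted volume of pages runs through a pattern like this in one import, which is where the hours of saved tagging come from.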

Once you are done, you'll get something neat like this:

Which you can more easily manage by collecting:

Collections have a small icon in the bottom left corner. Selecting them actually selects many files (see the status bar), and performing an action on them (like archiving, uploading) will do so to every file in the collection. Viewing collections fullscreen pages through their contents just like an uncollected search.

Here is a particularly zoomed out view, after importing volume 2:

Importing with tags is great for long-running series with well-formatted filenames, and will save you literally hours of finicky tagging.

tag migration

At some point I will write some better help for this system, which is powerful. Be careful with it!

Sometimes, you may wish to move thousands or millions of tags from one place to another. These actions are now collected in one place: services->tag migration.

It proceeds from left to right, reading data from the source and applying it to the destination with the chosen action. There are multiple filters available to select which sorts of tag mappings or siblings or parents will be selected from the source. The source and destination can be the same: if you wanted to delete all 'clothing:' tags from a service, you would pull all those tags and then apply the 'delete' action on the same service.

You can import from and export to Hydrus Tag Archives (HTAs), which are external, portable .db files. In this way, you can move millions of tags between two hydrus clients, or share with a friend, or import from an HTA put together from a website scrape.

Tag Migration is a powerful system. Be very careful with it. Do small experiments before starting large jobs, and if you intend to migrate millions of tags, make a backup of your db beforehand, just in case it goes wrong.

This system was once much simpler, but even then it had HTA support. If you wish to play around with some HTAs, there are some old user-created ones here.

custom shortcuts

Once you are comfortable with manually setting tags and ratings, you may be interested in setting some shortcuts to do it quicker. Try hitting file->shortcuts or clicking the keyboard icon on any media viewer window's top hover window.

There are two kinds of shortcuts in the program--reserved, which have fixed names, are undeletable, and are always active in certain contexts (related to their name), and custom, which you create and name and edit and are only active in a media viewer when you want them to. You can redefine some simple shortcut commands, but most importantly, you can create shortcuts for adding/removing a tag or setting/unsetting a rating.

Use the same 'keyboard' icon to set the current and default custom shortcuts.

finding duplicates

system:similar_to lets you run the duplicates processing page's searches manually. You can either insert the hash and hamming distance manually, or you can launch these searches automatically from the thumbnail right-click->find similar files menu. For example:

truncated/malformed file import errors

Some files, even though they seem ok in another program, will not import to hydrus. This is usually because the file has some 'truncated' or broken data, probably due to a bad upload or storage at some point in its internet history. While sophisticated external programs can usually patch the error (often rendering the bottom lines of a jpeg as grey, for instance), hydrus is not so clever. Please feel free to send or link me, hydrus developer, to these files, so I can check them out on my end and try to fix support.

If the file is one you particularly care about, the easiest solution is to open it in photoshop or gimp and save it again. Those programs should be clever enough to parse the file's weirdness, and then make a nice clean saved file when it exports. That new file should be importable to hydrus.

setting a password

The client offers a very simple password system, enough to keep out noobs. You can set it at database->set a password. It will thereafter ask for the password every time you start the program, and will not open without it. However, none of the database is encrypted, and someone with enough enthusiasm, or a tool and access to your computer, can still very easily see what files you have. The password is mainly to stop idle snoops checking your images if you are away from your machine.

Advanced Usage

Advanced usage: Tag Siblings

quick version

Tag siblings let you replace a bad tag with a better tag.

what's the problem?

Reasonable people often use different words for the same things.

A great example is in Japanese names, which are natively written surname first. character:ayanami rei and character:rei ayanami have the same meaning, but different users will use one, or the other, or even both.

Other examples are tiny syntactic changes, common misspellings, and unique acronyms:

A particular repository may have a preferred standard, but it is not easy to guarantee that all the users will know exactly which tag to upload or search for.

After some time, you get this:

Without continual intervention by janitors or other experienced users to make sure y⊇x (i.e. making the yellow circle entirely overlap the blue by manually giving y to everything with x), searches can only return x (blue circle) or y (yellow circle) or x∩y (the lens-shaped overlap). What we really want is x∪y (both circles).
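In plain set terms, with some hypothetical file ids standing in for the circles:

```python
x = {1, 2, 3, 4}   # files tagged 'character:rei ayanami'
y = {3, 4, 5, 6}   # files tagged 'character:ayanami rei'

union = x | y      # what we really want: every file with either tag
overlap = x & y    # what searching for both tags at once returns
```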

So, how do we fix this problem?

tag siblings

Let's define a relationship, A->B, that means that any time we would normally see or use tag A or tag B, we will instead only get tag B:

Note that this relationship implies that B is in some way 'better' than A.

ok, I understand; now confuse me

This relationship is transitive, which means that as well as saying A->B, you can also say B->C, which implies A->C.

You can also have an A->C and B->C that does not include A->B.

The outcome of these two arrangements is the same (everything ends up as C), but the underlying semantics are a little different if you ever want to edit them.

Many complicated arrangements are possible:

Note that if you say A->B, you cannot say A->C; the left-hand side can only go to one. The right-hand side can receive many. The client will stop you from constructing loops.
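A sketch of the collapse, with the one-left-to-one-right constraint expressed as a simple dict (each replaced tag maps to exactly one replacement)--this is an illustration, not the client's internals:

```python
def resolve_sibling(tag, siblings):
    """Follow A->B->C chains to the final ideal tag. 'siblings' maps
    each replaced tag to its single replacement; a dict key can only
    appear once, which is the 'left-hand side goes to one' rule."""
    seen = set()
    while tag in siblings:
        if tag in seen:
            raise ValueError('sibling loop detected')
        seen.add(tag)
        tag = siblings[tag]
    return tag

siblings = {'character:ayanami rei': 'character:rei ayanami'}
```

Whatever tag you start with, you always end up at the final right-hand side, and the loop check is the code equivalent of the client refusing to let you construct loops.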

how you do it

Just open services->manage tag siblings, and add a few.

The client will automatically collapse the tagspace to whatever you set. It'll even work with autocomplete, like so:

Please note that siblings' autocomplete counts may be slightly inaccurate, as unioning the count is difficult to quickly estimate.

The client will not collapse siblings anywhere you 'write' tags, such as the manage tags dialog. You will be able to add or remove A as normal, but it will be written in some form of "A (B)" to let you know that, ultimately, the tag will end up displaying in the main gui as B:

Although the client may present A as B, it will secretly remember A! You can remove the association A->B, and everything will return to how it was. No information is lost at any point.

remote siblings

Whenever you add or remove a tag sibling pair to a tag repository, you will have to supply a reason (like when you petition a tag). A janitor will review this petition, and will approve or deny it. If it is approved, all users who synchronise with that tag repository will gain that sibling pair. If it is denied, only you will see it.

Advanced Usage

Advanced usage: Tag Parents

quick version

Tag parents let you automatically add a particular tag every time another tag is added. The relationship will also apply retroactively.

what's the problem?

Tags often fall into certain hierarchies. Certain tags always imply certain other tags, and it is annoying and time-consuming to add them all individually every time.

For example, whenever you tag a file with ak-47, you probably also want to tag it assault rifle, and maybe even firearm as well.

Another time, you might tag a file character:eddard stark, and then also have to type in house stark and then series:game of thrones. (you might also think series:game of thrones should actually be series:a song of ice and fire, but that is an issue for siblings)

Drawing more relationships would make a significantly more complicated venn diagram, so let's draw a family tree instead:

tag parents

Let's define the child-parent relationship 'C->P' as saying that tag P is the semantic superset/superclass of tag C. All files that have C should also have P, without exception. When the user tries to add tag C to a file, tag P is added automatically.

Let's expand our weapon example:

In that graph, adding ar-15 to a file would also add semi-automatic rifle, rifle, and firearm. Searching for handgun would return everything with m1911 and smith and wesson model 10.
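As an illustration only (not the client's implementation), the retroactive lookup amounts to a simple graph walk that collects every ancestor of a tag:

```python
def ancestors(tag, parents):
    """Collect every parent, grandparent, etc. for a tag.
    parents maps a child tag to a list of its direct parent tags."""
    found = set()
    stack = [tag]
    while stack:
        for parent in parents.get(stack.pop(), []):
            if parent not in found:
                found.add(parent)
                stack.append(parent)
    return found

# Hypothetical weapon hierarchy, following the example above.
parents = {
    "ar-15": ["semi-automatic rifle"],
    "semi-automatic rifle": ["rifle"],
    "rifle": ["firearm"],
}
print(ancestors("ar-15", parents))
```

Note a child may have several direct parents (cersei gets both house lannister and series:game of thrones), which is why the mapping goes to a list.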

This can obviously get as complicated and autistic as you like, but be careful of being too confident--this is just a fun example, but is an AK-47 truly always an assault rifle? Some people would say no, and beyond its own intellectual neatness, what is the purpose of attempting to create such a complicated and 'perfect' tree? Of course you can create any sort of parent tags on your local tags or your own tag repositories, but this sort of thing can easily lead to arguments between reasonable people. I only mean to say, as someone who does a lot of tag work, to try not to create anything 'perfect', as it usually ends up wasting time. Act from need, not toward purpose.

how you do it

Go to services->manage tag parents:

Which looks and works just like the manage tag siblings dialog.

Note that when you hit ok, the client will look up all the files with all your added tag Cs and retroactively apply/pend the respective tag Ps if needed. This could mean thousands of tags!

Once you have some relationships added, the parents and grandparents will show indented anywhere you 'write' tags, such as the manage tags dialog:

Hitting enter on cersei will try to add house lannister and series:game of thrones as well.

remote parents

Whenever you add a tag parent pair to or remove one from a tag repository, you will have to supply a reason (like when you petition a tag). A janitor will review this petition, and will approve or deny it. If it is approved, all users who synchronise with that tag repository will gain that parent pair. If it is denied, only you will see it.

Database Migration

the hydrus database

A hydrus client consists of three components:

  1. the software installation

    This is the part that comes with the installer or extract release, with the executable and dlls and a handful of resource folders. It doesn't store any of your settings--it just knows how to present a database as a nice application. If you just run the client executable straight, it looks in its 'db' subdirectory for a database, and if one is not found, it creates a new one. If it sees a database running at a lower version than itself, it will update the database before booting it.

    It doesn't really matter where you put this. An SSD will load it marginally quicker the first time, but you probably won't notice. If you run it without command-line parameters, it will try to write to its own directory (to create the initial database), so if you mean to run it like that, it should not be in a protected place like Program Files.

  2. the actual database

    The client stores all its preferences and current state and knowledge about files--like file size and resolution, tags, ratings, inbox status, and so on and so on--in a handful of SQLite database files, defaulting to install_dir/db. Depending on the size of your client, these might total 1MB in size or be as much as 10GB.

    In order to perform a search or to fetch or process tags, the client has to interact with these files in many small bursts, which means it is best if these files are on a drive with low latency. An SSD is ideal, but a regularly-defragged HDD with a reasonable amount of free space also works well.

  3. your media files

    All of your jpegs and webms and so on (and their thumbnails) are stored in a single complicated directory that is by default at install_dir/db/client_files. All the files are named by their hash and stored in efficient hash-based subdirectories. In general, it is not navigable by humans, but it works very well for the fast access to a giant pool of files that the client needs in order to manage your media.

    Thumbnails tend to be fetched dozens at a time, so it is, again, ideal if they are stored on an SSD. Your regular media files--which on many clients total hundreds of GB--are usually fetched one at a time for human consumption and do not benefit from the expensive low latency of an SSD. They are best stored on a cheap HDD, and, if desired, also work well across a network file system.

these components can be put on different drives

Although an initial install will keep these parts together, it is possible to, say, run the database on a fast drive but keep your media in cheap slow storage. This is an excellent arrangement that works for many users. And if you have a very large collection, you can even spread your files across multiple drives. It is not very technically difficult, but I do not recommend it for new users.

Backing such an arrangement up is obviously more complicated, and the internal client backup is not sophisticated enough to capture everything, so I recommend you figure out a broader solution with a third-party backup program like FreeFileSync.

pulling your media apart

As always, I recommend creating a backup before you try any of this, just in case it goes wrong.

If you would like to move your files and thumbnails to new locations, I generally recommend you not move their folders around yourself--the database has an internal knowledge of where it thinks its file and thumbnail folders are, and if you move them while it is closed, it will become confused and you will have to manually relocate what is missing on the next boot via a repair dialog. This is not impossible to figure out, but if the program's 'client files' folder confuses you at all, I'd recommend you stay away. Instead, you can simply do it through the gui:

Go database->migrate database, giving you this dialog:

This is an image from my old laptop's client. At that time, I had moved the main database and its files out of the install directory but otherwise kept everything together. Your situation may be simpler or more complicated.

To move your files somewhere else, add the new location, empty/remove the old location, and then click 'move files now'.

Portable means that the path is beneath the main db dir and so is stored as a relative path. Portable paths will still function if the database changes location between boots (for instance, if you run the client from a USB drive and it mounts under a different location).

Weight means the relative amount of media you would like to store in that location. It only matters if you are spreading your files across multiple locations. If location A has a weight of 1 and B has a weight of 2, A will get approximately one third of your files and B will get approximately two thirds.
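The weight arithmetic is just a normalisation. A quick sketch:

```python
def expected_share(weights):
    """Convert per-location weights into approximate fractions of the
    total media that each location will receive."""
    total = sum(weights.values())
    return {location: weight / total for location, weight in weights.items()}

# Location A with weight 1 and B with weight 2, as in the example above.
shares = expected_share({"A": 1, "B": 2})
print(shares)  # A gets ~1/3, B gets ~2/3
```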

The operations on this dialog are simple and atomic--at no point is your db ever invalid. Once you have the locations and ideal usage set how you like, hit the 'move files now' button to actually shuffle your files around. It will take some time to finish, but you can pause and resume it later if the job is large or you want to undo or alter something.

If you decide to move your actual database, the program will have to shut down first. Before you boot up again, you will have to create a new program shortcut:

informing the software that the database is not in the default location

A straight call to the client executable will look for a database in install_dir/db. If one is not found, it will create one. So, if you move your database and then try to run the client again, it will try to create a new empty database in the previous location!

So, pass it a -d or --db_dir command line argument, like so:

And it will instead use the given path. If no database is found, it will similarly create a new empty one at that location. You can use any path that is valid in your system, but I would not advise using network locations and so on, as the database relies on some clever device locking calls that these interfaces may not provide.

Rather than typing the path out in a terminal every time you want to launch your external database, create a new shortcut with the argument in. Something like this, which is from my main development computer and tests that a fresh default install will run an existing database ok:

Note that an install with an 'external' database no longer needs access to write to its own path, so you can store it anywhere you like, including protected read-only locations (e.g. in 'Program Files'). If you do move it, just double-check your shortcuts are still good and you are done.

finally

If your database now lives in one or more new locations, make sure to update your backup routine to follow them!

moving to an SSD

As an example, let's say you started using the hydrus client on your HDD, and now you have an SSD available and would like to move your thumbnails and main install to that SSD to speed up the client. Your database will be valid and functional at every stage of this, and it can all be undone. The basic steps are:

  1. Move your 'fast' files to the fast location.
  2. Move your 'slow' files out of the main install directory.
  3. Move the install and db itself to the fast location and update shortcuts.

Specifically:

You should now have something like this:

p.s. running multiple clients

Since you now know how to tell the software about an external database, you can, if you like, run multiple clients from the same install (and if you previously had multiple install folders, you can now just use the one). Just make multiple shortcuts to the same client executable but with different database directories. They can run at the same time. You'll save yourself a little memory and update-hassle. I do this on my laptop client to run a regular client for my media and a separate 'admin' client to do PTR petitions and so on.

Program Launch Arguments

launch arguments

You can launch the program with several different arguments to alter core behaviour. If you are not familiar with this, you are essentially putting additional text after the launch command that runs the program. You can run this straight from a terminal console (usually good to test with), or you can bundle it into an easy shortcut that you only have to double-click. An example of a launch command with arguments:

C:\Hydrus Network\client.exe -d="E:\hydrus db" --no_db_temp_files

You can also add --help to your program path, like this:

client.py --help
server.exe --help
./server --help

Which gives you a full listing of all the arguments below. However, this will not work with the built client executables, which are bundled as non-console programs and will not give you text results in any console they are launched from. As client.exe is the most commonly run version of the program, here is the list, with some more help about each command:

The server supports the same arguments. It also takes a positional argument of 'start' (start the server, the default), 'stop' (stop any existing server), or 'restart' (do a stop, then a start), which should go before any of the above arguments.
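The client is written in Python, so as a rough illustration, the flags seen above could be parsed with argparse like this (only the flags mentioned in this section are shown; the real programs accept many more):

```python
import argparse

# Toy parser mirroring the launch arguments discussed above.
parser = argparse.ArgumentParser()
parser.add_argument("action", nargs="?", default="start",
                    choices=["start", "stop", "restart"])  # server only
parser.add_argument("-d", "--db_dir", help="external database directory")
parser.add_argument("--no_db_temp_files", action="store_true")

args = parser.parse_args(["start", "-d", r"E:\hydrus db", "--no_db_temp_files"])
print(args.db_dir)
```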

Client API

client api

The hydrus client now supports a very simple API so you can access it with external programs.

By default, the Client API is not turned on. Go to services->manage services and give it a port to get it started. I recommend you not allow non-local connections (i.e. only requests from the same computer will work) to start with.

The Client API should start immediately. It will only be active while the client is open. To test that it is running correctly (and assuming you used the default port of 45869), try loading this:

http://127.0.0.1:45869

You should get a welcome page. By default, the Client API is HTTP, which means it is ok for communication on the same computer or across your home network (e.g. your computer's web browser talking to your computer's hydrus), but not secure for transmission across the internet (e.g. your phone to your home computer). You can turn on HTTPS, but due to technical complexities it will give itself a self-signed 'certificate', so the security is good but imperfect, and whatever is talking to it (e.g. your web browser looking at https://127.0.0.1:45869) may need to add an exception.

The Client API is still experimental and sometimes not user friendly. If you want to talk to your home computer across the internet, you will need some networking experience. You'll need a static IP or reverse proxy service or dynamic domain solution like no-ip.org so your device can locate it, and potentially port-forwarding on your router to expose the port. If you have a way of hosting a domain and have a signed certificate (e.g. from Let's Encrypt), you can overwrite the client.crt and client.key files in your 'db' directory and HTTPS hydrus should host with those.

Once the API is running, go to its entry in services->review services. Each external program trying to access the API will need its own access key, which is the familiar 64-character hexadecimal used in many places in hydrus. You can enter the details manually from the review services panel and then copy/paste the key to your external program, or the program may have the ability to request its own access while a mini-dialog launched from the review services panel waits to catch the request.

Browsers and tools created by hydrus users:

Library modules created by hydrus users:

API

On 200 OK, the API returns JSON for everything except actual file/thumbnail requests. On 4XX and 5XX, assume it will return plain text, sometimes a raw traceback. You'll typically get 400 for a missing parameter, 401/403/419 for missing/insufficient/expired access, and 500 for a real deal serverside error.

Access and permissions

The client gives access to its API through different 'access keys', which are the typical 64-character hex used in many other places across hydrus. Each grants different permissions, such as handling files or tags. Most of the time, a user will provide full access, but do not assume this. If the access header or parameter is not provided, you will get 401, and all insufficient permission problems will return 403 with appropriate error text.

Access is required for every request. You can provide this as an http header, like so:

Or you can include it as a GET or POST parameter on any request (except POST /add_files/add_file, which uses the entire POST body for the file's bytes). Use the same name for your GET or POST argument, such as:
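For illustration, here is how the two styles might be built in Python. The header/parameter name Hydrus-Client-API-Access-Key is taken as given (the session variant below simply swaps 'Access' for 'Session'), and the key itself is a made-up example; nothing here actually contacts a client:

```python
import urllib.parse

# Example (fake) 64-character hexadecimal access key.
access_key = "0150d9c4f6a6d2082534a997f4588dcf0c56dffe1d03ffbf98472236112236ae"

# As an HTTP header:
headers = {"Hydrus-Client-API-Access-Key": access_key}

# Or as a GET parameter with the same name:
params = urllib.parse.urlencode({"Hydrus-Client-API-Access-Key": access_key})
url = "http://127.0.0.1:45869/verify_access_key?" + params
print(url)
```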

There is now a simple 'session' system, where you can get a temporary key that gives the same access without having to include the permanent access key in every request. You can fetch a session key with the /session_key command and thereafter use it just as you would an access key, just with Hydrus-Client-API-Session-Key instead.

Session keys will expire if they are not used within 24 hours, or if the client is restarted, or if the underlying access key is deleted. An invalid/expired session key will give a 419 result with an appropriate error text.

Bear in mind the Client API is still under construction and is http-only for the moment--be careful about transmitting sensitive content outside of localhost. The access key will be unencrypted across any connection, and if it is included as a GET parameter, as simple and convenient as that is, it could be cached in all sorts of places.

Access Management

GET /api_version

Gets the current API version. I will increment this every time I alter the API.

GET /request_new_permissions

Register a new external program with the client. This requires the 'add from api request' mini-dialog under services->review services to be open, otherwise it will 403.

GET /session_key

Get a new session key.

GET /verify_access_key

Check your access key is valid.

Adding Files

POST /add_files/add_file

Tell the client to import a file.

  • Restricted access: YES. Import Files permission needed.

  • Required Headers:

    • Content-Type : application/json (if sending path), application/octet-stream (if sending file)
  • Arguments (in JSON):

path : (the path you want to import)

POST /add_files/delete_files

Tell the client to send files to the trash.

POST /add_files/undelete_files

Tell the client to pull files back out of the trash.

POST /add_files/archive_files

Tell the client to archive inboxed files.

POST /add_files/unarchive_files

Tell the client to re-inbox archived files.

Adding Tags

GET /add_tags/clean_tags

Ask the client about how it will see certain tags.

GET /add_tags/get_tag_services

Ask the client about its tag services.

POST /add_tags/add_tags

Make changes to the tags that files have.

  • Restricted access: YES. Add Tags permission needed.

  • Required Headers: n/a

  • Arguments (in JSON):

    • hash : (an SHA256 hash for a file in 64 characters of hexadecimal)
    • hashes : (a list of SHA256 hashes)
    • service_names_to_tags : (an Object of service names to lists of tags to be 'added' to the files)
    • service_names_to_actions_to_tags : (an Object of service names to content update actions to lists of tags)
    • add_siblings_and_parents : obsolete, now does nothing

You can use either 'hash' or 'hashes', and you can use either the simple add-only 'service_names_to_tags' or the advanced 'service_names_to_actions_to_tags'.

The service names are as in the /add_tags/get_tag_services call.

The permitted 'actions' are:

    • 0 - Add to a local tag service.
    • 1 - Delete from a local tag service.
    • 2 - Pend to a tag repository.
    • 3 - Rescind a pend from a tag repository.
    • 4 - Petition from a tag repository. (This is special)
    • 5 - Rescind a petition from a tag repository.

When you petition a tag from a repository, a 'reason' for the petition is typically needed. If you send a normal list of tags here, a default reason of "Petitioned from API" will be given. If you want to set your own reason, you can instead give a list of [ tag, reason ] pairs.

Some example requests:

Adding some tags to a file:

{
	"hash" : "df2a7b286d21329fc496e3aa8b8a08b67bb1747ca32749acb3f5d544cbfc0f56",
	"service_names_to_tags" : {
		"my tags" : [ "character:supergirl", "rating:safe" ]
	}
}

Adding more tags to two files:

{
	"hashes" : [ "df2a7b286d21329fc496e3aa8b8a08b67bb1747ca32749acb3f5d544cbfc0f56", "f2b022214e711e9a11e2fcec71bfd524f10f0be40c250737a7861a5ddd3faebf" ],
	"service_names_to_tags" : {
		"my tags" : [ "process this" ],
		"public tag repository" : [ "creator:dandon fuga" ]
	}
}

A complicated transaction with all possible actions:

{
	"hash" : "df2a7b286d21329fc496e3aa8b8a08b67bb1747ca32749acb3f5d544cbfc0f56",
	"service_names_to_actions_to_tags" : {
		"my tags" : {
			"0" : [ "character:supergirl", "rating:safe" ],
			"1" : [ "character:superman" ]
		},
		"public tag repository" : {
			"2" : [ "character:supergirl", "rating:safe" ],
			"3" : [ "filename:image.jpg" ],
			"4" : [ [ "creator:danban faga", "typo" ], [ "character:super_girl", "underscore" ] ]
			"5" : [ "skirt" ]
		}
	}
}

This last example is far more complicated than you will usually see. Pend rescinds and petition rescinds are not common. Petitions are also quite rare, and gathering a good petition reason for each tag is often a pain.

Note that the enumerated status keys in the service_names_to_actions_to_tags structure are strings, not ints (JSON does not support int keys for Objects).

Response description: 200 and no content.

Note also that hydrus tag actions are safely idempotent. You can pend a tag that is already pended and not worry about an error--it will be discarded. The same for other reasonable logical scenarios: deleting a tag that does not exist will silently make no change, pending a tag that is already 'current' will again be passed over. It is fine to just throw 'process this' tags at every file import you add and not have to worry about checking which files you already added it to.
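For example, a body like the simpler transaction above could be built and serialised like so. The string action keys are worth being explicit about (Python's json.dumps would coerce int keys to strings anyway, but relying on that invites confusion):

```python
import json

# Build an add_tags body with explicit string action keys.
body = {
    "hash": "df2a7b286d21329fc496e3aa8b8a08b67bb1747ca32749acb3f5d544cbfc0f56",
    "service_names_to_actions_to_tags": {
        "my tags": {
            "0": ["character:supergirl", "rating:safe"],  # add
            "1": ["character:superman"],                  # delete
        }
    },
}
payload = json.dumps(body)
print(payload)
```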

Adding URLs

GET /add_urls/get_url_files

Ask the client about a URL's files.

GET /add_urls/get_url_info

Ask the client for information about a URL.

POST /add_urls/add_url

Tell the client to 'import' a URL. This triggers the exact same routine as drag-and-dropping a text URL onto the main client window.

  • Restricted access: YES. Import URLs permission needed. Add Tags needed to include tags.

  • Required Headers:

    • Content-Type : application/json
  • Arguments (in JSON):

    • url : (the url you want to add)
    • destination_page_key : (optional page identifier for the page to receive the url)
    • destination_page_name : (optional page name to receive the url)
    • show_destination_page : (optional, defaulting to false, controls whether the UI will change pages on add)
    • service_names_to_additional_tags : (optional tags to give to any files imported from this url)
    • filterable_tags : (optional tags to be filtered by any tag import options that applies to the URL)
    • service_names_to_tags : (obsolete, legacy synonym for service_names_to_additional_tags)

If you specify a destination_page_name and an appropriate importer page already exists with that name, that page will be used. Otherwise, a new page with that name will be created (and used by subsequent calls with that name). Make sure that page name is unique in your client (e.g. '/b/ threads', not 'watcher'), or it may not be found.

Alternately, destination_page_key defines exactly which page should be used. Bear in mind this page key is only valid to the current session (they are regenerated on client reset or session reload), so you must figure out which one you want using the /manage_pages/get_pages call. If the correct page_key is not found, or the page it corresponds to is of the incorrect type, the standard page selection/creation rules will apply.

show_destination_page defaults to False to reduce flicker when adding many URLs to different pages quickly. If you turn it on, the client will behave like a URL drag and drop and select the final page the URL ends up on.

service_names_to_additional_tags uses the same data structure as for /add_tags/add_tags. You will need 'add tags' permission, or this will 403. These tags work exactly as 'additional' tags work in a tag import options. They are service specific, and always added unless some advanced tag import options checkbox (like 'only add tags to new files') is set.

filterable_tags works like the tags parsed by a hydrus downloader. It is just a list of strings. They have no inherent service and will be sent to a tag import options, if one exists, to decide which tag services get what. This parameter is useful if you are pulling all a URL's tags outside of hydrus and want to have them processed like any other downloader, rather than figuring out service names and namespace filtering on your end. Note that in order for a tag import options to kick in, I think you will have to have a Post URL URL Class set up hydrus-side for the URL so some tag import options (whether that is Class-specific or just the default) can be loaded at import time.

POST /add_urls/associate_url

Manage which URLs the client considers to be associated with which files.

  • Restricted access: YES. Import URLs permission needed.

  • Required Headers:

    • Content-Type : application/json
  • Arguments (in JSON):

    • url_to_add : (a URL you want to associate with the file(s))
    • urls_to_add : (a list of URLs you want to associate with the file(s))
    • url_to_delete : (a URL you want to disassociate from the file(s))
    • urls_to_delete : (a list of URLs you want to disassociate from the file(s))
    • hash : (an SHA256 hash for a file in 64 characters of hexadecimal)
    • hashes : (a list of SHA256 hashes)

All of these are optional, but you obviously need at least one of the 'url' arguments and one of the 'hash' arguments. The single/multiple arguments work the same--just use whatever is convenient for you. Unless you really know what you are doing with URL Classes, I strongly recommend you stick to associating URLs with just one single 'hash' at a time. Multiple hashes pointing to the same URL is unusual and frequently unhelpful.

Managing Cookies

This refers to the cookies held in the client's session manager, which are sent with network requests to different domains.

GET /manage_cookies/get_cookies

Get the cookies for a particular domain.

  • Restricted access: YES. Manage Cookies permission needed.

  • Required Headers: n/a

  • Arguments: domain

  • Example request (for gelbooru.com):

    • /manage_cookies/get_cookies?domain=gelbooru.com

Response description: A JSON Object listing all the cookies for that domain in [ name, value, domain, path, expires ] format.

  • Example response:

    • {
      	"cookies" : [
      		[ "__cfduid", "f1bef65041e54e93110a883360bc7e71", ".gelbooru.com", "/", 1596223327 ],
      		[ "pass_hash", "0b0833b797f108e340b315bc5463c324", "gelbooru.com", "/", 1585855361 ],
      		[ "user_id", "123456", "gelbooru.com", "/", 1585855361 ]
      	]
      }

Note that these variables are all strings except 'expires', which is either an integer timestamp or null for session cookies.

This request will also return any cookies for subdomains. The session system in hydrus generally stores cookies according to the second-level domain, so if you request cookies for specific.someoverbooru.net, you will still get the cookies for someoverbooru.net and all its subdomains.
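As a small sketch, the [ name, value, domain, path, expires ] rows are easy to reshape if your external tool only needs name/value pairs (ignoring domain and path matching, which a real consumer may need to respect):

```python
# Rows in the same format as the get_cookies example response above.
rows = [
    ["__cfduid", "f1bef65041e54e93110a883360bc7e71", ".gelbooru.com", "/", 1596223327],
    ["pass_hash", "0b0833b797f108e340b315bc5463c324", "gelbooru.com", "/", 1585855361],
    ["user_id", "123456", "gelbooru.com", "/", 1585855361],
]

# Collapse to a simple name -> value lookup.
cookies = {name: value for name, value, domain, path, expires in rows}
print(cookies["user_id"])
```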

POST /manage_cookies/set_cookies

Set some new cookies for the client. This makes it easier to 'copy' a login from a web browser or similar to hydrus if hydrus's login system can't handle the site yet.

  • Restricted access: YES. Manage Cookies permission needed.

  • Required Headers:

    • Content-Type : application/json
  • Arguments (in JSON):

    • cookies : (a list of cookie rows in the same format as the GET request above)
  • Example request body:

    • {
      	"cookies" : [
      		[ "PHPSESSID", "07669eb2a1a6e840e498bb6e0799f3fb", ".somesite.com", "/", 1627327719 ],
      		[ "tag_filter", "1", ".somesite.com", "/", 1627327719 ]
      	]
      }

You can set 'value' to be null, which will clear any existing cookie with the corresponding name, domain, and path (acting essentially as a delete).

Expires can be null, but session cookies will time out in hydrus after 60 minutes of non-use.

Managing Pages

This refers to the pages of the main client UI.

GET /manage_pages/get_pages

Get the page structure of the current UI session.

  • Restricted access: YES. Manage Pages permission needed.

  • Required Headers: n/a

  • Arguments: n/a

Response description: A JSON Object of the top-level page 'notebook' (page of pages) detailing its basic information and current sub-pages. Pages of pages beneath it will list their own sub-pages in the same way.

  • Example response:

    • {
      	"pages" : {
      		"name" : "top pages notebook",
      		"page_key" : "3b28d8a59ec61834325eb6275d9df012860a1ecfd9e1246423059bc47fb6d5bd",
      		"page_type" : 10,
      		"selected" : true,
      		"pages" : [
      			{
      				"name" : "files",
      				"page_key" : "d436ff5109215199913705eb9a7669d8a6b67c52e41c3b42904db083255ca84d",
      				"page_type" : 6,
      				"selected" : false
      			},
      			{
      				"name" : "thread watcher",
      				"page_key" : "40887fa327edca01e1d69b533dddba4681b2c43e0b4ebee0576177852e8c32e7",
      				"page_type" : 9,
      				"selected" : false
      			},
      			{
      				"name" : "pages",
      				"page_key" : "2ee7fa4058e1e23f2bd9e915cdf9347ae90902a8622d6559ba019a83a785c4dc",
      				"page_type" : 10,
      				"selected" : true,
      				"pages" : [
      					{
      						"name" : "urls",
      						"page_key" : "9fe22cb760d9ee6de32575ed9f27b76b4c215179cf843d3f9044efeeca98411f",
      						"page_type" : 7,
      						"selected" : true
      					},
      					{
      						"name" : "files",
      						"page_key" : "2977d57fc9c588be783727bcd54225d577b44e8aa2f91e365a3eb3c3f580dc4e",
      						"page_type" : 6,
      						"selected" : false
      					}
      				]
      			}	
      		]
      	}
      }

The page types are as follows:

The top page of pages will always be there, and always selected. 'selected' means the page is currently in view; the flag propagates down through nested pages of pages until it terminates. It may terminate in an empty page of pages, so do not assume it will end on a 'media' page.

The 'page_key' is a unique identifier for the page. It will stay the same for a particular page throughout the session, but new ones are generated on a client restart or other session reload.
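Following the 'selected' flags down the tree might look like this (illustrative only, operating on a hand-made structure in the same shape as the example response):

```python
def selected_leaf(page):
    """Follow the 'selected' flags down from the top notebook to the page
    currently in view. May stop at an empty page of pages."""
    while "pages" in page:
        selected = [sub for sub in page["pages"] if sub["selected"]]
        if not selected:
            return page  # an empty (or fully unselected) page of pages
        page = selected[0]
    return page

session = {
    "name": "top pages notebook", "page_type": 10, "selected": True,
    "pages": [
        {"name": "files", "page_type": 6, "selected": False},
        {"name": "pages", "page_type": 10, "selected": True,
         "pages": [{"name": "urls", "page_type": 7, "selected": True}]},
    ],
}
print(selected_leaf(session)["name"])  # urls
```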

GET /manage_pages/get_page_info

Get information about a specific page.

This is under construction. The current call dumps a ton of info for different downloader pages. Please experiment in IRL situations and give feedback for now! I will flesh out this help with more enumeration info and examples as this gets nailed down. POST commands to alter pages (adding, removing, highlighting), will come later.

  • Restricted access: YES. Manage Pages permission needed.

  • Required Headers: n/a

  • Arguments:

    • page_key : (hexadecimal page_key as stated in /manage_pages/get_pages)
    • simple : true or false (optional, defaulting to true)
  • Example request:

    • /manage_pages/get_page_info?page_key=aebbf4b594e6986bddf1eeb0b5846a1e6bc4e07088e517aff166f1aeb1c3c9da&simple=true

Response description: A JSON Object of the page's information. At present, this mostly means downloader information.

POST /manage_pages/focus_page

'Show' a page in the main GUI, making it the current page in view. If it is already the current page, no change is made.

  • Restricted access: YES. Manage Pages permission needed.

  • Required Headers:

    • Content-Type : application/json
  • Arguments (in JSON):

    • page_key : (the page key for the page you wish to show)

The page key is the same as fetched in the /manage_pages/get_pages call.

Searching Files

File search in hydrus is not paginated like a booru--all searches return all results in one go. In order to keep this fast, search is split into two steps--fetching file identifiers with a search, and then fetching file metadata in batches. You may have noticed that the client itself performs searches like this--thinking a bit about a search and then bundling results in batches of 256 files before eventually throwing all the thumbnails on screen.
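A sketch of the client-side half of that flow, assuming you already have the id list from a search: chunk the ids before fetching metadata. The 256 mirrors the client's own internal batch size mentioned above; the API itself does not require any particular size:

```python
def batched(file_ids, size=256):
    """Split a big id list into batches suitable for repeated
    /get_files/file_metadata calls."""
    return [file_ids[i:i + size] for i in range(0, len(file_ids), size)]

batches = batched(list(range(600)))
print([len(b) for b in batches])  # [256, 256, 88]
```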

GET /get_files/search_files

Search for the client's files.

  • Restricted access: YES. Search for Files permission needed. Additional search permission limits may apply.

  • Required Headers: n/a

  • Arguments (in percent-encoded JSON):

    • tags : (a list of tags you wish to search for)
    • system_inbox : true or false (optional, defaulting to false)
    • system_archive : true or false (optional, defaulting to false)
  • Example request for all files in the inbox with tags "blue eyes", "blonde hair", and "кино":

    • /get_files/search_files?system_inbox=true&tags=%5B%22blue%20eyes%22%2C%20%22blonde%20hair%22%2C%20%22%5Cu043a%5Cu0438%5Cu043d%5Cu043e%22%5D

If the access key's permissions only permit search for certain tags, at least one whitelisted/non-blacklisted tag must be in the "tags" list or this will 403. Tags can be prepended with a hyphen to make a negated tag (e.g. "-green eyes"), but these will not be eligible for the permissions whitelist check.
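The percent-encoded JSON is easy to produce with the standard library; this sketch reproduces the example request above:

```python
import json
from urllib.parse import quote

def build_search_query(tags, system_inbox=False, system_archive=False):
    # the tags argument goes over the wire as percent-encoded JSON
    params = ['tags=' + quote(json.dumps(tags))]
    if system_inbox:
        params.insert(0, 'system_inbox=true')
    if system_archive:
        params.append('system_archive=true')
    return '/get_files/search_files?' + '&'.join(params)

build_search_query(['blue eyes', 'blonde hair', 'кино'], system_inbox=True)
# → '/get_files/search_files?system_inbox=true&tags=%5B%22blue%20eyes%22%2C%20%22blonde%20hair%22%2C%20%22%5Cu043a%5Cu0438%5Cu043d%5Cu043e%22%5D'
```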

Response description: The full list of numerical file ids that match the search.

  • Example response:

    • {
      	"file_ids" : [ 125462, 4852415, 123, 591415 ]
      }

File ids are internal and specific to an individual client. For a client, a file with hash H always has the same file id N, but two clients will have different ideas about which N goes with which H. They are a bit faster than hashes to retrieve and search with en masse, which is why they are exposed here.

The search will be performed on the 'local files' file domain and 'all known tags' tag domain. Currently, the results are sorted in import time order, newest to oldest (which is stable if you would like to paginate them before fetching metadata), but sort options will expand in future.

Note that most clients will have an invisible system:limit of 10,000 files on all queries. I expect to add more system predicates to help searching for untagged files, but it is tricky to fetch all files under any circumstance. Large queries may take several seconds to respond.
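Since the full id list comes back in one go, pagination is up to you. A minimal sketch, mirroring the client's own metadata batches of 256:

```python
def batched(file_ids, batch_size=256):
    # yield successive slices of at most batch_size ids,
    # ready to be passed to /get_files/file_metadata
    for i in range(0, len(file_ids), batch_size):
        yield file_ids[i:i + batch_size]

# e.g. a 600-result search becomes batches of 256, 256, and 88 ids
```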

GET /get_files/file_metadata

Get metadata about files in the client.

  • Restricted access: YES. Search for Files permission needed. Additional search permission limits may apply.

  • Required Headers: n/a

  • Arguments (in percent-encoded JSON):

    • file_ids : (a list of numerical file ids)
    • hashes : (a list of hexadecimal SHA256 hashes)
    • only_return_identifiers : true or false (optional, defaulting to false)
    • detailed_url_information : true or false (optional, defaulting to false)

You need one of file_ids or hashes. If your access key is restricted by tag, you cannot search by hashes, and the file_ids you search for must have been in the most recent search result.

  • Example request for two files with ids 123 and 4567:

    • /get_files/file_metadata?file_ids=%5B123%2C%204567%5D

  • The same, but only wants hashes back:

    • /get_files/file_metadata?file_ids=%5B123%2C%204567%5D&only_return_identifiers=true

  • And one that fetches two hashes, 4c77267f93415de0bc33b7725b8c331a809a924084bee03ab2f5fae1c6019eb2 and 3e7cb9044fe81bda0d7a84b5cb781cba4e255e4871cba6ae8ecd8207850d5b82:

    • /get_files/file_metadata?hashes=%5B%224c77267f93415de0bc33b7725b8c331a809a924084bee03ab2f5fae1c6019eb2%22%2C%20%223e7cb9044fe81bda0d7a84b5cb781cba4e255e4871cba6ae8ecd8207850d5b82%22%5D

This request string can obviously get pretty ridiculously long. It also takes a bit of time to fetch metadata from the database. In its normal searches, the client usually fetches file metadata in batches of 256.
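Both argument styles are percent-encoded JSON again; this sketch reproduces the example requests above:

```python
import json
from urllib.parse import quote

def metadata_query(file_ids=None, hashes=None, only_return_identifiers=False):
    # exactly one of file_ids/hashes should be given
    if file_ids is not None:
        query = 'file_ids=' + quote(json.dumps(file_ids))
    else:
        query = 'hashes=' + quote(json.dumps(hashes))
    if only_return_identifiers:
        query += '&only_return_identifiers=true'
    return '/get_files/file_metadata?' + query

metadata_query(file_ids=[123, 4567])
# → '/get_files/file_metadata?file_ids=%5B123%2C%204567%5D'
```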

Response description: A list of JSON Objects that store a variety of file metadata.

  • Example response:

    • {
      	"metadata" : [
      		{
      			"file_id" : 123,
      			"hash" : "4c77267f93415de0bc33b7725b8c331a809a924084bee03ab2f5fae1c6019eb2",
      			"size" : 63405,
      			"mime" : "image/jpg",
      			"ext" : ".jpg",
      			"width" : 640,
      			"height" : 480,
      			"duration" : null,
      			"has_audio" : false,
      			"num_frames" : null,
      			"num_words" : null,
      			"is_inbox" : true,
      			"is_local" : true,
      			"is_trashed" : false,
      			"known_urls" : [],
      			"service_names_to_statuses_to_tags" : {}
      			"service_names_to_statuses_to_display_tags" : {}
      		},
      		{
      			"file_id" : 4567,
      			"hash" : "3e7cb9044fe81bda0d7a84b5cb781cba4e255e4871cba6ae8ecd8207850d5b82",
      			"size" : 199713,
      			"mime" : "video/webm",
      			"ext" : ".webm",
      			"width" : 1920,
      			"height" : 1080,
      			"duration" : 4040,
      			"has_audio" : true,
      			"num_frames" : 102,
      			"num_words" : null,
      			"is_inbox" : false,
      			"is_local" : true,
      			"is_trashed" : false,
      			"known_urls" : [
      				"https://gelbooru.com/index.php?page=post&s=view&id=4841557",
      				"https://img2.gelbooru.com//images/80/c8/80c8646b4a49395fb36c805f316c49a9.jpg",
      				"http://origin-orig.deviantart.net/ed31/f/2019/210/7/8/beachqueen_samus_by_dandonfuga-ddcu1xg.jpg"
      			],
      			"service_names_to_statuses_to_tags" : {
      				"my tags" : {
      					"0" : [ "favourites" ]
      					"2" : [ "process this later" ]
      				},
      				"my tag repository" : {
      					"0" : [ "blonde_hair", "blue_eyes", "looking_at_viewer" ]
      					"1" : [ "bodysuit" ]
      				}
      			},
      			"service_names_to_statuses_to_display_tags" : {
      				"my tags" : {
      					"0" : [ "favourites" ]
      					"2" : [ "process this later", "processing" ]
      				},
      				"my tag repository" : {
      					"0" : [ "blonde hair", "blue eyes", "looking at viewer" ]
      					"1" : [ "bodysuit", "clothing" ]
      				}
      			}
      		}
      	]
      }

    And one where only_return_identifiers is true:

    • {
      	"metadata" : [
      		{
      			"file_id" : 123,
      			"hash" : "4c77267f93415de0bc33b7725b8c331a809a924084bee03ab2f5fae1c6019eb2"
      		},
      		{
      			"file_id" : 4567,
      			"hash" : "3e7cb9044fe81bda0d7a84b5cb781cba4e255e4871cba6ae8ecd8207850d5b82"
      		}
      	]
      }

Size is in bytes. Duration is in milliseconds, and may be an int or a float.

The service_names_to_statuses_to_tags structures are similar to the /add_tags/add_tags scheme, excepting that the status numbers are:

    • 0 - current
    • 1 - pending
    • 2 - deleted
    • 3 - petitioned

Note that since JSON Object keys must be strings, these status numbers are strings, not ints.

While service_names_to_statuses_to_tags represents the actual tags stored on the database for a file, the service_names_to_statuses_to_display_tags structure reflects how tags appear in the UI, after siblings are collapsed and parents are added. If you want to edit a file's tags, use service_names_to_statuses_to_tags. If you want to render to the user, use service_names_to_statuses_to_display_tags.
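A sketch of reading that structure, using values from the example response; note that the status keys are strings:

```python
STATUS_CURRENT = '0'  # JSON Object keys must be strings, so '0' rather than 0
STATUS_PENDING = '1'

def display_tags(metadata_entry, service_name, status=STATUS_CURRENT):
    # use the 'display' structure when rendering to the user;
    # use service_names_to_statuses_to_tags when editing tags instead
    services = metadata_entry.get('service_names_to_statuses_to_display_tags', {})
    return services.get(service_name, {}).get(status, [])
```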

If you add detailed_url_information=true, a new entry, 'detailed_known_urls', will be added for each file, with a list of the same structure as /add_urls/get_url_info. This may be an expensive request if you are querying thousands of files at once.


GET /get_files/file

Get a file.

  • Restricted access: YES. Search for Files permission needed. Additional search permission limits may apply.

  • Required Headers: n/a

  • Arguments :

    • file_id : (numerical file id for the file)
    • hash : (a hexadecimal SHA256 hash for the file)

Only use one. As with metadata fetching, you may only use the hash argument if you have access to all files. If you are tag-restricted, you will have to use a file_id that appeared in your most recent search.

GET /get_files/thumbnail

Get a file's thumbnail.

  • Restricted access: YES. Search for Files permission needed. Additional search permission limits may apply.

  • Required Headers: n/a

  • Arguments :

    • file_id : (numerical file id for the file)
    • hash : (a hexadecimal SHA256 hash for the file)

Only use one. As with metadata fetching, you may only use the hash argument if you have access to all files. If you are tag-restricted, you will have to use a file_id that appeared in your most recent search.

Advanced Usage

IPFS

ipfs

IPFS is a p2p protocol that makes it easy to share many sorts of data. The hydrus client can communicate with an IPFS daemon to send and receive files.

You can read more about IPFS from their homepage, or this guide that explains its various rules in more detail.

For our purposes, we only need to know about these concepts:

getting ipfs

Get the prebuilt executable here. Inside should be a very simple 'ipfs' executable that does everything. Extract it somewhere and open up a terminal in the same folder, and then type:

ipfs init
ipfs daemon

The IPFS exe should now be running in that terminal, ready to respond to requests:

You can kill it with Ctrl+C and restart it with the 'ipfs daemon' call again (you only have to run 'ipfs init' once).

When it is running, opening this page should download and display an example 'Hello World!' file from ~~~across the internet~~~.

Your daemon listens for other instances of ipfs using port 4001, so if you know how to open that port in your firewall and router, make sure you do.

connecting your client

IPFS daemons are treated as services inside hydrus, so go to services->manage services->remote->ipfs daemons and add in your information. Hydrus uses the API port, default 5001, so you will probably want to use credentials of '127.0.0.1:5001'. You can click 'test credentials' to make sure everything is working.

Thereafter, you will get the option to 'pin' and 'unpin' from a thumbnail's right-click menu, like so:

This works like hydrus's repository uploads--it won't happen immediately, but instead will be queued up at the pending menu. Commit all your pins when you are ready:

Notice how the IPFS icon appears on your pending and pinned files. You can search for these files using 'system:file service'.

Unpin works the same as pin, just like a hydrus repository petition.

Right-clicking any pinned file will give you a new 'share' action:

Which will put it straight in your clipboard. In this case, it is QmP6BNvWfkNf74bY3q1ohtDZ9gAmss4LAjuFhqpDPQNm1S.

If you want to share a pinned file with someone, you have to tell them this multihash. They can then:

directories

If you have many files to share, IPFS also supports directories, and now hydrus does as well. IPFS directories use the same sorts of multihash as files, and you can download them into the hydrus client using the same pages->new download popup->an ipfs multihash menu entry. The client will detect the multihash represents a directory and give you a simple selection dialog:

You may recognise those hash filenames--this example was created by hydrus, which can create ipfs directories from any selection of files from the same right-click menu:

Hydrus will pin all the files and then wrap them in a directory, showing its progress in a popup. Your current directory shares are summarised on the respective services->review services panel:

If you find you use IPFS a lot, here are some add-ons for your web browser, as recommended by /tech/:

This script changes all bare ipfs hashes into clickable links to the ipfs gateway (on page loads):

https://greasyfork.org/en/scripts/14837-ipfs-hash-linker

These redirect all gateway links to your local daemon when it's on; they work well with the previous script:

https://github.com/lidel/ipfs-firefox-addon

https://github.com/dylanPowers/ipfs-chrome-extension

Advanced Usage

The Local Booru

This was a fun project, but it never advanced beyond a prototype. The future of this system is other people's nice applications plugging into the Client API.

local booru

The hydrus client has a simple booru to help you share your files with others over the internet.

First of all, this is hosted from your client, which means other people will be connecting to your computer and fetching files you choose to share from your hard drive. If you close your client or shut your computer down, the local booru will no longer work.

how to do it

First of all, turn the local booru server on by going to services->manage services and giving it a port:

It doesn't matter what you pick, but make it something fairly high. When you ok that dialog, the client should start the booru. You may get a firewall warning.

Then right click some files you want to share and select share->local booru. This will throw up a small dialog, like so:

This lets you enter an optional name, which titles the share and helps you keep track of it; an optional text, which lets you say some words or html to the people you are sharing with; and an expiry, which determines if and when the share will stop working.

You can also copy either the internal or external link to your clipboard. The internal link (usually starting something like http://127.0.0.1:45866/) works inside your network and is great just for testing, while the external link (starting http://[your external ip address]:[external port]/) will work for anyone around the world, as long as your booru's port is being forwarded correctly.

If you use a dynamic-ip service like No-IP, you can replace your external IP with your redirect hostname. You have to do it by hand right now, but I'll add a way to do it automatically in future.

Note that anyone with the external link will be able to see your share, so make sure you only share links with people you trust.

forwarding your port

Your home router acts as a barrier between the computers inside the network and the internet. Those inside can see out, but outsiders can only see what you tell the router to permit. Since you want to let people connect to your computer, you need to tell the router to forward all requests of a certain kind to your computer, and thus your client.

If you have never done this before, it can be a headache, especially doing it manually. Luckily, a technology called UPnP makes it a ton easier, and this is how your Skype or Bittorrent clients do it automatically. Not all routers support it, but most do. You can have hydrus try to open a port this way back on services->manage services. Unless you know what you are doing and have a good reason to make them different, you might as well keep the internal and external ports the same.

Once you have it set up, the client will try to make sure your router keeps that port open for your client. If it all works, you should see the new mapping appear in your services->manage local upnp dialog, which lists all your router's current port mappings.

If you want to test that the port forward is set up correctly, going to http://[external ip]:[external port]/ should give a little html just saying hello. Your ISP might not allow you to talk to yourself, though, so ask a friend to try if you are having trouble.

If you still do not understand what is going on here, this is a good article explaining everything.

If you do not like UPnP or your router does not support it, you can set the port forward up manually, but I encourage you to keep the internal and external port the same, because absent a 'upnp port' option, the 'copy external share link' button will use the internal port.

so, what do you get?

The html layout is very simple:



It uses a very similar stylesheet to these help pages. If you would like to change the style, have a look at the html and then edit install_dir/static/local_booru_style.css. The thumbnails will be the same size as in your client.

editing an existing share

You can review all your shares on services->review services, under local->booru. You can copy the links again, change the title/text/expiration, and delete any shares you don't want any more.

future plans

This was a fun project, but it never advanced beyond a prototype. The future of this system is other people's nice applications plugging into the Client API.

Advanced Usage

Setting up your own Server

You do not need the server to do anything with hydrus! It is only for advanced users to do very specific jobs! The server is also hacked-together and quite technical. It requires a fair amount of experience with the client and its concepts, and it does not operate on a timescale that works well on a LAN. Only try running your own server once you have a bit of experience synchronising with something like the PTR and you think, 'Hey, I know exactly what that does, and I would like one!'

Here is a document put together by a user describing whether you want the server.

setting up a server

I will use two terms, server and service, to mean two distinct things:

Setting up a hydrus server is easy compared to, say, Apache. There are no .conf files to mess about with, and everything is controlled through the client. When started, the server will place an icon in your system tray in Windows or open a small frame in Linux or macOS. To close the server, either right-click the system tray icon and select exit, or just close the frame.

The basic process for setting up a server is:

Let's look at these steps in more detail:

start the server

Since the server and client have so much common code, I package them together. If you have the client, you have the server. If you installed in Windows, you can hit the shortcut in your start menu. Otherwise, go straight to 'server' or 'server.exe' or 'server.pyw' in your installation directory. The program will first try to take port 45870 for its administration interface, so make sure that is free. Open your firewall as appropriate.


set up the client

In the services->manage services dialog, add a new 'hydrus server administration service' and set up the basic options as appropriate. If you are running the server on the same computer as the client, its hostname is 'localhost'.

In order to set up the first admin account and an access key, use 'init' as a registration key. This special registration key will only work to initialise this first super-account.

YOU'LL WANT TO SAVE YOUR ACCESS KEY IN A SAFE PLACE

If you lose your admin access key, there is no way to get it back, and if you are not sqlite-proficient, you'll have to restart from the beginning by deleting your server's database files.

If the client can't connect to the server, it is either not running or you have a firewall/port-mapping problem. If you want a quick way to test the server's visibility, just put https://host:port into your browser (make sure it is https! http will not work)--if it is working, your browser will probably complain about its self-signed https certificate. Once you add a certificate exception, the server should return some simple html identifying itself.

set up the server

You should have a new submenu, 'administrate services', under 'services', in the client gui. This is where you control most server and service-wide stuff.

admin->your server->manage services lets you add, edit, and delete the services your server runs. Every time you add one, you will also be added as that service's first administrator, and the admin menu will gain a new entry for it.

making accounts

Go admin->your service->create new accounts to create new registration keys. Send the registration keys to the users you want to give these new accounts. A registration key will only work once, so if you want to give several people the same account, they will have to share the access key amongst themselves once one of them has registered the account. (Or you can register the account yourself and send them all the same access key. Do what you like!)

Go admin->manage account types to add, remove, or edit account types. Make sure everyone has at least downloader (get_data) permissions so they can stay synchronised.

You can create as many accounts of whatever kind you like. Depending on your usage scenario, you may want to have all uploaders, one uploader and many downloaders, or just a single administrator. There are many combinations.

???

The most important part is to have fun! There are no losers on the INFORMATION SUPERHIGHWAY.

profit

I honestly hope you can get some benefit out of my code, whether just as a backup or as part of a far more complex system. Please mail me your comments as I am always keen to make improvements.

btw, how to backup a repo's db

All of a server's files and options are stored in its accompanying .db file and respective subdirectories, which are created on first startup (just like with the client). To backup or restore, you have two options:

OMG EVERYTHING WENT WRONG

If you get to a point where you can no longer boot the repository, try running SQLite Studio and opening server.db. If the issue is simple--like manually changing the port number--you may be in luck. Send me an email if it is tricky.

Remember that everything is breaking all the time. Make regular backups, and you'll minimise your problems.

Advanced Usage

running a client or server in wine

getting it to work on wine

Several Linux and macOS users have found success running hydrus with Wine. Here is a post from a Linux dude:

Some things I picked up on after extended use:

Installation process:

If you get the client running in Wine, please let me know how you get on!

Advanced Usage

running a client or server from source

running from source

I write the client and server entirely in python, which can run straight from source. It is not simple to get hydrus running this way, but if none of the built packages work for you (for instance you use a non-Ubuntu-compatible flavour of Linux), it may be the only way you can get the program to run. Also, if you have a general interest in exploring the code or wish to otherwise modify the program, you will obviously need to do this stuff.

a quick note about Linux flavours

I often point people here when they are running non-Ubuntu flavours of Linux and cannot run my build. One Debian user mentioned that he hit a libX11 error on boot, but that by simply deleting the libX11.so.6 file in the hydrus install directory, he was able to boot. I presume this meant my hydrus build was then relying on his local libX11.so, which happened to have better API compatibility. If you receive a similar error, you might like to try the same sort of thing. Let me know if you discover anything!

building on windows

Installing some packages on windows with pip may need Visual Studio's C++ Build Tools for your version of python. Although these tools are free, it can be a pain to get them through the official (and often huge) installer from Microsoft. Instead, install Chocolatey and use this one simple line:

choco install -y vcbuildtools visualstudio2017buildtools

Trust me, this will save a ton of headaches!

what you will need

You will need basic python experience, python 3.x and a number of python modules. Most of it you can get through pip.

If you are on Linux or macOS, or if you are on Windows and have an existing python you do not want to stomp all over with new modules, I recommend you create a virtual environment:

Note, if you are on Linux, it may be easier to use your package manager instead of messing around with venv. A user has written a great summary with all needed packages here.

If you do want to create a new venv environment:
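A minimal sketch, assuming a system 'python3' on your PATH and a 'venv' directory name (use whatever path you like):

```shell
python3 -m venv venv    # create the environment in ./venv
. venv/bin/activate     # turn it on -- needed every time before running client.pyw/server.py
```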

That '. venv/bin/activate' line turns your venv on, and will be needed every time you run the client.pyw/server.py files. You can easily tuck it into a launch script.

On Windows, the path is venv\Scripts\activate, and the whole deal is done much easier in cmd than Powershell. If you get Powershell by default, just type 'cmd' to get an old fashioned command line. In cmd, the launch command is just 'venv\scripts\activate', no leading period.

After that, you can go nuts with pip. I think this will do for most systems:

You may want to do all that in smaller batches.

You will also need Qt5. Either PySide2 (default) or PyQt5 are supported, through qtpy. You can install, again, with pip:

pip install PySide2

-or-

pip install PyQt5

Qt 5.15 currently seems to be working well, but 5.14 caused some trouble.

And optionally, you can add these packages:

Here is a masterline with everything for general use:

For Windows, depending on which compiler you are using, pip can have problems building some modules like lz4 and lxml. This page has a lot of prebuilt binaries--I have found it very helpful many times. You may want to update python's sqlite3.dll as well--you can get it here, and just drop it in C:\Python37\DLLs or wherever you have python installed. I have a fair bit of experience with Windows python, so send me a mail if you need help.

If you don't have ffmpeg in your PATH and you want to import videos, you will need to put a static FFMPEG executable in the install_dir/bin directory. Have a look at how I do it in the extractable compiled releases if you can't figure it out. On Windows, you can copy the exe from one of those releases, or just download the latest static build right from the FFMPEG site.

Once you have everything set up, client.pyw and server.py should look for and run off client.db and server.db just like the executables. They will look in the 'db' directory by default, or anywhere you point them with the "-d" parameter, again just like the executables.

I develop hydrus on and am most experienced with Windows, so the program is more stable and reasonable on that. I do not have as much experience with Linux or macOS, so I would particularly appreciate your Linux/macOS bug reports and any informed suggestions.

my code

Unlike most software people, I am more INFJ than INTP/J. My coding style is unusual and unprofessional, and everything is pretty much hacked together. Please look through the source if you are interested in how things work and ask me if you don't understand something. I'm constantly throwing new code together and then cleaning and overhauling it down the line.

I work strictly alone, so while I am very interested in detailed bug reports or suggestions for good libraries to use, I am not looking for pull requests. Everything I do is WTFPL, so feel free to fork and play around with things on your end as much as you like.

Making a Downloader

Making a Downloader

introduction

Creating custom downloaders is only for advanced users who understand HTML or JSON. Beware! If you are simply looking for how to add new downloaders, please head over here.

this system

The first versions of hydrus's downloaders were all hardcoded and static--I wrote everything into the program itself and nothing was user-creatable or -fixable. After the maintenance burden of the entire messy system proved too large for me to keep up with and a semi-editable booru system proved successful, I decided to overhaul the entire thing to allow user creation and sharing of every component. It is designed to be very simple for the front-end user--they will typically handle a couple of png files and then select a new downloader from a list--but very flexible (and hence potentially complicated) on the back-end. These help pages describe the different components with the intention of making an HTML- or JSON-fluent user able to create and share a full new downloader on their own.

As always, this is all under active development. Your feedback on the system would be appreciated, and if something is confusing or you discover something in here that is out of date, please let me know.

what is a downloader?

In hydrus, a downloader is one of:

The system currently supports HTML and JSON parsing. XML should be fine under the HTML parser--it isn't strict about checking types and all that.

what does a downloader do?

So we have three components:

URL downloaders and watchers do not need the Gallery URL Generator, as their input is an URL. And simple downloaders also have an explicit 'just download it and parse it with this simple rule' action, so they do not use URL Classes (or even full-fledged Page Parsers) either.

Making a Downloader

gallery url generators

GUGs

And convert them into an initialising Gallery URL, such as:

These are all the 'first page' of the results if you type or click-through to the same location on those sites. We are essentially emulating their own simple search-url generation inside the hydrus client.

actually doing it

Although it is usually a fairly simple process of just substituting the inputted tags into a string template, there are a couple of extra things to think about. Let's look at the ui under network->downloader definitions->manage gugs:

The client will split whatever the user enters by whitespace, so 'blue_eyes blonde_hair' becomes two search terms, [ 'blue_eyes', 'blonde_hair' ], which are then joined back together with the given 'search terms separator', to make 'blue_eyes+blonde_hair'. Different sites use different separators, although ' ', '+', and ',' are most common. The new string is substituted into the '%tags%' in the template phrase, and the URL is made.
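The whole transformation can be sketched in a few lines (the template URL and separator here are illustrative, not a real site's):

```python
def make_gallery_url(template, search_text, separator='+'):
    # split user input on whitespace, rejoin with the site's separator,
    # then substitute into the %tags% placeholder
    search_terms = search_text.split()
    return template.replace('%tags%', separator.join(search_terms))

make_gallery_url('https://example.booru/index.php?page=post&s=list&tags=%tags%',
                 'blue_eyes blonde_hair')
# → 'https://example.booru/index.php?page=post&s=list&tags=blue_eyes+blonde_hair'
```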

Note that you will not have to make %20 or %3A percent-encodings for reserved characters here--the network engine handles all that before the request is sent. For the most part, if you need to include, or a user puts in, ':' or ' ' or 'おっぱい', you can just pass it along straight into the final URL without worrying.

This ui should update as you change it, so have a play and look at how the output example url changes to get a feel for things. Look at the other defaults to see different examples. Even if you break something, you can just cancel out.

The name of the GUG is important, as this is what will be listed when the user chooses what 'downloader' they want to use. Make sure it has a clear unambiguous name.

The initial search text is also important. Most downloaders just take some text tags, but if your GUG expects a numerical artist id (like pixiv artist search does), you should specify that explicitly to the user. You can even put in a brief '(two tag maximum)' type of instruction if you like.

Notice that the Deviant Art example above is actually the stream of wlop's favourites, not his works, and without an explicit notice of that, a user could easily mistake what they have selected. 'gelbooru' or 'newgrounds' are bad names, 'type here' is a bad initialising text.

Nested GUGs

Making a Downloader

url classes

url classes

The fundamental connective tissue of the downloader system is the 'URL Class'. This object identifies and normalises URLs and links them to other components. Whenever the client handles a URL, it tries to match it to a URL Class to figure out what to do.

the types of url

For hydrus, an URL is useful if it is one of:

the components of a url

As far as we are concerned, a URL string has four parts:

So, let's look at the 'edit url class' panel, which is found under network->manage url classes:

A TBIB File Page like https://tbib.org/index.php?page=post&s=view&id=6391256 is a Post URL. Let's look at the metadata first:

And now, for matching the string itself, let's revisit our four components:

string matches

As you edit these components, you will be presented with the Edit String Match Panel:

This lets you set the type of string that will be valid for that component. If a given path or query component does not match the rules given here, the URL will not match the URL Class. Most of the time you will probably want to set 'fixed characters' of something like "post" or "index.php", but if the component you are editing is more complicated and could have a range of different valid values, you can specify just numbers or letters or even a regex pattern. If you try to do something complicated, experiment with the 'example string' entry to make sure you have it set how you think.

Don't go overboard with this stuff, though--most sites do not have super-fine distinctions between their different URL types, and hydrus users will not be dropping user account or logout pages or whatever on the client, so you can be fairly liberal with the rules.
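If it helps to think of the per-component check in code, it is roughly one of these two shapes (a hypothetical sketch, not the client's actual implementation):

```python
import re

def component_matches(match_type, rule, value):
    # 'fixed' demands the exact characters; 'regex' allows a pattern,
    # e.g. r'\d+' for a purely numeric id component
    if match_type == 'fixed':
        return value == rule
    return re.fullmatch(rule, value) is not None
```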

how do they match, exactly?

This URL Class will be assigned to any URL that matches the location, path, and query. Missing path components or query parameters in the URL will invalidate the match, but additional ones will not!

For instance, given:

Only URL A will match

And:

Both URL A and B will match

And:

Both URL A and B will match, URL C will not

If multiple URL Classes match a URL, the client will try to assign the most 'complicated' one, with the most path components and then query parameters.

Given two example URLs and URL Classes:

URL A will match URL Class A but not URL Class B and so will receive A.

URL B will match both and receive URL Class B as it is more complicated.

This situation is not common, but when it does pop up, it can be a pain. It is usually a good idea to match exactly what you need--no more, no less.
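The 'missing components invalidate, extras do not' rule and the most-complicated tie-break can be sketched like so. This is a simplified model, assuming exact-text components, where a real URL Class matches each component against a String Match:

```python
from urllib.parse import urlparse, parse_qs

def url_matches(url, required_path, required_params):
    """True if the URL has at least the required path components and
    query parameters; surplus components do not invalidate the match."""
    parsed = urlparse(url)
    path = [p for p in parsed.path.split('/') if p]
    params = parse_qs(parsed.query)
    if len(path) < len(required_path):
        return False  # missing path components invalidate
    if any(got != want for got, want in zip(path, required_path)):
        return False
    return all(key in params for key in required_params)  # missing params invalidate

def most_complicated(url, url_classes):
    """Of all matching classes, pick the one with the most path
    components, then the most query parameters."""
    matching = [(p, q) for p, q in url_classes if url_matches(url, p, q)]
    return max(matching, key=lambda c: (len(c[0]), len(c[1])), default=None)

url = 'https://tbib.org/index.php?page=post&s=view&id=6391256&lang=en'
print(url_matches(url, ['index.php'], ['page', 's', 'id']))  # True: extra 'lang' is fine
```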

normalising urls

Different URLs can give the same content. The http and https versions of a URL are typically the same, and:

And:

Since we are in the business of storing and comparing URLs, we want to 'normalise' them to a single comparable beautiful value. You see a preview of this normalisation on the edit panel. Normalisation happens to all URLs that enter the program.

Note that in e621's case (and for many other sites!), that text after the id is purely decoration. It can change when the file's tags change, so if we want to compare today's URLs with those we saw a month ago, we'd rather just be without it.

On normalisation, all URLs will get the preferred http/https switch, and their query parameters will be alphabetised. File and Post URLs will also cull out any surplus path or query components. This wouldn't affect our TBIB example above, but it will clip the e621 example down to that 'bare' id URL, and it will take any surplus 'lang=en' or 'browser=netscape_24.11' garbage off the query text as well.

URLs that are not associated and saved and compared (i.e. normal Gallery and Watchable URLs) are not culled of unmatched path components or query parameters, which can sometimes be useful if you want to match (and keep intact) gallery URLs that might or might not include an important 'sort=desc' type of parameter.

Since File and Post URLs will do this culling, be careful that you not leave out anything important in your rules. Make sure what you have is both necessary (nothing can be removed and still keep it valid) and sufficient (no more needs to be added to make it valid). It is a good idea to try pasting the 'normalised' version of the example URL into your browser, just to check it still works.
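Those normalisation steps (preferred scheme, alphabetised query, surplus parameters culled) can be sketched in a few lines. The parameter whitelist here is an assumption standing in for the URL Class's component rules:

```python
from urllib.parse import urlparse, parse_qsl, urlencode, urlunparse

def normalise(url, keep_params, preferred_scheme='https'):
    """Coerce a URL to one comparable form: switch to the preferred
    scheme, drop query parameters not in the whitelist, alphabetise
    the rest."""
    p = urlparse(url)
    params = [(k, v) for k, v in parse_qsl(p.query) if k in keep_params]
    query = urlencode(sorted(params))
    return urlunparse((preferred_scheme, p.netloc, p.path, '', query, ''))

url = 'http://tbib.org/index.php?s=view&id=6391256&page=post&lang=en'
print(normalise(url, {'page', 's', 'id'}))
# → https://tbib.org/index.php?id=6391256&page=post&s=view
```

The 'lang=en' garbage is gone, the scheme is https, and the parameters always come out in the same order, so two sightings of the same page compare equal.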

'default' values

Some sites present the first page of a search like this:

https://danbooru.donmai.us/posts?tags=skirt

But the second page is:


https://danbooru.donmai.us/posts?tags=skirt&page=2

Another example is:

https://www.hentai-foundry.com/pictures/user/Mister69M

https://www.hentai-foundry.com/pictures/user/Mister69M/page/2

What happened to 'page=1' and '/page/1'? Adding those '1' values in works fine! Many sites, when an index is absent, will secretly imply an appropriate 0 or 1. This looks pretty to users looking at a browser address bar, but it can be a pain for us, who want to match both styles to one URL Class. It would be nice if we could recognise the 'bare' initial URL and fill in the '1' values to coerce it to the explicit, automation-friendly format. Defaults to the rescue:

After you set a path component or query parameter String Match, you will be asked for an optional 'default' value. You won't want to set one most of the time, but for Gallery URLs, it can be hugely useful--see how the normalisation process automatically fills in the missing path component with the default! There are plenty of examples in the default Gallery URLs of this, so check them out. Most sites use page indices starting at '1', but Gelbooru-style imageboards use 'pid=0' file index (and often move forward 42, so the next pages will be 'pid=42', 'pid=84', and so on, although others use deltas of 20 or 40).
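The effect of a default is easy to sketch: when the parameter is absent, normalisation fills it in, and when it is present, it is left alone. The danbooru URL is from the example above; the rest is an assumed illustration:

```python
from urllib.parse import urlparse, parse_qsl, urlencode

def apply_defaults(url, defaults):
    """Fill in any missing query parameters with their default values,
    coercing the 'bare' first-page URL to the explicit format."""
    p = urlparse(url)
    params = dict(parse_qsl(p.query))
    for key, value in defaults.items():
        params.setdefault(key, value)  # only fills in if absent
    return f'{p.scheme}://{p.netloc}{p.path}?{urlencode(sorted(params.items()))}'

print(apply_defaults('https://danbooru.donmai.us/posts?tags=skirt', {'page': '1'}))
# → https://danbooru.donmai.us/posts?page=1&tags=skirt
```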

can we predict the next gallery page?

Now that we can harmonise gallery URLs to a single format, we can predict the next gallery page! If, say, the third path component or the 'page' query parameter is always a number referring to the page, you can select it under the 'next gallery page' section and set the delta to change it by. The 'next gallery page url' section will be filled in automatically. This value will be consulted if the parser cannot find a 'next gallery page url' in the page content.

It is neat to set this up, but I only recommend it if you actually cannot reliably parse a next gallery page url from the HTML later in the process. It is neater to have searches stop naturally because the parser said 'no more gallery pages' than to have hydrus always one page beyond and end every single search on an uglier 'No results found' or 404 result.

Unfortunately, some sites will either not produce an easily parsable next page link or randomly just not include it due to some issue on their end (Gelbooru is a funny example of this). Also, APIs will often have a kind of 'start=200&num=50', 'start=250&num=50' progression but not include that state in the XML or JSON they return. These cases require the automatic next gallery page rules (check out Artstation and tumblr api gallery page URL Classes in the defaults for examples of this).

If you know that a URL has an API backend, you can tell the client to use that API URL when it fetches data. The API URL needs its own URL Class.

To define the relationship, click the "String Converter" button, which gives you this:

You may have seen this panel elsewhere. It lets you convert a string to another over a number of transformation steps. The steps can be as simple as adding or removing some characters or applying a full regex substitution. For API URLs, you are mostly looking to isolate some unique identifying data ("m/thread/16086187" in this case) and then substituting that into the new API path. It is worth testing this with several different examples!
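A single regex-substitution step of that kind can be sketched like this. The domain and API path here are hypothetical; the real defaults do this through the String Converter UI, not code:

```python
import re

def to_api_url(post_url):
    """One regex substitution: isolate the board/thread identifier
    (e.g. 'm' and '16086187') and substitute it into the API path.
    'example.com' and the '/api/.../.json' shape are assumptions."""
    return re.sub(
        r'^https://example\.com/(\w+)/thread/(\d+).*$',
        r'https://example.com/api/\1/thread/\2.json',
        post_url,
    )

print(to_api_url('https://example.com/m/thread/16086187'))
# → https://example.com/api/m/thread/16086187.json
```

As the text says, test with several different example URLs; a regex that is too greedy or too strict will quietly produce garbage API URLs.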

Making a Downloader

parsers

In hydrus, a parser is an object that takes a single block of HTML or JSON data and returns many kinds of hydrus-level metadata.

Parsers are flexible and potentially quite complicated. You might like to open network->manage parsers and explore the UI as you read these pages. Check out how the default parsers already in the client work, and if you want to write a new one, see if there is something already in there that is similar--it is usually easier to duplicate an existing parser and then alter it than to create a new one from scratch every time.

There are three main components in the parsing system (click to open each component's help page):

Once you are comfortable with these objects, you might like to check out these walkthroughs, which create full parsers from nothing:

Once you are comfortable with parsers, and if you are feeling brave, check out how the default imageboard and pixiv parsers work. These are complicated and use more experimental areas of the code to get their job done. If you are trying to get a new imageboard parser going and can't figure out subsidiary page parsers, send me a mail or something and I'll try to help you out!

When you are making a parser, consider this checklist (you might want to copy/have your own version of this somewhere):


putting it all together

Now you know what GUGs, URL Classes, and Parsers are, you should have some idea of how URL Classes steer what happens when the downloader is faced with a URL to process. Should a URL be imported as a media file, or should it be parsed? If so, how?

You may have noticed in the Edit GUG ui that it lists if a current URL Class matches the example URL output. If the GUG has no matching URL Class, it won't be listed in the main 'gallery selector' button's list--it'll be relegated to the 'non-functioning' page. Without a URL Class, the client doesn't know what to do with the output of that GUG. But if a URL Class does match, we can then hand the result over to a parser set at network->downloader definitions->manage url class links:

Here you simply set which parsers go with which URL Classes. If you have URL Classes that do not have a parser linked (which is the default for new URL Classes), you can use the 'try to fill in gaps...' button to automatically fill the gaps based on guesses using the parsers' example URLs. This is usually the best way to line things up unless you have multiple potential parsers for that URL Class, in which case it'll usually go by the parser name earliest in the alphabet.

If the URL Class has no parser set or the parser is broken or otherwise invalid, the respective URL's file import object in the downloader or subscription is going to throw some kind of error when it runs. If you make and share some parsers, the first indication that something is wrong is going to be several users saying 'I got this error: (copy notes from file import status window)'. You can then load the parser back up in manage parsers and try to figure out what changed and roll out an update.

manage url class links also shows 'api link review', which summarises which URL Classes api-link to others. In these cases, only the api URL gets a parser entry in the first 'parser links' window, since the first will never be fetched for parsing (in the downloader, it will always be converted to the API URL, and that is fetched and parsed).

Once your GUG has a URL Class and your URL Classes have parsers linked, test your downloader! Note that Hydrus's URL drag-and-drop import uses URL Classes, so if you don't have the GUG and gallery stuff done but you have a Post URL set up, you can test that just by dragging a Post URL from your browser to the client, and it should be added to a new URL Downloader and just work. It feels pretty good once it does!


sharing downloaders

sharing

If you are working with users who also understand the downloader system, you can swap your GUGs, URL Classes, and Parsers separately using the import/export buttons on the relevant dialogs, which work in pngs and clipboard text.

But if you want to share conveniently, and with users who are not familiar with the different downloader objects, you can package everything into a single easy-import png as per here.

The dialog to use is network->downloader definitions->export downloaders:

It isn't difficult. Essentially, you want to bundle enough objects to make one or more 'working' GUGs at the end. I recommend you start by just hitting 'add gug', which--using Example URLs--will attempt to figure out everything you need by itself.

This all works on Example URLs and some domain guesswork, so make sure your url classes are good and the parsers have correct Example URLs as well. If they don't, they won't all link up neatly for the end user. If part of your downloader is on a different domain to the GUGs and Gallery URLs, then you'll have to add them manually. Just start with 'add gug' and see if it looks like enough.

Once you have the necessary and sufficient objects added, you can export to png. You'll get a similar 'does this look right?' summary as what the end-user will see, just to check you have everything in order and the domains all correct. If that is good, then make sure to give the png a sensible filename and embellish the title and description if you need to. You can then send/post that png wherever, and any regular user will be able to use your work.


login manager

login

The system works, but this help was never done! Check the defaults for examples of how it works, sorry!

Misc

Privacy

privacy

Repositories are designed to respect your privacy. They never know what you are searching for. The client synchronises (copies) the repository's entire file or mapping list to its internal database and does its own searches over those internal caches, all on your hard drive. It never sends search queries outside your own computer, nor does it log what you look for. Your searches are your business, and no one else's.

The PTR has a public shared access key. You do not have to contact anyone to get the key, so no one can infer who you are from it, and all regular user uploads are merged together, making it all a big mess. The PTR is more private than this document's worst case scenarios.

The only privacy risk with hydrus's repositories is in what you upload (ultimately via the pending menu at the top of the program). Even then, it would typically be very difficult even for an admin to figure out anything about you, but it is possible.

Repositories know nothing more about your client than they can infer from what you choose to upload, and the software usually commands them to forget as much as possible as soon as possible. Specifically:

                                           tag repository           file repository
                                           upload     download      upload    download
                                           mappings   mappings      file      file
  Anonymous account is linked to action    Yes        No            Yes       No
  IP address is remembered                 No         No            Maybe     No

i.e:

Furthermore:
As always, there are some clever exceptions, mostly on servers between friends that have just a handful of users, where the admin would be handing out registration keys and, with effort, could pick through the limited user creation records to figure out which access key is yours. In that case, if you were to tag a file three years before it surfaced on the internet, and the admin knew you were attached to the account that made that tag, they could infer you most likely created it. If you set up a file repository for just a friend and yourself, it becomes trivial by elimination to guess who uploaded the NarutoXSonichu shota diaper fanon. If you sign up for a file repository that hosts only certain stuff and rack up a huge bandwidth record for the current month, anyone who knows that and also knows the account is yours alone will know basically what you were up to.

The PTR has a shared access key that is already public, so the risks are far smaller. No one can figure out who you are from the access key.

Note that the code is freely available and entirely mutable. If someone wants to put the time in, they could create a file repository that looks from the outside like any other but nonetheless logs the IP and nature of every request. As with any website, protect yourself, and if you do not trust an admin, do not give them or their server any information about you.

Even anonymised records can reveal personally identifying information. Don't trust anyone on any site who plans to release internal maps of 'anonymised' accounts -> content, even for some benevolent academic purpose.


Contact and Links

I welcome all your bug reports, questions, ideas, and comments. It is always interesting to see how other people are using my software and what they generally think of it. Most of the changes every week are suggested by users.

You can contact me by email, twitter, tumblr, discord, or the 8chan.moe /t/ thread or Endchan board--I do not mind which. Please know that I have difficulty with social media, and while I try to reply to all messages, it sometimes takes me a while to catch up.

The Github Issue Tracker was turned off for some time, as it did not fit my workflow and I could not keep up, but it is now running again, managed by a team of volunteer users. Please feel free to submit feature requests there if you are comfortable with Github. I am not socially active on Github, and it is mostly just a mirror of my home dev environment, where I work alone.

I am on the discord on Saturday afternoon, USA time, if you would like to talk live, and briefly on Wednesday after I put the release out. If that is not a good time for you, feel free to leave me a DM and I will get to you when I can. There are also plenty of other hydrus users who idle who would be happy to help with any sort of support question.

I delete all tweets and resolved email conversations after three months. So, if you think you are waiting for a reply, or I said I was going to work on something you care about and seem to have forgotten, please do nudge me.

Anyway:


Financial Support

can I contribute to hydrus development?

I do not expect anything from anyone. I'm amazed and grateful that anyone wants to use my software and share tags with others. I enjoy the feedback and work, and I hope to keep putting completely free weekly releases out as long as there is more to do.

That said, as I have developed the software, several users have kindly offered to contribute money, either as thanks for a specific feature or just in general. I kept putting the thought off, but I eventually got over my hesitance and set something up.

I find the tactics of most internet fundraising very distasteful, especially when they promise something they then fail to deliver. I much prefer the 'if you like me and would like to contribute, then please do, meanwhile I'll keep doing what I do' model. I support several 'put out regular free content' creators on Patreon in this way, and I get a lot out of it, even though I have no direct reward beyond the knowledge that I helped some people do something neat.

If you feel the same way about my work, I've set up a simple Patreon page here. If you can help out, it is deeply appreciated.


FAQ

what is a repository?

A repository is a service in the hydrus network that stores a certain kind of information--files or tag mappings, for instance--as submitted by users all over the internet. Those users periodically synchronise with the repository so they know everything that it stores. Sometimes, like with tags, this means creating a complete local copy of everything on the repository. Hydrus network clients never send queries to repositories; they perform queries over their local cache of the repository's data, keeping everything confined to the same computer.

what is a tag?

wiki

A tag is a small bit of text describing a single property of something. Tags make searching easy. Good examples are "flower" or "nicolas cage" or "the sopranos" or "2003". By combining several tags together (e.g. [ 'tiger woods', 'sports illustrated', '2008' ] or [ 'cosplay', 'the legend of zelda' ]), a huge image collection is reduced to a tiny and easy-to-digest sample.

A good word for the connection of a particular tag to a particular file is mapping.

Hydrus is designed with the intention that tags are for searching, not describing. Workflows and UI are tuned for finding files and other similar files (e.g. by the same artist), and while it is possible to have nice metadata overlays around files, this is not considered their chief purpose. Trying to have 'perfect' descriptions for files is often a rabbit-hole that can consume hours of work with relatively little demonstrable benefit.

All tags are automatically converted to lower case. 'Sunset Drive' becomes 'sunset drive'. Why?

  1. Although it is more beautiful to have 'The Lord of the Rings' rather than 'the lord of the rings', there are many, many special cases where style guides differ on which words to capitalise.
  2. As 'The Lord of the Rings' and 'the lord of the rings' are semantically identical, it is natural to search in a case insensitive way. When case does not matter, what point is there in recording it?

Furthermore, leading and trailing whitespace is removed, and multiple whitespace is collapsed to a single character.

'  yellow   dress '

becomes

'yellow dress'
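Those cleaning rules together are equivalent to this one-liner, shown as a sketch of the behaviour rather than hydrus's actual code:

```python
def clean_tag(tag):
    """Lower-case, strip leading/trailing whitespace, and collapse any
    run of internal whitespace to a single space."""
    return ' '.join(tag.split()).lower()

print(clean_tag('  yellow   dress '))      # → yellow dress
print(clean_tag('The Lord of the Rings'))  # → the lord of the rings
```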


what is a namespace?

A namespace is a category that, in hydrus, prefixes a tag. An example is 'person' in the tag 'person:ron paul'--it lets people and software know that 'ron paul' is a name. You can create any namespace you like; just type one or more words and then a colon, and the next string of text will have that namespace.

The hydrus client gives namespaces different colours so you can pick out important tags more easily in a large list, and you can also search by a particular namespace, even creating complicated predicates like 'give all files that do not have any character tags', for instance.

why not use filenames and folders?

As a retrieval method, filenames and folders are less and less useful as the number of files increases. Why?

So, the client tracks files by their hash. This technical identifier easily eliminates duplicates and permits the database to robustly attach other metadata like tags and ratings and known urls and notes and everything else, even across multiple clients and even if a file is deleted and later imported.

As a general rule, I suggest you not set up hydrus to parse and display all your imported files' filenames as tags. 'image.jpg' is useless as a tag. Shed the concept of filenames as you would chains.

can the client manage files from their original locations?

When the client imports a file, it makes a quickly accessible but human-ugly copy in its internal database, by default under install_dir/db/client_files. When it needs to access that file again, it always knows where it is, and it can be confident it is what it expects it to be. It never accesses the original again.

This storage method is not always convenient, particularly for those who are hesitant about converting to using hydrus completely and also do not want to maintain two large copies of their collections. The question comes up--"can hydrus track files from their original locations, without having to copy them into the db?"

The technical answer is, "This support could be added," but I have decided not to, mainly because:

It is not unusual for new users who ask for this feature to find their feelings change after getting more experience with the software. If desired, path text can be preserved as tags using regexes during import, and getting into the swing of searching by metadata rather than navigating folders often shows how very effective the former is over the latter. Most users eventually import most or all of their collection into hydrus permanently, deleting their old folder structure as they go.

For this reason, if you are hesitant about doing things the hydrus way, I advise you try running it on a smaller subset of your collection, say 5,000 files, leaving the original copies completely intact. After a month or two, think about how often you used hydrus to look at the files versus navigating through folders. If you barely used the folders, you probably do not need them any more, but if you used them a lot, then hydrus might not be for you, or it might only be for some sorts of files in your collection.

why use sqlite?

Hydrus uses SQLite for its database engine. Some users who have experience with other engines such as MySQL or PostgreSQL sometimes suggest them as alternatives. SQLite serves hydrus's needs well, and at the moment, there are no plans to change.

Since this question has come up frequently, a user has written an excellent document talking about the reasons to stick with SQLite. If you are interested in this subject, please check it out here:

https://gitgud.io/prkc/hydrus-why-sqlite/blob/master/README.md

what is a hash?

wiki

Hashes are a subject you usually have to be a software engineer to find interesting. The simple answer is that they are unique names for things. Hashes make excellent identifiers inside software, as you can safely assume that f099b5823f4e36a4bd6562812582f60e49e818cf445902b504b5533c6a5dad94 refers to one particular file and no other. In the client's normal operation, you will never encounter a file's hash. If you want to see a thumbnail bigger, double-click it; the software handles the mathematics.

For those who are interested: hydrus uses SHA-256, which spits out 32-byte (256-bit) hashes. The software stores the hash densely, as 32 bytes, only encoding it to 64 hex characters when the user views it or copies to clipboard. SHA-256 is not perfect, but it is a great compromise candidate; it is secure for now, it is reasonably fast, it is available for most programming languages, and newer CPUs perform it more efficiently all the time.
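The dense-storage point is easy to see with Python's standard library. This is a general illustration of SHA-256, not hydrus's own code, and the input bytes stand in for a real file's contents:

```python
import hashlib

data = b'hello hydrus'                    # stand-in for a file's bytes
digest = hashlib.sha256(data).digest()    # 32 raw bytes, as stored
hex_form = digest.hex()                   # 64 hex characters, as displayed

print(len(digest))    # 32
print(len(hex_form))  # 64
```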

what is an access key?

The hydrus network's repositories do not use username/password, but instead a single strong identifier-password like this:

7ce4dbf18f7af8b420ee942bae42030aab344e91dc0e839260fcd71a4c9879e3

These hex numbers give you access to a particular account on a particular repository, and are often combined like so:

7ce4dbf18f7af8b420ee942bae42030aab344e91dc0e839260fcd71a4c9879e3@hostname.com:45871

They are long enough to be impossible to guess, and also randomly generated, so they reveal nothing personally identifying about you. Many people can use the same access key (and hence the same account) on a repository without consequence, although they will have to share any bandwidth limits, and if one person screws around and gets the account banned, everyone will lose access.

The access key is the account. Do not give it to anyone you do not want to have access to the account. An administrator will never need it; instead they will want your account key.
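The combined 'key@host:port' form splits apart unambiguously, since the access key never contains '@' and the port follows the last ':'. A small sketch, using the example key from above:

```python
combined = ('7ce4dbf18f7af8b420ee942bae42030a'
            'ab344e91dc0e839260fcd71a4c9879e3@hostname.com:45871')

access_key, _, address = combined.partition('@')
host, _, port = address.rpartition(':')

print(len(access_key))  # 64 hex characters = 32 random bytes
print(host, port)       # hostname.com 45871
```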

what is an account key?

This is another long string of random hexadecimal that identifies your account without giving away access. If you need to identify yourself to a repository administrator (say, to get your account's permissions modified), you will need to tell them your account key. You can copy it to your clipboard in services->review services.

why can my friend not see what I just uploaded?

The repositories do not work like conventional search engines; it takes a short but predictable while for changes to propagate to other users.

The client's searches only ever happen over its local cache of what is on the repository. Any changes you make will be delayed for others until their next update occurs. At the moment, the update period is 100,000 seconds, which is about 1 day and 4 hours.
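The 100,000-second figure works out like so:

```python
seconds = 100_000
hours, leftover = divmod(seconds, 3600)  # 27 hours, 2800 seconds over
days, hours = divmod(hours, 24)
print(f'{days} day, {hours} hours and {leftover // 60} minutes')
# → 1 day, 3 hours and 46 minutes, i.e. about a day and four hours
```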

Changelog

Changelog 400>

Changelog 350-399

Changelog 300-349

Changelog 250-299

Changelog 200-249

Changelog 150-199

Changelog 100-149

Changelog 50-99

Changelog <49

Tips and Tricks

File Look-up

MD5 Queries:


This will allow you to find tags for files you already have (if the hash value matches).

We will be doing this by copying the md5 hash values and downloading files from those values. This will not download the files again, as you already have them in your client, but it will get the tags (if your download settings are set to fetch them).

You need to be in advanced mode for this function.

MD5 Queries for version 416 and up:

  1. Go to options > files and trash and tick the box for "When copying a file hashes..."
  2. Select the files you want tags for and right-click, share > copy > hash/hashes > md5 to copy the hash values of the selected files into your clipboard.
  3. Paste your clipboard into a gallery download tab of your choosing (the booru needs to support md5 searching) by pressing the cog button > 'paste multiple queries merged' (or similar) on the booru download tab.

MD5 Queries for version 415 and down:

  1. Select the files you want tags for
  2. Right-click, share>copy>hash/hashes>md5 to copy the hash values of the selected files into your clipboard.
  3. Paste in your newly copied hash values into Notepad++
  4. Add "md5:" as a prefix at the start of each line. You can do this by Replacing (By pressing Ctrl+H) ^ with md5: or md5= Replacing ^ will add whatever you replace with at the start of each line.
  5. Copy all the lines with the md5: prefix
  6. Paste your clipboard into a gallery download tab of your choosing (the booru needs to support md5 searching) by pressing the cog button > 'paste multiple queries merged' (or similar) on the booru download tab. After v336 of Hydrus, these should all be placed into a single queue.

This will add all your hash values to your download queue, and it will search for them one at a time to fetch each file's tags.
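The Notepad++ step is just a per-line prefix, so anything that can prepend text line by line works. A sketch, with example md5 values standing in for your real hashes (pick 'md5:' or 'md5=' to suit the target site):

```python
# example hashes standing in for the values copied from your client
hashes = """0cc175b9c0f1b6a831c399e269772661
92eb5ffee6ae2fec3ad71c777531578f""".splitlines()

# prepend the prefix the booru expects
queries = ['md5:' + h for h in hashes]
print('\n'.join(queries))
# → md5:0cc175b9c0f1b6a831c399e269772661
#   md5:92eb5ffee6ae2fec3ad71c777531578f
```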

Note that this only works if the site's hash value actually matches the one you have, but you can try it on many sites that support md5 searching this way, such as danbooru, gelbooru, and sankaku. You can also import all the md5-supporting nGUGs here:

Both of these nGUGs are meant for specific use with the md5: or md5= prefix, as not all sites support both.


Image Recognition:


Use IQDB to find files on other sites with either of these two options:

  1. For newer users, with a GUI (Python not required, Windows only): https://github.com/nostrenz/hatate-iqdb-tagger
  2. CLI + Python required (Linux/Mac/Docker option, also works on Windows): https://github.com/rachmadaniHaryono/iqdb_tagger

These will upload your file (even resizing it first) to the IQDB servers to find similar-looking files hosted on boorus and other sites.

SauceNAO lookup: https://github.com/GoAwayNow/HydrausNao