pijul/pijul_contrib_guide: README.md

## What is this?

This is intended to serve as a reference for developers interested in working on/contributing to Pijul.

## Notation/conventions used herein

Out of respect for the time and sanity of readers, we will try to be careful about how we refer to elements in the hierarchy of nest_user > repository > remote > channel > (change|discussion), as certain concepts only apply to particular combinations of these elements. For example, when we want to refer specifically to the pair of a repository and a channel in plain text, we'll write it explicitly as a pair (repo, channel), since "repo/channel" is commonly read as "repo OR channel". When we use backticks to show a (pseudo-)code block, we'll continue to use a forward slash, since that's the path separator in nest URLs. For example, `pijul pull user/repo`.

## Before submitting a patch

Please run `cargo fmt` to format your changes before submitting a patch. As long as you have cargo fmt installed, you can get pijul to do this for you by adding the following to `.pijul/config`:
```
[hooks]
record = [ "cargo fmt" ]
```

## For contributors using Rust Analyzer (esp. with VS Code)
By default, Rust Analyzer runs `cargo check` with the `--test` flag included, which can really slow down editor interactions and feedback. For VS Code users, it's recommended to set up a workspace in your local copy of the pijul repository. An example of a basic workspace file can be found below; edit the `<your_cargo_bin_path>` entry to point to your cargo binary, and place the file in the project root as, e.g., `pijul.code-workspace`. When you open VS Code, it will ask whether you want to open the workspace.

Example pijul.code-workspace:
```
{
    "folders": [
        {
            "path": "."
        }
    ],
    "settings": {
        "rust-analyzer.checkOnSave.overrideCommand": [
            "<your_cargo_bin_path>",
            "check",
            "--workspace",
            "--message-format=json",
            "--manifest-path",
            "./Cargo.toml",
            "--lib",
            "--bins",
            "--examples"
        ]
    }
}
```

For users of other editors, if RA is taking a long time and reporting a large number of errors, this would be the first place to look. If you have a specific workaround for your setup, please submit it as a patch and it will be added to this list.


## Scripting tricks (for tests, automation, etc.)
When testing modifications or debugging issues, it's often the case that you'll want to recreate fairly complex situations and repository states quickly and in a reproducible manner. To that end, it's useful to know how to write scripts that interact with pijul.

### Forcing specific timestamps
`pijul record --timestamp <time>`

### Minimizing interactive editor pop-ups/pager interactions during script execution

#### Record a patch without being prompted for hunk selection/change message
`pijul record --all --message "<msg>"`

#### Push/pull without being prompted for hunk selection
`pijul pull --all <remote>`

`pijul push --all <remote>`
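
If your test harness is itself written in Rust, the flags above can be assembled programmatically. The following is a minimal sketch, not part of Pijul: `record_args` and `sync_args` are hypothetical helper names, and the timestamp is passed through verbatim because its accepted format isn't covered here. The sketch prints the commands instead of running them, so it works without pijul installed.

```rust
use std::process::Command;

/// Hypothetical helper: argument list for a non-interactive `pijul record`.
/// `timestamp` is forwarded as-is to `--timestamp`.
fn record_args(message: &str, timestamp: Option<&str>) -> Vec<String> {
    let mut args = vec![
        "record".to_string(),
        "--all".to_string(),
        "--message".to_string(),
        message.to_string(),
    ];
    if let Some(ts) = timestamp {
        args.push("--timestamp".to_string());
        args.push(ts.to_string());
    }
    args
}

/// Hypothetical helper: argument list for a non-interactive pull or push.
/// `verb` is expected to be "pull" or "push".
fn sync_args(verb: &str, remote: &str) -> Vec<String> {
    vec![verb.to_string(), "--all".to_string(), remote.to_string()]
}

fn main() {
    let args = record_args("add fixture file", None);
    let mut cmd = Command::new("pijul");
    cmd.args(&args);
    // Print the command rather than spawning it.
    println!("{:?}", cmd);
    println!("pijul {}", sync_args("pull", "user/repo").join(" "));
}
```

To actually execute a command, call `cmd.status()` and check the returned exit status.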

## Logging

Pijul uses [log](https://crates.io/crates/log) and [env_logger](https://crates.io/crates/env_logger) to log messages to stdout; messages are tagged with priority levels (`log` provides the levels error, warn, info, debug, trace). The function that sets up pijul's logging is `pijul::env_logger_init`. You can consult the documentation for `log` and `env_logger` to get more information, but to turn logging on, run pijul with:

```
RUST_LOG=<level> <command>
```

for example:

```
RUST_LOG=warn pijul change
```

A quick and dirty example of how you can add new logging messages with the macros provided by log (which use standard format string syntax):
```
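// `value` is a placeholder for whatever you want to inspect.
fn some_fun(value: u32) {
    // ... existing code ...
    log::warn!("I want to see this value: {}", value);
    // ... existing code ...
}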
```

## Important types, API touchstones

### Transactions

The big-ticket source of state is a transaction (these variables are often called `txn` in the source code). Different kinds of transaction implement different `_TxnT` traits, but the basic concrete type for a transaction is `GenericTxn`.

### pijul::repository::Repository

Unsurprisingly, this contains a lot of state pertaining to a particular repository.

### pijul::remote::RemoteRepo

`RemoteRepo` is a concrete type frequently used to interact with a Transaction. `RemoteRepo` is an enum with a variant for each kind of remote: `Local`, `Ssh`, `Http`, and `LocalChannel`, as well as a bookkeeping `None` variant. For clarity, `Local` represents a local repo other than the one the user is currently working within, while `LocalChannel` is a channel within the current repo.
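
To make the variants easier to keep apart, here is a simplified stand-in. `RemoteKind`, `describe`, and `is_in_current_repo` are names invented for this sketch; the real `RemoteRepo` variants carry data that is omitted here, and the descriptions of `Ssh` and `Http` are inferred from the variant names.

```rust
/// Simplified stand-in for `pijul::remote::RemoteRepo` (payloads omitted).
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum RemoteKind {
    Local,
    Ssh,
    Http,
    LocalChannel,
    None,
}

impl RemoteKind {
    /// One-line summary of what each variant stands for.
    fn describe(self) -> &'static str {
        match self {
            RemoteKind::Local => "a local repository other than the current one",
            RemoteKind::Ssh => "a remote reached over SSH",
            RemoteKind::Http => "a remote reached over HTTP",
            RemoteKind::LocalChannel => "another channel in the current repository",
            RemoteKind::None => "bookkeeping placeholder",
        }
    }

    /// True only when the "remote" lives inside the repository being worked in.
    fn is_in_current_repo(self) -> bool {
        matches!(self, RemoteKind::LocalChannel)
    }
}

fn main() {
    let kinds = [
        RemoteKind::Local,
        RemoteKind::Ssh,
        RemoteKind::Http,
        RemoteKind::LocalChannel,
        RemoteKind::None,
    ];
    for kind in &kinds {
        println!("{:?}: {}", kind, kind.describe());
    }
}
```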

### libpijul::pristine::Hash and libpijul::pristine::Merkle

`Hash` and `SerializedHash` are sequences of bytes which represent changes.\
`Merkle` and `SerializedMerkle` are sequences of bytes which represent the state of a repository.\
Both `Hash` and `Merkle` appear to users as strings of [base32](https://en.wikipedia.org/wiki/Base32).

You can freely convert between `Hash` <-> `SerializedHash` and `Merkle` <-> `SerializedMerkle` using the `From` and `Into` traits. Example: `Hash::from(_)` or `my_hash.into()`.
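
The conversion pattern looks roughly like the sketch below. The two structs are toy stand-ins defined only for this example; the real types live in `libpijul::pristine`, have different internals, and their `From` impls may take references rather than values.

```rust
/// Toy stand-in for `libpijul::pristine::Hash`.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
struct Hash([u8; 4]);

/// Toy stand-in for `libpijul::pristine::SerializedHash`.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
struct SerializedHash([u8; 4]);

impl From<Hash> for SerializedHash {
    fn from(h: Hash) -> Self {
        SerializedHash(h.0)
    }
}

impl From<SerializedHash> for Hash {
    fn from(s: SerializedHash) -> Self {
        Hash(s.0)
    }
}

/// Convert to the serialized form and back, using both spellings.
fn round_trip(h: Hash) -> Hash {
    let serialized: SerializedHash = h.into(); // `Into` comes for free from `From`
    Hash::from(serialized)
}

fn main() {
    let h = Hash([1, 2, 3, 4]);
    println!("{:?}", round_trip(h));
}
```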

## Common tasks in Pijul's codebase

### Visiting the change log

Use `TxnTExt::reverse_log(..)` (most recent first) or `TxnTExt::log(..)` (oldest first). You'll need a transaction and a `ChannelRef`. An example can be found in `pijul::command::log.rs`.

## Working internally with push/pull (remotes, caching, and you)

Pijul recognizes two "views" of a (remote, channel) pair. The first is just the actual state of the (remote, channel). The second is a locally stored version of the remote, which is the last version of the actual remote that we've signed off on (we'll continue to call this the "local remote cache"). During a push or pull, the local remote cache almost always just downloads and caches whatever new patches are present in the (remote, channel) before you select what you actually want to push or pull via hunk selection. (NOTE: For `LocalChannel` remotes, there's no local cache; `LocalChannel` means it's just another channel in the same repository you're already working in).

API notes:
+ As usual, we need some kind of Transaction for most of the interactions with either view of a remote.
+ `pijul::remote::RemoteRepo` represents the _actual_ remote.
+ `libpijul::pristine::RemoteRef` represents the local cache of the last remote we've signed off on.

Example of the base case:
+ local/master consists of the patches `[(0, A), (1, B)]`
+ the last time we interacted with remote/bugfix, it was also at `[(0, A), (1, B)]`
+ remote/bugfix now consists of the patches `[(0, A), (1, B), (2, C), (3, D), (4, E)]`

If we invoke `pijul pull remote/bugfix`, pijul will put the new patches `[(2, C), (3, D), (4, E)]` in the local remote cache for `remote/bugfix`, and ask which of the new changes you actually want to pull into `local/master`. Now, even if only `(2, C)` is pulled via hunk selection, subsequent pull operations generally won't have to re-download `(3, D)` and `(4, E)`.

At the time of writing, the only notable exception to this straightforward caching strategy is when the actual remote has unrecorded a patch that ALSO exists in the channel we're pulling to or pushing from.

Example of the exceptional case (we're pulling from remote/bugfix into local/master):
+ local/master consists of the patches `[(0, A), (1, B), (2, C), (3, D)]`
+ the last version of `remote/bugfix` we saw was `[(0, A), (1, B), (2, C)]`

If, when we invoke `pijul pull remote/bugfix`, we discover that the new set of changes comprising that (remote, channel) is `[(0, A), (_, _), (_, _), (3, X), (4, Y)]`, meaning that the actual remote has unrecorded `(1, B)` and `(2, C)` since the last time we interacted with it, then the patches after the dichotomy (the last point at which the local remote cache and the actual remote were the same; here `(0, A)`) will not be cached.

We want to notify the user that `(1, B)` and `(2, C)` have been unrecorded in the remote they're pulling from (or pushing to) before they're presented with hunk selection, and preserving the divergence between the local remote cache and the actual remote allows us to do this. Furthermore, we don't want to overwrite the local remote cache, because we want to continue to remind the user of this unrecord until either a) the user forces the cache to update (with the `--force-cache` flag), or b) the user fixes the discrepancy by unrecording `(1, B)` and `(2, C)` in local/master.

For the sake of completeness, if the actual remote has unrecorded one or more patches, but they do NOT exist in the channel we're trying to pull/push to, the cache will be updated and the user will not be notified. It is assumed that the user isn't concerned with those patches since they were either actively ignored during a previous hunk selection, or we never knew about them in the first place.
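
The rules above can be condensed into a toy model. This is a sketch of the decision logic as described in this section, not Pijul's implementation: `Patch`, `CacheDecision`, `common_prefix_len`, and `decide` are invented for illustration, patches are identified by a single `char` instead of a hash, and the unrecorded positions written as `(_, _)` above are simply left out of the remote's log.

```rust
use std::collections::HashSet;

/// (position in the log, patch identifier); a `char` stands in for a hash.
type Patch = (u64, char);

#[derive(Debug, PartialEq, Eq)]
enum CacheDecision {
    /// Bring the local remote cache in line with the actual remote.
    Update,
    /// Leave the cache alone and warn the user about these patches.
    KeepAndNotify(Vec<Patch>),
}

/// Number of leading entries on which the cache and the actual remote agree.
/// The dichotomy is the last entry of that common prefix.
fn common_prefix_len(cache: &[Patch], remote: &[Patch]) -> usize {
    cache
        .iter()
        .zip(remote.iter())
        .take_while(|(a, b)| a == b)
        .count()
}

fn decide(local: &[Patch], cache: &[Patch], remote: &[Patch], force_cache: bool) -> CacheDecision {
    let split = common_prefix_len(cache, remote);
    let remote_ids: HashSet<char> = remote.iter().map(|p| p.1).collect();
    let local_ids: HashSet<char> = local.iter().map(|p| p.1).collect();

    // Cached patches past the dichotomy that the remote has unrecorded
    // AND that still exist in the local channel.
    let unrecorded_here: Vec<Patch> = cache[split..]
        .iter()
        .copied()
        .filter(|p| !remote_ids.contains(&p.1) && local_ids.contains(&p.1))
        .collect();

    if unrecorded_here.is_empty() || force_cache {
        CacheDecision::Update
    } else {
        CacheDecision::KeepAndNotify(unrecorded_here)
    }
}

fn main() {
    // Exceptional case from the text above.
    let local = [(0, 'A'), (1, 'B'), (2, 'C'), (3, 'D')];
    let cache = [(0, 'A'), (1, 'B'), (2, 'C')];
    let remote = [(0, 'A'), (3, 'X'), (4, 'Y')];
    println!("{:?}", decide(&local, &cache, &remote, false));
}
```

Passing `force_cache = true` models the `--force-cache` flag: the cache is updated even though the unrecorded patches still exist locally.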