Discussion:
[PATCH v2 01/16] CIFS: Fix async reading on reconnects
Pavel Shilovsky
2014-06-27 09:57:38 UTC
Permalink
If we get into read_into_pages() from cifs_readv_receive() and then
loose a network, we issue cifs_reconnect that moves all mids to
a private list and issue their callbacks. The callback of the async
read request sets a mid to retry, frees it and wakes up a process
that waits on the rdata completion.

After the connection is established we return from read_into_pages()
with a short read, use the mid that was freed before and try to read
the remaining data from the a newly created socket. Both actions are
not what we want to do. In reconnect cases (-EAGAIN) we should not
mask off the error with a short read but should return the error
code instead.

Cc: Jeff Layton <jlayton-vpEMnDpepFuMZCB2o+***@public.gmane.org>
Signed-off-by: Pavel Shilovsky <pshilovsky-***@public.gmane.org>
---
fs/cifs/file.c | 4 ++--
1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/fs/cifs/file.c b/fs/cifs/file.c
index e90a1e9..6b6df30 100644
--- a/fs/cifs/file.c
+++ b/fs/cifs/file.c
@@ -2823,7 +2823,7 @@ cifs_uncached_read_into_pages(struct TCP_Server_Info *server,
total_read += result;
}

- return total_read > 0 ? total_read : result;
+ return total_read > 0 && result != -EAGAIN ? total_read : result;
}

ssize_t cifs_user_readv(struct kiocb *iocb, struct iov_iter *to)
@@ -3231,7 +3231,7 @@ cifs_readpages_read_into_pages(struct TCP_Server_Info *server,
total_read += result;
}

- return total_read > 0 ? total_read : result;
+ return total_read > 0 && result != -EAGAIN ? total_read : result;
}

static int cifs_readpages(struct file *file, struct address_space *mapping,
--
1.8.1.2
Pavel Shilovsky
2014-06-27 09:57:40 UTC
Permalink
Signed-off-by: Pavel Shilovsky <pshilovsky-***@public.gmane.org>
---
fs/cifs/file.c | 100 +++++++++++++++++++++++++++++++--------------------------
1 file changed, 54 insertions(+), 46 deletions(-)

diff --git a/fs/cifs/file.c b/fs/cifs/file.c
index 69d1763..306c3df 100644
--- a/fs/cifs/file.c
+++ b/fs/cifs/file.c
@@ -1958,6 +1958,58 @@ wdata_prepare_pages(struct cifs_writedata *wdata, unsigned int found_pages,
return nr_pages;
}

+static int
+wdata_send_pages(struct cifs_writedata *wdata, unsigned int nr_pages,
+ struct address_space *mapping, struct writeback_control *wbc)
+{
+ int rc = 0;
+ struct TCP_Server_Info *server;
+ unsigned int i;
+
+ wdata->sync_mode = wbc->sync_mode;
+ wdata->nr_pages = nr_pages;
+ wdata->offset = page_offset(wdata->pages[0]);
+ wdata->pagesz = PAGE_CACHE_SIZE;
+ wdata->tailsz = min(i_size_read(mapping->host) -
+ page_offset(wdata->pages[nr_pages - 1]),
+ (loff_t)PAGE_CACHE_SIZE);
+ wdata->bytes = ((nr_pages - 1) * PAGE_CACHE_SIZE) + wdata->tailsz;
+
+ do {
+ if (wdata->cfile != NULL)
+ cifsFileInfo_put(wdata->cfile);
+ wdata->cfile = find_writable_file(CIFS_I(mapping->host), false);
+ if (!wdata->cfile) {
+ cifs_dbg(VFS, "No writable handles for inode\n");
+ rc = -EBADF;
+ break;
+ }
+ wdata->pid = wdata->cfile->pid;
+ server = tlink_tcon(wdata->cfile->tlink)->ses->server;
+ rc = server->ops->async_writev(wdata, cifs_writedata_release);
+ } while (wbc->sync_mode == WB_SYNC_ALL && rc == -EAGAIN);
+
+ for (i = 0; i < nr_pages; ++i)
+ unlock_page(wdata->pages[i]);
+
+ if (!rc)
+ return rc;
+
+ /* send failure -- clean up the mess */
+ for (i = 0; i < nr_pages; ++i) {
+ if (rc == -EAGAIN)
+ redirty_page_for_writepage(wbc, wdata->pages[i]);
+ else
+ SetPageError(wdata->pages[i]);
+ end_page_writeback(wdata->pages[i]);
+ page_cache_release(wdata->pages[i]);
+ }
+ if (rc != -EAGAIN)
+ mapping_set_error(mapping, rc);
+
+ return rc;
+}
+
static int cifs_writepages(struct address_space *mapping,
struct writeback_control *wbc)
{
@@ -1965,7 +2017,6 @@ static int cifs_writepages(struct address_space *mapping,
bool done = false, scanned = false, range_whole = false;
pgoff_t end, index;
struct cifs_writedata *wdata;
- struct TCP_Server_Info *server;
int rc = 0;

/*
@@ -1987,7 +2038,7 @@ static int cifs_writepages(struct address_space *mapping,
}
retry:
while (!done && index <= end) {
- unsigned int i, nr_pages, found_pages;
+ unsigned int nr_pages, found_pages;
pgoff_t next = 0, tofind;
struct page **pages;

@@ -2032,50 +2083,7 @@ retry:
continue;
}

- wdata->sync_mode = wbc->sync_mode;
- wdata->nr_pages = nr_pages;
- wdata->offset = page_offset(wdata->pages[0]);
- wdata->pagesz = PAGE_CACHE_SIZE;
- wdata->tailsz =
- min(i_size_read(mapping->host) -
- page_offset(wdata->pages[nr_pages - 1]),
- (loff_t)PAGE_CACHE_SIZE);
- wdata->bytes = ((nr_pages - 1) * PAGE_CACHE_SIZE) +
- wdata->tailsz;
-
- do {
- if (wdata->cfile != NULL)
- cifsFileInfo_put(wdata->cfile);
- wdata->cfile = find_writable_file(CIFS_I(mapping->host),
- false);
- if (!wdata->cfile) {
- cifs_dbg(VFS, "No writable handles for inode\n");
- rc = -EBADF;
- break;
- }
- wdata->pid = wdata->cfile->pid;
- server = tlink_tcon(wdata->cfile->tlink)->ses->server;
- rc = server->ops->async_writev(wdata,
- cifs_writedata_release);
- } while (wbc->sync_mode == WB_SYNC_ALL && rc == -EAGAIN);
-
- for (i = 0; i < nr_pages; ++i)
- unlock_page(wdata->pages[i]);
-
- /* send failure -- clean up the mess */
- if (rc != 0) {
- for (i = 0; i < nr_pages; ++i) {
- if (rc == -EAGAIN)
- redirty_page_for_writepage(wbc,
- wdata->pages[i]);
- else
- SetPageError(wdata->pages[i]);
- end_page_writeback(wdata->pages[i]);
- page_cache_release(wdata->pages[i]);
- }
- if (rc != -EAGAIN)
- mapping_set_error(mapping, rc);
- }
+ rc = wdata_send_pages(wdata, nr_pages, mapping, wbc);
kref_put(&wdata->refcount, cifs_writedata_release);

wbc->nr_to_write -= nr_pages;
--
1.8.1.2
Jeff Layton
2014-07-09 13:08:23 UTC
Permalink
On Fri, 27 Jun 2014 13:57:40 +0400
Post by Pavel Shilovsky
---
fs/cifs/file.c | 100 +++++++++++++++++++++++++++++++--------------------------
1 file changed, 54 insertions(+), 46 deletions(-)
diff --git a/fs/cifs/file.c b/fs/cifs/file.c
index 69d1763..306c3df 100644
--- a/fs/cifs/file.c
+++ b/fs/cifs/file.c
@@ -1958,6 +1958,58 @@ wdata_prepare_pages(struct cifs_writedata *wdata, unsigned int found_pages,
return nr_pages;
}
+static int
+wdata_send_pages(struct cifs_writedata *wdata, unsigned int nr_pages,
+ struct address_space *mapping, struct writeback_control *wbc)
+{
+ int rc = 0;
+ struct TCP_Server_Info *server;
+ unsigned int i;
+
+ wdata->sync_mode = wbc->sync_mode;
+ wdata->nr_pages = nr_pages;
+ wdata->offset = page_offset(wdata->pages[0]);
+ wdata->pagesz = PAGE_CACHE_SIZE;
+ wdata->tailsz = min(i_size_read(mapping->host) -
+ page_offset(wdata->pages[nr_pages - 1]),
+ (loff_t)PAGE_CACHE_SIZE);
+ wdata->bytes = ((nr_pages - 1) * PAGE_CACHE_SIZE) + wdata->tailsz;
+
+ do {
+ if (wdata->cfile != NULL)
+ cifsFileInfo_put(wdata->cfile);
+ wdata->cfile = find_writable_file(CIFS_I(mapping->host), false);
+ if (!wdata->cfile) {
+ cifs_dbg(VFS, "No writable handles for inode\n");
+ rc = -EBADF;
+ break;
+ }
+ wdata->pid = wdata->cfile->pid;
+ server = tlink_tcon(wdata->cfile->tlink)->ses->server;
+ rc = server->ops->async_writev(wdata, cifs_writedata_release);
+ } while (wbc->sync_mode == WB_SYNC_ALL && rc == -EAGAIN);
+
+ for (i = 0; i < nr_pages; ++i)
+ unlock_page(wdata->pages[i]);
+
+ if (!rc)
+ return rc;
+
+ /* send failure -- clean up the mess */
+ for (i = 0; i < nr_pages; ++i) {
+ if (rc == -EAGAIN)
+ redirty_page_for_writepage(wbc, wdata->pages[i]);
+ else
+ SetPageError(wdata->pages[i]);
+ end_page_writeback(wdata->pages[i]);
+ page_cache_release(wdata->pages[i]);
+ }
+ if (rc != -EAGAIN)
+ mapping_set_error(mapping, rc);
+
+ return rc;
+}
+
static int cifs_writepages(struct address_space *mapping,
struct writeback_control *wbc)
{
@@ -1965,7 +2017,6 @@ static int cifs_writepages(struct address_space *mapping,
bool done = false, scanned = false, range_whole = false;
pgoff_t end, index;
struct cifs_writedata *wdata;
- struct TCP_Server_Info *server;
int rc = 0;
/*
@@ -1987,7 +2038,7 @@ static int cifs_writepages(struct address_space *mapping,
}
while (!done && index <= end) {
- unsigned int i, nr_pages, found_pages;
+ unsigned int nr_pages, found_pages;
pgoff_t next = 0, tofind;
struct page **pages;
continue;
}
- wdata->sync_mode = wbc->sync_mode;
- wdata->nr_pages = nr_pages;
- wdata->offset = page_offset(wdata->pages[0]);
- wdata->pagesz = PAGE_CACHE_SIZE;
- wdata->tailsz =
- min(i_size_read(mapping->host) -
- page_offset(wdata->pages[nr_pages - 1]),
- (loff_t)PAGE_CACHE_SIZE);
- wdata->bytes = ((nr_pages - 1) * PAGE_CACHE_SIZE) +
- wdata->tailsz;
-
- do {
- if (wdata->cfile != NULL)
- cifsFileInfo_put(wdata->cfile);
- wdata->cfile = find_writable_file(CIFS_I(mapping->host),
- false);
- if (!wdata->cfile) {
- cifs_dbg(VFS, "No writable handles for inode\n");
- rc = -EBADF;
- break;
- }
- wdata->pid = wdata->cfile->pid;
- server = tlink_tcon(wdata->cfile->tlink)->ses->server;
- rc = server->ops->async_writev(wdata,
- cifs_writedata_release);
- } while (wbc->sync_mode == WB_SYNC_ALL && rc == -EAGAIN);
-
- for (i = 0; i < nr_pages; ++i)
- unlock_page(wdata->pages[i]);
-
- /* send failure -- clean up the mess */
- if (rc != 0) {
- for (i = 0; i < nr_pages; ++i) {
- if (rc == -EAGAIN)
- redirty_page_for_writepage(wbc,
- wdata->pages[i]);
- else
- SetPageError(wdata->pages[i]);
- end_page_writeback(wdata->pages[i]);
- page_cache_release(wdata->pages[i]);
- }
- if (rc != -EAGAIN)
- mapping_set_error(mapping, rc);
- }
+ rc = wdata_send_pages(wdata, nr_pages, mapping, wbc);
kref_put(&wdata->refcount, cifs_writedata_release);
wbc->nr_to_write -= nr_pages;
Reviewed-by: Jeff Layton <jlayton-***@public.gmane.org>
Pavel Shilovsky
2014-07-21 15:53:16 UTC
Permalink
Post by Jeff Layton
On Fri, 27 Jun 2014 13:57:40 +0400
Post by Pavel Shilovsky
---
fs/cifs/file.c | 100 +++++++++++++++++++++++++++++++--------------------------
1 file changed, 54 insertions(+), 46 deletions(-)
diff --git a/fs/cifs/file.c b/fs/cifs/file.c
index 69d1763..306c3df 100644
--- a/fs/cifs/file.c
+++ b/fs/cifs/file.c
@@ -1958,6 +1958,58 @@ wdata_prepare_pages(struct cifs_writedata *wdata, unsigned int found_pages,
return nr_pages;
}
+static int
+wdata_send_pages(struct cifs_writedata *wdata, unsigned int nr_pages,
+ struct address_space *mapping, struct writeback_control *wbc)
+{
+ int rc = 0;
+ struct TCP_Server_Info *server;
+ unsigned int i;
+
+ wdata->sync_mode = wbc->sync_mode;
+ wdata->nr_pages = nr_pages;
+ wdata->offset = page_offset(wdata->pages[0]);
+ wdata->pagesz = PAGE_CACHE_SIZE;
+ wdata->tailsz = min(i_size_read(mapping->host) -
+ page_offset(wdata->pages[nr_pages - 1]),
+ (loff_t)PAGE_CACHE_SIZE);
+ wdata->bytes = ((nr_pages - 1) * PAGE_CACHE_SIZE) + wdata->tailsz;
+
+ do {
+ if (wdata->cfile != NULL)
+ cifsFileInfo_put(wdata->cfile);
+ wdata->cfile = find_writable_file(CIFS_I(mapping->host), false);
+ if (!wdata->cfile) {
+ cifs_dbg(VFS, "No writable handles for inode\n");
+ rc = -EBADF;
+ break;
+ }
+ wdata->pid = wdata->cfile->pid;
+ server = tlink_tcon(wdata->cfile->tlink)->ses->server;
+ rc = server->ops->async_writev(wdata, cifs_writedata_release);
+ } while (wbc->sync_mode == WB_SYNC_ALL && rc == -EAGAIN);
+
+ for (i = 0; i < nr_pages; ++i)
+ unlock_page(wdata->pages[i]);
+
I changed this patch in v3 and decided to leave error processing in
the caller (cifs_writepages).
Post by Jeff Layton
Post by Pavel Shilovsky
+ if (!rc)
+ return rc;
+
+ /* send failure -- clean up the mess */
+ for (i = 0; i < nr_pages; ++i) {
+ if (rc == -EAGAIN)
+ redirty_page_for_writepage(wbc, wdata->pages[i]);
+ else
+ SetPageError(wdata->pages[i]);
+ end_page_writeback(wdata->pages[i]);
+ page_cache_release(wdata->pages[i]);
+ }
+ if (rc != -EAGAIN)
+ mapping_set_error(mapping, rc);
+
+ return rc;
+}
+
static int cifs_writepages(struct address_space *mapping,
struct writeback_control *wbc)
{
@@ -1965,7 +2017,6 @@ static int cifs_writepages(struct address_space *mapping,
bool done = false, scanned = false, range_whole = false;
pgoff_t end, index;
struct cifs_writedata *wdata;
- struct TCP_Server_Info *server;
int rc = 0;
/*
@@ -1987,7 +2038,7 @@ static int cifs_writepages(struct address_space *mapping,
}
while (!done && index <= end) {
- unsigned int i, nr_pages, found_pages;
+ unsigned int nr_pages, found_pages;
pgoff_t next = 0, tofind;
struct page **pages;
continue;
}
- wdata->sync_mode = wbc->sync_mode;
- wdata->nr_pages = nr_pages;
- wdata->offset = page_offset(wdata->pages[0]);
- wdata->pagesz = PAGE_CACHE_SIZE;
- wdata->tailsz =
- min(i_size_read(mapping->host) -
- page_offset(wdata->pages[nr_pages - 1]),
- (loff_t)PAGE_CACHE_SIZE);
- wdata->bytes = ((nr_pages - 1) * PAGE_CACHE_SIZE) +
- wdata->tailsz;
-
- do {
- if (wdata->cfile != NULL)
- cifsFileInfo_put(wdata->cfile);
- wdata->cfile = find_writable_file(CIFS_I(mapping->host),
- false);
- if (!wdata->cfile) {
- cifs_dbg(VFS, "No writable handles for inode\n");
- rc = -EBADF;
- break;
- }
- wdata->pid = wdata->cfile->pid;
- server = tlink_tcon(wdata->cfile->tlink)->ses->server;
- rc = server->ops->async_writev(wdata,
- cifs_writedata_release);
- } while (wbc->sync_mode == WB_SYNC_ALL && rc == -EAGAIN);
-
- for (i = 0; i < nr_pages; ++i)
- unlock_page(wdata->pages[i]);
-
the below part should be in cifs_writepages() since it's easier to
understand the code this way: we get pages from the page cache, call
wdata_send_pages() and clean up pages if sending fails.
Post by Jeff Layton
Post by Pavel Shilovsky
- /* send failure -- clean up the mess */
- if (rc != 0) {
- for (i = 0; i < nr_pages; ++i) {
- if (rc == -EAGAIN)
- redirty_page_for_writepage(wbc,
- wdata->pages[i]);
- else
- SetPageError(wdata->pages[i]);
- end_page_writeback(wdata->pages[i]);
- page_cache_release(wdata->pages[i]);
- }
- if (rc != -EAGAIN)
- mapping_set_error(mapping, rc);
- }
+ rc = wdata_send_pages(wdata, nr_pages, mapping, wbc);
kref_put(&wdata->refcount, cifs_writedata_release);
wbc->nr_to_write -= nr_pages;
Jeff, I removed your reviewed-by tag since the patch is changed. Could
you look at the new smaller version, please?
--
Best regards,
Pavel Shilovsky.
Pavel Shilovsky
2014-06-27 09:57:41 UTC
Permalink
Signed-off-by: Pavel Shilovsky <pshilovsky-***@public.gmane.org>
---
fs/cifs/file.c | 56 ++++++++++++++++++++++++++++++++++++--------------------
1 file changed, 36 insertions(+), 20 deletions(-)

diff --git a/fs/cifs/file.c b/fs/cifs/file.c
index 306c3df..4117da0 100644
--- a/fs/cifs/file.c
+++ b/fs/cifs/file.c
@@ -1878,6 +1878,40 @@ static int cifs_partialpagewrite(struct page *page, unsigned from, unsigned to)
return rc;
}

+static struct cifs_writedata *
+wdata_alloc_and_fillpages(pgoff_t tofind, struct address_space *mapping,
+ pgoff_t end, pgoff_t *index,
+ unsigned int *found_pages)
+{
+ unsigned int nr_pages;
+ struct page **pages;
+ struct cifs_writedata *wdata;
+
+ wdata = cifs_writedata_alloc((unsigned int)tofind,
+ cifs_writev_complete);
+ if (!wdata)
+ return NULL;
+
+ /*
+ * find_get_pages_tag seems to return a max of 256 on each
+ * iteration, so we must call it several times in order to
+ * fill the array or the wsize is effectively limited to
+ * 256 * PAGE_CACHE_SIZE.
+ */
+ *found_pages = 0;
+ pages = wdata->pages;
+ do {
+ nr_pages = find_get_pages_tag(mapping, index,
+ PAGECACHE_TAG_DIRTY, tofind,
+ pages);
+ *found_pages += nr_pages;
+ tofind -= nr_pages;
+ pages += nr_pages;
+ } while (nr_pages && tofind && *index <= end);
+
+ return wdata;
+}
+
static unsigned int
wdata_prepare_pages(struct cifs_writedata *wdata, unsigned int found_pages,
struct address_space *mapping,
@@ -2040,35 +2074,17 @@ retry:
while (!done && index <= end) {
unsigned int nr_pages, found_pages;
pgoff_t next = 0, tofind;
- struct page **pages;

tofind = min((cifs_sb->wsize / PAGE_CACHE_SIZE) - 1,
end - index) + 1;

- wdata = cifs_writedata_alloc((unsigned int)tofind,
- cifs_writev_complete);
+ wdata = wdata_alloc_and_fillpages(tofind, mapping, end, &index,
+ &found_pages);
if (!wdata) {
rc = -ENOMEM;
break;
}

- /*
- * find_get_pages_tag seems to return a max of 256 on each
- * iteration, so we must call it several times in order to
- * fill the array or the wsize is effectively limited to
- * 256 * PAGE_CACHE_SIZE.
- */
- found_pages = 0;
- pages = wdata->pages;
- do {
- nr_pages = find_get_pages_tag(mapping, &index,
- PAGECACHE_TAG_DIRTY,
- tofind, pages);
- found_pages += nr_pages;
- tofind -= nr_pages;
- pages += nr_pages;
- } while (nr_pages && tofind && index <= end);
-
if (found_pages == 0) {
kref_put(&wdata->refcount, cifs_writedata_release);
break;
--
1.8.1.2
Shirish Pargaonkar
2014-07-19 06:24:27 UTC
Permalink
This looks correct.
Post by Pavel Shilovsky
---
fs/cifs/file.c | 56 ++++++++++++++++++++++++++++++++++++--------------------
1 file changed, 36 insertions(+), 20 deletions(-)
diff --git a/fs/cifs/file.c b/fs/cifs/file.c
index 306c3df..4117da0 100644
--- a/fs/cifs/file.c
+++ b/fs/cifs/file.c
@@ -1878,6 +1878,40 @@ static int cifs_partialpagewrite(struct page *page, unsigned from, unsigned to)
return rc;
}
+static struct cifs_writedata *
+wdata_alloc_and_fillpages(pgoff_t tofind, struct address_space *mapping,
+ pgoff_t end, pgoff_t *index,
+ unsigned int *found_pages)
+{
+ unsigned int nr_pages;
+ struct page **pages;
+ struct cifs_writedata *wdata;
+
+ wdata = cifs_writedata_alloc((unsigned int)tofind,
+ cifs_writev_complete);
+ if (!wdata)
+ return NULL;
+
+ /*
+ * find_get_pages_tag seems to return a max of 256 on each
+ * iteration, so we must call it several times in order to
+ * fill the array or the wsize is effectively limited to
+ * 256 * PAGE_CACHE_SIZE.
+ */
+ *found_pages = 0;
+ pages = wdata->pages;
+ do {
+ nr_pages = find_get_pages_tag(mapping, index,
+ PAGECACHE_TAG_DIRTY, tofind,
+ pages);
+ *found_pages += nr_pages;
+ tofind -= nr_pages;
+ pages += nr_pages;
+ } while (nr_pages && tofind && *index <= end);
+
+ return wdata;
+}
+
static unsigned int
wdata_prepare_pages(struct cifs_writedata *wdata, unsigned int found_pages,
struct address_space *mapping,
while (!done && index <= end) {
unsigned int nr_pages, found_pages;
pgoff_t next = 0, tofind;
- struct page **pages;
tofind = min((cifs_sb->wsize / PAGE_CACHE_SIZE) - 1,
end - index) + 1;
- wdata = cifs_writedata_alloc((unsigned int)tofind,
- cifs_writev_complete);
+ wdata = wdata_alloc_and_fillpages(tofind, mapping, end, &index,
+ &found_pages);
if (!wdata) {
rc = -ENOMEM;
break;
}
- /*
- * find_get_pages_tag seems to return a max of 256 on each
- * iteration, so we must call it several times in order to
- * fill the array or the wsize is effectively limited to
- * 256 * PAGE_CACHE_SIZE.
- */
- found_pages = 0;
- pages = wdata->pages;
- do {
- nr_pages = find_get_pages_tag(mapping, &index,
- PAGECACHE_TAG_DIRTY,
- tofind, pages);
- found_pages += nr_pages;
- tofind -= nr_pages;
- pages += nr_pages;
- } while (nr_pages && tofind && index <= end);
-
if (found_pages == 0) {
kref_put(&wdata->refcount, cifs_writedata_release);
break;
--
1.8.1.2
--
To unsubscribe from this list: send the line "unsubscribe linux-cifs" in
More majordomo info at http://vger.kernel.org/majordomo-info.html
Pavel Shilovsky
2014-06-27 09:57:43 UTC
Permalink
If wsize changes on reconnect we need to use new writedata structure
that for retrying.

Signed-off-by: Pavel Shilovsky <pshilovsky-***@public.gmane.org>
---
fs/cifs/cifsglob.h | 2 ++
fs/cifs/cifssmb.c | 84 +++++++++++++++++++++++++++++++++++++++++++-----------
fs/cifs/smb1ops.c | 7 +++++
fs/cifs/smb2ops.c | 10 +++++++
4 files changed, 87 insertions(+), 16 deletions(-)

diff --git a/fs/cifs/cifsglob.h b/fs/cifs/cifsglob.h
index de6aed8..8c53f20 100644
--- a/fs/cifs/cifsglob.h
+++ b/fs/cifs/cifsglob.h
@@ -404,6 +404,8 @@ struct smb_version_operations {
const struct cifs_fid *, u32 *);
int (*set_acl)(struct cifs_ntsd *, __u32, struct inode *, const char *,
int);
+ /* writepages retry size */
+ unsigned int (*wp_retry_size)(struct inode *);
};

struct smb_version_values {
diff --git a/fs/cifs/cifssmb.c b/fs/cifs/cifssmb.c
index b7e5b65..41c9067 100644
--- a/fs/cifs/cifssmb.c
+++ b/fs/cifs/cifssmb.c
@@ -1899,27 +1899,79 @@ cifs_writev_requeue(struct cifs_writedata *wdata)
int i, rc;
struct inode *inode = wdata->cfile->dentry->d_inode;
struct TCP_Server_Info *server;
+ unsigned int rest_len;

- for (i = 0; i < wdata->nr_pages; i++) {
- lock_page(wdata->pages[i]);
- clear_page_dirty_for_io(wdata->pages[i]);
- }
-
+ server = tlink_tcon(wdata->cfile->tlink)->ses->server;
+ i = 0;
+ rest_len = wdata->bytes;
do {
- server = tlink_tcon(wdata->cfile->tlink)->ses->server;
- rc = server->ops->async_writev(wdata, cifs_writedata_release);
- } while (rc == -EAGAIN);
+ struct cifs_writedata *wdata2;
+ unsigned int j, nr_pages, wsize, tailsz, cur_len;
+
+ wsize = server->ops->wp_retry_size(inode);
+ if (wsize < rest_len) {
+ nr_pages = wsize / PAGE_CACHE_SIZE;
+ if (!nr_pages) {
+ rc = -ENOTSUPP;
+ break;
+ }
+ cur_len = nr_pages * PAGE_CACHE_SIZE;
+ tailsz = PAGE_CACHE_SIZE;
+ } else {
+ nr_pages = DIV_ROUND_UP(rest_len, PAGE_CACHE_SIZE);
+ cur_len = rest_len;
+ tailsz = rest_len - (nr_pages - 1) * PAGE_CACHE_SIZE;
+ }

- for (i = 0; i < wdata->nr_pages; i++) {
- unlock_page(wdata->pages[i]);
- if (rc != 0) {
- SetPageError(wdata->pages[i]);
- end_page_writeback(wdata->pages[i]);
- page_cache_release(wdata->pages[i]);
+ wdata2 = cifs_writedata_alloc(nr_pages, cifs_writev_complete);
+ if (!wdata2) {
+ rc = -ENOMEM;
+ break;
}
- }

- mapping_set_error(inode->i_mapping, rc);
+ for (j = 0; j < nr_pages; j++) {
+ wdata2->pages[j] = wdata->pages[i + j];
+ lock_page(wdata2->pages[j]);
+ clear_page_dirty_for_io(wdata2->pages[j]);
+ }
+
+ wdata2->sync_mode = wdata->sync_mode;
+ wdata2->nr_pages = nr_pages;
+ wdata2->offset = page_offset(wdata2->pages[0]);
+ wdata2->pagesz = PAGE_CACHE_SIZE;
+ wdata2->tailsz = tailsz;
+ wdata2->bytes = cur_len;
+
+ wdata2->cfile = find_writable_file(CIFS_I(inode), false);
+ if (!wdata2->cfile) {
+ cifs_dbg(VFS, "No writable handles for inode\n");
+ rc = -EBADF;
+ break;
+ }
+ wdata2->pid = wdata2->cfile->pid;
+ rc = server->ops->async_writev(wdata2, cifs_writedata_release);
+
+ for (j = 0; j < nr_pages; j++) {
+ unlock_page(wdata2->pages[j]);
+ if (rc != 0 && rc != -EAGAIN) {
+ SetPageError(wdata2->pages[j]);
+ end_page_writeback(wdata2->pages[j]);
+ page_cache_release(wdata2->pages[j]);
+ }
+ }
+
+ if (rc) {
+ kref_put(&wdata2->refcount, cifs_writedata_release);
+ if (rc == -EAGAIN)
+ continue;
+ mapping_set_error(inode->i_mapping, rc);
+ break;
+ }
+
+ rest_len -= cur_len;
+ i += nr_pages;
+ } while (i < wdata->nr_pages);
+
kref_put(&wdata->refcount, cifs_writedata_release);
}

diff --git a/fs/cifs/smb1ops.c b/fs/cifs/smb1ops.c
index d1fdfa8..8a96342 100644
--- a/fs/cifs/smb1ops.c
+++ b/fs/cifs/smb1ops.c
@@ -1009,6 +1009,12 @@ cifs_is_read_op(__u32 oplock)
return oplock == OPLOCK_READ;
}

+static unsigned int
+cifs_wp_retry_size(struct inode *inode)
+{
+ return CIFS_SB(inode->i_sb)->wsize;
+}
+
struct smb_version_operations smb1_operations = {
.send_cancel = send_nt_cancel,
.compare_fids = cifs_compare_fids,
@@ -1078,6 +1084,7 @@ struct smb_version_operations smb1_operations = {
.query_mf_symlink = cifs_query_mf_symlink,
.create_mf_symlink = cifs_create_mf_symlink,
.is_read_op = cifs_is_read_op,
+ .wp_retry_size = cifs_wp_retry_size,
#ifdef CONFIG_CIFS_XATTR
.query_all_EAs = CIFSSMBQAllEAs,
.set_EA = CIFSSMBSetEA,
diff --git a/fs/cifs/smb2ops.c b/fs/cifs/smb2ops.c
index 787844b..e35ce5b 100644
--- a/fs/cifs/smb2ops.c
+++ b/fs/cifs/smb2ops.c
@@ -1104,6 +1104,13 @@ smb3_parse_lease_buf(void *buf, unsigned int *epoch)
return le32_to_cpu(lc->lcontext.LeaseState);
}

+static unsigned int
+smb2_wp_retry_size(struct inode *inode)
+{
+ return min_t(unsigned int, CIFS_SB(inode->i_sb)->wsize,
+ SMB2_MAX_BUFFER_SIZE);
+}
+
struct smb_version_operations smb20_operations = {
.compare_fids = smb2_compare_fids,
.setup_request = smb2_setup_request,
@@ -1177,6 +1184,7 @@ struct smb_version_operations smb20_operations = {
.create_lease_buf = smb2_create_lease_buf,
.parse_lease_buf = smb2_parse_lease_buf,
.clone_range = smb2_clone_range,
+ .wp_retry_size = smb2_wp_retry_size,
};

struct smb_version_operations smb21_operations = {
@@ -1252,6 +1260,7 @@ struct smb_version_operations smb21_operations = {
.create_lease_buf = smb2_create_lease_buf,
.parse_lease_buf = smb2_parse_lease_buf,
.clone_range = smb2_clone_range,
+ .wp_retry_size = smb2_wp_retry_size,
};

struct smb_version_operations smb30_operations = {
@@ -1330,6 +1339,7 @@ struct smb_version_operations smb30_operations = {
.parse_lease_buf = smb3_parse_lease_buf,
.clone_range = smb2_clone_range,
.validate_negotiate = smb3_validate_negotiate,
+ .wp_retry_size = smb2_wp_retry_size,
};

struct smb_version_values smb20_values = {
--
1.8.1.2
Shirish Pargaonkar
2014-07-20 19:49:29 UTC
Permalink
why bother setting rc (to ENOTSUPP, ENOMEM) if those errors are not returned
once we break out of do while loop?
Instead perhaps log an error message?
Post by Pavel Shilovsky
If wsize changes on reconnect we need to use new writedata structure
that for retrying.
---
fs/cifs/cifsglob.h | 2 ++
fs/cifs/cifssmb.c | 84 +++++++++++++++++++++++++++++++++++++++++++-----------
fs/cifs/smb1ops.c | 7 +++++
fs/cifs/smb2ops.c | 10 +++++++
4 files changed, 87 insertions(+), 16 deletions(-)
diff --git a/fs/cifs/cifsglob.h b/fs/cifs/cifsglob.h
index de6aed8..8c53f20 100644
--- a/fs/cifs/cifsglob.h
+++ b/fs/cifs/cifsglob.h
@@ -404,6 +404,8 @@ struct smb_version_operations {
const struct cifs_fid *, u32 *);
int (*set_acl)(struct cifs_ntsd *, __u32, struct inode *, const char *,
int);
+ /* writepages retry size */
+ unsigned int (*wp_retry_size)(struct inode *);
};
struct smb_version_values {
diff --git a/fs/cifs/cifssmb.c b/fs/cifs/cifssmb.c
index b7e5b65..41c9067 100644
--- a/fs/cifs/cifssmb.c
+++ b/fs/cifs/cifssmb.c
@@ -1899,27 +1899,79 @@ cifs_writev_requeue(struct cifs_writedata *wdata)
int i, rc;
struct inode *inode = wdata->cfile->dentry->d_inode;
struct TCP_Server_Info *server;
+ unsigned int rest_len;
- for (i = 0; i < wdata->nr_pages; i++) {
- lock_page(wdata->pages[i]);
- clear_page_dirty_for_io(wdata->pages[i]);
- }
-
+ server = tlink_tcon(wdata->cfile->tlink)->ses->server;
+ i = 0;
+ rest_len = wdata->bytes;
do {
- server = tlink_tcon(wdata->cfile->tlink)->ses->server;
- rc = server->ops->async_writev(wdata, cifs_writedata_release);
- } while (rc == -EAGAIN);
+ struct cifs_writedata *wdata2;
+ unsigned int j, nr_pages, wsize, tailsz, cur_len;
+
+ wsize = server->ops->wp_retry_size(inode);
+ if (wsize < rest_len) {
+ nr_pages = wsize / PAGE_CACHE_SIZE;
+ if (!nr_pages) {
+ rc = -ENOTSUPP;
+ break;
+ }
+ cur_len = nr_pages * PAGE_CACHE_SIZE;
+ tailsz = PAGE_CACHE_SIZE;
+ } else {
+ nr_pages = DIV_ROUND_UP(rest_len, PAGE_CACHE_SIZE);
+ cur_len = rest_len;
+ tailsz = rest_len - (nr_pages - 1) * PAGE_CACHE_SIZE;
+ }
- for (i = 0; i < wdata->nr_pages; i++) {
- unlock_page(wdata->pages[i]);
- if (rc != 0) {
- SetPageError(wdata->pages[i]);
- end_page_writeback(wdata->pages[i]);
- page_cache_release(wdata->pages[i]);
+ wdata2 = cifs_writedata_alloc(nr_pages, cifs_writev_complete);
+ if (!wdata2) {
+ rc = -ENOMEM;
+ break;
}
- }
- mapping_set_error(inode->i_mapping, rc);
+ for (j = 0; j < nr_pages; j++) {
+ wdata2->pages[j] = wdata->pages[i + j];
+ lock_page(wdata2->pages[j]);
+ clear_page_dirty_for_io(wdata2->pages[j]);
+ }
+
+ wdata2->sync_mode = wdata->sync_mode;
+ wdata2->nr_pages = nr_pages;
+ wdata2->offset = page_offset(wdata2->pages[0]);
+ wdata2->pagesz = PAGE_CACHE_SIZE;
+ wdata2->tailsz = tailsz;
+ wdata2->bytes = cur_len;
+
+ wdata2->cfile = find_writable_file(CIFS_I(inode), false);
+ if (!wdata2->cfile) {
+ cifs_dbg(VFS, "No writable handles for inode\n");
+ rc = -EBADF;
+ break;
+ }
+ wdata2->pid = wdata2->cfile->pid;
+ rc = server->ops->async_writev(wdata2, cifs_writedata_release);
+
+ for (j = 0; j < nr_pages; j++) {
+ unlock_page(wdata2->pages[j]);
+ if (rc != 0 && rc != -EAGAIN) {
+ SetPageError(wdata2->pages[j]);
+ end_page_writeback(wdata2->pages[j]);
+ page_cache_release(wdata2->pages[j]);
+ }
+ }
+
+ if (rc) {
+ kref_put(&wdata2->refcount, cifs_writedata_release);
+ if (rc == -EAGAIN)
+ continue;
+ mapping_set_error(inode->i_mapping, rc);
+ break;
+ }
+
+ rest_len -= cur_len;
+ i += nr_pages;
+ } while (i < wdata->nr_pages);
+
kref_put(&wdata->refcount, cifs_writedata_release);
}
diff --git a/fs/cifs/smb1ops.c b/fs/cifs/smb1ops.c
index d1fdfa8..8a96342 100644
--- a/fs/cifs/smb1ops.c
+++ b/fs/cifs/smb1ops.c
@@ -1009,6 +1009,12 @@ cifs_is_read_op(__u32 oplock)
return oplock == OPLOCK_READ;
}
+static unsigned int
+cifs_wp_retry_size(struct inode *inode)
+{
+ return CIFS_SB(inode->i_sb)->wsize;
+}
+
struct smb_version_operations smb1_operations = {
.send_cancel = send_nt_cancel,
.compare_fids = cifs_compare_fids,
@@ -1078,6 +1084,7 @@ struct smb_version_operations smb1_operations = {
.query_mf_symlink = cifs_query_mf_symlink,
.create_mf_symlink = cifs_create_mf_symlink,
.is_read_op = cifs_is_read_op,
+ .wp_retry_size = cifs_wp_retry_size,
#ifdef CONFIG_CIFS_XATTR
.query_all_EAs = CIFSSMBQAllEAs,
.set_EA = CIFSSMBSetEA,
diff --git a/fs/cifs/smb2ops.c b/fs/cifs/smb2ops.c
index 787844b..e35ce5b 100644
--- a/fs/cifs/smb2ops.c
+++ b/fs/cifs/smb2ops.c
@@ -1104,6 +1104,13 @@ smb3_parse_lease_buf(void *buf, unsigned int *epoch)
return le32_to_cpu(lc->lcontext.LeaseState);
}
+static unsigned int
+smb2_wp_retry_size(struct inode *inode)
+{
+ return min_t(unsigned int, CIFS_SB(inode->i_sb)->wsize,
+ SMB2_MAX_BUFFER_SIZE);
+}
+
struct smb_version_operations smb20_operations = {
.compare_fids = smb2_compare_fids,
.setup_request = smb2_setup_request,
@@ -1177,6 +1184,7 @@ struct smb_version_operations smb20_operations = {
.create_lease_buf = smb2_create_lease_buf,
.parse_lease_buf = smb2_parse_lease_buf,
.clone_range = smb2_clone_range,
+ .wp_retry_size = smb2_wp_retry_size,
};
struct smb_version_operations smb21_operations = {
@@ -1252,6 +1260,7 @@ struct smb_version_operations smb21_operations = {
.create_lease_buf = smb2_create_lease_buf,
.parse_lease_buf = smb2_parse_lease_buf,
.clone_range = smb2_clone_range,
+ .wp_retry_size = smb2_wp_retry_size,
};
struct smb_version_operations smb30_operations = {
@@ -1330,6 +1339,7 @@ struct smb_version_operations smb30_operations = {
.parse_lease_buf = smb3_parse_lease_buf,
.clone_range = smb2_clone_range,
.validate_negotiate = smb3_validate_negotiate,
+ .wp_retry_size = smb2_wp_retry_size,
};
struct smb_version_values smb20_values = {
--
1.8.1.2
--
To unsubscribe from this list: send the line "unsubscribe linux-cifs" in
More majordomo info at http://vger.kernel.org/majordomo-info.html
Pavel Shilovsky
2014-07-21 06:43:18 UTC
Permalink
Post by Shirish Pargaonkar
why bother setting rc (to ENOTSUPP, ENOMEM) if those errors are not returned
once we break out of do while loop?
Instead perhaps log an error message?
Good point, thanks. It seems like we should call
mapping_set_error(mapping, rc) in the end of the function. In this
case setting rc looks ok. Also, I think we don't need to log messages
in this case.
--
Best regards,
Pavel Shilovsky.
Pavel Shilovsky
2014-06-27 09:57:39 UTC
Permalink
Signed-off-by: Pavel Shilovsky <pshilovsky-***@public.gmane.org>
---
fs/cifs/file.c | 152 +++++++++++++++++++++++++++++++--------------------------
1 file changed, 82 insertions(+), 70 deletions(-)

diff --git a/fs/cifs/file.c b/fs/cifs/file.c
index 6b6df30..69d1763 100644
--- a/fs/cifs/file.c
+++ b/fs/cifs/file.c
@@ -1878,6 +1878,86 @@ static int cifs_partialpagewrite(struct page *page, unsigned from, unsigned to)
return rc;
}

+static unsigned int
+wdata_prepare_pages(struct cifs_writedata *wdata, unsigned int found_pages,
+ struct address_space *mapping,
+ struct writeback_control *wbc,
+ pgoff_t end, pgoff_t *index, pgoff_t *next, bool *done)
+{
+ unsigned int nr_pages = 0, i;
+ struct page *page;
+
+ for (i = 0; i < found_pages; i++) {
+ page = wdata->pages[i];
+ /*
+ * At this point we hold neither mapping->tree_lock nor
+ * lock on the page itself: the page may be truncated or
+ * invalidated (changing page->mapping to NULL), or even
+ * swizzled back from swapper_space to tmpfs file
+ * mapping
+ */
+
+ if (nr_pages == 0)
+ lock_page(page);
+ else if (!trylock_page(page))
+ break;
+
+ if (unlikely(page->mapping != mapping)) {
+ unlock_page(page);
+ break;
+ }
+
+ if (!wbc->range_cyclic && page->index > end) {
+ *done = true;
+ unlock_page(page);
+ break;
+ }
+
+ if (*next && (page->index != *next)) {
+ /* Not next consecutive page */
+ unlock_page(page);
+ break;
+ }
+
+ if (wbc->sync_mode != WB_SYNC_NONE)
+ wait_on_page_writeback(page);
+
+ if (PageWriteback(page) ||
+ !clear_page_dirty_for_io(page)) {
+ unlock_page(page);
+ break;
+ }
+
+ /*
+ * This actually clears the dirty bit in the radix tree.
+ * See cifs_writepage() for more commentary.
+ */
+ set_page_writeback(page);
+ if (page_offset(page) >= i_size_read(mapping->host)) {
+ *done = true;
+ unlock_page(page);
+ end_page_writeback(page);
+ break;
+ }
+
+ wdata->pages[i] = page;
+ *next = page->index + 1;
+ ++nr_pages;
+ }
+
+ /* reset index to refind any pages skipped */
+ if (nr_pages == 0)
+ *index = wdata->pages[0]->index + 1;
+
+ /* put any pages we aren't going to use */
+ for (i = nr_pages; i < found_pages; i++) {
+ page_cache_release(wdata->pages[i]);
+ wdata->pages[i] = NULL;
+ }
+
+ return nr_pages;
+}
+
static int cifs_writepages(struct address_space *mapping,
struct writeback_control *wbc)
{
@@ -1886,7 +1966,6 @@ static int cifs_writepages(struct address_space *mapping,
pgoff_t end, index;
struct cifs_writedata *wdata;
struct TCP_Server_Info *server;
- struct page *page;
int rc = 0;

/*
@@ -1944,75 +2023,8 @@ retry:
break;
}

- nr_pages = 0;
- for (i = 0; i < found_pages; i++) {
- page = wdata->pages[i];
- /*
- * At this point we hold neither mapping->tree_lock nor
- * lock on the page itself: the page may be truncated or
- * invalidated (changing page->mapping to NULL), or even
- * swizzled back from swapper_space to tmpfs file
- * mapping
- */
-
- if (nr_pages == 0)
- lock_page(page);
- else if (!trylock_page(page))
- break;
-
- if (unlikely(page->mapping != mapping)) {
- unlock_page(page);
- break;
- }
-
- if (!wbc->range_cyclic && page->index > end) {
- done = true;
- unlock_page(page);
- break;
- }
-
- if (next && (page->index != next)) {
- /* Not next consecutive page */
- unlock_page(page);
- break;
- }
-
- if (wbc->sync_mode != WB_SYNC_NONE)
- wait_on_page_writeback(page);
-
- if (PageWriteback(page) ||
- !clear_page_dirty_for_io(page)) {
- unlock_page(page);
- break;
- }
-
- /*
- * This actually clears the dirty bit in the radix tree.
- * See cifs_writepage() for more commentary.
- */
- set_page_writeback(page);
-
- if (page_offset(page) >= i_size_read(mapping->host)) {
- done = true;
- unlock_page(page);
- end_page_writeback(page);
- break;
- }
-
- wdata->pages[i] = page;
- next = page->index + 1;
- ++nr_pages;
- }
-
- /* reset index to refind any pages skipped */
- if (nr_pages == 0)
- index = wdata->pages[0]->index + 1;
-
- /* put any pages we aren't going to use */
- for (i = nr_pages; i < found_pages; i++) {
- page_cache_release(wdata->pages[i]);
- wdata->pages[i] = NULL;
- }
+ nr_pages = wdata_prepare_pages(wdata, found_pages, mapping, wbc,
+ end, &index, &next, &done);

/* nothing to write? */
if (nr_pages == 0) {
--
1.8.1.2
Jeff Layton
2014-07-09 12:59:16 UTC
Permalink
On Fri, 27 Jun 2014 13:57:39 +0400
Post by Pavel Shilovsky
---
fs/cifs/file.c | 152 +++++++++++++++++++++++++++++++--------------------------
1 file changed, 82 insertions(+), 70 deletions(-)
diff --git a/fs/cifs/file.c b/fs/cifs/file.c
index 6b6df30..69d1763 100644
--- a/fs/cifs/file.c
+++ b/fs/cifs/file.c
@@ -1878,6 +1878,86 @@ static int cifs_partialpagewrite(struct page *page, unsigned from, unsigned to)
return rc;
}
+static unsigned int
+wdata_prepare_pages(struct cifs_writedata *wdata, unsigned int found_pages,
+ struct address_space *mapping,
+ struct writeback_control *wbc,
+ pgoff_t end, pgoff_t *index, pgoff_t *next, bool *done)
+{
+ unsigned int nr_pages = 0, i;
+ struct page *page;
+
+ for (i = 0; i < found_pages; i++) {
+ page = wdata->pages[i];
+ /*
+ * At this point we hold neither mapping->tree_lock nor
+ * lock on the page itself: the page may be truncated or
+ * invalidated (changing page->mapping to NULL), or even
+ * swizzled back from swapper_space to tmpfs file
+ * mapping
+ */
+
+ if (nr_pages == 0)
+ lock_page(page);
+ else if (!trylock_page(page))
+ break;
+
+ if (unlikely(page->mapping != mapping)) {
+ unlock_page(page);
+ break;
+ }
+
+ if (!wbc->range_cyclic && page->index > end) {
+ *done = true;
+ unlock_page(page);
+ break;
+ }
+
+ if (*next && (page->index != *next)) {
+ /* Not next consecutive page */
+ unlock_page(page);
+ break;
+ }
+
+ if (wbc->sync_mode != WB_SYNC_NONE)
+ wait_on_page_writeback(page);
+
+ if (PageWriteback(page) ||
+ !clear_page_dirty_for_io(page)) {
+ unlock_page(page);
+ break;
+ }
+
+ /*
+ * This actually clears the dirty bit in the radix tree.
+ * See cifs_writepage() for more commentary.
+ */
+ set_page_writeback(page);
+ if (page_offset(page) >= i_size_read(mapping->host)) {
+ *done = true;
+ unlock_page(page);
+ end_page_writeback(page);
+ break;
+ }
+
+ wdata->pages[i] = page;
+ *next = page->index + 1;
+ ++nr_pages;
+ }
+
+ /* reset index to refind any pages skipped */
+ if (nr_pages == 0)
+ *index = wdata->pages[0]->index + 1;
+
+ /* put any pages we aren't going to use */
+ for (i = nr_pages; i < found_pages; i++) {
+ page_cache_release(wdata->pages[i]);
+ wdata->pages[i] = NULL;
+ }
+
+ return nr_pages;
+}
+
static int cifs_writepages(struct address_space *mapping,
struct writeback_control *wbc)
{
@@ -1886,7 +1966,6 @@ static int cifs_writepages(struct address_space *mapping,
pgoff_t end, index;
struct cifs_writedata *wdata;
struct TCP_Server_Info *server;
- struct page *page;
int rc = 0;
/*
break;
}
- nr_pages = 0;
- for (i = 0; i < found_pages; i++) {
- page = wdata->pages[i];
- /*
- * At this point we hold neither mapping->tree_lock nor
- * lock on the page itself: the page may be truncated or
- * invalidated (changing page->mapping to NULL), or even
- * swizzled back from swapper_space to tmpfs file
- * mapping
- */
-
- if (nr_pages == 0)
- lock_page(page);
- else if (!trylock_page(page))
- break;
-
- if (unlikely(page->mapping != mapping)) {
- unlock_page(page);
- break;
- }
-
- if (!wbc->range_cyclic && page->index > end) {
- done = true;
- unlock_page(page);
- break;
- }
-
- if (next && (page->index != next)) {
- /* Not next consecutive page */
- unlock_page(page);
- break;
- }
-
- if (wbc->sync_mode != WB_SYNC_NONE)
- wait_on_page_writeback(page);
-
- if (PageWriteback(page) ||
- !clear_page_dirty_for_io(page)) {
- unlock_page(page);
- break;
- }
-
- /*
- * This actually clears the dirty bit in the radix tree.
- * See cifs_writepage() for more commentary.
- */
- set_page_writeback(page);
-
- if (page_offset(page) >= i_size_read(mapping->host)) {
- done = true;
- unlock_page(page);
- end_page_writeback(page);
- break;
- }
-
- wdata->pages[i] = page;
- next = page->index + 1;
- ++nr_pages;
- }
-
- /* reset index to refind any pages skipped */
- if (nr_pages == 0)
- index = wdata->pages[0]->index + 1;
-
- /* put any pages we aren't going to use */
- for (i = nr_pages; i < found_pages; i++) {
- page_cache_release(wdata->pages[i]);
- wdata->pages[i] = NULL;
- }
+ nr_pages = wdata_prepare_pages(wdata, found_pages, mapping, wbc,
+ end, &index, &next, &done);
/* nothing to write? */
if (nr_pages == 0) {
Reviewed-by:
--
Jeff Layton <jlayton-***@public.gmane.org>
Pavel Shilovsky
2014-06-27 09:57:42 UTC
Permalink
If a server change maximum buffer size for write (wsize) requests
on reconnect we can fail on repeating with a big size buffer on
-EAGAIN error in writepages. Fix this by checking wsize all the
time before repeating request in writepages.

Signed-off-by: Pavel Shilovsky <pshilovsky-***@public.gmane.org>
---
fs/cifs/file.c | 25 ++++++++++++++-----------
1 file changed, 14 insertions(+), 11 deletions(-)

diff --git a/fs/cifs/file.c b/fs/cifs/file.c
index 4117da0..21f9be0 100644
--- a/fs/cifs/file.c
+++ b/fs/cifs/file.c
@@ -2009,19 +2009,17 @@ wdata_send_pages(struct cifs_writedata *wdata, unsigned int nr_pages,
(loff_t)PAGE_CACHE_SIZE);
wdata->bytes = ((nr_pages - 1) * PAGE_CACHE_SIZE) + wdata->tailsz;

- do {
- if (wdata->cfile != NULL)
- cifsFileInfo_put(wdata->cfile);
- wdata->cfile = find_writable_file(CIFS_I(mapping->host), false);
- if (!wdata->cfile) {
- cifs_dbg(VFS, "No writable handles for inode\n");
- rc = -EBADF;
- break;
- }
+ if (wdata->cfile != NULL)
+ cifsFileInfo_put(wdata->cfile);
+ wdata->cfile = find_writable_file(CIFS_I(mapping->host), false);
+ if (!wdata->cfile) {
+ cifs_dbg(VFS, "No writable handles for inode\n");
+ rc = -EBADF;
+ } else {
wdata->pid = wdata->cfile->pid;
server = tlink_tcon(wdata->cfile->tlink)->ses->server;
rc = server->ops->async_writev(wdata, cifs_writedata_release);
- } while (wbc->sync_mode == WB_SYNC_ALL && rc == -EAGAIN);
+ }

for (i = 0; i < nr_pages; ++i)
unlock_page(wdata->pages[i]);
@@ -2073,7 +2071,7 @@ static int cifs_writepages(struct address_space *mapping,
retry:
while (!done && index <= end) {
unsigned int nr_pages, found_pages;
- pgoff_t next = 0, tofind;
+ pgoff_t next = 0, tofind, saved_index = index;

tofind = min((cifs_sb->wsize / PAGE_CACHE_SIZE) - 1,
end - index) + 1;
@@ -2102,6 +2100,11 @@ retry:
rc = wdata_send_pages(wdata, nr_pages, mapping, wbc);
kref_put(&wdata->refcount, cifs_writedata_release);

+ if (wbc->sync_mode == WB_SYNC_ALL && rc == -EAGAIN) {
+ index = saved_index;
+ continue;
+ }
+
wbc->nr_to_write -= nr_pages;
if (wbc->nr_to_write <= 0)
done = true;
--
1.8.1.2
Shirish Pargaonkar
2014-07-20 15:17:27 UTC
Permalink
How do we handle EBADF returned by wdata_send_pages() in cifs_writepages()?
Post by Pavel Shilovsky
If a server change maximum buffer size for write (wsize) requests
on reconnect we can fail on repeating with a big size buffer on
-EAGAIN error in writepages. Fix this by checking wsize all the
time before repeating request in writepages.
---
fs/cifs/file.c | 25 ++++++++++++++-----------
1 file changed, 14 insertions(+), 11 deletions(-)
diff --git a/fs/cifs/file.c b/fs/cifs/file.c
index 4117da0..21f9be0 100644
--- a/fs/cifs/file.c
+++ b/fs/cifs/file.c
@@ -2009,19 +2009,17 @@ wdata_send_pages(struct cifs_writedata *wdata, unsigned int nr_pages,
(loff_t)PAGE_CACHE_SIZE);
wdata->bytes = ((nr_pages - 1) * PAGE_CACHE_SIZE) + wdata->tailsz;
- do {
- if (wdata->cfile != NULL)
- cifsFileInfo_put(wdata->cfile);
- wdata->cfile = find_writable_file(CIFS_I(mapping->host), false);
- if (!wdata->cfile) {
- cifs_dbg(VFS, "No writable handles for inode\n");
- rc = -EBADF;
- break;
- }
+ if (wdata->cfile != NULL)
+ cifsFileInfo_put(wdata->cfile);
+ wdata->cfile = find_writable_file(CIFS_I(mapping->host), false);
+ if (!wdata->cfile) {
+ cifs_dbg(VFS, "No writable handles for inode\n");
+ rc = -EBADF;
+ } else {
wdata->pid = wdata->cfile->pid;
server = tlink_tcon(wdata->cfile->tlink)->ses->server;
rc = server->ops->async_writev(wdata, cifs_writedata_release);
- } while (wbc->sync_mode == WB_SYNC_ALL && rc == -EAGAIN);
+ }
for (i = 0; i < nr_pages; ++i)
unlock_page(wdata->pages[i]);
@@ -2073,7 +2071,7 @@ static int cifs_writepages(struct address_space *mapping,
while (!done && index <= end) {
unsigned int nr_pages, found_pages;
- pgoff_t next = 0, tofind;
+ pgoff_t next = 0, tofind, saved_index = index;
tofind = min((cifs_sb->wsize / PAGE_CACHE_SIZE) - 1,
end - index) + 1;
rc = wdata_send_pages(wdata, nr_pages, mapping, wbc);
kref_put(&wdata->refcount, cifs_writedata_release);
+ if (wbc->sync_mode == WB_SYNC_ALL && rc == -EAGAIN) {
+ index = saved_index;
+ continue;
+ }
+
wbc->nr_to_write -= nr_pages;
if (wbc->nr_to_write <= 0)
done = true;
--
1.8.1.2
--
To unsubscribe from this list: send the line "unsubscribe linux-cifs" in
More majordomo info at http://vger.kernel.org/majordomo-info.html
Pavel Shilovsky
2014-07-20 17:41:23 UTC
Permalink
Post by Shirish Pargaonkar
How do we handle EBADF returned by wdata_send_pages() in cifs_writepages()?
If wdata_send_pages() returns -EBADF we do the same things like
without this patch: walk through the array of pages in wdata and set
page error flags.
--
Best regards,
Pavel Shilovsky.
Shirish Pargaonkar
2014-07-20 18:13:40 UTC
Permalink
Pavel,

That is correct, this patch does not change that functionality. I was
thinking do we
go through the loop again to attempt to acquire handle (only to
perhaps receive EBADF)
or bail out of cifs_writepages() by marking rest of the pages (till
end) with error flags.
Post by Pavel Shilovsky
Post by Shirish Pargaonkar
How do we handle EBADF returned by wdata_send_pages() in cifs_writepages()?
If wdata_send_pages() returns -EBADF we do the same things like
without this patch: walk through the array of pages in wdata and set
page error flags.
--
Best regards,
Pavel Shilovsky.
Pavel Shilovsky
2014-07-21 07:03:59 UTC
Permalink
Post by Jeff Layton
Pavel,
That is correct, this patch does not change that functionality. I was
thinking do we
go through the loop again to attempt to acquire handle (only to
perhaps receive EBADF)
or bail out of cifs_writepages() by marking rest of the pages (till
end) with error flags.
Understand. Probably we don't need to go through the loop since
find_writable_file() has it's own loop inside (it goes to the start of
the search if cifs_reopen_file() fails). So, the existing logic seems
right.
--
Best regards,
Pavel Shilovsky.
Pavel Shilovsky
2014-06-27 09:57:44 UTC
Permalink
Signed-off-by: Pavel Shilovsky <pshilovsky-***@public.gmane.org>
---
fs/cifs/file.c | 85 +++++++++++++++++++++++++++++++++-------------------------
1 file changed, 48 insertions(+), 37 deletions(-)

diff --git a/fs/cifs/file.c b/fs/cifs/file.c
index 21f9be0..e2a735a 100644
--- a/fs/cifs/file.c
+++ b/fs/cifs/file.c
@@ -2423,11 +2423,56 @@ cifs_uncached_retry_writev(struct cifs_writedata *wdata)
return rc;
}

+static int
+wdata_fill_from_iovec(struct cifs_writedata *wdata, struct iov_iter *from,
+ size_t *len, unsigned long nr_pages)
+{
+ int rc = 0;
+ size_t save_len, copied, bytes, cur_len = *len;
+ unsigned long i;
+
+ save_len = cur_len;
+ for (i = 0; i < nr_pages; i++) {
+ bytes = min_t(const size_t, cur_len, PAGE_SIZE);
+ copied = copy_page_from_iter(wdata->pages[i], 0, bytes, from);
+ cur_len -= copied;
+ /*
+ * If we didn't copy as much as we expected, then that
+ * may mean we trod into an unmapped area. Stop copying
+ * at that point. On the next pass through the big
+ * loop, we'll likely end up getting a zero-length
+ * write and bailing out of it.
+ */
+ if (copied < bytes)
+ break;
+ }
+ cur_len = save_len - cur_len;
+ *len = cur_len;
+
+ /*
+ * If we have no data to send, then that probably means that
+ * the copy above failed altogether. That's most likely because
+ * the address in the iovec was bogus. Return -EFAULT and let
+ * the caller free anything we allocated and bail out.
+ */
+ if (!cur_len)
+ return -EFAULT;
+
+ /*
+ * i + 1 now represents the number of pages we actually used in
+ * the copy phase above. Bring nr_pages down to that, and free
+ * any pages that we didn't use.
+ */
+ for ( ; nr_pages > i + 1; nr_pages--)
+ put_page(wdata->pages[nr_pages - 1]);
+ return rc;
+}
+
static ssize_t
cifs_iovec_write(struct file *file, struct iov_iter *from, loff_t *poffset)
{
unsigned long nr_pages, i;
- size_t bytes, copied, len, cur_len;
+ size_t len, cur_len;
ssize_t total_written = 0;
loff_t offset;
struct cifsFileInfo *open_file;
@@ -2464,8 +2509,6 @@ cifs_iovec_write(struct file *file, struct iov_iter *from, loff_t *poffset)
pid = current->tgid;

do {
- size_t save_len;
-
nr_pages = get_numpages(cifs_sb->wsize, len, &cur_len);
wdata = cifs_writedata_alloc(nr_pages,
cifs_uncached_writev_complete);
@@ -2480,46 +2523,14 @@ cifs_iovec_write(struct file *file, struct iov_iter *from, loff_t *poffset)
break;
}

- save_len = cur_len;
- for (i = 0; i < nr_pages; i++) {
- bytes = min_t(size_t, cur_len, PAGE_SIZE);
- copied = copy_page_from_iter(wdata->pages[i], 0, bytes,
- from);
- cur_len -= copied;
- /*
- * If we didn't copy as much as we expected, then that
- * may mean we trod into an unmapped area. Stop copying
- * at that point. On the next pass through the big
- * loop, we'll likely end up getting a zero-length
- * write and bailing out of it.
- */
- if (copied < bytes)
- break;
- }
- cur_len = save_len - cur_len;
-
- /*
- * If we have no data to send, then that probably means that
- * the copy above failed altogether. That's most likely because
- * the address in the iovec was bogus. Set the rc to -EFAULT,
- * free anything we allocated and bail out.
- */
- if (!cur_len) {
+ rc = wdata_fill_from_iovec(wdata, from, &cur_len, nr_pages);
+ if (rc) {
for (i = 0; i < nr_pages; i++)
put_page(wdata->pages[i]);
kfree(wdata);
- rc = -EFAULT;
break;
}

- /*
- * i + 1 now represents the number of pages we actually used in
- * the copy phase above. Bring nr_pages down to that, and free
- * any pages that we didn't use.
- */
- for ( ; nr_pages > i + 1; nr_pages--)
- put_page(wdata->pages[nr_pages - 1]);
-
wdata->sync_mode = WB_SYNC_ALL;
wdata->nr_pages = nr_pages;
wdata->offset = (__u64)offset;
--
1.8.1.2
Shirish Pargaonkar
2014-07-21 03:59:35 UTC
Permalink
Like put_page() for rc = 0 unused pages was moved to wdata_fill_from_iovec(),
probably move put_page in case of -EFAULT also to wdata_fill_from_iovec()?
So just kfree and break if(rc)?
Also, wdata_fill_from_iovec() does not need rc, it can just return 0
in case of no error?
Post by Pavel Shilovsky
---
fs/cifs/file.c | 85 +++++++++++++++++++++++++++++++++-------------------------
1 file changed, 48 insertions(+), 37 deletions(-)
diff --git a/fs/cifs/file.c b/fs/cifs/file.c
index 21f9be0..e2a735a 100644
--- a/fs/cifs/file.c
+++ b/fs/cifs/file.c
@@ -2423,11 +2423,56 @@ cifs_uncached_retry_writev(struct cifs_writedata *wdata)
return rc;
}
+static int
+wdata_fill_from_iovec(struct cifs_writedata *wdata, struct iov_iter *from,
+ size_t *len, unsigned long nr_pages)
+{
+ int rc = 0;
+ size_t save_len, copied, bytes, cur_len = *len;
+ unsigned long i;
+
+ save_len = cur_len;
+ for (i = 0; i < nr_pages; i++) {
+ bytes = min_t(const size_t, cur_len, PAGE_SIZE);
+ copied = copy_page_from_iter(wdata->pages[i], 0, bytes, from);
+ cur_len -= copied;
+ /*
+ * If we didn't copy as much as we expected, then that
+ * may mean we trod into an unmapped area. Stop copying
+ * at that point. On the next pass through the big
+ * loop, we'll likely end up getting a zero-length
+ * write and bailing out of it.
+ */
+ if (copied < bytes)
+ break;
+ }
+ cur_len = save_len - cur_len;
+ *len = cur_len;
+
+ /*
+ * If we have no data to send, then that probably means that
+ * the copy above failed altogether. That's most likely because
+ * the address in the iovec was bogus. Return -EFAULT and let
+ * the caller free anything we allocated and bail out.
+ */
+ if (!cur_len)
+ return -EFAULT;
+
+ /*
+ * i + 1 now represents the number of pages we actually used in
+ * the copy phase above. Bring nr_pages down to that, and free
+ * any pages that we didn't use.
+ */
+ for ( ; nr_pages > i + 1; nr_pages--)
+ put_page(wdata->pages[nr_pages - 1]);
+ return rc;
+}
+
static ssize_t
cifs_iovec_write(struct file *file, struct iov_iter *from, loff_t *poffset)
{
unsigned long nr_pages, i;
- size_t bytes, copied, len, cur_len;
+ size_t len, cur_len;
ssize_t total_written = 0;
loff_t offset;
struct cifsFileInfo *open_file;
@@ -2464,8 +2509,6 @@ cifs_iovec_write(struct file *file, struct iov_iter *from, loff_t *poffset)
pid = current->tgid;
do {
- size_t save_len;
-
nr_pages = get_numpages(cifs_sb->wsize, len, &cur_len);
wdata = cifs_writedata_alloc(nr_pages,
cifs_uncached_writev_complete);
@@ -2480,46 +2523,14 @@ cifs_iovec_write(struct file *file, struct iov_iter *from, loff_t *poffset)
break;
}
- save_len = cur_len;
- for (i = 0; i < nr_pages; i++) {
- bytes = min_t(size_t, cur_len, PAGE_SIZE);
- copied = copy_page_from_iter(wdata->pages[i], 0, bytes,
- from);
- cur_len -= copied;
- /*
- * If we didn't copy as much as we expected, then that
- * may mean we trod into an unmapped area. Stop copying
- * at that point. On the next pass through the big
- * loop, we'll likely end up getting a zero-length
- * write and bailing out of it.
- */
- if (copied < bytes)
- break;
- }
- cur_len = save_len - cur_len;
-
- /*
- * If we have no data to send, then that probably means that
- * the copy above failed altogether. That's most likely because
- * the address in the iovec was bogus. Set the rc to -EFAULT,
- * free anything we allocated and bail out.
- */
- if (!cur_len) {
+ rc = wdata_fill_from_iovec(wdata, from, &cur_len, nr_pages);
+ if (rc) {
for (i = 0; i < nr_pages; i++)
put_page(wdata->pages[i]);
kfree(wdata);
- rc = -EFAULT;
break;
}
- /*
- * i + 1 now represents the number of pages we actually used in
- * the copy phase above. Bring nr_pages down to that, and free
- * any pages that we didn't use.
- */
- for ( ; nr_pages > i + 1; nr_pages--)
- put_page(wdata->pages[nr_pages - 1]);
-
wdata->sync_mode = WB_SYNC_ALL;
wdata->nr_pages = nr_pages;
wdata->offset = (__u64)offset;
--
1.8.1.2
--
To unsubscribe from this list: send the line "unsubscribe linux-cifs" in
More majordomo info at http://vger.kernel.org/majordomo-info.html
Pavel Shilovsky
2014-07-21 06:48:25 UTC
Permalink
Post by Shirish Pargaonkar
Like put_page() for rc = 0 unused pages was moved to wdata_fill_from_iovec(),
probably move put_page in case of -EFAULT also to wdata_fill_from_iovec()?
So just kfree and break if(rc)?
Also, wdata_fill_from_iovec() does not need rc, it can just return 0
in case of no error?
Yes, or may be move the last loop from wdata_fill_from_iovec() to the caller.
Post by Shirish Pargaonkar
Post by Pavel Shilovsky
---
fs/cifs/file.c | 85 +++++++++++++++++++++++++++++++++-------------------------
1 file changed, 48 insertions(+), 37 deletions(-)
diff --git a/fs/cifs/file.c b/fs/cifs/file.c
index 21f9be0..e2a735a 100644
--- a/fs/cifs/file.c
+++ b/fs/cifs/file.c
@@ -2423,11 +2423,56 @@ cifs_uncached_retry_writev(struct cifs_writedata *wdata)
return rc;
}
+static int
+wdata_fill_from_iovec(struct cifs_writedata *wdata, struct iov_iter *from,
+ size_t *len, unsigned long nr_pages)
+{
+ int rc = 0;
+ size_t save_len, copied, bytes, cur_len = *len;
+ unsigned long i;
+
+ save_len = cur_len;
+ for (i = 0; i < nr_pages; i++) {
+ bytes = min_t(const size_t, cur_len, PAGE_SIZE);
+ copied = copy_page_from_iter(wdata->pages[i], 0, bytes, from);
+ cur_len -= copied;
+ /*
+ * If we didn't copy as much as we expected, then that
+ * may mean we trod into an unmapped area. Stop copying
+ * at that point. On the next pass through the big
+ * loop, we'll likely end up getting a zero-length
+ * write and bailing out of it.
+ */
+ if (copied < bytes)
+ break;
+ }
+ cur_len = save_len - cur_len;
+ *len = cur_len;
+
+ /*
+ * If we have no data to send, then that probably means that
+ * the copy above failed altogether. That's most likely because
+ * the address in the iovec was bogus. Return -EFAULT and let
+ * the caller free anything we allocated and bail out.
+ */
+ if (!cur_len)
+ return -EFAULT;
+
+ /*
+ * i + 1 now represents the number of pages we actually used in
+ * the copy phase above. Bring nr_pages down to that, and free
+ * any pages that we didn't use.
+ */
+ for ( ; nr_pages > i + 1; nr_pages--)
+ put_page(wdata->pages[nr_pages - 1]);
+ return rc;
+}
Ah, nr_pages is updated here but we don't return it to the caller!
That should be fixed at first.
Post by Shirish Pargaonkar
Post by Pavel Shilovsky
+
static ssize_t
cifs_iovec_write(struct file *file, struct iov_iter *from, loff_t *poffset)
{
unsigned long nr_pages, i;
- size_t bytes, copied, len, cur_len;
+ size_t len, cur_len;
ssize_t total_written = 0;
loff_t offset;
struct cifsFileInfo *open_file;
@@ -2464,8 +2509,6 @@ cifs_iovec_write(struct file *file, struct iov_iter *from, loff_t *poffset)
pid = current->tgid;
do {
- size_t save_len;
-
nr_pages = get_numpages(cifs_sb->wsize, len, &cur_len);
wdata = cifs_writedata_alloc(nr_pages,
cifs_uncached_writev_complete);
@@ -2480,46 +2523,14 @@ cifs_iovec_write(struct file *file, struct iov_iter *from, loff_t *poffset)
break;
}
- save_len = cur_len;
- for (i = 0; i < nr_pages; i++) {
- bytes = min_t(size_t, cur_len, PAGE_SIZE);
- copied = copy_page_from_iter(wdata->pages[i], 0, bytes,
- from);
- cur_len -= copied;
- /*
- * If we didn't copy as much as we expected, then that
- * may mean we trod into an unmapped area. Stop copying
- * at that point. On the next pass through the big
- * loop, we'll likely end up getting a zero-length
- * write and bailing out of it.
- */
- if (copied < bytes)
- break;
- }
- cur_len = save_len - cur_len;
-
- /*
- * If we have no data to send, then that probably means that
- * the copy above failed altogether. That's most likely because
- * the address in the iovec was bogus. Set the rc to -EFAULT,
- * free anything we allocated and bail out.
- */
- if (!cur_len) {
+ rc = wdata_fill_from_iovec(wdata, from, &cur_len, nr_pages);
+ if (rc) {
for (i = 0; i < nr_pages; i++)
put_page(wdata->pages[i]);
kfree(wdata);
- rc = -EFAULT;
break;
}
- /*
- * i + 1 now represents the number of pages we actually used in
- * the copy phase above. Bring nr_pages down to that, and free
- * any pages that we didn't use.
- */
- for ( ; nr_pages > i + 1; nr_pages--)
- put_page(wdata->pages[nr_pages - 1]);
-
wdata->sync_mode = WB_SYNC_ALL;
wdata->nr_pages = nr_pages;
wdata->offset = (__u64)offset;
--
1.8.1.2
--
To unsubscribe from this list: send the line "unsubscribe linux-cifs" in
More majordomo info at http://vger.kernel.org/majordomo-info.html
--
Best regards,
Pavel Shilovsky.
Pavel Shilovsky
2014-06-27 09:57:45 UTC
Permalink
Signed-off-by: Pavel Shilovsky <pshilovsky-***@public.gmane.org>
---
fs/cifs/file.c | 79 ++++++++++++++++++++++++++++++++++------------------------
1 file changed, 47 insertions(+), 32 deletions(-)

diff --git a/fs/cifs/file.c b/fs/cifs/file.c
index e2a735a..06080e0 100644
--- a/fs/cifs/file.c
+++ b/fs/cifs/file.c
@@ -2468,41 +2468,17 @@ wdata_fill_from_iovec(struct cifs_writedata *wdata, struct iov_iter *from,
return rc;
}

-static ssize_t
-cifs_iovec_write(struct file *file, struct iov_iter *from, loff_t *poffset)
+static int
+cifs_write_from_iter(loff_t offset, size_t len, struct iov_iter *from,
+ struct cifsFileInfo *open_file,
+ struct cifs_sb_info *cifs_sb, struct list_head *wdata_list)
{
+ int rc = 0;
+ size_t cur_len;
unsigned long nr_pages, i;
- size_t len, cur_len;
- ssize_t total_written = 0;
- loff_t offset;
- struct cifsFileInfo *open_file;
- struct cifs_tcon *tcon;
- struct cifs_sb_info *cifs_sb;
- struct cifs_writedata *wdata, *tmp;
- struct list_head wdata_list;
- int rc;
+ struct cifs_writedata *wdata;
pid_t pid;

- len = iov_iter_count(from);
- rc = generic_write_checks(file, poffset, &len, 0);
- if (rc)
- return rc;
-
- if (!len)
- return 0;
-
- iov_iter_truncate(from, len);
-
- INIT_LIST_HEAD(&wdata_list);
- cifs_sb = CIFS_SB(file->f_path.dentry->d_sb);
- open_file = file->private_data;
- tcon = tlink_tcon(open_file->tlink);
-
- if (!tcon->ses->server->ops->async_writev)
- return -ENOSYS;
-
- offset = *poffset;
-
if (cifs_sb->mnt_cifs_flags & CIFS_MOUNT_RWPIDFORWARD)
pid = open_file->pid;
else
@@ -2546,11 +2522,50 @@ cifs_iovec_write(struct file *file, struct iov_iter *from, loff_t *poffset)
break;
}

- list_add_tail(&wdata->list, &wdata_list);
+ list_add_tail(&wdata->list, wdata_list);
offset += cur_len;
len -= cur_len;
} while (len > 0);

+ return rc;
+}
+
+static ssize_t
+cifs_iovec_write(struct file *file, struct iov_iter *from, loff_t *poffset)
+{
+ size_t len;
+ ssize_t total_written = 0;
+ loff_t offset;
+ struct cifsFileInfo *open_file;
+ struct cifs_tcon *tcon;
+ struct cifs_sb_info *cifs_sb;
+ struct cifs_writedata *wdata, *tmp;
+ struct list_head wdata_list;
+ int rc;
+
+ len = iov_iter_count(from);
+ rc = generic_write_checks(file, poffset, &len, 0);
+ if (rc)
+ return rc;
+
+ if (!len)
+ return 0;
+
+ iov_iter_truncate(from, len);
+
+ INIT_LIST_HEAD(&wdata_list);
+ cifs_sb = CIFS_SB(file->f_path.dentry->d_sb);
+ open_file = file->private_data;
+ tcon = tlink_tcon(open_file->tlink);
+
+ if (!tcon->ses->server->ops->async_writev)
+ return -ENOSYS;
+
+ offset = *poffset;
+
+ rc = cifs_write_from_iter(*poffset, len, from, open_file, cifs_sb,
+ &wdata_list);
+
/*
* If at least one write was successfully sent, then discard any rc
* value from the later writes. If the other write succeeds, then
--
1.8.1.2
Shirish Pargaonkar
2014-07-21 05:05:06 UTC
Permalink
Post by Pavel Shilovsky
---
fs/cifs/file.c | 79 ++++++++++++++++++++++++++++++++++------------------------
1 file changed, 47 insertions(+), 32 deletions(-)
diff --git a/fs/cifs/file.c b/fs/cifs/file.c
index e2a735a..06080e0 100644
--- a/fs/cifs/file.c
+++ b/fs/cifs/file.c
@@ -2468,41 +2468,17 @@ wdata_fill_from_iovec(struct cifs_writedata *wdata, struct iov_iter *from,
return rc;
}
-static ssize_t
-cifs_iovec_write(struct file *file, struct iov_iter *from, loff_t *poffset)
+static int
+cifs_write_from_iter(loff_t offset, size_t len, struct iov_iter *from,
+ struct cifsFileInfo *open_file,
+ struct cifs_sb_info *cifs_sb, struct list_head *wdata_list)
{
+ int rc = 0;
+ size_t cur_len;
unsigned long nr_pages, i;
- size_t len, cur_len;
- ssize_t total_written = 0;
- loff_t offset;
- struct cifsFileInfo *open_file;
- struct cifs_tcon *tcon;
- struct cifs_sb_info *cifs_sb;
- struct cifs_writedata *wdata, *tmp;
- struct list_head wdata_list;
- int rc;
+ struct cifs_writedata *wdata;
pid_t pid;
- len = iov_iter_count(from);
- rc = generic_write_checks(file, poffset, &len, 0);
- if (rc)
- return rc;
-
- if (!len)
- return 0;
-
- iov_iter_truncate(from, len);
-
- INIT_LIST_HEAD(&wdata_list);
- cifs_sb = CIFS_SB(file->f_path.dentry->d_sb);
- open_file = file->private_data;
- tcon = tlink_tcon(open_file->tlink);
-
- if (!tcon->ses->server->ops->async_writev)
- return -ENOSYS;
-
- offset = *poffset;
-
if (cifs_sb->mnt_cifs_flags & CIFS_MOUNT_RWPIDFORWARD)
pid = open_file->pid;
else
@@ -2546,11 +2522,50 @@ cifs_iovec_write(struct file *file, struct iov_iter *from, loff_t *poffset)
break;
}
- list_add_tail(&wdata->list, &wdata_list);
+ list_add_tail(&wdata->list, wdata_list);
offset += cur_len;
len -= cur_len;
} while (len > 0);
+ return rc;
+}
+
+static ssize_t
+cifs_iovec_write(struct file *file, struct iov_iter *from, loff_t *poffset)
+{
+ size_t len;
+ ssize_t total_written = 0;
+ loff_t offset;
+ struct cifsFileInfo *open_file;
+ struct cifs_tcon *tcon;
+ struct cifs_sb_info *cifs_sb;
+ struct cifs_writedata *wdata, *tmp;
+ struct list_head wdata_list;
+ int rc;
+
+ len = iov_iter_count(from);
+ rc = generic_write_checks(file, poffset, &len, 0);
+ if (rc)
+ return rc;
+
+ if (!len)
+ return 0;
+
+ iov_iter_truncate(from, len);
+
+ INIT_LIST_HEAD(&wdata_list);
+ cifs_sb = CIFS_SB(file->f_path.dentry->d_sb);
+ open_file = file->private_data;
+ tcon = tlink_tcon(open_file->tlink);
+
+ if (!tcon->ses->server->ops->async_writev)
+ return -ENOSYS;
+
+ offset = *poffset;
we do not use offset in this function after assignment
Post by Pavel Shilovsky
+
+ rc = cifs_write_from_iter(*poffset, len, from, open_file, cifs_sb,
+ &wdata_list);
and do not act on the value of rc (if ENOMEM) here.
Post by Pavel Shilovsky
+
/*
* If at least one write was successfully sent, then discard any rc
* value from the later writes. If the other write succeeds, then
--
1.8.1.2
--
To unsubscribe from this list: send the line "unsubscribe linux-cifs" in
More majordomo info at http://vger.kernel.org/majordomo-info.html
Pavel Shilovsky
2014-07-21 06:56:12 UTC
Permalink
Post by Shirish Pargaonkar
Post by Pavel Shilovsky
---
fs/cifs/file.c | 79 ++++++++++++++++++++++++++++++++++------------------------
1 file changed, 47 insertions(+), 32 deletions(-)
diff --git a/fs/cifs/file.c b/fs/cifs/file.c
index e2a735a..06080e0 100644
--- a/fs/cifs/file.c
+++ b/fs/cifs/file.c
@@ -2468,41 +2468,17 @@ wdata_fill_from_iovec(struct cifs_writedata *wdata, struct iov_iter *from,
return rc;
}
-static ssize_t
-cifs_iovec_write(struct file *file, struct iov_iter *from, loff_t *poffset)
+static int
+cifs_write_from_iter(loff_t offset, size_t len, struct iov_iter *from,
+ struct cifsFileInfo *open_file,
+ struct cifs_sb_info *cifs_sb, struct list_head *wdata_list)
{
+ int rc = 0;
+ size_t cur_len;
unsigned long nr_pages, i;
- size_t len, cur_len;
- ssize_t total_written = 0;
- loff_t offset;
- struct cifsFileInfo *open_file;
- struct cifs_tcon *tcon;
- struct cifs_sb_info *cifs_sb;
- struct cifs_writedata *wdata, *tmp;
- struct list_head wdata_list;
- int rc;
+ struct cifs_writedata *wdata;
pid_t pid;
- len = iov_iter_count(from);
- rc = generic_write_checks(file, poffset, &len, 0);
- if (rc)
- return rc;
-
- if (!len)
- return 0;
-
- iov_iter_truncate(from, len);
-
- INIT_LIST_HEAD(&wdata_list);
- cifs_sb = CIFS_SB(file->f_path.dentry->d_sb);
- open_file = file->private_data;
- tcon = tlink_tcon(open_file->tlink);
-
- if (!tcon->ses->server->ops->async_writev)
- return -ENOSYS;
-
- offset = *poffset;
-
if (cifs_sb->mnt_cifs_flags & CIFS_MOUNT_RWPIDFORWARD)
pid = open_file->pid;
else
@@ -2546,11 +2522,50 @@ cifs_iovec_write(struct file *file, struct iov_iter *from, loff_t *poffset)
break;
}
- list_add_tail(&wdata->list, &wdata_list);
+ list_add_tail(&wdata->list, wdata_list);
offset += cur_len;
len -= cur_len;
} while (len > 0);
+ return rc;
+}
+
+static ssize_t
+cifs_iovec_write(struct file *file, struct iov_iter *from, loff_t *poffset)
+{
+ size_t len;
+ ssize_t total_written = 0;
+ loff_t offset;
+ struct cifsFileInfo *open_file;
+ struct cifs_tcon *tcon;
+ struct cifs_sb_info *cifs_sb;
+ struct cifs_writedata *wdata, *tmp;
+ struct list_head wdata_list;
+ int rc;
+
+ len = iov_iter_count(from);
+ rc = generic_write_checks(file, poffset, &len, 0);
+ if (rc)
+ return rc;
+
+ if (!len)
+ return 0;
+
+ iov_iter_truncate(from, len);
+
+ INIT_LIST_HEAD(&wdata_list);
+ cifs_sb = CIFS_SB(file->f_path.dentry->d_sb);
+ open_file = file->private_data;
+ tcon = tlink_tcon(open_file->tlink);
+
+ if (!tcon->ses->server->ops->async_writev)
+ return -ENOSYS;
+
+ offset = *poffset;
we do not use offset in this function after assignment
Yes!
Post by Shirish Pargaonkar
Post by Pavel Shilovsky
+
+ rc = cifs_write_from_iter(*poffset, len, from, open_file, cifs_sb,
+ &wdata_list);
and do not act on the value of rc (if ENOMEM) here.
This patch doesn't change the logic of the function: if at least one
write succeeds, we return a short write regardless of any error from
other writes.
--
Best regards,
Pavel Shilovsky.
Pavel Shilovsky
2014-06-27 09:57:46 UTC
Permalink
If a server change maximum buffer size for write (wsize) requests
on reconnect we can fail on repeating with a big size buffer on
-EAGAIN error in iovec write. Fix this by checking wsize all the
time before repeating request in iovec write.

Signed-off-by: Pavel Shilovsky <pshilovsky-***@public.gmane.org>
---
fs/cifs/cifssmb.c | 4 ----
fs/cifs/file.c | 62 ++++++++++++++++++++++++++++++++++---------------------
fs/cifs/smb2pdu.c | 4 ----
3 files changed, 38 insertions(+), 32 deletions(-)

diff --git a/fs/cifs/cifssmb.c b/fs/cifs/cifssmb.c
index 41c9067..e411d2e 100644
--- a/fs/cifs/cifssmb.c
+++ b/fs/cifs/cifssmb.c
@@ -196,10 +196,6 @@ cifs_reconnect_tcon(struct cifs_tcon *tcon, int smb_command)
if (rc)
goto out;

- /*
- * FIXME: check if wsize needs updated due to negotiated smb buffer
- * size shrinking
- */
atomic_inc(&tconInfoReconnectCount);

/* tell server Unix caps we support */
diff --git a/fs/cifs/file.c b/fs/cifs/file.c
index 06080e0..f661977 100644
--- a/fs/cifs/file.c
+++ b/fs/cifs/file.c
@@ -2401,28 +2401,6 @@ cifs_uncached_writev_complete(struct work_struct *work)
kref_put(&wdata->refcount, cifs_uncached_writedata_release);
}

-/* attempt to send write to server, retry on any -EAGAIN errors */
-static int
-cifs_uncached_retry_writev(struct cifs_writedata *wdata)
-{
- int rc;
- struct TCP_Server_Info *server;
-
- server = tlink_tcon(wdata->cfile->tlink)->ses->server;
-
- do {
- if (wdata->cfile->invalidHandle) {
- rc = cifs_reopen_file(wdata->cfile, false);
- if (rc != 0)
- continue;
- }
- rc = server->ops->async_writev(wdata,
- cifs_uncached_writedata_release);
- } while (rc == -EAGAIN);
-
- return rc;
-}
-
static int
wdata_fill_from_iovec(struct cifs_writedata *wdata, struct iov_iter *from,
size_t *len, unsigned long nr_pages)
@@ -2477,13 +2455,19 @@ cifs_write_from_iter(loff_t offset, size_t len, struct iov_iter *from,
size_t cur_len;
unsigned long nr_pages, i;
struct cifs_writedata *wdata;
+ struct iov_iter saved_from;
+ loff_t saved_offset = offset;
pid_t pid;
+ struct TCP_Server_Info *server;

if (cifs_sb->mnt_cifs_flags & CIFS_MOUNT_RWPIDFORWARD)
pid = open_file->pid;
else
pid = current->tgid;

+ server = tlink_tcon(open_file->tlink)->ses->server;
+ memcpy(&saved_from, from, sizeof(struct iov_iter));
+
do {
nr_pages = get_numpages(cifs_sb->wsize, len, &cur_len);
wdata = cifs_writedata_alloc(nr_pages,
@@ -2515,10 +2499,20 @@ cifs_write_from_iter(loff_t offset, size_t len, struct iov_iter *from,
wdata->bytes = cur_len;
wdata->pagesz = PAGE_SIZE;
wdata->tailsz = cur_len - ((nr_pages - 1) * PAGE_SIZE);
- rc = cifs_uncached_retry_writev(wdata);
+
+ if (!wdata->cfile->invalidHandle ||
+ !cifs_reopen_file(wdata->cfile, false))
+ rc = server->ops->async_writev(wdata,
+ cifs_uncached_writedata_release);
if (rc) {
kref_put(&wdata->refcount,
cifs_uncached_writedata_release);
+ if (rc == -EAGAIN) {
+ memcpy(from, &saved_from,
+ sizeof(struct iov_iter));
+ iov_iter_advance(from, offset - saved_offset);
+ continue;
+ }
break;
}

@@ -2541,6 +2535,7 @@ cifs_iovec_write(struct file *file, struct iov_iter *from, loff_t *poffset)
struct cifs_sb_info *cifs_sb;
struct cifs_writedata *wdata, *tmp;
struct list_head wdata_list;
+ struct iov_iter saved_from;
int rc;

len = iov_iter_count(from);
@@ -2562,6 +2557,7 @@ cifs_iovec_write(struct file *file, struct iov_iter *from, loff_t *poffset)
return -ENOSYS;

offset = *poffset;
+ memcpy(&saved_from, from, sizeof(struct iov_iter));

rc = cifs_write_from_iter(*poffset, len, from, open_file, cifs_sb,
&wdata_list);
@@ -2594,7 +2590,25 @@ restart_loop:

/* resend call if it's a retryable error */
if (rc == -EAGAIN) {
- rc = cifs_uncached_retry_writev(wdata);
+ struct list_head tmp_list;
+ struct iov_iter tmp_from;
+
+ INIT_LIST_HEAD(&tmp_list);
+ list_del_init(&wdata->list);
+
+ memcpy(&tmp_from, &saved_from,
+ sizeof(struct iov_iter));
+ iov_iter_advance(&tmp_from,
+ wdata->offset - *poffset);
+
+ rc = cifs_write_from_iter(wdata->offset,
+ wdata->bytes, &tmp_from,
+ open_file, cifs_sb, &tmp_list);
+
+ list_splice(&tmp_list, &wdata_list);
+
+ kref_put(&wdata->refcount,
+ cifs_uncached_writedata_release);
goto restart_loop;
}
}
diff --git a/fs/cifs/smb2pdu.c b/fs/cifs/smb2pdu.c
index 0158104..da7aa62 100644
--- a/fs/cifs/smb2pdu.c
+++ b/fs/cifs/smb2pdu.c
@@ -245,10 +245,6 @@ smb2_reconnect(__le16 smb2_command, struct cifs_tcon *tcon)
if (rc)
goto out;
atomic_inc(&tconInfoReconnectCount);
- /*
- * BB FIXME add code to check if wsize needs update due to negotiated
- * smb buffer size shrinking.
- */
out:
/*
* Check if handle based operation so we know whether we can continue
--
1.8.1.2
Pavel Shilovsky
2014-06-27 09:57:47 UTC
Permalink
If we negotiate SMB 2.1 and higher version of the protocol and
a server supports large write buffer size, we need to consume 1
credit per 65536 bytes. So, we need to know how many credits
we have and obtain the required number of them before constructing
a writedata structure in writepages and iovec write.

Signed-off-by: Pavel Shilovsky <pshilovsky-***@public.gmane.org>
---
fs/cifs/cifsglob.h | 15 ++++++++++++++
fs/cifs/cifsproto.h | 3 +++
fs/cifs/file.c | 38 ++++++++++++++++++++++++++++++------
fs/cifs/smb1ops.c | 1 +
fs/cifs/smb2ops.c | 52 +++++++++++++++++++++++++++++++++++++++++++++++--
fs/cifs/smb2pdu.c | 29 +++++++++++++++++++++++----
fs/cifs/smb2transport.c | 5 +++++
fs/cifs/transport.c | 25 +++++++++++++++++-------
8 files changed, 149 insertions(+), 19 deletions(-)

diff --git a/fs/cifs/cifsglob.h b/fs/cifs/cifsglob.h
index 8c53f20..54ca2b9 100644
--- a/fs/cifs/cifsglob.h
+++ b/fs/cifs/cifsglob.h
@@ -406,6 +406,9 @@ struct smb_version_operations {
int);
/* writepages retry size */
unsigned int (*wp_retry_size)(struct inode *);
+ /* get mtu credits */
+ int (*wait_mtu_credits)(struct TCP_Server_Info *, unsigned int,
+ unsigned int *, unsigned int *);
};

struct smb_version_values {
@@ -642,6 +645,16 @@ add_credits(struct TCP_Server_Info *server, const unsigned int add,
}

static inline void
+add_credits_and_wake_if(struct TCP_Server_Info *server, const unsigned int add,
+ const int optype)
+{
+ if (add) {
+ server->ops->add_credits(server, add, optype);
+ wake_up(&server->request_q);
+ }
+}
+
+static inline void
set_credits(struct TCP_Server_Info *server, const int val)
{
server->ops->set_credits(server, val);
@@ -1075,6 +1088,7 @@ struct cifs_writedata {
int result;
unsigned int pagesz;
unsigned int tailsz;
+ unsigned int credits;
unsigned int nr_pages;
struct page *pages[];
};
@@ -1400,6 +1414,7 @@ static inline void free_dfs_info_array(struct dfs_info3_param *param,
#define CIFS_OBREAK_OP 0x0100 /* oplock break request */
#define CIFS_NEG_OP 0x0200 /* negotiate request */
#define CIFS_OP_MASK 0x0380 /* mask request type */
+#define CIFS_HAS_CREDITS 0x0400 /* already has credits */

/* Security Flags: indicate type of session setup needed */
#define CIFSSEC_MAY_SIGN 0x00001
diff --git a/fs/cifs/cifsproto.h b/fs/cifs/cifsproto.h
index de49d7a..c31ce98 100644
--- a/fs/cifs/cifsproto.h
+++ b/fs/cifs/cifsproto.h
@@ -90,6 +90,9 @@ extern struct mid_q_entry *cifs_setup_async_request(struct TCP_Server_Info *,
struct smb_rqst *);
extern int cifs_check_receive(struct mid_q_entry *mid,
struct TCP_Server_Info *server, bool log_error);
+extern int cifs_wait_mtu_credits(struct TCP_Server_Info *server,
+ unsigned int size, unsigned int *num,
+ unsigned int *credits);
extern int SendReceive2(const unsigned int /* xid */ , struct cifs_ses *,
struct kvec *, int /* nvec to send */,
int * /* type of buf returned */ , const int flags);
diff --git a/fs/cifs/file.c b/fs/cifs/file.c
index f661977..2e49a17 100644
--- a/fs/cifs/file.c
+++ b/fs/cifs/file.c
@@ -1670,8 +1670,8 @@ cifs_write(struct cifsFileInfo *open_file, __u32 pid, const char *write_data,
break;
}

- len = min((size_t)cifs_sb->wsize,
- write_size - total_written);
+ len = min(server->ops->wp_retry_size(dentry->d_inode),
+ (unsigned int)write_size - total_written);
/* iov[0] is reserved for smb header */
iov[1].iov_base = (char *)write_data + total_written;
iov[1].iov_len = len;
@@ -2046,6 +2046,7 @@ static int cifs_writepages(struct address_space *mapping,
struct writeback_control *wbc)
{
struct cifs_sb_info *cifs_sb = CIFS_SB(mapping->host->i_sb);
+ struct TCP_Server_Info *server;
bool done = false, scanned = false, range_whole = false;
pgoff_t end, index;
struct cifs_writedata *wdata;
@@ -2068,23 +2069,30 @@ static int cifs_writepages(struct address_space *mapping,
range_whole = true;
scanned = true;
}
+ server = cifs_sb_master_tcon(cifs_sb)->ses->server;
retry:
while (!done && index <= end) {
- unsigned int nr_pages, found_pages;
+ unsigned int nr_pages, found_pages, wsize, credits;
pgoff_t next = 0, tofind, saved_index = index;

- tofind = min((cifs_sb->wsize / PAGE_CACHE_SIZE) - 1,
- end - index) + 1;
+ rc = server->ops->wait_mtu_credits(server, cifs_sb->wsize,
+ &wsize, &credits);
+ if (rc)
+ break;
+
+ tofind = min((wsize / PAGE_CACHE_SIZE) - 1, end - index) + 1;

wdata = wdata_alloc_and_fillpages(tofind, mapping, end, &index,
&found_pages);
if (!wdata) {
rc = -ENOMEM;
+ add_credits_and_wake_if(server, credits, 0);
break;
}

if (found_pages == 0) {
kref_put(&wdata->refcount, cifs_writedata_release);
+ add_credits_and_wake_if(server, credits, 0);
break;
}

@@ -2094,10 +2102,16 @@ retry:
/* nothing to write? */
if (nr_pages == 0) {
kref_put(&wdata->refcount, cifs_writedata_release);
+ add_credits_and_wake_if(server, credits, 0);
continue;
}

+ wdata->credits = credits;
+
rc = wdata_send_pages(wdata, nr_pages, mapping, wbc);
+ if (rc)
+ add_credits_and_wake_if(server, wdata->credits, 0);
+
kref_put(&wdata->refcount, cifs_writedata_release);

if (wbc->sync_mode == WB_SYNC_ALL && rc == -EAGAIN) {
@@ -2469,17 +2483,26 @@ cifs_write_from_iter(loff_t offset, size_t len, struct iov_iter *from,
memcpy(&saved_from, from, sizeof(struct iov_iter));

do {
- nr_pages = get_numpages(cifs_sb->wsize, len, &cur_len);
+ unsigned int wsize, credits;
+
+ rc = server->ops->wait_mtu_credits(server, cifs_sb->wsize,
+ &wsize, &credits);
+ if (rc)
+ break;
+
+ nr_pages = get_numpages(wsize, len, &cur_len);
wdata = cifs_writedata_alloc(nr_pages,
cifs_uncached_writev_complete);
if (!wdata) {
rc = -ENOMEM;
+ add_credits_and_wake_if(server, credits, 0);
break;
}

rc = cifs_write_allocate_pages(wdata->pages, nr_pages);
if (rc) {
kfree(wdata);
+ add_credits_and_wake_if(server, credits, 0);
break;
}

@@ -2488,6 +2511,7 @@ cifs_write_from_iter(loff_t offset, size_t len, struct iov_iter *from,
for (i = 0; i < nr_pages; i++)
put_page(wdata->pages[i]);
kfree(wdata);
+ add_credits_and_wake_if(server, credits, 0);
break;
}

@@ -2499,12 +2523,14 @@ cifs_write_from_iter(loff_t offset, size_t len, struct iov_iter *from,
wdata->bytes = cur_len;
wdata->pagesz = PAGE_SIZE;
wdata->tailsz = cur_len - ((nr_pages - 1) * PAGE_SIZE);
+ wdata->credits = credits;

if (!wdata->cfile->invalidHandle ||
!cifs_reopen_file(wdata->cfile, false))
rc = server->ops->async_writev(wdata,
cifs_uncached_writedata_release);
if (rc) {
+ add_credits_and_wake_if(server, wdata->credits, 0);
kref_put(&wdata->refcount,
cifs_uncached_writedata_release);
if (rc == -EAGAIN) {
diff --git a/fs/cifs/smb1ops.c b/fs/cifs/smb1ops.c
index 8a96342..5e8c22d 100644
--- a/fs/cifs/smb1ops.c
+++ b/fs/cifs/smb1ops.c
@@ -1025,6 +1025,7 @@ struct smb_version_operations smb1_operations = {
.set_credits = cifs_set_credits,
.get_credits_field = cifs_get_credits_field,
.get_credits = cifs_get_credits,
+ .wait_mtu_credits = cifs_wait_mtu_credits,
.get_next_mid = cifs_get_next_mid,
.read_data_offset = cifs_read_data_offset,
.read_data_length = cifs_read_data_length,
diff --git a/fs/cifs/smb2ops.c b/fs/cifs/smb2ops.c
index e35ce5b..fecc2de 100644
--- a/fs/cifs/smb2ops.c
+++ b/fs/cifs/smb2ops.c
@@ -112,6 +112,53 @@ smb2_get_credits(struct mid_q_entry *mid)
return le16_to_cpu(((struct smb2_hdr *)mid->resp_buf)->CreditRequest);
}

+static int
+smb2_wait_mtu_credits(struct TCP_Server_Info *server, unsigned int size,
+ unsigned int *num, unsigned int *credits)
+{
+ int rc = 0;
+ unsigned int scredits;
+
+ spin_lock(&server->req_lock);
+ while (1) {
+ if (server->credits <= 0) {
+ spin_unlock(&server->req_lock);
+ cifs_num_waiters_inc(server);
+ rc = wait_event_killable(server->request_q,
+ has_credits(server, &server->credits));
+ cifs_num_waiters_dec(server);
+ if (rc)
+ return rc;
+ spin_lock(&server->req_lock);
+ } else {
+ if (server->tcpStatus == CifsExiting) {
+ spin_unlock(&server->req_lock);
+ return -ENOENT;
+ }
+
+ scredits = server->credits;
+ /* can deadlock with reopen */
+ if (scredits == 1) {
+ *num = SMB2_MAX_BUFFER_SIZE;
+ *credits = 0;
+ break;
+ }
+
+ /* leave one credit for a possible reopen */
+ scredits--;
+ *num = min_t(unsigned int, size,
+ scredits * SMB2_MAX_BUFFER_SIZE);
+
+ *credits = DIV_ROUND_UP(*num, SMB2_MAX_BUFFER_SIZE);
+ server->credits -= *credits;
+ server->in_flight++;
+ break;
+ }
+ }
+ spin_unlock(&server->req_lock);
+ return rc;
+}
+
static __u64
smb2_get_next_mid(struct TCP_Server_Info *server)
{
@@ -182,8 +229,6 @@ smb2_negotiate_wsize(struct cifs_tcon *tcon, struct smb_vol *volume_info)
/* start with specified wsize, or default */
wsize = volume_info->wsize ? volume_info->wsize : CIFS_DEFAULT_IOSIZE;
wsize = min_t(unsigned int, wsize, server->max_write);
- /* set it to the maximum buffer size value we can send with 1 credit */
- wsize = min_t(unsigned int, wsize, SMB2_MAX_BUFFER_SIZE);

return wsize;
}
@@ -1120,6 +1165,7 @@ struct smb_version_operations smb20_operations = {
.set_credits = smb2_set_credits,
.get_credits_field = smb2_get_credits_field,
.get_credits = smb2_get_credits,
+ .wait_mtu_credits = cifs_wait_mtu_credits,
.get_next_mid = smb2_get_next_mid,
.read_data_offset = smb2_read_data_offset,
.read_data_length = smb2_read_data_length,
@@ -1196,6 +1242,7 @@ struct smb_version_operations smb21_operations = {
.set_credits = smb2_set_credits,
.get_credits_field = smb2_get_credits_field,
.get_credits = smb2_get_credits,
+ .wait_mtu_credits = smb2_wait_mtu_credits,
.get_next_mid = smb2_get_next_mid,
.read_data_offset = smb2_read_data_offset,
.read_data_length = smb2_read_data_length,
@@ -1272,6 +1319,7 @@ struct smb_version_operations smb30_operations = {
.set_credits = smb2_set_credits,
.get_credits_field = smb2_get_credits_field,
.get_credits = smb2_get_credits,
+ .wait_mtu_credits = smb2_wait_mtu_credits,
.get_next_mid = smb2_get_next_mid,
.read_data_offset = smb2_read_data_offset,
.read_data_length = smb2_read_data_length,
diff --git a/fs/cifs/smb2pdu.c b/fs/cifs/smb2pdu.c
index da7aa62..b0c89ef 100644
--- a/fs/cifs/smb2pdu.c
+++ b/fs/cifs/smb2pdu.c
@@ -1892,15 +1892,25 @@ int
smb2_async_writev(struct cifs_writedata *wdata,
void (*release)(struct kref *kref))
{
- int rc = -EACCES;
+ int rc = -EACCES, flags = 0;
struct smb2_write_req *req = NULL;
struct cifs_tcon *tcon = tlink_tcon(wdata->cfile->tlink);
+ struct TCP_Server_Info *server = tcon->ses->server;
struct kvec iov;
struct smb_rqst rqst;

rc = small_smb2_init(SMB2_WRITE, tcon, (void **) &req);
- if (rc)
+ if (rc) {
+ if (rc == -EAGAIN && wdata->credits) {
+ /* credits was reseted by reconnect */
+ wdata->credits = 0;
+ /* reduce in_flight value since we won't send the req */
+ spin_lock(&server->req_lock);
+ server->in_flight--;
+ spin_unlock(&server->req_lock);
+ }
goto async_writev_out;
+ }

req->hdr.ProcessId = cpu_to_le32(wdata->cfile->pid);

@@ -1933,9 +1943,20 @@ smb2_async_writev(struct cifs_writedata *wdata,

inc_rfc1001_len(&req->hdr, wdata->bytes - 1 /* Buffer */);

+ if (wdata->credits) {
+ req->hdr.CreditCharge = cpu_to_le16(DIV_ROUND_UP(wdata->bytes,
+ SMB2_MAX_BUFFER_SIZE));
+ spin_lock(&server->req_lock);
+ server->credits += wdata->credits -
+ le16_to_cpu(req->hdr.CreditCharge);
+ spin_unlock(&server->req_lock);
+ wake_up(&server->request_q);
+ flags = CIFS_HAS_CREDITS;
+ }
+
kref_get(&wdata->refcount);
- rc = cifs_call_async(tcon->ses->server, &rqst, NULL,
- smb2_writev_callback, wdata, 0);
+ rc = cifs_call_async(server, &rqst, NULL, smb2_writev_callback, wdata,
+ flags);

if (rc) {
kref_put(&wdata->refcount, release);
diff --git a/fs/cifs/smb2transport.c b/fs/cifs/smb2transport.c
index 59c748c..5111e72 100644
--- a/fs/cifs/smb2transport.c
+++ b/fs/cifs/smb2transport.c
@@ -466,7 +466,12 @@ smb2_verify_signature(struct smb_rqst *rqst, struct TCP_Server_Info *server)
static inline void
smb2_seq_num_into_buf(struct TCP_Server_Info *server, struct smb2_hdr *hdr)
{
+ unsigned int i, num = le16_to_cpu(hdr->CreditCharge);
+
hdr->MessageId = get_next_mid64(server);
+ /* skip message numbers according to CreditCharge field */
+ for (i = 1; i < num; i++)
+ get_next_mid(server);
}

static struct mid_q_entry *
diff --git a/fs/cifs/transport.c b/fs/cifs/transport.c
index 18cd565..9d087f4 100644
--- a/fs/cifs/transport.c
+++ b/fs/cifs/transport.c
@@ -448,6 +448,15 @@ wait_for_free_request(struct TCP_Server_Info *server, const int timeout,
return wait_for_free_credits(server, timeout, val);
}

+int
+cifs_wait_mtu_credits(struct TCP_Server_Info *server, unsigned int size,
+ unsigned int *num, unsigned int *credits)
+{
+ *num = size;
+ *credits = 0;
+ return 0;
+}
+
static int allocate_mid(struct cifs_ses *ses, struct smb_hdr *in_buf,
struct mid_q_entry **ppmidQ)
{
@@ -531,20 +540,23 @@ cifs_call_async(struct TCP_Server_Info *server, struct smb_rqst *rqst,
{
int rc, timeout, optype;
struct mid_q_entry *mid;
+ unsigned int credits = 0;

timeout = flags & CIFS_TIMEOUT_MASK;
optype = flags & CIFS_OP_MASK;

- rc = wait_for_free_request(server, timeout, optype);
- if (rc)
- return rc;
+ if ((flags & CIFS_HAS_CREDITS) == 0) {
+ rc = wait_for_free_request(server, timeout, optype);
+ if (rc)
+ return rc;
+ credits = 1;
+ }

mutex_lock(&server->srv_mutex);
mid = server->ops->setup_async_request(server, rqst);
if (IS_ERR(mid)) {
mutex_unlock(&server->srv_mutex);
- add_credits(server, 1, optype);
- wake_up(&server->request_q);
+ add_credits_and_wake_if(server, credits, optype);
return PTR_ERR(mid);
}

@@ -572,8 +584,7 @@ cifs_call_async(struct TCP_Server_Info *server, struct smb_rqst *rqst,
return 0;

cifs_delete_mid(mid);
- add_credits(server, 1, optype);
- wake_up(&server->request_q);
+ add_credits_and_wake_if(server, credits, optype);
return rc;
}
--
1.8.1.2
Pavel Shilovsky
2014-06-27 09:57:48 UTC
Permalink
Signed-off-by: Pavel Shilovsky <pshilovsky-***@public.gmane.org>
---
fs/cifs/file.c | 107 ++++++++++++++++++++++++++++++++-------------------------
1 file changed, 61 insertions(+), 46 deletions(-)

diff --git a/fs/cifs/file.c b/fs/cifs/file.c
index 2e49a17..ee7c547 100644
--- a/fs/cifs/file.c
+++ b/fs/cifs/file.c
@@ -3339,6 +3339,63 @@ cifs_readpages_read_into_pages(struct TCP_Server_Info *server,
return total_read > 0 && result != -EAGAIN ? total_read : result;
}

+static int
+readpages_get_pages(struct address_space *mapping, struct list_head *page_list,
+ unsigned int rsize, struct list_head *tmplist,
+ unsigned int *nr_pages, loff_t *offset, unsigned int *bytes)
+{
+ struct page *page, *tpage;
+ unsigned int expected_index;
+ int rc;
+
+ page = list_entry(page_list->prev, struct page, lru);
+
+ /*
+ * Lock the page and put it in the cache. Since no one else
+ * should have access to this page, we're safe to simply set
+ * PG_locked without checking it first.
+ */
+ __set_page_locked(page);
+ rc = add_to_page_cache_locked(page, mapping,
+ page->index, GFP_KERNEL);
+
+ /* give up if we can't stick it in the cache */
+ if (rc) {
+ __clear_page_locked(page);
+ return rc;
+ }
+
+ /* move first page to the tmplist */
+ *offset = (loff_t)page->index << PAGE_CACHE_SHIFT;
+ *bytes = PAGE_CACHE_SIZE;
+ *nr_pages = 1;
+ list_move_tail(&page->lru, tmplist);
+
+ /* now try and add more pages onto the request */
+ expected_index = page->index + 1;
+ list_for_each_entry_safe_reverse(page, tpage, page_list, lru) {
+ /* discontinuity ? */
+ if (page->index != expected_index)
+ break;
+
+ /* would this page push the read over the rsize? */
+ if (*bytes + PAGE_CACHE_SIZE > rsize)
+ break;
+
+ __set_page_locked(page);
+ if (add_to_page_cache_locked(page, mapping, page->index,
+ GFP_KERNEL)) {
+ __clear_page_locked(page);
+ break;
+ }
+ list_move_tail(&page->lru, tmplist);
+ (*bytes) += PAGE_CACHE_SIZE;
+ expected_index++;
+ (*nr_pages)++;
+ }
+ return rc;
+}
+
static int cifs_readpages(struct file *file, struct address_space *mapping,
struct list_head *page_list, unsigned num_pages)
{
@@ -3393,57 +3450,15 @@ static int cifs_readpages(struct file *file, struct address_space *mapping,
* the rdata->pages, then we want them in increasing order.
*/
while (!list_empty(page_list)) {
- unsigned int i;
- unsigned int bytes = PAGE_CACHE_SIZE;
- unsigned int expected_index;
- unsigned int nr_pages = 1;
+ unsigned int i, nr_pages, bytes;
loff_t offset;
struct page *page, *tpage;
struct cifs_readdata *rdata;

- page = list_entry(page_list->prev, struct page, lru);
-
- /*
- * Lock the page and put it in the cache. Since no one else
- * should have access to this page, we're safe to simply set
- * PG_locked without checking it first.
- */
- __set_page_locked(page);
- rc = add_to_page_cache_locked(page, mapping,
- page->index, GFP_KERNEL);
-
- /* give up if we can't stick it in the cache */
- if (rc) {
- __clear_page_locked(page);
+ rc = readpages_get_pages(mapping, page_list, rsize, &tmplist,
+ &nr_pages, &offset, &bytes);
+ if (rc)
break;
- }
-
- /* move first page to the tmplist */
- offset = (loff_t)page->index << PAGE_CACHE_SHIFT;
- list_move_tail(&page->lru, &tmplist);
-
- /* now try and add more pages onto the request */
- expected_index = page->index + 1;
- list_for_each_entry_safe_reverse(page, tpage, page_list, lru) {
- /* discontinuity ? */
- if (page->index != expected_index)
- break;
-
- /* would this page push the read over the rsize? */
- if (bytes + PAGE_CACHE_SIZE > rsize)
- break;
-
- __set_page_locked(page);
- if (add_to_page_cache_locked(page, mapping,
- page->index, GFP_KERNEL)) {
- __clear_page_locked(page);
- break;
- }
- list_move_tail(&page->lru, &tmplist);
- bytes += PAGE_CACHE_SIZE;
- expected_index++;
- nr_pages++;
- }

rdata = cifs_readdata_alloc(nr_pages, cifs_readv_complete);
if (!rdata) {
--
1.8.1.2
Pavel Shilovsky
2014-06-27 09:57:49 UTC
Permalink
If a server changes maximum buffer size for read (rsize) requests
on reconnect we can fail on repeating with a big size buffer on
-EAGAIN error in readpages. Fix this by checking rsize all the
time before repeating requests.

Signed-off-by: Pavel Shilovsky <pshilovsky-***@public.gmane.org>
---
fs/cifs/file.c | 41 ++++++++++++++++++++++++++---------------
1 file changed, 26 insertions(+), 15 deletions(-)

diff --git a/fs/cifs/file.c b/fs/cifs/file.c
index ee7c547..5f3fa33 100644
--- a/fs/cifs/file.c
+++ b/fs/cifs/file.c
@@ -3348,6 +3348,8 @@ readpages_get_pages(struct address_space *mapping, struct list_head *page_list,
unsigned int expected_index;
int rc;

+ INIT_LIST_HEAD(tmplist);
+
page = list_entry(page_list->prev, struct page, lru);

/*
@@ -3403,19 +3405,10 @@ static int cifs_readpages(struct file *file, struct address_space *mapping,
struct list_head tmplist;
struct cifsFileInfo *open_file = file->private_data;
struct cifs_sb_info *cifs_sb = CIFS_SB(file->f_path.dentry->d_sb);
- unsigned int rsize = cifs_sb->rsize;
+ struct TCP_Server_Info *server;
pid_t pid;

/*
- * Give up immediately if rsize is too small to read an entire page.
- * The VFS will fall back to readpage. We should never reach this
- * point however since we set ra_pages to 0 when the rsize is smaller
- * than a cache page.
- */
- if (unlikely(rsize < PAGE_CACHE_SIZE))
- return 0;
-
- /*
* Reads as many pages as possible from fscache. Returns -ENOBUFS
* immediately if the cookie is negative
*
@@ -3433,7 +3426,7 @@ static int cifs_readpages(struct file *file, struct address_space *mapping,
pid = current->tgid;

rc = 0;
- INIT_LIST_HEAD(&tmplist);
+ server = tlink_tcon(open_file->tlink)->ses->server;

cifs_dbg(FYI, "%s: file=%p mapping=%p num_pages=%u\n",
__func__, file, mapping, num_pages);
@@ -3455,8 +3448,17 @@ static int cifs_readpages(struct file *file, struct address_space *mapping,
struct page *page, *tpage;
struct cifs_readdata *rdata;

- rc = readpages_get_pages(mapping, page_list, rsize, &tmplist,
- &nr_pages, &offset, &bytes);
+ /*
+ * Give up immediately if rsize is too small to read an entire
+ * page. The VFS will fall back to readpage. We should never
+ * reach this point however since we set ra_pages to 0 when the
+ * rsize is smaller than a cache page.
+ */
+ if (unlikely(cifs_sb->rsize < PAGE_CACHE_SIZE))
+ return 0;
+
+ rc = readpages_get_pages(mapping, page_list, cifs_sb->rsize,
+ &tmplist, &nr_pages, &offset, &bytes);
if (rc)
break;

@@ -3486,15 +3488,24 @@ static int cifs_readpages(struct file *file, struct address_space *mapping,
rdata->pages[rdata->nr_pages++] = page;
}

- rc = cifs_retry_async_readv(rdata);
- if (rc != 0) {
+ if (!rdata->cfile->invalidHandle ||
+ !cifs_reopen_file(rdata->cfile, true))
+ rc = server->ops->async_readv(rdata);
+ if (rc) {
for (i = 0; i < rdata->nr_pages; i++) {
page = rdata->pages[i];
lru_cache_add_file(page);
unlock_page(page);
page_cache_release(page);
+ if (rc == -EAGAIN)
+ list_add_tail(&page->lru, &tmplist);
}
kref_put(&rdata->refcount, cifs_readdata_release);
+ if (rc == -EAGAIN) {
+ /* Re-add pages to the page_list and retry */
+ list_splice(&tmplist, page_list);
+ continue;
+ }
break;
}
--
1.8.1.2
Pavel Shilovsky
2014-06-27 09:57:50 UTC
Permalink
Signed-off-by: Pavel Shilovsky <pshilovsky-***@public.gmane.org>
---
fs/cifs/file.c | 69 ++++++++++++++++++++++++++++++++++------------------------
1 file changed, 41 insertions(+), 28 deletions(-)

diff --git a/fs/cifs/file.c b/fs/cifs/file.c
index 5f3fa33..6b5b95a 100644
--- a/fs/cifs/file.c
+++ b/fs/cifs/file.c
@@ -2931,43 +2931,23 @@ cifs_uncached_read_into_pages(struct TCP_Server_Info *server,
return total_read > 0 && result != -EAGAIN ? total_read : result;
}

-ssize_t cifs_user_readv(struct kiocb *iocb, struct iov_iter *to)
+static int
+cifs_send_async_read(loff_t offset, size_t len, struct cifsFileInfo *open_file,
+ struct cifs_sb_info *cifs_sb, struct list_head *rdata_list)
{
- struct file *file = iocb->ki_filp;
- ssize_t rc;
- size_t len, cur_len;
- ssize_t total_read = 0;
- loff_t offset = iocb->ki_pos;
+ struct cifs_readdata *rdata;
unsigned int npages;
- struct cifs_sb_info *cifs_sb;
- struct cifs_tcon *tcon;
- struct cifsFileInfo *open_file;
- struct cifs_readdata *rdata, *tmp;
- struct list_head rdata_list;
+ size_t cur_len;
+ int rc;
pid_t pid;

- len = iov_iter_count(to);
- if (!len)
- return 0;
-
- INIT_LIST_HEAD(&rdata_list);
- cifs_sb = CIFS_SB(file->f_path.dentry->d_sb);
- open_file = file->private_data;
- tcon = tlink_tcon(open_file->tlink);
-
- if (!tcon->ses->server->ops->async_readv)
- return -ENOSYS;
-
if (cifs_sb->mnt_cifs_flags & CIFS_MOUNT_RWPIDFORWARD)
pid = open_file->pid;
else
pid = current->tgid;

- if ((file->f_flags & O_ACCMODE) == O_WRONLY)
- cifs_dbg(FYI, "attempting read on write only file instance\n");
-
do {
- cur_len = min_t(const size_t, len - total_read, cifs_sb->rsize);
+ cur_len = min_t(const size_t, len, cifs_sb->rsize);
npages = DIV_ROUND_UP(cur_len, PAGE_SIZE);

/* allocate a readdata struct */
@@ -2998,11 +2978,44 @@ error:
break;
}

- list_add_tail(&rdata->list, &rdata_list);
+ list_add_tail(&rdata->list, rdata_list);
offset += cur_len;
len -= cur_len;
} while (len > 0);

+ return rc;
+}
+
+ssize_t cifs_user_readv(struct kiocb *iocb, struct iov_iter *to)
+{
+ struct file *file = iocb->ki_filp;
+ ssize_t rc;
+ size_t len;
+ ssize_t total_read = 0;
+ loff_t offset = iocb->ki_pos;
+ struct cifs_sb_info *cifs_sb;
+ struct cifs_tcon *tcon;
+ struct cifsFileInfo *open_file;
+ struct cifs_readdata *rdata, *tmp;
+ struct list_head rdata_list;
+
+ len = iov_iter_count(to);
+ if (!len)
+ return 0;
+
+ INIT_LIST_HEAD(&rdata_list);
+ cifs_sb = CIFS_SB(file->f_path.dentry->d_sb);
+ open_file = file->private_data;
+ tcon = tlink_tcon(open_file->tlink);
+
+ if (!tcon->ses->server->ops->async_readv)
+ return -ENOSYS;
+
+ if ((file->f_flags & O_ACCMODE) == O_WRONLY)
+ cifs_dbg(FYI, "attempting read on write only file instance\n");
+
+ rc = cifs_send_async_read(offset, len, open_file, cifs_sb, &rdata_list);
+
/* if at least one read request send succeeded, then reset rc */
if (!list_empty(&rdata_list))
rc = 0;
--
1.8.1.2
Pavel Shilovsky
2014-06-27 09:57:51 UTC
Permalink
If a server changes maximum buffer size for read (rsize) requests
on reconnect we can fail on repeating with a big size buffer on
-EAGAIN error in user read. Fix this by checking rsize all the
time before repeating requests.

Signed-off-by: Pavel Shilovsky <pshilovsky-***@public.gmane.org>
---
fs/cifs/file.c | 45 ++++++++++++++++++++++-----------------------
1 file changed, 22 insertions(+), 23 deletions(-)

diff --git a/fs/cifs/file.c b/fs/cifs/file.c
index 6b5b95a..021adf6 100644
--- a/fs/cifs/file.c
+++ b/fs/cifs/file.c
@@ -2827,26 +2827,6 @@ cifs_uncached_readdata_release(struct kref *refcount)
cifs_readdata_release(refcount);
}

-static int
-cifs_retry_async_readv(struct cifs_readdata *rdata)
-{
- int rc;
- struct TCP_Server_Info *server;
-
- server = tlink_tcon(rdata->cfile->tlink)->ses->server;
-
- do {
- if (rdata->cfile->invalidHandle) {
- rc = cifs_reopen_file(rdata->cfile, true);
- if (rc != 0)
- continue;
- }
- rc = server->ops->async_readv(rdata);
- } while (rc == -EAGAIN);
-
- return rc;
-}
-
/**
* cifs_readdata_to_iov - copy data from pages in response to an iovec
* @rdata: the readdata response with list of pages holding data
@@ -2940,6 +2920,9 @@ cifs_send_async_read(loff_t offset, size_t len, struct cifsFileInfo *open_file,
size_t cur_len;
int rc;
pid_t pid;
+ struct TCP_Server_Info *server;
+
+ server = tlink_tcon(open_file->tlink)->ses->server;

if (cifs_sb->mnt_cifs_flags & CIFS_MOUNT_RWPIDFORWARD)
pid = open_file->pid;
@@ -2970,11 +2953,15 @@ cifs_send_async_read(loff_t offset, size_t len, struct cifsFileInfo *open_file,
rdata->pagesz = PAGE_SIZE;
rdata->read_into_pages = cifs_uncached_read_into_pages;

- rc = cifs_retry_async_readv(rdata);
+ if (!rdata->cfile->invalidHandle ||
+ !cifs_reopen_file(rdata->cfile, true))
+ rc = server->ops->async_readv(rdata);
error:
if (rc) {
kref_put(&rdata->refcount,
cifs_uncached_readdata_release);
+ if (rc == -EAGAIN)
+ continue;
break;
}

@@ -3022,8 +3009,8 @@ ssize_t cifs_user_readv(struct kiocb *iocb, struct iov_iter *to)

len = iov_iter_count(to);
/* the loop below should proceed in the order of increasing offsets */
+again:
list_for_each_entry_safe(rdata, tmp, &rdata_list, list) {
- again:
if (!rc) {
/* FIXME: freezable sleep too? */
rc = wait_for_completion_killable(&rdata->done);
@@ -3033,7 +3020,19 @@ ssize_t cifs_user_readv(struct kiocb *iocb, struct iov_iter *to)
rc = rdata->result;
/* resend call if it's a retryable error */
if (rc == -EAGAIN) {
- rc = cifs_retry_async_readv(rdata);
+ struct list_head tmp_list;
+
+ list_del_init(&rdata->list);
+ INIT_LIST_HEAD(&tmp_list);
+
+ rc = cifs_send_async_read(rdata->offset,
+ rdata->bytes, rdata->cfile,
+ cifs_sb, &tmp_list);
+
+ list_splice(&tmp_list, &rdata_list);
+
+ kref_put(&rdata->refcount,
+ cifs_uncached_readdata_release);
goto again;
}
} else {
--
1.8.1.2
Pavel Shilovsky
2014-06-27 09:57:52 UTC
Permalink
If a server changes maximum buffer size for read requests (rsize)
on reconnect we can fail on repeating with a big size buffer on
-EAGAIN error in cifs_read. Fix this by checking rsize all the
time before repeating requests.

Signed-off-by: Pavel Shilovsky <pshilovsky-***@public.gmane.org>
---
fs/cifs/file.c | 26 ++++++++++++++------------
1 file changed, 14 insertions(+), 12 deletions(-)

diff --git a/fs/cifs/file.c b/fs/cifs/file.c
index 021adf6..36fd042 100644
--- a/fs/cifs/file.c
+++ b/fs/cifs/file.c
@@ -3147,18 +3147,19 @@ cifs_read(struct file *file, char *read_data, size_t read_size, loff_t *offset)

for (total_read = 0, cur_offset = read_data; read_size > total_read;
total_read += bytes_read, cur_offset += bytes_read) {
- current_read_size = min_t(uint, read_size - total_read, rsize);
- /*
- * For windows me and 9x we do not want to request more than it
- * negotiated since it will refuse the read then.
- */
- if ((tcon->ses) && !(tcon->ses->capabilities &
+ do {
+ current_read_size = min_t(uint, read_size - total_read,
+ rsize);
+ /*
+ * For windows me and 9x we do not want to request more
+ * than it negotiated since it will refuse the read
+ * then.
+ */
+ if ((tcon->ses) && !(tcon->ses->capabilities &
tcon->ses->server->vals->cap_large_files)) {
- current_read_size = min_t(uint, current_read_size,
- CIFSMaxBufSize);
- }
- rc = -EAGAIN;
- while (rc == -EAGAIN) {
+ current_read_size = min_t(uint,
+ current_read_size, CIFSMaxBufSize);
+ }
if (open_file->invalidHandle) {
rc = cifs_reopen_file(open_file, true);
if (rc != 0)
@@ -3171,7 +3172,8 @@ cifs_read(struct file *file, char *read_data, size_t read_size, loff_t *offset)
rc = server->ops->sync_read(xid, open_file, &io_parms,
&bytes_read, &cur_offset,
&buf_type);
- }
+ } while (rc == -EAGAIN);
+
if (rc || (bytes_read == 0)) {
if (total_read) {
break;
--
1.8.1.2
Pavel Shilovsky
2014-06-27 09:57:53 UTC
Permalink
If we negotiate SMB 2.1 and higher version of the protocol and
a server supports large read buffer size, we need to consume 1
credit per 65536 bytes. So, we need to know how many credits
we have and obtain the required number of them before constructing
a readdata structure in readpages and user read.

Signed-off-by: Pavel Shilovsky <pshilovsky-***@public.gmane.org>
---
fs/cifs/cifsglob.h | 1 +
fs/cifs/file.c | 35 ++++++++++++++++++++++++++++-------
fs/cifs/smb2ops.c | 2 --
fs/cifs/smb2pdu.c | 30 +++++++++++++++++++++++++++---
4 files changed, 56 insertions(+), 12 deletions(-)

diff --git a/fs/cifs/cifsglob.h b/fs/cifs/cifsglob.h
index 54ca2b9..f33ff4c 100644
--- a/fs/cifs/cifsglob.h
+++ b/fs/cifs/cifsglob.h
@@ -1068,6 +1068,7 @@ struct cifs_readdata {
struct kvec iov;
unsigned int pagesz;
unsigned int tailsz;
+ unsigned int credits;
unsigned int nr_pages;
struct page *pages[];
};
diff --git a/fs/cifs/file.c b/fs/cifs/file.c
index 36fd042..d5f66ba 100644
--- a/fs/cifs/file.c
+++ b/fs/cifs/file.c
@@ -2916,7 +2916,7 @@ cifs_send_async_read(loff_t offset, size_t len, struct cifsFileInfo *open_file,
struct cifs_sb_info *cifs_sb, struct list_head *rdata_list)
{
struct cifs_readdata *rdata;
- unsigned int npages;
+ unsigned int npages, rsize, credits;
size_t cur_len;
int rc;
pid_t pid;
@@ -2930,13 +2930,19 @@ cifs_send_async_read(loff_t offset, size_t len, struct cifsFileInfo *open_file,
pid = current->tgid;

do {
- cur_len = min_t(const size_t, len, cifs_sb->rsize);
+ rc = server->ops->wait_mtu_credits(server, cifs_sb->rsize,
+ &rsize, &credits);
+ if (rc)
+ break;
+
+ cur_len = min_t(const size_t, len, rsize);
npages = DIV_ROUND_UP(cur_len, PAGE_SIZE);

/* allocate a readdata struct */
rdata = cifs_readdata_alloc(npages,
cifs_uncached_readv_complete);
if (!rdata) {
+ add_credits_and_wake_if(server, credits, 0);
rc = -ENOMEM;
break;
}
@@ -2952,12 +2958,14 @@ cifs_send_async_read(loff_t offset, size_t len, struct cifsFileInfo *open_file,
rdata->pid = pid;
rdata->pagesz = PAGE_SIZE;
rdata->read_into_pages = cifs_uncached_read_into_pages;
+ rdata->credits = credits;

if (!rdata->cfile->invalidHandle ||
!cifs_reopen_file(rdata->cfile, true))
rc = server->ops->async_readv(rdata);
error:
if (rc) {
+ add_credits_and_wake_if(server, rdata->credits, 0);
kref_put(&rdata->refcount,
cifs_uncached_readdata_release);
if (rc == -EAGAIN)
@@ -3457,10 +3465,16 @@ static int cifs_readpages(struct file *file, struct address_space *mapping,
* the rdata->pages, then we want them in increasing order.
*/
while (!list_empty(page_list)) {
- unsigned int i, nr_pages, bytes;
+ unsigned int i, nr_pages, bytes, rsize;
loff_t offset;
struct page *page, *tpage;
struct cifs_readdata *rdata;
+ unsigned credits;
+
+ rc = server->ops->wait_mtu_credits(server, cifs_sb->rsize,
+ &rsize, &credits);
+ if (rc)
+ break;

/*
* Give up immediately if rsize is too small to read an entire
@@ -3468,13 +3482,17 @@ static int cifs_readpages(struct file *file, struct address_space *mapping,
* reach this point however since we set ra_pages to 0 when the
* rsize is smaller than a cache page.
*/
- if (unlikely(cifs_sb->rsize < PAGE_CACHE_SIZE))
+ if (unlikely(rsize < PAGE_CACHE_SIZE)) {
+ add_credits_and_wake_if(server, credits, 0);
return 0;
+ }

- rc = readpages_get_pages(mapping, page_list, cifs_sb->rsize,
- &tmplist, &nr_pages, &offset, &bytes);
- if (rc)
+ rc = readpages_get_pages(mapping, page_list, rsize, &tmplist,
+ &nr_pages, &offset, &bytes);
+ if (rc) {
+ add_credits_and_wake_if(server, credits, 0);
break;
+ }

rdata = cifs_readdata_alloc(nr_pages, cifs_readv_complete);
if (!rdata) {
@@ -3486,6 +3504,7 @@ static int cifs_readpages(struct file *file, struct address_space *mapping,
page_cache_release(page);
}
rc = -ENOMEM;
+ add_credits_and_wake_if(server, credits, 0);
break;
}

@@ -3496,6 +3515,7 @@ static int cifs_readpages(struct file *file, struct address_space *mapping,
rdata->pid = pid;
rdata->pagesz = PAGE_CACHE_SIZE;
rdata->read_into_pages = cifs_readpages_read_into_pages;
+ rdata->credits = credits;

list_for_each_entry_safe(page, tpage, &tmplist, lru) {
list_del(&page->lru);
@@ -3506,6 +3526,7 @@ static int cifs_readpages(struct file *file, struct address_space *mapping,
!cifs_reopen_file(rdata->cfile, true))
rc = server->ops->async_readv(rdata);
if (rc) {
+ add_credits_and_wake_if(server, rdata->credits, 0);
for (i = 0; i < rdata->nr_pages; i++) {
page = rdata->pages[i];
lru_cache_add_file(page);
diff --git a/fs/cifs/smb2ops.c b/fs/cifs/smb2ops.c
index fecc2de..081529f 100644
--- a/fs/cifs/smb2ops.c
+++ b/fs/cifs/smb2ops.c
@@ -242,8 +242,6 @@ smb2_negotiate_rsize(struct cifs_tcon *tcon, struct smb_vol *volume_info)
/* start with specified rsize, or default */
rsize = volume_info->rsize ? volume_info->rsize : CIFS_DEFAULT_IOSIZE;
rsize = min_t(unsigned int, rsize, server->max_read);
- /* set it to the maximum buffer size value we can send with 1 credit */
- rsize = min_t(unsigned int, rsize, SMB2_MAX_BUFFER_SIZE);

return rsize;
}
diff --git a/fs/cifs/smb2pdu.c b/fs/cifs/smb2pdu.c
index b0c89ef..5b4ca0c 100644
--- a/fs/cifs/smb2pdu.c
+++ b/fs/cifs/smb2pdu.c
@@ -1748,11 +1748,12 @@ smb2_readv_callback(struct mid_q_entry *mid)
int
smb2_async_readv(struct cifs_readdata *rdata)
{
- int rc;
+ int rc, flags = 0;
struct smb2_hdr *buf;
struct cifs_io_parms io_parms;
struct smb_rqst rqst = { .rq_iov = &rdata->iov,
.rq_nvec = 1 };
+ struct TCP_Server_Info *server;

cifs_dbg(FYI, "%s: offset=%llu bytes=%u\n",
__func__, rdata->offset, rdata->bytes);
@@ -1763,18 +1764,41 @@ smb2_async_readv(struct cifs_readdata *rdata)
io_parms.persistent_fid = rdata->cfile->fid.persistent_fid;
io_parms.volatile_fid = rdata->cfile->fid.volatile_fid;
io_parms.pid = rdata->pid;
+
+ server = io_parms.tcon->ses->server;
+
rc = smb2_new_read_req(&rdata->iov, &io_parms, 0, 0);
- if (rc)
+ if (rc) {
+ if (rc == -EAGAIN && rdata->credits) {
+ /* credits was reseted by reconnect */
+ rdata->credits = 0;
+ /* reduce in_flight value since we won't send the req */
+ spin_lock(&server->req_lock);
+ server->in_flight--;
+ spin_unlock(&server->req_lock);
+ }
return rc;
+ }

buf = (struct smb2_hdr *)rdata->iov.iov_base;
/* 4 for rfc1002 length field */
rdata->iov.iov_len = get_rfc1002_length(rdata->iov.iov_base) + 4;

+ if (rdata->credits) {
+ buf->CreditCharge = cpu_to_le16(DIV_ROUND_UP(rdata->bytes,
+ SMB2_MAX_BUFFER_SIZE));
+ spin_lock(&server->req_lock);
+ server->credits += rdata->credits -
+ le16_to_cpu(buf->CreditCharge);
+ spin_unlock(&server->req_lock);
+ wake_up(&server->request_q);
+ flags = CIFS_HAS_CREDITS;
+ }
+
kref_get(&rdata->refcount);
rc = cifs_call_async(io_parms.tcon->ses->server, &rqst,
cifs_readv_receive, smb2_readv_callback,
- rdata, 0);
+ rdata, flags);
if (rc) {
kref_put(&rdata->refcount, cifs_readdata_release);
cifs_stats_fail_inc(io_parms.tcon, SMB2_READ_HE);
--
1.8.1.2
Jeff Layton
2014-06-27 10:52:22 UTC
Permalink
On Fri, 27 Jun 2014 13:57:38 +0400
Post by Pavel Shilovsky
If we get into read_into_pages() from cifs_readv_receive() and then
loose a network, we issue cifs_reconnect that moves all mids to
a private list and issue their callbacks. The callback of the async
read request sets a mid to retry, frees it and wakes up a process
that waits on the rdata completion.
After the connection is established we return from read_into_pages()
with a short read, use the mid that was freed before and try to read
the remaining data from the a newly created socket. Both actions are
not what we want to do. In reconnect cases (-EAGAIN) we should not
mask off the error with a short read but should return the error
code instead.
I'm not sure I understand what problem this solves. Why is returning a
short read wrong here?
Post by Pavel Shilovsky
---
fs/cifs/file.c | 4 ++--
1 file changed, 2 insertions(+), 2 deletions(-)
diff --git a/fs/cifs/file.c b/fs/cifs/file.c
index e90a1e9..6b6df30 100644
--- a/fs/cifs/file.c
+++ b/fs/cifs/file.c
@@ -2823,7 +2823,7 @@ cifs_uncached_read_into_pages(struct TCP_Server_Info *server,
total_read += result;
}
- return total_read > 0 ? total_read : result;
+ return total_read > 0 && result != -EAGAIN ? total_read : result;
}
ssize_t cifs_user_readv(struct kiocb *iocb, struct iov_iter *to)
@@ -3231,7 +3231,7 @@ cifs_readpages_read_into_pages(struct TCP_Server_Info *server,
total_read += result;
}
- return total_read > 0 ? total_read : result;
+ return total_read > 0 && result != -EAGAIN ? total_read : result;
}
static int cifs_readpages(struct file *file, struct address_space *mapping,
--
Jeff Layton <jlayton-vpEMnDpepFuMZCB2o+***@public.gmane.org>
Steve French
2014-06-27 14:14:26 UTC
Permalink
---------- Forwarded message ----------
From: Steve French <smfrench-***@public.gmane.org>
Date: Fri, Jun 27, 2014 at 9:12 AM
Subject: Re: [PATCH v2 01/16] CIFS: Fix async reading on reconnects
Post by Jeff Layton
On Fri, 27 Jun 2014 13:57:38 +0400
Post by Pavel Shilovsky
If we get into read_into_pages() from cifs_readv_receive() and then
loose a network, we issue cifs_reconnect that moves all mids to
a private list and issue their callbacks. The callback of the async
read request sets a mid to retry, frees it and wakes up a process
that waits on the rdata completion.
After the connection is established we return from read_into_pages()
with a short read, use the mid that was freed before and try to read
the remaining data from the a newly created socket. Both actions are
not what we want to do. In reconnect cases (-EAGAIN) we should not
mask off the error with a short read but should return the error
code instead.
I'm not sure I understand what problem this solves. Why is returning a
short read wrong here?
It will oops since the mid has already been freed and the caller uses
the mid (cifs_readv_receive calls dequeue_mid with the mid).
rdata->read_into_pages succeeded with a short read so length > 0 but
the mid was freed due to the reconnect
--
Thanks,

Steve
--
Thanks,

Steve
Pavel Shilovsky
2014-06-27 16:06:48 UTC
Permalink
Post by Jeff Layton
On Fri, 27 Jun 2014 13:57:38 +0400
Post by Pavel Shilovsky
If we get into read_into_pages() from cifs_readv_receive() and then
loose a network, we issue cifs_reconnect that moves all mids to
a private list and issue their callbacks. The callback of the async
read request sets a mid to retry, frees it and wakes up a process
that waits on the rdata completion.
After the connection is established we return from read_into_pages()
with a short read, use the mid that was freed before and try to read
the remaining data from the a newly created socket. Both actions are
not what we want to do. In reconnect cases (-EAGAIN) we should not
mask off the error with a short read but should return the error
code instead.
I'm not sure I understand what problem this solves. Why is returning a
short read wrong here?
This patch solves several problems in cifs_readv_receive() when
read_into_pages() call ended up with cifs_reconnect inside it:

1) prevents use-after-free and a possible double free for "mid"
variable; this can cause kernel oopses.

2) prevents updating rdata->bytes with a short read length that causes
the cifs_async_readv() caller to retry write request with a shorten
wrong length; this can cause data coherency problems.

3) prevents reading an actual data from a newly created socket through
cifs_readv_discard() -- this data can be a part of a negotiate
response received from the server; this can cause the client to retry
negotiate or hang.

Oopses from the 1st scenario was not observed since
cifs_readv_discard() returns negative value while trying to get a
remaining data. The 2nd and 3rd one were caught during testing: issue
reading with "dd", then switch off network on the server, wait 120 sec
and switch on network. The bug is caught when we are lucky enough to
stop the link while the client is in read_into_pages().

The patch fixes all these problems for me. It stays on the following
logic if a reconnect has happened:
1) all mids are already marked to retry and their callbacks are issued
- we shouldn't do anything with our mid/rdata since the requests will
be retried anyway.
2) the socket is newly created - we shouldn't try to get any remaining
data of the previous connection.

While -EAGAIN error code will be able to indicate something else in
possible future patches I prefer to leave it that way since the patch
is small and easy to apply that is critical for stable series. More
accurate way is to add a flag that is passed through
cifs_readv_from_socket() and switched on if cifs_reconnect() was
issued (like Steve suggested).
--
Best regards,
Pavel Shilovsky.
Jeff Layton
2014-07-03 10:45:49 UTC
Permalink
On Fri, 27 Jun 2014 20:06:48 +0400
Post by Pavel Shilovsky
Post by Jeff Layton
On Fri, 27 Jun 2014 13:57:38 +0400
Post by Pavel Shilovsky
If we get into read_into_pages() from cifs_readv_receive() and then
loose a network, we issue cifs_reconnect that moves all mids to
a private list and issue their callbacks. The callback of the async
read request sets a mid to retry, frees it and wakes up a process
that waits on the rdata completion.
After the connection is established we return from read_into_pages()
with a short read, use the mid that was freed before and try to read
the remaining data from the a newly created socket. Both actions are
not what we want to do. In reconnect cases (-EAGAIN) we should not
mask off the error with a short read but should return the error
code instead.
I'm not sure I understand what problem this solves. Why is returning a
short read wrong here?
This patch solves several problems in cifs_readv_receive() when
1) prevents use-after-free and a possible double free for "mid"
variable; this can cause kernel oopses.
2) prevents updating rdata->bytes with a short read length that causes
the cifs_async_readv() caller to retry write request with a shorten
wrong length; this can cause data coherency problems.
3) prevents reading an actual data from a newly created socket through
cifs_readv_discard() -- this data can be a part of a negotiate
response received from the server; this can cause the client to retry
negotiate or hang.
Oopses from the 1st scenario was not observed since
cifs_readv_discard() returns negative value while trying to get a
remaining data. The 2nd and 3rd one were caught during testing: issue
reading with "dd", then switch off network on the server, wait 120 sec
and switch on network. The bug is caught when we are lucky enough to
stop the link while the client is in read_into_pages().
The patch fixes all these problems for me. It stays on the following
1) all mids are already marked to retry and their callbacks are issued
- we shouldn't do anything with our mid/rdata since the requests will
be retried anyway.
2) the socket is newly created - we shouldn't try to get any remaining
data of the previous connection.
While -EAGAIN error code will be able to indicate something else in
possible future patches I prefer to leave it that way since the patch
is small and easy to apply that is critical for stable series. More
accurate way is to add a flag that is passed through
cifs_readv_from_socket() and switched on if cifs_reconnect() was
issued (like Steve suggested).
Hmm, ok. I think the best fix would be to use a different error code
than -EAGAIN for the case where we have or are going to reconnect the
socket.

My main concern here is that you may end up doing 99% of a valid read
into the set of pages, only to get a disconnect right at the end. If
you eventually end up returning a non-retryable error back to userland,
it may not expect that the pages it passed in were written to. If you
don't return a short read in that situation then you're potentially not
communicating what actually happened to userland.
--
Jeff Layton <jlayton-***@public.gmane.org>
Pavel Shilovsky
2014-07-03 16:12:57 UTC
Permalink
Post by Jeff Layton
On Fri, 27 Jun 2014 20:06:48 +0400
Post by Pavel Shilovsky
Post by Jeff Layton
On Fri, 27 Jun 2014 13:57:38 +0400
Post by Pavel Shilovsky
If we get into read_into_pages() from cifs_readv_receive() and then
loose a network, we issue cifs_reconnect that moves all mids to
a private list and issue their callbacks. The callback of the async
read request sets a mid to retry, frees it and wakes up a process
that waits on the rdata completion.
After the connection is established we return from read_into_pages()
with a short read, use the mid that was freed before and try to read
the remaining data from the a newly created socket. Both actions are
not what we want to do. In reconnect cases (-EAGAIN) we should not
mask off the error with a short read but should return the error
code instead.
I'm not sure I understand what problem this solves. Why is returning a
short read wrong here?
This patch solves several problems in cifs_readv_receive() when
1) prevents use-after-free and a possible double free for "mid"
variable; this can cause kernel oopses.
2) prevents updating rdata->bytes with a short read length that causes
the cifs_async_readv() caller to retry write request with a shorten
wrong length; this can cause data coherency problems.
3) prevents reading an actual data from a newly created socket through
cifs_readv_discard() -- this data can be a part of a negotiate
response received from the server; this can cause the client to retry
negotiate or hang.
Oopses from the 1st scenario was not observed since
cifs_readv_discard() returns negative value while trying to get a
remaining data. The 2nd and 3rd one were caught during testing: issue
reading with "dd", then switch off network on the server, wait 120 sec
and switch on network. The bug is caught when we are lucky enough to
stop the link while the client is in read_into_pages().
The patch fixes all these problems for me. It stays on the following
1) all mids are already marked to retry and their callbacks are issued
- we shouldn't do anything with our mid/rdata since the requests will
be retried anyway.
2) the socket is newly created - we shouldn't try to get any remaining
data of the previous connection.
While -EAGAIN error code will be able to indicate something else in
possible future patches I prefer to leave it that way since the patch
is small and easy to apply that is critical for stable series. More
accurate way is to add a flag that is passed through
cifs_readv_from_socket() and switched on if cifs_reconnect() was
issued (like Steve suggested).
Hmm, ok. I think the best fix would be to use a different error code
than -EAGAIN for the case where we have or are going to reconnect the
socket.
My main concern here is that you may end up doing 99% of a valid read
into the set of pages, only to get a disconnect right at the end. If
you eventually end up returning a non-retryable error back to userland,
it may not expect that the pages it passed in were written to. If you
don't return a short read in that situation then you're potentially not
communicating what actually happened to userland.
In that case we should prevent cifs_reconnect from removing the MID we
are actually using in cifs_readv_receive() from the mid list. For
example by calling dequeue_mid() before read_into_pages() and then
returning a short read if reconnect happened?
--
Best regards,
Pavel Shilovsky.
Jeff Layton
2014-07-03 16:39:30 UTC
Permalink
On Thu, 3 Jul 2014 06:45:49 -0400
Post by Jeff Layton
On Fri, 27 Jun 2014 20:06:48 +0400
Post by Pavel Shilovsky
Post by Jeff Layton
On Fri, 27 Jun 2014 13:57:38 +0400
Post by Pavel Shilovsky
If we get into read_into_pages() from cifs_readv_receive() and then
loose a network, we issue cifs_reconnect that moves all mids to
a private list and issue their callbacks. The callback of the async
read request sets a mid to retry, frees it and wakes up a process
that waits on the rdata completion.
After the connection is established we return from read_into_pages()
with a short read, use the mid that was freed before and try to read
the remaining data from the a newly created socket. Both actions are
not what we want to do. In reconnect cases (-EAGAIN) we should not
mask off the error with a short read but should return the error
code instead.
I'm not sure I understand what problem this solves. Why is returning a
short read wrong here?
This patch solves several problems in cifs_readv_receive() when
1) prevents use-after-free and a possible double free for "mid"
variable; this can cause kernel oopses.
2) prevents updating rdata->bytes with a short read length that causes
the cifs_async_readv() caller to retry write request with a shorten
wrong length; this can cause data coherency problems.
3) prevents reading an actual data from a newly created socket through
cifs_readv_discard() -- this data can be a part of a negotiate
response received from the server; this can cause the client to retry
negotiate or hang.
Oopses from the 1st scenario was not observed since
cifs_readv_discard() returns negative value while trying to get a
remaining data. The 2nd and 3rd one were caught during testing: issue
reading with "dd", then switch off network on the server, wait 120 sec
and switch on network. The bug is caught when we are lucky enough to
stop the link while the client is in read_into_pages().
The patch fixes all these problems for me. It stays on the following
1) all mids are already marked to retry and their callbacks are issued
- we shouldn't do anything with our mid/rdata since the requests will
be retried anyway.
2) the socket is newly created - we shouldn't try to get any remaining
data of the previous connection.
While -EAGAIN error code will be able to indicate something else in
possible future patches I prefer to leave it that way since the patch
is small and easy to apply that is critical for stable series. More
accurate way is to add a flag that is passed through
cifs_readv_from_socket() and switched on if cifs_reconnect() was
issued (like Steve suggested).
Hmm, ok. I think the best fix would be to use a different error code
than -EAGAIN for the case where we have or are going to reconnect the
socket.
My main concern here is that you may end up doing 99% of a valid read
into the set of pages, only to get a disconnect right at the end. If
you eventually end up returning a non-retryable error back to userland,
it may not expect that the pages it passed in were written to. If you
don't return a short read in that situation then you're potentially not
communicating what actually happened to userland.
After chatting with Pavel, I'll go ahead and Ack this patch. I think
it's probably the lesser evil...

It does seem to me though it seems to me that if you got part of a read
request and then hit an error, then it would be:

a) more efficient to not reissue the reads that were successful

...and...

b) more correct to go ahead and return a short read. There's no
guarantee that you'll be able to reach the server in order to do the
reconnect, so it seems like it would be better to go ahead and return
what we have than keep retrying in the kernel.

So...with that in mind...

Acked-by: Jeff Layton <jlayton-***@public.gmane.org>
Pavel Shilovsky
2014-07-21 15:41:58 UTC
Permalink
Post by Jeff Layton
After chatting with Pavel, I'll go ahead and Ack this patch. I think
it's probably the lesser evil...
It does seem to me though it seems to me that if you got part of a read
a) more efficient to not reissue the reads that were successful
...and...
b) more correct to go ahead and return a short read. There's no
guarantee that you'll be able to reach the server in order to do the
reconnect, so it seems like it would be better to go ahead and return
what we have than keep retrying in the kernel.
So...with that in mind...
I think the patch [01/16] of the series should go to stable since it
fixes the existing problem. Thoughts?
--
Best regards,
Pavel Shilovsky.
Loading...